|
How exactly are Relevance Vector Machines different from Logistic Regression and SVMs for classification purposes? Concretely, in the case of Logistic Regression we are looking for a hyperplane that divides the two classes, and with SVMs we are looking for the optimal large-margin hyperplane. What exactly are we doing with RVMs?
|
The RVM is a Bayesian model whose goal is to find posterior and predictive probabilities rather than to make hard decisions as the SVM does. Learning the model amounts to finding the posterior distribution over the weights, and labelling an unknown input amounts to computing the predictive distribution over the labels given those weights. To contrast it with the SVM, I will make some crude analogies here; for more details refer to Bishop, 2006 [Chapter 7].

Similar to the SVM, the RVM "cost" contains two terms: a likelihood function and a prior over the weights. The likelihood term is a simple linear projection of the input features (for regression) or a logistic function of that projection (for classification), and the error between the desired and predicted outputs is measured by least squares rather than the hinge loss used in the SVM. This follows from the assumption that the error between the desired and predicted outputs is Gaussian-distributed. The second term, more importantly, takes the form of a Gaussian-Gamma distribution governed by a set of hyper-parameters $\alpha$. These $\alpha$ become sparse and, analogous to the Lagrange multipliers in the SVM, determine which inputs end up as the "relevance" vectors (the RVM counterpart of support vectors). With these two terms, solving for the weights is analogous to solving for the posterior distribution over the weights, which again turns out to be a simple Gaussian because of the conjugate priors. There are several methods for solving for the weights and the hyper-parameters; this approach is popularly known as sparse Bayesian learning (also called Bayesian sparse coding). Refer to the work of David Wipf for this.
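
To make the mechanics concrete, here is a minimal NumPy sketch of the regression case of sparse Bayesian learning; the fixed-point re-estimation rules for $\alpha$ and the noise precision follow the standard RVM treatment (Tipping 2001; Bishop Ch. 7), while the function names, the RBF design matrix, the pruning threshold, and the toy data are my own illustrative assumptions. For classification, the Gaussian likelihood would be replaced by a Bernoulli likelihood through the logistic sigmoid, and the posterior would need an approximation (e.g. Laplace) rather than being exactly Gaussian.

    # Sparse Bayesian learning (RVM) for regression -- a sketch, not a polished implementation.
    import numpy as np

    def rbf_design(X, centers, gamma=1.0):
        # Phi[n, m] = exp(-gamma * ||x_n - c_m||^2)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-gamma * d2)

    def rvm_regression(Phi, t, n_iter=200, alpha_init=1.0, beta_init=1.0, prune=1e6):
        N, M = Phi.shape
        alpha = np.full(M, alpha_init)   # prior precision of each weight
        beta = beta_init                 # noise precision
        keep = np.arange(M)              # indices of surviving basis functions
        for _ in range(n_iter):
            # Posterior over weights is Gaussian:
            #   Sigma = (A + beta * Phi^T Phi)^{-1},  mu = beta * Sigma * Phi^T t
            A = np.diag(alpha)
            Sigma = np.linalg.inv(A + beta * Phi.T @ Phi)
            mu = beta * Sigma @ Phi.T @ t
            # Type-II maximum-likelihood fixed-point updates for alpha and beta
            gamma_i = 1.0 - alpha * np.diag(Sigma)
            alpha = gamma_i / (mu ** 2 + 1e-12)
            beta = (N - gamma_i.sum()) / (np.sum((t - Phi @ mu) ** 2) + 1e-12)
            # Prune basis functions whose alpha has diverged (weight pinned to zero)
            mask = alpha < prune
            alpha, Phi, keep = alpha[mask], Phi[:, mask], keep[mask]
        # Final posterior over the surviving ("relevance") basis functions
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        mu = beta * Sigma @ Phi.T @ t
        return mu, Sigma, alpha, beta, keep

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = np.linspace(-5, 5, 100)[:, None]
        t = np.sinc(X[:, 0]) + 0.05 * rng.standard_normal(100)
        Phi = rbf_design(X, X, gamma=0.5)   # one basis function per training point, as in the SVM
        mu, Sigma, alpha, beta, keep = rvm_regression(Phi, t)
        print(f"{len(keep)} relevance vectors out of {len(X)} training points")

The point of the sketch is that sparsity emerges from the hyper-parameters: most $\alpha_i$ grow without bound during re-estimation, which drives the corresponding weights to zero and leaves only a few "relevance" vectors, playing the role that support vectors play in the SVM.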