To use a structured SVM with a binary (0-1) loss, one needs to define a joint feature representation $\psi(x, y)$ of the input $x$ and the output $y$. Here the output is binary, $y \in \{-1, 1\}$.

When computing the most violated constraint, we maximize the loss-augmented score over $y$, i.e., $\max_{y} \left[ \Delta(y, y_i) + w^T \psi(x, y) \right]$, where $\Delta(\cdot, \cdot)$ is the 0-1 loss and $y_i$ is the ground truth.
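To make the setup concrete, here is a minimal sketch of that maximization for the binary case. It assumes a feature map of the form $\psi(x, y) = c \cdot y \cdot x$ with a scale factor $c$; the function names, the `scale` parameter, and the example numbers are hypothetical and only serve the illustration.

```python
def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def psi(x, y, scale=0.5):
    """Assumed joint feature map psi(x, y) = scale * y * x (scale=0.5 gives x*y/2)."""
    return [scale * y * xi for xi in x]


def zero_one_loss(y, y_true):
    """Delta(y, y_true): 0 if the labels agree, 1 otherwise."""
    return 0.0 if y == y_true else 1.0


def most_violated_label(w, x, y_true, scale=0.5):
    """Return the y in {-1, +1} maximizing Delta(y, y_true) + w^T psi(x, y)."""
    def augmented_score(y):
        return zero_one_loss(y, y_true) + dot(w, psi(x, y, scale))
    return max((-1, 1), key=augmented_score)


# Hypothetical weight vector, input, and ground-truth label.
w = [1.0, -0.5]
x = [0.4, 0.2]
print(most_violated_label(w, x, y_true=1))
```

Since there are only two candidate labels, the maximization is just a comparison of two loss-augmented scores.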

My question is how one selects the right $\psi$. I have seen some people use $\psi(x, y) = x y / 2$ and others use $\psi(x, y) = x y$. But the selection of the most violated constraint shouldn't be affected by the choice of $\psi$. For example, if $\psi$ were defined as, say, $\psi(x, y) = 1000 \, x y$, then the selection of the most violated constraint would be dominated by the second term alone, and the loss term would effectively be ignored. Any ideas? Am I missing something?
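The concern can be reproduced numerically with a small sketch. It evaluates the loss-augmented score for three scalings of $\psi(x, y) = c \cdot y \cdot x$ while holding $w$ fixed. That fixed $w$ is an assumption of this sketch (during training, $w$ would be re-estimated for each choice of $\psi$), and the helper names and numbers are made up.

```python
def augmented_scores(w, x, y_true, scale):
    """Map each y in {-1, +1} to Delta(y, y_true) + w^T psi(x, y),
    with the assumed feature map psi(x, y) = scale * y * x."""
    wx = sum(wi * xi for wi, xi in zip(w, x))
    return {y: (0.0 if y == y_true else 1.0) + scale * y * wx for y in (-1, 1)}


def selected_label(w, x, y_true, scale):
    """Label with the highest loss-augmented score for the given scale."""
    scores = augmented_scores(w, x, y_true, scale)
    return max(scores, key=scores.get)


# Hypothetical fixed weights, input, and ground-truth label.
w, x, y_true = [1.0, -0.5], [0.4, 0.2], 1

for scale in (0.5, 1.0, 1000.0):
    print(scale, augmented_scores(w, x, y_true, scale), selected_label(w, x, y_true, scale))
```

With these numbers $w^T x$ is about 0.3, so for the scales 0.5 and 1 the loss term decides and $y = -1$ is selected, while for the scale 1000 the score term dominates and $y = +1$ is selected.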

asked Aug 18 '12 at 11:36

aseembehl

edited Aug 18 '12 at 15:09

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.