|
To use a structured SVM with binary loss, one needs to define a joint feature representation $\psi(x, y)$ of the input $x$ and the output $y$, where the output is binary, $y \in \{-1, 1\}$. To find the most violated constraint, we maximize the loss-augmented score over $y$, i.e., $\max_{y} \Delta(y, y_i) + w^T \psi(x, y)$, where $\Delta(\cdot, \cdot)$ is the 0-1 loss and $y_i$ is the ground truth. My question is how one selects the right $\psi$. I have seen some people use $\psi(x, y) = x y / 2$ and others use $\psi(x, y) = x y$. I would expect the selection of the most violated constraint not to depend on the choice of $\psi$, yet it seems to. For example, if $\psi$ is defined as, say, $\psi(x, y) = 1000 \, x y$, then the selection of the most violated constraint would be dominated by the second term, $w^T \psi(x, y)$, and the loss term would effectively be ignored. Any ideas? Am I missing something? |
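To make the concern concrete, here is a minimal sketch of the loss-augmented argmax for a single example, assuming $\psi(x, y) = c \, x y$ for a scale factor $c$. The function name `most_violated_label`, the `scale` parameter, and the values of `w` and `x` are made up for illustration and do not come from any particular library.

```python
import numpy as np

def most_violated_label(w, x, y_true, scale=1.0):
    """Return argmax over y in {-1, +1} of Delta(y, y_true) + w^T psi(x, y),
    where psi(x, y) = scale * x * y (an assumed form of the joint feature map)."""
    best_y, best_val = None, -np.inf
    for y in (-1, 1):
        loss = 0.0 if y == y_true else 1.0   # 0-1 loss Delta(y, y_true)
        score = float(w @ (scale * x * y))   # w^T psi(x, y)
        val = loss + score
        if val > best_val:
            best_y, best_val = y, val
    return best_y

# Hypothetical weights and input, with w^T x = 0.25 and ground truth y_i = +1.
w = np.array([0.5, -0.25])
x = np.array([1.0, 1.0])
y_true = 1

for c in (0.5, 1.0, 1000.0):
    print(c, most_violated_label(w, x, y_true, scale=c))

# Same scale of 1000, but with w divided by 1000: the scores w^T psi match
# those of scale=1 (up to floating-point rounding).
print(most_violated_label(w / 1000.0, x, y_true, scale=1000.0))
```

With `w` held fixed, the loss term decides the argmax for the two small scales and is swamped at `scale=1000`, which is the effect described above. The last call only shows that dividing `w` by the same factor reproduces the unscaled scores; whether training actually ends up at such a `w` depends on the regularization and is not something this sketch establishes.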