Hi, I'm using a structural SVM to learn the parameters of a graphical model with pairwise and higher-order potentials. When generating negative examples for the SVM learning I perform loss-augmented inference (using a message-passing algorithm to find the MAP solution), which amounts to finding an output labeling that is highly likely under the current model AND has high loss.

I am wondering what the best way to trade off these two desires is. I tend to always find examples that have a very high loss but are not very probable. Naively, I have tried to scale down the contribution the loss function makes to the MAP inference objective (see the sketch below), but I am wondering if there is some standard way to deal with this problem? I haven't been able to find any details about this. I thought that as the weights of the model grew during training, the loss function would have less of an effect on the loss-augmented inference, but they never seem to get large enough to make a great difference.

I am hoping this is a common problem, and not an indication that there's something wrong in my code! Any thoughts or redirections to papers covering this topic would be greatly appreciated!
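For concreteness, here is a tiny sketch of what I mean by "scaling down the loss"; the eta knob, the unary-only scoring, and the brute-force search are purely illustrative stand-ins for my actual model and message-passing code:

```python
import itertools
import numpy as np

def loss_augmented_map(unary, y_true, eta=1.0):
    """Brute-force loss-augmented MAP over a tiny label space:
    argmax_y  score(y) + eta * hamming_loss(y, y_true).
    `unary` is an (n_nodes, n_labels) score table; eta scales the loss term."""
    n_nodes, n_labels = unary.shape
    best_y, best_val = None, -np.inf
    for y in itertools.product(range(n_labels), repeat=n_nodes):
        score = sum(unary[i, yi] for i, yi in enumerate(y))    # model score w . psi(x, y)
        loss = sum(yi != ti for yi, ti in zip(y, y_true))      # Hamming loss Delta(y_true, y)
        val = score + eta * loss                               # loss-augmented objective
        if val > best_val:
            best_y, best_val = tuple(y), val
    return best_y, best_val

# Toy example: with eta large the loss dominates and we get a high-loss, low-score labeling;
# with eta small we essentially recover the plain MAP labeling.
unary = np.array([[2.0, 0.1], [1.5, 0.2], [0.3, 1.0]])
y_true = (0, 0, 1)
print(loss_augmented_map(unary, y_true, eta=5.0))
print(loss_augmented_map(unary, y_true, eta=0.1))
```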
In fact, this is not a problem. Multiplying the loss by a constant c has, in theory, the same effect as dividing the regularization parameter C by c: the optimal weight vector is simply rescaled by c, which changes none of the predictions (easily checked). In practice, this holds as long as your optimization converges up to a negligibly small epsilon and you don't have numerical problems. So, if you cross-validate over C, you don't need to additionally tune the loss scale.
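To make the "easily checked" part concrete, here is the substitution for the standard margin-rescaling objective (generic notation, not tied to your particular potentials). Writing the training problem with the loss scaled by a constant $c > 0$,

$$\min_{w}\;\tfrac{1}{2}\|w\|^2 + C\sum_i \max_y\Big[c\,\Delta(y_i,y) - w\cdot\big(\psi(x_i,y_i)-\psi(x_i,y)\big)\Big],$$

substituting $w = c\,u$ and dividing the objective by $c^2$ gives

$$\tfrac{1}{2}\|u\|^2 + \frac{C}{c}\sum_i \max_y\Big[\Delta(y_i,y) - u\cdot\big(\psi(x_i,y_i)-\psi(x_i,y)\big)\Big].$$

So the problem with loss $c\,\Delta$ and parameter $C$ has the same minimizer, up to rescaling $w$ by $c$ (which changes no argmax prediction), as the problem with loss $\Delta$ and parameter $C/c$. That is exactly why a single sweep over $C$ already covers every choice of loss scale.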