This may be a vague question, but what I really want to know is the following:

  1. What is the role of the loss function in structured prediction?

  2. How does the choice of loss function in structured prediction affect prediction accuracy?

  3. How do I choose a good loss function? That is, what are the key properties to consider when selecting one?

Any useful references on these questions would also be appreciated.

asked Jan 16 '12 at 08:38

Thetna


2 Answers:

A reference I quite like is this: "Structured Learning and Prediction in Computer Vision". It is a book about structured prediction that is available for free. Obviously this will be most helpful if you are into computer vision, but might also be interesting otherwise.

I'll also try to give some personal answers.

  1. The role is to say what you want from a solution. Usually, the label space in structured prediction is large and it is unlikely that you will predict "the right" label for any input. Therefore using the zero-one loss is not that helpful. Still, some solutions are better than others. With the loss you can specify which solutions are closer to the real solution than others. What "closer" means here is application specific, and your preferences should be expressed using the loss. Example: Let's say you want to segment an object from an image. The output space is the set of all possible pixel masks of the image, so it has 2^(width*height) elements. Predicting the exact right one is more or less impossible. A common choice would be the Hamming loss, which asks "For how many pixels did I predict incorrectly whether they belong to the object?" But maybe you care about something else. Maybe you don't want pixels that are very far from the actual object to be labeled as object. You could also incorporate that into your loss function.

  2. Accuracy is usually defined with respect to the zero-one loss, as far as I know. As said above, that doesn't make much sense in structured prediction in general. What one is actually interested in is the loss. And obviously the choice of loss function has a strong influence on the value of the loss ;)

  3. I think there are two aspects of the loss that you should consider: a) Does the loss actually express what you care about? Only you can know what that is. b) Is it possible to minimize the loss? In particular, is the loss convex? How efficient is loss-augmented inference?
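
Going back to the segmentation example in point 1, here is a minimal sketch of the two losses mentioned there, written in plain Python. The function names and the toy masks are made up for illustration, and the distance weighting (Manhattan distance to the nearest true object pixel) is just one assumed way to penalize far-away false positives, not a standard definition.

```python
def hamming_loss(y_true, y_pred):
    """Number of pixels whose predicted label differs from the true label."""
    return sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )


def distance_weighted_loss(y_true, y_pred):
    """Hamming-style loss where a false-positive pixel costs more the farther
    it lies from the nearest true object pixel (hypothetical weighting)."""
    object_pixels = [
        (i, j)
        for i, row in enumerate(y_true)
        for j, t in enumerate(row)
        if t == 1
    ]
    loss = 0
    for i, (row_t, row_p) in enumerate(zip(y_true, y_pred)):
        for j, (t, p) in enumerate(zip(row_t, row_p)):
            if t == p:
                continue
            if p == 1 and object_pixels:
                # False positive: cost is the Manhattan distance to the object.
                loss += min(abs(i - a) + abs(j - b) for a, b in object_pixels)
            else:
                # Missed object pixel: unit cost, as in the Hamming loss.
                loss += 1
    return loss


# Toy 3x5 masks: the object occupies two pixels in the middle row.
y_true = [[0, 0, 0, 0, 0],
          [0, 1, 1, 0, 0],
          [0, 0, 0, 0, 0]]
near_miss = [[0, 0, 0, 0, 0],
             [0, 1, 1, 1, 0],   # one extra pixel right next to the object
             [0, 0, 0, 0, 0]]
far_miss = [[0, 0, 0, 0, 1],    # one extra pixel in the far corner
            [0, 1, 1, 0, 0],
            [0, 0, 0, 0, 0]]

print(hamming_loss(y_true, near_miss), hamming_loss(y_true, far_miss))
print(distance_weighted_loss(y_true, near_miss),
      distance_weighted_loss(y_true, far_miss))
```

Both wrong masks have the same Hamming loss, but the distance-weighted variant ranks the far-away mistake as worse, which is the kind of preference the loss lets you encode.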

Usually, during learning you have to solve the loss-augmented inference problem many times. If this is very inefficient, it will cost you time. If your loss function specifies exactly what you are interested in, but loss-augmented inference is infeasible to solve exactly, you can only get an approximate solution. And then you don't really know how that solution relates to your task at hand.
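
As a rough sketch of what that inference problem looks like: given a model score and a loss Δ, one searches for the output y that maximizes score(x, y) + Δ(y_true, y). The toy below assumes a model with independent per-element scores (the `unary` table is invented for illustration) and the Hamming loss. Under those assumptions the problem decomposes per element, so it can be solved without enumerating all 2^n labelings. Richer models or non-decomposable losses generally do not allow this shortcut.

```python
from itertools import product


def score(unary, y):
    """Model score of labeling y: sum of independent per-element scores."""
    return sum(unary[i][label] for i, label in enumerate(y))


def hamming(y_true, y):
    """Number of positions where y disagrees with y_true."""
    return sum(t != p for t, p in zip(y_true, y))


def loss_augmented_brute_force(unary, y_true):
    """argmax over all 2^n binary labelings of score + loss (exponential)."""
    n = len(unary)
    return max(
        product((0, 1), repeat=n),
        key=lambda y: score(unary, y) + hamming(y_true, y),
    )


def loss_augmented_decomposed(unary, y_true):
    """Same argmax, solved element by element. This only works because both
    the score and the Hamming loss are sums over positions."""
    return tuple(
        max((0, 1), key=lambda label: unary[i][label] + (label != y_true[i]))
        for i in range(len(unary))
    )


# Hypothetical scores: unary[i][label] is the score of giving element i that label.
unary = [(0.2, 1.0), (0.9, 0.4), (0.1, 0.3), (2.0, 0.0)]
y_true = (1, 0, 1, 0)

plain = tuple(max((0, 1), key=lambda label: u[label]) for u in unary)
augmented = loss_augmented_decomposed(unary, y_true)

print(plain)      # prediction without the loss term
print(augmented)  # most violating labeling under score + Hamming loss
```

Here the plain prediction already equals `y_true`, while the loss-augmented solution flips every element where the score margin is smaller than the unit loss. That labeling is the one a margin-based learner would use to form its update.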

Btw you might be interested in this paper by Joachims.

Hope that helped.

This answer is marked "community wiki".

answered Jan 16 '12 at 16:07

Andreas Mueller

edited Jan 16 '12 at 16:08

Sometimes you want to evaluate on a metric that is different from accuracy. Examples include ROC area, F1 score, precision@10, etc.

In this case, you can define the structured loss function to be exactly that metric. You are then training the model to minimize the same loss that you care about during evaluation.

Examples:
http://www.cs.cornell.edu/People/tj/publications/joachims_05a.pdf
http://www.cs.cornell.edu/People/tj/publications/yue_etal_07a.pdf
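
To illustrate why such a metric behaves differently from accuracy, here is a small sketch that treats the whole vector of predictions as one structured output and uses 1 - F1 as its loss. The function names and the toy labels are made up for this example. It only evaluates the loss and is not the training procedure from the papers above.

```python
def error_rate(y_true, y_pred):
    """Fraction of examples labeled incorrectly (per-example zero-one loss)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_loss(y_true, y_pred):
    """1 - F1, computed over the whole prediction vector at once."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if 2 * tp + fp + fn == 0:
        # No positives anywhere: treat as a perfect prediction (a convention).
        return 0.0
    return 1.0 - 2 * tp / (2 * tp + fp + fn)


# Toy data with few positives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
all_negative = [0, 0, 0, 0, 0, 0, 0, 0]   # misses both positives
over_predict = [1, 1, 1, 1, 0, 0, 0, 0]   # finds both, plus two false alarms

print(error_rate(y_true, all_negative), f1_loss(y_true, all_negative))
print(error_rate(y_true, over_predict), f1_loss(y_true, over_predict))
```

Both predictions make two mistakes out of eight, so the error rate cannot tell them apart, but the F1-based loss strongly prefers the one that actually finds the positives.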

answered Feb 16 '12 at 12:51

Yisong Yue


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.