I would like to compute the precision-recall break-even point (BEP), and my classifier is LIBSVM.

I know that I need to set a threshold (a "confidence value") so that we can obtain different values of recall and precision. I am confused about how to do that!

When we set the -b option to 1, we get probability information, and prediction on the test data returns probability estimates for each class.

And when we apply different values of C and gamma, we get different accuracy. Is that how we can get the BEP? But I thought the "confidence value" was the margin width?

Thank you for your help.

Alaa

asked May 20 '13 at 06:16

alaa


2 Answers:

Generally people use held-out data to perform a grid search. Different libraries expose different parameters, but the basic idea is the same. I believe LIBSVM uses C and gamma.

1) For each parameter, define a range of reasonable values. Sometimes these are probabilities {0.0, 0.1, 0.2, ..., 1.0}. Other times the values will be powers of ten {1E-5, 1E-4, 1E-3, ..., 1E3, 1E4, 1E5}.

2) For each assignment in the Cartesian product of these values, perform a test (usually a cross-validation) on held-out data.

3) Look at the results of the tests. Generally you take the highest-performing set of values unless it appears spurious. Ideally, the scores will form a smooth surface with a single peak. That peak is then the best balance of precision and recall for your data. Note that on some data sets this won't result in precision and recall being close, but in practice they often are, simply because that maximizes F-measure.
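The three steps above can be sketched in a few lines of Python. This is only an illustration, not LIBSVM's own tooling: `grid_search` and `toy_evaluate` are hypothetical names, and `toy_evaluate` is a stand-in for whatever routine trains the SVM with a given C and gamma and returns a cross-validated score (for example F-measure) on held-out data.

```python
from itertools import product
from math import log10


def grid_search(evaluate, c_values, gamma_values):
    """Score every (C, gamma) pair and return the best pair plus all scores.

    `evaluate` is assumed to be a callable (C, gamma) -> score, where a
    higher score is better (e.g. cross-validated F-measure).
    """
    results = {
        (c, gamma): evaluate(c, gamma)
        for c, gamma in product(c_values, gamma_values)  # Cartesian product
    }
    best = max(results, key=results.get)
    return best, results


def toy_evaluate(c, gamma):
    """Stand-in for a real cross-validation run (an assumption, not real data).

    It produces a smooth surface with a single peak at C=10, gamma=0.01.
    """
    return 1.0 - 0.1 * abs(log10(c) - 1) - 0.1 * abs(log10(gamma) + 2)


# Step 1: ranges of reasonable values, here powers of ten.
c_values = [10.0 ** e for e in range(-2, 4)]
gamma_values = [10.0 ** e for e in range(-4, 2)]

# Steps 2 and 3: evaluate every combination and take the peak.
best_params, scores = grid_search(toy_evaluate, c_values, gamma_values)
print(best_params, scores[best_params])
```

In real use, `toy_evaluate` would be replaced by a function that actually trains and cross-validates the classifier. It is still worth inspecting the whole `scores` surface rather than only the maximum, to check that the peak is not spurious.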

answered May 20 '13 at 11:29


Kirk Roberts

edited May 20 '13 at 11:29

I'll gratuitously throw in that the break-even point is a terrible effectiveness measure. And I should know because, regrettably, I invented it (in an HLT '91 paper, I think).

The big problem is that it isn't a classification effectiveness measure, since it isn't defined for a binary classifier. It's really a summary statistic for an entire recall-precision curve, and a much inferior one to, say, area under the curve. My original rationale for inventing it was that you could put a ruler down on a published recall-precision curve and find the value.
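To make the "summary statistic for an entire curve" point concrete, here is a minimal sketch of how a break-even point could be read off a ranking. It assumes each test example has a real-valued confidence score (for instance an SVM decision value or a probability estimate) and a 0/1 label. The function names and the toy data are illustrative only, and tied scores are not treated specially.

```python
def pr_curve(scores, labels):
    """Return (threshold, precision, recall) at every cutoff down the ranking."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    true_pos = 0
    for k, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        # Predict positive for the top k examples, i.e. threshold = score.
        points.append((score, true_pos / k, true_pos / total_pos))
    return points


def break_even_point(scores, labels):
    """Pick the cutoff where precision and recall are closest to equal."""
    points = pr_curve(scores, labels)
    threshold, precision, recall = min(points, key=lambda p: abs(p[1] - p[2]))
    return threshold, (precision + recall) / 2


# Made-up scores and labels, purely for illustration.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(break_even_point(scores, labels))
```

On this toy ranking, precision and recall are both 0.75 when the top four examples are predicted positive, so the break-even value is 0.75 at threshold 0.6. Note that the threshold is swept over the scores of one fixed trained model; no single binary decision yields this number, which is exactly why it describes the curve rather than a classifier.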

answered Jul 18 '13 at 10:52


Dave Lewis


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.