|
A simple question about terminology. In many types of classifiers (logistic regression, SVMs, neural networks) one classifies by first computing a "soft" real-valued function f(x). In a binary setting, f(x) would be very high if there is a high probability that y=1 and very low if there is a high probability that y=-1. Now, my question is: is there a name for f(x)? In the SVM setting, it would be reasonable to call it a "margin", but that doesn't seem to fit for other types of classifiers. |
|
In SVMs this is called the margin. In logistic regression this is called the log-odds of the positive class. Usually, if you say "score" or "margin" people will understand, I think.
"Score" may be ambiguous with the output of the classifier.
(Nov 01 '10 at 19:49)
rm999
@Ravi Moody: how exactly? Isn't he talking precisely about the output of the classifier before thresholding? In structured learning this is usually referred to as a score (or probability, or energy).
(Nov 01 '10 at 19:50)
Alexandre Passos ♦
In my field (predictive analytics) "scores" are the output of a classifier, e.g. credit scores attempt to predict credit default. Sounds like it has become a fairly overloaded term, sadly.
(Nov 01 '10 at 21:52)
rm999
Alright, "margin" is the least-offensive option, I guess!
(Nov 03 '10 at 13:30)
John Southland
|
|
I think the word "score" is the most immediately understandable as the output of any continuous-valued predictor. You want positives to score high and negatives to score low. Common credit scores like FICO are very close to being linear in log-odds. The use of "score" for the equivalent quantity in logistic regression (the original poster's f(x)) is justifiable as the same usage, not just an analogous one. In practice credit scores are calibrated to a user-friendly scale, but that's unimportant here. If a score is calibrated to true log-odds, I call it "the log-odds". Why make life difficult? I prefer "score" to "margin", as I think it's more generic. If I heard "SVM score", I would know what it means, but if I heard "Naive Bayes margin" or "regression margin" it wouldn't be immediately obvious what that meant. |
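To make the "score calibrated to log-odds" point concrete, here is a minimal Python sketch. The function names and the `offset` / `points_per_unit` constants are hypothetical, picked only for illustration; they are not taken from FICO or any real scoring system. It shows that the log-odds and the probability carry the same information, and that rescaling to a friendlier range does not change the ranking.

```python
import math

def logit(p):
    """Log-odds of the positive class for a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(score):
    """Inverse of logit: map a log-odds score back to a probability."""
    return 1.0 / (1.0 + math.exp(-score))

def display_score(log_odds, offset=500.0, points_per_unit=50.0):
    """Affine rescaling of log-odds to a 'user-friendly' range.

    offset and points_per_unit are arbitrary, hypothetical constants.
    """
    return offset + points_per_unit * log_odds

probs = [0.1, 0.5, 0.9]
log_odds = [logit(p) for p in probs]            # the "score" in log-odds units
scaled = [display_score(s) for s in log_odds]   # same ordering, different scale

for p, s, d in zip(probs, log_odds, scaled):
    print(f"p={p:.1f}  log-odds={s:+.3f}  display score={d:.1f}")
```

Because the rescaling is increasing, positives still score high and negatives low; only the units change.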
|
I don't see anything wrong with calling it a "classifier". Classifiers do not have to be binary; they can be real-valued, where the value gives a probability or confidence of the binary prediction.
Maybe I should clarify. I am trying to find a term to refer to the number. For example, in a logistic regression, if we have f(x)=0, I could say there is a 50% probability of y=1 and a 50% probability of y=-1. However, that is a statement not about the quantity f(x), but about the quantity 1/(1+exp(-f(x))). I would like to somehow directly discuss f(x), so I needn't commit to this probabilistic interpretation.
(Nov 01 '10 at 15:43)
John Southland
OK, I see what you mean. You are trying to separate out the intermediate step, e.g. in logistic regression you want the raw value of the linear function before passing it through the logistic activation function. I've never heard of a general term for this. I don't think there should be one, because that value has a different meaning and purpose in different classifiers. In general I consider the final step applied to the intermediate value an integral part of the classifier, i.e. not something that should be considered a modular attachment to it. In generalized linear models I believe it is called the "linear predictor".
(Nov 01 '10 at 16:19)
rm999
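For readers following the exchange above, a small Python sketch of the distinction being drawn: the raw value f(x) (the "linear predictor" in GLM terms) versus the probability 1/(1+exp(-f(x))) obtained from it. The weights, bias, and input are made up for illustration, and the function names are my own, not from any library.

```python
import math

def linear_score(weights, bias, x):
    """Raw real-valued score f(x) = w . x + b, before squashing or thresholding."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def sigmoid(score):
    """Probability of y = 1 under the logistic model: 1 / (1 + exp(-f(x)))."""
    return 1.0 / (1.0 + math.exp(-score))

def predict_label(score):
    """Threshold the raw score at 0, giving a label in {-1, +1}."""
    return 1 if score >= 0 else -1

# Hypothetical parameters and input, chosen so that f(x) lands exactly on 0.
weights, bias = [2.0, -1.0], 0.5
x = [1.0, 2.5]

f_x = linear_score(weights, bias, x)  # 2.0 - 2.5 + 0.5 = 0.0
p = sigmoid(f_x)                      # 0.5, i.e. the 50/50 case from the comment

print(f"f(x) = {f_x}, P(y=1) = {p}, label = {predict_label(f_x)}")
```

Thresholding f(x) at 0 and thresholding the probability at 0.5 give the same label, which is why one can talk about f(x) directly without committing to the probabilistic reading.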
|
I'd just like to say thanks for asking for the term, rather than using your own. Too much of machine learning suffers from having people re-invent terms for common things, making it a difficult area to search in.