I know this might be a basic question, but I am still getting introduced to machine learning. I have a dataset with my training examples labeled in two classes. I want to train a machine learning algorithm to predict the probability that a new example belongs to one of those classes, instead of just the class. Which ML algorithms do this kind of thing? To give a bit more information: I found I could use Logistic Regression, but I don't know if it can be used to find non-linear hypotheses.
As you discovered, logistic regression directly produces a probability. The other algorithm that directly produces probabilities is Naive Bayes http://en.wikipedia.org/wiki/Naive_Bayes_classifier.

But a large number of "traditional" classification algorithms can also return probabilities as their answers. For example, an SVM http://en.wikipedia.org/wiki/Support_vector_machine can transform the distance to the separating hyperplane into a probability of belonging to the class; look at the probability parameter of the SVC call in scikit-learn http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html. The k-NN family of algorithms http://en.wikipedia.org/wiki/KNN can also produce probabilities as the result of the classification; scikit-learn's KNeighborsClassifier exposes this through predict_proba, and for an R example see http://www.inside-r.org/packages/cran/knnflex/docs/knn.probability among others. It is also possible to return probabilities with decision trees and random forests; see the predict_proba method of the random forest in scikit-learn http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.predict_proba.

All of these last algorithms (SVM, k-NN, random forest) have the sibling property you are looking for: they can also be used for standard regression. Naive Bayes does not, as far as I know.
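To make this concrete, here is a minimal sketch (assuming scikit-learn is installed; the dataset is synthetic and only for illustration) showing that each of these classifiers exposes class probabilities through predict_proba. SVC needs probability=True so that a Platt-scaling model is fitted on top of its decision function:

```
# Minimal sketch: several scikit-learn classifiers return class probabilities
# via predict_proba (SVC needs probability=True to enable Platt scaling).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data, just for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

models = {
    "logistic regression": LogisticRegression(),
    "naive Bayes": GaussianNB(),
    "SVM (Platt scaling)": SVC(probability=True),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X, y)
    # predict_proba returns one column per class; column 1 is P(class = 1 | x)
    print(name, model.predict_proba(X[:3])[:, 1])
```

Note that probability=True makes SVC training more expensive, because the calibration is fitted with an internal cross-validation. For k-NN the probability is simply the fraction of neighbors belonging to each class, and for a random forest it is the average of the per-tree class proportions, so these estimates can be coarse when k or the number of trees is small.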