Can the algorithm that's used to build a maximum entropy classifier be modified to function incrementally, or is it by definition a batch process? I've found a few open source implementations, but they're all batch processes, and even after inspecting the code, I can't envision how I'd modify them to function incrementally. Is it even possible?

asked Jul 06 '11 at 21:37

Cerin's gravatar image


edited Jul 06 '11 at 21:38

I imagine that some sort of SGD could be adopted to this: http://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf a paper by the guys who do LibLinear (which might already do it as online)

The code is open source too.

(Jul 06 '11 at 21:46) Jacob Jensen

4 Answers:

Multinomial Logistic Regression is an alternative name for the Maximum Entropy model. Multinomial Logistic Regression is often trained with Stochastic Gradient Descent. Finding a good learning rate for online learning is not trivial though (you can use grid search and cross validation though).

answered Jul 06 '11 at 23:16

ogrisel's gravatar image


Also, there are free implementations in, for example, vowpal wabbit http://hunch.net/~vw/ . if you're willing to do a bit of fiddling with the API, scikits.learn supports SGD for logistic regression classifiers but there is no online API yet.

(Jul 07 '11 at 03:11) Alexandre Passos ♦

It depends on what you mean by "incrementally".

If you are worried about dataset size, memory usage, etc., there are definitely optimization methods and implementations (e.g. SGD and the Vowpal Wabbit implementation as mentioned in the other answers) that process one example at a time in an "online" fashion.

If, on the other hand, you want the process to improve over time without needing to retrain, you could probably use these methods as well, but it is not very clear if they will work well unless you really have a constant stream of almost unlimited data. This is because in practice the online algorithms require performing several passes over the training data in order to converge (unless the dataset is really huge), and it is not clear that just adding the new example will work that well in practice (from the theoretical side of things this should work just fine, though).

answered Jul 07 '11 at 08:26

yoavg's gravatar image


While SGD can certainly be used to train a maximum entropy classifier, I don't know of any simple implementations of this. (LingPipe, which is a pretty large system, does support this.) Maximum entropy, aka discrete choice analysis, is equivalent to polytomous logistic regression with a particular form of parameter tying. I did a talk on this a couple years back which Bob Carpenter was kind enough to summarize.

The representation of input data, and the optimal internal data structures, are quite a bit different than for standard PLR.

answered Jul 08 '11 at 22:37

Dave%20Lewis's gravatar image

Dave Lewis

edited Jul 08 '11 at 22:40

This paper provides a fairly thorough treatment of sgd optimization for multinomial logistic regression. Also, check out this page on the Stanford UFLDL wiki for a gentler introduction.

answered Jul 08 '11 at 19:21

alto's gravatar image


Your answer
toggle preview

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.