I have an online learning algorithm which I want to evaluate. Since online learning algorithms are typically evaluated by mistake bounds (the number of mistakes made during training), should I just report the number of mistakes during training? Would this be fair? Or should I report accuracy on held-out test data?

asked Aug 08 '11 at 11:35

ebony


3 Answers:

It depends on who your audience is, and what you're trying to convince them that your algorithm is good for. It's very common (again) to use online algorithms to do external memory batch learning by taking multiple passes over a huge dataset. If that's your main interest then evaluating on held out test data like you would with any batch algorithm is appropriate. On the other hand, if your main interest is in settings where the model is constantly updated as it is used, then traditional online evaluation of looking at number of mistakes (or more generally some measure of prediction error) during training is more appropriate. (Though frankly I've never seen a real situation where you get a label immediately after a prediction.)

answered Aug 09 '11 at 22:52

Dave Lewis

The rate of mistakes during training is known as the progressive validation error, and as long as you don't reuse examples it is an unbiased estimate of the error on an i.i.d. test set, so you should report that.
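A minimal sketch of progressive validation, using a simple perceptron as a stand-in for any online learner (the learner and data here are hypothetical; the key point is that each example is predicted before it is used for an update):

```python
def perceptron_progressive_validation(stream):
    """Return the progressive validation error: each example is
    predicted *before* it is used for an update, so on a single
    pass over i.i.d. data the mistake rate is an unbiased estimate
    of test-set error."""
    w = None
    mistakes = 0
    n = 0
    for x, y in stream:          # labels y in {-1, +1}
        if w is None:
            w = [0.0] * len(x)
        score = sum(wi * xi for wi, xi in zip(w, x))
        pred = 1 if score >= 0 else -1
        if pred != y:            # count the mistake first...
            mistakes += 1
            # ...then apply the standard perceptron update
            w = [wi + y * xi for wi, xi in zip(w, x)]
        n += 1
    return mistakes / n

# Usage on a tiny toy stream (hypothetical data):
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1),
          ([2.0, 0.1], 1), ([0.1, 2.0], -1)]
err = perceptron_progressive_validation(stream)  # 2 mistakes / 4 examples = 0.5
```

The unbiasedness argument is simple: at the time each prediction is made, the model has never seen that example, so each prediction is a genuine out-of-sample prediction; this breaks down as soon as you take a second pass over the data.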

answered Aug 09 '11 at 10:55

Alexandre Passos ♦

It would be a good idea to report both numbers. If you do not have a page limit for the paper, including both results would add credibility to your research.

You can also try reporting the training error rate alongside the error on the test data.

answered Aug 09 '11 at 03:26

Leon Palafox ♦


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.