Hello everyone.

Consider comparing two machine learning algorithms, A and B, on some dataset. The results (RMSE/F1) of both algorithms depend on a randomly generated initial approximation (parameters).

Questions:

  1. When I use the same parameters for both algorithms, A "usually" slightly outperforms B. How many different experiments do I have to perform to be "sure" that A is better than B?
  2. How do I measure the significance of my results? (To what extent am I "sure"?)

Relevant links are welcome!

PS. I've seen papers in which the authors use a t-test and p-values, but I'm not sure whether it is OK to use them in such a situation.

asked Oct 27 '10 at 08:30

bijey

If you are interested, there is some discussion here: http://stats.stackexchange.com/questions/4019/measuring-statistical-significance-of-machine-learning-algorithms-comparison

(Oct 27 '10 at 09:25) bijey

One Answer:

Any test you run will only tell you that A outperforms B on the datasets you test it on. You might think that with enough different datasets you could derive a p-value, but remember that while individual data points can sometimes be assumed i.i.d., different datasets most certainly can't.

Proving that algorithm A is better than algorithm B in general is a lost cause, per the no free lunch theorem. You can, however, use learning-theoretic generalisation bounds to compare expected generalisation ability, although this is also problematic since the bounds are often incomparable or impractical.

To say that algorithm A is better than algorithm B on a specific dataset, the trivial way is to split the data uniformly at random into a training set and a test set, train both algorithms on the training set, and apply the test-set bound on the test set. The test-set bound is essentially a confidence interval that assumes a fixed but unknown error probability p for each classifier, and it can give you a p-value.
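A minimal sketch of that comparison, in case it helps. It uses Hoeffding's inequality as one possible form of the test-set bound (an exact binomial interval would be tighter), and it assumes a classification setting where each algorithm's test error count is known. The function names `hoeffding_interval` and `a_beats_b` and the example counts are made up for illustration.

```python
import math


def hoeffding_interval(n_errors, n_test, delta=0.05):
    """Two-sided interval for the true error rate p of one classifier.

    Assumes the test points are i.i.d. and were not used for training.
    With probability at least 1 - delta, p lies inside the interval.
    """
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    p_hat = n_errors / n_test
    # Hoeffding: P(|p_hat - p| >= eps) <= 2 * exp(-2 * n * eps^2)
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n_test))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)


def a_beats_b(errors_a, errors_b, n_test, delta=0.05):
    """True if A's interval lies entirely below B's interval.

    Each interval is computed at level delta / 2, so by the union bound
    both hold simultaneously with probability at least 1 - delta.
    """
    _, hi_a = hoeffding_interval(errors_a, n_test, delta / 2.0)
    lo_b, _ = hoeffding_interval(errors_b, n_test, delta / 2.0)
    return hi_a < lo_b


if __name__ == "__main__":
    # Hypothetical counts: 10,000 test points, A makes 500 errors, B makes 1,000.
    print(hoeffding_interval(500, 10_000))
    print(a_beats_b(500, 1_000, 10_000))
    # A small gap (950 vs 1,000 errors) is not enough at this test-set size.
    print(a_beats_b(950, 1_000, 10_000))
```

Non-overlapping intervals are a conservative criterion: when the intervals overlap, this says nothing either way, and the conclusion still only concerns these two trained classifiers on this one dataset.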

answered Oct 27 '10 at 17:16

Alexandre Passos ♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.