|
I'm working on classification of an EEG dataset. I have data for 5 people, with 280 examples per person. EEG data varies a lot from person to person, so the classifier must be trained and tested on each person separately. I do 10-fold cross validation for each person, and it's fairly straightforward to calculate a confidence interval for the mean accuracy for a single person. For each of the five people I get a mean accuracy, but the real measure of my method is the mean of these five means. Can I calculate a confidence interval for this mean of means? |
|
Have you considered using a bootstrap instead? This thread on Stats SE might provide some insight into your particular problem: http://stats.stackexchange.com/questions/1399/obtaining-and-interpreting-bootstrapped-confidence-intervals-from-hierarchical-d
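To make the bootstrap idea concrete for the "mean of means" setting, here's a minimal sketch of a two-level (hierarchical) bootstrap: resample subjects, then resample fold accuracies within each resampled subject, and take a percentile interval over the grand mean. The fold-accuracy numbers are made up for illustration, not real data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 10-fold CV accuracies for 5 subjects (placeholder values).
fold_acc = rng.normal(loc=0.90, scale=0.03, size=(5, 10))

def grand_mean_ci(fold_acc, n_boot=10_000, alpha=0.05, rng=rng):
    """Two-level bootstrap CI: resample subjects, then folds within each subject."""
    n_subj, n_folds = fold_acc.shape
    stats = np.empty(n_boot)
    for b in range(n_boot):
        subj = rng.integers(0, n_subj, size=n_subj)               # resample subjects
        folds = rng.integers(0, n_folds, size=(n_subj, n_folds))  # resample folds
        sample = fold_acc[subj[:, None], folds]
        stats[b] = sample.mean(axis=1).mean()  # mean of per-subject means
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

lo, hi = grand_mean_ci(fold_acc)
```

Resampling subjects as well as folds is what makes the interval reflect between-person variability, which a naive pooled bootstrap would understate.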
|
If you want to show method A is better than method B, you can pose this as a hypothesis test: is the difference between A and B significantly different from what you'd expect under the null hypothesis that the two methods are equally good? With many runs you could check significance using a paired t-test. Because you only have a few runs, I recommend reading about permutation tests, which don't require assuming a parametric distribution to derive the p-value. I think you'll find this approach conceptually straightforward and relatively painless to implement once you start reading. I also highly recommend reading Demsar's Statistical Comparisons of Classifiers over Multiple Data Sets. I believe it's the best current paper on statistical tests for comparing classifiers. Its focus is comparisons over multiple data sets, but you'll find some useful summaries and pointers to other papers.

The Friedman test is probably a good idea. Nice reference article!
(Nov 09 '11 at 22:27)
Chris Simokat
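To illustrate the permutation-test idea: with paired per-subject accuracies for A and B, an exact sign-flip permutation test enumerates all sign assignments of the differences (under the null, each subject's difference is equally likely to be positive or negative). The accuracy values below are invented for illustration:

```python
import itertools
import numpy as np

# Hypothetical per-subject mean accuracies for methods A and B (5 subjects).
acc_a = np.array([0.91, 0.88, 0.93, 0.90, 0.89])
acc_b = np.array([0.87, 0.86, 0.90, 0.88, 0.85])

diffs = acc_a - acc_b
observed = diffs.mean()

# Exact sign-flip permutation test: under H0 (A and B equally good),
# each subject's difference is equally likely to be + or -.
n = len(diffs)
count = 0
for signs in itertools.product([1, -1], repeat=n):
    if abs((diffs * signs).mean()) >= abs(observed):
        count += 1
p_value = count / 2 ** n  # two-sided exact p-value
```

Note that with only 5 subjects there are just 2^5 = 32 sign patterns, so the smallest achievable two-sided p-value is 2/32 ≈ 0.06; that's a limitation of the sample size, not of the test.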
|
As you say yourself, "EEG data varies a lot from person to person". So I don't see how you can do any meaningful classification with such a small sample size. I'd guess you will need at least 20 people.
@Dov Unless I'm misunderstanding you, I think you're misunderstanding me. I can certainly make a meaningful classification; it has a measured accuracy of ~90% on average. But in order to determine whether method A is better than method B, I should have a confidence interval on this accuracy. And the accuracy is a mean of means, which complicates things. I'd think I somehow need to account for the variance of each of the means when calculating a CI on the grand mean.
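For what it's worth, one simple option is to treat the five per-subject means themselves as the sample and form a t-interval with df = 4; the observed spread of those five means already reflects both between-subject variation and the estimation noise in each mean. A sketch with hypothetical numbers:

```python
import math
from statistics import mean, stdev

# Hypothetical per-subject mean accuracies (one per person, placeholder values).
subject_means = [0.91, 0.88, 0.93, 0.90, 0.89]

n = len(subject_means)
m = mean(subject_means)                  # grand mean (mean of means)
se = stdev(subject_means) / math.sqrt(n)  # standard error of the grand mean
t_crit = 2.776  # t critical value, 95% two-sided, df = n - 1 = 4
ci = (m - t_crit * se, m + t_crit * se)
```

The caveat is that with n = 5 the interval will be wide and leans on an approximate-normality assumption for the subject means, which is why the bootstrap or a permutation test may be preferable here.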
One approach might be to treat your five individuals as subgroups and create an x-bar (and either an S or R) control chart. That could tell you a lot about the variability of the means: whether or not all of the subgroup means fall within the control limits is useful information either way. Control charts are also equivalent to hypothesis tests, so they can serve a similar role to what you're trying to do with the confidence intervals. There are multivariate control charts too, which might be worth considering. Otherwise you could use some form of ANOVA/MANOVA with repeated measures.

If you can tell us more about the data, that would help us give a better answer. Do you just want a confidence interval because you have no hypothesis about the actual values, or are you trying to test a specific hypothesis about some value of the means?
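To sketch the x-bar/S chart idea (with invented fold accuracies; the chart constants are the standard tabulated values for subgroup size n = 10):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 10-fold CV accuracies for 5 subjects (placeholder values).
fold_acc = rng.normal(loc=0.90, scale=0.03, size=(5, 10))

xbar = fold_acc.mean(axis=1)      # per-subject (subgroup) mean accuracy
s = fold_acc.std(axis=1, ddof=1)  # per-subject sample standard deviation

# Standard x-bar/S control-chart constants for subgroup size n = 10.
A3, B3, B4 = 0.975, 0.284, 1.716

center, s_bar = xbar.mean(), s.mean()
ucl_x, lcl_x = center + A3 * s_bar, center - A3 * s_bar  # x-bar chart limits
ucl_s, lcl_s = B4 * s_bar, B3 * s_bar                    # S chart limits

out_of_control = (xbar < lcl_x) | (xbar > ucl_x)  # subjects outside the limits
```

A subject flagged as out of control would suggest that person's accuracy is not consistent with the others, which is itself useful for deciding whether a single grand-mean summary is appropriate.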