|
I'm giving a talk about how you can improve modeling accuracy simply by adding more data, without changing the model: "The art of predictive analytics: More data, same models". What techniques do you know of for doing this? I will list some of the techniques I know, but I am looking for more suggestions. Real-world examples are also welcome, and I'd like more specific examples of the techniques I outline in my own answer.
|
|
Not every model can benefit from more data, only high-variance models. |
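One way to check which case you're in is to plot a learning curve: if the validation score keeps rising (and the train/validation gap keeps shrinking) as you add data, the model is high-variance and more data should help; if both scores have plateaued close together, it's high-bias and more data won't. Here is a minimal sketch using scikit-learn; the synthetic dataset and the random-forest model are just illustrative assumptions, not from the original post.

```python
# Sketch: use a learning curve to check whether a model is high-variance
# (large gap between training and validation scores), i.e. whether adding
# more data is likely to help.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(random_state=0),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
)

for n, tr, va in zip(train_sizes,
                     train_scores.mean(axis=1),
                     val_scores.mean(axis=1)):
    # A large, persistent gap suggests high variance: more data should help.
    # Scores that are both low and close together suggest high bias:
    # more data alone probably won't help.
    print(f"n={n:5d}  train={tr:.3f}  val={va:.3f}  gap={tr - va:.3f}")
```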
|
There's a short, fun article in a recent issue of The American Statistician that might be a toy example of what you want: "Fisher's Conditionality Principle in Statistical Pattern Recognition", The American Statistician, Aug 2011, Vol. 65, No. 3: 167–169. It's essentially a stylized example of how an ancillary statistic can be used to improve classification. Not a real-world example, but very simple to state and present.
|
I agree with Melipone: not every problem benefits from more training data. If you have high variance, you can benefit from more training data; but if you have high bias, it doesn't matter how much training data you add, you'll probably end up with roughly the same error. Andrew Ng's slides cover how to deal with these different issues in different ML settings.

On how to add more training data: depending on your input, you can also generate a set of artificial training data, as Bishop describes in the last part of the neural networks chapter (on regularization). You basically apply transformations to the data you already have; see the sketch below. There is a proof that doing this and using tangent propagation to make the model robust against such variations are closely related.
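A minimal sketch of that "artificial training data" idea, assuming image-like inputs: apply small random shifts and rotations to existing examples to enlarge the training set. The dummy image array, the transformation ranges, and the `augment` helper below are illustrative assumptions, not from Bishop's text.

```python
# Sketch: enlarge a training set by applying small transformations
# (rotations and translations) to the examples you already have.
import numpy as np
from scipy.ndimage import rotate, shift

rng = np.random.default_rng(0)

def augment(images, labels, copies=4):
    """Return the original data plus `copies` randomly transformed versions."""
    new_images, new_labels = [images], [labels]
    for _ in range(copies):
        batch = []
        for img in images:
            angle = rng.uniform(-10, 10)           # small rotation in degrees
            dx, dy = rng.uniform(-2, 2, size=2)    # small translation in pixels
            t = rotate(img, angle, reshape=False, mode="nearest")
            t = shift(t, (dy, dx), mode="nearest")
            batch.append(t)
        new_images.append(np.stack(batch))
        new_labels.append(labels)
    return np.concatenate(new_images), np.concatenate(new_labels)

# Dummy 28x28 "images" just to show the shapes.
images = rng.random((100, 28, 28))
labels = rng.integers(0, 10, size=100)
aug_X, aug_y = augment(images, labels)
print(aug_X.shape, aug_y.shape)   # (500, 28, 28) (500,)
```

The key design point is that the transformations should reflect variations the model ought to be invariant to; tangent propagation encodes the same invariances directly in the objective instead of in extra examples.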
I don't know if you can help me get in to see your talk; the RSVP list is full. I've followed metaoptimize for a while.
@Rob: Email me: joseph at metaoptimize dot com.