I recognize that a useful machine learning model would have the following properties:
One nice invention was the CTW sequence predictor in 1995. What are the approaches to inventing new models? Is experimentation the main method for inventing new (recurrent) neural networks?
I view this type of thing as falling in the realm of mathematical modeling. Certainly you can experiment to your heart's content until you find something that works, but you can drastically reduce the time it takes to discover something useful if you follow the scientific method more closely: observe the structure of your problem, hypothesize a model that captures it, derive its consequences, and test those consequences against data.
For RNNs specifically, I can't really comment. I'm in the process of learning to understand them; I'm pretty close to having a working algorithm for training one using the Hessian-free approach, but I'm unsure when I'll have it ready for testing. Just to reiterate, though: I think of this as a mathematical modeling problem. My CS classes didn't help much with learning this process; my understanding of how to do this sort of work was fostered by taking classes on mathematical modeling and working on modeling tasks. That seems like a good place for you to start, in my opinion.
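For readers wondering what "Hessian-free" means here: the idea (following Martens, 2010) is to run conjugate gradient using only Hessian-vector products, so the Hessian itself is never formed. Below is a minimal numpy sketch of that core idea only; `grad_fn` is an assumed placeholder returning the loss gradient at a parameter vector, and the real method adds a Gauss-Newton approximation, damping, and mini-batching that are omitted here.

```python
import numpy as np

def hessian_vector_product(grad_fn, theta, v, eps=1e-5):
    """Approximate H @ v by a finite difference of gradients:
    H v ~ (grad(theta + eps*v) - grad(theta - eps*v)) / (2*eps).
    This is what makes the method 'Hessian-free': the full Hessian
    is never built."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)

def conjugate_gradient(hvp, b, n_iters=50, tol=1e-8):
    """Approximately solve H x = b using only Hessian-vector products."""
    x = np.zeros_like(b)
    r = b - hvp(x)          # residual
    p = r.copy()
    rs_old = r @ r
    for _ in range(n_iters):
        Hp = hvp(p)
        alpha = rs_old / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# One Hessian-free step: solve H d = -g with conjugate_gradient,
# where g = grad_fn(theta), then update theta += d.
```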
Thanks for mentioning mathematical modeling. I agree that waiting hours for an experiment may not be the best approach.
(Jul 07 '11 at 03:47)
Ivo Danihelka
Most models use domain knowledge to limit the class of possible functions dramatically. Speech recognition knows that it is dealing with a time series; image processing knows that its input is an image representing a set of objects, a background, and possibly actions, all with strong geometric and spatial relationships.
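To give a feel for how much domain knowledge can shrink the space of candidate functions, here is a toy comparison (all sizes are made up for illustration): a fully connected layer on an image treats every pixel-to-unit connection as a free parameter, while a convolutional layer encodes the spatial assumptions above (locality, translation invariance via weight sharing) and ends up with orders of magnitude fewer parameters.

```python
# Hypothetical sizes, chosen only for illustration.
H, W = 32, 32            # input image dimensions
n_hidden = 1024          # fully connected hidden units
k = 3                    # convolution kernel size
n_filters = 32           # number of convolutional feature maps

# A fully connected layer makes no spatial assumptions: every
# pixel-to-unit connection is a separate free parameter.
dense_params = (H * W) * n_hidden

# A convolutional layer assumes nearby pixels are related and that the
# same local pattern matters anywhere in the image (weight sharing),
# so the function class is far smaller.
conv_params = n_filters * k * k

print(dense_params)   # 1048576 free parameters
print(conv_params)    # 288 free parameters
```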
The point is that there are a lot of directions to go. Theory and reading guide experimentation and the development of new methods.
Indeed, Jacob says precisely the right thing. Most models are actually built upon other well-known and well-understood models by either (a) using domain knowledge to make the model more accurate or (b) using a new mathematical tool to improve a previous class of models.
Be aware that here "domain knowledge" includes "knowledge you got from training many similar models before"; for example, the main domain knowledge behind confidence-weighted linear learning is that it is much easier to underfit rare features, and that different features converge to their optimal values faster or slower depending on how often they occur in the data.
@Alexandre: I find your comment about the confidence-weighted learning a bit confusing. The CW algorithm produces a very large update to rare features and a very small update to frequent ones. This seems contrary to the intuition of not wanting to overfit the rare features.
@yoavg you're right. The intuition behind confidence-weighted is that the rare features take longer to fit (usually more than one pass over the data) because they receive far fewer updates than the common features, hence the idea of updating them more aggressively.
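To make the intuition in this exchange concrete, here is a small sketch in the spirit of confidence-weighted learning. It is not the exact CW update of Dredze, Crammer and Pereira (which maintains a Gaussian over the weights and solves a constrained optimization per example); it only illustrates the point above: each weight keeps a per-feature variance that starts high and shrinks whenever its feature fires, so rarely seen features keep receiving large updates while frequent features settle quickly.

```python
import numpy as np

def cw_style_updates(X, y, n_epochs=1, r=1.0):
    """Simplified, illustrative sketch of the confidence-weighted intuition
    (not the exact CW update rule).
    X: (n_samples, n_features) array, y: labels in {-1, +1}."""
    n_features = X.shape[1]
    mu = np.zeros(n_features)      # weight means
    sigma = np.ones(n_features)    # per-weight variances ("uncertainty")
    for _ in range(n_epochs):
        for x, label in zip(X, y):
            margin = label * (mu @ x)
            if margin < 1.0:
                # larger step for uncertain (rarely seen) features
                mu += label * sigma * x
            # shrink the variance of features that appeared in this example,
            # so frequently seen features receive ever smaller updates
            active = x != 0
            sigma[active] *= r / (r + x[active] ** 2)
    return mu, sigma
```

With this kind of scheme, a feature seen only once still has a variance near 1 and moves a lot the next time it appears, while a feature present in every example has had its variance shrunk many times and barely moves, which is exactly the rare-vs-frequent behavior discussed in the comments above.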