It seems to me that a useful machine learning model should have the following properties:

  1. It can represent many functions.
  2. It implicitly or explicitly prefers simple functions.
  3. It slows down by at most a factor of N after seeing N data items.
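Property 3 amounts to roughly constant cost per item, as in online learning. As a minimal sketch (plain Python, all names hypothetical), an online SGD learner whose per-item cost is independent of how many items it has already seen:

```python
import random

def online_sgd(stream, dim, lr=0.01):
    """Online SGD for linear regression: each item is processed in O(dim),
    so total cost after N items is O(N * dim) -- an N-fold slowdown or less."""
    w = [0.0] * dim
    for x, y in stream:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        for i in range(dim):
            w[i] -= lr * err * x[i]   # constant-time update per feature
    return w

# usage: recover y = 2*x + 1 from a small synthetic stream
random.seed(0)
data = [([x, 1.0], 2 * x + 1) for x in [random.uniform(-1, 1) for _ in range(2000)]]
w = online_sgd(data, dim=2, lr=0.1)   # w approaches [2, 1]
```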

One nice invention was the Context Tree Weighting (CTW) sequence predictor, introduced in 1995.
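CTW mixes Krichevsky-Trofimov (KT) estimators over a tree of contexts. As a rough sketch of just the per-context building block (not the full context-tree mixing), the KT estimator for binary sequences is:

```python
class KTEstimator:
    """Krichevsky-Trofimov estimator: the per-context building block of CTW.
    Predicts P(next bit = 1) = (ones + 0.5) / (zeros + ones + 1)."""
    def __init__(self):
        self.counts = [0, 0]   # counts of 0s and 1s seen so far

    def predict_one(self):
        n = self.counts[0] + self.counts[1]
        return (self.counts[1] + 0.5) / (n + 1)

    def update(self, bit):
        self.counts[bit] += 1

kt = KTEstimator()
for bit in [1, 1, 0, 1, 1, 1]:
    kt.update(bit)
# after 5 ones and 1 zero: P(1) = (5 + 0.5) / (6 + 1) = 5.5 / 7
```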

What are the approaches to invent new models? Is experimenting the main method to invent new (recurrent) neural networks?

asked Jul 04 '11 at 15:18

Ivo Danihelka

Most models use domain knowledge to limit the class of possible functions dramatically. Speech recognition knows it is dealing with a time series; image processing knows it has an image representing a set of objects and a background, and possibly actions, all with strong geometric and spatial relationships.

The point is that there are many directions to go. Theory and reading guide experimentation and the development of new methods.

(Jul 04 '11 at 17:14) Jacob Jensen

Indeed, Jacob says precisely the right thing. Most models are actually built upon other well-known and well-understood models by either (a) using domain knowledge to make the model more accurate or (b) using a new mathematical tool to improve a previous class of models.

Be aware that here "domain knowledge" includes "knowledge you got from training many similar models before"; for example, the main domain knowledge behind confidence-weighted linear learning is that it is easy to underfit rare features: different features converge to their optimal values faster or slower depending on how often they occur in the data.

(Jul 05 '11 at 13:59) Alexandre Passos ♦

@Alexandre: I find your comment about confidence-weighted learning a bit confusing. The CW algorithm produces very large updates to rare features and very small updates to frequent ones. This seems contrary to the intuition of not wanting to overfit rare features.

(Jul 05 '11 at 21:01) yoavg

@yoavg you're right. The intuition behind confidence-weighted is that the rare features take longer to fit (usually more than one pass over the data) because they receive far fewer updates than the common features, hence the idea of updating them more aggressively.

(Jul 06 '11 at 03:11) Alexandre Passos ♦
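The real CW algorithm maintains a Gaussian distribution over the weight vector; purely as an illustration of the "update rare features more aggressively" idea, one could scale each feature's step by its inverse observation count. This is a simplified stand-in, not the actual CW update:

```python
from collections import defaultdict

def cw_style_update(w, counts, x, err, lr=0.1):
    """Simplified illustration (NOT the real CW update, which maintains a
    Gaussian over weights): scale each feature's step inversely with how
    often that feature has been seen, so rare features move faster."""
    for i, xi in x.items():              # sparse feature vector {index: value}
        counts[i] += 1
        w[i] -= lr * err * xi / counts[i] ** 0.5

w = defaultdict(float)
counts = defaultdict(int)
# feature 0 fires on every example; feature 1 fires only once
cw_style_update(w, counts, {0: 1.0}, err=-1.0)
cw_style_update(w, counts, {0: 1.0}, err=-1.0)
cw_style_update(w, counts, {0: 1.0, 1: 1.0}, err=-1.0)
# feature 1's single step (0.1) exceeds feature 0's third step (0.1/sqrt(3))
```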

One Answer:

I view this type of thing as falling in the realm of mathematical modeling. Certainly you can experiment to your heart's content until you find something that works, but you can drastically reduce the time it takes to discover something useful if you follow the scientific method more closely, e.g. something like:

  1. Clearly define what you want to accomplish.
  2. Attempt to define the behavior of the system you want to model in mathematical terms, and where appropriate derive further expressions for updating/manipulating the model.
  3. Experiment, modifying (2) and repeating this step as necessary until you get something useful.

For RNNs specifically, I can't really comment. I'm in the process of learning to understand them; I'm pretty close to having a working algorithm for training one using the Hessian-free approach, but I'm unsure when I'll have it ready for testing.
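Hessian-free optimization never forms the Hessian explicitly; it approximately solves the Newton system H d = -g with conjugate gradient, using only Hessian-vector products. A minimal sketch of that inner CG loop (illustrative only, omitting the damping and Gauss-Newton products a full recipe would use):

```python
def conjugate_gradient(hvp, g, iters=50, tol=1e-10):
    """Solve H d = -g using only Hessian-vector products hvp(v) = H @ v,
    the core inner loop of Hessian-free optimization."""
    d = [0.0] * len(g)
    r = [-gi for gi in g]                # residual = -g - H @ 0
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(iters):
        hp = hvp(p)
        alpha = rs / sum(pi * hpi for pi, hpi in zip(p, hp))
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, hp)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:                 # converged
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d

# usage: toy quadratic with H = [[3, 1], [1, 2]] and g = [1, 1]
H = [[3.0, 1.0], [1.0, 2.0]]
hvp = lambda v: [sum(H[i][j] * v[j] for j in range(2)) for i in range(2)]
d = conjugate_gradient(hvp, [1.0, 1.0])  # d = -H^{-1} g = [-0.2, -0.4]
```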

Just to reiterate, though: I think of this as a mathematical modeling problem. My CS classes didn't help much with learning this process; my understanding of how to do this sort of work came from taking classes on, and doing, mathematical modeling. That seems like a good place for you to start, in my opinion.

This answer is marked "community wiki".

answered Jul 05 '11 at 19:54

Brian Vandenberg

edited Jul 07 '11 at 03:53

Thanks for mentioning mathematical modeling. I agree that waiting hours for an experiment may not be the best approach.

(Jul 07 '11 at 03:47) Ivo Danihelka

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.