What are novel research areas and subareas of machine learning whose conceptual birth was in the last five years or so? A significant portion of machine learning research could be called "collaboration between the past and present": ML researchers take models from math, statistics, physics, and older machine learning research that are already well understood in those fields, and find new applications and extensions for them. However, I want to know: what areas of machine learning that are currently quite active have had their mathematical or conceptual foundations either discovered or greatly expanded in the last few years, whether within machine learning or in an outside but related discipline? I'm looking less for meta-applications (multi-task learning, semi-supervised learning) and more for methods with a strong mathematical basis.
Sparse methods and dictionary learning are another example of a strong topic with recent mathematical developments, notably through cross-fertilization between the machine learning / statistics communities on the one hand and the signal processing / compressed sensing communities on the other. Dictionary learning with sparse priors is also converging with the deep learning approach, where several layers of regularized encoders are stacked and trained in an unsupervised fashion to minimize a reconstruction error penalized with a sparsity-inducing term.
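To make the sparse coding idea concrete, here is a minimal numpy sketch (my own toy code, not from any particular paper): ISTA-style sparse coding under an l1 penalty, alternated with a least-squares dictionary update. All function names and parameters here are invented for illustration.

```python
import numpy as np

def ista_sparse_code(D, x, alpha=0.1, n_iter=100):
    """Sparse-code signal x against dictionary D by minimizing
    0.5 * ||x - D z||^2 + alpha * ||z||_1 with ISTA."""
    L = np.linalg.norm(D, ord=2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)               # gradient of the smooth data term
        z = z - grad / L
        z = np.sign(z) * np.maximum(np.abs(z) - alpha / L, 0.0)  # soft threshold
    return z

def dictionary_learning(X, n_atoms=32, n_outer=20, alpha=0.1):
    """Alternate sparse coding of all samples with a least-squares
    dictionary update, keeping atoms unit-norm."""
    rng = np.random.default_rng(0)
    D = rng.standard_normal((X.shape[1], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):
        Z = np.stack([ista_sparse_code(D, x, alpha) for x in X])  # one code per sample
        D = np.linalg.lstsq(Z, X, rcond=None)[0].T                # refit atoms
        D /= np.linalg.norm(D, axis=0) + 1e-12
    return D
```

Stacking such regularized encoders and training them layer by layer against a sparsity-penalized reconstruction error is the connection to deep learning mentioned above.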
I think sparse methods in general are older, but compressed sensing as a theory, and the things that are coming out of it, do seem to be only about five years old.
(Apr 29 '11 at 12:39)
Alexandre Passos ♦
Developing tools for doing automatic inference in probabilistic graphical models, and probabilistic programming languages in general, is another area that has been gaining a lot of attention lately. I believe that successful developments in this area will go a long way towards making Bayesian methods (and probabilistic methods in general) mainstream in machine learning. Check out the NIPS 2008 workshop on this topic.
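To illustrate what "automatic inference" means here, a hand-rolled sketch (not a real probabilistic programming language; the model and the numbers are invented for illustration): the model is an ordinary generative program, and a generic likelihood-weighting routine recovers the posterior without any model-specific derivation.

```python
import random

def model():
    """A tiny probabilistic program: is the coin biased?"""
    is_biased = random.random() < 0.3          # prior: biased with probability 0.3
    p_heads = 0.9 if is_biased else 0.5
    return is_biased, p_heads

def likelihood(p_heads, heads, flips):
    """Binomial likelihood of the observation, up to a constant."""
    return p_heads ** heads * (1 - p_heads) ** (flips - heads)

def infer(heads=8, flips=10, n_samples=100_000):
    """Likelihood weighting: sample the prior, weight by the data."""
    weighted = total = 0.0
    for _ in range(n_samples):
        is_biased, p_heads = model()
        w = likelihood(p_heads, heads, flips)
        weighted += w * is_biased
        total += w
    return weighted / total                    # estimate of P(biased | data)

print(infer())   # roughly 0.65 for 8 heads in 10 flips
```

The point of the research direction is to make this kind of generic inference efficient and automatic for arbitrary programs, not just toy ones like this.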
Conceptually, recurrent neural networks are not at all new. However, they have largely been confined to academia because they were intractable to train. Recent advances in Hessian-free optimization have made training them more viable, though the authors' paper is still awaiting publication. I list this because it is an emerging area with a lot of potential if RNNs can be made to work well.
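For readers who have not seen one, here is a minimal numpy sketch of the vanilla (Elman) recurrent cell; this is my own toy forward pass, not the Hessian-free training procedure itself. The repeated multiplication by Whh across time steps is exactly what makes gradients vanish or explode and training hard.

```python
import numpy as np

def rnn_forward(X, Wxh, Whh, Why, bh, by):
    """Run a vanilla (Elman) RNN over a sequence X of input vectors."""
    h = np.zeros(Whh.shape[0])
    outputs = []
    for x in X:                                 # one step per time index
        h = np.tanh(Wxh @ x + Whh @ h + bh)     # recurrent state update
        outputs.append(Why @ h + by)            # per-step readout
    return np.stack(outputs), h

# Example with random weights: 4-d inputs, 8-d hidden state, 2-d outputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
Wxh = rng.standard_normal((8, 4))
Whh = rng.standard_normal((8, 8))
Why = rng.standard_normal((2, 8))
Y, h_final = rnn_forward(X, Wxh, Whh, Why, bh=np.zeros(8), by=np.zeros(2))
```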
Indeed, there was a comeback of RNNs, with a couple of papers and posters at the last NIPS in December 2010.
(Apr 28 '11 at 12:21)
ogrisel
The bandit problem is about 60 years old, and optimal algorithms have been known for about 30 years. However, there has recently been a huge surge of interest in this problem, and several foundational advances have been made. The interest is largely driven by its relevance to what Internet-based companies are doing (ad serving is the canonical application). The key advances center on incorporating generalisation into the framework (the contextual bandit, or bandit with side information) while still giving bounds on regret. John Langford's tutorial is a good overview.
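For the classic context-free setting, here is a small sketch of UCB1 on Bernoulli arms (my own toy code; the contextual variants discussed above additionally condition the reward estimates on side information):

```python
import math, random

def ucb1(arm_means, horizon=10_000):
    """UCB1: pull the arm with the best optimistic mean estimate."""
    n = len(arm_means)
    counts, sums = [0] * n, [0.0] * n
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1                         # pull each arm once to initialize
        else:
            arm = max(range(n), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        sums[arm] += float(random.random() < arm_means[arm])  # Bernoulli reward
        counts[arm] += 1
    return counts   # pull counts concentrate on the best arm

print(ucb1([0.2, 0.5, 0.55]))
```

The sqrt(2 ln t / n_i) term is the exploration bonus that yields the logarithmic regret bounds; the regret analyses mentioned above extend this style of guarantee to the contextual case.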
One recent trend is using LP (linear programming) conceptually and ILP (integer linear programming) practically for learning problems, as solvers for these problems have become substantially faster in recent years; ILP lets you easily specify, and quickly (at least quickly enough, most of the time) solve, NP-complete formulations of learning problems. Together with submodularity and convexity, this follows a broader trend in which a lot of machine learning looks like (combinatorial/continuous/convex/non-convex) optimization theory applied to the generalization error.
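As a toy illustration of the pattern (my own example, not from the answer), here is a set-cover-style formulation solved as an LP relaxation with scipy; a real application would hand the same model, with integrality constraints, to an ILP solver.

```python
import numpy as np
from scipy.optimize import linprog

# Toy set-cover-style formulation: pick a cheapest subset of candidates
# (columns) so that every constraint (row) is covered at least once.
# A[i, j] = 1 if candidate j covers constraint i.
A = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]])
cost = np.array([2.0, 1.0, 3.0, 1.0])

# linprog minimizes c @ x subject to A_ub @ x <= b_ub, so negate both
# sides to express the "cover at least once" constraint A @ x >= 1.
res = linprog(c=cost, A_ub=-A, b_ub=-np.ones(4), bounds=[(0, 1)] * 4,
              method="highs")
print(res.x)  # LP relaxation; round or branch-and-bound for the true ILP
```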
AMG (algebraic multigrid) methods might qualify: these mathematical tools make it possible to tackle large-scale graph analysis and spectral clustering problems by building a preconditioner for the eigenvalue problem.

A more general idea is curriculum learning, where you first learn machine learning models on simpler variants of the task (e.g. "more convex" versions obtained with strong priors) and then move step by step towards the real problem, a bit like simulated annealing. This approach is used to efficiently compute the regularization path of L1 or L1 + L2 penalized linear models (e.g. using LARS or coordinate descent with warm restarts). In that case the problem is always convex, so the regularization schedule is only useful as a performance trick, but it could probably be generalized to more complex, non-linear models, supervised or not, where the scheduling would help with both speed and quality (avoiding local optima).
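Here is a small scikit-learn sketch of the warm-restart trick on a synthetic problem (my own example): each fit starts from the previous solution, so walking the path from strong to weak regularization is much cheaper than solving each problem from scratch.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(100)

# Walk the regularization path from strong to weak penalty, reusing the
# previous coefficients as the starting point (warm restart) at each step.
clf = Lasso(alpha=1.0, warm_start=True, max_iter=10_000)
path = []
for alpha in np.logspace(0, -3, 20):
    clf.alpha = alpha
    clf.fit(X, y)                 # converges quickly from the warm start
    path.append(clf.coef_.copy())
```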