The list of ICML accepted papers is out. I looked through it yesterday and read most of the abstracts. First I'll mention a few papers I was interested in (alert: amateur's opinion, based only on a first glance):
Two more papers I've gotten through recently:
The word counts for some selected terms (whim-based; maybe someone should try some topic modeling on it), counting repeated mentions within a single abstract, break down as follows:

Data: 163, Learning: 210, Feature: 86, Text: 27, Image: 21, Audio: 7, Video: 5, Time series: 4
Supervised: 6, Unsupervised: 9, Semi-supervised: 14, Active: 11, Multi-view: 9
cluster: 69, discriminative: 4, generative: 18, embedding: 15, classification: 52, ranking: 15, Policy: 32, Hashing: 6, recognition: 10
margin: 20, regression: 18, kernel: 72, support vector: 16, svm: 32, RKHS: 5, Hilbert: 5, block coordinate descent: 5, manifold: 16, PCA: 21 (about 17 in a single abstract), projection: 6
Spars: 37, Completion: 7, Decomposition: 12, matrix: 45, Optimiz: 54, Objective: 20, Dictionary: 7
Tree: 55, Graphical: 9, Topic: 13, MCMC: 10, MAP: 25, variational: 19, bayesian: 27
Graph: 60, Network: 44, Edge: 21, pair: 18, link: 8
Deep: 26, Boltzmann: 5, RBM: 8, Convolution: 8, neural net: 10
nearest neighbor: 6, sampling: 22, boost: 19, ensemble: 5, forest: 1
state-of-the-art: 23, improve*: 40
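For anyone who wants to reproduce or extend counts like these, here is a minimal sketch of that kind of term counting, assuming the abstracts have already been scraped into a list of strings; the `abstracts` and `terms` below are placeholders, not the data behind the numbers above.

```python
# Rough term-frequency count over abstracts, counting repeated mentions
# within a single abstract (as in the tally above).
import re

# Placeholder: in practice these would be scraped from the accepted-papers page.
abstracts = [
    "We propose a kernel method for semi-supervised learning on graphs ...",
    "A deep convolutional network is trained with stochastic gradient ...",
]

terms = ["learning", "kernel", "deep", "graph", "semi-supervised", "spars"]

counts = {}
for term in terms:
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    # Substring matching, so "spars" also counts "sparse" and "sparsity".
    counts[term] = sum(len(pattern.findall(a)) for a in abstracts)

for term, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {n}")
```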
Bayesian Learning via Stochastic Gradient Langevin Dynamics

Abstract: In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an in-built protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a "sampling threshold" and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression and ICA with natural gradients.

(I won't try to summarize it; the paper as a whole is well written and astonishing.)
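For concreteness, here is a minimal sketch of the SGLD update the abstract describes, applied to Bayesian logistic regression (one of the paper's three example models); the synthetic data, prior, and step-size schedule are my own illustrative choices, not the authors' code.

```python
# Sketch of stochastic gradient Langevin dynamics (SGLD) for Bayesian
# logistic regression. Synthetic data and hyperparameters are made up.
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 5
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def grad_log_posterior(w, Xb, yb, batch_scale, prior_var=10.0):
    """Gradient of the log prior plus the rescaled minibatch log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    grad_lik = Xb.T @ (yb - p)          # logistic log-likelihood gradient
    grad_prior = -w / prior_var         # Gaussian prior gradient
    return grad_prior + batch_scale * grad_lik

w = np.zeros(d)
n_batch, samples = 50, []
for t in range(1, 5001):
    eps = 0.05 * (t + 100) ** (-0.55)   # polynomially decaying step size
    idx = rng.choice(N, n_batch, replace=False)
    g = grad_log_posterior(w, X[idx], y[idx], batch_scale=N / n_batch)
    noise = rng.normal(scale=np.sqrt(eps), size=d)
    w = w + 0.5 * eps * g + noise       # SGD step plus injected Gaussian noise
    if t > 2000:                        # crude burn-in in place of the paper's
        samples.append(w.copy())        # "sampling threshold" diagnostic

print("posterior mean estimate:", np.mean(samples, axis=0))
print("true weights:           ", w_true)
```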
I skimmed over this abstract, even though it's actually really closely related to some stuff I've studied! Glad you pointed it out.
(May 27 '11 at 11:24)
Jacob Jensen
This paper is amazing, thank you very much for bringing it to my attention. I'm looking forward to trying this technique on my next models and seeing how it compares to the usual culprits.
(Jun 02 '11 at 09:28)
Alexandre Passos ♦
There is of course the use of Martens' Hessian-free optimizer on recurrent neural networks to generate text: "Generating Text with Recurrent Neural Networks" and "Learning Recurrent Neural Networks with Hessian-Free Optimization". They beat the longstanding state of the art for learning long-term dependencies in time series, LSTM. While the achievement is big, I wonder how HF+LSTM would work. Anyway, the text their RNN produced, character by character, is worth a look.
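As an aside, here is a rough sketch of what sampling text character by character from an RNN looks like mechanically; it uses a plain tanh RNN with random placeholder weights for illustration, not the multiplicative RNN or the trained parameters from the papers.

```python
# Sketch of character-by-character sampling from an RNN. The weights here are
# random placeholders; a real model would use parameters learned (e.g. with HF).
import numpy as np

rng = np.random.default_rng(1)
vocab = list("abcdefghijklmnopqrstuvwxyz .")
V, H = len(vocab), 64

# Placeholder parameters; a trained model would supply these.
Wxh = rng.normal(scale=0.1, size=(H, V))
Whh = rng.normal(scale=0.1, size=(H, H))
Why = rng.normal(scale=0.1, size=(V, H))
bh, by = np.zeros(H), np.zeros(V)

def sample(seed_char, n_chars=100):
    h = np.zeros(H)
    x = np.zeros(V)
    x[vocab.index(seed_char)] = 1.0
    out = [seed_char]
    for _ in range(n_chars):
        h = np.tanh(Wxh @ x + Whh @ h + bh)   # recurrent hidden-state update
        logits = Why @ h + by
        p = np.exp(logits - logits.max())
        p /= p.sum()                          # softmax over the next character
        i = rng.choice(V, p=p)                # sample the next character
        out.append(vocab[i])
        x = np.zeros(V)
        x[i] = 1.0                            # feed the sampled character back in
    return "".join(out)

print(sample("t"))
```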
Martens and Sutskever actually had two papers accepted to ICML: one is on using an RNN to generate text, the other is on training RNNs with a Hessian-free CG optimizer. I'm in the process of constructing an RNN right now, though it's slow going (mostly due to time constraints).
(Jun 09 '11 at 11:23)
Brian Vandenberg
Is there an implementation of the Hessian-free optimization technique that one can download?
(Jun 13 '11 at 14:39)
Frank
The code for the first HF paper (ICML 2010) is available on J. Martens's site here: http://www.cs.toronto.edu/~jmartens/docs/HFDemo.zip
(Jun 13 '11 at 14:45)
crdrn
"On RandomWeights and Unsupervised Feature Learning" I wanted to highlight this paper again because I noticed that it makes this comment about pre-training convolutional architectures: "However, we find that the performance improvement can be modest and sometimes smaller than the performance differences due to architectural parameters." This result is both surprising as there is a growing body of work these days about pre-training neural networks. Isn't LeCun moving in this direction of applying pretraining to his CNNs? (Hinton mentioned it in one of his video lectures) |
I feel like this is a problem many of us tackle often while trying to do what we really care about. Having somebody take the time to solve this in a reasonable fashion is very nice.
I find it a bit strange that the accepted papers are just given as one long, unordered list. Has anyone tried clustering it, or categorizing it in some way? I think I had some document clustering code lying around... hmm.
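In that spirit, here is a minimal sketch of clustering accepted papers by their abstracts with TF-IDF and k-means, assuming scikit-learn is available; the abstracts below are placeholders for the real scraped list.

```python
# Sketch: cluster accepted-paper abstracts with TF-IDF + k-means.
# The documents below are placeholders for the real scraped abstracts.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "We propose a new kernel method for structured prediction ...",
    "A deep Boltzmann machine is trained on natural images ...",
    "Bayesian nonparametric topic models with variational inference ...",
    "Policy gradient methods for reinforcement learning ...",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Print the highest-weight terms per cluster centroid as rough "topic" labels.
terms = vectorizer.get_feature_names_out()
for c, centroid in enumerate(km.cluster_centers_):
    top = centroid.argsort()[::-1][:5]
    print(f"cluster {c}:", ", ".join(terms[i] for i in top))
```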