|
What are some new academic developments in ML + NLP that have immediate applications in industry? I prefer developments that are one step away from application (e.g. question answering), instead of approaches (e.g. deep learning for your application). I am giving on talk on this to thought-leaders in industry at O'Reilly Strata Conference. If I hear interesting suggestions, I'll try to include them in my talk. If you were forming a startup, either a new company or in an existing company, and you had a year of R&D time to transition something from academia to industry, what would it? I'll get the ball rolling by proposing this topics I proposed. These are not meant to be comprehensive in any way. I just zoomed in on some work that I think is particularly cool. Please make one separate answer per idea. Please indicate if it is an application or an approach. Please upvote the ideas that you consider most promising and most close to immediate application. |
|
Decision making. Any merchant must make numerous decisions about how they present their products to customers. For physical stores they can directly observe their customers to optimise their presentation. An industry has grown up around this (see the book Why We Buy), and there are fairly accepted patterns for common store types (e.g. supermarkets always have the fresh produce at the entrance). This is not the case for online stores; the industry is not as mature and, more importantly, customers cannot be so easily observed. However, online stores have the advantage that they can change instantly. This presents a decision-making problem: what sequence of content should the store present to a customer to increase the probability of a purchase? Work on bandits (e.g. John Langford's contextual bandits) and reinforcement learning is directly relevant here. Of course, decision making can be applied any time a system has to, you know, make a decision, but the above is a straightforward commercial application. There's also some nice work on nested HMMs for browsing/shopping behavior, but I don't think this can readily be turned into a decision-making system.
(Jan 28 '11 at 05:56)
Oscar Täckström
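To make the bandit framing above concrete, here is a minimal epsilon-greedy contextual-bandit sketch in Python. It is not Langford's actual contextual bandit algorithm, and the layout arms, visitor contexts, and purchase rates are invented purely for illustration: the store picks a layout for each visitor, observes whether a purchase happened, and updates its per-context estimates.

    import random
    from collections import defaultdict

    # Hypothetical arms (page layouts) and visitor contexts; invented for illustration.
    ARMS = ["featured_deals", "new_arrivals", "personalized_picks"]
    CONTEXTS = ["new_visitor", "returning_visitor"]

    class EpsilonGreedyBandit:
        """Keeps a purchase-rate estimate per (context, arm) pair."""
        def __init__(self, epsilon=0.1):
            self.epsilon = epsilon
            self.counts = defaultdict(int)     # (context, arm) -> times shown
            self.rewards = defaultdict(float)  # (context, arm) -> total reward

        def choose(self, context):
            if random.random() < self.epsilon:           # explore
                return random.choice(ARMS)
            # exploit: pick the arm with the best estimated purchase rate
            def estimate(arm):
                n = self.counts[(context, arm)]
                return self.rewards[(context, arm)] / n if n else 0.0
            return max(ARMS, key=estimate)

        def update(self, context, arm, reward):
            self.counts[(context, arm)] += 1
            self.rewards[(context, arm)] += reward

    # Simulated interaction loop; the purchase probabilities are made up.
    TRUE_RATES = {("new_visitor", "featured_deals"): 0.05,
                  ("new_visitor", "new_arrivals"): 0.02,
                  ("new_visitor", "personalized_picks"): 0.03,
                  ("returning_visitor", "featured_deals"): 0.04,
                  ("returning_visitor", "new_arrivals"): 0.03,
                  ("returning_visitor", "personalized_picks"): 0.09}

    bandit = EpsilonGreedyBandit()
    for _ in range(10000):
        ctx = random.choice(CONTEXTS)
        arm = bandit.choose(ctx)
        reward = 1.0 if random.random() < TRUE_RATES[(ctx, arm)] else 0.0
        bandit.update(ctx, arm, reward)

    for ctx in CONTEXTS:
        print(ctx, bandit.choose(ctx))  # should usually converge to the best layout per context

In practice the context would be a rich feature vector rather than a handful of discrete segments, and the reward would come from logged interactions, which is exactly where the contextual-bandit literature earns its keep.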
|
|
Application: Knowledge extraction. Wick, McCallum, and Miklau, "Scalable probabilistic databases with factor graphs and MCMC", and other recent papers by Andrew McCallum's group. The idea is that you can use factor graphs for inference in information extraction, relation extraction, coreference, and many other interesting problems, while leveraging a traditional database and answering any SQL query by sampling. You keep in the database both the actual data and the inferred values, and continually resample the inferred values with MCMC. Then, to answer a query, you create a materialized view, watch it change for a while as the database sampler explores the answer space, and read off the answer. Because the inference is completely decoupled from the database, you can ask questions that don't fit at all into the graphical-model formalism used under the hood. Some details for more complex applications are still a bit fuzzy (like how you average over inferred data with an identifiability problem), but for simple things this can work really well. See Andrew McCallum's talk on VideoLectures; it's really interesting, and this should be an easy sell for the enterprise environment because it is database-conscious, Java-friendly, and very abstractable (as in, users don't have to know there's an inferential machine in there). |
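As a rough, self-contained illustration of the query-by-sampling idea (not the group's actual system; the toy rows, classifier scores, and factor weights below are made up), this sketch keeps an observed column and an inferred label column in a tiny in-memory "database", resamples the inferred labels with Gibbs sampling, and answers an aggregate query by averaging it over the samples.

    import math, random

    # Toy "database": each row has observed text plus an inferred, resampled label.
    rows = [
        {"mention": "Smith", "score_person": 2.0},
        {"mention": "Smith", "score_person": 0.2},
        {"mention": "Acme",  "score_person": -1.5},
        {"mention": "Acme",  "score_person": 0.1},
    ]
    for r in rows:
        r["label"] = random.choice(["PERSON", "ORG"])  # inferred column, initialised randomly

    AGREEMENT_WEIGHT = 1.0  # pairwise factor: rows with the same mention string prefer the same label

    def log_potential(i, label):
        """Unary classifier score plus agreement with other rows sharing the mention string."""
        lp = rows[i]["score_person"] if label == "PERSON" else 0.0
        for j, other in enumerate(rows):
            if j != i and other["mention"] == rows[i]["mention"] and other["label"] == label:
                lp += AGREEMENT_WEIGHT
        return lp

    def gibbs_sweep():
        """Resample each inferred label given all the others (one MCMC sweep)."""
        for i in range(len(rows)):
            lp_person = log_potential(i, "PERSON")
            lp_org = log_potential(i, "ORG")
            p_person = 1.0 / (1.0 + math.exp(lp_org - lp_person))
            rows[i]["label"] = "PERSON" if random.random() < p_person else "ORG"

    # Answer "SELECT COUNT(*) WHERE label = 'PERSON'" by watching the materialized
    # view while the sampler explores the answer space, then averaging.
    counts = []
    for sweep in range(2000):
        gibbs_sweep()
        if sweep >= 500:  # discard burn-in
            counts.append(sum(r["label"] == "PERSON" for r in rows))

    print("expected count of PERSON rows:", sum(counts) / len(counts))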
|
How about the latent factor log-linear models of Menon and Elkan? http://arxiv.org/abs/1006.2156 A very nice representation for recommendation that accounts for extraneous features very well, with very simple training and state-of-the-art results. |
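A minimal sketch of the flavour of model Menon and Elkan describe: a logistic (log-linear) model over user and item latent factors plus observed side features, trained with SGD. The data, dimensions, and hyperparameters below are invented for illustration, and this is not their exact parameterisation or training procedure.

    import random
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy implicit-feedback data: (user, item, side_feature_vector, clicked) tuples.
    n_users, n_items, n_features, dim = 50, 40, 3, 5
    data = [(rng.integers(n_users), rng.integers(n_items),
             rng.normal(size=n_features), rng.integers(2)) for _ in range(5000)]

    U = 0.1 * rng.normal(size=(n_users, dim))   # latent user factors
    V = 0.1 * rng.normal(size=(n_items, dim))   # latent item factors
    w = np.zeros(n_features)                    # weights on observed side features
    bu = np.zeros(n_users)                      # user biases
    bi = np.zeros(n_items)                      # item biases

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr, reg = 0.05, 0.01
    for epoch in range(10):
        random.shuffle(data)
        for u, i, x, y in data:
            score = bu[u] + bi[i] + U[u] @ V[i] + w @ x
            err = sigmoid(score) - y            # gradient of the logistic loss wrt the score
            gu = err * V[i] + reg * U[u]        # SGD updates with L2 regularisation
            gv = err * U[u] + reg * V[i]
            U[u] -= lr * gu
            V[i] -= lr * gv
            w    -= lr * (err * x + reg * w)
            bu[u] -= lr * err
            bi[i] -= lr * err

    # Predicted probability that user 3 responds positively to item 7 given features x
    x = np.zeros(n_features)
    print(sigmoid(bu[3] + bi[7] + U[3] @ V[7] + w @ x))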
|
Applications: Machine translation + spam detection. Large-scale recurrent character-level language models (unpublished). Machine translation requires accurate language models to choose correct translations, and detecting spam also requires large-scale, accurate language models. However, most large-scale language models are not very accurate, and most accurate language models are not large-scale. Large-scale recurrent character-level language models are both accurate and large-scale, and have immediate applications in machine translation and spam detection. Source code for this technique has NOT been published.
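To make the idea concrete, here is a minimal character-level RNN language model in numpy. The weights are left random, so it only shows how such a model assigns a probability to a string one character at a time; in practice the weights would be trained with backpropagation through time on large amounts of text, and the referenced (unpublished) work uses a considerably more sophisticated recurrent architecture.

    import numpy as np

    rng = np.random.default_rng(0)
    chars = list("abcdefghijklmnopqrstuvwxyz ")
    char_to_id = {c: i for i, c in enumerate(chars)}
    vocab, hidden = len(chars), 64

    Wxh = 0.1 * rng.normal(size=(hidden, vocab))   # input-to-hidden
    Whh = 0.1 * rng.normal(size=(hidden, hidden))  # hidden-to-hidden (the recurrence)
    Why = 0.1 * rng.normal(size=(vocab, hidden))   # hidden-to-output
    bh, by = np.zeros(hidden), np.zeros(vocab)

    def log_prob(text):
        """Sum of log P(next char | all previous chars) under the (untrained) model."""
        h = np.zeros(hidden)
        total = 0.0
        for prev, nxt in zip(text[:-1], text[1:]):
            x = np.zeros(vocab)
            x[char_to_id[prev]] = 1.0
            h = np.tanh(Wxh @ x + Whh @ h + bh)    # recurrent state update
            logits = Why @ h + by
            logits -= logits.max()                  # numerical stability
            probs = np.exp(logits) / np.exp(logits).sum()
            total += np.log(probs[char_to_id[nxt]])
        return total

    # A trained model would score natural text much higher than obfuscated,
    # spam-style character sequences, even when no word matches verbatim.
    print(log_prob("free shipping today"))
    print(log_prob("fr ee shi pp ing"))

Because the model works at the character level, it has a small, fixed vocabulary and no out-of-vocabulary problem, which is part of why it can be scaled to very large corpora.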
Any links you could suggest where one might find more information about this approach?
(Jan 22 '11 at 15:27)
tlake
tlake, try http://www.cs.utoronto.ca/~ilya/pubs/2011/LANG-RNN.pdf
(Jul 24 '11 at 23:59)
gdahl ♦
|
|
Approach: Parallel training. [Note: This isn't that good, since I think this technology isn't mature, and it is several steps away from being an application in practice.] GraphLab, a new parallelism abstraction (Low et al., 2010). There are two ways to achieve significant improvements in predictive analytics and ML tasks like recommendation, sentiment analysis, credit risk assessment, financial forecasting, etc.: you can throw more data at the problem, or you can use more sophisticated learning algorithms. MapReduce, and its implementation Hadoop, have been highly successful at promoting distributed computing. MapReduce is good for single-iteration and embarrassingly parallel distributed tasks like feature processing, which means that a lot more data can be processed. However, MapReduce is too high-level to implement sophisticated learning algorithms. What kind of gains could you see if you could have the best of both worlds: large data AND sophisticated learning algorithms? GraphLab might offer those gains. GraphLab is only slightly lower-level than MapReduce, but significantly more powerful. It is good for iterative algorithms with computational dependencies or complex asynchronous schedules, and it has been tested on a variety of sophisticated machine learning algorithms. Source code is available that implements GraphLab.
Joseph, are MapReduce and Hadoop too high-level for RBMs and autoencoders? I am just starting with MapReduce, so I would like to know if I could try that.
(Jan 26 '11 at 05:20)
Oliver Mitevski
@Oliver, Hadoop's biggest drawback is that math operations are too slow in Java. There isn't a Java BLAS library that can compete with the likes of ATLAS. When training an RBM on MNIST with a structure of the form 784x500x2000x2000, for example, the 2000x2000 layer is the slowest, but even that layer on a quad-core machine, using an appropriate math library for the computations (e.g., ATLAS or a similar threaded BLAS library), takes about 9 seconds per mini-batch (5 epochs x 600 mini-batches takes about 90 minutes). Java can make use of C libraries, so you could probably come up with a slick implementation that lets you use ATLAS from Hadoop/MapReduce, but if I had to put money on it, you'd need a few hundred machines to match what a single NVIDIA GeForce GTX 570 can do. Turning one of those loose on a well-developed RBM implementation can accelerate it to as much as 300-400x the speed of a similarly well-devised implementation that just runs on a CPU.
(Jan 27 '11 at 00:14)
Brian Vandenberg
Ignoring the compute capability of GeForce cards versus the compute limitations of Java, the throughput on a CUDA-enabled card vastly outstrips the throughput you get over Ethernet.
(Jan 27 '11 at 00:15)
Brian Vandenberg
Very interesting, Brian! Then, are there any advantages of Hadoop over a GPU? Not necessarily for RBMs, but in general, why would anyone prefer to use Hadoop and not a GPU?
(Jan 27 '11 at 04:11)
Oliver Mitevski
One of the most useful tasks I've seen was a distributed database server. Another good one is massively parallelized text parsing. Basically, anything that can be done well in java as part of a larger task that can be parallelized. If there were a straightforward way to use CUDA from hadoop, you could parallelize to many GPUs that way as well, but you'd still need to have a task large enough that the network traffic is one of your lowest time sinks.
(Jan 27 '11 at 11:56)
Brian Vandenberg
Hadoop is a good tool for machine learning: 1) it deals with massive data sets (billions of training samples); 2) it provides massive parallelism on thousands of machines, i.e., it scales horizontally. You can also use high-end data nodes (GPUs, multi-cores). In the end, it depends on how you design your algorithm to fit the map/reduce framework. I can easily train regularized logistic regression on billions of training samples with millions of sparse features in Hadoop. In order to use C++ code, you can use Hadoop streaming; Hadoop is a language-neutral framework.
(Jul 24 '11 at 15:29)
Jianxiong Dong
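To contrast the map/reduce style discussed in these comments with the update-function abstraction from the GraphLab answer above, here is a toy, single-machine Python sketch. It is not the real GraphLab API (which is C++); it only mimics the key ideas: an update function that reads its neighbours' values and writes its own, plus a dynamic scheduler that re-queues only the vertices whose inputs changed. The graph and tolerance are invented for illustration, with a PageRank-like update standing in for a learning algorithm.

    from collections import deque

    # Toy directed graph: vertex -> list of vertices it links to.
    out_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
    in_links = {v: [u for u, outs in out_links.items() if v in outs] for v in out_links}

    rank = {v: 1.0 for v in out_links}
    DAMPING, TOLERANCE = 0.85, 1e-4

    def update(v):
        """GraphLab-style update function: reads neighbours, writes its own value,
        and returns which vertices should be rescheduled."""
        new_rank = (1 - DAMPING) + DAMPING * sum(rank[u] / len(out_links[u]) for u in in_links[v])
        changed = abs(new_rank - rank[v]) > TOLERANCE
        rank[v] = new_rank
        return out_links[v] if changed else []   # only reschedule if our value moved

    # Dynamic scheduler: keep applying update functions until no vertex needs work.
    queue, queued = deque(out_links), set(out_links)
    while queue:
        v = queue.popleft()
        queued.discard(v)
        for w in update(v):
            if w not in queued:
                queue.append(w)
                queued.add(w)

    print(rank)

This kind of iterative, dependency-aware, converge-when-quiet computation is exactly what is awkward to express as a sequence of independent MapReduce passes, and what GraphLab (with the real system adding distributed execution and consistency models) is designed for.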
|
|
Application: Semantic search. Semantic hashing (Salakhutdinov + Hinton, 2007). Keyword search and its variants, like that done by Google, can easily scale to billions of documents, but can often miss relevant results. What if your search is missing relevant results because simple keyword matching misses documents that don't contain those exact keywords? This issue is especially acute for short text, like tweets. Tweets about the MTV music awards, for example, rarely contain the term VMA or the hashtag #vma. But wouldn't it be useful to retrieve all relevant results? Semantic hashing lets you search just as fast as keyword matching, but it performs semantic search and finds relevant documents that don't necessarily contain the search keywords. It is also completely automatic, requiring no ontologies or other human annotation, and it can scale to billions of documents, like keyword search.
This is indeed a promising development for performing web-scale similarity lookups on documents. I would also add that this approach should be amenable to indexing non-text multimedia content (images, sounds, audio) if you stack it on top of a good (unsupervised?) feature extraction layer. For instance, for images you can extract convolutional code-words (using convolutional DBNs or simpler convolutional soft k-means) or more holistic features (eigen-scenes with PCA, GIST scene descriptors, ...).
(Jan 22 '11 at 10:04)
ogrisel
I'll add a specific 'data' provider: semantic search in Twitter messages. The paper by Socher et al. (Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks) seems to be an ideal candidate for this.
(Jan 28 '11 at 11:55)
osdf
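As a toy illustration of the retrieval mechanism the semantic hashing answer above describes: documents are mapped to short binary codes, stored in a hash table keyed by those codes, and a query is answered by probing every code within a small Hamming distance of the query's code. In the paper the codes come from a deep autoencoder trained on word-count vectors; in this sketch, random hyperplanes over bag-of-words vectors stand in for the learned encoder, and the corpus is made up, so only the lookup mechanics carry over.

    import numpy as np
    from collections import defaultdict
    from itertools import combinations

    docs = ["mtv video music awards red carpet show",
            "vma awards winners announced tonight",
            "stock market rallies after fed meeting",
            "central bank raises interest rates again"]

    vocab = sorted({w for d in docs for w in d.split()})
    word_id = {w: i for i, w in enumerate(vocab)}

    def bow(text):
        """Bag-of-words vector for a piece of text (unknown words are ignored)."""
        v = np.zeros(len(vocab))
        for w in text.split():
            if w in word_id:
                v[word_id[w]] += 1.0
        return v

    rng = np.random.default_rng(0)
    n_bits = 6
    planes = rng.normal(size=(n_bits, len(vocab)))

    def code(vec):
        """Map a document vector to a short binary code (one bit per hyperplane)."""
        return tuple((planes @ vec > 0).astype(int))

    # Index documents in a hash table keyed by their binary codes: a lookup is a
    # handful of memory accesses, with no scan over the collection.
    index = defaultdict(list)
    for i, d in enumerate(docs):
        index[code(bow(d))].append(i)

    def search(query, radius=2):
        """Return documents whose codes lie within a small Hamming ball of the query's code."""
        q = np.array(code(bow(query)))
        hits = []
        for r in range(radius + 1):
            for flips in combinations(range(n_bits), r):
                probe = q.copy()
                probe[list(flips)] ^= 1
                hits.extend(index.get(tuple(probe), []))
        return [docs[i] for i in hits]

    # With a learned encoder, semantically related documents (e.g. the award-show
    # tweets) share nearby codes; with these random planes the grouping is only probable.
    print(search("mtv music awards", radius=3))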
|
|
Application: Question answering. Unsupervised Semantic Parsing (Poon + Domingos, 2009, 2010). A lot of work has gone into building natural language search engines and question-answering systems, but these efforts have only been moderately successful. In particular, previous approaches (like those of Powerset and Wolfram Alpha) have required sophisticated linguistic expertise and extensive ontology and knowledge-base construction. Essentially, there has been a lot of human engineering in the loop, and these techniques still don't work so well. Unsupervised semantic parsing is a highly ambitious and successful technique that attacks the problem of reading text and understanding its meaning. It requires no human annotation and learns just by reading text. It has been applied to question answering and is far more successful than competing academic baselines. By combining this automatic technique with current human-engineered tricks, one could significantly improve deployed NL search and question-answering systems. Source code is available that implements this technique. |
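Unsupervised semantic parsing itself (learning clusters of lambda-form fragments from dependency parses with Markov logic) is well beyond a short snippet, so the toy below only illustrates the downstream step a QA system wraps around a semantic parser: matching a question, reduced to a partial relation triple, against relation triples extracted from text. The triples and question patterns here are hand-written stand-ins; USP's contribution is precisely that it learns this kind of relational structure from raw text without annotation.

    import re

    # Triples of the kind a semantic parser would extract from biomedical text.
    triples = [
        ("IL-13", "induces", "MCP-1"),
        ("aspirin", "inhibits", "COX-2"),
        ("TNF-alpha", "activates", "NF-kB"),
    ]

    # Hand-written question patterns mapping a question to a partial triple query.
    PATTERNS = [
        (re.compile(r"what does (\w[\w-]*) (\w+)\?", re.I),
         lambda m: (m.group(1), m.group(2) + "s", None)),
        (re.compile(r"what (\w+)s (\w[\w-]*)\?", re.I),
         lambda m: (None, m.group(1) + "s", m.group(2))),
    ]

    def answer(question):
        """Match the question against a pattern, then look up consistent triples."""
        for pattern, to_query in PATTERNS:
            m = pattern.match(question)
            if not m:
                continue
            subj, rel, obj = to_query(m)
            return [t for t in triples
                    if (subj is None or t[0].lower() == subj.lower())
                    and t[1].lower() == rel.lower()
                    and (obj is None or t[2].lower() == obj.lower())]
        return []

    print(answer("What does aspirin inhibit?"))  # -> [('aspirin', 'inhibits', 'COX-2')]
    print(answer("What inhibits COX-2?"))        # -> [('aspirin', 'inhibits', 'COX-2')]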
Sounds like an interesting presentation. I'd appreciate it if you could post your slides after you've presented.