What are people's thoughts on the role of Industry versus Academia in Machine Learning? What sorts of problems, areas of study, etc., are best tackled by each group? It's arguable nowadays that topics like large-scale ML (big data etc.) are something that only industry has the resources to deal with. However, many industry people also contribute to state-of-the-art ML theory conferences like COLT. So what is best tackled by academia? (ps - the easy answer here, not so interesting, is that industry does applications whereas academia does blue-skies research. I'm wondering if there is a deeper answer.)
Academia in machine learning often has close ties to industry, and academics commonly work on problems of great practical interest. For the most part, industry does no real basic machine learning research (there are a few exceptions). People in academia can do basically everything machine learning experts in industry do, but they often won't want to. In my experience using deep neural nets for speech recognition, my collaborators in industry were perfectly willing to spend a lot of time making small tweaks to the system, long after I would get bored of tweaking learning rate schedules and numbers of training iterations. In industry it can be completely worth it to spend a while making very small adjustments, since those adjustments are very easy to do, even if they don't change anything fundamental enough to interest an academic researcher. In order for academics to have access to all the data Google has, they need to work with Google. But this can and does happen. People in academia can work on large-scale problems if they want to, but they might not be able to work on a particular problem whose proprietary data they don't have access to (obviously).

I am not so sure about the "people in academia can do basically everything machine learning experts in industry do". I believe there are a lot of skills you don't learn in academia, possibly because of a lack of interest or necessity.
(May 08 '12 at 16:35)
Justin Bayer
Can you give an example of those skills, Justin? My point is that they are often the same people. Machine learning academics often have very close ties to industry.
(May 09 '12 at 16:58)
gdahl ♦
To be honest, I can only assume. However, working on industry applications comes with many requirements that you do not have in academia: reliability of the application, integration constraints, deadlines, quality of software, flexibility, knowledge of the computational details of algorithms. These are all things that you can practically avoid in academia and, so I assume, most people actually do. (This does not necessarily apply to those with industry ties.)
(May 12 '12 at 14:19)
Justin Bayer
Machine learning researchers in industry might only make prototypes that people in product teams will then re-implement.
(May 13 '12 at 01:58)
gdahl ♦
Exactly. However, from my experience, reimplementing prototypes does require machine learning expertise, which is why I refer to the people who do that as "industry machine learning experts". A few examples off the top of my head: the logistic regression decision can be made without the softmax by looking at the linear output alone, which can lead to enormous speedups in real-time apps (see the small sketch right after this comment); modifying cost functions to be more appropriate for needs that become apparent in production, e.g. when the outputs of a model should satisfy a constraint (do you optimize an additional error term, or cut off the outputs of the model?); and feeding the quality of a system back into the algorithm (Will more data help? Will more features help? What data, and where should we put another sensor? Will moving these computations to the GPU result in a major speedup? Does it make sense to buy better CPU hardware instead? This preprocessing step is too costly, e.g. in terms of run time or development time; can we substitute something similar without losing too much performance?).
(May 13 '12 at 06:04)
Justin Bayer
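(Side note on the softmax point above: here is a minimal sketch in Python/NumPy, with made-up weights and shapes rather than code from anyone in this thread, of why the decision can be read off the linear output directly. The softmax is monotonic, so the argmax is the same with or without it; skipping it just saves the exponentials and the normalization at prediction time.)

    # Minimal sketch: for picking the most likely class, the softmax can be
    # skipped, because argmax over the linear outputs (logits) already gives
    # the same decision. Weights here are random, purely for illustration.
    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())      # shift for numerical stability
        return e / e.sum()

    rng = np.random.default_rng(0)
    W = rng.normal(size=(10, 5))     # hypothetical weights: 10 classes, 5 features
    b = rng.normal(size=10)
    x = rng.normal(size=5)           # one input example

    logits = W @ x + b               # linear output only -- cheap
    probs = softmax(logits)          # full probabilities -- only needed if you
                                     # actually want calibrated scores

    assert np.argmax(logits) == np.argmax(probs)   # same decision either way
    print("predicted class:", int(np.argmax(logits)))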
This is an interesting blog post from Simon Tong, who did machine learning research at Stanford and then wrote a pretty big, important ML system at Google. http://googleresearch.blogspot.com/2010/04/lessons-learned-developing-practical.html
(May 14 '12 at 12:09)
Rob Renaud
I have read too many white papers in which academics build great classifiers that score 98% accuracy on an embarrassingly small amount of data (like 1k samples). But if you test the same classifier on a realistic amount of data, the accuracy becomes so low that the classifier is useless. I have tried a few in my work. So the natural conclusion is that scientists do not have access to real data. But maybe I am wrong.
But how can you do something fundamental if you do not have access to real datasets and other infrastructure (like a computer cluster)? It could then only be something theoretical, probably not very useful in practice ;)

I'm going to guess that this answer was voted down because it's based on some incorrect assumptions: 1) academics don't do work on "real data" and 2) academic infrastructure is too weak to run state-of-the-art algorithms. Both of these are quite false.
(May 10 '12 at 11:54)
Andrew Rosenberg
Also, 3) fundamental things always involve real and big data sets and 4) theoretical results don't help in practice.
(May 14 '12 at 03:47)
Justin Bayer
I believe that academic researchers can afford to spend more time on problems, specifically on problems whose immediate usefulness isn't clear. Industry also does research (lots of papers by Google, Microsoft, Yahoo, and more), but I think it's narrower.
Let me rephrase the question: "Given the skills and resources that each group (industry versus academia) has, what current topics, or types of problems, should they spend their time on to be the most effective?".