I'd like to propose some research directions in large-scale deep learning (so that those of you in academia can publish papers and code, and I can get the fruits of that labor).

The basic premise: stacking autoencoders in various ways, where an autoencoder is simply a 3-layer neural network whose input and output layers are the same size and whose hidden layer is smaller. Training consists of minimizing the difference between the data going into and coming out of the network. Afterwards the hidden-to-output layer (the decoder stage) is thrown away and the input-to-hidden layer (the encoder stage) is kept. Training can use SGD, simulated annealing, genetic algorithms, etc.
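A minimal sketch of what I mean, assuming plain numpy, sigmoid units, and SGD on the squared reconstruction error (the function names and hyperparameters here are only illustrative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_autoencoder(X, n_hidden, lr=0.1, epochs=20, seed=0):
        """Train a 3-layer autoencoder on the rows of X; return only the encoder."""
        rng = np.random.RandomState(seed)
        n_in = X.shape[1]
        W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))   # input -> hidden (encoder)
        b1 = np.zeros(n_hidden)
        W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))    # hidden -> output (decoder)
        b2 = np.zeros(n_in)
        for _ in range(epochs):
            for x in X:
                h = sigmoid(x @ W1 + b1)            # encode
                y = sigmoid(h @ W2 + b2)            # decode (reconstruct the input)
                # backprop of the squared reconstruction error
                dy = (y - x) * y * (1 - y)
                dh = (dy @ W2.T) * h * (1 - h)
                W2 -= lr * np.outer(h, dy); b2 -= lr * dy
                W1 -= lr * np.outer(x, dh); b1 -= lr * dh
        # throw away the decoder (W2, b2); keep the encoder (W1, b1)
        return W1, b1

    def encode(X, W1, b1):
        return sigmoid(X @ W1 + b1)

The returned (W1, b1) pair is the encoder stage; the decoder is discarded once training is done.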

Multiple autoencoders can be trained over the same data (from random starting points), and they will land on different "hills" of the optimization landscape, so ensembles of them will tend to work better.
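For example (just a sketch, reusing the hypothetical train_autoencoder/encode helpers above on made-up data):

    # Train several autoencoders on the same data from different random seeds;
    # each tends to settle in a different local optimum, and the concatenated
    # codes can then be used as ensemble features.
    X = np.random.RandomState(42).rand(500, 64)          # placeholder dataset
    encoders = [train_autoencoder(X, n_hidden=16, seed=s) for s in range(5)]
    ensemble_code = np.hstack([encode(X, W1, b1) for (W1, b1) in encoders])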

The output of one encoder stage can be fed as data into one or more other autoencoders, so that you end up with a directed acyclic graph (DAG) of encoder stages. Further, this encoder DAG will tend to be too large to fit in the memory of one computer and will be distributed across a network.
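A toy sketch of such a DAG, where two first-level encoders both feed a shared second-level autoencoder (the sizes are made up and the helpers are the hypothetical ones above):

    # Two first-level encoder stages trained on the raw data...
    enc_a = train_autoencoder(X, n_hidden=32, seed=1)
    enc_b = train_autoencoder(X, n_hidden=32, seed=2)
    # ...whose codes are concatenated and fed as data into a second-level
    # autoencoder, giving a small directed acyclic graph of encoder stages.
    codes_ab = np.hstack([encode(X, *enc_a), encode(X, *enc_b)])
    enc_top = train_autoencoder(codes_ab, n_hidden=16, seed=3)
    top_code = encode(codes_ab, *enc_top)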

Research+code:

  • Determining whether two encoders have hit the same "hill". This can be done either by directly examining the connection weights or by examining the output data. If the same hill has been reached, methods for caching/memoizing the inputs so that the encoder DAG is more efficient (see the sketch after this list).

  • Building a "library" of encoders and encoder DAGs, an "artificial connectome" by analogy with genomics/connectomics, over various datasets such as ImageNet, speech corpora, Wikipedia, the human genome, etc. Analysis of those artificial connectomes, and comparisons between connectomes (e.g. have the same encoder DAGs been found in both the Wikipedia connectome and the ImageNet connectome?).

  • Performance improvements, such as creating FPGAs/ASICs for autoencoder training.
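For the first item above, one possible (purely illustrative) test of whether two encoders have landed on the same optimum is to compare their hidden-unit activations on shared data rather than their raw weights, since equivalent solutions can have their hidden units permuted:

    def encoder_similarity(X, enc1, enc2):
        """Mean best absolute correlation between the two encoders' hidden units."""
        H1 = encode(X, *enc1)
        H2 = encode(X, *enc2)
        # cross-correlations between the two sets of hidden-unit activations
        C = np.corrcoef(H1.T, H2.T)[:H1.shape[1], H1.shape[1]:]
        return np.abs(C).max(axis=1).mean()

    # A score near 1.0 suggests the same "hill"; that encoder's output could
    # then be cached/memoized and shared between branches of the DAG.
    if encoder_similarity(X, encoders[0], encoders[1]) > 0.95:
        print("encoders look equivalent; reuse one of them")

Matching units by correlation is only one heuristic; comparing weights up to a permutation of hidden units would be another option.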

asked Jul 30 '12 at 22:57

marshallp

Your post is not clear to me at all. I would suggest you edit it to clarify what you are saying, but I doubt it will be well received even with edits since it doesn't ask a question. In general, if there is research you want to have done, you are best off doing it yourself or funding/supervising other people that are doing it. Most researchers have so many ideas of their own they will always work on one of their own if given the choice.

(Aug 01 '12 at 03:49) gdahl ♦

The main point I'm trying to convey is that the deep learning community should be moving to large-scale thinking, as in astronomy or genomics. The Andrew Ng Google paper about using a 1,000-computer cluster is what I'm talking about. It's beyond the budget of almost everyone to do something on that scale or larger, so a combined effort of storing the "found" models should be undertaken, tools should be created so that hobbyists can donate computing time easily, etc., and a catchy name such as artificial connectomics or perceptomics should be agreed upon to generate buzz. The net effect will be that you'll get more citations, more funding, possibly kickstart a new industry, and accelerate human progress (which is what I assume most ML people care about).

(Aug 01 '12 at 14:31) marshallp

I'm not a researcher, but I would think that some researchers would be offended by this post. This is not a place to recruit researchers to work on your pet projects for free so you can "get the fruits of that labor". If you really want this done, like gdahl said, just do it yourself or provide funding. There might otherwise be a chance to persuade a research group to pursue your idea, but I don't think a few paragraphs on a Q&A site will cut it...

(Aug 01 '12 at 15:41) mugetsu

I'm not a billionaire or a professor, so where exactly can I reach academics in ML (or fund them)?

I don't think it's offensive at all, just throwing an idea out there; it takes a couple of minutes to process. If it doesn't catch on, fine; I'll keep going with it (software/blog posts) and hope for the best in the future.

(Aug 01 '12 at 17:55) marshallp