Has there been any research on combining a graph-processing system such as Pregel with deep learning? For example, take a graph of encoders, where each node is one of: encoder + decoder = autoencoder (contractive/denoising/sparse), or encoder = ICA, or encoder = sparse PCA, or encoder = non-negative matrix factorization.
The encoder graph is directed and acyclic.
The outputs of two or more encoders must match the input of the one they plug into. In the simplest case, where the input dimension is twice the hidden dimension (input = 2 × hidden), two encoders plug into a third.
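As a toy sketch of this shape-matching constraint (the `encoder` helper, the sizes, and the random weights are all made up for illustration; a real encoder would be trained, not random):

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(n_in, n_hidden):
    """A toy encoder: a random weight matrix plus a tanh nonlinearity."""
    W = rng.standard_normal((n_hidden, n_in))
    return lambda x: np.tanh(W @ x)

# Two leaf encoders, each mapping 8 inputs to 4 hidden units.
enc_a = encoder(8, 4)
enc_b = encoder(8, 4)

# A parent encoder whose input size (8) equals the concatenated
# hidden sizes of its two children (4 + 4): input = 2 * hidden.
enc_top = encoder(8, 4)

x_a, x_b = rng.standard_normal(8), rng.standard_normal(8)
top_out = enc_top(np.concatenate([enc_a(x_a), enc_b(x_b)]))
print(top_out.shape)  # (4,)
```

Each edge in the directed acyclic graph is just this "child output feeds parent input" wiring, which is why the formulation looks graph-shaped in the first place.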
Pregel is a graph-processing system (there are open-source implementations too) that can handle graphs with over a trillion edges. As far as I can tell, it is almost perfectly suited to the encoder-graph formulation of unsupervised deep learning, and the implementation should take only a few lines of code.
Are there any obvious flaws in the above? Has this already been tried?
A distributed graph-based system is used to train deep networks at Google. Here's a talk on it: http://graphlab.org/wp-content/uploads/2012/07/graphlab_talk.pdf
Theoretically you could use Pregel, but in practice, if you put a large neural network into it, the updates will take too long. You need neural-network-specific optimizations that are hard to fit into Pregel. For instance, Pregel's bulk synchronous parallel (BSP) model is overkill for gradient descent; see Hogwild! by Niu, Recht, et al. By forcing synchronous updates, you will spend most of your time waiting for a few stragglers to finish, or you will have to spend a lot of resources duplicating effort with backup workers.

There is also the issue of network bandwidth: you want to organize your task updates so that you are not bandwidth-bound. For vision tasks, this means placing computations corresponding to nearby patches on the same machine or the same rack. If you don't control the bandwidth requirements, it is easy to end up with a distributed gradient descent that runs slower than a single GPU, and gets slower the more machines you distribute it over.
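To see why asynchronous updates sidestep the straggler problem, here is a minimal Hogwild-style sketch (a hypothetical toy quadratic objective; Python threads stand in for cores, and because of the GIL this is illustrative rather than truly parallel):

```python
import threading
import numpy as np

# Shared parameter vector, updated in place with no locks or barriers.
w = np.zeros(4)
target = np.array([1.0, -2.0, 3.0, 0.5])

def worker(steps=2000, lr=0.01):
    # Each worker reads the current (possibly stale) parameters,
    # computes the gradient of ||w - target||^2, and writes the
    # update in place -- it never waits for the other workers.
    for _ in range(steps):
        grad = 2.0 * (w - target)
        w[:] -= lr * grad  # lock-free, Hogwild-style update

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(np.allclose(w, target, atol=1e-3))  # True
```

The point is that even with races and stale reads, every update still moves the parameters toward the optimum, so no worker ever blocks at a synchronization barrier the way it would under BSP.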
I assume it would be much more efficient to keep all the model parameters contiguous in memory (possibly sharded across a cluster of machines) rather than using a database that enforces a disk write for each weight update. It is also probably better to isolate parameter updates as much as possible to avoid synchronization (blocking message passing between cluster nodes), for example by using tricks such as Local Receptive Fields.
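A minimal sketch of the contiguous-parameters idea (the layer names, offsets, and shard count are invented for illustration; index ranges stand in for machines):

```python
import numpy as np

# All model parameters live in one contiguous flat buffer.
params = np.zeros(12, dtype=np.float32)

# Hypothetical layer layout: (start, stop) offsets into the buffer.
layout = {"layer1.W": (0, 8), "layer2.W": (8, 12)}

def view(name):
    start, stop = layout[name]
    return params[start:stop]  # a view: writes land in shared memory

# "Shard" the flat buffer across 3 machines; for a 1-D contiguous
# array, np.array_split returns views into the same memory.
shards = np.array_split(params, 3)

# A weight update is an in-place memory write, not a database
# transaction that forces a disk operation per weight.
view("layer1.W")[:] += 0.1

print(params[:8])       # the flat buffer reflects the update
print(shards[0][0])     # and so does the shard, since it is a view
```

The contrast with a database-backed store is that an update here costs one cache-line write, with durability handled separately (see the snapshotting point below).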
You can always snapshot the network weights asynchronously in the background from time to time, writing them to persistent storage such as a distributed filesystem or an Amazon S3-like blob store, for backup purposes.
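A sketch of that background-snapshot pattern (a local pickle file stands in for the distributed FS or blob store, and the interval and path are arbitrary):

```python
import os
import pickle
import tempfile
import threading
import time

import numpy as np

weights = np.random.default_rng(0).standard_normal(100)
# Stand-in for a distributed-FS or S3-like destination.
snapshot_path = os.path.join(tempfile.gettempdir(), "weights_snapshot.pkl")

stop = threading.Event()

def snapshotter(interval=0.05):
    # Periodically copy the live weights and write the copy out.
    # Training threads never block on this I/O.
    while not stop.is_set():
        snap = weights.copy()              # cheap in-memory copy
        with open(snapshot_path, "wb") as f:
            pickle.dump(snap, f)           # slow durable write, off the hot path
        stop.wait(interval)

t = threading.Thread(target=snapshotter, daemon=True)
t.start()
time.sleep(0.2)   # training updates would happen here
stop.set()
t.join()

with open(snapshot_path, "rb") as f:
    restored = pickle.load(f)
print(np.array_equal(restored, weights))  # True
```

Copying before writing is what keeps the snapshot consistent without ever pausing the workers that are mutating the live weights.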