The finite element method (FEM) is a commonly used technique in engineering. A graph of finite elements (distributed over a cluster using graph partitioning algorithms for efficiency) is run to simulate a system. The method allows simulations to be carried out without an expensive global optimization. The bulk synchronous parallel model is often used to simplify programming.

The same method could be used to scale neural networks to much larger sizes.

The finite elements would be autoencoders and logistic regressors.

(this is also what encoder graphs are)
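As a rough sketch of what one superstep of such an element graph might look like (purely illustrative; the AutoencoderElement class and bsp_superstep function are hypothetical names, not an existing library, and the gradient is only for a toy tied-weight autoencoder):

    import numpy as np

    class AutoencoderElement:
        # One "finite element": a tied-weight autoencoder trained on its own data partition.
        def __init__(self, n_in, n_hidden, seed=0):
            rng = np.random.RandomState(seed)
            self.W = 0.01 * rng.randn(n_in, n_hidden)
            self.b = np.zeros(n_hidden)

        def encode(self, x):
            return np.tanh(x @ self.W + self.b)

        def local_step(self, x, lr=0.01):
            # One gradient step on the local reconstruction error.
            h = self.encode(x)                # (n_samples, n_hidden)
            err = h @ self.W.T - x            # reconstruction error, (n_samples, n_in)
            grad_W = x.T @ ((err @ self.W) * (1 - h ** 2)) + err.T @ h
            self.W -= lr * grad_W
            return h                          # the "boundary values" sent to neighbours

    def bsp_superstep(elements, inputs):
        # Compute phase: every element trains independently (each could sit on its own machine).
        codes = [el.local_step(x) for el, x in zip(elements, inputs)]
        # Barrier / communication phase: codes become the inputs of downstream elements.
        return codes

A second layer of elements would then take those codes as its inputs on the next superstep, with logistic regressors as the exit elements.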

asked Sep 03 '12 at 11:41

marshallp

Care to elaborate how it could be used to scale neural networks? I don't see it. From my understanding, FEM is a method to approximately solve differential equations. Where is the connection?

(Sep 03 '12 at 14:19) Justin Bayer

FEM cuts a big equation into small pieces. My previous post on "graph database + encoder graph = " explains how it would work in the context of stacked autoencoders.

(Sep 03 '12 at 14:34) marshallp

2 Answers:

FEM is for solving PDEs and is not directly applicable to neural network training. What is the PDE you would solve to train a neural net and how would its solution become a trained neural net? The reason FEM hasn't been applied to neural net training is because there isn't an obvious way to do it.

In general, we can already train sufficiently large neural nets on a couple of GPUs (hundreds of millions of weights) to be interesting. Scaling to much more data with nets of these sizes is far more useful than scaling to much larger neural nets. I haven't yet seen a problem where a net that can't fit on 2 modern GPUs has a clear advantage over a net that can. Maybe someday this will happen, but we need to solve the data-scalability problem for fully-connected deep nets first.

answered Sep 03 '12 at 16:17

gdahl ♦

Neural nets can be written as PDEs; they are just equations like anything else. However, that's not how FEM has to be used. You can simply plug together a set of elements.

I disagree that small neural nets suffice. That directly contradicts physical evidence (the human brain) and also the work of Jeff Dean (he's going for bigger is better).

If you haven't seen a problem of that scale, then have you already secretly developed human-level object detection and natural language understanding capabilities? I doubt that.

There's nothing to stop training each "finite element" on GPUs.

You could simply train things on GPUs like you are doing and plug them together in the end. However, that is a more complicated and less well abstracted method than thinking of it as FEM (or an encoder graph).

(Sep 03 '12 at 21:29) marshallp

Based on @gdahl's answer, I think the short answer to your question is that no one's figured out what the PDE looks like. If you do, by all means let us know when you publish a paper or have a working product. I'm sure people would be interested to know how one derives a PDE for neural nets.

(Sep 04 '12 at 02:16) Keith Stevens

I think you misunderstand FEM. It's not about PDEs. PDEs can't be solved by hand or by direct calculation, so they are reformulated as variational problems. A variational problem is simply optimization, or, in terms of neural networks, training. FEM was invented because you can split this optimization problem into small pieces so that it can be more easily solved as a lot of mini-problems.

You don't have to see FEM in terms of PDEs. In civil engineering, where it was first invented, you see it as the plugging together of finite elements.

The PDEs of neural networks would simply be the PDEs of arbitrary equations.

From a computer science perspective FEM = Bulk Synchronous Parallel.
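As a toy illustration of that splitting (purely a sketch; the consensus-averaging rule and all names here are illustrative, not how production FEM solvers or neural net trainers actually reconcile shared variables): a big least-squares problem is cut into pieces, each piece is solved independently during the compute phase of a superstep, and the shared variables are reconciled at the barrier.

    import numpy as np

    def solve_piece(A_i, b_i, consensus, rho=1.0):
        # Solve one small least-squares piece, softly tied to the current shared values.
        A_aug = np.vstack([A_i, np.sqrt(rho) * np.eye(A_i.shape[1])])
        b_aug = np.concatenate([b_i, np.sqrt(rho) * consensus])
        x_i, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
        return x_i

    def bsp_solve(pieces, n_vars, n_supersteps=20):
        consensus = np.zeros(n_vars)
        for _ in range(n_supersteps):
            # Compute phase: each piece is solved independently (could run in parallel).
            local = [solve_piece(A_i, b_i, consensus) for A_i, b_i in pieces]
            # Barrier / communication phase: the shared variables are averaged.
            consensus = np.mean(local, axis=0)
        return consensus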

(Sep 04 '12 at 05:17) marshallp

Also, about "publishing a paper": I'm not an academic, so I gain nothing by doing so.

Also, this is about scaling to large sizes. I don't have access to a supercomputer. I'm hoping people who do take notice of this and use it to make a human-level computer vision system a reality quickly.

(Sep 04 '12 at 05:21) marshallp

You misunderstand what I am saying. I am not saying "small nets suffice"; I am saying scaling nets to larger datasets is more important than scaling nets to larger models. Also, I think it is possible to beat the ImageNet results from Le et al.'s paper with a neural net that fits in the memory of a single GPU.

(Sep 04 '12 at 18:48) gdahl ♦

marshallp, if the thing you want people in large companies like Google to do is "parallelize neural net training as much as they can and train larger models", then they are already working on that. If your suggestion has more content than that, you haven't made that clear, and you might actually be well served by writing it clearly enough to be published and understood. Otherwise people will probably ignore it. Until you have created something that is specific, precise, clear, and readable, I suspect it will be a waste of your time to post it on MetaOptimize.

(Sep 04 '12 at 18:54) gdahl ♦

It has more content than that. The Google team is doing "distributed optimization". I'm arguing they should not do that and should instead follow the FEM model, that's all.

I don't understand the obsession with reading/writing papers in this community. Can simple concepts not be understood in any other way?

I assumed one of you would be in contact with a large corporation; Geoff Hinton was at Google this summer, right? If someone were to drop a little hint.

I'm sure they'd figure it out sometime (they might already have done so). You hadn't done so since 2006 (how to parallelize by using FEM), so I was just trying to put it in front of you clearly.

Thanks

(Sep 04 '12 at 22:15) marshallp

Great to see you downvoted as well - not only did you fail to do an obvious thing (apply FEM), but it incenses you to be told of it. You should try doing a little more thinking and a little less arrogance.

(Sep 04 '12 at 22:21) marshallp

An FEM has two distinct parts: nodes, which contain one or more bulk values (mass, velocity, rotation, etc.), and the edges between those cells through which flows pass. These flows update the bulk values of the nodes.

The nodes of the FEM are directly analogous to the nodes in the neural net.

The boundaries are directly analogous to the weighted connections between nodes.

If I read you correctly, you appear to propose using FEM to average over many neurons, treating the neurons' responses as the bulk properties of an FEM. That is an interesting proposal, but averaging over neurons would lose the details which we believe are vital to the performance of NNs, at least for machine learning applications. Perhaps the method would have some success at simulating a brain; in fact, I bet there has been work on this very idea. However, this has little to do with machine learning, so perhaps this is not the best forum to ask the question.

Without the averaging, the calculations for FEM would be identical to the calculations performed to update NNs, and so the point of the exercise disappears.
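As a toy illustration of that last point (hypothetical names, not a real library): a node update driven by weighted flows along its edges and a neuron update driven by its weighted inputs are the same weighted-sum computation, so without the averaging the FEM view reduces to the usual neural net update.

    import numpy as np

    def fem_node_update(bulk_value, neighbour_values, edge_weights, dt=0.1):
        # Net flow into a node is a weighted sum over its edges.
        flow = np.dot(edge_weights, neighbour_values)
        return bulk_value + dt * flow

    def neuron_activation(input_values, connection_weights, bias=0.0):
        # A neuron's pre-activation is the same weighted sum, passed through a nonlinearity.
        return np.tanh(np.dot(connection_weights, input_values) + bias)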

answered Sep 04 '12 at 12:51

DwoaC

I'm proposing simulating a generalization of stacked autoencoders. The nodes of the graph would be dimensionality reducers (with exit nodes being logistic regression).

This is different from classical neural networks.

(Sep 04 '12 at 13:13) marshallp

The principles are identical and my comments remain the same.

(Sep 04 '12 at 17:54) DwoaC