Hi, most networks I have seen have weights that are either fully connected or convolutional. Instead of a full connection (i.e. a dense weight matrix), I want some sort of sparse connectivity (e.g. maybe 10-100 connections per neuron, chosen according to some random distribution). I want to do this mostly for performance reasons: I want to connect two layers with about 10k neurons each, and a dense matrix is just too big. I haven't really found much about this. Have other people tried that?
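To make it concrete, here is a minimal numpy sketch of the kind of layout I have in mind (only a rough sketch; the fan-in of 100, the tanh non-linearity, and all names are just illustrative):

    import numpy as np

    n_in, n_out, fan_in = 10000, 10000, 100   # illustrative sizes

    # Each output unit is connected to `fan_in` randomly chosen input units.
    idx = np.array([np.random.choice(n_in, size=fan_in, replace=False)
                    for _ in range(n_out)])          # (n_out, fan_in) index matrix
    W = 0.01 * np.random.randn(n_out, fan_in)        # one weight per connection
    b = np.zeros(n_out)

    def forward(x):
        # Gather only the connected inputs; cost is O(n_out * fan_in)
        # instead of O(n_out * n_in) for a dense matrix.
        return np.tanh((W * x[idx]).sum(axis=1) + b)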
Ok, I'm not an expert in this, but can't you just use sparse matrices, and maybe also use labels to keep track of which weight corresponds to which pair of neurons it connects?

Yes, I also thought about this as one possible solution, and I will probably start this way, since there are many sparse matrix implementations around. At the learning step (when applying the gradients from backpropagation), I will probably need some extra logic so that the sparse matrix does not explode, i.e. so that I don't add too many new entries. However, my question was mostly whether there are people who have already tried this. E.g. I wonder if there is any experience about how many connections per neuron I should allow, how I should select the connections, etc.
(Dec 13 '13 at 08:31)
Albert Zeyer
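The scipy route I would probably start with looks roughly like this (only a sketch, assuming scipy.sparse with a fixed random CSR pattern; all names and sizes are illustrative):

    import numpy as np
    import scipy.sparse as sp

    n_in, n_out, fan_in = 10000, 10000, 100

    # Build a fixed random sparsity pattern once; only these entries ever exist.
    rows = np.repeat(np.arange(n_out), fan_in)
    cols = np.concatenate([np.random.choice(n_in, size=fan_in, replace=False)
                           for _ in range(n_out)])
    vals = 0.01 * np.random.randn(n_out * fan_in)
    W = sp.csr_matrix((vals, (rows, cols)), shape=(n_out, n_in))

    def forward(x):
        return np.tanh(W.dot(x))

    # Learning step: update only the stored entries, so the pattern never grows.
    # grad_vals is assumed to be the gradient w.r.t. W.data, in the same order.
    def sgd_step(grad_vals, lr=0.01):
        W.data -= lr * grad_vals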
It has recently been shown that initialization is very important in deep learning, and your approach has very good empirical justification. The intuitive justification is that the total amount of input to each unit does not depend on the size of the previous layer, so the units will not saturate as easily. Meanwhile, because the inputs to each unit are not all randomly weighted blends of the outputs of units in the previous layer, they will tend to be qualitatively more "diverse" in their response to inputs. There are also some theoretical justifications about what the spectral radius of these matrices should be and how this helps to avoid the vanishing gradient problem. So I recommend you read this paper and that paper for more details.

I don't really see how the papers are related to my question. I am not asking about sparse initialization. I'm asking about not using a dense weight matrix at all, but some sparse connectivity instead (so that it won't even be possible for a neuron to be connected to more than about 100 other neurons).
(Dec 12 '13 at 15:34)
Albert Zeyer
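For comparison, what I understand by sparse initialization (e.g. as in Martens' Hessian-free paper, if I remember correctly) is roughly the following; note that the full dense matrix is still allocated, which is exactly the memory cost I want to avoid:

    import numpy as np

    n_in, n_out = 10000, 10000
    k = 15   # non-zero incoming weights per unit; roughly the value used in the papers

    W = np.zeros((n_out, n_in))          # still a full dense 10k x 10k matrix
    for i in range(n_out):
        cols = np.random.choice(n_in, size=k, replace=False)
        W[i, cols] = np.random.randn(k)
    # During training, any of the zero entries is free to become non-zero.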
It seems to me that if the weight of some connection is 0, it is the same as the connection being absent, so sparse initialization is equivalent to your sparse architecture. The only difference is that some zero weights may become non-zero during learning, but I think this will have only a small influence on overall performance. Did I miss something?
(Dec 12 '13 at 15:45)
Midas
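To make the difference concrete, one can use a dense matrix with a fixed binary mask; whether the "absent" connections can come back depends only on whether the mask is re-applied at each update (a rough sketch with hypothetical names):

    import numpy as np

    n_in, n_out, fan_in = 1000, 1000, 100   # smaller illustrative sizes
    mask = np.zeros((n_out, n_in), dtype=bool)
    for i in range(n_out):
        mask[i, np.random.choice(n_in, size=fan_in, replace=False)] = True

    W = 0.01 * np.random.randn(n_out, n_in) * mask

    def sgd_step(W, grad, lr=0.01, keep_sparse=True):
        # keep_sparse=True: absent connections stay exactly zero (sparse architecture);
        # keep_sparse=False: zero weights may fill in over time (sparse initialization).
        if keep_sparse:
            grad = grad * mask
        return W - lr * grad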
It depends on how cleverly you design your code. With Matlab and the gpuArray class you will run into trouble, because it is not possible to initialize a matrix that represents all the weights of that network on the GPU (memory issue). If you code it cleverly and the proportion of absent elements is always the same, you may be able to work around this.
(Dec 12 '13 at 16:40)
gerard
Yes, it has two issues. One is memory: for a 10k*10k matrix you need about 400MB, and I want to have several of those, which would just be too much. If I restrict it to about 100 connections per unit, I only need about 4MB (maybe 8MB if I also store the index of each connection). The other issue is computation: in forward propagation I can expect it to be about 100 times faster, because I only need to go through those 8MB instead of the 400MB. The same holds for backpropagation and the learning step, because I don't need to calculate gradients for connections that cannot exist.
(Dec 13 '13 at 08:28)
Albert Zeyer
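For reference, the arithmetic behind those numbers, assuming 32-bit floats and 32-bit indices:

    n, fan_in = 10000, 100
    dense_bytes  = n * n * 4                        # 400 MB of float32 weights
    sparse_bytes = n * fan_in * 4                   # 4 MB of float32 weights
    with_indices = sparse_bytes + n * fan_in * 4    # + 4 MB of int32 indices = 8 MB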
I found something partially related: "learning a deep compact image representation". The authors break the first layer into four parts to speed up initial training, but later they join the four parts, initialized with the already trained weights, and continue training. It is not directly what you meant, but it is one step in that direction. I am also interested in what you are looking for, Zeyer. Especially high-dimensional images with more than 5000 pixels can be painful with fully connected neural networks. Maybe you should just try your ideas and see if they work.

For images, use convolutional neural networks instead. You only need connections between a node A in one layer and the nodes in the next layer that correspond to nodes in the former layer that are close to A. For large convolution kernels, the convolution can be calculated efficiently by applying an FFT to both the image and the kernel and using the convolution theorem (see Wikipedia).
(Nov 04 '14 at 17:40)
HelloGoodbye
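A quick numpy sketch of the FFT route via the convolution theorem (sizes and the padding convention are just illustrative):

    import numpy as np

    image  = np.random.randn(512, 512)
    kernel = np.random.randn(65, 65)     # large kernel, where the FFT starts to pay off

    # conv(image, kernel) = IFFT(FFT(image) * FFT(kernel)),
    # with both zero-padded to the full output size.
    out_shape = (image.shape[0] + kernel.shape[0] - 1,
                 image.shape[1] + kernel.shape[1] - 1)
    conv = np.fft.irfft2(np.fft.rfft2(image, out_shape) * np.fft.rfft2(kernel, out_shape),
                         out_shape)
    # This should match scipy.signal.convolve2d(image, kernel) up to floating-point error.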
If you are working with images, I would suggest using convolutional neural networks for this, provided it is okay that the network becomes translation invariant; i.e. the network will give almost the same output no matter where in the image the object is located, as long as it is the same object. Convolutional neural networks have two major advantages over ordinary neural networks. The first is that they are much sparser and thus much faster, and they can be optimized further with the FFT and the convolution theorem so that large convolution kernels do not slow them down. The second is that they learn faster, since knowledge learned in one part of the image is automatically generalized to all other parts of the image. This ability to generalize is also what makes them translation invariant.
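To put rough numbers on the "much sparser" claim, for a single-channel 100x100 image (the layer sizes are purely illustrative):

    pixels = 100 * 100
    dense_weights = pixels * pixels      # fully connected 100x100 -> 100x100: 10^8 weights
    conv_weights  = 5 * 5                # one shared 5x5 kernel; multiply by the number
                                         # of feature maps for a realistic conv layer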