|
For distributed representations of words (in language processing), what is the role of hidden layers? Is it possible to induce comparable representations without hidden layers, since introducing hidden layers is always computationally expensive? Take the model of Bengio et al. as an example: do you think it could still give similar representations without hidden layers? I have been trying it without hidden layers, and after the 15th iteration I am not getting encouraging results. Is this due to the missing hidden layers, or do I have to wait for more iterations? On average, how many iterations do we need to wait for a neural network to converge? It seems the log-bilinear model gives similar results even though it does not contain any hidden layer.
|
Any model that learns an embedding for words in the usual neural language model way can be thought of as having a hidden layer with linear units. For instance, if v is a high-dimensional binary unit vector (i.e. v looks like a column of the identity matrix), we can think of the table lookup in a neural language model as actually being performed by Wv, where W is the table of word embeddings.

I suspect your question is really about getting neural language models to work. Assignment 1 from an undergrad course taught by Geoff Hinton at U of Toronto provides MATLAB code implementing a simple neural language model. (You can find the course website from Professor Hinton's web page.)

There is no way to meaningfully answer your general question about how many iterations one needs to wait before one stops training a neural net. The number of iterations depends on the learning algorithm, the initialization, the data, the neural net architecture, and the criterion used to decide convergence.

Logistic regression and linear regression can be viewed as neural nets with zero hidden layers, but as I mentioned above, for the purposes of your question I think it is best to view a neural language model that learns word embeddings as having a hidden layer formed by the embeddings themselves.
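To make the "table lookup is really Wv" point concrete, here is a minimal NumPy sketch (the vocabulary size, embedding dimension, and variable names are illustrative, not from any particular model):

```python
import numpy as np

vocab_size, embed_dim = 5, 3
rng = np.random.default_rng(0)
# W is the table of word embeddings, one column per word.
W = rng.standard_normal((embed_dim, vocab_size))

word_index = 2
v = np.zeros(vocab_size)
v[word_index] = 1.0  # one-hot vector: a column of the identity matrix

# The "hidden layer with linear units" view: embedding = W v.
embedding_via_matmul = W @ v
# The usual implementation: a table lookup of column `word_index`.
embedding_via_lookup = W[:, word_index]

assert np.allclose(embedding_via_matmul, embedding_via_lookup)
```

Since multiplying by a one-hot vector just selects a column, the lookup and the matrix product are the same linear map; the lookup is simply the efficient way to compute it.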
|
Neural networks with no hidden layers can only linearly partition a space. For most complex problems, including what you are doing, this is insufficient. With a hidden layer, the neural network is able to partition the space non-linearly, and hence learn more useful classifications. Also, a neural network with one hidden layer (given a nonlinear activation and enough hidden units) is a universal function approximator, which is not true of a neural network with no hidden layers.
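The classic illustration of this gap is XOR: no single linear decision boundary separates its classes, but one hidden layer suffices. A small NumPy sketch with hand-picked (not learned) weights:

```python
import numpy as np

# XOR is not linearly separable, so a net with no hidden layer
# (i.e. logistic regression) cannot fit it exactly.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

def step(z):
    # Threshold nonlinearity: 1 where z > 0, else 0.
    return (z > 0).astype(int)

# Hand-picked weights: the two hidden units compute OR(x1, x2)
# and AND(x1, x2).
W1 = np.array([[1.0, 1.0],    # OR unit
               [1.0, 1.0]])   # AND unit
b1 = np.array([-0.5, -1.5])
h = step(X @ W1.T + b1)       # hidden layer activations

# Output unit computes OR AND NOT AND, which is exactly XOR.
w2 = np.array([1.0, -2.0])
b2 = -0.5
out = step(h @ w2 + b2)

assert (out == y).all()
```

The hidden units carve the input space into regions that the output unit can then combine linearly, which is precisely what a zero-hidden-layer network cannot do.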