|
Dear all, I have been having this thought for a while concerning priors on parameters and initialisation schemes for parameters, where I cannot seem to reconcile the two. More specifically, in an example:
The problem:
My thought: The apparent contradiction seems to me to be some kind of artifact of the optimisation procedure. It is true that before seeing the data we can only make very vague assumptions about the weights (thus the broad prior), but our search for optimal weights needs to start from a simple neural network with small weights. After all, small weights are as likely as large weights under the broad prior.

I realise that this post is a bit vague and that I am probably confusing myself somewhere. Maybe somebody finds it interesting or wants to share his/her thoughts on this.

Thanks! N.

ps. The same thing probably also happens in other cases, e.g. training a mixture of Gaussians with broad priors set on the means, but initialising it via K-means.
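To make the "artifact of the optimisation procedure" idea concrete, here is a minimal sketch under strong assumptions: a one-parameter linear model instead of a neural network, a broad Gaussian prior on the weight, and plain gradient descent on the negative log posterior. The function names, the data, and all constants are made up for illustration. The prior appears in the objective; the initial value only decides where the search starts.

```python
# Hypothetical toy setup (not from the original post): y = w * x + noise,
# with a broad Gaussian prior w ~ N(0, prior_std^2).

def neg_log_posterior_grad(w, xs, ys, prior_std, noise_std=1.0):
    """Gradient of -log p(w | data) with respect to w."""
    likelihood_term = sum((w * x - y) * x for x, y in zip(xs, ys)) / noise_std**2
    prior_term = w / prior_std**2  # the only place the prior enters
    return likelihood_term + prior_term

def map_estimate(w_init, xs, ys, prior_std, lr=0.01, steps=2000):
    """Gradient descent on the negative log posterior, starting at w_init."""
    w = w_init
    for _ in range(steps):
        w -= lr * neg_log_posterior_grad(w, xs, ys, prior_std)
    return w

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]  # made-up data, roughly y = 2x

w_small = map_estimate(0.01, xs, ys, prior_std=100.0)  # small initial weight
w_large = map_estimate(50.0, xs, ys, prior_std=100.0)  # large initial weight
print(w_small, w_large)
```

In this convex toy problem both starting points reach the same MAP estimate, so the initialisation carries no modelling assumption at all. A neural network objective is not convex, so there the starting point can influence which solution is found, which is presumably where the tension described above comes from.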
|
I'm not sure that the broad prior can be considered an actual prior, at least not in the sense that a prior distribution is intended to encode prior knowledge. Assuming the weights can acquire any value reminds me of maximum entropy approaches, where the uniform distribution has maximum entropy when no prior knowledge is given. That aside, you could actually consider small weights at initialization to be a prior that is imposed not by the task but by the model (i.e. preventing saturation of the activations in a network). That prior is mainly used to accelerate gradient descent convergence rather than to improve classification performance.

Hello Christopher, thanks for your answer. I like what you said about the prior being imposed by the model and not the task. The model thinks to itself: before I see any data I will start with a very simple configuration, namely (almost) straight lines, and then move on to more complex configurations... However, I still cannot reconcile the two ideas, the task prior and the model prior, as you call them. Cheers, N
(Jul 15 '11 at 08:53)
Nikos G
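The saturation point raised in the answer can be checked with a rough sketch. It assumes tanh units, standard normal inputs, and zero-mean Gaussian weights; the helper name, the 0.99 threshold, and the two standard deviations are illustrative choices, not anything stated in the thread.

```python
import math
import random

def saturated_fraction(weight_std, n_inputs=50, n_units=1000, threshold=0.99, seed=0):
    """Fraction of tanh units with |output| > threshold when each weight
    is drawn from N(0, weight_std^2). All defaults are illustrative."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]  # one random input vector
    saturated = 0
    for _ in range(n_units):
        # Pre-activation of one unit with freshly sampled weights.
        pre = sum(rng.gauss(0.0, weight_std) * xi for xi in x)
        if abs(math.tanh(pre)) > threshold:
            saturated += 1
    return saturated / n_units

broad = saturated_fraction(weight_std=10.0)  # weights drawn as if from a broad prior
small = saturated_fraction(weight_std=0.01)  # weights drawn from a small initialisation
print(f"broad draw: {broad:.2f} saturated, small init: {small:.2f} saturated")
```

With weights drawn from the broad distribution, nearly all units sit in the flat part of tanh, where gradients are close to zero. With small weights the units stay in the near-linear regime, which matches the "(almost) straight lines" picture in the comment above.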
|