In several papers that discuss layer-wise pre-training of deep networks (e.g., Hinton 2002, Bengio et al. 2007), there are references to the empirical data distribution. In the Bengio paper (see section 2.3), where auto-encoders are the building block, it is referred to as hat(p):

hat(p): the empirical data distribution

In the Hinton paper, where RBMs are the basic building block, it is referred to as p^0 (see section 3).

What do these papers mean by the empirical data distribution? And what is an effective way to sample from it?

asked Apr 26 '13 at 11:49

LeeZamparo


One Answer:

I believe they are referring to the distribution of your actual dataset, i.e., the input data you are training your autoencoder on. It is called the empirical data distribution because it's what you actually observe, as opposed to the true (unknown) distribution that generated your data. Sampling from it just means grabbing a vector (or whatever form your data takes) uniformly at random from your set of input vectors.
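To make that concrete: the empirical distribution places probability mass 1/N on each of the N training examples, hat(p)(x) = (1/N) * sum_i delta(x - x_i), so drawing i.i.d. samples from hat(p) amounts to picking training examples uniformly at random with replacement. Here is a minimal sketch in Python/NumPy; the array X and the helper name sample_empirical are just illustrative, not notation from either paper:

    import numpy as np

    def sample_empirical(X, n_samples, rng=None):
        # Draw n_samples i.i.d. from the empirical distribution of X:
        # pick rows of X uniformly at random, with replacement.
        rng = rng if rng is not None else np.random.default_rng()
        idx = rng.integers(0, X.shape[0], size=n_samples)
        return X[idx]

    # Hypothetical dataset: one training example per row.
    X = np.random.randn(1000, 50)
    minibatch = sample_empirical(X, 32)  # a minibatch drawn from hat(p)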

answered Apr 27 '13 at 10:47

Benjamin Rapaport

edited Apr 27 '13 at 10:48

Ah, ok, thanks. This was my first impression, but then I asked myself why they didn't explicitly say to sample elements from the data. Sampling from the data distribution doesn't mean the same thing, imo.

(May 16 '13 at 13:01) LeeZamparo