I'm training a DBN for classification, with CD1 pretraining followed by conjugate-gradient (CG) fine-tuning. The CG implementation is based on Carl Edward Rasmussen's code (http://learning.eng.cam.ac.uk/carl/code/minimize/minimize.m). It works quite well with binary sigmoid hidden units, but it does not seem to work with rectified linear (ReLU) hidden units. Is there any reason why CG should not work on ReLU hidden units? Is there a better method for fine-tuning a DBN with ReLU hidden units?
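
For reference, here is a minimal, hypothetical sketch (Python/NumPy, not my actual pipeline) of the kind of cost/gradient function handed to the CG optimizer, using scipy's CG method as a stand-in for minimize.m. The only place the two unit types differ is the hidden activation and its derivative; all names here are illustrative:

```python
# Minimal sketch: CG fine-tuning of a one-hidden-layer softmax classifier.
# With sigmoid units the hidden derivative is h*(1-h); with ReLU it is a
# 0/1 indicator on the pre-activation. Everything else is identical.
import numpy as np
from scipy.optimize import minimize

def unpack(theta, n_in, n_hid, n_out):
    """Split the flat parameter vector into weights and biases."""
    i = 0
    W1 = theta[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = theta[i:i + n_hid]; i += n_hid
    W2 = theta[i:i + n_hid * n_out].reshape(n_hid, n_out); i += n_hid * n_out
    b2 = theta[i:]
    return W1, b1, W2, b2

def cost_and_grad(theta, X, Y, n_hid, hidden='relu'):
    """Cross-entropy cost and its gradient, as required by a CG minimizer."""
    n_in, n_out = X.shape[1], Y.shape[1]
    W1, b1, W2, b2 = unpack(theta, n_in, n_hid, n_out)
    z1 = X @ W1 + b1
    if hidden == 'relu':
        h = np.maximum(0.0, z1)            # rectified linear activation
        dh = (z1 > 0).astype(z1.dtype)     # derivative: 0/1 indicator
    else:
        h = 1.0 / (1.0 + np.exp(-z1))      # sigmoid activation
        dh = h * (1.0 - h)                 # derivative: h(1-h)
    z2 = h @ W2 + b2
    z2 -= z2.max(axis=1, keepdims=True)    # numerically stable softmax
    p = np.exp(z2); p /= p.sum(axis=1, keepdims=True)
    cost = -np.sum(Y * np.log(p + 1e-12)) / X.shape[0]
    # Backpropagate to get the gradient for the optimizer.
    d2 = (p - Y) / X.shape[0]
    gW2, gb2 = h.T @ d2, d2.sum(axis=0)
    d1 = (d2 @ W2.T) * dh
    gW1, gb1 = X.T @ d1, d1.sum(axis=0)
    grad = np.concatenate([gW1.ravel(), gb1, gW2.ravel(), gb2])
    return cost, grad

# Tiny synthetic check: CG fine-tuning runs for both unit types.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
Y = np.eye(2)[(X[:, 0] > 0).astype(int)]
n_hid = 10
theta0 = rng.standard_normal(20 * n_hid + n_hid + n_hid * 2 + 2) * 0.1
for hidden in ('sigmoid', 'relu'):
    res = minimize(cost_and_grad, theta0, args=(X, Y, n_hid, hidden),
                   jac=True, method='CG', options={'maxiter': 50})
    print(hidden, 'final cost:', res.fun)
```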