Does anyone have any experience with this? It seems like the gradients will be "too stochastic", so to speak. I would like to try automatic differentiation, but my implementation is in Java and it would be rather expensive to translate it to C++, which appears to be the dominant language for AD.

The main problem is that you need lots of evaluations to get a stable gradient. If you have N parameters, doing it correctly with central differences requires 2N evaluations of the loss. You can of course approximate this, but it is still far more computation than computing the gradient analytically. If you are using models whose gradient is tedious to derive and implement, I wholeheartedly recommend tools like autodiff or Theano, which do this for you. It saves so much time. If you don't want to switch, using a numerical gradient to check your implementation of the analytical gradient is a great help as well.
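To make the cost concrete, here is a minimal Java sketch of a central-difference gradient used as a correctness check; the names `numericalGradient`, `maxAbsDifference`, and the `ToDoubleFunction<double[]>` loss signature are illustrative assumptions, not from the original post:

```java
import java.util.function.ToDoubleFunction;

public class GradientCheck {

    // Central differences: 2 loss evaluations per parameter, 2N in total.
    static double[] numericalGradient(ToDoubleFunction<double[]> loss,
                                      double[] theta, double eps) {
        double[] grad = new double[theta.length];
        for (int i = 0; i < theta.length; i++) {
            double original = theta[i];
            theta[i] = original + eps;
            double plus = loss.applyAsDouble(theta);
            theta[i] = original - eps;
            double minus = loss.applyAsDouble(theta);
            theta[i] = original;                     // restore the parameter
            grad[i] = (plus - minus) / (2.0 * eps);
        }
        return grad;
    }

    // Largest per-component discrepancy between the analytical and numerical gradients.
    static double maxAbsDifference(double[] analytical, double[] numerical) {
        double worst = 0.0;
        for (int i = 0; i < analytical.length; i++) {
            worst = Math.max(worst, Math.abs(analytical[i] - numerical[i]));
        }
        return worst;
    }
}
```

With N parameters the loop above evaluates the loss 2N times per gradient, which is why it is practical as an occasional check of an analytical gradient but not as the gradient inside the SGD loop itself.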
You are much better off just writing code by hand to compute the derivatives. Numerical finite-difference derivatives should only be used to check your other code. Since you mentioned that you would like to try automatic differentiation, I can only assume that it is possible to write the code for the derivatives. Using numerical derivatives will be numerically unstable, and therefore inaccurate, and it will almost certainly be much slower.

I second this. If possible, write out the gradient in code. If it's not possible, you probably have bigger problems.
(Oct 10 '11 at 16:32)
Travis Wolfe
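As a concrete instance of "writing the gradient out in code", here is a sketch of a hand-derived gradient for squared-error linear regression on a single example; the class and method names are made up for illustration, and this is the kind of derivative you would then verify against the finite-difference check sketched above:

```java
public class SquaredErrorGradient {

    // Loss for one example: 0.5 * (w.x - y)^2
    static double loss(double[] w, double[] x, double y) {
        double pred = 0.0;
        for (int i = 0; i < w.length; i++) {
            pred += w[i] * x[i];
        }
        double err = pred - y;
        return 0.5 * err * err;
    }

    // Hand-derived gradient: dL/dw_i = (w.x - y) * x_i
    static double[] gradient(double[] w, double[] x, double y) {
        double pred = 0.0;
        for (int i = 0; i < w.length; i++) {
            pred += w[i] * x[i];
        }
        double err = pred - y;
        double[] grad = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            grad[i] = err * x[i];
        }
        return grad;
    }
}
```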
Another way to smooth out the "wild" gradients is to use mini-batches: build up a buffer of m examples for some small m, compute the gradient over this small set, update your hypothesis, and repeat. This is usually a very simple change to one's code.
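A rough sketch of what that change looks like in Java, assuming a per-example analytical gradient is already available; `Example`, `exampleGradient`, and `epoch` are hypothetical names used only for illustration:

```java
import java.util.List;

public class MiniBatchSgd {

    // Placeholder for whatever per-example data the model uses.
    static class Example {
        double[] x;
        double y;
    }

    // Hypothetical per-example gradient; plug in the model's own derivative code here.
    static double[] exampleGradient(double[] theta, Example ex) {
        double[] g = new double[theta.length];
        // ... fill in the analytical gradient for one example ...
        return g;
    }

    // One pass of mini-batch SGD: average the gradient over m examples, then take a step.
    static void epoch(double[] theta, List<Example> data, int m, double learningRate) {
        for (int start = 0; start < data.size(); start += m) {
            int end = Math.min(start + m, data.size());
            double[] batchGrad = new double[theta.length];
            for (int k = start; k < end; k++) {
                double[] g = exampleGradient(theta, data.get(k));
                for (int i = 0; i < theta.length; i++) {
                    batchGrad[i] += g[i] / (end - start);   // running average over the batch
                }
            }
            for (int i = 0; i < theta.length; i++) {
                theta[i] -= learningRate * batchGrad[i];
            }
        }
    }
}
```

Averaging over the m examples in a batch reduces the variance of each update by roughly a factor of m compared with single-example SGD, at the cost of m gradient evaluations per step.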
How is the gradient stochastic? Is the function non-smooth? In that case you might try another optimization technique.
Why are you not using regular backpropagation?
Isn't backpropagation used only for NNs? And still, stochastic derivatives sound like a sign of a non-differentiable function, which would defeat most standard derivative-based optimization techniques.
Oh, I kind of just thought NN when I saw SGD; a Pavlovian reaction. Makes more sense now.