Does anyone have any experience with this? It seems like the gradients will be "too stochastic", so to speak. I would like to try automatic differentiation, but my implementation is in Java and it would be rather expensive to translate it to C++, which appears to be the dominant language for AD.

asked Oct 05 '11 at 20:17

george s

How is the gradient stochastic? Is the function non-smooth? In that case you might try other optimization techniques.

(Oct 06 '11 at 01:10) Leon Palafox ♦

Why are you not using regular backpropagation?

(Oct 06 '11 at 02:08) Jacob Jensen

Isn't backpropagation used only for NNs? And still, "stochastic derivatives" sounds like a non-differentiable function, which would defeat most standard derivative-based optimization techniques.

(Oct 06 '11 at 03:32) Leon Palafox ♦

Oh, I kind of just thought NN when I saw SGD - a Pavlovian reaction. Makes more sense now.

(Oct 07 '11 at 18:03) Jacob Jensen

3 Answers:

The main problem is that you need lots of evaluations to get a stable gradient. If you have N parameters, doing it correctly (with central differences) requires 2N evaluations of the loss. You can of course approximate this with fewer evaluations, but it is still a lot more computation than computing the gradient analytically.
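For concreteness, here is a minimal sketch of that 2N-evaluation central-difference gradient in Java. The Loss interface and the parameter names are placeholders for whatever model is being trained, not anything from the original post:

    public class NumericalGradient {

        /** The loss as a function of the parameter vector. */
        public interface Loss {
            double at(double[] theta);
        }

        /**
         * Central-difference approximation of the gradient of the loss at theta.
         * Each of the N parameters is perturbed up and down once, so this costs
         * exactly 2N evaluations of the loss.
         */
        public static double[] centralDifference(Loss loss, double[] theta, double eps) {
            double[] grad = new double[theta.length];
            for (int i = 0; i < theta.length; i++) {
                double original = theta[i];
                theta[i] = original + eps;
                double lossPlus = loss.at(theta);
                theta[i] = original - eps;
                double lossMinus = loss.at(theta);
                theta[i] = original;  // restore the parameter before moving on
                grad[i] = (lossPlus - lossMinus) / (2.0 * eps);
            }
            return grad;
        }
    }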

If you are using models whose gradient is tedious to derive and implement, I wholeheartedly recommend tools like autodiff or theano, which do this for you. It saves so much time.

If you don't want to switch, using a numerical gradient to check your implementation of the analytical gradient is a great help as well.
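Such a gradient check can be as simple as an element-wise relative-error comparison, placed next to the sketch above. The tolerance mentioned in the comment is a common rule of thumb, not something from this answer:

    /** Largest element-wise relative error between the analytical and numerical gradients. */
    public static double maxRelativeError(double[] analytic, double[] numeric) {
        double worst = 0.0;
        for (int i = 0; i < analytic.length; i++) {
            double scale = Math.max(Math.abs(analytic[i]) + Math.abs(numeric[i]), 1e-12);
            worst = Math.max(worst, Math.abs(analytic[i] - numeric[i]) / scale);
        }
        return worst;  // values around 1e-6 or smaller usually indicate a correct analytical gradient
    }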

answered Oct 06 '11 at 07:43

Justin Bayer

You are much better off just writing code by hand to compute the derivatives. The numerical finite difference derivatives should just be used to check your other code. Since you mentioned that you would like to try automatic differentiation, I can only assume that it is possible to write the code for the derivatives.

Numerical derivatives will be numerically unstable, and thus inaccurate, and they will almost certainly be much slower.
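To illustrate what hand-written derivative code looks like, here is the per-example gradient of a squared-error loss for a linear model, which is the quantity an SGD step consumes. This is a generic textbook example, not the asker's model:

    /**
     * Gradient of 0.5 * (w.x - y)^2 with respect to the weights w,
     * for a single training example.
     */
    public static double[] squaredErrorGradient(double[] w, double[] x, double y) {
        double prediction = 0.0;
        for (int i = 0; i < w.length; i++) {
            prediction += w[i] * x[i];
        }
        double residual = prediction - y;  // derivative of 0.5 * r^2 with respect to r is r
        double[] grad = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            grad[i] = residual * x[i];     // chain rule: r * d(w.x)/dw_i
        }
        return grad;
    }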

answered Oct 07 '11 at 16:20

gdahl ♦

edited Oct 09 '11 at 00:51

I second this. If possible, write out the gradient in code. If it's not possible, you probably have bigger problems.

(Oct 10 '11 at 16:32) Travis Wolfe

Another way to smooth out the "wild" gradients is to use mini-batches: simply build up a buffer of m examples, for some small m, compute the gradient over this small set, update your hypothesis, and repeat. This is usually a very simple change to one's code.
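A minimal sketch of that buffering scheme, reusing the hypothetical squaredErrorGradient from the previous answer as the per-example gradient:

    /**
     * One pass of mini-batch SGD: average the per-example gradients over a
     * batch of size m (batchSize), then take a single update step.
     */
    public static void miniBatchSgdEpoch(double[] w, double[][] xs, double[] ys,
                                         int batchSize, double learningRate) {
        for (int start = 0; start < xs.length; start += batchSize) {
            int end = Math.min(start + batchSize, xs.length);
            double[] batchGrad = new double[w.length];
            for (int j = start; j < end; j++) {
                double[] g = squaredErrorGradient(w, xs[j], ys[j]);  // per-example gradient
                for (int i = 0; i < w.length; i++) {
                    batchGrad[i] += g[i] / (end - start);            // running average over the batch
                }
            }
            for (int i = 0; i < w.length; i++) {
                w[i] -= learningRate * batchGrad[i];                 // one update per mini-batch
            }
        }
    }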

answered Oct 08 '11 at 09:28

downer
