I'd like to do stochastic gradient descent on f(A,B), where A is a matrix and B is a diagonal matrix. I'd like to maintain a constraint A^T B A=I during optimization. It doesn't matter if this constraint holds exactly, but I'd like it to stay very nearly satisfied at all times. Is there a good way to do this?

Also, is there a Google-able mathematical term for the property A^T B A = I?

asked Nov 15 '12 at 11:35

Ian Goodfellow

I don't know if this helps, but if you take B to be the identity matrix, then your equation is equivalent to saying A is orthogonal.

(Nov 15 '12 at 14:21) alto

Yeah, I know that. I don't know if there is a more general term than "orthogonal" that describes the case where B can have values other than 1 on the diagonal.

(Nov 16 '12 at 15:12) Ian Goodfellow

One Answer:

I tried but I couldn't figure out how to derive a projection operator for such a constraint.

However, you can do ADMM (I like these slides by Steve Boyd, also the full paper). Essentially, you replace your primal objective with the augmented Lagrangian f(A,B) + &lt;Lambda, A^T B A - I&gt; + (rho/2) ||A^T B A - I||_F^2, where Lambda is a matrix of multipliers and &lt;.,.&gt; is the trace inner product, and then do gradient updates on A and B followed by the dual update Lambda = Lambda + rho (A^T B A - I). The multipliers would be guaranteed to converge to the optimal dual variables if your problem were convex; yours isn't, but you should still expect something reasonable. Note that the standard version of ADMM uses exact minimization in the primal, while the variant I described (where you only take gradient steps) is the linearized version, also known as the split inexact Uzawa method (according to this paper).
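To make this concrete, here is a minimal numpy sketch of those updates, under some assumptions not in the answer above: the objective gradients are placeholders you would replace with your own, the step size eta and penalty rho are hand-picked, and B is stored as its diagonal. The multiplier Lambda stays symmetric because the constraint residual A^T B A - I is symmetric.

    import numpy as np

    # Placeholder gradients of f -- substitute your actual objective here.
    def grad_f_A(A, b):
        return np.zeros_like(A)      # hypothetical df/dA
    def grad_f_b(A, b):
        return np.zeros_like(b)      # hypothetical df/d(diag of B)

    n, k = 10, 5
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, k))
    b = np.ones(n)                   # diagonal entries of B
    Lam = np.zeros((k, k))           # matrix of multipliers
    I = np.eye(k)
    eta, rho = 1e-2, 1.0             # primal step size and penalty weight

    for step in range(1000):
        B = np.diag(b)
        R = A.T @ B @ A - I          # constraint residual (symmetric)
        M = Lam + rho * R            # appears in both primal gradients

        # gradients of <Lam, R> + (rho/2)||R||_F^2 added to the gradients of f
        gA = grad_f_A(A, b) + 2.0 * B @ A @ M
        gb = grad_f_b(A, b) + np.einsum('ij,jk,ik->i', A, M, A)

        # primal gradient steps, then dual ascent on the multipliers
        A -= eta * gA
        b -= eta * gb
        Lam += rho * (A.T @ np.diag(b) @ A - I)

As with any penalty-based scheme, a larger rho keeps A^T B A closer to I at every step, at the price of a more ill-conditioned primal problem.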

answered Nov 15 '12 at 18:30

Alexandre Passos ♦

Thanks, I'll read that paper.

(Nov 16 '12 at 15:12) Ian Goodfellow
