|
I'm trying to implement a paper I read and am having trouble setting it up. The goal is to learn a transformation matrix between two spaces containing the same set of labeled points. So given a set of vector pairs The transformation matrix W can be learned by optimizing using stochastic gradient descent. I'm having some trouble figuring out the gradient though. Any help or tips? |
Where 
