|
A good rule of thumb is to compute the partial derivative with respect to each entry A_ij.

If X and Y are matrices, then X^T A Y is matrix-valued, each partial derivative d(X^T A Y)/dA_ij is itself a matrix, and the full gradient is a 4-tensor. It can be worked out coordinate-wise, but I'll assume that is not what you want.

If x and y are vectors, x^T A y is a scalar and the gradient is a matrix. Note that x^T A is a row vector with (x^T A)_j = sum_i x_i A_ij, hence x^T A y = sum_j y_j sum_i x_i A_ij = sum_{i,j} x_i A_ij y_j. Each A_ij is multiplied by x_i y_j, so d(x^T A y)/dA_ij = x_i y_j, and the gradient is the outer product x y^T.

If what you want really is the matrix case, you can repeat the argument above for each column of X and each column of Y and assemble the answer slice by slice.
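A quick numerical sanity check of the vector case, sketched in NumPy (variable names are mine): compare the claimed gradient x y^T against central finite differences of f(A) = x^T A y.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((n, m))
x = rng.standard_normal(n)
y = rng.standard_normal(m)

def f(A):
    # scalar-valued f(A) = x^T A y
    return x @ A @ y

# Claimed analytic gradient: the outer product x y^T
grad = np.outer(x, y)

# Central finite differences of each partial derivative df/dA_ij
eps = 1e-6
num = np.zeros_like(A)
for i in range(n):
    for j in range(m):
        E = np.zeros_like(A)
        E[i, j] = eps
        num[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-5))  # True
```

Since f is linear in each A_ij, the finite difference here is exact up to floating-point rounding, so the agreement is essentially to machine precision.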