I am new to the forum and I would like to thank everyone for taking their time to read my question.

I am trying to optimize three related tasks, so I decided to let them share some parameters and train them in parallel. My model is a neural net and the training method is gradient descent. I now have a question about the loss functions.

Say task 1 has a loss function f1(w,x), where w are the shared parameters and x are task 1's private parameters; similarly, task 2 has loss function f2(w,y) and task 3 has loss function f3(w,z). I define the joint loss function as their sum, i.e., F(w,x,y,z) = f1(w,x) + f2(w,y) + f3(w,z).

In previous experiments I observed that the value of f3 is about 500 times that of f1 and f2. I wonder whether this imbalance will hurt the effectiveness of training, and if so, what I can do about it.
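For concreteness, here is a minimal PyTorch sketch of the setup I mean (the layer sizes and MSE losses are just placeholders, not my actual model):

    import torch
    import torch.nn as nn

    shared = nn.Linear(16, 32)   # shared parameters w
    head1 = nn.Linear(32, 1)     # task 1's private parameters x
    head2 = nn.Linear(32, 1)     # task 2's private parameters y
    head3 = nn.Linear(32, 1)     # task 3's private parameters z

    def joint_loss(batch, t1, t2, t3):
        h = torch.relu(shared(batch))
        f1 = nn.functional.mse_loss(head1(h), t1)
        f2 = nn.functional.mse_loss(head2(h), t2)
        f3 = nn.functional.mse_loss(head3(h), t3)
        # In my experiments f3 is roughly 500x larger than f1 and f2,
        # so its gradient dominates the update to the shared parameters.
        return f1 + f2 + f3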

asked Sep 19 '14 at 11:59


Rex Liu


One Answer:

You can try normalizing the loss functions so that they are on a similar scale, e.g. standardizing each to zero mean and unit variance.

That's the easiest thing to try.
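For example, here is a rough sketch (in Python/PyTorch) of one way to read that suggestion: divide each loss by a running estimate of its typical magnitude, so the three terms end up on a comparable scale. The running dict and momentum value are illustrative choices, not the only way to do it.

    running = {"f1": 1.0, "f2": 1.0, "f3": 1.0}
    momentum = 0.99

    def normalized_sum(f1, f2, f3):
        total = 0.0
        for name, loss in (("f1", f1), ("f2", f2), ("f3", f3)):
            # Track each loss's typical size; detach so the estimate
            # itself is not backpropagated through.
            running[name] = momentum * running[name] + (1 - momentum) * float(loss.detach())
            total = total + loss / running[name]
        return total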

Later on, you could get fancy and parameterize each loss with a weight:

l1 * f1(w,x) + l2 * f2(w,y) + l3 * f3(w,z)

You can then tune the weights using cross-validation.
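For instance, a minimal sketch of the weighted sum, assuming f1, f2, f3 have already been computed for the current batch; the weight values below are arbitrary starting points (e.g. shrinking f3 by its rough scale factor), not recommendations:

    # Hyperparameters to tune on held-out data.
    l1, l2, l3 = 1.0, 1.0, 1.0 / 500.0

    def weighted_joint_loss(f1, f2, f3):
        return l1 * f1 + l2 * f2 + l3 * f3

You would pick the weights by training with each candidate setting and comparing performance on a validation set.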

answered Sep 24 '14 at 14:51


Joseph Turian ♦♦
