
How are domain adaptation (DA) and multitask learning (MTL) different from each other? In my mind, DA assumes that the labeling function P(Y|X) stays the same and only the input distribution P(X) changes across domains. MTL, on the other hand, assumes that P(X) stays the same across all tasks but the labeling function P(Y|X) differs from task to task.

Is this the right way to think about the difference? Or are there other ways to explain the difference?
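For concreteness, the two assumptions can be sketched with made-up distributions (a hypothetical NumPy example; the Gaussians and labeling rules below are just for illustration, not from any paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def label(x):
    """Shared labeling function P(Y|X): y = 1 iff x > 0."""
    return (x > 0).astype(int)

# DA view: P(Y|X) fixed, P(X) shifts across domains.
x_source = rng.normal(loc=-1.0, size=1000)  # source inputs centered at -1
x_target = rng.normal(loc=+1.0, size=1000)  # target inputs centered at +1
y_source, y_target = label(x_source), label(x_target)

# MTL view: P(X) fixed, labeling function differs per task.
x_shared = rng.normal(size=1000)
y_task1 = (x_shared > 0).astype(int)           # task 1: sign of x
y_task2 = (np.abs(x_shared) > 1).astype(int)   # task 2: magnitude of x

# Same labeling rule, different marginals -> very different label frequencies.
print(y_source.mean(), y_target.mean())
```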

asked Jul 15 '10 at 14:37 by spinxl39

edited Aug 10 '10 at 01:36 by Joseph Turian ♦♦


Oops, I used to call both cases 'domain adaptation' ...

Sometimes they are combined: both P(X) and P(Y|X) change. For example, you learn a parser on an English treebank and then try to adapt it to parse Polish, which uses different tree labels (e.g., more fine-grained or slightly different syntactic categories). (You may have a small Polish adaptation set that uses these different labels.)

(Aug 13 '10 at 10:16) Frank

3 Answers:

answered Jul 15 '10 at 15:33 by Jurgen

That's a nice way of thinking about it, but many domain adaptation methods don't quite fit this picture: if P(Y|X) remained exactly the same, there would be no point in doing domain adaptation at all when you're learning a discriminative model. In practice, domain adaptation is multitask learning where the tasks are expected to be very similar and to share some of the conditional structure (not just the generative structure).
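One concrete method in this spirit is the feature augmentation of Daumé III's "Frustratingly Easy Domain Adaptation" (2007): each feature gets a shared copy plus a domain-specific copy, so a linear model shares weights across domains while allowing per-domain corrections. A minimal sketch (the helper name and toy inputs are mine):

```python
import numpy as np

def augment(X, domain, n_domains):
    """Daumé-style feature augmentation: every example keeps a shared
    copy of its features plus a copy in its own domain's slot; the
    other domains' slots are zero."""
    n, d = X.shape
    out = np.zeros((n, d * (1 + n_domains)))
    out[:, :d] = X                       # shared block
    start = d * (1 + domain)
    out[:, start:start + d] = X          # domain-specific block
    return out

X_src = np.array([[1.0, 2.0]])
X_tgt = np.array([[3.0, 4.0]])
print(augment(X_src, 0, 2))   # [[1. 2. 1. 2. 0. 0.]]
print(augment(X_tgt, 1, 2))   # [[3. 4. 0. 0. 3. 4.]]
```

Any standard linear learner run on the augmented features then trades off shared versus domain-specific weights automatically through regularization.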

answered Jul 15 '10 at 15:44 by Alexandre Passos ♦


About your question of why do DA if P(Y|X) stays the same: Shimodaira (2000), "Improving predictive inference under covariate shift by weighting the log-likelihood function", provides an answer.

The problem comes up when misspecified models are used. If you have chosen a model family P(Y|X, theta) and none of its members matches the true relation between X and Y, then you do have to take the change in P(X) into account: under misspecification, the optimal learned parameter depends on P(X).

(Jul 15 '10 at 16:26) spinxl39

This does make sense, yes.

(Jul 15 '10 at 16:30) Alexandre Passos ♦
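Shimodaira's importance-weighting fix can be sketched on a toy misspecified model: the true relation is quadratic, the model family is linear, and weighting source points by p_target(x)/p_source(x) recovers the parameters that are optimal under the target P(X). All distributions and constants below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# True relation y = x**2; we fit a misspecified linear model y ~ a*x + b.
def true_f(x):
    return x ** 2

x_src = rng.normal(loc=0.0, scale=1.0, size=5000)     # source P(X)
y_src = true_f(x_src) + rng.normal(scale=0.1, size=x_src.size)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Importance weights p_target(x) / p_source(x) for a shifted target P(X).
w = normal_pdf(x_src, 2.0, 1.0) / normal_pdf(x_src, 0.0, 1.0)

def fit_linear(x, y, weights=None):
    """(Weighted) least squares for slope and intercept."""
    X = np.column_stack([x, np.ones_like(x)])
    if weights is not None:
        sw = np.sqrt(weights)
        X, y = X * sw[:, None], y * sw
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # (slope, intercept)

plain = fit_linear(x_src, y_src)
weighted = fit_linear(x_src, y_src, w)

# Evaluate both fits under the target distribution.
x_tgt = rng.normal(loc=2.0, size=5000)
y_tgt = true_f(x_tgt)
def mse(coef):
    return np.mean((coef[0] * x_tgt + coef[1] - y_tgt) ** 2)
print(mse(plain), mse(weighted))  # the weighted fit does better on the target
```

Unweighted fitting gives a nearly flat line (the best linear fit to x² under a symmetric source P(X)), which is a poor fit around the shifted target; the weights pull the learned parameter toward the one that is optimal under the target P(X). With a well-specified model family, both fits would converge to the same parameters and the weighting would be unnecessary.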

Would it be right to say that in MTL the sets of labels in the source and target domains are different?

answered Aug 10 '10 at 01:12 by priya venkateshan

Not necessarily. A very common MTL setting is one where the examples from both domains have binary labels; it's just that the two domains label their examples differently.

I haven't seen work where the label sets themselves differ, but it seems possible: for example, one domain with real-valued "labels" (as in regression) and the other with discrete labels (as in standard classification).

(Aug 10 '10 at 10:19) spinxl39
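The mixed regression/classification setting described above can be sketched with hard parameter sharing: one shared representation of the common P(X), with a task-specific head per label type. Everything below (the random features, learning rate, iteration count) is a made-up toy, not a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# One shared input distribution P(X), two differently-typed label sets.
X = rng.normal(size=(500, 3))
y_reg = X @ np.array([1.0, -2.0, 0.5])           # task A: real-valued labels
y_clf = (X[:, 0] + X[:, 1] > 0).astype(int)      # task B: binary labels

# Shared (here fixed, random) representation; only the heads differ.
H = np.tanh(X @ rng.normal(size=(3, 8)))

# Task A head: least squares on the shared features.
w_reg, *_ = np.linalg.lstsq(H, y_reg, rcond=None)

# Task B head: logistic regression by gradient descent on the same features.
w_clf = np.zeros(8)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(H @ w_clf)))
    w_clf -= 0.5 * H.T @ (p - y_clf) / len(X)

acc = ((H @ w_clf > 0).astype(int) == y_clf).mean()
r2 = 1 - np.mean((H @ w_reg - y_reg) ** 2) / np.var(y_reg)
print(acc, r2)  # both heads do reasonably well off one shared representation
```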

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.