If I have a conditional probability distribution p(y|x), is that necessarily called a discriminative model? What is a discriminative model anyway? Is it a model of p(y|x), as opposed to p(x,y)?

Can I train a discriminative model in a generative way? Can I train a generative model p(x,y) in a discriminative way? For the latter, would training p(x,y) with a perceptron be an example?

asked Jun 30 '10 at 13:20

Frank

edited Jul 03 '10 at 15:12

Joseph Turian ♦♦

MEMM and CRF are two examples of probabilistic discriminative models. HMM is a generative model.

(Jun 30 '10 at 15:52) Aman

5 Answers:
In a probabilistic setting, training a generative model typically means that you estimate a joint probability: P(x, y), whereas a discriminative model means that you estimate a conditional probability: P(y | x).
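To make the distinction concrete, here is a minimal sketch of how a joint P(x, y) determines the conditional P(y | x) by normalizing over y. The table values and the names `joint`, `marginal_x`, and `conditional` are made up for illustration. The reverse direction does not work: P(y | x) alone says nothing about P(x).

```python
# Hypothetical joint distribution P(x, y) over a discrete input x and a label y.
joint = {
    ("sunny", "play"): 0.40,
    ("sunny", "stay"): 0.10,
    ("rainy", "play"): 0.15,
    ("rainy", "stay"): 0.35,
}

def marginal_x(joint, x):
    """P(x) = sum over y of P(x, y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional(joint, x):
    """P(y | x) = P(x, y) / P(x), obtained from the joint by normalizing."""
    px = marginal_x(joint, x)
    return {y: p / px for (xi, y), p in joint.items() if xi == x}

cond_sunny = conditional(joint, "sunny")
cond_rainy = conditional(joint, "rainy")
print(cond_sunny)
print(cond_rainy)
```

A generative model has to get the whole table right; a discriminative model only has to get each normalized row right.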

Used more loosely, a generative model is one in which the error term is computed with respect to some unsupervised criterion over the input, e.g. reconstruction error. A discriminative model is one tuned for a supervised task of interest.

Ng and Jordan (2001), "On Discriminative vs. Generative Classifiers", among other works, show that generative models approach their asymptotic error with fewer training examples, whereas discriminative models reach a lower asymptotic error. Hence, one can imagine a training regime in which one first quickly fits a generative model, then fine-tunes the generative parameters under a discriminative criterion.
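A rough sketch of such a regime, under my own assumptions rather than anything taken from the paper: fit a Bernoulli naive Bayes model by counting (the generative step), rewrite its posterior as a linear log-odds function, then run gradient ascent on the conditional log-likelihood (the discriminative step). Strictly, this fine-tunes the induced weights `w` and `b` rather than the naive Bayes parameters themselves. The toy data, function names, learning rate, and step count are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_data(n, seed=0):
    """Toy data: feature 1 duplicates feature 0, violating the naive Bayes assumption."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x0 = y if rng.random() < 0.8 else 1 - y  # noisy copy of the label
        x2 = rng.randint(0, 1)                   # pure noise feature
        data.append(([x0, x0, x2], y))
    return data

def fit_naive_bayes(data, d):
    """Generative step: closed-form counts with add-one smoothing."""
    n1 = sum(y for _, y in data)
    n0 = len(data) - n1
    prior1 = (n1 + 1) / (len(data) + 2)
    theta = [[0.0] * d, [0.0] * d]  # theta[c][j] = P(x_j = 1 | y = c)
    for c, nc in ((0, n0), (1, n1)):
        for j in range(d):
            ones = sum(x[j] for x, y in data if y == c)
            theta[c][j] = (ones + 1) / (nc + 2)
    return prior1, theta

def to_linear(prior1, theta):
    """Rewrite the naive Bayes posterior P(y=1 | x) as sigmoid(w . x + b)."""
    pairs = list(zip(theta[0], theta[1]))
    w = [math.log(t1 / (1 - t1)) - math.log(t0 / (1 - t0)) for t0, t1 in pairs]
    b = math.log(prior1 / (1 - prior1))
    b += sum(math.log((1 - t1) / (1 - t0)) for t0, t1 in pairs)
    return w, b

def cond_log_lik(data, w, b):
    """Average log P(y | x) under the linear log-odds model."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        total += math.log(p if y == 1 else 1 - p)
    return total / len(data)

def fine_tune(data, w, b, lr=0.1, steps=200):
    """Discriminative step: full-batch gradient ascent on the conditional log-likelihood."""
    w = list(w)
    for _ in range(steps):
        gw = [0.0] * len(w)
        gb = 0.0
        for x, y in data:
            err = y - sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wj + lr * g / len(data) for wj, g in zip(w, gw)]
        b += lr * gb / len(data)
    return w, b

data = make_data(500)
w0, b0 = to_linear(*fit_naive_bayes(data, d=3))
before = cond_log_lik(data, w0, b0)
w1, b1 = fine_tune(data, w0, b0)
after = cond_log_lik(data, w1, b1)
print(before, after)
```

Because the objective is concave and the step size is small, the conditional log-likelihood cannot get worse during fine-tuning; how much it improves depends on how badly the naive Bayes independence assumption is violated.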

In fact, that is the training strategy of most deep architectures. In a deep setting, however, the generative-then-discriminative approach is used not for speed but to avoid poor local minima in the optimization. In particular, modeling the input carries a stronger error signal than modeling the output (because the input typically has much higher dimension than the output), which makes the optimizer less likely to fall into bad local minima. After the optimizer has found a good local minimum under the generative criterion, it fine-tunes against the discriminative criterion to improve generalization.

answered Jun 30 '10 at 15:26

Joseph Turian ♦♦

I'd like to point to another paper that I think is helpful in this discussion: "Generative or Discriminative? Getting the Best of Both Worlds" by C. Bishop and J. Lasserre, http://research.microsoft.com/en-us/um/people/cmbishop/downloads/Bishop-Valencia-07.pdf. Additionally, Chapter 4 of Bishop's PRML provides a nice introduction to concepts like discriminant function, generative model, discriminative model, etc.

(Jun 30 '10 at 17:39) osdf

Generative models can be trained using supervised information. I find your second paragraph a bit misleading.

(Apr 30 '12 at 13:22) gdahl ♦

A probabilistic model places a distribution over a set of random variables. It is generative if some of the explicitly parametrized random variables are input variables. Usually, the input variables are conditioned on the label variables in such models, i.e. the model specifies P(x|y).

Strictly speaking, being discriminative is a criterion for selecting model parameters. For the specific case of conditional likelihood models, it means selecting parameters to maximize P(y|x); this criterion is indifferent to the setting of any parameters associated with P(x), since you condition on x. However, "discriminative" is broader than that: non-probabilistic approaches, such as SVMs and related margin techniques, have discriminative objectives without any explicit probabilistic elements.
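To spell out why the conditional criterion ignores P(x), here is a short derivation. It assumes the joint factorizes with separate parameter sets, \theta for p(y|x) and \phi for p(x); that notation is mine and not part of the answer above.

```latex
\begin{align*}
\log p(x, y; \theta, \phi) &= \log p(y \mid x; \theta) + \log p(x; \phi) \\
\mathcal{L}_{\mathrm{cond}}(\theta) &= \sum_i \log p(y_i \mid x_i; \theta) \\
\frac{\partial \mathcal{L}_{\mathrm{cond}}}{\partial \phi} &= 0
\end{align*}
```

Under this assumption, maximizing the joint likelihood and maximizing the conditional likelihood give the same \theta, and the two criteria differ only when parameters are shared between p(y|x) and p(x), as they are when the model is written as p(x|y)p(y).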

answered Jun 30 '10 at 15:17

aria42

Tom Minka has a short writeup that discusses discriminative models vs. training.

answered Jul 01 '10 at 00:08

Dumitru Erhan

As stated by others, discriminative models focus on estimating P(Y|X) directly, without first estimating P(X, Y). For instance, logistic regression is the discriminative counterpart of the naive Bayes classifier, and the CRF is the discriminative counterpart of the HMM.

Furthermore, most discriminative models don't even try to model P(Y|X); they learn only the decision function f(x) = argmax_y P(Y=y|X=x).
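The perceptron is one example of a model that learns only a decision function. Here is a minimal sketch for the binary case, with labels in {-1, +1}; the toy data and function names are made up for illustration. Note that nothing in it is a probability: the model outputs a label and nothing else.

```python
def perceptron_train(data, epochs=20):
    """Train a perceptron. data is a list of (features, label), label in {-1, +1}."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wj + y * xj for wj, xj in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:       # a full pass with no errors: stop early
            break
    return w, b

def predict(w, b, x):
    """Decision function f(x): the sign of the linear score."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score > 0 else -1

# Hypothetical linearly separable toy data.
data = [([2.0, 1.0], 1), ([1.0, 3.0], 1), ([-1.0, -1.5], -1), ([-2.0, 0.5], -1)]
w, b = perceptron_train(data)
predictions = [predict(w, b, x) for x, _ in data]
print(w, b, predictions)
```

The update rule only reacts to classification mistakes, so the learned weights encode where the boundary is, not how confident the model should be on either side of it.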

answered Jun 30 '10 at 18:35

ogrisel

edited Jul 03 '10 at 16:17

As far as I'm concerned, a discriminative model is data-driven. It needs only a small amount of data with corresponding labels, and it needs no priors on P(x) or P(y|x), which implies that the entropy is maximized. A generative model, by contrast, is model-driven and needs those extra priors; if no prior is available, we have to approximate it from a vast amount of data. So the former should be faster and more convenient, while the latter is more proper, to some degree.

answered Apr 30 '12 at 10:14

Zhen He


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.