I know that generative means "based on P(x,y)" and discriminative means "based on P(y|x)," but I'm confused on several points:
Related question that got me thinking about this.
I think of generative algorithms more as algorithms that provide a model of how the data is actually generated (I think of them as giving you a model of both P(X|Y) and P(Y), rather than of P(X, Y), though I guess it's equivalent), and discriminative algorithms as algorithms that do a slightly more black-box classification (which doesn't necessarily have to be probabilistic). Compare, for instance, Gaussian mixture models and k-means clustering. In the former, we have a nice probabilistic model for how points are generated (pick a component with some probability, then emit a point by sampling from that component's Gaussian distribution), but there's nothing we can really say about the latter. [Note that generative algorithms have discriminative properties, since you can get P(Y|X) once you have P(X|Y) and P(Y) (by Bayes' Theorem), though discriminative algorithms don't really have generative properties.]

1: Discriminative algorithms are any algorithms that allow you to classify points without providing a model of how the points are actually generated. So these could be either algorithms that try to learn P(Y|X) directly (e.g., logistic regression) or algorithms that try to learn the mapping from points to classes directly (e.g., perceptrons and SVMs -- these pretty much just give you a separating hyperplane, but no model of generating new points). So yep, discriminative classifiers are any classifiers that aren't generative.

Another way of thinking about this is that generative algorithms make some kind of structural assumptions about your model, while discriminative algorithms make fewer assumptions. For example, Naive Bayes (generative) assumes conditional independence of your features, while logistic regression (its discriminative "counterpart") does not; similarly, latent Dirichlet allocation (generative) assumes your topic mixtures are sampled from a Dirichlet distribution and so on, while latent semantic analysis (I'm not sure I'd quite call it a "discriminative algorithm", since it's not really a classifier, but it's more discriminative than generative) is pretty much just a linear-algebra decomposition, with no generative story behind it.

2: Yep, Naive Bayes is generative because it captures P(X|Y) and P(Y). For example, if we know that P(Y = English) is 0.7 and P(Y = French) is 0.3, along with English and French word probabilities (e.g., P(hello|English) = 0.09, P(bonjour|French) = 0.08), then we can generate a new document by first choosing the language of the document (English with probability 0.7, French with probability 0.3), and then generating words according to the chosen language's word probabilities (see the sketch below). Yeah, I guess you could make logistic regression generative in that fashion, but only because you're adding something to logistic regression that's not already there. That is, when you're performing a Naive Bayes classification, you're directly computing P(Y|X) ∝ P(X|Y) P(Y) [the terms on the right, P(X|Y) and P(Y), are exactly what allow you to generate a new document]; but when you're computing P(Y|X) in logistic regression, you're not computing these two things at all, you're just applying a logistic function to a dot product.
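To make that generative story concrete, here's a minimal Python sketch of it. The 0.7/0.3 prior and the hello/bonjour probabilities are the ones above; the rest of each vocabulary is made up just so the distributions sum to 1:

```python
import random

# A minimal sketch of the generative story behind Naive Bayes.
# The 0.7/0.3 prior and the 0.09/0.08 word probabilities come from the
# text above; the remaining entries are illustrative filler.
prior = {"English": 0.7, "French": 0.3}
word_probs = {
    "English": {"hello": 0.09, "the": 0.50, "cat": 0.41},
    "French":  {"bonjour": 0.08, "le": 0.52, "chat": 0.40},
}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dict."""
    r = random.random()
    cumulative = 0.0
    for outcome, p in dist.items():
        cumulative += p
        if r < cumulative:
            return outcome
    return outcome  # guard against floating-point rounding

def generate_document(n_words=5):
    # Step 1: pick the language from the class prior P(Y).
    language = sample(prior)
    # Step 2: emit words i.i.d. from P(X|Y) for that language.
    words = [sample(word_probs[language]) for _ in range(n_words)]
    return language, words

print(generate_document())  # e.g. ('English', ['the', 'cat', 'the', 'hello', 'the'])
```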
3: Elaborating a bit on what Alexandre Passos said, generative models often outperform discriminative models on smaller datasets because their generative assumptions place some structure on your model that prevents overfitting. For example, consider Naive Bayes vs. logistic regression. The Naive Bayes assumption is of course rarely satisfied, so logistic regression will tend to outperform Naive Bayes as your dataset grows (since it can capture dependencies that Naive Bayes can't). But when you only have a small dataset, logistic regression might pick up on spurious patterns that don't really exist, so the Naive Bayes assumption acts as a kind of regularizer on your model that prevents overfitting. There's a paper by Andrew Ng and Michael Jordan on discriminative vs. generative classifiers that talks about this more: http://robotics.stanford.edu/~ang/papers/nips01-discriminativegenerative.pdf (a rough sketch of this kind of comparison is below).

4: I think what it means is that generative models can actually learn the underlying structure of the data if you specify your model correctly and the model actually holds, but discriminative models can outperform when your generative assumptions are not satisfied (since discriminative algorithms are less tied to a particular structure, and the real world is messy, so assumptions are rarely perfectly satisfied anyway). [I'd probably just ignore these quotes if they're confusing.]
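If you want to watch this trade-off directly, here's a rough sketch (not Ng & Jordan's exact experiment) using scikit-learn's BernoulliNB and LogisticRegression. The synthetic data generator is entirely made up for illustration: its features are deliberately correlated within each class, so the Naive Bayes independence assumption is violated.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n):
    # Binary labels, plus 10 binary features that are noisy copies of one
    # class-dependent bit -- so the features are correlated given the class,
    # violating the Naive Bayes assumption.
    y = rng.integers(0, 2, size=n)
    base = rng.random((n, 1)) < (0.3 + 0.4 * y[:, None])
    noise = rng.random((n, 10)) < 0.2
    X = np.logical_xor(base, noise).astype(int)
    return X, y

X_test, y_test = make_data(5000)
for n_train in [20, 100, 1000, 10000]:
    X_tr, y_tr = make_data(n_train)
    nb = BernoulliNB().fit(X_tr, y_tr)
    lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"n={n_train:>5}  NB acc={nb.score(X_test, y_test):.3f}  "
          f"LR acc={lr.score(X_test, y_test):.3f}")
```

The broad pattern to look for is Naive Bayes holding up better at the small training sizes and logistic regression catching up (or passing it) as n grows, though the exact numbers depend on the made-up data distribution.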
For (3), is this because generative models can depend on the prior P(x) when little data is available to get a good estimate of P(y|x)?
(Jul 01 '11 at 15:29)
crdrn
There are also two types of discriminative approaches: discriminative models and discriminative functions. The first gives you p(y|x), while the latter only gives you $f: X \rightarrow Y$ -- it maps the input directly to a class (a minimal sketch of the difference is below). edit: For the downvoters: this definition is not mine, it's Chris Bishop's.
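As a concrete illustration of that distinction (using scikit-learn stand-ins, which is my choice, not Bishop's): logistic regression behaves like a discriminative model, exposing p(y|x), while a linear SVM behaves like a discriminative function, exposing only a label.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

# Discriminative model: gives you a full distribution p(y|x).
model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba(X[:1]))   # e.g. [[0.03 0.97]]

# Discriminative function: f maps x straight to a class.
func = LinearSVC().fit(X, y)
print(func.predict(X[:1]))          # e.g. [1] -- just a label
# LinearSVC has no predict_proba; its decision_function returns a margin,
# which is not a calibrated probability.
```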