|
Machine learning algorithms aim to learn a mapping from observations X to outputs Y. In a classical maxent application we have a set of features and some labeled data with a finite, predefined set of labels. Instead of specifying a closed, finite label set, is it possible to learn the optimal number of classes from the data itself? I suspect the number of classes cannot be left totally unconstrained, since the highest-entropy solution would then put each data point in its own class, which is not what we want. Is there any work on unsupervised (or semi-supervised) maximum entropy models without a predefined label set? Does this even make sense, or is it just rambling from insufficient knowledge? :-) |
|
Essentially, what you're asking for is a log-linear clustering algorithm. I don't know of any paper describing one, but in theory you can learn a Markov random field with one hidden label variable (taking K values) from your data, which is a hard problem, and use held-out likelihood on a subset of the data to choose the best value of K. The optimization problem is non-convex, due to the non-identifiability of the labels, but should be easy enough to optimize, at least to a local optimum, with an EM variant. I don't think this is a good idea, however, as log-linear models are known to overfit rather easily. Maybe you're interested in Dirichlet processes for supervised clustering? If so, see Daumé and Marcu, "A Bayesian model for supervised clustering with the Dirichlet process prior", for an introduction to the problem and some approaches, and also search for Dirichlet processes on this site. |
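To make the "hidden label plus held-out likelihood" idea concrete, here is a minimal sketch. It is not a general Markov random field: as a simplifying assumption it uses a mixture of independent Bernoulli features (a naive-Bayes-style model with one hidden label), fitted by plain EM. The function names (`fit_em`, `choose_k`, `make_data`), the additive smoothing constant, and the synthetic data are all made up for illustration.

```python
import math
import random


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_joint(x, weights, probs):
    """Return log p(x, z=j) for every hidden label j."""
    out = []
    for w, p in zip(weights, probs):
        ll = math.log(w)
        for xi, pi in zip(x, p):
            ll += math.log(pi if xi else 1.0 - pi)
        out.append(ll)
    return out


def fit_em(data, k, iters=50, seed=0, smooth=1.0):
    """Fit a k-component Bernoulli mixture with EM (reaches a local optimum only)."""
    rng = random.Random(seed)
    d = len(data[0])
    weights = [1.0 / k] * k
    # Random initialization breaks the symmetry between the hidden labels.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    for _ in range(iters):
        # E-step: posterior over the hidden label for each data point.
        resp = []
        for x in data:
            lj = log_joint(x, weights, probs)
            norm = log_sum_exp(lj)
            resp.append([math.exp(v - norm) for v in lj])
        # M-step: smoothed re-estimation (smoothing is an assumption made here
        # to keep probabilities away from 0 and 1).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = (nj + smooth) / (len(data) + k * smooth)
            probs[j] = [
                (sum(r[j] * x[i] for r, x in zip(resp, data)) + smooth)
                / (nj + 2.0 * smooth)
                for i in range(d)
            ]
    return weights, probs


def avg_log_likelihood(data, weights, probs):
    return sum(log_sum_exp(log_joint(x, weights, probs)) for x in data) / len(data)


def choose_k(train, heldout, candidates):
    """Pick the number of hidden labels with the best held-out log-likelihood."""
    scores = {}
    for k in candidates:
        weights, probs = fit_em(train, k)
        scores[k] = avg_log_likelihood(heldout, weights, probs)
    best = max(scores, key=scores.get)
    return best, scores


def make_data(n, rng, noise=0.1):
    """Synthetic binary vectors drawn from two hypothetical prototype patterns."""
    protos = [[1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]]
    data = []
    for _ in range(n):
        proto = rng.choice(protos)
        data.append([b if rng.random() > noise else 1 - b for b in proto])
    return data


if __name__ == "__main__":
    rng = random.Random(1)
    best, scores = choose_k(make_data(200, rng), make_data(100, rng), [1, 2, 3, 4])
    print("held-out scores:", scores)
    print("chosen K:", best)
```

Scoring on held-out data rather than training data is what stops K from growing until every point has its own label. It does not remove the overfitting concern, so treat the chosen K as a rough guide.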
Do you want to learn something that maps (x,y) pairs to probabilities, or something that maps x's to probabilities? In the first case, labels are given. In the second case you don't need labels.