What do people view as being the most compelling practical successes of semi-supervised learning? Of the papers in

 Chapelle, O., Scholkopf, B., and Zien, A. Semi-Supervised Learning. MIT Press, 2006.

the protein ones (Weston/Leslie/Ie/Noble and Shin/Tsuda) look most compelling to me, but I have no idea whether bioinformatics practitioners actually find unlabeled data useful. There are lots of proposed applications in language processing, but it's unclear to me how much the unlabeled data is really contributing, and whether one might have done better with an active learning approach. Thoughts?

Update (11-Sep-10): There are actually three versions of this question:

  1. Semi-supervised learning is used during active selection of labels, but ignored afterwards?

  2. Semi-supervised learning is used both during active selection of labels, and afterwards in fitting final model/making final predictions?

  3. Semi-supervised learning is used in combination with randomly selected labels?

In all cases I'm interested in whether there's industrial applications where semi-supervised learning is actually used and provides real benefits. (For active learning this is unambiguously true.)

asked Sep 10 '10 at 16:52

Dave Lewis

edited Sep 11 '10 at 08:04


6 Answers:

In my (admittedly small) experience with generative models (mainly in NLP), using unlabeled data usually helps (unless there is far more unlabeled than labeled data), but I haven't done anything worth calling a success.

In many NLP settings you can get by with very little (or no) labeled data and still get good results. Some examples are transfer learning of dependency parsing (Ganchev et al., Dependency Grammar Induction via Bitext Projection Constraints), where you can use a parser for one language to induce one in another using only aligned text (and finding aligned text is cheaper than actually building a treebank), and prototype learning for sequence models (Haghighi and Klein, Prototype-Driven Learning for Sequence Models), where you can give just a few prototypes (which do not constitute labeled data) and get surprisingly good POS tagging and information extraction. Both of these works use labeled data only to evaluate the algorithms, which are close to unsupervised in their inner workings. There are lots of examples of unlabeled data helping some results slightly, but I think you're asking about radical changes.

answered Sep 10 '10 at 17:48

Alexandre Passos ♦


While the prototypes paper is extremely cool, I'd take their results with a grain of salt: the selected prototype lists for each POS tag are very specific, and were chosen in a way that IS dependent on the annotated data. In my experience, choosing other prototypes usually results in significantly worse performance.

Having said that, semi-supervised POS tagging IS working quite well when you take the supervision to be a word-tag lexicon for at least some of the words in the corpus.

You can get even better results if you estimate some of your parameters based on annotated stuff, and then run EM/Gibbs on lots of unannotated data on top of that.
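That recipe (initialize from labeled data, then refine with EM on unlabeled data) is easy to sketch. Below is a toy 1-D two-class Gaussian version, not a real tagger; the data, class structure, and iteration count are all assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# A handful of labeled points per class, plus a large unlabeled sample.
x_lab = np.array([-2.1, -1.9, -2.0, 1.8, 2.2, 2.0])
y_lab = np.array([0, 0, 0, 1, 1, 1])
x_unl = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])

# Initialize parameters from the annotated data.
mu = np.array([x_lab[y_lab == k].mean() for k in (0, 1)])
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: class responsibilities for the unlabeled points.
    dens = pi * np.exp(-(x_unl[:, None] - mu) ** 2 / (2 * sigma**2)) / sigma
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: labeled points participate with responsibility 1 for their class.
    R = np.vstack([r, np.eye(2)[y_lab]])
    X = np.concatenate([x_unl, x_lab])
    nk = R.sum(axis=0)
    mu = (R * X[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((R * (X[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / nk.sum()
```

The labeled points pin down which mixture component corresponds to which class, which is exactly what plain unsupervised EM cannot do.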

(Sep 11 '10 at 17:10) yoavg

Co-training is a semi-supervised learning algorithm that is simple yet has had a great deal of success, as evidenced by the fact that the influential Blum & Mitchell paper, Combining Labeled and Unlabeled Data with Co-Training (COLT 1998), won the ICML 2008 ten-year best paper award. There have been a lot of variants of co-training that people use.

answered Sep 10 '10 at 20:02

spinxl39

BTW, you mentioned active learning. Note that there are various ways to combine active learning with semi-supervised learning. See the paper "Combining Active Learning and Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions", which discusses one such approach.
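For reference, the harmonic-function solution at the heart of that paper is a few lines of linear algebra. A dense toy sketch (the chain graph and label values here are made up for illustration):

```python
import numpy as np

def harmonic(W, f_l):
    """Harmonic-function label propagation (Zhu, Ghahramani & Lafferty, 2003).

    W   : symmetric nonnegative affinity matrix, labeled points listed first.
    f_l : label values (e.g. 0/1) for the labeled points.
    """
    l = len(f_l)
    L = np.diag(W.sum(axis=1)) - W        # graph Laplacian
    # Unlabeled scores solve L_uu f_u = W_ul f_l, i.e. each unlabeled point's
    # score is the weighted average of its neighbors' scores.
    return np.linalg.solve(L[l:, l:], W[l:, :l] @ f_l)

# Chain graph a-c-d-b with a labeled 1 and b labeled 0 (labeled nodes first,
# so the order is [a, b, c, d]); the interior nodes interpolate linearly.
W = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
f_u = harmonic(W, np.array([1.0, 0.0]))   # scores for c and d: 2/3 and 1/3
```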

answered Sep 10 '10 at 20:09

spinxl39

Zhu, Lafferty, Ghahramani is a wonderful research paper, and I would be fascinated to know if there's any industrial uptake of the algorithm. However, as far as I can tell, ZLG don't compare their semi-supervised variants with straight supervised learning and active learning. In addition, all of their tasks have artificial structure unnaturally favorable for the use of unlabeled data.

(Sep 11 '10 at 08:18) Dave Lewis

Or, what if you ignored the labels, and just used an unsupervised approach?

I've been wondering about a similar question regarding transduction. Transduction uses unlabelled data at training time by computing predictions only on those points, instead of inferring a general learning rule. Some formulations use kernels and graphs and are closely related to spectral clustering. My advisor told me that in practice the presence of labels doesn't actually help; you could do just as well by ignoring them and simply spectral-clustering the data. My question is: has transduction been used for any practical problem?

answered Sep 10 '10 at 18:27

Vicente Malave

Cluster-then-label is actually a popular approach to semi-supervised learning (which is sometimes referred to as unsupervised, as in POS tagging, but labeling a few hundred clusters with a few dozen tags and calling it unsupervised feels odd, in my opinion). Also, lots of domain adaptation/transfer learning algorithms that work in the source-labeled, target-unlabeled scenario work pretty much as you described.
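The cluster-then-label idea in one screenful, on toy blob data (the dataset, cluster count, and two-labels-per-class setup are assumptions of the sketch):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Cluster everything while ignoring labels, then name each cluster by
# majority vote over the few labeled points that landed in it.
X, y = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)
labeled_idx = np.concatenate([np.flatnonzero(y == k)[:2] for k in range(3)])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
pred = np.empty_like(y)
for c in range(3):
    members = labeled_idx[km.labels_[labeled_idx] == c]
    # A cluster with no labeled member gets a dummy label; with separable
    # blobs each cluster catches some of the six labeled points.
    pred[km.labels_ == c] = np.bincount(y[members]).argmax() if members.size else -1
acc = (pred == y).mean()
```

Six labels plus 294 unlabeled points recover nearly all the labels here, which is the whole appeal of the approach when clusters line up with classes.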

(Sep 10 '10 at 18:38) Alexandre Passos ♦

I think transductive approaches are quite well studied in semi-supervised learning (e.g., transductive SVMs) and usually do reasonably well.

(Sep 10 '10 at 19:58) spinxl39

I think you're maybe confusing semi-supervised learning and semi-unsupervised learning (terminology by Daumé in this paper: http://www.cs.utah.edu/~hal/docs/daume09sslnlp.pdf). There are algorithms that work almost as well with no labeled data and algorithms that work almost as well with no unlabeled data, and sometimes it can be helpful to keep this distinction in mind.

The workshop where this was an opinion paper also has an interesting bibliography for the original question: http://aclweb.org/aclwiki/index.php?title=Semi-supervised_Learning_in_NLP

(Sep 12 '10 at 10:10) Alexandre Passos ♦

I've been working on semi-supervised learning for identity inference, using an approach similar to what's outlined in this paper. Identity inference/face recognition is a huge area. An interesting consequence of this research is that far simpler feature functions can be used than (to my understanding) what has been used in the vision community to track similarity.

answered Dec 30 '10 at 07:37

Avneesh Saluja

Closely related to some of the methods you mentioned (transductive label propagation, with some corrections for small amounts of labeled positive examples) is the algorithm that powers the GeneMANIA gene function prediction server. It basically acts as a recommendation engine for genes, powered by graphs derived from high-throughput biological assays, with a browser interface on top of those graphs (full disclosure: I was heavily involved in the development). Because the graphs are aggressively sparsified, you can run the inference procedure on a customized composite graph at the speeds necessary for a live response over the web, while still outperforming algorithms that require far more computational overhead on a recent benchmark of gene function prediction in mouse. That's pretty practical, don't you think?

See Mostafavi et al, 2008 for algorithmic details and, to a lesser extent, Warde-Farley et al, 2010 for the more application-related details.
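The generic diffusion at the core of this family of methods is simple to sketch with sparse matrices. This is not the exact GeneMANIA objective (see the papers above for that); the toy graph, damping factor, and ±1 label coding are assumptions of the sketch:

```python
import numpy as np
import scipy.sparse as sp

def propagate(W, y0, alpha=0.9, iters=100):
    """Iterative label propagation on a sparse affinity graph.

    W  : sparse symmetric nonnegative affinity matrix.
    y0 : +1/-1 for labeled nodes, 0 for unlabeled nodes.
    """
    d = np.asarray(W.sum(axis=1)).ravel()
    S = sp.diags(1.0 / np.maximum(d, 1e-12)) @ W   # row-normalized transitions
    f = y0.astype(float)
    for _ in range(iters):
        # Diffuse scores along edges while pulling back toward the labels.
        f = alpha * (S @ f) + (1 - alpha) * y0
    return f

# Tiny example: two triangles joined by one edge; one labeled node per side.
rows = [0, 1, 1, 2, 0, 2, 3, 4, 4, 5, 3, 5, 2, 3]
cols = [1, 0, 2, 1, 2, 0, 4, 3, 5, 4, 5, 3, 3, 2]
W = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(6, 6))
y0 = np.zeros(6)
y0[0] = 1.0
y0[5] = -1.0
f = propagate(W, y0)   # nodes 0-2 score positive, nodes 3-5 negative
```

Because each iteration is one sparse matrix-vector product, cost scales with the number of edges, which is why aggressive sparsification buys web-speed responses.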

answered Sep 29 '10 at 15:21

David Warde Farley ♦
