3
2

Almost all learning theory I've seen is based on i.i.d training and test sets. This, unfortunately, is often not the case in real life.

Suppose instead we have our training and test sets generated by a cunning adversary. This makes things a lot harder of course, but it doesn't sink us completely as long as we impose a few restrictions.

0) The training set must be of some minimal fraction of all data. In some cases, it may have an appropriately representative class distribution (though in the case of transfer learning, the opposite is the case)

1) The adversary does not know our algorithm - this is natural to assume, since in applications like transfer learning, where this could be applied, we know our data before we design our algorithm.

2) We can see unlabeled instances - optional, but allows us to use semi-supervised and transductive learning methods, powerful tools in this case.

3) We know approximate class distributions - definitely not always true, but useful if so.

Anyways, does anyone know any papers in learning theory or providing theoretic bounds given conditions similar to these? Also, the bound of PAC learning theory are often very conservative (despite the i.i.d assumption), but provide useful guidance for practice. Would the bounds on an adversarial distribution provide good guidance?

asked Aug 10 '10 at 10:50

Jacob%20Jensen's gravatar image

Jacob Jensen
1644285360

edited Aug 10 '10 at 14:46

I like this question: kind of like the cryptoanalysis concept of 'trusted receiver'. Can patterns be learnt when an agent is actively harming the learning?

(Aug 11 '10 at 02:14) Robert Layton

You may need more restrictions on how the training set is chosen. For instance, suppose we want to learn the bias of a coin. The coin is fair, what's to stop the cunning adversary from giving us "All Heads" for training data?

(Aug 11 '10 at 03:09) Yaroslav Bulatov

@Yaroslav Bulatov: You make a good point. I guess the restrictions need to be somewhat problem-specific, unless someone could come up with a good general (probably information-theoreticish) formulation. For a biased coin problem, the adversary could maybe block only certain arithmetic sequences.

Still, you'd also be in decent shape if you were able to see that a coin flip took place, but not what its result was. If you have only heads, but also a suspiciously spaced set of hidden results, you can still put error bounds on your bias estimate that will include the true answer.

The sort of problem that I was implicitly thinking of was more along the lines of image recognition, or similar problems where there's a pretty huge amount of data even in a single uncorrupted instance. For instance, say you're trying to learn digits: if your adversary chooses a third of each class to be in your training set then he can do serious damage to your accuracy (compared to a randomly partitioned set), but he can't completely derail you.

In fact, in certain applications even random partitions into training and test sets will have highly variable performance (I had a ~1000 size set of faces for a transfer learning application, and target task accuracy would vary between 75% and 90% depending on the partition - part of this is probably poor convergence of my method, but the set choices make a big difference too). Establishing a bound in the adversarial case gives you a bound on the worst random case too, which is useful in this sort of circumstance.

In many other real-life applications, the test/training same distribution assumption does not hold. Say you're trying to profile criminals. The set of criminals who have been caught (and therefore can be profiled for your training set) is intrinsically different from your test set, the set of criminals who haven't been caught. In finance, the division between past and future data (your implicit training and test sets) is created by perhaps the strongest adversary possible (so strong, in fact, that assumption (1) above, that the adversary does not know your algorithm, may not hold), the efficient market.

(Aug 12 '10 at 14:00) Jacob Jensen

Are you talking about sampling the training set IID, and then having the adversary select a worst part of it? I guess perhaps some bounds could be possible in this case. If you put adversary in charge of sampling, then learning is impossible IMHO.

A related issue are people relaxing the IID assumption without going so far as to include an adversary in the process. For instance here's one approach

(Aug 12 '10 at 17:35) Yaroslav Bulatov

3 Answers:

About your point 0, it isn't clear how simply having more than a minimal size of the training data would help if the training data in totally in the adversary's control? Wouldn't something like a restriction on the fraction of the total training data that the adversary can fiddle with make more sense? For this setting, in particular, there does exist some work that gives PAC-style bounds (e.g., this paper and this paper).

answered Aug 10 '10 at 14:21

spinxl39's gravatar image

spinxl39
3458104368

Fraction is indeed what I meant. Edited.

Thanks for the paper links

(Aug 10 '10 at 14:45) Jacob Jensen

Tangentially related:

Eyal Beigman; Beata Beigman Klebanov. Learning with Annotation Noise. ACL 2009

This discusses a situation where the adversary gets to change labels in your training data. The deeper question is "how costly could annotation noise be"? I thought it was quite nice when I saw the presentation last year.

answered Aug 13 '10 at 03:18

syllogism's gravatar image

syllogism
181139

edited Aug 13 '10 at 03:19

-1

This might be tangentially related to your your question

http://rjlipton.wordpress.com/2010/05/17/the-shadow-case-model-of-complexity/

Refer to following blog post about work by Nina Balcan, it has been applied to Clustering.

To quote from the post

A New Model

Nina has developed a new model that is currently called the BBG model after the three authors of the seminal paper: Balcan, Blum, and Gupta.

Shadow Case Model: This is the new model—the name is mine. The central idea is that in practice, problem instances are not selected by an all powerful adversary, nor are they selected by the flip of a coin. BBG argue instances are selected to have a certain formal property that allows approximation algorithms to work. I think we need a better name, but for now I will stay with “the Shadow Case model.”

I have always thought problem instances are neither worst case nor random, but have never been able to find a good replacement. BBG seem to have succeeded in finding the right way to model many situations. At first it seems like their idea might be circular: a problem instance is “good,” provided our algorithms work well on it. This might be true, but it seems useless for building a sound theory. The magic is BBG have a notion of what a “good” instance is, they can build a sound mathematical theory, further the theory is beautiful, and even better it works in practice. Hard to beat.

her areas of research are machine learning + complexity, some of her recent papers might be more relevant to your question

answered Aug 10 '10 at 20:54

DirectedGraph's gravatar image

DirectedGraph
54531422

edited Aug 10 '10 at 20:56

This is interesting, but not too related. PAC, by contrast to regular complexity theory, not only makes some very pessimistic assumptions but also some wildly optimistic ones.

(Aug 11 '10 at 10:35) Jacob Jensen
Your answer
toggle preview

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.