|
Almost all learning theory I've seen is based on i.i.d training and test sets. This, unfortunately, is often not the case in real life. Suppose instead we have our training and test sets generated by a cunning adversary. This makes things a lot harder of course, but it doesn't sink us completely as long as we impose a few restrictions. 0) The training set must be of some minimal fraction of all data. In some cases, it may have an appropriately representative class distribution (though in the case of transfer learning, the opposite is the case) 1) The adversary does not know our algorithm - this is natural to assume, since in applications like transfer learning, where this could be applied, we know our data before we design our algorithm. 2) We can see unlabeled instances - optional, but allows us to use semi-supervised and transductive learning methods, powerful tools in this case. 3) We know approximate class distributions - definitely not always true, but useful if so. Anyways, does anyone know any papers in learning theory or providing theoretic bounds given conditions similar to these? Also, the bound of PAC learning theory are often very conservative (despite the i.i.d assumption), but provide useful guidance for practice. Would the bounds on an adversarial distribution provide good guidance? |
|
About your point 0, it isn't clear how simply having more than a minimal size of the training data would help if the training data in totally in the adversary's control? Wouldn't something like a restriction on the fraction of the total training data that the adversary can fiddle with make more sense? For this setting, in particular, there does exist some work that gives PAC-style bounds (e.g., this paper and this paper). Fraction is indeed what I meant. Edited. Thanks for the paper links
(Aug 10 '10 at 14:45)
Jacob Jensen
|
|
Tangentially related: Eyal Beigman; Beata Beigman Klebanov. Learning with Annotation Noise. ACL 2009 This discusses a situation where the adversary gets to change labels in your training data. The deeper question is "how costly could annotation noise be"? I thought it was quite nice when I saw the presentation last year. |
|
This might be tangentially related to your your question http://rjlipton.wordpress.com/2010/05/17/the-shadow-case-model-of-complexity/ Refer to following blog post about work by Nina Balcan, it has been applied to Clustering.
her areas of research are machine learning + complexity, some of her recent papers might be more relevant to your question This is interesting, but not too related. PAC, by contrast to regular complexity theory, not only makes some very pessimistic assumptions but also some wildly optimistic ones.
(Aug 11 '10 at 10:35)
Jacob Jensen
|
I like this question: kind of like the cryptoanalysis concept of 'trusted receiver'. Can patterns be learnt when an agent is actively harming the learning?
You may need more restrictions on how the training set is chosen. For instance, suppose we want to learn the bias of a coin. The coin is fair, what's to stop the cunning adversary from giving us "All Heads" for training data?
@Yaroslav Bulatov: You make a good point. I guess the restrictions need to be somewhat problem-specific, unless someone could come up with a good general (probably information-theoreticish) formulation. For a biased coin problem, the adversary could maybe block only certain arithmetic sequences.
Still, you'd also be in decent shape if you were able to see that a coin flip took place, but not what its result was. If you have only heads, but also a suspiciously spaced set of hidden results, you can still put error bounds on your bias estimate that will include the true answer.
The sort of problem that I was implicitly thinking of was more along the lines of image recognition, or similar problems where there's a pretty huge amount of data even in a single uncorrupted instance. For instance, say you're trying to learn digits: if your adversary chooses a third of each class to be in your training set then he can do serious damage to your accuracy (compared to a randomly partitioned set), but he can't completely derail you.
In fact, in certain applications even random partitions into training and test sets will have highly variable performance (I had a ~1000 size set of faces for a transfer learning application, and target task accuracy would vary between 75% and 90% depending on the partition - part of this is probably poor convergence of my method, but the set choices make a big difference too). Establishing a bound in the adversarial case gives you a bound on the worst random case too, which is useful in this sort of circumstance.
In many other real-life applications, the test/training same distribution assumption does not hold. Say you're trying to profile criminals. The set of criminals who have been caught (and therefore can be profiled for your training set) is intrinsically different from your test set, the set of criminals who haven't been caught. In finance, the division between past and future data (your implicit training and test sets) is created by perhaps the strongest adversary possible (so strong, in fact, that assumption (1) above, that the adversary does not know your algorithm, may not hold), the efficient market.
Are you talking about sampling the training set IID, and then having the adversary select a worst part of it? I guess perhaps some bounds could be possible in this case. If you put adversary in charge of sampling, then learning is impossible IMHO.
A related issue are people relaxing the IID assumption without going so far as to include an adversary in the process. For instance here's one approach