In his workshop lecture, Zoubin mentioned that, for parametric models, the complexity of the model is bounded even if the amount of data isn't.

I know it is an intuitive idea, and I do agree; I just want to know whether any of you have a link to a paper where I can find a proof that, for models with a finite number of parameters, the complexity of the model is bounded.

Thanks

asked Feb 10 '12 at 02:12


Leon Palafox ♦

That would indeed be interesting. I think one of the problems is how to formalize "parametric" and "complexity". For example, an SVM with an RBF kernel might be considered non-parametric, whereas an SVM with a linear kernel is most definitely parametric.

I think a main point of Zoubin's was to consider models whose (decision) functions are dense in the space of all (smooth) functions.

If you take that as a definition of non-parametric, then the claim becomes somewhat of a tautology.

(Feb 10 '12 at 04:02) Andreas Mueller
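
A small sketch of the contrast described in the comment above (an illustration added for concreteness, not part of the original discussion; the dataset, hyperparameters, and sample sizes are arbitrary choices): a linear SVM keeps a weight vector of fixed size no matter how much data it sees, while the set of support vectors an RBF SVM keeps around typically grows with the amount of data.

    # Illustration: fixed-size linear SVM vs. growing RBF SVM (arbitrary toy data).
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    for n in (100, 1000, 5000):
        X = rng.normal(size=(n, 2))
        y = (X[:, 0] * X[:, 1] > 0).astype(int)   # a simple non-linear target
        linear = SVC(kernel="linear").fit(X, y)
        rbf = SVC(kernel="rbf").fit(X, y)
        print(n, "samples:",
              "linear coef shape", linear.coef_.shape,
              "| rbf support vectors", rbf.support_vectors_.shape[0])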

2 Answers:

It depends on how you measure complexity. If you use something like VC dimension, it turns out that simple parametric models can have infinite VC dimension if you allow them to be curvy enough (an example is the sine function). For most reasonable ways of measuring complexity, however, within a fixed model class it grows as you allow more parameters (for linear classifiers, for example, the VC dimension is the dimensionality of the space, plus one if you include a bias term).
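
To make the sine example concrete, here is a rough sanity check (a brute-force search, so an illustration rather than a proof; the grid, seed, and number of points are arbitrary choices): the one-parameter classifier f_w(x) = sign(sin(w x)) typically realises every one of the 2^5 labellings of five random points in (0, 1).

    # Brute-force check that a single frequency parameter can realise arbitrary
    # labellings of a few points in (0, 1).  Not a proof; just a sanity check.
    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    xs = np.sort(rng.uniform(0.0, 1.0, size=5))     # 5 points in (0, 1)
    omegas = np.linspace(1.0, 1e5, 1_000_000)       # crude grid over the frequency w
    preds = np.sign(np.sin(np.outer(omegas, xs)))   # shape: (n_frequencies, n_points)

    for labels in itertools.product([-1.0, 1.0], repeat=len(xs)):
        hit = np.any(np.all(preds == np.array(labels), axis=1))
        print(labels, "realised by some w:", hit)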

If you're willing to argue that real numbers are unrealistic (it's because of the infinite precision of real-valued parameters that you get things like the infinite VC dimension of the sine family) and discretize your parameters, then obviously there are only finitely many functions you can write with a finite parametrization.
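
One way to make the discretisation remark quantitative (a standard counting bound, not something stated in the answer itself): a finite hypothesis class can shatter at most log2 of its size many points, so with p parameters of b bits each,

    |\mathcal{H}| \;\le\; 2^{bp}
    \qquad\Longrightarrow\qquad
    \mathrm{VCdim}(\mathcal{H}) \;\le\; \log_2 |\mathcal{H}| \;\le\; bp .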

answered Feb 10 '12 at 07:40


Alexandre Passos ♦

I think VC dimension is not really what Zoubin had in mind when making this argument, for two reasons: 1) it only applies to classification; 2) as your example shows, it does not really capture the expressiveness of a model.

I don't think "sin(alpha x)" can be called a very flexible model. The existential quantifier in the definition of VC dimension is a bit weird...

(Feb 10 '12 at 07:53) Andreas Mueller

I actually like the example of sin(x) because a high-frequency sine can indeed split any set of points in (0,1) in any way you want, so it's exactly the kind of thing you don't want in a classifier.

(Feb 10 '12 at 08:03) Alexandre Passos ♦

Btw, I think your discreteness argument is a bit weird: if the input and output are discrete, there are only so many functions. So if you have exponentially many parameters, you're fine ;)

(Feb 10 '12 at 08:04) Andreas Mueller

I just meant that if your parameter space is discrete there are only so many functions.

(Feb 10 '12 at 08:05) Alexandre Passos ♦

Yes, but when you assume your parameter space to be discrete because you cannot represent real numbers, then you should also consider your input space to be discrete. About the sine: afaik that works only for a very specific set of points. Can you explain how to do it with any set of points?

(Feb 10 '12 at 08:06) Andreas Mueller

The points have to be in general position (not collinear, not equally spaced, etc.). See page 11-4 in http://www.cs.huji.ac.il/~shashua/papers/class11-PAC2.pdf for a discussion.

(Feb 10 '12 at 08:12) Alexandre Passos ♦

Thanks for the reference, but the explanation "by choosing a big enough omega" is not really clear to me. And what does collinear mean for numbers?

(Feb 10 '12 at 08:20) Andreas Mueller

Imagine, for the sake of intuition, that the xs are all points on the real line. As omega gets higher, you're dividing the unit interval into smaller and smaller equally-spaced sub-intervals, each labeled either 0 or 1. If the distances between points are not allowed to be exactly equal, you can always find a size for these sub-intervals compatible with all of those distances, and then adjust the phase to flip each point either way.

(Feb 10 '12 at 08:26) Alexandre Passos ♦

Do you have a reference for a proof? It is not so obvious to me.

(Feb 10 '12 at 12:06) Andreas Mueller

It's in Vapnik's book on statistical learning theory if I recall correctly.

(Feb 10 '12 at 12:11) Alexandre Passos ♦
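
For reference, here is a small numerical check of the construction as it is usually presented (the exact formula below is a paraphrase from memory of the textbook argument, so treat it as an assumption; note that it uses the specific points x_i = 10^{-i} rather than arbitrary points, which relates to the "general position" caveat discussed above).

    # Check that sign(sin(w * x)) realises every labelling of the points x_i = 10^{-i}
    # when w = pi * (1 + sum_i (1 - y_i) * 10^i).  Formula paraphrased from memory.
    import itertools
    import math

    n = 6
    xs = [10.0 ** (-i) for i in range(1, n + 1)]

    for ys in itertools.product([0, 1], repeat=n):
        w = math.pi * (1 + sum((1 - y) * 10 ** i for i, y in zip(range(1, n + 1), ys)))
        preds = [1 if math.sin(w * x) > 0 else 0 for x in xs]
        assert preds == list(ys), (ys, preds)

    print("all", 2 ** n, "labellings of x_i = 10^-1 ... 10^-%d realised" % n)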

I think I have somewhat of an answer now. Consider a model as a set of functions F together with some compatibility function between elements of F and data sets. Then I would call a model parametric if F is the image of a map m: R^n -> F for some fixed n.

Then the claim is that if F is parametric, it cannot be dense in the set of "all functions".

I think you can prove this by requiring F to be of a particular form (e.g. a topological space, Hilbert space, or manifold), requiring m to be nice in a way compatible with the form of F, and defining "dense" in the right way ;)

For example, if you think of F as a manifold and of your data as points in R^k, then the set of all smooth functions on R^k has infinite dimension while R^n has finite dimension, so if F is parametric then F has finite dimension (if m is a homomorphism of manifolds).
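
In symbols, the claim sketched in this answer (just a restatement under its stated assumptions; the question-marked implication is exactly the part that still needs a proof):

    % the model is the image of a finite-dimensional parameter space
    \mathcal{F} = m(\mathbb{R}^n) \subset C^\infty(\mathbb{R}^k), \qquad
    m \colon \mathbb{R}^n \to C^\infty(\mathbb{R}^k) \ \text{smooth}, \quad n \ \text{fixed},
    % claim: a finite-dimensional image cannot be dense in the infinite-dimensional target
    \dim \mathcal{F} \le n < \infty
    \quad\stackrel{?}{\Longrightarrow}\quad
    \overline{\mathcal{F}} \neq C^\infty(\mathbb{R}^k).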

This answer is marked "community wiki".

answered Feb 10 '12 at 08:18


Andreas Mueller
