As I see it, nonlinear regression (i.e. real attributes (X) and a real prediction variable (Y)) becomes very difficult on data with dimension higher than, say, 40. The reasons I think this are:

  • Any sample of data in such a space is normally extremely sparse. Even if one had a lot of samples, they would still not suffice to adequately represent a full nonlinear structure in such a huge space (curse of dimensionality; see the sketch after the question).
  • If the output (Y) really had a nonlinear structure, then this nonlinearity could be reduced to a relatively small subspace. As a consequence, one would only have to concentrate on this particular subspace for the regression and could ignore the other dimensions.

This is a personal guess, and I would like to hear whether my thoughts make sense to you or whether they are wrong.
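To make point 1 concrete, here is a minimal sketch (plain NumPy; the sample size and the list of dimensions are arbitrary illustrative choices) showing how pairwise distances between uniformly random points concentrate as the dimension grows, so that every point ends up roughly equally far from every other point:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # number of sample points (arbitrary choice)

for d in (2, 10, 40, 100, 1000):
    X = rng.random((n, d))                       # n points uniform in [0, 1]^d
    # pairwise squared distances via the Gram-matrix identity
    sq_norms = (X ** 2).sum(axis=1)
    sq = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    dists = np.sqrt(np.clip(sq[np.triu_indices(n, k=1)], 0.0, None))
    # as d grows, the spread of distances shrinks relative to their mean
    print(f"d={d:4d}  mean={dists.mean():7.3f}  "
          f"(max - min) / mean = {(dists.max() - dists.min()) / dists.mean():.3f}")
```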

asked Dec 20 '12 at 06:52

Tom


3 Answers:

I would agree with you, except for your statement that it is the nonlinear structure of the output (Y). I would say it is the (non)linear structure of the input (X), and that's why people do PCA, ICA, etc., as discussed by Daniel.

In particular, since any distribution of data looks sparse in a high-dimensional space (your point 1) and therefore looks like nonlinear structure (point 2), it often makes sense to remove this by dimensionality reduction. E.g., PCA is used when we know that the "signal" is correlated across lots of different input variables (e.g. in image processing). It makes less sense for, say, measurements in a scientific experiment, where by design you measure things efficiently rather than redundantly.
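To illustrate, a sketch assuming scikit-learn (the synthetic data, the component count, and the choice of k-nearest-neighbors as the nonlinear regressor are all placeholder assumptions): the observed X below is 100-dimensional but is really a noisy linear image of a 3-dimensional signal, so PCA recovers a small space in which a simple nonlinear regressor can work.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, d_latent, d_obs = 2000, 3, 100

# low-dimensional signal observed through a random linear map:
# the "signal correlated across lots of input variables" case
Z = rng.normal(size=(n, d_latent))
X = Z @ rng.normal(size=(d_latent, d_obs)) + 0.1 * rng.normal(size=(n, d_obs))
y = np.sin(Z[:, 0]) + Z[:, 1] ** 2 + 0.1 * rng.normal(size=n)

# nonlinear regression in the reduced space vs. in the raw space
reduced = make_pipeline(PCA(n_components=3), KNeighborsRegressor())
raw = KNeighborsRegressor()
print("R^2 with PCA:   ", cross_val_score(reduced, X, y, cv=5).mean())
print("R^2 without PCA:", cross_val_score(raw, X, y, cv=5).mean())
```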

answered Dec 24 '12 at 15:40

SeanV

A lot depends on the distribution of your data in the high-dimensional space. One method that can work in a number of situations is to do some form of dimension reduction first and then do the nonlinear regression in the reduced space. Choosing the dimension reduction algorithm and the regression algorithm depends on the particulars of your data. For smaller data sets you can use smarter dimension reduction algorithms like NMF, sparse dictionary learning, manifold learning, or ICA. For something larger you can fall back to SVD, and for huge data sets you can still do random projections. Choosing the regression algorithm largely depends on the type of nonlinearity you are expecting.
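A sketch of that recipe, assuming scikit-learn: TruncatedSVD stands in for the "something larger" option and a Gaussian random projection for the "huge data" fallback, each followed by a distance-based nonlinear regressor. The synthetic data and every hyperparameter here are illustrative assumptions, not a recommendation.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.random_projection import GaussianRandomProjection
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n, d_latent, d_obs = 2000, 5, 500

Z = rng.normal(size=(n, d_latent))            # low-dimensional latent signal
X = Z @ rng.normal(size=(d_latent, d_obs))    # observed in 500 dimensions
y = np.sin(Z[:, 0]) + Z[:, 1] * Z[:, 2] + 0.1 * rng.normal(size=n)

# reduce first, then regress nonlinearly in the reduced space
for name, reducer in [("SVD", TruncatedSVD(n_components=5)),
                      ("random projection", GaussianRandomProjection(n_components=20))]:
    model = make_pipeline(reducer, KNeighborsRegressor())
    print(f"{name:17s} R^2 = {cross_val_score(model, X, y, cv=5).mean():.3f}")
```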

answered Dec 23 '12 at 19:54

Daniel Mahler

There are many different kinds of nonlinear regression. Fitting a low-order polynomial can be very easy (for example, by using a polynomial-kernel support vector regressor), as is fitting a logistic regression model or a lognormal regression model (which is linear in a transformed space). As the capacity of your nonlinear function increases, the problem becomes harder, and especially for nonconvex functions like those used in neural networks it can be impossible to find the global optimum, while it is often still possible to find good predictors.
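For the easy case mentioned above, a minimal sketch assuming scikit-learn: fitting a quadratic target with a polynomial-kernel support vector regressor. The toy data, degree, and regularization constant are arbitrary choices.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(500, 3))
# a low-order (quadratic) target with a little noise
y = X[:, 0] ** 2 - X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=500)

svr = SVR(kernel="poly", degree=2, coef0=1.0, C=10.0)
svr.fit(X, y)
print("train R^2:", svr.score(X, y))
```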

answered Dec 20 '12 at 09:02

Alexandre Passos ♦
