As I see things, nonlinear regression (i.e., real-valued attributes (X) and a real-valued prediction variable (Y)) becomes very difficult on data of dimension higher than, say, 40 or so. The reasons I think this are:

1. Any distribution of data looks sparse in such a high-dimensional space.
2. The nonlinear structure of the output (Y) is then very hard to learn from such sparse data.

This is a guess on my part, and I would like to hear whether these thoughts make sense to you or whether they are wrong.

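For concreteness, here is a rough NumPy sketch of what I mean by point 1 (the sample size, dimensions, and Gaussian data are only illustrative): as the dimension grows, the nearest and farthest pairwise distances in a random sample become almost the same, so local neighbourhoods stop being informative.

```python
# Toy check of point 1 (illustrative settings): distance concentration.
import numpy as np

rng = np.random.default_rng(0)
n_points = 200

for dim in (2, 10, 40, 200):
    X = rng.standard_normal((n_points, dim))
    diffs = X[:, None, :] - X[None, :, :]          # pairwise differences
    dists = np.sqrt((diffs ** 2).sum(-1))          # Euclidean distance matrix
    dists = dists[np.triu_indices(n_points, k=1)]  # keep each pair once
    print(f"dim={dim:4d}  max/min pairwise distance ratio = {dists.max() / dists.min():.2f}")
```
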
I would agree with you, except for your statement that it is the nonlinear structure of the output (Y). I would say it is the (non)linear structure of the input (X), and that's why people do PCA, ICA, etc., as discussed by Daniel. In particular, since any distribution of data looks sparse in a high-dimensional space (your point 1) and therefore looks like nonlinear structure (point 2), it often makes sense to remove this by dimensionality reduction. For example, PCA is used when we know that the "signal" is correlated across lots of different input variables (e.g., in image processing). It makes less sense for, say, measurements in a scientific experiment where, by design, you measure things efficiently rather than redundantly.

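To make that concrete, here is a minimal sketch of the idea (the toy data, scikit-learn, the number of components, and the random-forest regressor are only illustrative choices): project the correlated inputs with PCA, then fit a nonlinear regressor on the scores.

```python
# Sketch: reduce correlated inputs with PCA, then regress nonlinearly.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Toy data: 100-dimensional X whose "signal" lives in a few correlated directions.
n, d = 500, 100
latent = rng.standard_normal((n, 3))
X = latent @ rng.standard_normal((3, d)) + 0.1 * rng.standard_normal((n, d))
y = np.sin(latent[:, 0]) + latent[:, 1] ** 2 + 0.1 * rng.standard_normal(n)

model = make_pipeline(
    PCA(n_components=10),                                      # strip the apparent high-dimensional sparsity
    RandomForestRegressor(n_estimators=200, random_state=0),   # nonlinear fit in the reduced space
)
model.fit(X, y)
print("train R^2:", round(model.score(X, y), 3))
```
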
A lot depends on the distribution of your data in the high-dimensional space. One approach that works in a number of situations is to do some form of dimension reduction first and then do the nonlinear regression in the reduced space. Choosing the dimension reduction algorithm and the regression algorithm depends on the particulars of your data. For smaller data sets you can use smarter dimension reduction algorithms like NMF, sparse dictionary learning, manifold learning, or ICA. For something larger you can fall back to SVD, and for huge data sets you can still do random projections. Choosing the regression algorithm largely depends on the type of nonlinearity you are expecting.

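As a sketch of what that reduce-then-regress pipeline might look like (my choices of scikit-learn, component counts, and an RBF support vector regressor are only illustrative):

```python
# Illustrative only: the same pipeline shape with two different reduction steps.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.random_projection import GaussianRandomProjection
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 500))     # toy high-dimensional inputs
y = np.sin(X[:, 0]) + X[:, 1] ** 2      # toy nonlinear target

# Small/medium data: SVD-based reduction, then a kernel regressor.
svd_model = make_pipeline(TruncatedSVD(n_components=20), SVR(kernel="rbf"))

# Huge data: a cheap random projection instead of a learned decomposition.
rp_model = make_pipeline(
    GaussianRandomProjection(n_components=50, random_state=0),
    SVR(kernel="rbf"),
)

for name, model in [("SVD + SVR", svd_model), ("random projection + SVR", rp_model)]:
    model.fit(X, y)
    print(name, "train R^2:", round(model.score(X, y), 3))
```
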
There are many different kinds of nonlinear regression. Fitting a low-order polynomial can be very easy (for example, by using a polynomial-kernel support vector regressor), as is fitting a logistic regression model or a lognormal regression model (which is linear in a transformed space). As the capacity of your nonlinear function increases, the problem becomes harder, and especially for nonconvex objectives like those used in neural networks it can be impossible to find the global optimum, while it is still often possible to find good predictors.

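For example, the easy end of the spectrum might look like this (a sketch only; scikit-learn's SVR and the particular degree-2 target are my own illustrative choices):

```python
# Sketch of the easy case: a degree-2 polynomial target fitted with a
# polynomial-kernel support vector regressor. All settings are illustrative.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 3))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 2]   # low-order polynomial in X

svr = SVR(kernel="poly", degree=2, C=10.0)
svr.fit(X, y)
print("train R^2:", round(svr.score(X, y), 3))
```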