Apologies for a generic question. I have simulation data that I want to model. The data dimensions are 6x2001: a six-dimensional input vector, and corresponding to each input a 2001-dimensional output. The first thing I have done is "map" the output to a 50-dimensional output using interpolation (a sketch of this step follows below). So now I want to model this 6x50 data set. What meta-model should I start with? The output vector has some degree of correlation: each dimension has a perfect correlation with up to 3 neighboring dimensions on each side. There are so many options to try, but I am at a loss as to what would give me the best sense of which regression model to use for the data at hand. Any generic advice will be much appreciated. EDIT: Here are a couple of plots showing the output space graphically: [two plots, not reproduced here]
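For concreteness, the reduction step described above might look something like this (a minimal sketch, assuming the 2001 output values lie on a common one-dimensional grid; all array names and sizes here are illustrative, not from the original post):

    import numpy as np
    from scipy.interpolate import interp1d

    n_samples, n_full, n_coarse = 100, 2001, 50
    Y_full = np.random.rand(n_samples, n_full)   # stand-in for the simulation outputs

    x_full = np.linspace(0.0, 1.0, n_full)       # original output grid
    x_coarse = np.linspace(0.0, 1.0, n_coarse)   # reduced grid

    # Interpolate each output curve onto the coarser grid.
    Y_coarse = np.array([interp1d(x_full, y)(x_coarse) for y in Y_full])
    print(Y_coarse.shape)                        # (100, 50)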
Have a look at Weka for already-implemented regression algorithms. Also, chapter 3, "Linear Methods for Regression", of this book presents some approaches you can use; they have R implementations, so there is no need to waste precious time implementing them yourself. If you want to test the MLP approach, the Extreme Learning Machine (ELM) approach gives you high training speed and good generalization, due to the minimal norm of the output weights (see the theoretical explanations in the paper). Recall that there is no free lunch, so you might have to consider several different approaches. Then share your experience with us :). Edit: ... after preprocessing the data as @Andreas Mueller said :)

Sorry for hijacking this thread. I only skimmed the paper very quickly, but isn't this ELM approach just linear regression after a randomized non-linear mapping of the input? You construct random combinations of the input features, add a non-linearity, and then treat it as a linear regression problem (see the sketch below). To me, what would be interesting is to know for which function classes such random feature generation is beneficial and for which it is detrimental. Are you aware of any work in this area?
(Feb 20 '11 at 10:03)
Oscar Täckström
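To illustrate the point in the comment above: a minimal ELM-style sketch, i.e. (ridge) regression on top of a fixed random non-linear feature map. This is a conceptual illustration only, not the paper's reference implementation; all names and sizes are made up.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, h = 200, 6, 100                # samples, input dims, hidden units
    X = rng.standard_normal((n, d))
    y = np.sin(X).sum(axis=1)            # toy target

    W = rng.standard_normal((d, h))      # random input weights, never trained
    b = rng.standard_normal(h)
    H = np.tanh(X @ W + b)               # randomized non-linear mapping

    # Solving for the output weights by regularized least squares
    # is the only "training" step.
    beta = np.linalg.solve(H.T @ H + 1e-3 * np.eye(h), H.T @ y)
    y_hat = H @ beta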
@Oscar: excellent remark! However, the main merit of the paper is that it goes beyond the ubiquitous backpropagation. I do not know which types of problems would benefit from ELM. The cited article does not show a better MSE result for Triazines and Auto Price, which have the largest number of (input) features. But using this approach in ensemble learning might help, due to the high training speed claimed by the authors. Since, as you mentioned, it is a "randomized non-linear mapping of the input", using ensemble methods with ELM would be similar to Breiman's random forest approach (random forests select random features, while here you create new random ones; almost the same). Ensemble methods have been found to be quite productive; a sketch of this combination follows below.
(Feb 20 '11 at 10:52)
Lucian Sasu
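A sketch of the ensemble idea from the comment above: train several such random-feature regressors and average their predictions. The helper names fit_elm and predict_elm are hypothetical, not from any library.

    import numpy as np

    def fit_elm(X, y, h, rng, lam=1e-3):
        # One ELM: random hidden layer, output weights by ridge regression.
        W = rng.standard_normal((X.shape[1], h))
        b = rng.standard_normal(h)
        H = np.tanh(X @ W + b)
        beta = np.linalg.solve(H.T @ H + lam * np.eye(h), H.T @ y)
        return W, b, beta

    def predict_elm(X, model):
        W, b, beta = model
        return np.tanh(X @ W + b) @ beta

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 6))
    y = np.sin(X).sum(axis=1)            # toy target

    # Each member gets its own random feature map; average the predictions.
    models = [fit_elm(X, y, h=100, rng=rng) for _ in range(10)]
    y_hat = np.mean([predict_elm(X, m) for m in models], axis=0)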
Hi. Before doing anything else, I would try to reduce the output space further. What do you mean when you say you used interpolation? I don't see how interpolation can be used for dimensionality reduction. I would do a PCA of the output to see how big the output space actually is (see the PCA sketch further below). Maybe it would help to know what kind of data this is. To me this seems a little weird. Cheers, Andy

Hello Andreas: Thanks very much for your comments. I shall try to answer all your meta-queries:
Thanks in advance for any suggestions/help.
(Feb 20 '11 at 20:49)
Amit Saha
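To illustrate the PCA check Andreas suggests: fit PCA to the stacked output vectors and look at the cumulative explained variance. A sketch assuming scikit-learn is available; the sample count of 100 is a stand-in.

    import numpy as np
    from sklearn.decomposition import PCA

    Y = np.random.rand(100, 2001)        # stand-in: one output vector per row

    pca = PCA().fit(Y)
    explained = np.cumsum(pca.explained_variance_ratio_)
    # Number of components needed to capture 99% of the output variance:
    print(np.searchsorted(explained, 0.99) + 1)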
For future analysis: that is not interpolation, that is called sampling.
(Feb 20 '11 at 22:02)
Leon Palafox ♦
Sorry, I am still not really with you; this is probably some misunderstanding on my part. First you talk about an output dimensionality of 2001, then you say something about 2001 points. You also say your data matrix is 6x2001. I can't put that all together. If your data matrix is 6x2001, this means you have 2001 data points of dimensionality 6. If this includes the target, that means 5 input dimensions and one output dimension; if the target is separate, it means 6 input dimensions. If you say the input is 6-dimensional and the output 2001-dimensional, that would mean your regression function goes from R^6 -> R^2001, which is what I understood first. The data matrix would then be (6+2001)xN, where N is the number of samples (see the sketch below). Your plots show two different functions R^1 -> R^1, and I don't understand what they mean. Are these the targets as functions of single input dimensions, or something like that? Sorry if these are stupid questions, but as I said, I haven't really got my head around your problem.
(Feb 21 '11 at 11:06)
Andreas Mueller
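To make the shape conventions in the comment above concrete (a sketch; all numbers are hypothetical):

    import numpy as np

    N = 500                     # number of samples (hypothetical)
    X = np.empty((N, 6))        # 6-dimensional inputs, one row per sample
    Y = np.empty((N, 2001))     # 2001-dimensional outputs, one row per sample

    D = np.vstack([X.T, Y.T])   # stacked data matrix, (6 + 2001) x N
    print(D.shape)              # (2007, 500)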
If you want to use some cool interpolation algorithm, I recommend Rasmussen's book on Gaussian Processes; in the first chapter he gives a really good example of Bayesian interpolation and also explains how it works. If you have it, try going over Bishop's book on machine learning, which also discusses regression, and Andrew Ng's machine learning lectures cover regression in the first two or three sessions as well.
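A minimal sketch of GP regression in the spirit of that first chapter, using scikit-learn's GaussianProcessRegressor (an assumption; the book itself works through the math rather than this API, and the toy data here is made up):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(30, 1))                   # toy 1-D inputs
    y = np.sin(X).ravel() + 0.1 * rng.standard_normal(30)  # noisy targets

    # Squared-exponential kernel plus a noise term.
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

    X_test = np.linspace(-3, 3, 100).reshape(-1, 1)
    mean, std = gp.predict(X_test, return_std=True)  # posterior mean and uncertainty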