Hi. Here is the current setting: a stream of non-stationary data that is locally unbounded within the captured data set. We do know some global bounds, i.e. the data cannot exceed a certain value, but that value is very large and rarely reached. We are going to apply online processing, so I was wondering what the best practices are for normalization, scaling, whitening, etc. Since most of these tools require the full data set, I imagine that operating over small buffers should be the norm, but I'm not so sure. If that is not possible, it would mean the learning algorithms have to run on unnormalized data, and that is a bit dicey even for simple regressions. Any suggestions would be great.
Instead of using blocks, I would compute an exponentially decayed mean with a recursive update formula, and similarly for the variance. There are also online sequential estimators for quantiles; I think the original author's name is "Tierney", so search for that.
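For concreteness, here is a minimal sketch of those recursive updates in Python. The class name `EWStats` and the decay rate `alpha` are illustrative choices, not something from the thread or a particular library:

```python
import numpy as np

class EWStats:
    """Exponentially decayed running mean/variance (illustrative sketch)."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha   # decay rate: larger = forget the past faster
        self.mean = None
        self.var = None

    def update(self, x):
        """Fold one new observation into the decayed mean and variance."""
        x = np.asarray(x, dtype=float)
        if self.mean is None:
            self.mean = x.copy()
            self.var = np.zeros_like(x)
        else:
            delta = x - self.mean
            # recursive exponentially decayed mean
            self.mean = self.mean + self.alpha * delta
            # recursive exponentially decayed variance (uses delta w.r.t. the old mean)
            self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta ** 2)
        return self.normalize(x)

    def normalize(self, x):
        """Scale a sample by the current running estimates."""
        return (x - self.mean) / np.sqrt(self.var + 1e-12)
```

You would call `update(x)` on each incoming sample; the estimates track drift at a rate set by `alpha`, so no buffer of past data is needed.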
You can always take a small sample of the data, compute the mean and covariance on that sample, and whiten all future data with respect to those estimates. Since your data is non-stationary, redo this every once in a while (see the sketch below).
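A minimal sketch of that idea, assuming NumPy; the function names and the eigendecomposition-based (ZCA-style) whitening transform are illustrative choices, not the poster's exact recipe:

```python
import numpy as np

def fit_whitener(sample, eps=1e-6):
    """Estimate a mean and whitening matrix from a small sample (rows = observations)."""
    mu = sample.mean(axis=0)
    cov = np.cov(sample, rowvar=False)
    # ZCA-style whitening: W = V diag(1/sqrt(lambda)) V^T; eps guards tiny eigenvalues
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return mu, W

def whiten(x, mu, W):
    """Apply the fixed whitening transform to new data."""
    return (x - mu) @ W.T

# To follow the drift, refit on a recent buffer every so often, e.g.:
#   if n_seen % refresh_period == 0:
#       mu, W = fit_whitener(recent_buffer)
```

The refresh period is the knob here: too long and the transform lags the drift, too short and the covariance estimate gets noisy for the sample size you can afford.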
I think you have to clarify what you mean by non-stationary, and then everything else will fall into place (blocks, exponentials, etc.). Unless you are clear about what regularities exist, how can you learn?
What do you mean by that? Non-stationarity has a very clear definition, and of course we are doing feature extraction to detect stationary features. My question is not really about that.
If you don't suffer from non-stationarity, why bother? Take a sample, calculate the expected values, and they will stay constant...