In typical gene expression datasets we are often provided with an additional confidence metric: the "intensity" of each pixel of the data. This information could be used to discard unreliable data points or features before performing classification or clustering.

Apart from a simple majority-vote scheme, what methods come to mind for performing this kind of dataset reduction? Could I use sparse learning?

asked Sep 16 '10 at 18:55


kpx


One Answer:

Bayesian methods can handle this sort of information. Essentially, extract a probability distribution from the confidence measure (it could be something as simple as a univariate Gaussian per pixel) and sample from that data distribution every time you sample from your model (assuming inference with MCMC; for other methods your mileage may vary).
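A minimal sketch of this idea, with hypothetical toy data: each pixel's confidence is mapped to a Gaussian noise scale (the mapping here is an assumption, not a canonical choice), and a fresh noisy draw of the data is taken at every step of a simple Metropolis sampler. Low-confidence pixels therefore contribute more variable evidence and pull the posterior less strongly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: observed pixel intensities plus a per-pixel
# confidence score in (0, 1]; higher confidence -> smaller noise.
observed = np.array([2.1, 1.9, 5.0, 2.0])
confidence = np.array([0.9, 0.8, 0.1, 0.95])
sigma = 0.5 * (1.0 - confidence) + 1e-3  # assumed confidence-to-noise mapping

def sample_data():
    """Draw a noisy version of the data from a per-pixel Gaussian."""
    return rng.normal(loc=observed, scale=sigma)

def log_likelihood(mu, x):
    """Log-likelihood of a shared mean mu under unit-variance Gaussians."""
    return -0.5 * np.sum((x - mu) ** 2)

# Metropolis sampler for the mean, resampling the data from its
# confidence-derived distribution at every iteration.
mu = 0.0
samples = []
for _ in range(5000):
    x = sample_data()                      # fresh data draw each step
    proposal = mu + rng.normal(scale=0.5)
    if np.log(rng.random()) < log_likelihood(proposal, x) - log_likelihood(mu, x):
        mu = proposal
    samples.append(mu)

posterior_mean = np.mean(samples[1000:])   # discard burn-in
```

Because the data are redrawn each iteration, the posterior effectively marginalizes over the confidence-induced uncertainty rather than treating the intensities as exact.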

answered Sep 16 '10 at 19:08


Alexandre Passos ♦


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.