|
In typical gene expression datasets we are often provided with an additional confidence metric for the "intensity" of each pixel of the data. This information could be used to discard data points or features before doing the classification or clustering. Apart from a simple majority-vote system, what methods come to mind for this kind of dataset reduction? Could I use sparse learning? |
|
Bayesian methods can handle this sort of information. Essentially, derive a probability distribution from the confidence measure (it could be something as simple as a univariate Gaussian per pixel) and draw a fresh sample of the data from that distribution every time you sample from your model. This assumes inference with MCMC; for other methods your mileage may vary. |
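
As a rough illustration of the idea, here is a minimal sketch under several assumptions that are mine, not part of the question: the confidence metric has already been converted to a per-pixel standard deviation `sigma`, the model is just a single unknown mean with known spread `tau` and a flat prior, and the function name `sample_posterior_mean` is hypothetical. Each iteration first redraws the data from the per-pixel Gaussians and then takes one sampling step for the model parameter.

```python
import numpy as np

def sample_posterior_mean(x_obs, sigma, tau=1.0, n_iter=5000, seed=0):
    """Sample a mean intensity while resampling noisy pixels each iteration.

    x_obs : observed intensities, one per pixel
    sigma : per-pixel standard deviation derived from the confidence metric
            (how to derive it is an assumption left to the user)
    tau   : assumed known spread of true intensities around the mean
    """
    rng = np.random.default_rng(seed)
    x_obs = np.asarray(x_obs, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = x_obs.size
    draws = np.empty(n_iter)
    for t in range(n_iter):
        # Step 1: draw a plausible version of the data, one Gaussian per pixel.
        x = rng.normal(x_obs, sigma)
        # Step 2: sample the model parameter given that version of the data.
        # With a flat prior and known tau, the conditional for the mean is
        # Normal(mean(x), tau / sqrt(n)).
        draws[t] = rng.normal(x.mean(), tau / np.sqrt(n))
    return draws

# Low-confidence pixels (large sigma) widen the resulting posterior.
confident = sample_posterior_mean([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
uncertain = sample_posterior_mean([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Nothing is thrown away in this scheme: unreliable pixels simply contribute less sharply to the result. A full measurement-error model would also let the model parameters inform the resampled data, which this sketch does not do.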