|
What's a good (persuasive) way of evaluating features other than by classification rate? This also relates to model selection. For example, suppose I have a choice of features f1, f2, f3, ..., where f1 and f2 are based on an autoencoder and f3 is based on ICA. |
|
It seems like you're looking for a way of evaluating cluster quality across different feature spaces without reference to an external label. I may be wrong, but I think this is impossible. There are two main styles of evaluating cluster quality: external validation and internal validation.

External validation measures come in two kinds: 1) use a label and measure the correspondence between cluster identity and the gold-standard label (there are a lot of ways to do this, but I would recommend b-cubed, v-measure, or adjusted mutual information); 2) embed your clustering task in a broader application and see how the clustering affects performance on the larger application. Both of these are tractable if you're willing to go that way.

The other approach is an internal validation measure. This includes Silhouette, inter-/intra-cluster distance ratios, and loads of others. Each of these measures is based on distances in your feature space. If you want to compare across feature spaces, you would want some guarantee that the spaces are really comparable; it would be easy to accidentally introduce significant bias. (Imagine if your features were not orthogonal...) A general-purpose, feature-space-independent measure is, I believe, impossible. I'd love to be wrong about this though.

Thank you for your comment. I wasn't actually thinking in terms of clustering, but of using latent features, such as those from ICA, to represent the data. I am wondering how to measure the quality of such latent features.
(Jul 09 '14 at 15:07)
Dannnnn
|
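To make the answer's point about internal validation measures concrete, here is a minimal pure-Python sketch of the mean Silhouette coefficient, applied to the same partition in two feature spaces. The toy data, the names `space_a` and `space_b`, and the choice of Euclidean distance are assumptions made for illustration; nothing here comes from the original thread.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette(points, labels):
    """Mean Silhouette coefficient; assumes at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# One fixed partition of six items into two clusters.
labels = [0, 0, 0, 1, 1, 1]

# Hypothetical feature space A: a single informative coordinate.
space_a = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]

# Hypothetical feature space B: the same coordinate plus a second,
# uninformative coordinate with a much larger spread.
space_b = [(0.0, 0.0), (0.1, 10.0), (0.2, -10.0),
           (5.0, 10.0), (5.1, -10.0), (5.2, 0.0)]

score_a = silhouette(space_a, labels)
score_b = silhouette(space_b, labels)
print("space A:", round(score_a, 3))
print("space B:", round(score_b, 3))
```

The partition is identical in both runs, yet the score drops sharply in the second space because the extra coordinate dominates the distances. The number reflects the geometry of each space as much as the grouping itself, which is why comparing such scores across feature spaces needs some argument that the spaces are comparable.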