I am new to machine learning. I am trying to train a model of the execution profile of a parallel program on a small-scale system (say, my dual-core laptop), and then test the accuracy of the model against the execution profile of the same program running on a large-scale system (e.g. a 128-node cluster). Obviously, the features I can get from the small system differ from those of the large system. Suppose the same set of latent variables controls both sets of features, since they come from the same program. What method would be a good fit for this problem?

I will be more specific below. Feel free to skip it if you aren't interested; advice based on the above alone would still be very helpful. Thank you.

The aim is to detect anomalies in the volume of inter-node communication. In case you are not familiar with parallel computers, just think of one as a network of N computers that collaborate on a computation and talk to each other when necessary. The feature I use to model communication behavior is the number of messages sent from a node to the other nodes in the system. For example, in a 16-node system, every node is represented by a 16-component vector consisting of the number of messages the node sends to each node (including 0 messages to itself). This is why I have different feature spaces for different systems: a node in a 16-node system is represented by a 16-component vector, while a node in a 128-node system is represented by a 128-component vector.

Now I want to train a model of communication volume and use it to predict abnormal communication behavior. My current approach is to first use k-means to cluster the nodes into groups, and then, for each group, use PCA to derive the normal range of communication based on the squared prediction error (the Q statistic). If any node's communication volume exceeds that limit on the squared error, it is predicted to be abnormal.
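For readers unfamiliar with the squared-prediction-error idea, here is a minimal sketch of the PCA step for a single group of nodes. It is not the asker's actual code: the function names, the synthetic data, and the use of an empirical quantile as the limit (instead of an analytic Q-statistic limit) are all assumptions made for illustration.

```python
import numpy as np

def spe_scores(X, mean, P):
    """Squared prediction error: squared norm of the part of each row
    that the retained principal components cannot reconstruct."""
    Xc = X - mean
    residual = Xc - Xc @ P @ P.T
    return np.sum(residual ** 2, axis=1)

def fit_pca_spe(X, n_components, quantile=0.99):
    """Fit PCA on normal traffic X (one row per node) and derive an SPE limit."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:n_components].T  # shape: (n_features, n_components)
    # Assumption: an empirical quantile stands in for the analytic Q limit.
    limit = np.quantile(spe_scores(X, mean, P), quantile)
    return mean, P, limit

# Synthetic "normal" traffic: 16 features driven by 2 latent variables.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 16))
X_train = rng.normal(size=(200, 2)) @ A + 0.05 * rng.normal(size=(200, 16))
mean, P, limit = fit_pca_spe(X_train, n_components=2)
spe_train = spe_scores(X_train, mean, P)

# A node that suddenly sends far more traffic to one peer.
x_bad = X_train[0].copy()
x_bad[3] += 5.0
spe_bad = spe_scores(x_bad[None, :], mean, P)[0]
print("abnormal:", spe_bad > limit)
```

Note that `P` has one row per feature, so a model fitted on 16-component vectors cannot be applied directly to 128-component vectors, which is exactly the difficulty raised in the question.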

My question is: how can I derive a model (e.g. PCA) on a small system and test it on a large system with a different feature space?

Thanks for your patience and advice!



asked Nov 18 '10 at 01:30

bwzhou


You can't generalize to a different feature space unless your features are already structured, as in a CRF or M3N. If you have some, but not all, features in common, this is a domain adaptation problem, and your PCA technique sounds a lot like some work by John Blitzer.

(Nov 18 '10 at 04:22) Alexandre Passos ♦

One Answer:

Since you say that both sets of features (train and test) share the same underlying latent variables, I think one way would be to train a factor analysis model (with the same number of factors, say K) separately on the training data and on the test data. Once you have the new representations of the training and test data (in terms of the K latent factors), you can learn a model from the training data and apply it to the test data.

Edit: As Alexandre pointed out, the factor analysis approach I suggested above wouldn't actually do the right thing in this case, because of the identifiability issue in factor analysis. One hack you might try is to cluster the test data features (128 in number) into 16 clusters (i.e., the number of features in the training data), then use each cluster center as a feature, which gives a new feature representation for the test data. Another possibility is to use something like weakly paired maximum covariance analysis on the training and test data. It is a multimodal dimensionality reduction technique, somewhat like canonical correlation analysis (CCA), but it does not require matching pairs of examples across the two datasets (and, unlike with CCA, the number of examples may differ between the two datasets).
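The clustering hack could look roughly like the sketch below. It treats each of the 128 feature columns as a point, runs a plain k-means over those points, and uses the 16 cluster centers as new feature columns. The helper names, the tiny k-means implementation, and the Poisson test data are assumptions for illustration only.

```python
import numpy as np

def assign(points, centers):
    """Index of the nearest center for every point (squared Euclidean distance)."""
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, assign(points, centers)

def reduce_features(X, k, seed=0):
    """Cluster the columns of X into k groups and return an (n_samples, k)
    matrix whose columns are the cluster centers."""
    centers, labels = kmeans(X.T, k, seed=seed)
    return centers.T, labels

# Hypothetical large-system data: 128 nodes, each a 128-component vector.
rng = np.random.default_rng(1)
X_large = rng.poisson(5, size=(128, 128)).astype(float)
X_reduced, labels = reduce_features(X_large, k=16)
print(X_reduced.shape)
```

One caveat: the order of the 16 cluster-center columns is arbitrary, so nothing in this procedure aligns them with the 16 original training features.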

answered Nov 18 '10 at 04:48

spinxl39


edited Nov 18 '10 at 12:13


But what guarantees that the factors found by factor analysis (which are sort of unidentifiable, I think) will be the same in both settings?

I think the asker should consider extracting some meaningful or invariant features, otherwise learning doesn't make any sense.

(Nov 18 '10 at 07:30) Alexandre Passos ♦

I think you are right. Identifiability indeed would be an issue in this case. So doing factor analysis wouldn't be right here. Maybe then some domain adaptation method would be needed.

(Nov 18 '10 at 11:50) spinxl39
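
The identifiability issue discussed in these comments can be stated compactly. The notation below is the standard factor analysis model and is an assumption here, since the thread itself introduces no symbols: $W$ is the loading matrix, $z$ the $K$ latent factors, and $\Psi$ the diagonal noise covariance.

```latex
x = W z + \mu + \epsilon, \qquad z \sim \mathcal{N}(0, I_K), \qquad \epsilon \sim \mathcal{N}(0, \Psi)

\operatorname{Cov}(x) = W W^{\top} + \Psi

\text{For any orthogonal } R \ (R R^{\top} = I_K): \qquad
(W R)(W R)^{\top} = W R R^{\top} W^{\top} = W W^{\top}
```

Because $W$ and $W R$ imply the same distribution over $x$, the factors are determined only up to a rotation. Two models fitted separately, one on the small system and one on the large system, can therefore come back with differently rotated factors even if the underlying latent variables are the same.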

If the features are completely non-overlapping, even domain adaptation shouldn't work. I guess the way to go is better feature extraction, even though this is not exactly encouraging.

(Nov 18 '10 at 12:01) Alexandre Passos ♦
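
As one illustration of what "better feature extraction" might mean here, the sketch below maps a node's message-count vector of any length to a fixed-length summary. The three features (number of peers contacted, entropy of the traffic shares, share going to the busiest peer) are hypothetical choices, not something proposed in the thread. They stay constant across system sizes only under the assumption that each node talks to a fixed set of neighbors regardless of N; for an all-to-all pattern, fractions of N would be the more natural choice.

```python
import numpy as np

def summary_features(counts, self_index):
    """Map a message-count vector of any length to 3 summary features:
    [peers contacted, entropy of traffic shares, share of busiest peer]."""
    counts = np.asarray(counts, dtype=float)
    peers = np.delete(counts, self_index)  # drop the node's entry for itself
    total = peers.sum()
    if total == 0:
        return np.zeros(3)
    p = peers / total
    nz = p[p > 0]
    entropy = -(nz * np.log(nz)).sum()
    return np.array([(peers > 0).sum(), entropy, p.max()])

def ring_node(n, msgs=10):
    """Hypothetical node 0 in an n-node ring: it only talks to its two neighbors."""
    v = np.zeros(n)
    v[1] = v[n - 1] = msgs
    return v

f16 = summary_features(ring_node(16), self_index=0)
f128 = summary_features(ring_node(128), self_index=0)
print(f16, f128)
```

With features like these, a model trained on the 16-node vectors and one tested on the 128-node vectors at least live in the same 3-dimensional space, though whether the features capture the anomalies of interest is a separate question.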

Yeah, maybe. I think the weakly paired maximum covariance analysis technique from the paper I've linked above could be useful in this case.

(Nov 18 '10 at 12:14) spinxl39

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.