To simplify my question, I have created a dummy problem: I have two sets of training data, labelled 1 and 2 respectively. Both training datasets are assumed to follow a mixture-of-Gaussians (MoG) distribution, and I can easily use the Matlab toolbox function gmdistribution.fit to estimate the means and covariances of their components.

I also have a testing dataset that is assumed to be generated by an MoG similar to that of training dataset 2, but with added noise. I would like to calculate something like the likelihood that my testing dataset was generated by the MoG of training dataset 2. In other words, I would like to get the likelihood that my testing dataset has label 2.
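One way to make "the likelihood of label 2" precise is to turn the two set-level likelihoods into a posterior. This assumes the test points X = {x_1, ..., x_n} are independent draws from one of the two fitted mixtures θ_1, θ_2 (mixing weights w, means μ and covariances Σ, i.e. what gmdistribution stores as PComponents, mu and Sigma), and that P(1), P(2) are prior label probabilities. Both of these are assumptions, not part of the setup above. The second form of the posterior is algebraically the same but works on log-likelihoods, which avoids underflow when n is in the hundreds:

```latex
\log p(X \mid \theta_k) = \sum_{i=1}^{n} \log \sum_{j=1}^{J_k} w_{kj}\, \mathcal{N}\!\left(x_i \mid \mu_{kj}, \Sigma_{kj}\right), \qquad k \in \{1, 2\}

P(2 \mid X) = \frac{P(2)\, p(X \mid \theta_2)}{P(1)\, p(X \mid \theta_1) + P(2)\, p(X \mid \theta_2)}
            = \frac{1}{1 + \exp\!\big(\log P(1) + \log p(X \mid \theta_1) - \log P(2) - \log p(X \mid \theta_2)\big)}
```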

Could you please point me in the right direction? Thanks very much.

N.B.:

  1. The two training datasets have different sizes.
  2. The distributions of the two training datasets overlap.
  3. The testing dataset is much smaller than the training datasets.
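Point 1 suggests one way to choose the priors. The sketch below assumes that the training proportions reflect how often each label occurs (an assumption; use equal sizes or equal priors otherwise). It writes the posterior as a hypothetical anonymous function, posteriorLabel2, that takes total log-likelihoods rather than raw densities. With overlapping classes (point 2) the per-point log-likelihood ratios are small, and with a small test set (point 3) there are few of them, so this posterior tends to stay closer to the prior.

```matlab
% Hypothetical helper: posterior probability of label 2 for a whole test set.
%   LL1, LL2 : total log-likelihood of the test set under MoG 1 and MoG 2
%   N1,  N2  : training set sizes, used here as ASSUMED class priors
% Logistic of the log-odds; exp overflowing to Inf gives 0, never NaN.
posteriorLabel2 = @(LL1, LL2, N1, N2) ...
    1 ./ (1 + exp((LL1 + log(N1 / (N1 + N2))) - (LL2 + log(N2 / (N1 + N2)))));

% Example: equal likelihoods, so only the priors (2000 vs 1500 points) matter
p = posteriorLabel2(0, 0, 2000, 1500);   % equals 1500/3500 = 3/7
```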

Some Matlab code:

%% Mixture of Gaussians 1 (Training set 1)
mean1                                   = [1 -2];
cov1                                    = [2 0; 0 .5];
mean2                                   = [0.5 -5];
cov2                                    = [1 0; 0 1];
trainingDataset1                        = [mvnrnd(mean1, cov1, 1000); mvnrnd(mean2, cov2, 1000)];

MoGOptions                              = statset('Display', 'final');
MoGObj1                                 = gmdistribution.fit(trainingDataset1, 2, 'Options', MoGOptions);

figure,
scatter(trainingDataset1(:,1), trainingDataset1(:,2), 10, '.')
hold on
ezcontour(@(x,y)pdf(MoGObj1,[x y]), [-8 6], [-8 2]);

%% Mixture of Gaussians 2 (Training set 2)
mean4                                   = [0.5 -1];
cov4                                    = [1.5 0; 0 .8];
mean5                                   = [-2 -3];
cov5                                    = [1 0; 0 1];
mean6                                   = [-4 -2];
cov6                                    = [1 0; 0 1];
trainingDataset2                        = [mvnrnd(mean4, cov4, 500); mvnrnd(mean5, cov5, 500); mvnrnd(mean6, cov6, 500)];

MoGOptions                              = statset('Display', 'final');
MoGObj2                                 = gmdistribution.fit(trainingDataset2, 2, 'Options', MoGOptions);

figure,
scatter(trainingDataset2(:,1), trainingDataset2(:,2), 10, '.')
hold on
ezcontour(@(x,y)pdf(MoGObj2,[x y]), [-8 6], [-8 2]);

%% Test set
mean7                                   = [1.1 -2.1];
cov7                                    = [2.2 0; 0 .4];
mean8                                   = [0.3 -5.4];
cov8                                    = [1.2 0; 0 1.1];
testingDataset1                         = [mvnrnd(mean7, cov7, 100); mvnrnd(mean8, cov8, 100)];

figure,
scatter(testingDataset1(:,1), testingDataset1(:,2), 10, '.')
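The whole pipeline above can be sketched end to end as follows. The sketch regenerates the same dummy data (with a fixed seed), fits one mixture per label and scores the test set. Its assumptions are that the Statistics Toolbox is available, that the test points are i.i.d. draws from one of the two mixtures, and that the priors come from the training set sizes. It uses fitgmdist, the newer name for gmdistribution.fit (R2014a and later; on older releases gmdistribution.fit takes the same arguments). It relies on the second output of the gmdistribution method posterior, which is the negative log-likelihood of the data, so no densities are multiplied directly.

```matlab
% Sketch: score a test set against two fitted Gaussian mixtures.
rng(0);                                   % reproducible dummy data
opts = statset('MaxIter', 500);

% Same generating parameters as in the question
train1  = [mvnrnd([1 -2], [2 0; 0 .5], 1000); mvnrnd([0.5 -5], eye(2), 1000)];
train2  = [mvnrnd([0.5 -1], [1.5 0; 0 .8], 500); mvnrnd([-2 -3], eye(2), 500); ...
           mvnrnd([-4 -2], eye(2), 500)];
testSet = [mvnrnd([1.1 -2.1], [2.2 0; 0 .4], 100); ...
           mvnrnd([0.3 -5.4], [1.2 0; 0 1.1], 100)];

% One mixture per label (two components each, as in the question)
gm1 = fitgmdist(train1, 2, 'Options', opts);
gm2 = fitgmdist(train2, 2, 'Options', opts);

% Total log-likelihood of the whole test set under each mixture
[~, nlogl1] = posterior(gm1, testSet);
[~, nlogl2] = posterior(gm2, testSet);
LL1 = -nlogl1;
LL2 = -nlogl2;

% ASSUMED priors proportional to the training set sizes
n1 = size(train1, 1);
n2 = size(train2, 1);
logOdds12 = (LL1 + log(n1)) - (LL2 + log(n2));   % log P(1|X) - log P(2|X)
pLabel2   = 1 / (1 + exp(logOdds12));

% Mean per-point log-likelihood ratio (2 vs 1), comparable across test-set sizes
avgLLR = (LL2 - LL1) / size(testSet, 1);

fprintf('log p(X|MoG1) = %.1f, log p(X|MoG2) = %.1f\n', LL1, LL2);
fprintf('P(label 2 | X) = %.4g, mean per-point LLR = %.3f\n', pLabel2, avgLLR);
```

With these particular dummy parameters, the test-set means and covariances are close to those of training set 1, so P(label 2 | X) comes out well below 0.5. Reporting avgLLR alongside the posterior is a design choice: the posterior becomes more extreme as the test set grows, while the mean per-point ratio does not.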

asked Feb 25 '14 at 10:57

Samo Jerom


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.