I want to partition some 2D points into 2 groups (clustering). The way I need to do it is to use PCA to find the first principal component, then project the data onto it to get 1D projections. Then I find the middle point on the principal component and partition the 2D points according to which side of the middle point their projections fall on.

How can I find the line that separates the data into 2 partitions in 2D space? In the 1D projected space the split is at the middle point, so the partition line in 2D should be the one perpendicular to the first principal component that passes through that middle point.
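In symbols (notation assumed here, not taken from the post): let μ be the mean of the points, c the unit-length first principal component, t_i the projection of point x_i, and t_m the middle point in the projected space. The perpendicular line through the middle point can then be written as:

```latex
t_i = \mathbf{c}^\top (\mathbf{x}_i - \boldsymbol{\mu}), \qquad
\mathbf{p} = \boldsymbol{\mu} + t_m \mathbf{c}, \qquad \lVert \mathbf{c} \rVert = 1

\text{line: } \mathbf{c}^\top (\mathbf{x} - \mathbf{p}) = 0
\;\iff\; c_1 (x - p_1) + c_2 (y - p_2) = 0
\;\iff\; y = p_2 - \frac{c_1}{c_2} (x - p_1) \quad (c_2 \neq 0)

\mathbf{c}^\top (\mathbf{x}_i - \mathbf{p}) = t_i - t_m
```

The last identity shows that the side of the line a point falls on is exactly the sign of t_i - t_m, so this line reproduces the 1D split. It is the only line that does so for every point in the plane, although for a finite sample many other lines can also separate the two groups of sample points.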

Is the line in 2d space unique? Is it possible to compute the best line if it is not unique?

Note: I am actually doing hierarchical clustering with the first principal component of PCA, and I want to compare the performance of this method against a k-d tree.
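For the comparison mentioned in the note, one way to sketch the hierarchical version is to apply the median split recursively. This is a hypothetical helper (build_tree, leaves, leaf_size and axis_aligned are illustrative names), assuming NumPy is available and computing the principal component with SVD rather than scikit-learn. The axis_aligned flag swaps the PCA direction for the coordinate axis with the largest variance, which is one common k-d tree splitting rule, so both trees can be built by the same code and compared on whatever performance measure matters to you.

```python
import numpy as np


def build_tree(X, idx=None, leaf_size=4, axis_aligned=False):
    """Recursively bisect points at the median projection.

    axis_aligned=False: split along the first principal component (PCA tree).
    axis_aligned=True:  split along the coordinate with the largest variance
                        (a k-d tree variant), for comparison.
    Returns nested (left, right) tuples whose leaves are index arrays.
    """
    X = np.asarray(X, dtype=float)
    if idx is None:
        idx = np.arange(len(X))
    if len(idx) <= leaf_size:
        return idx
    centered = X[idx] - X[idx].mean(axis=0)
    if axis_aligned:
        # k-d style: unit vector along the highest-variance coordinate
        direction = np.zeros(X.shape[1])
        direction[np.argmax(centered.var(axis=0))] = 1.0
    else:
        # PCA style: first right singular vector of the centered points
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        direction = vt[0]
    t = centered @ direction              # 1D projections of this node's points
    upper = t >= np.median(t)
    if upper.all():
        # Median equals the smallest projection (heavy ties): no split possible
        return idx
    return (build_tree(X, idx[~upper], leaf_size, axis_aligned),
            build_tree(X, idx[upper], leaf_size, axis_aligned))


def leaves(tree):
    """Yield the index arrays stored at the leaves of a tree."""
    if isinstance(tree, tuple):
        for child in tree:
            yield from leaves(child)
    else:
        yield tree
```

With distinct projections each median split is balanced, so 64 points and leaf_size=4 give 16 leaves of 4 points in either mode; the two trees differ only in the split directions.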

asked Apr 18 at 23:44


Afsh


One Answer:

I used PCA to project the 2D data to 1D, then found the median of the projected values and partitioned the data at that value in 1D (e.g. if the median was 0.5, all points whose projection is less than 0.5 go to partition 1 and all points whose projection is greater than or equal to 0.5 go to partition 2).
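A minimal sketch of this median split, assuming NumPy is available. To keep it dependency-light it computes the first principal component with NumPy's SVD instead of scikit-learn's PCA (an assumption, not the method used in the post); that direction matches pca.components_[0] up to sign, and a sign flip only swaps which group is called partition 1. The function name pca_median_split is hypothetical.

```python
import numpy as np


def pca_median_split(X):
    """Split points at the median of their projections onto the first
    principal component. Returns (partition1, partition2) as index arrays."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # First right singular vector of the centered data = first principal
    # direction (same as scikit-learn's pca.components_[0], up to sign).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    t = centered @ vt[0]                      # 1D projections
    t_med = np.median(t)
    partition1 = np.flatnonzero(t < t_med)    # projections below the median
    partition2 = np.flatnonzero(t >= t_med)   # projections at or above it
    return partition1, partition2
```

For an (n, 2) array with distinct projections, the two returned index arrays differ in size by at most one; with an odd n, the point at the median goes to partition 2 because of the "greater than or equal" rule.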

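To find the line, I use scikit-learn's pca.components_ attribute. The first principal component, pca.components_[0], is the normal of the partitioning line, so the line's slope is slope = -pca.components_[0, 0] / pca.components_[0, 1] (the line is vertical when pca.components_[0, 1] is 0).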

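We now have the slope and a point on the line (the median mapped back into 2D, i.e. the point on the principal axis at the median, rather than a point index), so we can draw the partitioning line between the two groups of 2D points.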
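The slope-plus-point construction can be sketched as follows, again assuming NumPy and using SVD in place of scikit-learn's PCA. With scikit-learn (a PCA fitted with n_components=1 and whiten=False), c corresponds to pca.components_[0] and the point p to pca.inverse_transform([[t_med]])[0]. Writing the line as c · (x - p) = 0 instead of y = slope * x + intercept also covers the vertical case. The names partition_line and on_partition2_side are hypothetical.

```python
import numpy as np


def partition_line(X):
    """Return (c, p, slope, intercept) for the line through the median
    projection point p, perpendicular to the first principal component c."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    c = vt[0]                          # unit normal of the partitioning line
    t_med = np.median(centered @ c)    # median of the 1D projections
    p = mean + t_med * c               # the median mapped back into 2D
    # Line: c[0] * (x - p[0]) + c[1] * (y - p[1]) = 0, so slope = -c[0] / c[1].
    if np.isclose(c[1], 0.0):
        return c, p, np.inf, np.nan    # vertical line x = p[0]
    slope = -c[0] / c[1]
    intercept = p[1] - slope * p[0]
    return c, p, slope, intercept


def on_partition2_side(X, c, p):
    """True where c . (x - p) >= 0, i.e. projection >= median (partition 2)."""
    return (np.asarray(X, dtype=float) - p) @ c >= 0
```

Because c has unit length, c · (x - p) equals the point's projection minus the median, so on_partition2_side gives the same grouping as thresholding the 1D projections.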

answered Apr 21 at 19:52


Afsh


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.