How do you calculate the minimum circle within a cluster set of data points to potentially identify the ones that are most similar?
17 views (last 30 days)
Show older comments
I have made a PCA plot and I would like to calculate the minimum circle within a cluster set of data points to potentially identify the ones that are most similar, how can I do that?
0 Comments
Accepted Answer
John D'Errico
on 14 Feb 2023
Let me give an example.
XY = [randn(20,2)*1 + [-5,5];randn(10,2)*1.5 + [3 6];randn(10,2)*1 + [-7 -4]];
plot(XY(:,1),XY(:,2),'o')
How would you draw circles around what appear to be three groups of points here? As I have said many times, I would jsut use kmeans. I know there are three groups. This is a serious problem in any clustering tool, since you need to know how many clusters there are. Lacking that, the problem becomes MUCH more difficult. Again, these are things the human eye/brain combination does very naturally.
clustidx = kmeans(XY,3);
C = [1 0 0;0 1 0;0 0 1];
scatter(XY(:,1),XY(:,2),[],C(clustidx,:))
So k-means has done very nicely. But it really does need to know the number of clusters to search for.
And, now we can draw bounding circles.
hold on
t = linspace(0,2*pi);
C = zeros(3,2);
R = zeros(3,1);
colors = 'rgb';
for i = 1:3
ind = find(clustidx == i);
[C(i,:),R(i)] = minboundcircle(XY(ind,1),XY(ind,2));
plot(C(i,1) + R(i)*cos(t),C(i,2) + R(i)*sin(t),['-',colors(i)])
end
hold off
Now, could I have done the clustering in a different way? Possibly. One idea might be to use graph theoretic tools in MATLAB.
For example, on the same set of data, start with tihe set of interpoint distances.
D = squareform(pdist(XY));
Now, if there are truly 3 distinct clusters, then roughly 66% of the distances between points will be between points from different clusters. So I'll zero out 75% of the distances.
D(D > prctile(D(:),25)) = 0;
Next, create a graph of what remains.
GD = graph(D)
plot(GD)
The conncomp function will identify the sets of points that are connected to each other.
conncomp(GD)
As you can see, it did indeed identify three main clusters of points, though one point was disconnected from the rest, effectively seen as an outlier. Regardless, I could now have built the same plot as above, using these connected components.
2 Comments
Laura
on 22 Feb 2023
Sorry I am curious if this code worked as depicted since I have a similar question. Is the output accurate? Thank you from Australia!
More Answers (2)
Image Analyst
on 13 Feb 2023
Edited: Josh Natanson
on 14 Feb 2023
You have the classes to which the cluster is assigned. But I'm not sure, for a particular cluster, what "the minimum circle within a cluster set of data points" means. Do you want to find the pair of points in each cluster that are closest to each other? Do you want to find the minimum containing/bounding circle for each cluster? Do you want the size of each cluster's bounding circle? Please clarify. How many PCs do you have? How many clusters do you have?
See
16 Comments
John D'Errico
on 14 Feb 2023
Edited: John D'Errico
on 14 Feb 2023
How many times will you ask the same question? You keep on talking about a minimal circle. Sometimes it is to draw a minimal circle. And sometimes you seem to be confusing different algorithms. (Why I pretty much gave up on responding to your posts. Sorry, but I did.) This time it is to use the minimal circle to identify a cluster of similar points, but you don't say how the cluster would be identified.
The problem is, your question does not seem to understand there are several issues here. If you have a cluster of points, you can trivially find the minimal bounding circle. But a mimimal bounding circle algorithm is not a clustering tool. So you cannot use that bounding circle code to find a cluster of points that you have not first identified.
I really am sorry, but it just seems as if you want to do something, but have no idea what is involved in doing what you want. So you just keep on asking the same question over and over again.
So, you have some points (currently in 2-d, the result of a PCA.) They may be pretty sparse, at least from the pictures drawn, which makes EVERYTHING more difficult. And what your eye sees are several groups of points. What you want is some automatic code that will draw a circle around the various clusters that your eye sees. This seems to be the gist of the last 3 or more questions that you have asked. The problem is, you need to use the tools we have been suggesting, at least if you want to do this in a way that is automatic.
First, perform a clustering analysis. There are MANY clustering algorithms available, but kmeans has some of the most commonly used tools.
Or, you could simply choose the clusters for any similar group of points yourself. For example, I have posted code on the file exchange (selectdata) that will allow a user to select groups of points using a lasso. Just draw a curve around the points in one group, and it will tell you which points they were. Or you can tell the code to use a circular shape, then you click on the figure, and expand that circle until it contains the points you want. The idea is, you surely know what you want to see. So do it using a mouse. But if not, then you NEED TO USE A CLUSTERING TOOL. A minimal bounding circle is not a tool for clustering.
Can I explain why a minimal bounding circle does not work as a tool for clustering? That might be a good question. The point is a bounding circle is almost always controlled by 3 points, and sometimes rarely only 2 points. If you drop one or more of those points, the minimal bounding circle changes dramatically. This makes the bounding circle rather an unstable thing, since it does not depend on any of the other data points inside the circle. And that means you cannot use it to perform clustering. You need to use some other scheme first. PLEASE READ AND UNDERSTAND THIS LAST PARAGRAPH. A bounding circle is not in itself a clustering tool, nor can it be made to work as one. At least not well.
Once you have the points identified in any one similar cluster, then use a tool as I have shown to find the circle and then draw it. I don't see the problem, or why this requires so many repeated questions on the topic.
Again, break down the problem into smaller ones. First you need to use clustering though. We dont know enoiugh about your problem to know which clustering scheme might be best, but very likely just selecting the points with a mouse might be the best.
2 Comments
John D'Errico
on 14 Feb 2023
But again, you cannot do that in any stable way. A minimal circle is a poor thing to identify a cluster of points, because that minimal circle is ALWAYS based on only a few points in any set. You really don't want to do exactly what it is you think you want to do. For example:
xy = randn(20,2);
[C,R] = minboundcircle(xy(:,1),xy(:,2))
t = linspace(0,2*pi,100);
plot(xy(:,1),xy(:,2),'bo',R*cos(t) + C(1), R*sin(t) + C(2),'-r')
axis equal
Do you see that only 3 points ALWAYS control the position and size of the circle? In some cases, only 2 points may control the circle. And if you were to delete one or more of those points, then the circle changes in a very unstable, totally unpredictable way. The rest of the points inside that circle have no impact at all on the circle itself.
You cannot use a minimal bounding circle to perform clustering. You CAN use it AFTER you have performed the clustering. But not to do the clustering itself. And I know that when you look at a set of points in a picture, you see a circle that contains them. So you think the circle can be made to do what you want. But your eye/brain combination can do many things that are exceedingly difficult to do using a computer, and to do it robustly.
See Also
Categories
Find more on Dimensionality Reduction and Feature Extraction in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!