K-means clustering giving different results
82 views (last 30 days)
Show older comments
I tried implementing Lloyd's algorithm and it seemed good until I ran it multiple times. Sometimes it gives the results I want, sometimes it gives strange centres. I tried to change the condition so it stops when it has converged, but it doesn't help. Sorry for not translating comments to English, I hope it's clear enough.
I can't see the problem. Can you give me an idea what might be the problem from the result figures?
This is my code:
%A is a matrix whose rows are my points.
% initialization of centroids; further-first
n=size(A,1);
dim=size(A,2);
centri=zeros(k,dim); %matrix of centroids
for i=1:n
centri(1,:)=centri(1,:)+A(i,:);
end
centri(1,:)=centri(1,:)/n;
for j=2:k %u svakom koraku postavljamo za centar onu tocku koja je najdalje od centra 1,..j-1
maks=zeros(1,n);
%maks(i) je najveca udaljenost te tocke do centra =max d(x(i),c), c centri
for i=1:n
dist=zeros(1,j-1);
for l=1:j-1
dist(l)=norm(A(i,:)-centri(l,:));
end
if(size(dist,2)==1) maks(i)=dist;
else
maks(i)=max(dist);
end
%maks(i)=0;
%for l=1:j-1
% if(maks(i)<dist(l)) maks(i)=dist(l);
% end
%end
end
[maksi, ind]=max(maks);
centri(j,:)=A(ind(1),:);
end
indeksi=zeros(1,n);
for i=1:n
indeksi(i)=randi(k,1);
end
% u centrima je postavljena pocetna inicijalizacija
br_iter=0;
tic
while br_iter<=1000
br_iter=br_iter+1;
for i=1:n
dist=zeros(1,k); % udaljenosti od tocke x do centra j
for j=1:k
dist(j)=norm(A(i,:)-centri(j,:));
end
[mini, ind]=min(dist); % ind je vektor za koji se poprima minimalna vrijednost
indeksi(i)=ind(1); % uzmemo prvi po redu
end
% sad radimo nove centroide koji su aritmetička sredina svih vektora koji mu pripadaju
for j=1:k
centri(j,:)=zeros(1,dim);
brojac=0;
for i=1:n
if indeksi(i)==j
centri(j,:)=centri(j,:)+A(i,:);
brojac=brojac+1;
end
end
if brojac
centri(j,:)=centri(j,:)/brojac;
else
ind=randi(n, 1);
centri(j,:)=A(ind,:);
end
end
end
toc
for i=1:n
plot(A(i,1), A(i,2), '.b');
if(i==1) hold on;
end
end
for i=1:k
plot(centri(i,1), centri(i,2), '+r');
end
hold off
0 Comments
Answers (2)
the cyclist
on 16 Nov 2017
Edited: the cyclist
on 16 Nov 2017
K-means clustering does involve a random selection process for the initial centroid guesses, so you may get different results from different runs. Sometimes those results are not great; it is the nature of the algorithm, which is not perfect.
There is an input parameter "Start" that you can use to set fixed initial centroid positions when you call kmeans, and then you will get repeatable results.
2 Comments
the cyclist
on 16 Nov 2017
Oh, sorry! I assumed you were using the built-in kmeans function. (This is pretty common question about kmeans here.)
john kozicz
on 21 Feb 2019
i know that this comment / response is a long time after the question was asked.
for reproducability check the documentation for rng() and randi().
if you want reproducible results from randi reset the random number generator state before calling the function
rng('default'); %% set random number generator to default ie seed 0, generator mersenne twister
s = rng; %% save random number generator state for subsequent use
A = randi(10,3,3) %% generate 3 x 3 matrix of random integers between 1 and 10
A =
9 10 3
10 7 6
2 1 10
A = randi(10,3,3) %% note different result
A =
10 10 2
2 5 5
10 9 10
rng(s) %% reset random number generator state
A = randi(10,3,3) %% note same result as initial call
A =
9 10 3
10 7 6
2 1 10
hope this helps
0 Comments
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!