K-means clustering giving different results

82 views (last 30 days)
Dorotea
Dorotea on 16 Nov 2017
Answered: john kozicz on 21 Feb 2019
I tried implementing Lloyd's algorithm and it seemed good until I ran it multiple times. Sometimes it gives the results I want, sometimes it gives strange centres. I tried to change the condition so it stops when it has converged, but it doesn't help. Sorry for not translating comments to English, I hope it's clear enough.
I can't see the problem. Can you give me an idea what might be the problem from the result figures?
This is my code:
%A is a matrix whose rows are my points.
% initialization of centroids; further-first
n=size(A,1);
dim=size(A,2);
centri=zeros(k,dim); %matrix of centroids
for i=1:n
centri(1,:)=centri(1,:)+A(i,:);
end
centri(1,:)=centri(1,:)/n;
for j=2:k %u svakom koraku postavljamo za centar onu tocku koja je najdalje od centra 1,..j-1
maks=zeros(1,n);
%maks(i) je najveca udaljenost te tocke do centra =max d(x(i),c), c centri
for i=1:n
dist=zeros(1,j-1);
for l=1:j-1
dist(l)=norm(A(i,:)-centri(l,:));
end
if(size(dist,2)==1) maks(i)=dist;
else
maks(i)=max(dist);
end
%maks(i)=0;
%for l=1:j-1
% if(maks(i)<dist(l)) maks(i)=dist(l);
% end
%end
end
[maksi, ind]=max(maks);
centri(j,:)=A(ind(1),:);
end
indeksi=zeros(1,n);
for i=1:n
indeksi(i)=randi(k,1);
end
% u centrima je postavljena pocetna inicijalizacija
br_iter=0;
tic
while br_iter<=1000
br_iter=br_iter+1;
for i=1:n
dist=zeros(1,k); % udaljenosti od tocke x do centra j
for j=1:k
dist(j)=norm(A(i,:)-centri(j,:));
end
[mini, ind]=min(dist); % ind je vektor za koji se poprima minimalna vrijednost
indeksi(i)=ind(1); % uzmemo prvi po redu
end
% sad radimo nove centroide koji su aritmetička sredina svih vektora koji mu pripadaju
for j=1:k
centri(j,:)=zeros(1,dim);
brojac=0;
for i=1:n
if indeksi(i)==j
centri(j,:)=centri(j,:)+A(i,:);
brojac=brojac+1;
end
end
if brojac
centri(j,:)=centri(j,:)/brojac;
else
ind=randi(n, 1);
centri(j,:)=A(ind,:);
end
end
end
toc
for i=1:n
plot(A(i,1), A(i,2), '.b');
if(i==1) hold on;
end
end
for i=1:k
plot(centri(i,1), centri(i,2), '+r');
end
hold off

Answers (2)

the cyclist
the cyclist on 16 Nov 2017
Edited: the cyclist on 16 Nov 2017
K-means clustering does involve a random selection process for the initial centroid guesses, so you may get different results from different runs. Sometimes those results are not great; it is the nature of the algorithm, which is not perfect.
There is an input parameter "Start" that you can use to set fixed initial centroid positions when you call kmeans, and then you will get repeatable results.
  2 Comments
Dorotea
Dorotea on 16 Nov 2017
But my initial centroids should always be the same. There's no randomness in the further-first method of initialization. The only randomness I have in my code is when a cluster empties, I replace it with a random point. If that's the problem, what else could be done in case of empty clusters?
the cyclist
the cyclist on 16 Nov 2017
Oh, sorry! I assumed you were using the built-in kmeans function. (This is pretty common question about kmeans here.)

Sign in to comment.


john kozicz
john kozicz on 21 Feb 2019
i know that this comment / response is a long time after the question was asked.
for reproducability check the documentation for rng() and randi().
if you want reproducible results from randi reset the random number generator state before calling the function
rng('default'); %% set random number generator to default ie seed 0, generator mersenne twister
s = rng; %% save random number generator state for subsequent use
A = randi(10,3,3) %% generate 3 x 3 matrix of random integers between 1 and 10
A =
9 10 3
10 7 6
2 1 10
A = randi(10,3,3) %% note different result
A =
10 10 2
2 5 5
10 9 10
rng(s) %% reset random number generator state
A = randi(10,3,3) %% note same result as initial call
A =
9 10 3
10 7 6
2 1 10
hope this helps

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!