K-means clustering giving different results

Question

Dorotea on 16 Nov 2017

0
Link

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/367432-k-means-clustering-giving-different-results

Answered: john kozicz on 21 Feb 2019

I tried implementing Lloyd's algorithm and it seemed good until I ran it multiple times. Sometimes it gives the results I want, sometimes it gives strange centres. I tried to change the condition so it stops when it has converged, but it doesn't help. Sorry for not translating comments to English, I hope it's clear enough.

I can't see the problem. Can you give me an idea what might be the problem from the result figures?

This is my code:
 %A is a matrix whose rows are my points.
  % initialization of centroids; further-first 
  n=size(A,1);
  dim=size(A,2);
  centri=zeros(k,dim); %matrix of centroids
  for i=1:n
      centri(1,:)=centri(1,:)+A(i,:);
  end
  centri(1,:)=centri(1,:)/n;
  for j=2:k %u svakom koraku postavljamo za centar onu tocku koja je najdalje od centra 1,..j-1
      maks=zeros(1,n);
      %maks(i) je najveca udaljenost te tocke do centra =max d(x(i),c), c centri
      for i=1:n
          dist=zeros(1,j-1);
          for l=1:j-1
              dist(l)=norm(A(i,:)-centri(l,:));
          end
          if(size(dist,2)==1) maks(i)=dist;
          else
              maks(i)=max(dist);
          end
          %maks(i)=0;
          %for l=1:j-1
            %  if(maks(i)<dist(l)) maks(i)=dist(l);
             % end
          %end
      end
      [maksi, ind]=max(maks);
      centri(j,:)=A(ind(1),:);
  end
    indeksi=zeros(1,n);
    for i=1:n 
        indeksi(i)=randi(k,1);
    end  
    % u centrima je postavljena pocetna inicijalizacija
    br_iter=0;
    tic
    while br_iter<=1000
        br_iter=br_iter+1;
        for i=1:n
            dist=zeros(1,k); % udaljenosti od tocke x do centra j
            for j=1:k
                dist(j)=norm(A(i,:)-centri(j,:));
            end
            [mini, ind]=min(dist); % ind je vektor za koji se poprima minimalna vrijednost
            indeksi(i)=ind(1); % uzmemo prvi po redu
        end
        % sad radimo nove centroide koji su aritmetička sredina svih vektora koji mu pripadaju
        for j=1:k
            centri(j,:)=zeros(1,dim);
            brojac=0;
            for i=1:n
                if indeksi(i)==j 
                    centri(j,:)=centri(j,:)+A(i,:);
                    brojac=brojac+1;
                end
            end
            if brojac 
                centri(j,:)=centri(j,:)/brojac;
            else
                ind=randi(n, 1);
                centri(j,:)=A(ind,:);
            end
        end
    end 
    toc
    for i=1:n
        plot(A(i,1), A(i,2), '.b');
       if(i==1) hold on;
       end
    end
    for i=1:k
        plot(centri(i,1), centri(i,2), '+r');
    end
    hold off

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

the cyclist on 16 Nov 2017

0
Link

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/367432-k-means-clustering-giving-different-results#answer_291543

Edited: the cyclist on 16 Nov 2017

K-means clustering does involve a random selection process for the initial centroid guesses, so you may get different results from different runs. Sometimes those results are not great; it is the nature of the algorithm, which is not perfect.

There is an input parameter "Start" that you can use to set fixed initial centroid positions when you call kmeans, and then you will get repeatable results.

2 Comments
Show NoneHide None

Dorotea on 16 Nov 2017

But my initial centroids should always be the same. There's no randomness in the further-first method of initialization. The only randomness I have in my code is when a cluster empties, I replace it with a random point. If that's the problem, what else could be done in case of empty clusters?

the cyclist on 16 Nov 2017

Oh, sorry! I assumed you were using the built-in kmeans function. (This is pretty common question about kmeans here.)

Sign in to comment.

Answer 2

john kozicz on 21 Feb 2019

0
Link

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/367432-k-means-clustering-giving-different-results#answer_362111

Open in MATLAB Online

i know that this comment / response is a long time after the question was asked.

for reproducability check the documentation for rng() and randi().

if you want reproducible results from randi reset the random number generator state before calling the function

rng('default');     %% set random number generator to default ie seed 0, generator mersenne twister
s = rng;            %% save random number generator state for subsequent use 
 
A = randi(10,3,3)   %% generate 3 x 3 matrix of random integers between 1 and 10

A =

9 10 3

10 7 6

2 1 10

A = randi(10,3,3) %% note different result

A =

10 10 2

2 5 5

10 9 10

rng(s)            %% reset random number generator state
A = randi(10,3,3) %% note same result as initial call

A =

9 10 3

10 7 6

2 1 10

hope this helps

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

K-means clustering giving different results

0 Comments
Show -2 older commentsHide -2 older comments

Answers (2)

2 Comments
Show NoneHide None

0 Comments
Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Community Treasure Hunt

K-means clustering giving different results

0 Comments Show -2 older commentsHide -2 older comments

Answers (2)

2 Comments Show NoneHide None

0 Comments Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

2 Comments
Show NoneHide None

0 Comments
Show -2 older commentsHide -2 older comments