Question about kmeans centroid

Hi, I have a quick question about kmeans.
I randomly generated 1,000 numbers in the range (0,1) and clustered them into 20 clusters.
However, I found that the mean of each cluster is slightly different from its centroid. Why? By definition, they should be the same, right?
Thanks.

 Accepted Answer

Star Strider on 20 Jul 2012
Edited: Star Strider on 20 Jul 2012
That depends on the distance metric. With the default ‘sqeuclidean’ distance in kmeans, each centroid is the mean of the points in its cluster, because the mean is the point that minimizes the sum of squared Euclidean distances to the members of the set. In that case, any difference between the two should be no larger than floating-point rounding error.
With other metrics the centroid is generally not the mean. For example, with the ‘cityblock’ metric each centroid is the component-wise median of the points in its cluster.

More Answers (2)

Rebecca, are you seeing something like this?
>> x = rand(1000,1);
>> [idx,c] = kmeans(x,20);
>> c2 = grpstats(x,idx,@mean);
>> c - c2
ans =
0
0
-1.38777878078145e-17
0
0
0
0
0
0
-1.38777878078145e-17
0
0
0
-2.77555756156289e-17
0
0
-5.55111512312578e-17
0
0
0
That is to be expected; the differences are due to rounding error. Consider this:
>> x = rand(1000,1);
>> ( sum(x) - sum(x(randperm(length(x)))) ) / sum(x)
ans =
-7.87959181618481e-16
The result is nonzero because the two sums add the same terms in a different order. Same idea.
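If you want to check this programmatically, comparing with a tolerance is more robust than testing for exact equality. This is only a rough sketch: the seed and the tolerance `tol` are arbitrary values I picked for illustration, and `accumarray` is used here simply as a base-MATLAB alternative to `grpstats`.

```matlab
% Sketch: compare kmeans centroids to per-cluster means using a tolerance.
rng(0);                            % arbitrary seed, only to make the run repeatable
x = rand(1000,1);
[idx,c] = kmeans(x,20);            % default distance is 'sqeuclidean'
c2 = accumarray(idx,x,[],@mean);   % mean of x within each cluster, ordered like c
tol = 1e-12;                       % assumed tolerance, far above rounding level here
maxDiff = max(abs(c - c2));        % largest centroid-vs-mean discrepancy
sameUpToRounding = maxDiff < tol   % no semicolon, so the result is displayed
```

Differences on the order of `eps` (about 2.2e-16 for values near 1) are rounding noise; anything much larger would point to a different cause, such as a non-default distance metric.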
If you're seeing something else, you'll have to provide more info. Hope this helps.
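For reference, here is a short sketch of why the centroid and the mean coincide in exact arithmetic under the squared Euclidean distance (the kmeans default). The symbols $x_1,\dots,x_n$ (the members of one cluster), $c$ (a candidate centroid), and $J$ (the within-cluster cost) are notation introduced only for this illustration.

```latex
J(c) = \sum_{i=1}^{n} (x_i - c)^2,
\qquad
\frac{dJ}{dc} = -2 \sum_{i=1}^{n} (x_i - c) = 0
\;\Longrightarrow\;
c = \frac{1}{n} \sum_{i=1}^{n} x_i,
\qquad
\frac{d^2 J}{dc^2} = 2n > 0 .
```

The positive second derivative confirms that the stationary point is a minimum, so the minimizer is exactly the arithmetic mean. Any gap seen in practice comes from floating-point summation, not from the definition.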
