# large numbers in K means

Tino on 15 Jun 2019
Commented: John D'Errico on 15 Jun 2019
I am trying to run kmeans on a single column of data with 1048575 rows,
and I am getting the error below:
Error using kmeans (line 277)
X must have more rows than the number of clusters.
Error in svia (line 18)
[idx, C, miro , D] = kmeans(bibi, K);
Can anyone help me find a solution for this?
Tino

Guillaume on 15 Jun 2019
What's unclear about the error? You must specify fewer clusters than the number of rows in your matrix/vector. So make sure that K is less than size(bibi, 1) (or numel(bibi) if it is a vector).
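A quick way to run that check before calling kmeans is sketched below. The values of bibi and K here are hypothetical stand-ins, since the original data was not posted; only the size comparison matters.

```matlab
% Hypothetical stand-ins for the poster's data and cluster count.
bibi = rand(1000, 1);
K = 10;

nObs = size(bibi, 1);      % rows are what kmeans counts as observations
if isvector(bibi)
    nObs = numel(bibi);    % for a vector, count elements instead
end

% Show the shape explicitly: a 1-by-n result here means a row vector.
fprintf('size(bibi) = [%s], K = %d\n', num2str(size(bibi)), K);
assert(K < nObs, 'K (%d) must be smaller than the number of observations (%d).', K, nObs);
```

If the printed size turns out to be 1-by-n rather than n-by-1, the orientation of the data is worth checking before anything else.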

Walter Roberson on 15 Jun 2019
It would be a problem for kmeans if bibi is a row vector instead of a column vector.
Guillaume on 15 Jun 2019
Not according to the doc: "If X is a numeric vector, then kmeans treats it as an n-by-1 data matrix, regardless of its orientation."

John D'Errico on 15 Jun 2019
Edited: John D'Errico on 15 Jun 2019
x = rand(1048575,1);
size(x)
ans =
1048575 1
[IDX, C] = kmeans(x, 10);
C
Warning: Failed to converge in 100 iterations.
> In kmeans/loopBody (line 479)
In internal.stats.parallel.smartForReduce (line 136)
In kmeans (line 343)
C =
0.74809
0.049586
0.94968
0.2486
0.54773
0.64779
0.34794
0.14901
0.44755
0.84915
It works fine, although it seems starved for iterations. But that is trivially solved.
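One way to give it more iterations is the 'MaxIter' name-value option of kmeans, which defaults to the 100 seen in the warning. The sketch below assumes the Statistics and Machine Learning Toolbox is installed; the limit of 1000, the fixed seed, and the smaller sample size are arbitrary choices for illustration, not values from this thread.

```matlab
rng(0);                    % fix the seed so the run is repeatable (assumption)
x = rand(100000, 1);       % smaller than the original, just to keep the demo quick

% Raise the iteration limit from its default of 100.
[IDX, C] = kmeans(x, 10, 'MaxIter', 1000);

sort(C)                    % centroids in ascending order, easier to compare between runs
```

Whether 1000 is enough depends on the data and the starting centroids, so the warning may still appear on some runs.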
However, if I make a mistake, and pass in a ROW vector instead, then I get the same error that you did.
[IDX, C] = kmeans(x', 10);
Error using kmeans (line 277)
X must have more rows than the number of clusters.
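A simple guard against this is to force the data into a column before the call, so the orientation of the input no longer matters. This is only a sketch, assuming the Statistics and Machine Learning Toolbox is available; xcol is a name introduced here for illustration.

```matlab
x = rand(1, 5000);         % a row vector, the orientation that failed above

xcol = x(:);               % x(:) is always a column, whatever the shape of x
size(xcol)                 % displays 5000 1

[IDX, C] = kmeans(xcol, 10);
```

The same x(:) idiom is harmless when the input is already a column, so it can be left in place permanently.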

Guillaume on 15 Jun 2019
That's contrary to the documentation, which says: "If X is a numeric vector, then kmeans treats it as an n-by-1 data matrix, regardless of its orientation".
I don't have the required toolbox but if that's not what happens then it should be reported as a bug.
John D'Errico on 15 Jun 2019
That is indeed the statement, in R2019a. So it is indeed a bug, since the behavior runs contrary to the documentation.