K-mean for Wine data set

Question

Ganesh on 27 Aug 2013

https://nl.mathworks.com/matlabcentral/answers/85698-k-mean-for-wine-data-set

Answered: Paul Munro on 21 Feb 2023

Hi,

I ran the kmeans command on the Wine data set from the UCI repository. This data set contains the chemical analysis of 178 wines derived from three different cultivars. Each wine is described by 13 continuous features.

Here are the commands:

    load 'wine_data.txt';

    [IDX,C,sumd,D] = kmeans(wine_data,3,...
        'start','sample',...
        'Replicates',100,...
        'maxiter',1000,...
        'display','final');

The final best total sum of distances is 2.37069e+06. This is far from the K-means solution reported in the literature, which is around 18,061. Is MATLAB's K-means solution stuck in a local minimum? Please advise. Thanks.

1 Comment

the cyclist on 27 Aug 2013

For anyone who is interested in helping out on this one, the data set is here: http://archive.ics.uci.edu/ml/datasets/Wine

Answer 1

Shashank Prasanna on 27 Aug 2013

Ganesh, what distance metric does the 'literature' use?

The kmeans default is 'sqEuclidean'. You have to make sure you are comparing the same metric. Try changing it to cityblock or any of the other options:

http://www.mathworks.com/help/stats/kmeans.html
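
One way to check whether two reported totals use the same metric is to compute both from a single clustering. The following is only a minimal sketch: X is hypothetical stand-in data (replace it with your own feature matrix), and it relies on the fourth output of kmeans, D, which holds the distance from every point to every centroid (squared Euclidean under the default setting).

```matlab
rng(1);  % fix the random seed so the run is repeatable

% Stand-in data (hypothetical): three blobs in 2-D.
% Replace X with your own n-by-p matrix of features.
X = [randn(40,2); ...
     randn(40,2) + 6; ...
     randn(40,2) + repmat([6 -6], 40, 1)];

[idx, C, sumd, D] = kmeans(X, 3, ...
    'Distance', 'sqeuclidean', ...
    'Start', 'sample', ...
    'Replicates', 10, ...
    'MaxIter', 1000);

n = size(X, 1);

% Pick, for each point, the distance to its own centroid.
% With 'sqeuclidean' these values are squared Euclidean distances.
ownSq = D(sub2ind(size(D), (1:n)', idx));

totalSquared   = sum(ownSq);        % same quantity as sum(sumd)
totalUnsquared = sum(sqrt(ownSq));  % sum of plain Euclidean distances

fprintf('Sum of squared distances:   %g\n', totalSquared);
fprintf('Sum of unsquared distances: %g\n', totalUnsquared);
```

The two totals can differ by orders of magnitude on the same partition, so it is worth confirming which one a paper reports. Note that the partition here was found by minimizing the squared total, so the unsquared figure is not necessarily the best achievable under that other criterion.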

Answer 2

Ganesh on 27 Aug 2013

Thanks for the reply, Shashank. The literature used 'sqEuclidean', and so did I.

1 Comment

tryhard on 29 Aug 2013

Could you post a link to the relevant article? I get the same result you do. It seems like they might have performed some sort of pre-processing on the data.

Answer 3

gheorghe gardu on 1 Nov 2015

Could you post the MATLAB code that you used? Thank you in advance.

Answer 4

Paul Munro on 21 Feb 2023

The large distance sum you report makes me think that you did not rescale the data. Variable 13 is in the thousands and will overwhelm the effect of the other variables. You will probably get better results if you rescale each variable separately (z-scoring, for example).
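
As a rough illustration of the rescaling idea, here is a minimal sketch that clusters the same data before and after z-scoring. The data are hypothetical stand-ins, not the wine measurements: column 1 carries the group structure on a small scale, and column 2 is large-scale noise that plays the role of a variable in the thousands. The names X, g, and Xz are made up for this example.

```matlab
rng(1);  % fix the random seed so the run is repeatable

% Stand-in data (hypothetical): 3 groups of 40 points each.
g  = [ones(40,1); 2*ones(40,1); 3*ones(40,1)];  % known group labels
x1 = 3*g + 0.5*randn(120,1);                    % small-scale, informative
x2 = 1000 + 300*randn(120,1);                   % large-scale, uninformative
X  = [x1, x2];

% z-score each column: subtract its mean, divide by its standard deviation.
Xz = zscore(X);

[idxRaw, ~, sumdRaw] = kmeans(X,  3, 'Replicates', 20);
[idxZ,   ~, sumdZ  ] = kmeans(Xz, 3, 'Replicates', 20);

fprintf('Total sum of distances, raw data:      %g\n', sum(sumdRaw));
fprintf('Total sum of distances, z-scored data: %g\n', sum(sumdZ));

% Cross-tabulate known groups against cluster assignments.
disp('Known group vs. cluster (raw data):');
disp(crosstab(g, idxRaw));
disp('Known group vs. cluster (z-scored data):');
disp(crosstab(g, idxZ));
```

The two totals are in different units, so they are not directly comparable with each other. The cross-tabulations are the more useful check of how much the large-scale column drives the raw clustering.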
