Question about k-means clustering

How can I cluster a data set using all of its columns with k-means clustering (k = 2)? The data set is here: https://archive.ics.uci.edu/ml/machine-learning-databases/hepatitis/

7 Comments

What is the problem? Is it an issue with the dataset or with k-means?
Note: if you want help, you need to make it easy for people to help you.
When I use two columns of the dataset it works, but when I try to use all the columns it doesn't work.
load hepatitis;
X=hepatitis(:,16:17);
figure;
plot(X(:,1), X(:,2), 'k*');
title 'Hepatitis Data';
hold on;
opts = statset('Display','final');
[idx,C] = kmeans(X,2,'Distance','sqeuclidean',...
'Replicates',5,'Options',opts);
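A minimal sketch, not taken from the thread: kmeans treats each row of an n-by-p matrix as one observation and each column as one variable, so clustering on all columns only means passing the whole matrix instead of hepatitis(:,16:17). The matrix below is synthetic (an assumption standing in for the hepatitis data), and rows containing NaN are dropped with rmmissing before clustering.

```matlab
% Synthetic stand-in for the hepatitis matrix (assumption, for illustration only):
% 40 rows, 6 columns, two groups separated by a shift of 3 in every column.
rng(0);                                   % reproducible random numbers
X = [randn(20, 6); randn(20, 6) + 3];
X(5, 2) = NaN;                            % simulate one missing value

% Keep every column; drop the rows that contain any NaN.
Xc = rmmissing(X);

% kmeans clusters the rows of an n-by-p matrix, so passing all p columns is enough.
opts = statset('Display', 'final');
[idx, C] = kmeans(Xc, 2, 'Distance', 'sqeuclidean', ...
    'Replicates', 5, 'Options', opts);

% C has one row per cluster and one column per variable (here 2-by-6).
disp(size(C))
```

With the real data, X would be the numeric matrix read from the file. Note that the cluster plot later in the thread, plot(X(idx==1,1),X(idx==1,2),...), can only show two of the columns, even when all of them were used for clustering.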
I saved "hepatitis.data" from that web site and it didn't work:
load('hepatitis.data')
X=hepatitis(:,16:17);
figure;
plot(X(:,1), X(:,2), 'k*');
title 'Hepatitis Data';
hold on;
opts = statset('Display','final');
[idx,C] = kmeans(X,2,'Distance','sqeuclidean',...
'Replicates',5,'Options',opts);
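A likely reason load fails on hepatitis.data is that the UCI file marks missing values with '?', which load most likely cannot parse as numbers. One alternative, sketched here under the assumption of R2019a or later (when readmatrix was introduced), is to read the file as comma-separated text and treat '?' as missing, which yields NaN. To keep the example self-contained it first writes a tiny made-up file; for the real data, pass 'hepatitis.data' instead of the temporary file name.

```matlab
% Write a tiny synthetic file in the same style as the UCI file
% (comma-separated, no header, '?' marks a missing value). Illustration only.
fname = [tempname '.data'];
fid = fopen(fname, 'w');
fprintf(fid, '2,30,1.0\n1,50,?\n');
fclose(fid);

% 'FileType','text' is needed because the .data extension is not recognized
% automatically; 'TreatAsMissing','?' turns the '?' placeholders into NaN.
data = readmatrix(fname, 'FileType', 'text', 'Delimiter', ',', ...
    'NumHeaderLines', 0, 'TreatAsMissing', '?');
delete(fname);

disp(data)   % the '?' entry appears as NaN
```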
Please post the actual data file and code that actually works with it.
Doesn't run: load doesn't work. You're not making it easy for us, are you? I'll try to fix it. In the meantime, edit your post and format your code as code by highlighting it and clicking the code icon.
Come on Eeengineer. Please don't waste my time when I try to help you. I used xlsread() instead of load() and that got the data in, but there is no 17th column. Please fix or post your actual code. I'm going to do other stuff now and I'll check back later.
clear all;
close all;
clc;
format long g;
format compact;
fontSize = 15;
fprintf('Beginning to run %s.m ...\n', mfilename);
hepatitis = xlsread('hepatitis.xlsx')
X = hepatitis(:,16:17)
plot(X(:,1), X(:,2), 'k*');
title 'Hepatitis Data';
hold on;
idx=kmeans(X,2);
opts = statset('Display','final');
[idx,C] = kmeans(X,2,'Distance','sqeuclidean',...
'Replicates',5,'Options',opts);
figure;
plot(X(idx==1,1),X(idx==1,2),'r.','MarkerSize',12)
hold on
plot(X(idx==2,1),X(idx==2,2),'b.','MarkerSize',12)
plot(C(:,1),C(:,2),'kx',...
'MarkerSize',15,'LineWidth',3)
legend('Cluster 1','Cluster 2','Centroids',...
'Location','NW')
title 'Cluster Assignments and Centroids'
hold off
Only columns 2 and 15 appear to contain any real data; the rest of the columns contain only 1, 2, or NaN. Which columns do you want to use as variables for clustering? All of them, or just columns 2 and 15?
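The observation that most columns hold only 1, 2, or NaN can be checked programmatically by counting the distinct non-NaN values in each column. The small matrix below is made up for illustration; on the real data, a count of 2 would flag a yes/no style column.

```matlab
% Made-up example (assumption): columns 1 and 3 are 1/2 codes, column 2 is continuous,
% and column 3 has one missing value.
X = [1 34.5 2; 2 51.0 1; 1 23.0 NaN; 2 40.2 2];

% Count the distinct non-NaN values in each column.
nDistinct = zeros(1, size(X, 2));
for j = 1:size(X, 2)
    col = X(:, j);
    nDistinct(j) = numel(unique(col(~isnan(col))));
end

disp(nDistinct)   % expected counts: 2, 4, 2
```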
If I scatter-plot columns 1, 2, and 15, I see this:
hepatitis = xlsread('hepatitis.xlsx')
x = hepatitis(:,1);
y = hepatitis(:, 2);
z = hepatitis(:, 15);
scatter3(x, y, z, 'Filled');
title('Hepatitis Data', 'FontSize', 20);
xlabel('Column 1', 'FontSize', 20);
ylabel('Column 2', 'FontSize', 20);
zlabel('Column 15', 'FontSize', 20);
So where are the clusters? If you include columns 1, 3-14, and 16 as variables, the clusters might be dominated by those columns because they are very discrete (either 1 or 2). Looking at just columns 2 and 15, there don't appear to be any meaningful clusters.
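How much each column influences the result depends partly on its scale, because kmeans with 'sqeuclidean' sums squared raw differences: a column measured in the hundreds outweighs a 1/2 column. A common step is to standardize every column with zscore before clustering; note that this gives the 1/2 columns equal weight rather than removing their influence, so choosing which columns to include remains a judgment call. The mean silhouette value then gives a rough check of whether a 2-cluster split is meaningful (values near 1 suggest well-separated clusters, values near 0 suggest overlapping ones). A sketch on synthetic data (an assumption):

```matlab
% Synthetic data (assumption): column 1 is a 1/2 code, column 2 is a continuous
% measurement on a much larger scale.
rng(1);
X = [randi(2, 60, 1), 100 + 30 * randn(60, 1)];

% zscore rescales each column to mean 0 and standard deviation 1,
% so no column dominates the distance just because of its units.
Xs = zscore(X);

idx = kmeans(Xs, 2, 'Replicates', 5);

% Silhouette values lie in [-1, 1]; their mean summarizes cluster separation.
s = silhouette(Xs, idx);
fprintf('Mean silhouette: %.2f\n', mean(s));
```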
I tried to carry the first plot over into the second plot using all the values. As you said, it comes down to the data in columns 1 and 2. I want to use all the values from the first plot in the second plot. I will follow your advice, thank you.
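To show clusters computed from all columns in a single 2-D figure, one option (not used in the thread) is to project the data onto its first two principal components with pca and color the points by cluster. A sketch on synthetic data (an assumption); the centroids are projected with the same coefficients and centering so they land in the same coordinates as the points.

```matlab
% Synthetic data (assumption): 50 rows, 5 columns, two shifted groups.
rng(2);
X = [randn(25, 5); randn(25, 5) + 2];

Xs = zscore(X);
[idx, C] = kmeans(Xs, 2, 'Replicates', 5);

% Principal components of the standardized data; mu is the column mean pca removed.
[coeff, score, ~, ~, explained, mu] = pca(Xs);
Cp = (C - mu) * coeff(:, 1:2);   % centroids in the same 2-D coordinates as score

figure;
gscatter(score(:, 1), score(:, 2), idx, 'rb', '..', 12);
hold on
plot(Cp(:, 1), Cp(:, 2), 'kx', 'MarkerSize', 15, 'LineWidth', 3)
hold off
xlabel(sprintf('PC 1 (%.0f%% of variance)', explained(1)));
ylabel(sprintf('PC 2 (%.0f%% of variance)', explained(2)));
legend('Cluster 1', 'Cluster 2', 'Centroids', 'Location', 'NW')
title 'Cluster Assignments (PCA projection)'
```

The axis labels report how much of the total variance each component captures, which indicates how faithfully the 2-D picture represents all columns.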


Answers (2)

Eeengineer on 3 Jan 2021
Sorry, and thanks for your help. Here it is.
Eeengineer on 3 Jan 2021
I used columns 16 and 17 only as an example; you can use other columns too. The main problem is how to use all the columns in k-means clustering.

Asked: 2 Jan 2021

Commented: 3 Jan 2021
