kmeans clustering of matrices

Susan on 4 Jun 2021

https://nl.mathworks.com/matlabcentral/answers/848085-kmeans-clustering-of-matrices

Commented: Susan on 7 Jun 2021

Hi All,

I have a 12-by-190 cell array. Each cell contains a complex matrix of size n-by-550 (each row is an observation on 550 variables; the number of observations n varies from cell to cell, but the variables are the same for every matrix). I need to classify these matrices using kmeans, and I am trying to cluster the data as one large set (i.e., 12*190*n*550 values) rather than working with each matrix separately.

Any idea how I can do that? Is there any method better than kmeans for clustering these data? Any input would be appreciated.

11 Comments

Susan on 4 Jun 2021

Edited: Susan on 4 Jun 2021

Thank you so much for your response.

You're right. "I just want to make one large (but 2-dimensional) array, by concatenating each individual matrix such that I have a (sum of all the individual n values from the 12-by-190 smaller matrices)-by-550 matrix, with lots and lots of observations, but still 550 features".
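
The concatenation described above can be sketched as follows. This is a Python/NumPy illustration rather than MATLAB (in MATLAB the stacking itself is `vertcat(C{:})`, which walks the cell array in column-major order). It assumes the 12-by-190 cell array has already been flattened into a Python list of 2-D arrays that all have the same number of columns; `stack_matrices` is a hypothetical helper name. It also records which matrix each row came from, which is needed later to map row-level results back to matrices.

```python
import numpy as np

def stack_matrices(mats):
    """Vertically stack a list of (n_i x p) arrays into one (sum of n_i) x p array.

    Returns the stacked array and a vector giving, for every row,
    the index of the matrix it came from.
    """
    if not mats:
        raise ValueError("need at least one matrix")
    p = np.asarray(mats[0]).shape[-1]
    for m in mats:
        if m.ndim != 2 or m.shape[1] != p:
            raise ValueError("all matrices must be 2-D with the same number of columns")
    stacked = np.vstack(mats)
    # Row-to-matrix provenance: matrix i contributes n_i copies of i.
    owner = np.repeat(np.arange(len(mats)), [m.shape[0] for m in mats])
    return stacked, owner

# Toy example: three matrices with different n but the same p = 4.
rng = np.random.default_rng(0)
mats = [rng.standard_normal((n, 4)) for n in (2, 5, 3)]
X, owner = stack_matrices(mats)
print(X.shape)  # (10, 4)
print(owner)    # [0 0 1 1 1 1 1 2 2 2]
```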

As you mentioned, I want to cluster matrices, not observations. For each matrix in the 12-by-190 = 2280 cell array I have one label from 0 to 10. (In most examples I've seen so far, a label is assigned to either a scalar or a vector, but here a label is assigned to a matrix.) Each of these cells is the output of one experiment, and we got many of them by changing some parameters, so I think we can consider them, in a sense, as observations. So I have 2280 cells, each containing an n-by-p matrix, and a 1-by-2280 vector that contains the labels.

My aim is to see whether the matrices with the same label can be clustered together.

Later, when I have an unseen input matrix (n-by-550), I want to find which cluster it belongs to and use that to predict its label.
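
One possible way to place an unseen matrix into an existing clustering, sketched under assumptions not stated in the thread: the clustering was done on individual rows (observations), the data are real-valued (for example, only the real part is kept), and the matrix is assigned to the cluster that receives most of its rows. `assign_rows` and `assign_matrix` are hypothetical Python/NumPy helpers.

```python
import numpy as np

def assign_rows(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def assign_matrix(M, centroids):
    """Assign a whole (n x p) matrix to the cluster that most of its rows fall into."""
    votes = np.bincount(assign_rows(M, centroids), minlength=len(centroids))
    return int(votes.argmax())  # ties go to the lowest cluster index

# Two centroids in a 2-feature space and an unseen 3-by-2 matrix.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
M_new = np.array([[9.5, 10.2], [10.1, 9.8], [0.3, -0.1]])
print(assign_rows(M_new, centroids))    # [1 1 0]
print(assign_matrix(M_new, centroids))  # 1
```

Mapping that cluster to a label (for example, the most common label among the training matrices assigned to it) would be a further, separate assumption.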

Moreover, I'm interested in figuring out which of the p features are most impactful and which ones I can get rid of.
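
For the feature-ranking question, one simple filter method (swapped in here; it is not from the thread) is the Fisher score: for each feature, the spread of the class means around the overall mean divided by the pooled within-class variance. Because the labels belong to matrices, it would be applied to a table with one row per matrix (for example, per-matrix column means). `fisher_scores` is a hypothetical Python/NumPy helper; a low score suggests, but does not prove, that a feature can be dropped, since features that are weak individually can still help in combination.

```python
import numpy as np

def fisher_scores(F, y):
    """Fisher score of each column of F with respect to class labels y.

    Higher scores mean the feature separates the classes better
    (large between-class spread relative to within-class spread).
    """
    F = np.asarray(F, dtype=float)
    y = np.asarray(y)
    overall_mean = F.mean(axis=0)
    between = np.zeros(F.shape[1])
    within = np.zeros(F.shape[1])
    for c in np.unique(y):
        Fc = F[y == c]
        between += len(Fc) * (Fc.mean(axis=0) - overall_mean) ** 2
        within += len(Fc) * Fc.var(axis=0)
    return between / np.maximum(within, 1e-12)  # guard against zero variance

# Column 0 separates the two labels; column 1 has equal class means.
F = np.array([[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [1.1, 2.0]])
y = np.array([0, 0, 1, 1])
scores = fisher_scores(F, y)
order = np.argsort(scores)[::-1]  # most to least informative
print(order)  # [0 1]
```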

Please let me know if you need more details to be able to help. Thanks!

the cyclist on 4 Jun 2021

Wish you had mentioned the labels earlier. :-)

OK, so each matrix is the result of an experiment. And each experiment results in n measurements of 550 features. (The value of n can vary for each experiment.) Each experiment also results in a label.

Then, given a new matrix (with unknown label), you want to assign the correct label.

The major stumbling block (at least in my mind) here is that your measured variables are features of the observations, not of the matrices. If you want to predict the label of an unseen matrix, you need features of the matrices. Presumably you can build features of the matrices from the features of the observations, but I'm not sure how that would work. (Specifically, I don't see how k-means helps.)
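
As one illustration of building features of the matrices from the features of the observations (an assumption, not something settled in the thread), each n-by-550 matrix can be summarized by per-column statistics, which gives a vector whose length does not depend on n. The sketch below uses the mean and standard deviation of the real part of each column; `matrix_features` and `feature_table` are hypothetical Python/NumPy helpers, and other summaries (magnitude, quantiles, the imaginary part) could be appended the same way.

```python
import numpy as np

def matrix_features(M):
    """Summarize one (n x p) matrix as a length-2p vector: column means, then column stds.

    Only the real part is used; the result does not depend on n.
    """
    R = np.real(np.asarray(M))
    return np.concatenate([R.mean(axis=0), R.std(axis=0)])

def feature_table(mats):
    """One row of summary features per matrix."""
    return np.vstack([matrix_features(M) for M in mats])

# A 2-by-2 complex matrix whose real part is [[1, 0], [3, 0]].
M = np.array([[1 + 2j, 0], [3, 4j]])
print(matrix_features(M))  # [2. 0. 1. 0.]

# Matrices with different n but the same p = 4 give a 3-by-8 table.
rng = np.random.default_rng(0)
mats = [rng.standard_normal((n, 4)) for n in (2, 5, 3)]
print(feature_table(mats).shape)  # (3, 8)
```

With 2280 matrices and 550 columns this would yield a 2280-by-1100 table, one row per labeled experiment, which is the shape standard classifiers expect.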

I think I would try to simplify this, to really sort out the specifics of how to do this. For example:

imagine you have the same n for all matrices (and imagine it is small, like 5)
instead of 550 features, suppose you only have 3
instead of 12x190 matrices, just fix that number to something like 10
instead of 11 labels, maybe just 2 or 3
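
A toy version of the problem along those lines can be generated as below (Python/NumPy; `make_toy_problem` and the way labels shape the data are assumptions made purely for experimentation). Each label gets its own mean vector and every row of a matrix with that label is drawn around it, so "matrices with the same label are similar" holds by construction; with real data that is exactly what has to be checked.

```python
import numpy as np

def make_toy_problem(num_matrices=10, n=5, p=3, num_labels=2, seed=0):
    """Return (mats, labels): num_matrices arrays of shape (n, p) and one label per matrix."""
    rng = np.random.default_rng(seed)
    label_means = rng.normal(0.0, 3.0, size=(num_labels, p))  # one centre per label
    labels = np.arange(num_matrices) % num_labels               # balanced labels
    # Every row of a matrix is its label's centre plus unit-variance noise.
    mats = [label_means[c] + rng.standard_normal((n, p)) for c in labels]
    return mats, labels

mats, labels = make_toy_problem()
print(len(mats), mats[0].shape)  # 10 (5, 3)
print(labels)                    # [0 1 0 1 0 1 0 1 0 1]
```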

Then really think through what you really mean by "some matrices are more similar to each other, and therefore should have the same label". That thinking might help you see the proper mathematical method for getting there.

Image Analyst on 5 Jun 2021

OK, so you're just going to consider the real part of the complex numbers. So, how many clusters do you believe there to be? What did you put in for k (if you put in anything)? Do you think there are 3 clusters? 6? 100? Or no idea?

https://www.mathworks.com/discovery/machine-learning-models.html?s_eid=psm_dl&source=15308

Susan on 5 Jun 2021

@Image Analyst There would be 19 clusters.

Accepted Answer

Walter Roberson on 5 Jun 2021

https://nl.mathworks.com/matlabcentral/answers/848085-kmeans-clustering-of-matrices#answer_717020

k-means is not the right technology for situations in which you have labels, except when the labels have numeric values that can be made commensurate with the numeric coordinates. For example, if you can say that having a label differ by no more than 1 is 10.28 times as important as having column 3 differ by 1, then you might be able to use k-means by adding the numeric value of the label as an additional coordinate. But this is not the usual case.
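
A minimal sketch of the label-as-coordinate idea, in Python/NumPy: the numeric label is appended as an extra column multiplied by a weight (the role played by the 10.28 in the example), and plain Lloyd's k-means is run on the result. The helper names are hypothetical, and the deterministic farthest-first initialization is used only to keep the example reproducible; k-means++ seeding with several restarts is the more usual choice.

```python
import numpy as np

def nearest(X, C):
    """Index of the nearest row of C for every row of X (squared Euclidean distance)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def farthest_first(X, k):
    """Deterministic init: start at row 0, then repeatedly add the row farthest from those chosen."""
    idx = [0]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx].astype(float)

def kmeans(X, k, iters=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    C = farthest_first(X, k)
    for _ in range(iters):
        a = nearest(X, C)
        newC = np.array([X[a == j].mean(axis=0) if np.any(a == j) else C[j]
                         for j in range(k)])
        if np.allclose(newC, C):
            break
        C = newC
    return nearest(X, C), C

def append_weighted_label(X, labels, weight):
    """Add the numeric label as one more coordinate, scaled by `weight`."""
    return np.column_stack([X, weight * np.asarray(labels, dtype=float)])

X = np.array([[0.0], [0.1], [0.2], [0.3]])
labels = np.array([0, 1, 0, 1])
print(kmeans(append_weighted_label(X, labels, 0.0), 2)[0])   # [0 0 1 1]  feature only
print(kmeans(append_weighted_label(X, labels, 10.0), 2)[0])  # [0 1 0 1]  label dominates
```

With weight 0 the clusters follow the feature alone; with a large weight they follow the label. Choosing that weight is exactly the commensurability judgment described above, and there is rarely a principled value for it.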

When you have matrices of numbers and a label associated with each matrix, then Deep Learning or (Shallow) Neural Network techniques are more appropriate. Consider that if you have a matrix of data and a label, and the matrices are all the same size, that situation could be treated the same way as if the matrix of data were an "image".
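
As a minimal sketch of this supervised alternative (not MATLAB's Deep Learning or Neural Network tooling; everything here is a hypothetical Python/NumPy illustration), a single-layer softmax classifier, the simplest shallow network, can be trained on a table with one row per matrix: either fixed-length summary features or, if all matrices had the same size, the flattened matrix values. The learning rate and epoch count are arbitrary choices for the toy data.

```python
import numpy as np

def train_softmax(F, y, num_classes, lr=0.5, epochs=500, seed=0):
    """Multinomial logistic regression (a single-layer network) by batch gradient descent.

    F: (m x d) table, one row per matrix. y: integer labels in 0..num_classes-1.
    """
    rng = np.random.default_rng(seed)
    mu, sd = F.mean(axis=0), F.std(axis=0) + 1e-12  # standardize features
    Z = (F - mu) / sd
    W = 0.01 * rng.standard_normal((Z.shape[1], num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]                       # one-hot targets
    for _ in range(epochs):
        logits = Z @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)            # softmax probabilities
        G = (P - Y) / len(Z)                         # d(mean cross-entropy)/d(logits)
        W -= lr * (Z.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b, mu, sd

def predict(F, W, b, mu, sd):
    """Most probable class for each row of F."""
    return (((F - mu) / sd) @ W + b).argmax(axis=1)

F = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])
params = train_softmax(F, y, num_classes=2)
print(predict(F, *params))  # [0 0 0 1 1 1]
```

For the real problem `num_classes` would be 11 (labels 0 to 10), and accuracy has to be judged on held-out matrices, since training-set accuracy says little about unseen ones.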

5 Comments

Walter Roberson on 7 Jun 2021

Yes! This is expected, and is a fundamental challenge of this kind of learning: to determine the best subset of data to train on for the highest accuracy and lowest over-training.

k-fold cross validation is indeed one of the techniques that is used. It will reduce the variation you see, but do expect that there will still be some variation depending on the random choice.
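
A minimal k-fold cross-validation sketch in Python/NumPy (in MATLAB, `cvpartition` with the `'KFold'` option builds comparable partitions). The folds are drawn over matrices, i.e. over rows of a one-row-per-matrix table, so observations from one experiment never end up in both the training and test sets. `kfold_indices`, `cross_validate`, and the nearest-centroid model used in the example are hypothetical; changing `seed` changes the folds, which is the residual variation mentioned above.

```python
import numpy as np

def kfold_indices(m, k, seed=0):
    """Shuffle 0..m-1 and split into k disjoint test folds of nearly equal size."""
    if not 2 <= k <= m:
        raise ValueError("need 2 <= k <= m")
    perm = np.random.default_rng(seed).permutation(m)
    return np.array_split(perm, k)

def cross_validate(F, y, k, fit, predict, seed=0):
    """Mean accuracy and per-fold accuracies; each fold is held out once for testing."""
    accs = []
    for test in kfold_indices(len(F), k, seed):
        train = np.setdiff1d(np.arange(len(F)), test)
        model = fit(F[train], y[train])
        accs.append(float(np.mean(predict(model, F[test]) == y[test])))
    return float(np.mean(accs)), accs

# Hypothetical model: nearest class centroid.
def fit_centroids(F, y):
    classes = np.unique(y)
    return classes, np.array([F[y == c].mean(axis=0) for c in classes])

def predict_centroids(model, F):
    classes, C = model
    return classes[((F[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)]

rng = np.random.default_rng(1)
F = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(6, 1, (20, 3))])
y = np.repeat([0, 1], 20)
mean_acc, per_fold = cross_validate(F, y, 5, fit_centroids, predict_centroids)
print(mean_acc, per_fold)
```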

Susan on 7 Jun 2021

Thanks!