Array Row Similarity/Comparison

12 views (last 30 days)
I want to compare rows of two arrays to see which rows are most similar to one another, sort of like clustering. To be clear, I don't want to compare the differences between the numbers, rather the entire row as a whole. Another thing I would like to be able to do is see if particular numbers in the SLP variable occur more often when a number shows up in the same index in the 500z variable. Both variables have 20 columns and 49 rows, but in the following example variables I only included 3 rows and 5 cols.
SLP = [1,3,4,2,3
4,7,6,5,6
1,4,3,3,2]
500z= [9,6,7,6,6
7,5,7,6,8
9,7,6,6,6 ]
An example output I would like is: 1.) A measure of row similarity (perhaps a percentage of similarity or even a cluster number): The most similar rows in SLP are rows 1 and 3: therefore an example output could be a 3x3 matrix (SLP rows 1-3 going down and 500z rows 1-3 going across) with the percentage of similarity between each row. Or it could be in the form of a cluster. Ex: rows 1 and 3 belonging to cluster 1 and row 2 belonging to cluster 2. 2.) Which numbers occur most frequently in the same index between the two variables. So looking at the sample variables, I would get SLP 1 tends to occur with 500z 9. SLP 3 tends to occur with 500z 6, SLP 4 tends to occur with 500z 7, and so on. This could be output as simply as an array where column 1 is the SLP pair and column 2 is the 500z pair. It would be great to be able to have a column 3 as well saying how often the pair occurred.
I've been stuck on this for a while so any help or suggestions of how to best approach this problem would be awesome! I am also fairly new to Matlab, so my wording may not be the best.

Accepted Answer

Walter Roberson
Walter Roberson on 17 Nov 2016
One of the techniques for similarity is sum-of-squares-of-differences between the rows.
It so happens that square root of sum-of-squares-of-differences is equivalent to Euclidean distance. Therefore you can find a similarity measure by using pdist() between the rows.
  1 Comment
Tyler Smith
Tyler Smith on 17 Nov 2016
Thanks, that definitely helps! I ended up using the pdist2(SLP, SLP) to get the matrix I wanted in the first question.

Sign in to comment.

More Answers (0)

Categories

Find more on Statistics and Machine Learning Toolbox in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!