# How to compare columns of a matrix with a column vector and check for similarity?

Liam Holbeche-Smith on 17 Mar 2021
Answered: Vineet Joshi on 22 Mar 2021
Good Afternoon,
I have a matrix, M, containing 29 rows and 92 columns of data. The matrix contains numeric values ranging from 1 to 8. I would like to compare each column of M with a column vector, V, and identify which columns of M are most similar to V.
For example, if the first element in V is 1, then a similar column in M would contain the value 1 in the first row. Likewise, if the second element in V is 6, then a similar column in M would contain 6 in the second row. The goal is to identify the column(s) of M most similar to V and, if possible, rank the columns from most similar to least similar.
Had the matrix been smaller, I would have been able to do this by hand. However, I wondered if there was a way to automate this process in MATLAB to save time and possibly handle larger data sets.
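The purely positional comparison described above can be sketched as follows. This is only a minimal illustration, assuming that similarity means the number of rows in which a column agrees with V. The random M and V are placeholder data, and the names `scores`, `sortedScores`, and `ranking` are illustrative. The comparison `M == V` relies on implicit expansion, which is available from R2016b onward.

```matlab
% Placeholder data standing in for the real 29-by-92 matrix (assumption).
rng(0);
M = randi(8, 29, 92);
V = randi(8, 29, 1);

% For each column, count the rows whose value equals the one in V.
scores = sum(M == V, 1);

% Rank the columns from most to least similar.
[sortedScores, ranking] = sort(scores, 'descend');

% Show the five best-matching column indices (row 1) and their scores (row 2).
disp([ranking(1:5); sortedScores(1:5)]);
```

Here `ranking(1)` is the index of the column that agrees with V in the most positions. Columns with equal scores are tied, so their relative order carries no meaning.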
Liam Holbeche-Smith on 17 Mar 2021
Hi Steven,
For the type of analysis I am performing, similarity is somewhat ambiguous. I will do my best to explain below:
% Example
A = [1 2 3 4 5]';
% Consider vectors representing columns of a matrix
B = [1 2 3 4 5]' % Clearly the best match, 5 out of 5 matches in the same position
C = [0 2 3 4 5]' % Also 'similar' with 4 out of 5 matches in the same position
D = [2 3 4 5 0]' % 4 out of 5 matches, i.e. the same sequence, but in a different position
The data I am working with is in fact categorical weather data from a postprocessing output. Each element corresponds to a categorical weather pattern. Each column of the matrix is an output year, and each row is the category that occurred on that day.
What I am trying to achieve is to find the sequences of categories that are most similar. In this respect, the sequence is more important than the position. In the example above, however, B is clearly the best match to A, then I would consider C to be the next best match, followed by D.
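As a quick check on the example above, here is a short sketch that counts same-position matches only. The name `positional` is illustrative.

```matlab
A = [1 2 3 4 5]';
B = [1 2 3 4 5]';
C = [0 2 3 4 5]';
D = [2 3 4 5 0]';

% Number of rows where each candidate agrees with A in the same position.
positional = [sum(B == A), sum(C == A), sum(D == A)];
disp(positional)   % scores for B, C, D: 5 4 0
```

A same-position count ranks B ahead of C but gives D a score of 0, even though D contains the run 2 3 4 5. Crediting D therefore needs a comparison that tolerates an offset.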
I do not expect to have any results like B, as this is extremely unlikely. It is more likely that I will find instances of C and D.
I hope this helps.

Vineet Joshi on 22 Mar 2021
As per my understanding of the question, you aim to automate the process of sorting the columns of a matrix based on their similarity with a column vector, where similarity is defined by comparing the sequences of elements.
This can be done as follows:
1. Count the element-wise matches between the column and the reference vector.
2. If no elements match in position, shift the column elements using circshift and count the element-wise matches again.
Assign a value to each column based on these counts so that the columns can be sorted.
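To show what the shift in step 2 does, here is a small sketch using vectors A and D from the question. The names `shifted`, `matchesBefore`, and `matchesAfter` are illustrative.

```matlab
A = [1 2 3 4 5]';
D = [2 3 4 5 0]';

% circshift(D, 1) moves every element down one row; the last element wraps to the top.
shifted = circshift(D, 1);          % [0 2 3 4 5]'

matchesBefore = sum(D == A);        % 0 matches in the original position
matchesAfter  = sum(shifted == A);  % 4 matches after shifting by one
```

Note that circshift wraps around, so the end of a column is treated as adjacent to its start. Whether that is appropriate for daily weather categories is a judgment call for the analysis.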
The following code illustrates this approach on sample data.
% Sample data matrix (each column is a candidate sequence)
Data = [1,2,3,4,5;0,2,3,4,5;2,3,4,5,0;7,3,4,5,0;3,4,5,1,2;7,8,9,10,11]';
% Reference vector
A = [1,2,3,4,5]';
% Loop through each column in the matrix.
for i = 1:size(Data,2)
    % Element-wise matching score between column i and the reference vector.
    Value = sum(Data(:,i) == A);
    % If no elements match, try circular shifts to find matching sequences.
    if Value == 0
        % Several shifts may produce matches, so keep the best score.
        max_val = 0;
        % Try every shift until the column is back in its original position.
        for j = 1:numel(A)
            % Rotate the column by j elements.
            Rotated_vec = circshift(Data(:,i),j);
            % Keep the maximum element-wise match found so far.
            max_val = max(max_val,sum(Rotated_vec == A));
        end
        % Subtract 1 as a penalty for needing a rotation.
        Value = max_val - 1;
    end
    fprintf('Column = %d, Value = %d\n',i,Value)
end
%Output
Column = 1, Value = 5
Column = 2, Value = 4
Column = 3, Value = 3
Column = 4, Value = 2
Column = 5, Value = 4
Column = 6, Value = -1
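The loop above prints a value per column but does not produce the ranking asked for in the question. One way to get it, sketched here under the same scoring rule, is to store the values in a vector and sort them. The names `nCols`, `best`, `sortedValue`, and `order` are illustrative.

```matlab
Data = [1,2,3,4,5;0,2,3,4,5;2,3,4,5,0;7,3,4,5,0;3,4,5,1,2;7,8,9,10,11]';
A = [1,2,3,4,5]';

nCols = size(Data, 2);
Value = zeros(1, nCols);
for i = 1:nCols
    % Same-position matches between column i and the reference vector.
    Value(i) = sum(Data(:,i) == A);
    if Value(i) == 0
        % No positional match: take the best score over all nontrivial shifts.
        best = 0;
        for j = 1:numel(A)-1
            best = max(best, sum(circshift(Data(:,i), j) == A));
        end
        % Subtract 1 as a penalty for needing a rotation.
        Value(i) = best - 1;
    end
end

% Rank the columns from most to least similar.
[sortedValue, order] = sort(Value, 'descend');
```

For the sample data, `order` starts with column 1 and ends with column 6, and columns 2 and 5 tie with a value of 4. The shift loop stops at `numel(A)-1` because shifting by the full length returns the original column, which has already been scored.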

R2020b
