How to compare columns of a matrix with a column vector and check for similarity?

Question

Liam Holbeche-Smith on 17 Mar 2021

https://nl.mathworks.com/matlabcentral/answers/775647-how-to-compare-columns-of-a-matrix-with-a-column-vector-and-check-for-similarity

Answered: Vineet Joshi on 22 Mar 2021

Good Afternoon,

I have a matrix, M, containing 29 rows and 92 columns of data. The matrix contains numeric values ranging from 1 to 8. I would like to compare each column of M with a column vector, V, and identify which columns of M are most similar to V.

For example, if the first element in V is 1, then a similar column in M would contain the value 1 in the first row. Likewise, if the second element in V is 6, then a similar column in M would contain 6 in the second row. The goal is to identify the column(s) of M most similar to V and, if possible, rank the columns from most similar to least similar.

Had the matrix been smaller I would have been able to do this by hand, however I wondered if there was a way to automate this process in MATLAB to save time and possibly handle larger data sets.

3 Comments

Steven Lord on 17 Mar 2021

Define "similar".

x = 1:5

x = 1×5

1 2 3 4 5

y = [1:4 10] % most elements equal to the corresponding elements in x

y = 1×5

1 2 3 4 10

z = x-1 % parallel to x

z = 1×5

0 1 2 3 4

w = x + (-0.5:0.25:0.5) % smallest sum of squares difference from x among (y, z, w)

w = 1×5

0.5000 1.7500 3.0000 4.2500 5.5000

plot(x, x, 'r-o', x, y, 'k:', x, z, 'c--', x, w, 'g.-')

Which of y, z, and w do you consider most "similar" to x and why?
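Each of the three notions of similarity hinted at in the comments above can be turned into a number. The following is a minimal sketch, not part of the original comment; the names candidates, names, nEqual and sse are illustrative. It assumes the values are ordinary numeric data, since the squared-difference and correlation measures are not meaningful for unordered category labels.

```matlab
x = 1:5;
y = [1:4 10];
z = x - 1;
w = x + (-0.5:0.25:0.5);

candidates = {y, z, w};
names = {'y', 'z', 'w'};

for k = 1:numel(candidates)
    c = candidates{k};
    nEqual = sum(c == x);        % number of positions with identical values
    sse = sum((c - x).^2);       % sum of squared differences from x
    R = corrcoef(x, c);          % 2-by-2 correlation matrix; R(1,2) is the coefficient
    fprintf('%s: equal = %d, SSE = %.4f, corr = %.4f\n', names{k}, nEqual, sse, R(1,2));
end
```

Each measure can favour a different candidate, which is why the definition of "similar" has to be fixed before the columns can be ranked.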

Liam Holbeche-Smith on 17 Mar 2021

Hi Steven,

For the type of analysis I am performing, similarity is somewhat ambiguous. I will do my best to explain below:

% Example
A = [1 2 3 4 5]';
% Consider vectors representing columns of a matrix
B = [1 2 3 4 5]' % Clearly the best match, 5 out of 5 matches in the same position
C = [0 2 3 4 5]' % Also 'similar' with 4 out of 5 matches in the same position
D = [2 3 4 5 0]' % 4 out of 5 matches, i.e. the same sequence, but in a different position

The data I am working with is in fact categorical weather data from a post-processing output. Each element corresponds to a categorical weather pattern. Each column of the matrix is one output year, and each row holds the category that occurred on a given day.

What I am trying to achieve is to find the sequences of categories that are most similar. In this respect, the sequence is more important than the position; however, in the example above, B is clearly the best match to A, then I would consider C the next best match, followed by D.

I do not expect to have any results like B, this is extremely unlikely. It is more likely that I will find instances of C and D.

I hope this helps.
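The two kinds of match described in this comment, same position versus same sequence in a shifted position, can be counted separately. This is a rough sketch rather than anything posted in the thread; the names cols, posMatches and bestShiftMatches are illustrative. It assumes MATLAB R2016b or later for implicit expansion in `cols == A`, and it treats "different position" as a circular shift.

```matlab
A = [1 2 3 4 5]';
cols = [1 2 3 4 5; 0 2 3 4 5; 2 3 4 5 0]';   % columns are B, C and D from the example
n = numel(A);

% Matches in the same position, one count per column.
posMatches = sum(cols == A, 1);

% Best match count over all circular shifts (k = 0 is the unshifted case).
bestShiftMatches = zeros(1, size(cols, 2));
for k = 0:n-1
    shifted = circshift(cols, k, 1);          % shift every column down by k rows
    bestShiftMatches = max(bestShiftMatches, sum(shifted == A, 1));
end

disp(posMatches)         % 5 4 0
disp(bestShiftMatches)   % 5 4 4
```

B and C score well on position alone, while D only scores once shifts are allowed, so ranking C ahead of D requires either weighting the positional count more heavily or penalising shifted matches.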


Answer 1

Vineet Joshi on 22 Mar 2021

https://nl.mathworks.com/matlabcentral/answers/775647-how-to-compare-columns-of-a-matrix-with-a-column-vector-and-check-for-similarity#answer_654582

As per my understanding of the question, you aim to automate the process of sorting the columns of a matrix based on their similarity with a column vector, where similarity is defined by comparing the sequences of elements.

This can be done as follows:

Count the exact element-wise matches between the column and the reference vector.
If no elements match, shift the column's elements using circshift and count the element-wise matches again.

Assign a score to each column so that the columns can be sorted by it.

The following code illustrates the approach.

%Sample data matrix (each column is one candidate sequence)
Data = [1,2,3,4,5;0,2,3,4,5;2,3,4,5,0;7,3,4,5,0;3,4,5,1,2;7,8,9,10,11]';
%Reference vector
A = [1,2,3,4,5]';
%Loop through each column in the matrix.
for i = 1:size(Data,2)
    %Element-wise matching score between column i and the reference
    %vector.
    Value = sum(Data(:,i) == A);

    %If no elements match, try circular shifts to find matching
    %sequences.
    if Value == 0

        %Several shifts may produce matches, so keep the best one.
        max_val = 0;

        %Try every shift until the vector is back to its original position.
        for j = 1:numel(A)

            %Rotate the vector by j elements.
            Rotated_vec = circshift(Data(:,i),j);

            %Keep the maximum element-wise match.
            max_val = max(max_val,sum(Rotated_vec == A));

        end

        %Subtract 1 as a penalty for needing a rotation.
        Value = max_val - 1;
    end
    fprintf('Column = %d, Value = %d\n',i,Value)
end
%Output
Column = 1, Value = 5
Column = 2, Value = 4
Column = 3, Value = 3
Column = 4, Value = 2
Column = 5, Value = 4
Column = 6, Value = -1
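
The loop above prints the scores but does not rank the columns. A possible extension, sketched here under the same scoring rule, stores the scores in a vector and sorts them. The names scores, ranking and sortedScores are illustrative and do not appear in the original answer.

```matlab
Data = [1,2,3,4,5;0,2,3,4,5;2,3,4,5,0;7,3,4,5,0;3,4,5,1,2;7,8,9,10,11]';
A = [1,2,3,4,5]';

nCols = size(Data, 2);
scores = zeros(1, nCols);
for i = 1:nCols
    % Positional matches between column i and the reference vector.
    scores(i) = sum(Data(:, i) == A);
    if scores(i) == 0
        % No positional match: take the best circular shift, minus a penalty of 1.
        best = 0;
        for j = 1:numel(A) - 1
            best = max(best, sum(circshift(Data(:, i), j) == A));
        end
        scores(i) = best - 1;
    end
end

% Rank the columns from most similar to least similar.
[sortedScores, ranking] = sort(scores, 'descend');
disp([ranking; sortedScores])
```

Because the sizes are taken from Data and A, the same sketch should work unchanged for a 29-by-92 matrix. Note that shifts are only tried when a column has no positional matches at all, so a column with one positional match but a long shifted sequence keeps its low positional score.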
