7 views (last 30 days)

Of course, a great plus of matlab is to be able to do a matrix multiplication without writing 3 loops, and with lots of syntax sugar (for example, imagine having to write C = A.mult(B) each time). Moreover, it will be fast because the interpreter just reads 3 symbols (namely: A*B) and turn to assembly or whatever low level code matlab uses there.

However, what I often have is that I have a set of data points of multiple dimensions, for example 3, and another set of data points, again with for example 3 dimensions. Then I want to compare each data point of one set to another using a distance measure. If my distance measure is a dot product, then of course a matrix product would suffice:

A : m x 3 matrix

B : 3 x n matrix

C=A*B, then C(i,j) is the distance between data point i in set A and data point j in set B.

But what if my distance measure is euclidean? Then I would like to be able to do the same. However, whenever the * operation does a multiplication, it should subtract and square, and whenever the * operation does a sum, I also want it to do a sum. C=A*B would create a simular matrix as above, yet using euclidean distance.

Of course, I can simulate this writing a function, maybe one forloop and some matrix magic. However, I want (1) syntax sugar and (2) fast c/assembly implementation. Moreover, I think such a use case is actually quite realistic. I often want to find for each x out of M data points to which of some other reference database of N data points, x lies closest (using euclidean distance). For example in k-means when you try to decide to wich centroid each data point belongs.

So, does this exist? Would this be a nice feature? Or am I the only one who sees this as a 'nice to have'? Should I write a c-file for this and mex it?

Matt J
on 7 Jun 2013

Edited: Matt J
on 7 Jun 2013

You can get something like what you're after using this

and using my INTERDISTS utiltity below. For example

>> A=rand(2,3), B=rand(3,2),

A =

0.9448 0.4893 0.9001

0.4909 0.3377 0.3692

B =

0.1112 0.2417

0.7803 0.4039

0.3897 0.0965

>> Ops.mtimes=@(a,b)interdists(a.',b);

>> A=DataObj(A,'Ops',Ops); B=DataObj(B,'Ops',Ops);

>> Euclidean = A*B

Euclidean =

1.0198 1.0712

0.5834 0.3753

My overall experience though is that syntactic sugar is rarely ever worth it. Better just to use the interdists function directly.

function Graph=interdists(A,B)

%Finds the graph of distances between point coordinates

%

% (1) Graph=interdists(A,B)

%

% in:

%

% A: matrix whose columns are coordinates of points, for example

% [[x1;y1;z1], [x2;y2;z2] ,..., [xM;yM;zM]]

% but the columns may be points in a space of any dimension, not just 3D.

%

% B: A second matrix whose columns are coordinates of points in the same

% Euclidean space. Default B=A.

%

%

% out:

%

% Graph: The MxN matrix of separation distances in l2 norm between the coordinates.

% Namely, Graph(i,j) will be the distance between A(:,i) and B(:,j).

%

%

% (2) interdists(A,'noself') is the same as interdists(A), except the output

% diagonals will be NaN instead of zero. Hence, for example, operations

% like min(interdists(A,'noself')) will ignore self-distances.

%

% See also getgraph

noself=false;

if nargin<2

B=A;

elseif ischar(B)&&strcmpi(B,'noself')

noself=true;

B=A;

end

N=size(A,1);

B=reshape(B,N,1,[]);

Graph=l2norm(bsxfun(@minus, A, B),1);

Graph=squeeze(Graph);

if noself

n=length(Graph);

Graph(linspace(1,n^2,n))=nan;

end

Matt J
on 7 Jun 2013

Btw, I don't think it's a good idea to actually 'overwrite' the mtimes operations, as I can not imagine matlab without the original matrix multiplicaiton

Ah, well, the original mtimes isn't gone. You just have to convert back to the ordinary matrix data type to use it,

>> Euclidean = A*B

Euclidean =

0.6025 0.6298

0.7382 0.2923

>> DotProduct= double(A)*double(B)

DotProduct =

0.5023 0.7829

0.6347 1.1619

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!
## 0 Comments

Sign in to comment.