Massive time required for pdist

Hello,
I am using the MATLAB function pdist to calculate the distance between two points. However, I noticed that the function needs a lot of time, even though it uses all four cores. I built this example to demonstrate the massive time consumption. If I calculate the distance between two points with my own code, it is much faster. The example calculates the distance between every point in one set of a thousand points and every point in a second set of a thousand points (one million distances in total).
clear
close all
clc
tic
j=1;
X = rand(1000,2);
Y = rand(1000,2);
fprintf('Time for array creation: ');
toc
tic
for i = 1:1:size(Y,1)
    for k = 1:1:size(X,1)
        A(j,1) = sqrt((Y(i,1)-X(k,1))^2 + (Y(i,2)-X(k,2))^2);
        j = j+1;
    end
end
fprintf('Time for own distance calculation: ');
toc
j = 1;
tic
for i = 1:1:size(Y,1)
    for k = 1:1:size(X,1)
        P = [Y(i,1),Y(i,2);X(k,1),X(k,2)];
        B(j,1) = pdist(P,'euclidean');
        j = j+1;
    end
end
fprintf('Time for distance calculation using Matlab function pdist: ');
toc
Output:
Time for array creation: Elapsed time is 0.000386 seconds.
Time for own distance calculation: Elapsed time is 0.251026 seconds.
Time for distance calculation using Matlab function pdist: Elapsed time is 10.776532 seconds.
You can clearly see that the MATLAB function pdist takes over 10 seconds longer.
My question is: Why? What else is this function doing?
Would be nice to know.
Thank you very much
Kind regards,
Sebastian

Accepted Answer

Chunru on 4 Oct 2021
Edited: Chunru on 4 Oct 2021
%tic
X = rand(1000,2);
Y = rand(1000,2);
% fprintf('Time for array creation: ');
%toc
%% Version 1
tic
j=1;
for i = 1:1:size(Y,1)
    for k = 1:1:size(X,1)
        A(j,1) = sqrt((Y(i,1)-X(k,1))^2 + (Y(i,2)-X(k,2))^2);
        j = j+1;
    end
end
size(A)
ans = 1×2
1000000 1
t = toc;
fprintf('Time for own distance calculation: %.6f\n', t);
Time for own distance calculation: 0.307268
%% Version 1.1
% Pre-allocate A
tic
j=1;
A = inf(size(X,1)*size(Y,1), 1);
for i = 1:1:size(Y,1)
    for k = 1:1:size(X,1)
        A(j,1) = sqrt((Y(i,1)-X(k,1))^2 + (Y(i,2)-X(k,2))^2);
        j = j+1;
    end
end
size(A)
ans = 1×2
1000000 1
t = toc;
fprintf('Time for own distance calculation with preallocation: %.6f\n', t);
Time for own distance calculation with preallocation: 0.112437
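The same million distances can also be computed without any loop. The following is only a sketch, assuming R2016b or newer (for implicit expansion); the names D and A_vec are illustrative and not part of the original answer, and the fixed seed is there only to make the spot check repeatable.

```matlab
% Sketch: loop-free version of the Version 1 calculation.
% Assumes R2016b or newer (implicit expansion). D and A_vec are illustrative names.
rng(0);                      % fixed seed so the check below is repeatable
X = rand(1000,2);
Y = rand(1000,2);

% D(k,i) is the Euclidean distance between X(k,:) and Y(i,:).
% A column minus a row expands to a 1000-by-1000 matrix of differences.
D = sqrt((X(:,1) - Y(:,1).').^2 + (X(:,2) - Y(:,2).').^2);

% Column-major unrolling makes k run fastest, matching the loop order above
A_vec = D(:);

% Spot check against the loop formula for one pair (i = 3, k = 7)
i = 3; k = 7;
j = (i-1)*size(X,1) + k;     % position this pair has in the loop result
d_loop = sqrt((Y(i,1)-X(k,1))^2 + (Y(i,2)-X(k,2))^2);
assert(abs(A_vec(j) - d_loop) < 1e-12);
```

This needs no toolbox, at the cost of holding the full 1000-by-1000 matrix in memory at once.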
%% Version 2
tic
j=1;
for i = 1:1:size(Y,1)
    for k = 1:1:size(X,1)
        P = [Y(i,1),Y(i,2);X(k,1),X(k,2)];
        B(j,1) = pdist(P,'euclidean'); % one pair
        j = j+1;
    end
end
size(B)
ans = 1×2
1000000 1
t = toc;
fprintf('Time for distance calculation using Matlab function pdist: %.6f\n', t);
Time for distance calculation using Matlab function pdist: 15.181589
%% Version 2.1
% Pre-allocate B beforehand
tic
j=1;
B = inf(size(X,1)*size(Y,1), 1);
for i = 1:1:size(Y,1)
    for k = 1:1:size(X,1)
        P = [Y(i,1),Y(i,2);X(k,1),X(k,2)];
        B(j,1) = pdist(P,'euclidean');
        j = j+1;
    end
end
size(B)
ans = 1×2
1000000 1
t = toc;
fprintf('Time for distance calculation using Matlab function pdist: %.6f\n', t);
Time for distance calculation using Matlab function pdist: 12.980660
%% Version 3
% pdist of many points (this computes the distances x2-x1, x3-x1, ..., x1000-x1,
% y1-x1, ..., y1000-x1; x3-x2, ..., x1000-x2, y1-x2, ..., y1000-x2; etc.,
% i.e. all 2000*1999/2 pairs, including the X-to-X and Y-to-Y pairs)
% doc pdist
tic
p = pdist([X; Y]); % dist
size(p)
ans = 1×2
1 1999000
t = toc;
fprintf('Time for distance calculation using Matlab function pdist (many points): %.6f\n', t);
Time for distance calculation using Matlab function pdist (many points): 0.016222
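Version 3 returns every pair among the 2000 stacked points, which is more than the original loops asked for. The timings above also suggest that most of the cost in Versions 2 and 2.1 comes from calling pdist a million times rather than from the distance arithmetic itself. If only the X-to-Y distances are wanted, one option is pdist2 from the same toolbox. The sketch below assumes the Statistics and Machine Learning Toolbox is installed; the names D, B_vec, S and D_from_pdist are illustrative, and the fixed seed is there only to make the comparison repeatable.

```matlab
% Sketch: only the X-to-Y distances, without the X-to-X and Y-to-Y pairs.
% Assumes the Statistics and Machine Learning Toolbox (pdist, pdist2, squareform).
rng(0);                      % fixed seed so the comparison is repeatable
X = rand(1000,2);
Y = rand(1000,2);

% pdist2 returns a size(X,1)-by-size(Y,1) matrix:
% D(k,i) is the distance between X(k,:) and Y(i,:)
D = pdist2(X, Y);            % Euclidean distance is the default
B_vec = D(:);                % k runs fastest, same ordering as the loop result B

% The same block can be cut out of the Version 3 result
S = squareform(pdist([X; Y]));                   % full 2000-by-2000 symmetric matrix
D_from_pdist = S(1:size(X,1), size(X,1)+1:end);  % rows: X points, columns: Y points

% Both routes should agree up to rounding error
assert(max(abs(D(:) - D_from_pdist(:))) < 1e-10);
```

Either way, the function is called once on whole matrices, so any per-call overhead is paid once instead of a million times.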
  1 Comment
Sebastian Stumpf on 6 Oct 2021
Thank you for your detailed answer. It looks like I didn't use the function very efficiently.
Kind regards



Release

R2020a
