# Fast Euclidean Distance Calculation

41 views (last 30 days)
Moreno, M. on 17 Mar 2022
Commented: Moreno, M. on 18 Mar 2022
Hi, I hope you are well.
I am trying to compute Euclidean distances in 2D or 3D (for the sake of the question, I will use 2D for ease of computation) as fast as possible in MATLAB. I am aware that when comparing distances between three points the following algebra can be used, as well as L1-norm approximations:
• d1 > d2 is equivalent to d1^2 > d2^2 (squaring is monotonic for non-negative distances)
• d1^2 > d2^2, i.e. (A - B)^2 > (A - C)^2, is equivalent to (B^2 - 2AB) > (C^2 - 2AC), since the A^2 term cancels on both sides.
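A tiny numeric check (my illustration, not part of the original question) of why comparing squared distances preserves the ordering:

```matlab
% Since sqrt is monotonic on non-negative numbers, comparing squared
% distances gives the same ordering as comparing the distances themselves.
A = [0 0]; B = [3 4]; C = [1 1];
d1sq = sum((A - B).^2);   % squared distance A-B
d2sq = sum((A - C).^2);   % squared distance A-C
isequal(d1sq > d2sq, sqrt(d1sq) > sqrt(d2sq))   % same comparison result
```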
However, these expressions failed to match the speed of the traditional d = sqrt(sum(t .* t, 2)), where t = B - A. To illustrate this, I computed the row-wise distance between two arrays X = rand(1e6, 2) and Y = rand(1e6, 2) in several ways:
1. L1 distance as the absolute difference between coordinates
2. Quasi-Euclidean distance (octagonal distance): 0.941246 * max(|dx|, |dy|) + 0.414214 * min(|dx|, |dy|), where dx and dy are the coordinate differences
3. Expanded binomial expression: sqrt(sum(X .* X, 2) + sum(Y .* Y, 2) - 2 * sum(X .* Y, 2))
4. Same as 3. but with bsxfun
5. Traditional distance sqrt(sum((X - Y) .^ 2, 2))
X = rand(1e6, 2);
Y = rand(1e6, 2);
% L1 distance
tic
t = abs(X - Y);
d1 = sum(t, 2);
toc
% Quasi-Euclidean distance
tic
t = abs(X - Y);
a = t(:, 1);
b = t(:, 2);
i = a > b;
d2 = i .* (0.941246 * a + 0.414214 * b) + ~i .* (0.941246 * b + ...
0.414214 * a);
toc
% Expanded binomial distance
tic
a = sum(X .* X, 2);
b = sum(Y .* Y, 2);
ab = 2 * sum(X .* Y, 2);
d3 = sqrt(a + b - ab);
toc
% Expanded binomial with bsxfun
tic
d4 = sqrt(bsxfun(@plus, sum(X .* X, 2), bsxfun(@minus, sum(Y .* Y, 2), ...
2 * sum(X .* Y, 2))));
toc
% Traditional L2 distance
tic
t = X - Y;
d5 = sqrt(sum(t .* t, 2));
toc
As a result, I am getting the following elapsed times for the L1, quasi-euclidean, expanded binomial, bsxfun and traditional L2 distance calculation methods:
Elapsed time is 0.004421 seconds.
Elapsed time is 0.009535 seconds.
Elapsed time is 0.021147 seconds.
Elapsed time is 0.019462 seconds.
Elapsed time is 0.009760 seconds.
It was surprising that, after simplifying the calculations via algebra or quasi-Euclidean norms, no significant speed gains were achieved. It is true that the L1 distance is faster than the other cases, but its error relative to the actual L2 distance is too large. And the speed difference between the quasi-Euclidean distance (which needs no square roots) and the traditional L2 distance is negligible.
The speed gain from bsxfun over 'manually' combining the array terms is also small. The squared distance computed in the traditional way (method 5, without the final sqrt) has the following elapsed time for the same X and Y matrices:
Elapsed time is 0.008278 seconds.
This comes across as shocking to me, since it is even faster than the quasi-Euclidean method, and I thought square roots were very costly because they are typically computed iteratively or via 1/sqrt(x).
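As an aside on measurement: single tic/toc runs are noisy. A sketch using the built-in timeit, which calls the function handle repeatedly and returns a median time:

```matlab
% timeit gives a less noisy measurement than a single tic/toc run.
X = rand(1e6, 2);
Y = rand(1e6, 2);
f_l2 = @() sqrt(sum((X - Y).^2, 2));   % traditional L2 distance
f_sq = @() sum((X - Y).^2, 2);         % squared distance, no sqrt
fprintf('L2:      %.6f s\n', timeit(f_l2));
fprintf('squared: %.6f s\n', timeit(f_sq));
```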
Is there any way to speed up this distance calculation in MATLAB (accessible to users)? There must be, since functions that involve distance calculation or comparison, such as knnsearch, run faster than the traditional computation of the squared distance (sometimes even twice as fast, in already-optimised KNN code).
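For context on why knnsearch can be so fast: a minimal usage sketch (assuming the Statistics and Machine Learning Toolbox). For low-dimensional data it builds a Kd-tree by default, so it avoids computing most pairwise distances altogether:

```matlab
% knnsearch returns the nearest neighbour in X for each row of Q.
% With a Kd-tree it prunes most candidates instead of computing all
% pairwise distances.
X = rand(1e4, 2);            % reference points
Q = rand(10, 2);             % query points
[idx, d] = knnsearch(X, Q);  % index and distance of nearest neighbour
```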
As a note, I also tried gpuArray, but the elapsed times were greater than the ones shown here.
Thank you in advance for the help and regards.
##### 2 Comments
Moreno, M. on 17 Mar 2022
Thank you Matt, it makes sense that it is a memory-related problem and that the optimisation work must come from allocation strategies as well as from the intrinsic distance calculation. I appreciate the reply and the help. I will look for workarounds along those lines, but it is beyond my knowledge at the moment.
Regards,
Moreno, M.

Jan on 17 Mar 2022
Edited: Jan on 18 Mar 2022
There is no reason to call bsxfun, if the arrays have the same sizes:
% d = sqrt(bsxfun(@plus, sum(X .* X, 2), ...
% bsxfun(@minus, sum(Y .* Y, 2), 2 * sum(X .* Y, 2))))
% d = sqrt(sum(X .* X, 2) + sum(Y .* Y, 2) - 2 * sum(X .* Y, 2))
Both do exactly the same, but bsxfun has a certain overhead.
Creating large arrays in RAM is the bottleneck of many functions. Although sqrt() is slow (even though it is not solved iteratively on modern CPUs), waiting for the data to be delivered from the slow RAM takes more time. In your case:
X = rand(1e6, 2);
Y = rand(1e6, 2);
tic
for kk = 1:100  % Repeat this to reduce noise of the timing
    t = X - Y;
    d5 = sqrt(sum(t .* t, 2));
end
toc
tic
for kk = 1:100
    d6 = zeros(1e6, 1);
    for k = 1:1e6
        t1 = X(k, 1) - Y(k, 1);
        t2 = X(k, 2) - Y(k, 2);
        d6(k) = sqrt(t1*t1 + t2*t2);
    end
end
toc
% Matlab 2018b, i7-3770:
% Elapsed time is 1.645735 seconds.
% Elapsed time is 1.197466 seconds. !!!
% In Matlab online the loop is slower than the vectorized code.
The loop is faster because Matlab only has to access the three used arrays X, Y and d6 locally. In contrast, sqrt(sum(t .* t, 2)) first creates the complete t .* t and flushes it from the cache to the slow RAM. Then sum() is fed with the complete result and throws its result to the RAM again. Then sqrt() requests the complete array once more and (you know it already) pushes the result back to the RAM again.
The loop imports the data from RAM in blocks of the cache-line size (usually 64 bytes), processes them, and once the results are written back to RAM they are not touched again.
Highly optimized functions take the cache sizes of the CPU and the limited bandwidth to the RAM into account.
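The cache-blocking idea can be sketched in plain MATLAB: process the rows in chunks small enough that the temporaries stay in cache. The block size here is an assumption and would need tuning to the actual CPU:

```matlab
% Block-wise distance computation: the temporary t covers only one chunk
% at a time, so it can stay cache-resident instead of round-tripping
% through main memory.
X = rand(1e6, 2);
Y = rand(1e6, 2);
n   = size(X, 1);
blk = 1e4;                         % assumed block size; tune to cache
d   = zeros(n, 1);
for s = 1:blk:n
    e = min(s + blk - 1, n);
    t = X(s:e, :) - Y(s:e, :);     % small, cache-resident temporary
    d(s:e) = sqrt(sum(t .* t, 2));
end
```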
Moreno, M. on 18 Mar 2022
Hi Jan, thank you for this.
That is a very good suggestion, I will start there.
Regards,
Moreno, M.

Matt J on 17 Mar 2022
Edited: Matt J on 17 Mar 2022
vecnorm(X-Y,2,2);
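Expanded into a runnable comparison (a sketch, assuming R2017b or later for vecnorm, which takes the p-norm along a given dimension):

```matlab
% Check that vecnorm matches the manual row-wise L2 distance.
X = rand(1e6, 2);
Y = rand(1e6, 2);
d_vec = vecnorm(X - Y, 2, 2);   % 2-norm along dimension 2 (rows)
t = X - Y;
d_man = sqrt(sum(t .* t, 2));
max(abs(d_vec - d_man))         % ~0, up to floating-point rounding
```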
Matt J on 17 Mar 2022
"So it would surprise me that it has faster computing times than the traditional element-wise calculation."
And yet it does...