# Fast Euclidean Distance Calculation

41 views (last 30 days)
Moreno, M. on 17 Mar 2022
Commented: Moreno, M. on 18 Mar 2022
Hi, I hope you are well.
I am trying to compute Euclidean distances in 2D or 3D (for the sake of the question, I will use 2D for ease of computation) as fast as possible in MATLAB. I am aware that when comparing distances between three points the following algebra can be used, as well as L1-norm approximations:
• d1 > d2 is equivalent to d1^2 > d2^2 (squaring is monotonic for non-negative distances)
• d1^2 > d2^2, i.e. (A - B)^2 > (A - C)^2, is equivalent to (B^2 - 2AB) > (C^2 - 2AC), since the A^2 term cancels on both sides.
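A tiny numeric check (my illustration, not part of the original question) of why comparing squared distances preserves the ordering:

```matlab
% Since sqrt is monotonic on non-negative numbers, comparing squared
% distances gives the same ordering as comparing the distances themselves.
A = [0 0]; B = [3 4]; C = [1 1];
d1sq = sum((A - B).^2);   % squared distance A-B
d2sq = sum((A - C).^2);   % squared distance A-C
isequal(d1sq > d2sq, sqrt(d1sq) > sqrt(d2sq))   % same comparison result
```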
However, these expressions failed to match the speed of the traditional d = sqrt(sum(t .* t, 2)), where t = B - A. To illustrate this, I computed the row-wise distance between two arrays X = rand(1e6, 2) and Y = rand(1e6, 2) in several ways:
1. L1 distance as the absolute difference between coordinates
2. Quasi-Euclidean distance (octagonal distance): 0.941246 * max(|dx|, |dy|) + 0.414214 * min(|dx|, |dy|), where dx and dy are the coordinate differences
3. Expanded binomial expression: sqrt(sum(X .* X, 2) + sum(Y .* Y, 2) - 2 * sum(X .* Y, 2))
4. Same as 3. but with bsxfun
5. Traditional distance sqrt(sum((X - Y) .^ 2, 2))
X = rand(1e6, 2);
Y = rand(1e6, 2);
% L1 distance
tic
t = abs(X - Y);
d1 = sum(t, 2);
toc
% Quasi-Euclidean distance
tic
t = abs(X - Y);
a = t(:, 1);
b = t(:, 2);
i = a > b;
d2 = i .* (0.941246 * a + 0.414214 * b) + ~i .* (0.941246 * b + ...
0.414214 * a);
toc
% Expanded binomial distance
tic
a = sum(X .* X, 2);
b = sum(Y .* Y, 2);
ab = 2 * sum(X .* Y, 2);
d3 = sqrt(a + b - ab);
toc
% Expanded binomial with bsxfun
tic
d4 = sqrt(bsxfun(@plus, sum(X .* X, 2), bsxfun(@minus, sum(Y .* Y, 2), ...
2 * sum(X .* Y, 2))));
toc
% Traditional L2 distance
tic
t = X - Y;
d5 = sqrt(sum(t .* t, 2));
toc
As a result, I am getting the following elapsed times for the L1, quasi-euclidean, expanded binomial, bsxfun and traditional L2 distance calculation methods:
Elapsed time is 0.004421 seconds.
Elapsed time is 0.009535 seconds.
Elapsed time is 0.021147 seconds.
Elapsed time is 0.019462 seconds.
Elapsed time is 0.009760 seconds.
It was surprising that, after simplifying the calculations via algebra or quasi-Euclidean norms, no significant speed gains were achieved. It is true that the L1 distance is faster than the other cases, but its error relative to the actual L2 distance is too large. And the speed difference between the quasi-Euclidean distance (which needs no square roots) and the traditional L2 distance is negligible.
The speed gain from bsxfun over 'manually' combining the array terms is also small. The squared distance computed in the traditional way (method 5, without the final sqrt) has the following elapsed time for the same X and Y matrices:
Elapsed time is 0.008278 seconds.
This comes across as shocking to me, since it is even faster than the quasi-Euclidean method, and I thought square roots were very costly because they are typically computed iteratively or via 1/sqrt(x).
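As an aside on measurement: single tic/toc runs are noisy. A sketch using the built-in timeit, which calls the function handle repeatedly and returns a median time:

```matlab
% timeit gives a less noisy measurement than a single tic/toc run.
X = rand(1e6, 2);
Y = rand(1e6, 2);
f_l2 = @() sqrt(sum((X - Y).^2, 2));   % traditional L2 distance
f_sq = @() sum((X - Y).^2, 2);         % squared distance, no sqrt
fprintf('L2:      %.6f s\n', timeit(f_l2));
fprintf('squared: %.6f s\n', timeit(f_sq));
```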
Is there any way to speed up this distance calculation in MATLAB (accessible to users)? There must be, since functions that involve distance calculation or comparison, such as knnsearch, run faster than the traditional computation of the squared distance (sometimes even twice as fast, in already-optimised KNN code).
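For context on why knnsearch can be so fast: a minimal usage sketch (assuming the Statistics and Machine Learning Toolbox). For low-dimensional data it builds a Kd-tree by default, so it avoids computing most pairwise distances altogether:

```matlab
% knnsearch returns the nearest neighbour in X for each row of Q.
% With a Kd-tree it prunes most candidates instead of computing all
% pairwise distances.
X = rand(1e4, 2);            % reference points
Q = rand(10, 2);             % query points
[idx, d] = knnsearch(X, Q);  % index and distance of nearest neighbour
```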
As a note, I also tried gpuArray, but the elapsed times were greater than the ones shown here.
Thank you in advance for the help and regards.
##### 2 Comments
Moreno, M. on 17 Mar 2022
Thank you Matt, it makes sense that it is a memory-related problem and that the optimisation work must come from allocation strategies as well as from the intrinsic distance calculation. I appreciate the reply and the help. I will look for workarounds along those lines, but it is beyond my knowledge at the moment.
Regards,
Moreno, M.

Jan on 17 Mar 2022
Edited: Jan on 18 Mar 2022
There is no reason to call bsxfun, if the arrays have the same sizes:
% d = sqrt(bsxfun(@plus, sum(X .* X, 2), ...
% bsxfun(@minus, sum(Y .* Y, 2), 2 * sum(X .* Y, 2))))
% d = sqrt(sum(X .* X, 2) + sum(Y .* Y, 2) - 2 * sum(X .* Y, 2))
Both do exactly the same, but bsxfun has a certain overhead.
Creating large arrays in RAM is the bottleneck of many functions. Although sqrt() is slow (even though it is not solved iteratively on modern CPUs), waiting for the data to be delivered from the slow RAM takes more time. In your case:
X = rand(1e6, 2);
Y = rand(1e6, 2);
tic
for kk = 1:100  % Repeat this to reduce noise of the timing
    t = X - Y;
    d5 = sqrt(sum(t .* t, 2));
end
toc
tic
for kk = 1:100
    d6 = zeros(1e6, 1);
    for k = 1:1e6
        t1 = X(k, 1) - Y(k, 1);
        t2 = X(k, 2) - Y(k, 2);
        d6(k) = sqrt(t1*t1 + t2*t2);
    end
end
toc
% Matlab 2018b, i7-3770:
% Elapsed time is 1.645735 seconds.
% Elapsed time is 1.197466 seconds. !!!
% In Matlab online the loop is slower than the vectorized code.
The loop is faster because Matlab only has to access the three used arrays X, Y and d6 locally. In contrast, sqrt(sum(t .* t, 2)) first creates the complete t .* t and flushes it from the cache to the slow RAM. Then sum() is fed with the complete result and throws its result to the RAM again. Then sqrt() requests the complete array once more and (you know it already) pushes the result back to the RAM again.
The loop imports the data from RAM in blocks of the cache-line size (usually 64 bytes), processes them, and once the results are written back to RAM they are not touched again.
Highly optimized functions take the cache sizes of the CPU and the limited bandwidth to the RAM into account.
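The cache-blocking idea can be sketched in plain MATLAB: process the rows in chunks small enough that the temporaries stay in cache. The block size here is an assumption and would need tuning to the actual CPU:

```matlab
% Block-wise distance computation: the temporary t covers only one chunk
% at a time, so it can stay cache-resident instead of round-tripping
% through main memory.
X = rand(1e6, 2);
Y = rand(1e6, 2);
n   = size(X, 1);
blk = 1e4;                         % assumed block size; tune to cache
d   = zeros(n, 1);
for s = 1:blk:n
    e = min(s + blk - 1, n);
    t = X(s:e, :) - Y(s:e, :);     % small, cache-resident temporary
    d(s:e) = sqrt(sum(t .* t, 2));
end
```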
Moreno, M. on 18 Mar 2022
Hi Jan, thank you for this.
That is a very good suggestion, I will start there.
Regards,
Moreno, M.

Matt J on 17 Mar 2022
Edited: Matt J on 17 Mar 2022
vecnorm(X-Y,2,2);
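Expanded into a runnable comparison (a sketch, assuming R2017b or later for vecnorm, which takes the p-norm along a given dimension):

```matlab
% Check that vecnorm matches the manual row-wise L2 distance.
X = rand(1e6, 2);
Y = rand(1e6, 2);
d_vec = vecnorm(X - Y, 2, 2);   % 2-norm along dimension 2 (rows)
t = X - Y;
d_man = sqrt(sum(t .* t, 2));
max(abs(d_vec - d_man))         % ~0, up to floating-point rounding
```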
Matt J on 17 Mar 2022
"So it would surprise me that it has faster computing times than the traditional element-wise calculation."
And yet it does...