Performance decrease accessing elements >4-D matrix
2 views (last 30 days)
Show older comments
Introduction
I have an algorithm that loops billions (trillions) of times and manipulates a matrix stored in 7 dimensions [10x10x10x10x10x10x10], I found that accessing elements in a 7-dimensional matrix was quite slow, curious as I am I ran some tests to identify the performance of accessing elements of multi-dimensional matrices.
Hypothesis
I was reminded that MatLab uses linear indexing under the hood and a friend of mine stated that the performance penalty was probably due to the conversion of 'normal' indexing to linear indexing under the hood [Source][1].
Testing method
To test this hypothesis I tested accessing elements using linear indexing and regular indexing for 2D to 7D matrices. I varied the element I was accessing as well as the matrix size I was accessing, i.e. length of each dimension, but that didn't change the results significantly. The file I used for testing is found below. The hardware used is an Intel® Xeon® CPU E5-1620 v2 @ 3.70 GHz, 16GB RAM. The MatLab version is R2013B.
Results
Normal indexing 2D: 0.073659s
Linear indexing 2D: 0.064026s
Normal indexing 3D: 0.050719s
Linear indexing 3D: 0.064096s
Normal indexing 4D: 0.055674s
Linear indexing 4D: 0.062112s
Normal indexing 5D: 15.689907s
Linear indexing 5D: 5.265076s
Normal indexing 6D: 16.660496s
Linear indexing 6D: 5.295958s
Normal indexing 7D: 17.029072s
Linear indexing 7D: 5.291139s
It appears that normal indexing is very similar compared to linear indexing in terms of performance up to 4D matrices. For 3D and 4D matrices it appears slightly preferable to use normal indexing. Above 4D matrices linear indexing is highly preferable over normal indexing, but both normal indexing and regular indexing receive a huge performance penalty (~two orders of magnitude).
Question(s)
As in the title: what is the (potential) cause for this huge hit in performance when exceeding four dimensions in a matrix?
Code used for testing
clear all
clc
% Number if iterations
n=10000000;
A = rand(10,10);
k1=sub2ind(size(A),3,2);
fprintf('\n')
tic
for ii=1:n
A1=A(3,2);
end
a=toc;
fprintf('Normal indexing 2D: %fs\n',a)
tic
for ii=1:n
A2=A(k1);
end
a=toc;
fprintf('Linear indexing 2D: %fs\n',a)
B = rand(10,10,10);
k2=sub2ind(size(B),3,2,1);
tic
for ii=1:n
B1=B(3,2,1);
end
a=toc;
fprintf('Normal indexing 3D: %fs\n',a)
tic
for ii=1:n
B2=B(k2);
end
a=toc;
fprintf('Linear indexing 3D: %fs\n',a)
C = rand(10,10,10,10);
k3=sub2ind(size(C),3,2,1,10);
tic
for ii=1:n
C1=C(3,2,1,10);
end
a=toc;
fprintf('Normal indexing 4D: %fs\n',a)
tic
for ii=1:n
C2=C(k3);
end
a=toc;
fprintf('Linear indexing 4D: %fs\n',a)
D = rand(10,10,10,10,10);
k4=sub2ind(size(D),3,2,1,10,1);
tic
for ii=1:n
D1=D(3,2,1,10,1);
end
a=toc;
fprintf('Normal indexing 5D: %fs\n',a)
tic
for ii=1:n
D2=D(k4);
end
a=toc;
fprintf('Linear indexing 5D: %fs\n',a)
E = rand(10,10,10,10,10,10);
k5=sub2ind(size(E),3,2,1,10,1,2);
tic
for ii=1:n
E1=E(3,2,1,10,1,2);
end
a=toc;
fprintf('Normal indexing 6D: %fs\n',a)
tic
for ii=1:n
E2=E(k5);
end
a=toc;
fprintf('Linear indexing 6D: %fs\n',a)
F = rand(10,10,10,10,10,10,10);
k6=sub2ind(size(F),3,2,1,10,1,2,3);
tic
for ii=1:n
F1=F(3,2,1,10,1,2);
end
a=toc;
fprintf('Normal indexing 7D: %fs\n',a)
tic
for ii=1:n
F2=F(k6);
end
a=toc;
fprintf('Linear indexing 7D: %fs\n',a)
[1]: http://www.mathworks.nl/help/matlab/math/multidimensional-arrays.html#f1-86846
0 Comments
Answers (2)
dpb
on 25 Jul 2014
I didn't read your test code but...addressing will be fastest in the real code if order is sequential in column-major order...that is the first (leftmost) index incremented first and subsequent to the last (rightmost).
Particularly if arrays are large accessing elements in nonsequential order can cause cache misses and even swapping of memory if physical memory is full or nearly so. Coding carefully to ensure the most nearly linear addressing as possible can pay large dividends.
And, of course, rearranging the storage structure to use less complex schemes is another ploy... :)
2 Comments
dpb
on 25 Jul 2014
That would be an unusual access pattern for a deeply-nested loop which is what your posting starts off with so figured it was a point worth making regarding storage order and accession for the real case as opposed to just a test that may or may not really reflect the actual.
Matt J
on 25 Jul 2014
Edited: Matt J
on 26 Jul 2014
I suspect it might have something to do with limitations in the JIT and what level of indexing it will optimize in for-loops. Below, I do a similar experiment where I grab the same 'n' number of elements, but using vectorization instead of loops. There is no jump in performance from 4D to 5D.
Subscripting 4D: 0.118063s
Subscripting 5D: 0.121372s
Code used for testing
n=10000000;
A=rand(2,2,2,2,n);
tic
D1=A(1,1,1,1,1:n);
a=toc;
fprintf('Subscripting 4D: %fs\n',a)
B=rand(2,2,2,2,n);
tic
D2=B(1,1,1,1,1:n);
a=toc;
fprintf('Subscripting 5D: %fs\n',a)
2 Comments
dpb
on 29 Jul 2014
No and in fact Matlab isn't really compiled into machine code but intermediate code that eventually calls the LAPACK or whatever native code there is already compiled for the end resulting calculation.
It's bizzaro that there's such apparent differences between 2012 and 2014 releases as Bruno's case and yours report for base functionality.
I'd suggest submitting these results to TMW official support for comment/resolution. All the hoopla added to the interface and all can be lived with for performance degradation in the UI, but when/if they start killing actual gut-level computational speed it is a serious problem if real and not some figment of the particular machine or the like.
See Also
Categories
Find more on Continuous Waveforms in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!