# How do I speed up my variable lookup algorithm?

2 views (last 30 days)
Tyler Diorio on 6 Oct 2022
Commented: Bruno Luong on 7 Oct 2022
I have two 3D structures I'm trying to analyze. (1) slantvals is a 2847048x1 double containing label values; (2) var is identical size 224x320x320 int16 containing the variable I'm interested in.
Note: slantvals is a deconstruction of a 3D volume and has been ordered by spatial location. slantvals and var come from the same original grid size (224x320x320) but during the process of converting to 1D array I remove the zero values so instead of having a 23937600x1 array I only have a 2847048x1 array in the slantvals variable.
My goal is to use the labeled regions in slantvals to identify the entries in var that correspond to each label (203 total). I would like to end up with a 2D variable violindata that stores the entries from var that correspond to each label, as follows:
violindata
Label 1 | Label 2 | Label 3 | Label ... | Label N
Entry 1 .. .. .. .. ..
Entry 2 .. .. .. .. ..
Entry . .. .. .. .. ..
Entry N .. .. .. .. ..
I'm currently accomplishing this using the following nested loop:
for label=0:max(slantseg,[],'all') %pre-defined upper limit for-loop
locs=find(slantvals==label); %locs is an Nx1 vector with size varying by the amount of entries found to correspond to `label` each loop
coords=slantinds(locs,:); %coords reconstructs the spatial location of every index from `locs` (e.g. 178936 --> [148,68,135])
for point=1:length(coords) %For each identified coordinate,
coords_x = coords(point,1);
coords_y = coords(point,2);
coords_z = coords(point,3);
violindata(point,label+1)=var(coords_x,coords_y,coords_z); %read the data in `var` at that location and store it
end
end
This code seems to work reliably and the variable violindata is correctly constructing as I'm intending, however it is taking an absurdly long time to process. I'm projecting easily more than 5 hours for this set of for-loops to complete.
Question: Is there any way for me to speed up this for-loop or construct this algorithm in a better way?
I've seen documentation on sliced variables but am very uncertain how to address that in my particular case - any help would be greatly appreciated!
Tyler Diorio on 7 Oct 2022
I've described this in the previous comments of this thread and also added this to the note in the original post, thanks for the suggestion. The number comes from a deconstruction of a 3D variable to a 1D array of values indexed by number.
Bruno Luong on 7 Oct 2022
You first post have size 2847048x1, your comment have size 23937600x1, none of that matches the length 22937600, the second digit is a 2 not a 3, and we don't know what is exactly "deconstruction " of 3D variables.

Jiri Hajek on 6 Oct 2022
It seems your use of the find functiois unnecessary. This function takes a lot of time, which you could verify by profiling your code. There's a much faster option, namely to use logical indices instead:
locs = (slantvals==label);
coords = slantinds(locs,:);
Tyler Diorio on 6 Oct 2022
Thank you for the tip, it looks like the primary slowdown is happening in the nest for-loop:
for point=1:length(coords) %For each identified coordinate,
coords_x = coords(point,1);
coords_y = coords(point,2);
coords_z = coords(point,3);
violindata(point,label+1)=var(coords_x,coords_y,coords_z); %read the data in `var` at that location and store it
end
Jiri Hajek on 7 Oct 2022
OK, there are two other tips you might want to try:
• Rename the "var" matrix, since this name is reserved for variance function in Matlab. Using function identifiers for variables is a bad practice generally. In the following I have used a name "myvar".
• I believe that using linear indexing you could rewrite the inner for loop into a two-liner, as for loops are notoriously known for wasting time. See description of function sub2ind. I'd suggest something like this:
idxs = sub2ind([224x320x320],coords(:,1),coords(:,1),coords(:,1));
violindata(1:length(locs),label+1) = myvar(idxs);
Note also that your coords is a 2-dimensional array and it is a bad practice to use the length function for such a variable, which makes the code error-prone.

Bruno Luong on 7 Oct 2022
Edited: Bruno Luong on 7 Oct 2022
Try this
% Fake data
slantvals = randi([0 10], 1, 224*320*320);
var = rand([5,5,5]);
slantinds = randi(length(var),[numel(slantvals),3]);
tic
[valsorted,is] = sort(slantvals); % sort by label
coords = slantinds(is,:); % range slantinds in the same order (sorted by label)
[len, label, ~, subidx] = runlengthencoder(valsorted); % function below, get the subidx by label
gr = label+1;
m = max(subidx);
n = max(gr);
violindata = nan(m,n);
idxdest = sub2ind(size(violindata),subidx,repelem(gr,len));
idxsrc = sub2ind(size(var),coords(:,1),coords(:,2),coords(:,3));
violindata(idxdest) = var(idxsrc);
toc
Elapsed time is 3.985245 seconds.
function [len, v, gr, subidx] = runlengthencoder(X)
% [len, v, gr, subidx] = runlengthencoder(X)
% Run-length encoder
%
% INPUT
% X is (1 x n) row vector, column is also allowed
% OUTPUTS:
% len: integer arrays (1 x m)
% v: (1 x m) ordering subset of X, such that two adjadcent elements are differents
% and X = replelem(v, len)
% gr: 1:m
% subidx: (1 x n) integer, interior indexes of X with in the group
%
if ~isrow(X)
X = reshape(X, 1, []);
end
n = size(X,2);
b = [true, diff(X)~=0];
ij = find([b, true]);
len = diff(ij);
v = X(b);
if nargout >= 3
gr = repelem(1:length(len),len);
if nargout >= 4
if n > 0 % end indexing requires non empty arrays
subidx = ones(1,n);
subidx(ij(2:end-1)) = 1-len(1:end-1);
subidx = cumsum(subidx);
else
subidx = [];
end
end
end
end % runlengthencoder
Bruno Luong on 7 Oct 2022
Check it returns the same thing on toy example
clear
% Fake data
slantvals = randi([0 10], 1, 4*3*5);
var = rand([5,5,5]);
slantinds = randi(length(var),[numel(slantvals),3]);
tic
for label=0:max(slantvals,[],'all') %pre-defined upper limit for-loop
locs=find(slantvals==label); %locs is an Nx1 vector with size varying by the amount of entries found to correspond to `label` each loop
coords=slantinds(locs,:); %coords reconstructs the spatial location of every index from `locs` (e.g. 178936 --> [148,68,135])
for point=1:size(coords,1) %For each identified coordinate,
coords_x = coords(point,1);
coords_y = coords(point,2);
coords_z = coords(point,3);
violindata(point,label+1)=var(coords_x,coords_y,coords_z); %read the data in `var` at that location and store it
end
end
toc
Elapsed time is 0.006676 seconds.
tic
[valsorted,is] = sort(slantvals); % sort by label
coords = slantinds(is,:); % range slantinds in the same order (sorted by label)
[len, label, ~, subidx] = runlengthencoder(valsorted); % function below, get the subidx by label
gr = label+1;
m = max(subidx);
n = max(gr);
violindata2 = zeros(m,n);
idxdest = sub2ind(size(violindata),subidx,repelem(gr,len));
idxsrc = sub2ind(size(var),coords(:,1),coords(:,2),coords(:,3));
violindata2(idxdest) = var(idxsrc);
toc
Elapsed time is 0.016131 seconds.
isequal(violindata,violindata2)
ans = logical
1
%%
function [len, v, gr, subidx] = runlengthencoder(X)
% [len, v, gr, subidx] = runlengthencoder(X)
% Run-length encoder
%
% INPUT
% X is (1 x n) row vector, column is also allowed
% OUTPUTS:
% len: integer arrays (1 x m)
% v: (1 x m) ordering subset of X, such that two adjadcent elements are differents
% and X = replelem(v, len)
% gr: 1:m
% subidx: (1 x n) integer, interior indexes of X with in the group
%
if ~isrow(X)
X = reshape(X, 1, []);
end
n = size(X,2);
if n > 0
b = [true, diff(X)~=0];
ij = find([b, true]);
len = diff(ij);
v = X(b);
if nargout >= 3
gr = repelem(1:length(len),len);
if nargout >= 4
subidx = ones(1,n);
subidx(ij(2:end-1)) = 1-len(1:end-1);
subidx = cumsum(subidx);
end
end
else
[len, v, gr, subidx] = deal([]);
end
end % runlengthencoder