Fast method for findings which rows of a matrix are contained in another
2 views (last 30 days)
Show older comments
I have two matrices, A and B. The rows of B are a subset of the rows of A, and I want to find the row indices of the rows in A that correpsond to rows in B. I can do this using the intersect or ismember functions, but these are both much too slow for my purposes. I need to perform this calculation hundreds of thousands of times.
A is typically varies from a matrix, and can go up to having several hundred rows (still with 8 columns). B also has 8 columns, and varies from 1 row up to the number of rows in A. Are there any other methods I can use to find the rows of A that are also in B?
Alternatively, I perform a set of operations to the matrix A to come up with the matrix B. Is there a way I can use this to my advantage to find the rows of the final matrix (which would be B) relative to the original matrix A.
0 Comments
Accepted Answer
Stephen23
on 27 Aug 2020
Edited: Stephen23
on 27 Aug 2020
>> A = [1,2,3,4;5,6,7,8;9,10,11,12]
A =
1 2 3 4
5 6 7 8
9 10 11 12
>> B = [5,6,7,8;9,10,11,12]
B =
5 6 7 8
9 10 11 12
Using permute and test for equality (for versions >=R2016b bsxfun is not required):
>> X = any(all(bsxfun(@eq,A,permute(B,[3,2,1])),2),3)
X =
0
1
1
>> find(X) % optional
ans =
2
3
0 Comments
More Answers (1)
Bruno Luong
on 27 Aug 2020
Edited: Bruno Luong
on 27 Aug 2020
Not know how much my code is faster than ISMEMBER (which IMO pretty fast) (EDIT: it's indeed > 3 time faster)
% Generate random A, B such that rows of B are subset of that of A
A = randi(10,1000,8);
B = randi(10,1000,8);
A = [A;B];
A = A(randperm(end),:);
tic
% iA contains rows in A that matches with B
[~,is] = sortrows([A;B]);
isB = is>size(A,1);
ismatch = ~isB(1:end-1)&isB(2:end);
iA = is(ismatch); % row index of A that match B
%iB = is([false;ismatch])-size(A,1); % matching index
%isequal(A(iA,:),B(iB,:)) % meaning isequal(A(iA,:),B(iB,:)) is TRUE
toc % Elapsed time is 0.000656 seconds.
Do you have something unchanged during the loop? Is there any special properties of A and B (sorted already)? If yes it might exist faster methods using those charracteristics.
2 Comments
Bruno Luong
on 27 Aug 2020
Edited: Bruno Luong
on 27 Aug 2020
My code is about 3.8 times faster than ismember for 2000/1000 rows
% Generate random A, B such that rows of B are subset of that of A
A = randi(10,1000,8);
B = randi(10,1000,8);
A = [A;B];
A = A(randperm(end),:);
ntries = 1000;
tic
for k=1:ntries
% iA contains rows in A that matches with B
[~,is] = sortrows([A;B]);
isB = is>size(A,1);
ismatch = ~isB(1:end-1)&isB(2:end);
iA = is(ismatch); % row index of A that match B
%iB = is([false;ismatch])-size(A,1); % matching index
%isequal(A(iA,:),B(iB,:)) % meaning isequal(A(iA,:),B(iB,:)) is TRUE
end
tsort=toc; % Elapsed time is 0.000656 seconds.
tic
for k=1:ntries
tf = ismember(A, B, 'rows');
iA = find(tf);
end
tismember=toc;
tic
for k=1:ntries
X = any(all(bsxfun(@eq,A,permute(B,[3,2,1])),2),3);
end
teq=toc;
fprintf('tsort = %f [s]\n', tsort);
fprintf('tismember = %f [s]\n', tismember);
fprintf('teq = %f [s]\n', teq);
% fprintf('tsort/tismember = %f\n', tsort/tismember);
% fprintf('tismember/tsort = %f\n', tismember/tsort);
Result for 2000/1000 rows
tsort = 0.252900 [s]
tismember = 0.934710 [s]
teq = 12.506212 [s]
Result of 12/6 rows
%A = randi(6,10,8);
%B = randi(6,10,8);
%A = [A;B];
%A = A(randperm(end),:);
tsort = 0.009770 [s]
tismember = 0.126905 [s]
teq = 0.006746 [s]
See Also
Categories
Find more on Logical in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!