check condition if array A contain array B or not

Example, I have:
A = [randperm(1e9,1e8)]';% array A contain 100,000,000 element
B = [randperm(1e5,1e4)]';% array B contain 1,000 element
How to determine if array A contain all element in array B or not? (I know the syntax: ismember(B,A). But the execution time is too long If the size of array A,B is large)
Example code(use ismember, and time consuming=43s) :
clear;clc;
A = [randperm(1e9,1e8)]';%generate 100,000,000 non-repeating random integers element with the value from 1-1,000,000,000
B = [randperm(1e5,1e4)]';%generate 1,000 non-repeating random integers element with the value from 1-10,000
if all(ismember(B,A,'rows')==1) %if A contain all element in B
disp('yes');%If condition is right, show "yes"
else
disp('no');%If condition is wrong, show "no"
end

 Accepted Answer

ha ha
ha ha on 31 Mar 2018
Edited: ha ha on 9 Aug 2018
THE ANSWER IS(this below code(20s) is faster than use ismember(43s)):
clear;clc;
A = [randperm(1e9,1e8)]';%generate 100,000,000 non-repeating random integers element between value from 1-1,000,000,000
B = [randperm(1e5,1e4)]';%generate 10,000 non-repeating random integers element between value from 1-100,000
size_B=size(B,1);%define number of element in array B
C=intersect(A,B);%find intersection between A & B
size_C=size(C,1);%define number of element in array C
if size_C==size_B %if A contain all element in B
disp('yes');%If condition is right, show "yes"
else
disp('no');%If condition is wrong, show "no"
end
%the number of element in A will affect the time significantly

More Answers (2)

B will not be contained in A if you do it like that, with random numbers each being generated independently. However if B is in A you might try ismember(), strfind() or norxcorr2().

14 Comments

ha ha
ha ha on 31 Mar 2018
Edited: ha ha on 31 Mar 2018
You can try my above code. My question is If I increase the number of elements in array A to be 100 million(1e8), so the execution time is long. How to solve this problem?(use another syntax, or.....?)
Why even try? I can tell right now that B is NOT found in A just by looking at what B and A is. Now, maybe if you had used randi() instead of rand() there would have been some chance, but not with rand().
Is A sorted or unsorted in your actual application ?
In my "real" application: A is unsorted
ha ha
ha ha on 31 Mar 2018
Edited: ha ha on 31 Mar 2018
@ Image Analyst. In my "real" application, I have two array A & B(each array contain a large number of element,i.e., >10million element). And my question is: How to check A contain B or not? In above code, I just try to create a "fake" array A & B, for you to imagine my question. And, in my above example, If you emphasize that A have no chane to contain B, so You are totally WRONG. You can check it "BY HAND" by reducing the number of element in A & B.
I'm not wrong for floating point numbers but I see you now have floor() so that makes them integers. So now, it's at least possible, though extremely unlikely. I suggest you use xcorr2() or conv() to find where it's closest. If you want to find an exact match, you can use strfind() or even scan with isequal().
Maybe Roger can give you the probability of finding a specific set of 1000 integers in the range of 0-1000 in a larger set of 1o million numbers. I bet it's vanishingly small, though not completely zero. Even for only 3 numbers, you'll never find it. See demo below:
v = randi(1000, 1, 1e7);
pattern = [17, 320, 576] % Try this for example.
location = strfind(v, pattern) % Empty [] for vast majority of vectors and patterns, meaning it was not found.
pattern = v(3:9) % Test to see if it works for some pattern that's guaranteed to be in the image.
location = strfind(v, pattern)
The difference between sorted and unsorted A is efficiency. The time to sort A is proportional to n*log(n) where n is the number of elements. Once sorted, it is possible to search for any given value in log2(n) time. If there are m values to look for then the total would be n*log(n) + m*log2(n). This applies for unsorted B. But can we do better?
Potentially Yes: if B is sorted then every probe into A can be used to update search boundary information for multiple entries in B, reducing the size of the region of A to be searched in binary technique. But how much can that help?
Ignoring the overhead of searching within sorted B for a moment, we might hypothesize that we might be able to reduce the total search to log2(m)*log2(n). But if B is not sorted we have to add the cost of sorting B, namely m*log(m). That would bring us to m*log(m) + n*log(n) + log(m)*log(n). But as size of B starts to approach size of A then this approaches 2*n*log(n) + log(n)^2 and the search time gets swamped by the sort time. But if both are sorted then you can hypothesize that you might be able to do the test in log-squared time.
I suspect that the overhead of updating the search boundaries in B would not permit log2(m) time, but it is still plausible that you might be able to reduce it below linear in the number of elements. But for unsorted...
@Thanks. Walter Roberson
@Image Analyst: I try your code:
clear;clc;v = [8,10,9,6,2,5,5,4];
pattern = [6, 9, 10]; % Try this for example.
location = strfind(v, pattern);
% Empty [] for vast majority of vectors and patterns, meaning it was not found.
The result always show: location= []??????
(What wrong here???. Since in this case: "v" contain "pattern")
The 8 from v is not in pattern and the 9 from pattern is not in v, so I am not sure why you expect a match.
ha ha
ha ha on 31 Mar 2018
Edited: ha ha on 31 Mar 2018
Sorry @Walter Roberson. I type wrong(please see my above edit). It always show the result is location=[]
If you choose random numbers, then most likely they won't appear in a specified order in the larger set of random numbers. For example, If I take [6, 9, and 10] in that precise order and I take, say 100 numbers that can have values of 0 to 1000, what to you think the chances are that 6, 9, and 10 will be among those 100 numbers in that exact order? Yeah - very very small. Look, here are 100 such numbers
r =
Columns 1 through 20
509 967 645 324 512 614 642 162 362 717 889 596 455 801 800 503 212 709 648 702
Columns 21 through 40
143 380 269 419 973 912 985 447 162 471 536 365 660 205 773 256 809 685 56 998
Columns 41 through 60
205 733 269 46 766 990 226 514 268 602 680 53 203 877 737 339 760 30 841 668
Columns 61 through 80
853 33 180 698 718 569 364 747 875 940 288 73 396 474 956 846 260 399 842 26
Columns 81 through 100
740 257 746 110 299 497 684 819 314 718 392 103 16 254 628 86 855 22 653 519
Do you see [6,9,10] in there? No. Do you think you should? Most likely not.
ha ha
ha ha on 2 Apr 2018
Edited: ha ha on 2 Apr 2018
@Image Analyst: Thank so much. But I think you don't understand my question. You pay attention too much to the input data. I just assume the input data for illustration. While my question is: How to check condition if array A contain array B or not?(Sorry, If my question is NOT clear, and make you misunderstood)
No problem. I just interpreted it differently. I thought you wanted to know if B, as given, was in A, whereas you wanted to know if all the individual elements of B were anywhere in A at all regardless if the whole of B was in there or not. In other words, I was thinking that if you looked over A, could you see B embedded in there somewhere? While you didn't care if the elements of B were in there intact - next to each other in the original, given order - you just cared if the elements were there at all even if B were split apart into individual elements. Just a different interpretation of what "A contains B" means.

Sign in to comment.

ismember() first checks if the second parameter is sorted (a linear process), and if it is not sorted then it sorts it. Then ismember does binary searches for the first parameter within the sorted version of the second parameter.
In theory you can do a little better than this if you sort the first parameter as well, as you can use information about the search programs for one element to reduce the amount of searching that needs to be done for another element. However, the cost of doing the sort of the first parameter adds up as well; you would have to write the algorithm fairly carefully to be sure of coming out ahead.

Categories

Asked:

on 31 Mar 2018

Edited:

on 9 Aug 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!