check condition if array A contain array B or not
Show older comments
Example, I have:
A = [randperm(1e9,1e8)]';% array A contain 100,000,000 element
B = [randperm(1e5,1e4)]';% array B contain 1,000 element
How to determine if array A contain all element in array B or not? (I know the syntax: ismember(B,A). But the execution time is too long If the size of array A,B is large)
Example code(use ismember, and time consuming=43s) :
clear;clc;
A = [randperm(1e9,1e8)]';%generate 100,000,000 non-repeating random integers element with the value from 1-1,000,000,000
B = [randperm(1e5,1e4)]';%generate 1,000 non-repeating random integers element with the value from 1-10,000
if all(ismember(B,A,'rows')==1) %if A contain all element in B
disp('yes');%If condition is right, show "yes"
else
disp('no');%If condition is wrong, show "no"
end
Accepted Answer
More Answers (2)
Image Analyst
on 31 Mar 2018
0 votes
B will not be contained in A if you do it like that, with random numbers each being generated independently. However if B is in A you might try ismember(), strfind() or norxcorr2().
14 Comments
Image Analyst
on 31 Mar 2018
Edited: Image Analyst
on 31 Mar 2018
Why even try? I can tell right now that B is NOT found in A just by looking at what B and A is. Now, maybe if you had used randi() instead of rand() there would have been some chance, but not with rand().
Walter Roberson
on 31 Mar 2018
Is A sorted or unsorted in your actual application ?
Image Analyst
on 31 Mar 2018
I'm not wrong for floating point numbers but I see you now have floor() so that makes them integers. So now, it's at least possible, though extremely unlikely. I suggest you use xcorr2() or conv() to find where it's closest. If you want to find an exact match, you can use strfind() or even scan with isequal().
Maybe Roger can give you the probability of finding a specific set of 1000 integers in the range of 0-1000 in a larger set of 1o million numbers. I bet it's vanishingly small, though not completely zero. Even for only 3 numbers, you'll never find it. See demo below:
v = randi(1000, 1, 1e7);
pattern = [17, 320, 576] % Try this for example.
location = strfind(v, pattern) % Empty [] for vast majority of vectors and patterns, meaning it was not found.
pattern = v(3:9) % Test to see if it works for some pattern that's guaranteed to be in the image.
location = strfind(v, pattern)
Walter Roberson
on 31 Mar 2018
The difference between sorted and unsorted A is efficiency. The time to sort A is proportional to n*log(n) where n is the number of elements. Once sorted, it is possible to search for any given value in log2(n) time. If there are m values to look for then the total would be n*log(n) + m*log2(n). This applies for unsorted B. But can we do better?
Potentially Yes: if B is sorted then every probe into A can be used to update search boundary information for multiple entries in B, reducing the size of the region of A to be searched in binary technique. But how much can that help?
Ignoring the overhead of searching within sorted B for a moment, we might hypothesize that we might be able to reduce the total search to log2(m)*log2(n). But if B is not sorted we have to add the cost of sorting B, namely m*log(m). That would bring us to m*log(m) + n*log(n) + log(m)*log(n). But as size of B starts to approach size of A then this approaches 2*n*log(n) + log(n)^2 and the search time gets swamped by the sort time. But if both are sorted then you can hypothesize that you might be able to do the test in log-squared time.
I suspect that the overhead of updating the search boundaries in B would not permit log2(m) time, but it is still plausible that you might be able to reduce it below linear in the number of elements. But for unsorted...
Walter Roberson
on 31 Mar 2018
The 8 from v is not in pattern and the 9 from pattern is not in v, so I am not sure why you expect a match.
Image Analyst
on 31 Mar 2018
If you choose random numbers, then most likely they won't appear in a specified order in the larger set of random numbers. For example, If I take [6, 9, and 10] in that precise order and I take, say 100 numbers that can have values of 0 to 1000, what to you think the chances are that 6, 9, and 10 will be among those 100 numbers in that exact order? Yeah - very very small. Look, here are 100 such numbers
r =
Columns 1 through 20
509 967 645 324 512 614 642 162 362 717 889 596 455 801 800 503 212 709 648 702
Columns 21 through 40
143 380 269 419 973 912 985 447 162 471 536 365 660 205 773 256 809 685 56 998
Columns 41 through 60
205 733 269 46 766 990 226 514 268 602 680 53 203 877 737 339 760 30 841 668
Columns 61 through 80
853 33 180 698 718 569 364 747 875 940 288 73 396 474 956 846 260 399 842 26
Columns 81 through 100
740 257 746 110 299 497 684 819 314 718 392 103 16 254 628 86 855 22 653 519
Do you see [6,9,10] in there? No. Do you think you should? Most likely not.
Image Analyst
on 2 Apr 2018
No problem. I just interpreted it differently. I thought you wanted to know if B, as given, was in A, whereas you wanted to know if all the individual elements of B were anywhere in A at all regardless if the whole of B was in there or not. In other words, I was thinking that if you looked over A, could you see B embedded in there somewhere? While you didn't care if the elements of B were in there intact - next to each other in the original, given order - you just cared if the elements were there at all even if B were split apart into individual elements. Just a different interpretation of what "A contains B" means.
Walter Roberson
on 31 Mar 2018
0 votes
ismember() first checks if the second parameter is sorted (a linear process), and if it is not sorted then it sorts it. Then ismember does binary searches for the first parameter within the sorted version of the second parameter.
In theory you can do a little better than this if you sort the first parameter as well, as you can use information about the search programs for one element to reduce the amount of searching that needs to be done for another element. However, the cost of doing the sort of the first parameter adds up as well; you would have to write the algorithm fairly carefully to be sure of coming out ahead.
If your A is already sorted then you can see https://www.mathworks.com/matlabcentral/answers/92533-how-do-i-perform-a-binary-search-of-a-presorted-array
Categories
Find more on Shifting and Sorting Matrices in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!