Comparing each vector in a cell array to another vector

8 views (last 30 days)
Hello. I am wondering if there is an easier way to check vectors in my cell array are subsets of another given vector. I want to keep track of how many times the vectors in C are found in T, for many different T's. For example:
T = [1 2 3 4 5 6]
C = { [1 2], [3 4], [5,6], [7 8], [9, 10] }
% loop through each candidate
for j = 1:numCands
c = Ck{j};
% Candidate Counts. Check if candidate is a subset of the
% current transaction
if ismember(c,t)
candCounts(j) = candCounts(j) + 1;
end
% Candidate Tail Counts. Check if the tail of the candidate is
% a subset of the current transaction
if ismember(c(2:end),t)
candTailCounts(j) = candTailCounts(j) + 1;
end
end
What I have done is loop through each member of C and check using ismember() for each element. But this gets really long when C becomes very large. I was wondering if there was any clever way of vectorization which would result in a faster runtime. I would also like to add that T can be varying sized vectors. Not sure if that information is relevant or not.
Thanks in advance!
Edit: For future references here is what I ended up doing, which is similar to what the answers suggested:
% Make all candidates into matrix for faster calculation
allCands = vertcat(Ck{:});
% Get number of columns to get where there are subsets.
[~,col] = size(allCands);
% Find locations where the candidates are subsets of the
% transactions for both the candidates and tail counts.
memb1 = ismember(allCands,t);
memb2 = ismember(allCands(:,2:end),t);
% Subset check
subCheck1 = find(sum(memb1,2) == col);
subCheck2 = find(sum(memb2,2) == col - 1);
% Increment counts
candCounts(subCheck1) = candCounts(subCheck1) + 1;
candTailCounts(subCheck2) = candTailCounts(subCheck2) + 1;
  1 Comment
Walter Roberson
Walter Roberson on 5 Feb 2017
Edited: Walter Roberson on 5 Feb 2017
Does it happen to be the case that C and T only have integer elements in the range 0 to 65535 ?
Are all of the entries in C the same length? Are they always row vectors?

Sign in to comment.

Accepted Answer

Stephen23
Stephen23 on 5 Feb 2017
Edited: Stephen23 on 5 Feb 2017
Method one: one ismember call: the function ismember is going to to be slowest operation, so one simple speedup is to reduce how many times you call it: Note that a vector c will only be a member if the tail c(2:end) is a member, so you can test for the tail first, and only if the tail is a member do you need to check if the first element is also a member:
T = [1,2,3,4,5,6];
C = {[1,2],[3,4],[5,6],[7,8],[9,10]};
cntC = 0;
cntT = 0;
for k = 1:numel(C)
if ismember(C{k}(2:end),T)
cntT = cntT+1;
cntC = cntC+any(C{k}(1)==T);
end
end
Method two: Convert to a matrix: if all members of C are the same size vectors, then convert C to a matrix and perform one simple comparison using bsxfun:
M = vertcat(C{:});
U = reshape(T,1,1,[]);
X = any(bsxfun(@eq,M,U),3);
xT = all(X(:,2:end),2);
xC = xT&X(:,1);
cntT = sum(xT)
cntC = sum(xC)
  5 Comments
Stephen23
Stephen23 on 5 Feb 2017
"The problem with that is if the elements occur out of order, you would still get a match."
That is what a subset is, and the original question specifically asks for "check vectors in my cell array are subsets of another given vector".
For a subset the order is irrelevant.
Stephen23
Stephen23 on 5 Feb 2017
Edited: Stephen23 on 5 Feb 2017
@Christopher Tran: I am happy to take a look. Please clarify one point: do you want to match the values in the vectors in the same order, or is the order irrelevant (as per set theory and the definition of subset). For example, is [2,1] a subset of [1,2,3] or not?

Sign in to comment.

More Answers (1)

Walter Roberson
Walter Roberson on 5 Feb 2017
If all members of C are the same length then you can use buffer() with overlap to create an array containing all the subsequences of that length, and transpose that and then convert C to a row matrix and then ismember() with rows option
  1 Comment
Christopher Tran
Christopher Tran on 5 Feb 2017
Edited: Christopher Tran on 5 Feb 2017
Hi Walter, Thanks for the response! The elements in C are indeed the same sized vectors.
I am not sure what is buffer(), I may take a look at it, but I ended up employing a similar idea to yours using vertcat() (similar to how Stephen did), but used sum(), and find() to get the indices where C is a subset of T (note I copied and pasted my code, so all 't's, are 'T's from my above example. Would it be necessary to use the sum() or can i just use the result from ismember()? ismember() would produce a vector of logicals and using the sum() made more sense to me, but I'm not sure. Also I'm not sure how much of an effect it would be on the speed, and also compared to using buffer().
% Make all candidates into matrix for faster calculation
allCands = vertcat(Ck{:});
% Get number of columns to get where there are subsets.
[~,col] = size(allCands);
% Find locations where the candidates are subsets of the
% transactions for both the candidates and tail counts.
memb1 = ismember(allCands,t);
memb2 = ismember(allCands(:,2:end),t);
% Subset check
subCheck1 = find(sum(memb1,2) == col);
subCheck2 = find(sum(memb2,2) == col - 1);
% Increment counts
candCounts(subCheck1) = candCounts(subCheck1) + 1;
candTailCounts(subCheck2) = candTailCounts(subCheck2) + 1;

Sign in to comment.

Categories

Find more on Operators and Elementary Operations in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!