Trouble using strcmp in Matlab
Background
I have two datasets, and I want to write a function that keeps only the data for users who appear in both of them.
Right now I only have data for a few people, and I will get the data for the other users later. I might never get data for all of them, so for now the script needs to use only the users I actually have data for.
In addition, some unexpected users show up in the dataset who can't be in the final dataset, which is why I'm addressing this now. The dataset will get very large as we add more users.
The two datasets I have:
"Users.xlsx"
This dataset has information on ALL the users possible.
"Task.xlsx"
This dataset only has information on a few of the users.
Code
This is the code that I've been trying so far and I haven't been able to get it to work:
root_folder = ''; % put file path in ''
users = readtable([root_folder filesep 'Users.xlsx']);
task = readtable([root_folder filesep 'Task.xlsx']);
for sub=1:length(users.UserId)
    Opt_Sub = task(strcmp(task.UserId,users.UserId{sub}),:);
end
Each time I run this code, I get an empty table. What I'm trying to do is use strcmp to compare the UserId columns of the two tables and get a vector of 1s and 0s, with 1 marking the rows of task whose UserId matches the current user. I then use task(...) with that vector to pull out all the matching rows.
If anyone has a solution, please let me know. Thanks in advance.
Accepted Answer
Voss
on 20 Jul 2022
Assigning to Opt_Sub on each iteration of the for loop overwrites whatever was assigned to Opt_Sub on previous iterations of the loop.
If, instead, you make Opt_Sub a cell array and assign to one cell of it each time, then the code seems to do what you intend:
root_folder = '.';
users = readtable(fullfile(root_folder,'Users.xlsx'));
task = readtable(fullfile(root_folder,'Task.xlsx'));
N_users = size(users,1);
Opt_Sub = cell(N_users,1); % one cell per row of users
for sub = 1:N_users
    % rows of task whose UserId matches the current user
    Opt_Sub{sub} = task(strcmp(task.UserId,users.UserId{sub}),:);
end
Opt_Sub
(You were seeing only the table from the last iteration, which is empty.)
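If the goal is just one table holding the task rows whose UserId also appears in users, a loop may not be needed at all. The following is a minimal sketch, not something from the original files: it assumes UserId is read in as a cell array of character vectors in both tables (which the use of strcmp and users.UserId{sub} suggests), and the small inline tables and the Score column are made-up stand-ins for the spreadsheets.

```matlab
% Hypothetical stand-ins for Users.xlsx and Task.xlsx
users = table({'u01'; 'u02'; 'u03'}, 'VariableNames', {'UserId'});
task  = table({'u02'; 'u99'; 'u02'; 'u03'}, [10; 20; 30; 40], ...
              'VariableNames', {'UserId', 'Score'});

% Logical mask: true for task rows whose UserId also appears in users
in_both = ismember(task.UserId, users.UserId);

% Keep only the rows common to both tables
common = task(in_both, :);
disp(common)
```

Here the row for 'u99' is dropped because that ID is not in users, which is the "random users" case described in the question.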
However, for your purpose it may be more convenient to split the task table according to the unique UserId values it contains (i.e., maybe you don't need to use the users table at all for this):
[uid,~,jj] = unique(task.UserId); % jj maps each row of task to an entry of uid
N_users = numel(uid);
Opt_Sub = cell(N_users,1);
for ii = 1:N_users
    Opt_Sub{ii} = task(jj == ii,:); % all rows belonging to uid{ii}
end
Opt_Sub
(Note that the Opt_Sub from this method contains two 1x6 tables, corresponding to the two task.UserId values that are not in users.UserId.)
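To see why jj == ii picks out one user's rows, it may help to look at what the third output of unique contains. This is a small sketch on a made-up ID list (the IDs are hypothetical, standing in for task.UserId):

```matlab
% Hypothetical UserId column standing in for task.UserId
ids = {'u02'; 'u99'; 'u02'; 'u03'};

% uid holds the sorted unique IDs; jj(k) is the index into uid of ids{k},
% so that uid(jj) reproduces ids
[uid, ~, jj] = unique(ids);

% jj == 1 is true for every row that belongs to uid{1}
rows_for_first = find(jj == 1);
```

With these inputs, uid is {'u02'; 'u03'; 'u99'} and jj is [1; 3; 1; 2], so rows 1 and 3 belong to the first unique ID.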
I don't know which way makes more sense for what you're ultimately trying to do.
2 Comments
Voss
on 20 Jul 2022
You're welcome!
Both of those approaches give you a cell array of tables, where each cell contains one table corresponding to one UserId. If I understand what you're trying to do, you'll end up with multiple tables one way or another, right? How would you prefer those tables to be stored? They don't have to be in a cell array, but to me that seemed like the most natural way.
Neither of those approaches depends on any days or dates. They'll each do whatever they do, regardless of whether the users show up for multiple days or one day. Try to run the second approach and see if what you get makes sense.
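On the question of how to store the per-user tables: one alternative to a cell array is a containers.Map keyed by UserId, so that a user's rows can be looked up by ID rather than by position. This is only a sketch under assumptions, with a made-up task table (the IDs and the Score column are hypothetical) and UserId assumed to be a cell array of character vectors:

```matlab
% Hypothetical stand-in for Task.xlsx
task = table({'u02'; 'u99'; 'u02'; 'u03'}, [10; 20; 30; 40], ...
             'VariableNames', {'UserId', 'Score'});

[uid, ~, jj] = unique(task.UserId);

% Map from UserId (char) to that user's sub-table
by_user = containers.Map('KeyType', 'char', 'ValueType', 'any');
for ii = 1:numel(uid)
    by_user(uid{ii}) = task(jj == ii, :);
end

% Look up one user's rows by ID
u02_rows = by_user('u02');
```

Whether this is more convenient than the cell array depends on how the tables are used afterwards; the cell array is simpler if you only ever loop over all users.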