Faster/Simpler way to update my data in a table

42 views (last 30 days)
I have the following code:
H = height(T);
count = 0;
z=num2cell(0);
for i=1:1000
if strcmp(T{i,2},'sage')
count = count+1;
z = num2cell(20*count);
T{i,1} = z;
else
T{i,1} = z;
end
end
And it does precisely what I want it to. My issue is this loop is being applied to a massive table. The table has just over 658k rows in it, and I will need this code to apply to any table I upload, so the row count could potentially be even more for other datasets. After the complete if statement, I added a second if statement of "if i = 1000; break; end" and then I did the "run and time". That portion took about 36 seconds, and I did the math and found that would mean that it takes about 6.7 hours for this one for loop to execute throughout the entire table.
Here is a screenshot of part of the table as an example:
I inlclude this only to show that the string 'sage' shows up at random intervals, so I don't know what commands I could use to skip using a for loop altogether (as that would be the faster option).
  4 Comments
Ruger28
Ruger28 on 18 Feb 2020
is the delta between 'grey' always the same? so once you find grey, the index from the first to the next will always be 20?
stone_ward
stone_ward on 18 Feb 2020
No. And I just realized, it was supposed to be 'sage', not 'grey'... I apologize, I changed the character strings to colors as this is for a work project, and got mixed up. I've updated the code above to reflect that. But as you can see in the screenshot I included, sage first occurs on row 19, the second occurance is on row 30, and the third occurance is row 33. It's random.

Sign in to comment.

Accepted Answer

Jon
Jon on 18 Feb 2020
Edited: Jon on 18 Feb 2020
Here's an approach that I think does what you want.
You would have to benchmark the run time, but the approach is quite vectorized and I would assume it is quite a bit more efficient than what you were doing
% create some data just to demonstrate
% you would read from your actual table
time = 1:100;
colorIdx = randi([1,4],100,1)
colorChoices = {'red','sage','fuschia','gray'}
color = colorChoices(colorIdx)'
% make the table
T = table(time(:),color)
% make the new column incrementing by 20 for each occurrence of sage
% make a logical vector of indices (rows) where color is gray
idlColorMatch = strcmp(T.color,'sage')
% make the new time column
T.newTime = 20*(cumsum(idlColorMatch))
  2 Comments
stone_ward
stone_ward on 21 Feb 2020
This worked perfect, thank you SO much!
Jon
Jon on 21 Feb 2020
Glad to hear that was what you needed. Thanks for letting me know. The cumsum counting of the logical indices comes in handy anytime you need to count occurences, and is so amazingly simple.

Sign in to comment.

More Answers (0)

Categories

Find more on Characters and Strings in Help Center and File Exchange

Products


Release

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!