Creating ID using year and event number

Hi,
I have a set of cyclone data that I am trying to preprocess before I can do some statistics on it. To do this, I need to create a unique ID for each system using the year it occurred and its event number within that year. In the sample data I have attached, every time Var1 returns to 1, a new system begins. So Var1(1:3) should have ID 1990S1, Var1(4:5) should be 1990S2, Var1(6:7) should be 1990S3, and so forth. I want to create a column of IDs and put it in the first column (Col1) of my timetable so that it is easy for me to use 'varfun' to do the statistics.
Thanks in advance

Accepted Answer

Star Strider on 27 May 2024
Try this —
load('sample.mat')
whos('-file','sample.mat')
Name      Size  Bytes  Class      Attributes
sample    -     25859  timetable
sample
sample = 50x32 timetable
      time        Var1  Var2  Var3   Var4  Var5  Var6  Var7   Var8  Var9  Var10  Var11  Var12  Var13     Var14  Var15  Var16  Var17  Var18  Var19  Var20  Var21  Var22  Var23  Var24  Var25  Var26  Var27  Var28  Var29  Var30  Var31  Var32
________________  ____  ____  ____  _____  ____  ____  ____  _____  ____  _____  _____  _____  _____  ________  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____
1990-01-01 00:00     1     1     0  -11.5  71.5  92.1  16.5  104.9   6.7      4      3      0      1  1.99e+09  {'T'}      1      0      0      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-01 12:00     2     1     0  -11.8  71.8  88.8  16.1  125.1   8.4      6      3      0      4  1.99e+09  {'T'}      1      1      1      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-02 00:00     3     1     0  -12.5    72  95.4    17   82.8   4.7      2      0      0      7  1.99e+09  {'F'}      1      1      0      0      0      0      1      0      0      0      0     60     50     85     70   12.5     14
1990-01-01 00:00     1     2     0    -11    87  87.2  15.3   89.4   5.3      1      0      0      3  1.99e+09  {'F'}      0      1      0      0      0      0      1      0      1      0      0     60     50     85     70   12.5     14
1990-01-01 12:00     2     2     0    -12    86    90  15.4   52.3   8.5      1      0      0      5  1.99e+09  {'F'}      1      0      0      0      0      0      1      1      0      0      0     60     50     85     70   12.5     14
1990-01-02 00:00     1     3     0    -18   170  90.1  16.6   70.9   3.5      1      0      0      8  1.99e+09  {'F'}      0      0      0      0      1      0      1      0      0      0      0     60     50     85     70   12.5     14
1990-01-03 00:00     2     3     0    -17   177  89.1  15.1   64.7   1.6      1      0      0     17  1.99e+09  {'F'}      0      1      0      0      1      0      1      0      1      0      0     60     50     85     70   12.5     14
1990-01-02 12:00     1     4     0  -11.5    86  95.2  16.4  103.8   9.8      2      2      0      9  1.99e+09  {'T'}      0      0      0      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-03 00:00     2     4     0    -11    86  93.4  16.5  154.1     6      4      3      0     12  1.99e+09  {'T'}      0      1      0      0      1      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-03 12:00     3     4     0  -13.9  86.1  91.3  16.3  182.4  11.3     10      9      0     20  1.99e+09  {'T'}      0      0      1      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-04 00:00     4     4     0  -16.1  85.2  92.7  16.6  200.6    15     13     12      0     22  1.99e+09  {'T'}      0      0      0      0      1      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-04 12:00     5     4     1  -17.8    85  94.9    17  225.2  16.1     10     10      0     25  1.99e+09  {'T'}      0      0      0      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-05 00:00     6     4     1  -20.1  84.2  97.8  16.4  244.4  16.9     10      5      0     28  1.99e+09  {'T'}      0      1      0      0      5      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-05 12:00     7     4     1  -21.7  84.3  99.5  16.3  301.9  17.7      6      2      0     33  1.99e+09  {'T'}      0      1      0      0      4      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-06 00:00     8     4     1    -22  82.5  97.8  16.6  235.7  10.3      4      0      0     38  1.99e+09  {'F'}      0      1      0      1      4      0      1      0      0      0      0     60     50     85     70   12.5     14
1990-01-06 12:00     9     4     1  -21.7  85.7  97.5  16.5  161.6  15.6      3      1      0     42  1.99e+09  {'F'}      1      0      0      2      0      0      1      0      0      0      0     60     50     85     70   12.5     14
count = cumsum([1; diff(sample.Var1) < 0]);
ID = year(sample.time) + "S" + count;
sample = addvars(sample, ID, 'Before',1)
sample = 50x33 timetable
      time           ID     Var1  Var2  Var3   Var4  Var5  Var6  Var7   Var8  Var9  Var10  Var11  Var12  Var13     Var14  Var15  Var16  Var17  Var18  Var19  Var20  Var21  Var22  Var23  Var24  Var25  Var26  Var27  Var28  Var29  Var30  Var31  Var32
________________  ________  ____  ____  ____  _____  ____  ____  ____  _____  ____  _____  _____  _____  _____  ________  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____
1990-01-01 00:00  "1990S1"     1     1     0  -11.5  71.5  92.1  16.5  104.9   6.7      4      3      0      1  1.99e+09  {'T'}      1      0      0      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-01 12:00  "1990S1"     2     1     0  -11.8  71.8  88.8  16.1  125.1   8.4      6      3      0      4  1.99e+09  {'T'}      1      1      1      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-02 00:00  "1990S1"     3     1     0  -12.5    72  95.4    17   82.8   4.7      2      0      0      7  1.99e+09  {'F'}      1      1      0      0      0      0      1      0      0      0      0     60     50     85     70   12.5     14
1990-01-01 00:00  "1990S2"     1     2     0    -11    87  87.2  15.3   89.4   5.3      1      0      0      3  1.99e+09  {'F'}      0      1      0      0      0      0      1      0      1      0      0     60     50     85     70   12.5     14
1990-01-01 12:00  "1990S2"     2     2     0    -12    86    90  15.4   52.3   8.5      1      0      0      5  1.99e+09  {'F'}      1      0      0      0      0      0      1      1      0      0      0     60     50     85     70   12.5     14
1990-01-02 00:00  "1990S3"     1     3     0    -18   170  90.1  16.6   70.9   3.5      1      0      0      8  1.99e+09  {'F'}      0      0      0      0      1      0      1      0      0      0      0     60     50     85     70   12.5     14
1990-01-03 00:00  "1990S3"     2     3     0    -17   177  89.1  15.1   64.7   1.6      1      0      0     17  1.99e+09  {'F'}      0      1      0      0      1      0      1      0      1      0      0     60     50     85     70   12.5     14
1990-01-02 12:00  "1990S4"     1     4     0  -11.5    86  95.2  16.4  103.8   9.8      2      2      0      9  1.99e+09  {'T'}      0      0      0      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-03 00:00  "1990S4"     2     4     0    -11    86  93.4  16.5  154.1     6      4      3      0     12  1.99e+09  {'T'}      0      1      0      0      1      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-03 12:00  "1990S4"     3     4     0  -13.9  86.1  91.3  16.3  182.4  11.3     10      9      0     20  1.99e+09  {'T'}      0      0      1      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-04 00:00  "1990S4"     4     4     0  -16.1  85.2  92.7  16.6  200.6    15     13     12      0     22  1.99e+09  {'T'}      0      0      0      0      1      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-04 12:00  "1990S4"     5     4     1  -17.8    85  94.9    17  225.2  16.1     10     10      0     25  1.99e+09  {'T'}      0      0      0      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-05 00:00  "1990S4"     6     4     1  -20.1  84.2  97.8  16.4  244.4  16.9     10      5      0     28  1.99e+09  {'T'}      0      1      0      0      5      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-05 12:00  "1990S4"     7     4     1  -21.7  84.3  99.5  16.3  301.9  17.7      6      2      0     33  1.99e+09  {'T'}      0      1      0      0      4      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-06 00:00  "1990S4"     8     4     1    -22  82.5  97.8  16.6  235.7  10.3      4      0      0     38  1.99e+09  {'F'}      0      1      0      1      4      0      1      0      0      0      0     60     50     85     70   12.5     14
1990-01-06 12:00  "1990S4"     9     4     1  -21.7  85.7  97.5  16.5  161.6  15.6      3      1      0     42  1.99e+09  {'F'}      1      0      0      2      0      0      1      0      0      0      0     60     50     85     70   12.5     14
The logic is straightforward. diff(sample.Var1) < 0 is true (1) at each row where ‘Var1’ decreases, which is where a new system starts. Logical values become numeric when used in calculations, so after prepending a 1 for the first row, the cumulative sum in ‘count’ increments by one at the start of each system. The ‘ID’ variable is then easy to create from the year and ‘count’ using string variable syntax. (If you do not have string variables, use sprintf to create a similar array.)
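A small worked example of the counting step may help. This is a sketch on a hypothetical Var1 vector rather than the attached data, with every row assumed to fall in 1990; the last statement shows one sprintf-based way to build the same labels as a cell array of character vectors, for when string variables are not available.

```matlab
% Hypothetical Var1 values: three systems of lengths 3, 2 and 2
Var1 = [1; 2; 3; 1; 2; 1; 2];
yr   = 1990 * ones(size(Var1));        % assume every row falls in 1990

% diff(Var1) < 0 is true where Var1 drops back, i.e. where a new system starts;
% prepend 1 so the first row starts system 1, then accumulate
count = cumsum([1; diff(Var1) < 0]);   % [1;1;1;2;2;3;3]

% String syntax: the numbers are converted to text and concatenated
ID = yr + "S" + count;                 % "1990S1", ..., "1990S3"

% Character-vector alternative built with sprintf
IDc = arrayfun(@(y, c) sprintf('%dS%d', y, c), yr, count, ...
               'UniformOutput', false);
```

As written, this counter keeps increasing across years instead of restarting at 1 for each new year.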
I kept all the other variables here. Use removevars to eliminate ‘Var1’ if you no longer want it in your timetable.
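Since the goal was per-system statistics with varfun, here is a minimal sketch of that last step. It uses a small hypothetical timetable in place of sample.mat (only Var1 and Var4 from the first five rows shown above, for illustration): build the IDs, drop ‘Var1’ with removevars, then group on ‘ID’.

```matlab
% Hypothetical stand-in for sample.mat: first five rows, Var1 and Var4 only
time = datetime(1990, 1, [1; 1; 2; 1; 1], [0; 12; 0; 0; 12], 0, 0);
Var1 = [1; 2; 3; 1; 2];
Var4 = [-11.5; -11.8; -12.5; -11; -12];
T = timetable(time, Var1, Var4);

% Build the IDs as in the answer above
count = cumsum([1; diff(T.Var1) < 0]);
ID = year(T.time) + "S" + count;
T = addvars(T, ID, 'Before', 1);

% Var1 is no longer needed once the ID exists
T = removevars(T, 'Var1');

% One output row per system, with a GroupCount column and mean_Var4
stats = varfun(@mean, T, 'GroupingVariables', 'ID', 'InputVariables', 'Var4');
```

The same call with 'InputVariables' omitted would apply the function to every remaining variable, which only works if all of them are numeric.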
Sarvesh on 28 May 2024
Thanks for this once again.
I noticed some issues when I ran the previous code, especially when the year changed. It did what I wanted, but the system number did not reset to 1 when a new year began, so I wrote a for loop over each year to fix it. This approach, though, seems more robust (as you mentioned) and better than the loop.
Star Strider on 28 May 2024
As always, my pleasure!
The accumarray approach is efficient as well as fast.
A complete function using this would be:
function newTable = createID(oldTable) % Complete Function
% Assumes the rows are grouped by year in ascending order, as in the sample data
newTable = oldTable;
[~,~,idx2] = unique(year(newTable.time)); % Year group index for each row
Yr_Var1_count = accumarray(idx2, (1:numel(idx2)).', [], @(x){cumsum([1; diff(newTable.Var1(sort(x))) < 0])}); % Separate system count vector for each year (sort(x) keeps each year's rows in their original order)
ID = year(newTable.time) + "S" + cell2mat(Yr_Var1_count);
newTable = addvars(newTable, ID, 'Before',1);
end
You can paste it at the end of a script that calls it, or save it as createID.m in a folder on your MATLAB search path and call it whenever you need it. It does not change the original timetable; it returns a new version as the output.
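To see what the accumarray call in createID does, here is a toy version with hypothetical data. accumarray collects the row numbers (its second argument) that share each year index in idx2 and calls the function once per year; wrapping the result in {} lets each year return a vector of a different length. accumarray does not guarantee the order of the values it passes to the function unless the subscripts are sorted, so sorting x is a cheap safeguard.

```matlab
% Hypothetical inputs: idx2 plays the role of the third output of unique
idx2 = [1; 1; 1; 2; 2];   % rows 1-3 in the first year, rows 4-5 in the second
Var1 = [1; 2; 1; 1; 2];   % two systems in year 1, one system in year 2

% For each year, x holds that year's row numbers; number its systems from 1
perYear = accumarray(idx2, (1:numel(idx2)).', [], ...
    @(x) {cumsum([1; diff(Var1(sort(x))) < 0])});

% Stack the per-year vectors into one column (valid because rows are grouped by year)
count = cell2mat(perYear);
```

Here perYear is a 2-by-1 cell array holding [1;1;2] and [1;1], so count is [1;1;2;1;1]: the numbering restarts in the second year.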
My apologies for not considering the multi-year problem earlier.
Illustrating how to use ‘createID’ —
% Create Test Timetable —
time = datetime(reshape(repmat([1990; 1991; 1992], 1, 7).', [], 1), repmat((1:7).',3,1), ones(21,1), 'Format','yyyy-MM-dd');
Var1 = repmat([1;2;1;2;3;1;2],3,1);
T2 = timetable(time,Var1)
T2 = 21x1 timetable
   time     Var1
__________  ____
1990-01-01     1
1990-02-01     2
1990-03-01     1
1990-04-01     2
1990-05-01     3
1990-06-01     1
1990-07-01     2
1991-01-01     1
1991-02-01     2
1991-03-01     1
1991-04-01     2
1991-05-01     3
1991-06-01     1
1991-07-01     2
1992-01-01     1
1992-02-01     2
tic
T2new = createID(T2)
T2new = 21x2 timetable
   time        ID     Var1
__________  ________  ____
1990-01-01  "1990S1"     1
1990-02-01  "1990S1"     2
1990-03-01  "1990S2"     1
1990-04-01  "1990S2"     2
1990-05-01  "1990S2"     3
1990-06-01  "1990S3"     1
1990-07-01  "1990S3"     2
1991-01-01  "1991S1"     1
1991-02-01  "1991S1"     2
1991-03-01  "1991S2"     1
1991-04-01  "1991S2"     2
1991-05-01  "1991S2"     3
1991-06-01  "1991S3"     1
1991-07-01  "1991S3"     2
1992-01-01  "1992S1"     1
1992-02-01  "1992S1"     2
toc
Elapsed time is 0.057196 seconds.
% Use Original Timetable
load('sample.mat')
tic
sampleNew = createID(sample)
sampleNew = 50x33 timetable
      time           ID     Var1  Var2  Var3   Var4  Var5  Var6  Var7   Var8  Var9  Var10  Var11  Var12  Var13     Var14  Var15  Var16  Var17  Var18  Var19  Var20  Var21  Var22  Var23  Var24  Var25  Var26  Var27  Var28  Var29  Var30  Var31  Var32
________________  ________  ____  ____  ____  _____  ____  ____  ____  _____  ____  _____  _____  _____  _____  ________  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____  _____
1990-01-01 00:00  "1990S1"     1     1     0  -11.5  71.5  92.1  16.5  104.9   6.7      4      3      0      1  1.99e+09  {'T'}      1      0      0      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-01 12:00  "1990S1"     2     1     0  -11.8  71.8  88.8  16.1  125.1   8.4      6      3      0      4  1.99e+09  {'T'}      1      1      1      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-02 00:00  "1990S1"     3     1     0  -12.5    72  95.4    17   82.8   4.7      2      0      0      7  1.99e+09  {'F'}      1      1      0      0      0      0      1      0      0      0      0     60     50     85     70   12.5     14
1990-01-01 00:00  "1990S2"     1     2     0    -11    87  87.2  15.3   89.4   5.3      1      0      0      3  1.99e+09  {'F'}      0      1      0      0      0      0      1      0      1      0      0     60     50     85     70   12.5     14
1990-01-01 12:00  "1990S2"     2     2     0    -12    86    90  15.4   52.3   8.5      1      0      0      5  1.99e+09  {'F'}      1      0      0      0      0      0      1      1      0      0      0     60     50     85     70   12.5     14
1990-01-02 00:00  "1990S3"     1     3     0    -18   170  90.1  16.6   70.9   3.5      1      0      0      8  1.99e+09  {'F'}      0      0      0      0      1      0      1      0      0      0      0     60     50     85     70   12.5     14
1990-01-03 00:00  "1990S3"     2     3     0    -17   177  89.1  15.1   64.7   1.6      1      0      0     17  1.99e+09  {'F'}      0      1      0      0      1      0      1      0      1      0      0     60     50     85     70   12.5     14
1990-01-02 12:00  "1990S4"     1     4     0  -11.5    86  95.2  16.4  103.8   9.8      2      2      0      9  1.99e+09  {'T'}      0      0      0      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-03 00:00  "1990S4"     2     4     0    -11    86  93.4  16.5  154.1     6      4      3      0     12  1.99e+09  {'T'}      0      1      0      0      1      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-03 12:00  "1990S4"     3     4     0  -13.9  86.1  91.3  16.3  182.4  11.3     10      9      0     20  1.99e+09  {'T'}      0      0      1      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-04 00:00  "1990S4"     4     4     0  -16.1  85.2  92.7  16.6  200.6    15     13     12      0     22  1.99e+09  {'T'}      0      0      0      0      1      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-04 12:00  "1990S4"     5     4     1  -17.8    85  94.9    17  225.2  16.1     10     10      0     25  1.99e+09  {'T'}      0      0      0      0      0      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-05 00:00  "1990S4"     6     4     1  -20.1  84.2  97.8  16.4  244.4  16.9     10      5      0     28  1.99e+09  {'T'}      0      1      0      0      5      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-05 12:00  "1990S4"     7     4     1  -21.7  84.3  99.5  16.3  301.9  17.7      6      2      0     33  1.99e+09  {'T'}      0      1      0      0      4      0      0      0      0      0      0     60     50     85     70   12.5     14
1990-01-06 00:00  "1990S4"     8     4     1    -22  82.5  97.8  16.6  235.7  10.3      4      0      0     38  1.99e+09  {'F'}      0      1      0      1      4      0      1      0      0      0      0     60     50     85     70   12.5     14
1990-01-06 12:00  "1990S4"     9     4     1  -21.7  85.7  97.5  16.5  161.6  15.6      3      1      0     42  1.99e+09  {'F'}      1      0      0      2      0      0      1      0      0      0      0     60     50     85     70   12.5     14
toc
Elapsed time is 0.013666 seconds.
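For comparison, the per-year numbering can also be computed without accumarray. This is a sketch of a different technique (not the method used in createID), on the same test layout as T2 above and under the same assumption that rows are grouped by year in ascending order: flag a new system wherever Var1 drops or the year changes, take a running total, and subtract the total reached before each year began.

```matlab
% Hypothetical test data with the same layout as T2 (three years, three systems each)
time = datetime(reshape(repmat([1990; 1991; 1992], 1, 7).', [], 1), ...
                repmat((1:7).', 3, 1), ones(21, 1));
Var1 = repmat([1; 2; 1; 2; 3; 1; 2], 3, 1);
T = timetable(time, Var1);

% Assumes rows are grouped by year in ascending order
yr = year(T.time);
yearStart = [true; diff(yr) ~= 0];             % first row of each year
isNew = yearStart | [true; diff(T.Var1) < 0];  % first row of each system

total  = cumsum(isNew);          % running system count across all years
blk    = cumsum(yearStart);      % which year block each row belongs to
offset = total(yearStart) - 1;   % systems already counted before each year
count  = total - offset(blk);    % numbering restarts at 1 every year

ID = yr + "S" + count;
T = addvars(T, ID, 'Before', 1);
```

On this data the IDs in rows 1 to 16 match T2new above, and row 21 gets 1992S3.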
