60 years of daily data; need to group separately by month and year

9 views (last 30 days)
I am completely new to MATLAB and have about 60 years of daily data (22246 rows) and four columns of interest.
I've imported the data into MATLAB as a table from a .csv.
I want to start by grouping this into months so that I can then analyse averages, minimums, maximums, general trends, plot graphs etc. but I don't want to add up all monthly data into a single value for each month for each column. Instead, I want to see a single value for each month of every year (so condensing this into about 720 rows), not 12 rows where all January data is aggregated and then all February etc.
I.e.
J F M A M J J A S O N D
2000
2001
2002
...
I know that this cannot go into the original table and that a new one will need to be created.
[Later on, I want to do this by year but once I know how to do it by month, I should be able to amend the code accordingly.]
Another problem is that the data starts mid-year and ends mid-year (i.e. start date is not 1st Jan and end date is not 31st Dec), although the data does run consecutively without a break once it starts.
The original dates are in the form DD-MMM-YYYY.
I already know how to do the analysis (e.g. averages) on the data but my problem is getting to a starting point in terms of the values that I want to analyse.
I've tried searching forums already and can't find the answer or at least one that my limited knowledge of MATLAB can understand. Please help as I'm tearing my hair out!
Thank you!
  3 Comments
Butterfly
Butterfly on 26 Feb 2023
Thank you.
I've attached the original .csv
The columns are rates and I know that I will need to convert them later but that isn't an issue, it's more how I group the dates to be able to do something with them.
Stephen23
Stephen23 on 26 Feb 2023
Edited: Stephen23 on 26 Feb 2023
Do NOT solve this with loops, lots of UNIQUE calls, padding arrays with ZEROS and the like.
Learn to use MATLAB instead of fighting it. The most important thing is to get the data design right, which does take some practice. But in many cases, a good place to start is to import data as a table:
You can solve this task in just a few table commands: the more you browse the documentation, the more you will find and know what is available to help you solve your problems:

Sign in to comment.

Accepted Answer

Star Strider
Star Strider on 26 Feb 2023
One approach for the mean values —
T1 = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1307475/Flows.csv')
T1 = 22246×5 table
Date TotalDailyFlowRate_m3s DailyGlacialFlowRate_m3s DailyBaseFlowRate_m3s DailyOverlandFlowRate_m3s ___________ ______________________ ________________________ _____________________ _________________________ 01-Oct-1958 262.98 249.21 6.3874 7.3869 02-Oct-1958 246.17 233.64 6.3796 6.1467 03-Oct-1958 231.41 215.65 6.3968 9.3586 04-Oct-1958 214.99 200.39 6.3953 8.2047 05-Oct-1958 215.77 193.59 6.4526 15.731 06-Oct-1958 188.71 168.12 6.4688 14.117 07-Oct-1958 166.33 147.33 6.4654 12.533 08-Oct-1958 158.55 136.94 6.4832 15.129 09-Oct-1958 152.23 128.36 6.5255 17.345 10-Oct-1958 132.55 110.14 6.5371 15.87 11-Oct-1958 112.16 93.14 6.5222 12.5 12-Oct-1958 96.257 79.93 6.4984 9.8288 13-Oct-1958 83.62 69.34 6.4726 7.8076 14-Oct-1958 77.848 62.82 6.4694 8.5586 15-Oct-1958 68.215 54.904 6.451 6.8597 16-Oct-1958 61.185 48.27 6.4314 6.4832
TT1 = table2timetable(T1)
TT1 = 22246×4 timetable
Date TotalDailyFlowRate_m3s DailyGlacialFlowRate_m3s DailyBaseFlowRate_m3s DailyOverlandFlowRate_m3s ___________ ______________________ ________________________ _____________________ _________________________ 01-Oct-1958 262.98 249.21 6.3874 7.3869 02-Oct-1958 246.17 233.64 6.3796 6.1467 03-Oct-1958 231.41 215.65 6.3968 9.3586 04-Oct-1958 214.99 200.39 6.3953 8.2047 05-Oct-1958 215.77 193.59 6.4526 15.731 06-Oct-1958 188.71 168.12 6.4688 14.117 07-Oct-1958 166.33 147.33 6.4654 12.533 08-Oct-1958 158.55 136.94 6.4832 15.129 09-Oct-1958 152.23 128.36 6.5255 17.345 10-Oct-1958 132.55 110.14 6.5371 15.87 11-Oct-1958 112.16 93.14 6.5222 12.5 12-Oct-1958 96.257 79.93 6.4984 9.8288 13-Oct-1958 83.62 69.34 6.4726 7.8076 14-Oct-1958 77.848 62.82 6.4694 8.5586 15-Oct-1958 68.215 54.904 6.451 6.8597 16-Oct-1958 61.185 48.27 6.4314 6.4832
TT1m = retime(TT1, 'monthly', 'mean') % Aggregate On 'mean' Values
TT1m = 731×4 timetable
Date TotalDailyFlowRate_m3s DailyGlacialFlowRate_m3s DailyBaseFlowRate_m3s DailyOverlandFlowRate_m3s ___________ ______________________ ________________________ _____________________ _________________________ 01-Oct-1958 129.2 114.23 6.3913 8.5827 01-Nov-1958 89.939 76.009 6.1106 7.8195 01-Dec-1958 34.794 26.32 5.8581 2.6153 01-Jan-1959 9.7249 2.5744 5.6172 1.5334 01-Feb-1959 73.6 39.205 5.9106 28.484 01-Mar-1959 56.556 20.537 6.1204 29.899 01-Apr-1959 18.935 3.884 6.0532 8.9979 01-May-1959 163.63 127.44 6.1644 30.022 01-Jun-1959 193.65 167.27 6.3397 20.043 01-Jul-1959 269.78 258.27 6.5708 4.9427 01-Aug-1959 273.15 259.91 6.9512 6.2894 01-Sep-1959 290.11 280.38 7.1199 2.6187 01-Oct-1959 309.5 292.84 7.1115 9.543 01-Nov-1959 46.408 34.154 6.8302 5.4234 01-Dec-1959 26.45 8.4658 6.769 11.215 01-Jan-1960 14.466 2.8862 6.4795 5.1
VN = T1.Properties.VariableNames;
T1maDate1 = datetime('01-Jan-1958') + calmonths(0:8).'; % Pad First Nine Months
T1ma1 = zeros(size(T1maDate1,1),4);
T1ma1 = [table(T1maDate1) array2table(T1ma1)];
T1ma1.Properties.VariableNames = VN;
T1maDate2 = TT1m.Date(end) + calmonths(1:4).'; % PAd Last Four Months
T1ma2 = zeros(size(T1maDate2,1),4);
T1ma2 = [table(T1maDate2) array2table(T1ma2)];
T1ma2.Properties.VariableNames = VN;
TT1m = [table2timetable(T1ma1); TT1m; table2timetable(T1ma2)] % Pad Array To Fill Out Months
TT1m = 744×4 timetable
Date TotalDailyFlowRate_m3s DailyGlacialFlowRate_m3s DailyBaseFlowRate_m3s DailyOverlandFlowRate_m3s ___________ ______________________ ________________________ _____________________ _________________________ 01-Jan-1958 0 0 0 0 01-Feb-1958 0 0 0 0 01-Mar-1958 0 0 0 0 01-Apr-1958 0 0 0 0 01-May-1958 0 0 0 0 01-Jun-1958 0 0 0 0 01-Jul-1958 0 0 0 0 01-Aug-1958 0 0 0 0 01-Sep-1958 0 0 0 0 01-Oct-1958 129.2 114.23 6.3913 8.5827 01-Nov-1958 89.939 76.009 6.1106 7.8195 01-Dec-1958 34.794 26.32 5.8581 2.6153 01-Jan-1959 9.7249 2.5744 5.6172 1.5334 01-Feb-1959 73.6 39.205 5.9106 28.484 01-Mar-1959 56.556 20.537 6.1204 29.899 01-Apr-1959 18.935 3.884 6.0532 8.9979
Monthsc = unique(month(TT1m.Date, 'shortname'),'stable');
Yearsc = unique(year(TT1m.Date),'stable');
for k = 1:size(TT1m,2)
T1var{k,1} = array2table(reshape(TT1m{:,k}, 12, []).', 'RowNames',string(Yearsc), 'VariableNames',Monthsc);
T1var{k,2} = VN{k+1};
end
T1var{1,:}
ans = 62×12 table
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ 1958 0 0 0 0 0 0 0 0 0 129.2 89.939 34.794 1959 9.7249 73.6 56.556 18.935 163.63 193.65 269.78 273.15 290.11 309.5 46.408 26.45 1960 14.466 27.609 40.214 61.283 193.39 256.49 293.34 192.61 156.99 53.905 35.807 24.932 1961 19.586 61.275 44.534 33.466 150.12 198.28 253.94 240.21 253.13 128.59 38.756 14.241 1962 13.667 18.143 9.0158 57.854 103.59 209.15 288.26 203.01 100.41 91.773 27.435 32.287 1963 9.3682 10.313 28.836 27.881 62.027 237.22 250.54 235.35 87.257 41.598 32.768 24.114 1964 28.141 17.098 47.333 38.92 112.4 185.05 261.22 192.24 91.762 93.745 48.909 14.042 1965 6.8003 12.15 9.5058 30.933 87.688 223.38 348.93 277.74 82.767 182.96 60.853 17.038 1966 17.744 6.3796 6.7594 27.768 60.589 251.67 348.15 208.17 120.65 38.748 16.603 7.2417 1967 14.554 13.792 6.2027 37.679 49.591 172.03 200.29 207.8 190.33 40.039 15.872 22.393 1968 7.7021 15.294 22.73 47.951 58.555 198.43 273.33 359.3 257.67 45.569 109.88 28.456 1969 7.9723 6.8581 14.836 31.601 77.444 318.6 290.79 413.61 185.35 56.564 10.3 7.0286 1970 20.799 7.8935 7.7874 24.117 106.78 261.32 165.99 233.28 130.14 135.45 16.521 14.676 1971 8.9001 9.1384 16.412 32.803 91.044 163.4 260.7 308.83 213.12 88.138 55.509 17.113 1972 23.795 17.257 20.932 44.053 107.2 136.74 257.66 223.04 207.29 79.439 14.373 50.723 1973 44.745 8.5222 22.302 50.595 44.739 104.35 232.83 222.83 224.81 119.32 25.683 10.874
ans = 'TotalDailyFlowRate_m3s'
Repeat this for the other statistics.
.
  2 Comments
Butterfly
Butterfly on 26 Feb 2023
Thank you very much! This helps me no end! It was completely different to how I was trying to do it (I know that there are probably many ways to skin a cat with this type of stuff). I have been struggling with this for over 2 weeks now and every step forward seemed like ten steps back. I was too proud to ask for help and have just been getting more and more wound up with it. Thank you!
Star Strider
Star Strider on 26 Feb 2023
As always, my pleasure!
I’ve had some experience with this sort of problem recently with another thread, so I know that there do not appear to be existing functions that can do this sort of operation. (It would be nice if there were!) There might be other ways of preallocating the table to avoid padding both ends of it, however that was not immediately obvious either. The matrix approach seems to work best here.
.

Sign in to comment.

More Answers (2)

Image Analyst
Image Analyst on 26 Feb 2023
First see if you can do it yourself using one of these functions: splitapply, grpstats, groupsummary
If you still can't figure it out, let us know if you have the stats toolbox, and someone will do it for you.
  1 Comment
Butterfly
Butterfly on 26 Feb 2023
Thank you. I've just been trying with those but was getting in even more of a mess. I will bookmark them to go back and review later though. I think with Star Strider's help, I've got enough to go on now.

Sign in to comment.


Stephen23
Stephen23 on 26 Feb 2023
Edited: Stephen23 on 27 Feb 2023
The simple MATLAB approach using GROUPSUMMARY and UNSTACK:
M = categorical(datetime(1,1,1,"Format","MMM"):calmonths(1):datetime(1,12,31));
T = readtable('Flows.csv');
T.Year = T.Date.Year;
T.Month = M(T.Date.Month).';
S = groupsummary(T,["Year","Month"],"mean","TotalDailyFlowRate_m3s");
U = unstack(S,"mean_TotalDailyFlowRate_m3s","Month", "GroupingVariables","Year")
U = 62×13 table
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec ____ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ 1958 NaN NaN NaN NaN NaN NaN NaN NaN NaN 129.2 89.939 34.794 1959 9.7249 73.6 56.556 18.935 163.63 193.65 269.78 273.15 290.11 309.5 46.408 26.45 1960 14.466 27.609 40.214 61.283 193.39 256.49 293.34 192.61 156.99 53.905 35.807 24.932 1961 19.586 61.275 44.534 33.466 150.12 198.28 253.94 240.21 253.13 128.59 38.756 14.241 1962 13.667 18.143 9.0158 57.854 103.59 209.15 288.26 203.01 100.41 91.773 27.435 32.287 1963 9.3682 10.313 28.836 27.881 62.027 237.22 250.54 235.35 87.257 41.598 32.768 24.114 1964 28.141 17.098 47.333 38.92 112.4 185.05 261.22 192.24 91.762 93.745 48.909 14.042 1965 6.8003 12.15 9.5058 30.933 87.688 223.38 348.93 277.74 82.767 182.96 60.853 17.038 1966 17.744 6.3796 6.7594 27.768 60.589 251.67 348.15 208.17 120.65 38.748 16.603 7.2417 1967 14.554 13.792 6.2027 37.679 49.591 172.03 200.29 207.8 190.33 40.039 15.872 22.393 1968 7.7021 15.294 22.73 47.951 58.555 198.43 273.33 359.3 257.67 45.569 109.88 28.456 1969 7.9723 6.8581 14.836 31.601 77.444 318.6 290.79 413.61 185.35 56.564 10.3 7.0286 1970 20.799 7.8935 7.7874 24.117 106.78 261.32 165.99 233.28 130.14 135.45 16.521 14.676 1971 8.9001 9.1384 16.412 32.803 91.044 163.4 260.7 308.83 213.12 88.138 55.509 17.113 1972 23.795 17.257 20.932 44.053 107.2 136.74 257.66 223.04 207.29 79.439 14.373 50.723 1973 44.745 8.5222 22.302 50.595 44.739 104.35 232.83 222.83 224.81 119.32 25.683 10.874
  4 Comments
Stephen23
Stephen23 on 27 Feb 2023
Edited: Stephen23 on 27 Feb 2023
@Star Strider: thank you! For some reason I didn't think of a non-scalar input to CALMONTHS.
M = categorical(datetime(1,1,1,"Format","MMM")+calmonths(0:11))
M = 1×12 categorical array
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
sort(M) % sorts into category order (not alphabetic order)
ans = 1×12 categorical array
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Sign in to comment.

Products


Release

R2022b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!