60 years of daily data; need to group separately by month and year
9 views (last 30 days)
Show older comments
I am completely new to MATLAB and have about 60 years of daily data (22246 rows) and four columns of interest.
I've imported the data into MATLAB as a table from a .csv.
I want to start by grouping this into months so that I can then analyse averages, minimums, maximums, general trends, plot graphs etc. but I don't want to add up all monthly data into a single value for each month for each column. Instead, I want to see a single value for each month of every year (so condensing this into about 720 rows), not 12 rows where all January data is aggregated and then all February etc.
I.e.
J F M A M J J A S O N D
2000
2001
2002
...
I know that this cannot go into the original table and that a new one will need to be created.
[Later on, I want to do this by year but once I know how to do it by month, I should be able to amend the code accordingly.]
Another problem is that the data starts mid-year and ends mid-year (i.e. start date is not 1st Jan and end date is not 31st Dec), although the data does run consecutively without a break once it starts.
The original dates are in the form DD-MMM-YYYY.
I already know how to do the analysis (e.g. averages) on the data but my problem is getting to a starting point in terms of the values that I want to analyse.
I've tried searching forums already and can't find the answer or at least one that my limited knowledge of MATLAB can understand. Please help as I'm tearing my hair out!
Thank you!
3 Comments
Stephen23
on 26 Feb 2023
Edited: Stephen23
on 26 Feb 2023
Do NOT solve this with loops, lots of UNIQUE calls, padding arrays with ZEROS and the like.
Learn to use MATLAB instead of fighting it. The most important thing is to get the data design right, which does take some practice. But in many cases, a good place to start is to import data as a table:
You can solve this task in just a few table commands: the more you browse the documentation, the more you will find and know what is available to help you solve your problems:
Accepted Answer
Star Strider
on 26 Feb 2023
One approach for the mean values —
T1 = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1307475/Flows.csv')
TT1 = table2timetable(T1)
TT1m = retime(TT1, 'monthly', 'mean') % Aggregate On 'mean' Values
VN = T1.Properties.VariableNames;
T1maDate1 = datetime('01-Jan-1958') + calmonths(0:8).'; % Pad First Nine Months
T1ma1 = zeros(size(T1maDate1,1),4);
T1ma1 = [table(T1maDate1) array2table(T1ma1)];
T1ma1.Properties.VariableNames = VN;
T1maDate2 = TT1m.Date(end) + calmonths(1:4).'; % PAd Last Four Months
T1ma2 = zeros(size(T1maDate2,1),4);
T1ma2 = [table(T1maDate2) array2table(T1ma2)];
T1ma2.Properties.VariableNames = VN;
TT1m = [table2timetable(T1ma1); TT1m; table2timetable(T1ma2)] % Pad Array To Fill Out Months
Monthsc = unique(month(TT1m.Date, 'shortname'),'stable');
Yearsc = unique(year(TT1m.Date),'stable');
for k = 1:size(TT1m,2)
T1var{k,1} = array2table(reshape(TT1m{:,k}, 12, []).', 'RowNames',string(Yearsc), 'VariableNames',Monthsc);
T1var{k,2} = VN{k+1};
end
T1var{1,:}
Repeat this for the other statistics.
.
2 Comments
Star Strider
on 26 Feb 2023
As always, my pleasure!
I’ve had some experience with this sort of problem recently with another thread, so I know that there do not appear to be existing functions that can do this sort of operation. (It would be nice if there were!) There might be other ways of preallocating the table to avoid padding both ends of it, however that was not immediately obvious either. The matrix approach seems to work best here.
.
More Answers (2)
Image Analyst
on 26 Feb 2023
First see if you can do it yourself using one of these functions: splitapply, grpstats, groupsummary
If you still can't figure it out, let us know if you have the stats toolbox, and someone will do it for you.
Stephen23
on 26 Feb 2023
Edited: Stephen23
on 27 Feb 2023
The simple MATLAB approach using GROUPSUMMARY and UNSTACK:
M = categorical(datetime(1,1,1,"Format","MMM"):calmonths(1):datetime(1,12,31));
T = readtable('Flows.csv');
T.Year = T.Date.Year;
T.Month = M(T.Date.Month).';
S = groupsummary(T,["Year","Month"],"mean","TotalDailyFlowRate_m3s");
U = unstack(S,"mean_TotalDailyFlowRate_m3s","Month", "GroupingVariables","Year")
4 Comments
Stephen23
on 27 Feb 2023
Edited: Stephen23
on 27 Feb 2023
@Star Strider: thank you! For some reason I didn't think of a non-scalar input to CALMONTHS.
M = categorical(datetime(1,1,1,"Format","MMM")+calmonths(0:11))
sort(M) % sorts into category order (not alphabetic order)
See Also
Categories
Find more on Logical in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!