Extracting and Manipulating Data
10 views (last 30 days)
Show older comments
hello,
i've a question concerning how to make statistical calculations on a large NOAA data set that is not uniform..so to speak.
For instance, i have a set of parameters at an hourly resolution, and need to get the daily average for those multiple parameters over an entire month. However, some days have around 30 entries, whereas others have 24.
I'm wondering how to write a code to average values that were recorded in the same day.
The data set looks like so..(simplified)
[x1, y1, z1, 1; x2, y2, z2, 1; x3, y3, z3, 1; ... ... x1020, y1020, z1020, 31; x1021, y1021, z1021, 31]
where the last column is the day of the month in which the measurements were taken.
is there any way to do this quickly?
i really do not wish to go through my csv file and manually do this..
thanks in advance.
2 Comments
Answers (2)
KSSV
on 19 Sep 2017
Convert you dates to to date vector using datevec. From these pick the required month you want and find the average.
0 Comments
Cedric
on 19 Sep 2017
Edited: Cedric
on 19 Sep 2017
Looking at your data:
data = xlsread( 'data2016.csv' ) ;
and displaying a slice:
0.0001 0.0026 0.0021 0.0074 9.8340 0.0027
0.0002 0.0024 0.0021 0.0079 9.8340 0.0027
0.0002 0.0024 0.0021 0.0082 9.8340 0.0027
0.0004 0.0024 0.0019 0.0071 9.8340 0.0027
NaN NaN NaN NaN NaN 0.0027
0.0002 0.0024 0.0017 0.0066 9.8340 0.0027
0.0002 0.0023 0.0018 0.0074 9.8310 0.0027
0.0002 0.0023 0.0017 0.0066 9.8310 0.0027
you can see that some entries have NaN values, which correspond to the empty cells in the Excel spreadsheet.
You may need to count these days in your averages or not, I don't know, but for now I assume that you just need to remove these entries (the rows with NaNs). This can be done by flagging rows with NaN values in any column and removing them:
isRowToRemove = any( isnan( data ), 2 ) ;
data(isRowToRemove,:) = [] ;
where you can check that each true (1) element of isRowToRemove corresponds to a row with at least one NaN, and that data has these rows removed after second operation.
Now there are many ways to aggregate data per day and compute an average for example. One is using a basic FOR loop and some indexing. Another, more concise, relies on ACCUMARRAY, using the day ID (in 1:31) as a group index, and performing a mean per group/day. For means per day for column 1, we proceed as follows:
>> meansCol1 = accumarray( data(:,end), data(:,1), [], @mean )
meansCol1 =
3.6996
2.4215
1.6764
2.7800
2.5183
3.3167
2.7547
4.2788
4.2469
2.1607
1.5915
3.0578
3.8337
5.0324
3.8477
2.7001
2.3180
4.2206
2.7195
1.8254
2.6286
3.6322
1.8840
2.3040
3.1650
2.2166
1.5646
2.6683
0.9563
2.5061
1.3568
where the first argument in the call to ACCUMARRAY is the vector of group IDs (the day IDs which are stored in the last column of data), the second is the vector of values (in this example column 1 of data but it can be any column), and the last argument is the function to apply for accumulating/aggregating each group. By default it is a sum, but hear we want a mean, so we pass a handle (@) on the MEAN function.
With that you should be able to perform most types of statistics per day or per anything else as long as you are able to define a vector of group IDs.
0 Comments
See Also
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!