Extracting and Manipulating Data

10 views (last 30 days)
skunkezpzeke
skunkezpzeke on 19 Sep 2017
Edited: Cedric on 19 Sep 2017
hello,
i've a question concerning how to make statistical calculations on a large NOAA data set that is not uniform..so to speak.
For instance, i have a set of parameters at an hourly resolution, and need to get the daily average for those multiple parameters over an entire month. However, some days have around 30 entries, whereas others have 24.
I'm wondering how to write a code to average values that were recorded in the same day.
The data set looks like so..(simplified)
[x1, y1, z1, 1; x2, y2, z2, 1; x3, y3, z3, 1; ... ... x1020, y1020, z1020, 31; x1021, y1021, z1021, 31]
where the last column is the day of the month in which the measurements were taken.
is there any way to do this quickly?
i really do not wish to go through my csv file and manually do this..
thanks in advance.
  2 Comments
Cedric
Cedric on 19 Sep 2017
Edited: Cedric on 19 Sep 2017
Don't simplify, give us a slice of this data set, i.e. the header if there is one and a bunch of rows. You can attach a text or netcdf file for example, so the content is not altered.
skunkezpzeke
skunkezpzeke on 19 Sep 2017
here is a cvs file that I had to edit from the original text file in excel..

Sign in to comment.

Answers (2)

KSSV
KSSV on 19 Sep 2017
Convert you dates to to date vector using datevec. From these pick the required month you want and find the average.

Cedric
Cedric on 19 Sep 2017
Edited: Cedric on 19 Sep 2017
Looking at your data:
data = xlsread( 'data2016.csv' ) ;
and displaying a slice:
0.0001 0.0026 0.0021 0.0074 9.8340 0.0027
0.0002 0.0024 0.0021 0.0079 9.8340 0.0027
0.0002 0.0024 0.0021 0.0082 9.8340 0.0027
0.0004 0.0024 0.0019 0.0071 9.8340 0.0027
NaN NaN NaN NaN NaN 0.0027
0.0002 0.0024 0.0017 0.0066 9.8340 0.0027
0.0002 0.0023 0.0018 0.0074 9.8310 0.0027
0.0002 0.0023 0.0017 0.0066 9.8310 0.0027
you can see that some entries have NaN values, which correspond to the empty cells in the Excel spreadsheet.
You may need to count these days in your averages or not, I don't know, but for now I assume that you just need to remove these entries (the rows with NaNs). This can be done by flagging rows with NaN values in any column and removing them:
isRowToRemove = any( isnan( data ), 2 ) ;
data(isRowToRemove,:) = [] ;
where you can check that each true (1) element of isRowToRemove corresponds to a row with at least one NaN, and that data has these rows removed after second operation.
Now there are many ways to aggregate data per day and compute an average for example. One is using a basic FOR loop and some indexing. Another, more concise, relies on ACCUMARRAY, using the day ID (in 1:31) as a group index, and performing a mean per group/day. For means per day for column 1, we proceed as follows:
>> meansCol1 = accumarray( data(:,end), data(:,1), [], @mean )
meansCol1 =
3.6996
2.4215
1.6764
2.7800
2.5183
3.3167
2.7547
4.2788
4.2469
2.1607
1.5915
3.0578
3.8337
5.0324
3.8477
2.7001
2.3180
4.2206
2.7195
1.8254
2.6286
3.6322
1.8840
2.3040
3.1650
2.2166
1.5646
2.6683
0.9563
2.5061
1.3568
where the first argument in the call to ACCUMARRAY is the vector of group IDs (the day IDs which are stored in the last column of data), the second is the vector of values (in this example column 1 of data but it can be any column), and the last argument is the function to apply for accumulating/aggregating each group. By default it is a sum, but hear we want a mean, so we pass a handle (@) on the MEAN function.
With that you should be able to perform most types of statistics per day or per anything else as long as you are able to define a vector of group IDs.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!