Extracting and Manipulating Data

Question

skunkezpzeke on 19 Sep 2017

0
Link

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/357280-extracting-and-manipulating-data

Edited: Cedric on 19 Sep 2017

hello,

i've a question concerning how to make statistical calculations on a large NOAA data set that is not uniform..so to speak.

For instance, i have a set of parameters at an hourly resolution, and need to get the daily average for those multiple parameters over an entire month. However, some days have around 30 entries, whereas others have 24.

I'm wondering how to write a code to average values that were recorded in the same day.

The data set looks like so..(simplified)

[x1, y1, z1, 1; x2, y2, z2, 1; x3, y3, z3, 1; ... ... x1020, y1020, z1020, 31; x1021, y1021, z1021, 31]

where the last column is the day of the month in which the measurements were taken.

is there any way to do this quickly?

i really do not wish to go through my csv file and manually do this..

thanks in advance.

2 Comments
Show NoneHide None

Cedric on 19 Sep 2017

Edited: Cedric on 19 Sep 2017

Don't simplify, give us a slice of this data set, i.e. the header if there is one and a bunch of rows. You can attach a text or netcdf file for example, so the content is not altered.

skunkezpzeke on 19 Sep 2017

data2016.csv

here is a cvs file that I had to edit from the original text file in excel..

Sign in to comment.

Sign in to answer this question.

Answer 1

KSSV on 19 Sep 2017

0
Link

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/357280-extracting-and-manipulating-data#answer_282152

Convert you dates to to date vector using datevec. From these pick the required month you want and find the average.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Answer 2

Cedric on 19 Sep 2017

0
Link

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/357280-extracting-and-manipulating-data#answer_282311

Edited: Cedric on 19 Sep 2017

Open in MATLAB Online

Looking at your data:

data = xlsread( 'data2016.csv' ) ;

and displaying a slice:

0001    0.0026    0.0021    0.0074    9.8340    0.0027
0002    0.0024    0.0021    0.0079    9.8340    0.0027
0002    0.0024    0.0021    0.0082    9.8340    0.0027
0004    0.0024    0.0019    0.0071    9.8340    0.0027
       NaN       NaN       NaN       NaN       NaN    0.0027
0002    0.0024    0.0017    0.0066    9.8340    0.0027
0002    0.0023    0.0018    0.0074    9.8310    0.0027
0002    0.0023    0.0017    0.0066    9.8310    0.0027

you can see that some entries have NaN values, which correspond to the empty cells in the Excel spreadsheet.

You may need to count these days in your averages or not, I don't know, but for now I assume that you just need to remove these entries (the rows with NaNs). This can be done by flagging rows with NaN values in any column and removing them:

 isRowToRemove = any( isnan( data ), 2 ) ;
 data(isRowToRemove,:) = [] ;

where you can check that each true (1) element of isRowToRemove corresponds to a row with at least one NaN, and that data has these rows removed after second operation.

Now there are many ways to aggregate data per day and compute an average for example. One is using a basic FOR loop and some indexing. Another, more concise, relies on ACCUMARRAY, using the day ID (in 1:31) as a group index, and performing a mean per group/day. For means per day for column 1, we proceed as follows:

 >> meansCol1 = accumarray( data(:,end), data(:,1), [], @mean )
 meansCol1 =
6996
4215
6764
7800
5183
3167
7547
2788
2469
1607
5915
0578
8337
0324
8477
7001
3180
2206
7195
8254
6286
6322
8840
3040
1650
2166
5646
6683
9563
5061
3568

where the first argument in the call to ACCUMARRAY is the vector of group IDs (the day IDs which are stored in the last column of data), the second is the vector of values (in this example column 1 of data but it can be any column), and the last argument is the function to apply for accumulating/aggregating each group. By default it is a sum, but hear we want a mean, so we pass a handle (@) on the MEAN function.

With that you should be able to perform most types of statistics per day or per anything else as long as you are able to define a vector of group IDs.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Extracting and Manipulating Data

2 Comments
Show NoneHide None

Answers (2)

0 Comments
Show -2 older commentsHide -2 older comments

0 Comments
Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Community Treasure Hunt

Extracting and Manipulating Data

2 Comments Show NoneHide None

Answers (2)

0 Comments Show -2 older commentsHide -2 older comments

0 Comments Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Community Treasure Hunt

2 Comments
Show NoneHide None

0 Comments
Show -2 older commentsHide -2 older comments

0 Comments
Show -2 older commentsHide -2 older comments