Reducing nonuniform-timed data to x-minute averages

Question

ndb on 14 Jun 2012

0
Link

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/41209-reducing-nonuniform-timed-data-to-x-minute-averages

I have an incredible amount of data recorded at nonuniform time intervals. I'll like to boil the data down to a multi-minute averages, which can be defined by a variable within the code, say "windowAvg" or "interval". I also have large gaps within the data where no data was recorded, and I'd need to represent that data as well with zeroes or "NaN" or something similar. Here's an example of what the data may look like:

DATE TIME (HH:MM:SS) DATA
1-Mar-12 21:40:58 7.714284452
1-Mar-12 21:40:59 7.076270192
1-Mar-12 21:41:00 6.852246256
1-Mar-12 21:41:01 8.065426585
1-Mar-12 21:41:11 6.17580769
1-Mar-12 21:41:12 8.216123385
1-Mar-12 21:41:14 7.811008001
1-Mar-12 21:41:15 6.888230101....etc

And then what I need it to look like (for example, 5-min avg):

DATE TIME (HH:MM:SS) DATA
1-Mar-12 21:40:00 7.825590127
1-Mar-12 21:45:00 7.488390079
1-Mar-12 21:50:00 NaN
1-Mar-12 21:55:00 7.314028628

My general plan was to create a full day's worth of a timestamp variable (I already have it in MATLAB serial time form), and then "find" where the data fall into the different time intervals, average, and then map the averaged value to the appropriate interval. I have a feeling, though, that there's an easier way to do this with obscure functions that I don't know about. I would appreciate any help.

Thanks, Nic

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Walter Roberson on 14 Jun 2012

0
Link

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/41209-reducing-nonuniform-timed-data-to-x-minute-averages#answer_50788

Use the two-output version of histc() on the serial date numbers in order to find out which interval each time is in. Feed the bin numbers and the actual data into accumarray(), specifying @mean as the function it should use, and specifying whichever fill value you want (default is 0). When you get the accumarray() result, use the row index to look up the serial date number of the bin (the one you handed to histc).

The code is quite straight forward once you understand histc() and accumarray()

5 Comments
Show 3 older commentsHide 3 older comments

Walter Roberson on 26 Jun 2012

What that sounds like to me is roundoff error. 1/(24*60*60) is not representable as a finite binary number, so there could be round-off depending on how exactly the values were generated.

You could experiment to see if you get more accuracy by using datevec() on the string times, as that breaks out the minutes field individually.

ndb on 26 Jun 2012

Here's what I do have, for the "brute force" method:

clear all; clc; close all;

%get files in current dir

inputFile_870 = dir('*_870.txt');

%%Other input files

%define the number of files

size_870 = size(inputFile_870);

%%Other variables

%set up 5 minute interval bins

BIN{1} = 0:4;

BIN{2} = 5:9;

BIN{3} = 10:14;

BIN{4} = 15:19;

BIN{5} = 20:24;

BIN{6} = 25:29;

BIN{7} = 30:34;

BIN{8} = 35:39;

BIN{9} = 40:44;

BIN{10} = 45:49;

BIN{11} = 50:54;

BIN{12} = 55:59;

%preallocate variables for the instrument

data_870 = cell(size_870);

babs_avg_870 = cell(size_870);

%%Other Variables

for i=1:size_870

%import data from the i_th file

data_870{i} = importdata(inputFile_870(i).name);

data_870{i} = data_870{i}.data;

%create a flag variable for each day

%flag will be a number defining which bin your date falls into

flag = nan(length(data_870{i}),1);

%search through file and define where each measurement falls into each

%5 minute bin

for j = 1:length(data_870{i})

if isempty(find(BIN{1} == data_870{i}(j,5), 1)) == 0

flag(j) = 0;

elseif isempty(find(BIN{2} == data_870{i}(j,5), 1)) == 0

flag(j) = 5;

elseif isempty(find(BIN{3} == data_870{i}(j,5), 1)) == 0

flag(j) = 10;

elseif isempty(find(BIN{4} == data_870{i}(j,5), 1)) == 0

flag(j) = 15;

elseif isempty(find(BIN{5} == data_870{i}(j,5), 1)) == 0

flag(j) = 20;

elseif isempty(find(BIN{6} == data_870{i}(j,5), 1)) == 0

flag(j) = 25;

elseif isempty(find(BIN{7} == data_870{i}(j,5), 1)) == 0

flag(j) = 30;

elseif isempty(find(BIN{8} == data_870{i}(j,5), 1)) == 0

flag(j) = 35;

elseif isempty(find(BIN{9} == data_870{i}(j,5), 1)) == 0

flag(j) = 40;

elseif isempty(find(BIN{10} == data_870{i}(j,5), 1)) == 0

flag(j) = 45;

elseif isempty(find(BIN{11} == data_870{i}(j,5), 1)) == 0

flag(j) = 50;

elseif isempty(find(BIN{12} == data_870{i}(j,5), 1)) == 0

flag(j) = 55;

end

%assign a datenumber to each valid date of data

dnum = datenum(data_870{i}(:,1),data_870{i}(:,2),data_870{i}(:,3),...

data_870{i}(:,4),flag(:,1),0);

%find the unique date values

uniqDate = unique(dnum);

%preallocate

babs_avg_870{i} = nan(length(uniqDate),1);

%%Other Variables

%find and perform operations on data

for j = 1:length(uniqDate);

loc = find(dnum == uniqDate(j));

babs_avg_870{i}(j) = mean(data_870{i}(loc,7));

%%Other operations

end

----------------

The problem with doing it this way is, of course, speed, because I have many large data files and variables to parse and to perform operations on. Secondly, I liked how the method that you suggested inserted zeros or NaNs or whatever for periods of time that were missing from the data. I need that. Thanks for all your suggestions

Sign in to comment.

Reducing nonuniform-timed data to x-minute averages

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

5 Comments
Show 3 older commentsHide 3 older comments

More Answers (0)

See Also

Categories

Tags

Community Treasure Hunt

Reducing nonuniform-timed data to x-minute averages

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

5 Comments Show 3 older commentsHide 3 older comments

More Answers (0)

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

5 Comments
Show 3 older commentsHide 3 older comments