## Reducing nonuniform-timed data to x-minute averages

on 14 Jun 2012

### Walter Roberson (view profile)

I have an incredible amount of data recorded at nonuniform time intervals. I'll like to boil the data down to a multi-minute averages, which can be defined by a variable within the code, say "windowAvg" or "interval". I also have large gaps within the data where no data was recorded, and I'd need to represent that data as well with zeroes or "NaN" or something similar. Here's an example of what the data may look like:

• DATE TIME (HH:MM:SS) DATA
• 1-Mar-12 21:40:58 7.714284452
• 1-Mar-12 21:40:59 7.076270192
• 1-Mar-12 21:41:00 6.852246256
• 1-Mar-12 21:41:01 8.065426585
• 1-Mar-12 21:41:11 6.17580769
• 1-Mar-12 21:41:12 8.216123385
• 1-Mar-12 21:41:14 7.811008001
• 1-Mar-12 21:41:15 6.888230101 ....etc

And then what I need it to look like (for example, 5-min avg):

• DATE TIME (HH:MM:SS) DATA
• 1-Mar-12 21:40:00 7.825590127
• 1-Mar-12 21:45:00 7.488390079
• 1-Mar-12 21:50:00 NaN
• 1-Mar-12 21:55:00 7.314028628

My general plan was to create a full day's worth of a timestamp variable (I already have it in MATLAB serial time form), and then "find" where the data fall into the different time intervals, average, and then map the averaged value to the appropriate interval. I have a feeling, though, that there's an easier way to do this with obscure functions that I don't know about. I would appreciate any help.

Thanks, Nic

## Products

No products are associated with this question.

### Walter Roberson (view profile)

on 14 Jun 2012

Use the two-output version of histc() on the serial date numbers in order to find out which interval each time is in. Feed the bin numbers and the actual data into accumarray(), specifying @mean as the function it should use, and specifying whichever fill value you want (default is 0). When you get the accumarray() result, use the row index to look up the serial date number of the bin (the one you handed to histc).

The code is quite straight forward once you understand histc() and accumarray()

Nic

### Nic (view profile)

on 26 Jun 2012

Hi Walter:
Thank you for your reply; however this is similar to the code that I was working with after you first suggested histc() and accumarry(). The problem that I'm running into using binning/accumarry method is some issue with the precision of datenumbers and/or floor() and/or mod(). For example: Mar 21 2012 01:05:00 needs to be in a bin above 01:04:59 of the same date, but the code won't do that as is. It places 01:05:00 in the same bin as 01:04:59, which throws the whole averaging off. And then, I tried to check if the seconds column of the raw datenums == 0, put in previous bin (except 0:00:00, of course), but then I discovered that it wouldn't have the binning problem described above *every* time a measurement landed on the :00 seconds mark...

Walter Roberson

### Walter Roberson (view profile)

on 26 Jun 2012

What that sounds like to me is roundoff error. 1/(24*60*60) is not representable as a finite binary number, so there could be round-off depending on how exactly the values were generated.

You could experiment to see if you get more accuracy by using datevec() on the string times, as that breaks out the minutes field individually.

Nic

### Nic (view profile)

on 26 Jun 2012

Here's what I do have, for the "brute force" method:

clear all; clc; close all;

%get files in current dir
inputFile_870 = dir('*_870.txt');
%%Other input files

%define the number of files
size_870 = size(inputFile_870);
%%Other variables

%set up 5 minute interval bins
BIN{1} = 0:4;
BIN{2} = 5:9;
BIN{3} = 10:14;
BIN{4} = 15:19;
BIN{5} = 20:24;
BIN{6} = 25:29;
BIN{7} = 30:34;
BIN{8} = 35:39;
BIN{9} = 40:44;
BIN{10} = 45:49;
BIN{11} = 50:54;
BIN{12} = 55:59;

%preallocate variables for the instrument
data_870 = cell(size_870);
babs_avg_870 = cell(size_870);
%%Other Variables

for i=1:size_870
%import data from the i_th file
data_870{i} = importdata(inputFile_870(i).name);
data_870{i} = data_870{i}.data;

%create a flag variable for each day
%flag will be a number defining which bin your date falls into
flag = nan(length(data_870{i}),1);

%search through file and define where each measurement falls into each
%5 minute bin
for j = 1:length(data_870{i})
if isempty(find(BIN{1} == data_870{i}(j,5), 1)) == 0
flag(j) = 0;
elseif isempty(find(BIN{2} == data_870{i}(j,5), 1)) == 0
flag(j) = 5;
elseif isempty(find(BIN{3} == data_870{i}(j,5), 1)) == 0
flag(j) = 10;
elseif isempty(find(BIN{4} == data_870{i}(j,5), 1)) == 0
flag(j) = 15;
elseif isempty(find(BIN{5} == data_870{i}(j,5), 1)) == 0
flag(j) = 20;
elseif isempty(find(BIN{6} == data_870{i}(j,5), 1)) == 0
flag(j) = 25;
elseif isempty(find(BIN{7} == data_870{i}(j,5), 1)) == 0
flag(j) = 30;
elseif isempty(find(BIN{8} == data_870{i}(j,5), 1)) == 0
flag(j) = 35;
elseif isempty(find(BIN{9} == data_870{i}(j,5), 1)) == 0
flag(j) = 40;
elseif isempty(find(BIN{10} == data_870{i}(j,5), 1)) == 0
flag(j) = 45;
elseif isempty(find(BIN{11} == data_870{i}(j,5), 1)) == 0
flag(j) = 50;
elseif isempty(find(BIN{12} == data_870{i}(j,5), 1)) == 0
flag(j) = 55;
end
end

%assign a datenumber to each valid date of data
dnum = datenum(data_870{i}(:,1),data_870{i}(:,2),data_870{i}(:,3),...
data_870{i}(:,4),flag(:,1),0);
%find the unique date values
uniqDate = unique(dnum);
%preallocate
babs_avg_870{i} = nan(length(uniqDate),1);
%%Other Variables

%find and perform operations on data
for j = 1:length(uniqDate);
loc = find(dnum == uniqDate(j));

babs_avg_870{i}(j) = mean(data_870{i}(loc,7));
%%Other operations
end
end
----------------
The problem with doing it this way is, of course, speed, because I have many large data files and variables to parse and to perform operations on. Secondly, I liked how the method that you suggested inserted zeros or NaNs or whatever for periods of time that were missing from the data. I need that. Thanks for all your suggestions

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi