How to create a new matrix adding vectors of different size based on time

3 views (last 30 days)
Hi,
Wondering if I could get some ideas, or guide me to a similar example, on how I can organize 300 times series of data, each time series has data in column 1 (numerical, double) and date/time (01-Jan-2000 00:00:00) in column 2. These time series can have from ~10,000 to ~115,000 data points, and can have both different start and end date/time and time steps between data points. Ideally I would like to end with a matrix that can have in column 1 the date/time, and then consecutive 300 columns with the data organized respect to date/time (column 1).
I believe my first step should be to use the function datetime to create a vector (column 1) with NaN's having the star and end date/time from the longest time series of these 300 time series, and an interval of time that represent the smallest time step from the time series. Then, add it as column 1 to a large matrix generated with just NaN’s that has the # of rows from column 1 and a total of 301 columns (date/time column plus 300 data columns). Thus a “background” matrix is generated. This is something I can do.
But my specific question is how I could run a loop that extract the index, based on date/time, for each of the 300 time series, and then insert the data of each time series into the “background” matrix, column by column.
This is a simple example of how my data looks like (date/time represented as serial date number from matlab):
730486.000 0.500 730486.014 0.800 730486.000 0.500
730486.007 NaN 730486.028 4.000 730486.021 NaN
730486.014 0.700 730486.042 2.100 730486.042 0.700
730486.021 3.000 730486.056 4.100 730486.063 3.000
730486.028 2.000 730486.083 2.000
730486.035 4.000 730486.104 4.000
730486.042 1.000 730486.125 1.000
730486.049 0.100 730486.140 0.100
730486.056 10.000 730486.167 10.000
730486.063 -1.000
And this is what I would like to generate:
730486.000 0.500 NaN 0.500
730486.007 NaN NaN NaN
730486.014 0.700 0.800 NaN
730486.021 3.000 NaN NaN
730486.028 2.000 4.000 NaN
730486.035 4.000 NaN NaN
730486.042 1.000 2.100 0.700
730486.049 0.100 NaN NaN
730486.056 10.000 4.100 NaN
730486.063 -1.000 NaN 3.000
730486.069 NaN NaN NaN
730486.076 NaN NaN NaN
730486.083 NaN NaN 2.000
730486.090 NaN NaN NaN
730486.097 NaN NaN NaN
730486.104 NaN NaN 4.000
730486.111 NaN NaN NaN
730486.118 NaN NaN NaN
730486.125 NaN NaN 1.000
730486.132 NaN NaN NaN
730486.139 NaN NaN NaN
730486.146 NaN NaN 0.100
730486.153 NaN NaN NaN
730486.160 NaN NaN NaN
730486.167 NaN NaN 10.000

Accepted Answer

David J. Mack
David J. Mack on 3 Mar 2017
Hey Robert!
First, a conceptual question: Can you make sure you have the same sampling interval & phase across all your signals (e.g. are the timestamps always like in your example?). If not, things will get really complicated (you have to resample which might increase your data amount by A LOT). If it is, then I would do two steps:
  1. Create a nTimeStampsTotal x nSignals data-matrix, where nTimeStampsTotal is the number of total timestamps (e.g. samples) from the first to the last timestamp among all signals (again: assuming constant sampling rate & phase) and nSignals is the number of signals.
  2. For each signal's timestamp find the according sample in the background matrix and assign them to yourmatrix.
The following code snippet exemplifies this.
data = NaN(nTimeStampsTotal,nSignals); % Data matrix.
for i = 1:nSignals
isSample = ismember(signalTimestamps{i},bgTimestamps); % Signal timestamp cell & background timestamp matrix.
data(isSample,i) = signalData{i}; % Signal data cell.
end
Be aware, that the ismember call assumes that ALL your timestamps in the signal are present in the background matrix!
Hope that helps, Greetings, David

More Answers (2)

Peter Perkins
Peter Perkins on 3 Mar 2017
If you have access to R2016b, I recommend looking at timetables. Even if you don't, I recommend looking at datetime and duration. This
>> .007*86400
ans =
604.8
seems like kind of a funny sampling rate, I guess it's the closest you can get to 10 min using only thousandths of days. And the fractional part of 730486.167 is not an even multiple of the apparent step size. Perhaps that's all just a display artifact, but things like this are one reason to switch from datenums to datetimes and durations.
>> datetime(2000,1,1,0,0:10:240,0)'
ans =
25×1 datetime array
01-Jan-2000 00:00:00
01-Jan-2000 00:10:00
[snip]
01-Jan-2000 04:00:00
or
>> datetime(2000,1,1) + (0:minutes(10):hours(4))';
Anyway, given the three matrices of datenums and values you show, stored as x1, x2, and x3, if you convert them to timetables
>> tt1 = timetable(datetime(x1(:,1),'ConvertFrom','datenum'),x1(:,2),'VariableNames',{'X1'})
tt1 =
10×1 timetable
Time X1
____________________ ___
01-Jan-2000 00:00:00 0.5
01-Jan-2000 00:10:04 NaN
01-Jan-2000 00:20:09 0.7
01-Jan-2000 00:30:14 3
01-Jan-2000 00:40:19 2
01-Jan-2000 00:50:24 4
01-Jan-2000 01:00:28 1
01-Jan-2000 01:10:33 0.1
01-Jan-2000 01:20:38 10
01-Jan-2000 01:30:43 -1
>> tt2 = timetable(datetime(x2(:,1),'ConvertFrom','datenum'),x2(:,2),'VariableNames',{'X2'});
>> tt3 = timetable(datetime(x3(:,1),'ConvertFrom','datenum'),x3(:,2),'VariableNames',{'X3'});
then this gets you one timetable, at the actual sampled times:
>> tt123 = synchronize(tt1,tt2,tt3,'union')
tt123 =
15×3 timetable
Time X1 X2 X3
____________________ ___ ___ ___
01-Jan-2000 00:00:00 0.5 NaN 0.5
01-Jan-2000 00:10:04 NaN NaN NaN
01-Jan-2000 00:20:09 0.7 0.8 NaN
01-Jan-2000 00:30:14 3 NaN NaN
01-Jan-2000 00:40:19 2 4 NaN
01-Jan-2000 00:50:24 4 NaN NaN
01-Jan-2000 01:00:28 1 2.1 0.7
01-Jan-2000 01:10:33 0.1 NaN NaN
01-Jan-2000 01:20:38 10 4.1 NaN
01-Jan-2000 01:30:43 -1 NaN 3
01-Jan-2000 01:59:31 NaN NaN 2
01-Jan-2000 02:29:45 NaN NaN 4
01-Jan-2000 03:00:00 NaN NaN 1
01-Jan-2000 03:21:36 NaN NaN 0.1
01-Jan-2000 04:00:28 NaN NaN 10
and this gets you the same data put onto the vector of datenums you showed:
>> dn = [730486.000 730486.007 730486.014 730486.021 730486.028 730486.035 730486.042 730486.049 730486.056 730486.063 730486.069 730486.076 730486.083 730486.090 730486.097 730486.104 730486.111 730486.118 730486.125 730486.132 730486.139 730486.146 730486.153 730486.160 730486.167];
tt123 = synchronize(tt1,tt2,tt3,datetime(dn,'ConvertFrom','datenum'))
tt123 =
25×3 timetable
Time X1 X2 X3
____________________ ___ ___ ___
01-Jan-2000 00:00:00 0.5 NaN 0.5
01-Jan-2000 00:10:04 NaN NaN NaN
01-Jan-2000 00:20:09 0.7 0.8 NaN
01-Jan-2000 00:30:14 3 NaN NaN
01-Jan-2000 00:40:19 2 4 NaN
01-Jan-2000 00:50:24 4 NaN NaN
01-Jan-2000 01:00:28 1 2.1 0.7
01-Jan-2000 01:10:33 0.1 NaN NaN
01-Jan-2000 01:20:38 10 4.1 NaN
01-Jan-2000 01:30:43 -1 NaN 3
01-Jan-2000 01:39:21 NaN NaN NaN
01-Jan-2000 01:49:26 NaN NaN NaN
01-Jan-2000 01:59:31 NaN NaN 2
01-Jan-2000 02:09:36 NaN NaN NaN
01-Jan-2000 02:19:40 NaN NaN NaN
01-Jan-2000 02:29:45 NaN NaN 4
01-Jan-2000 02:39:50 NaN NaN NaN
01-Jan-2000 02:49:55 NaN NaN NaN
01-Jan-2000 03:00:00 NaN NaN 1
01-Jan-2000 03:10:04 NaN NaN NaN
01-Jan-2000 03:20:09 NaN NaN NaN
01-Jan-2000 03:30:14 NaN NaN NaN
01-Jan-2000 03:40:19 NaN NaN NaN
01-Jan-2000 03:50:24 NaN NaN NaN
01-Jan-2000 04:00:28 NaN NaN 10
But I'm skeptical that that's what you actually want. If your datenums are just rounded for display, and you've really got a 10 min sample interval, both of the above work nicely. But if you really do have those irregular sample times, and want to project them on something more regular, aggregation within synchronize will do that too:
>> timesteps = datetime(2000,1,1) + (0:minutes(10):(hours(4)+minutes(10)));
>> tt123 = synchronize(tt1,tt2,tt3,timesteps,'firstvalue')
tt123 =
26×3 timetable
Time X1 X2 X3
____________________ ___ ___ ___
01-Jan-2000 00:00:00 0.5 NaN 0.5
01-Jan-2000 00:10:00 NaN NaN NaN
01-Jan-2000 00:20:00 0.7 0.8 NaN
01-Jan-2000 00:30:00 3 NaN NaN
01-Jan-2000 00:40:00 2 4 NaN
01-Jan-2000 00:50:00 4 NaN NaN
01-Jan-2000 01:00:00 1 2.1 0.7
01-Jan-2000 01:10:00 0.1 NaN NaN
01-Jan-2000 01:20:00 10 4.1 NaN
01-Jan-2000 01:30:00 -1 NaN 3
01-Jan-2000 01:40:00 NaN NaN NaN
01-Jan-2000 01:50:00 NaN NaN 2
01-Jan-2000 02:00:00 NaN NaN NaN
01-Jan-2000 02:10:00 NaN NaN NaN
01-Jan-2000 02:20:00 NaN NaN 4
01-Jan-2000 02:30:00 NaN NaN NaN
01-Jan-2000 02:40:00 NaN NaN NaN
01-Jan-2000 02:50:00 NaN NaN NaN
01-Jan-2000 03:00:00 NaN NaN 1
01-Jan-2000 03:10:00 NaN NaN NaN
01-Jan-2000 03:20:00 NaN NaN 0.1
01-Jan-2000 03:30:00 NaN NaN NaN
01-Jan-2000 03:40:00 NaN NaN NaN
01-Jan-2000 03:50:00 NaN NaN NaN
01-Jan-2000 04:00:00 NaN NaN 10
01-Jan-2000 04:10:00 NaN NaN NaN
Hope this helps.
  2 Comments
Robert
Robert on 6 Mar 2017
Thank you Peter, I am looking at your code and trying to apply it to my real data set, which is a little different than my example above. Yes, my data has 10 minute sample interval, in some cases, during 15 years. When I create the timetable and proceed to synchronize as you suggested, because my original data has NaN’s, now they were replaced by NaT’s in the timetable, and the synchronize function doesn't work. I tried to use the function isnat and get an index, but once it is a timetable cannot find the way to replace the NaT's for NaN's, to make the synchronization to work. Any suggestions?
Peter Perkins
Peter Perkins on 7 Mar 2017
Edited: Peter Perkins on 7 Mar 2017
Robert, I'm not able to follow your description. Can you give a short example of what's happening? I'm puzzled how you'd get NaTs in the time vector.

Sign in to comment.


Andrei Bobrov
Andrei Bobrov on 3 Mar 2017
Edited: Andrei Bobrov on 3 Mar 2017
[EDIT]
Let your data - A (cell array):
A = {[ 730486.000 0.500
730486.007 NaN
730486.014 0.700
730486.021 3.000
730486.028 2.000
730486.035 4.000
730486.042 1.000
730486.049 0.100
730486.056 10.000
730486.063 -1.000];
[ 730486.014 0.800
730486.028 4.000
730486.042 2.100
730486.056 4.100];
[ 730486.000 0.500
730486.021 NaN
730486.042 0.700
730486.063 3.000
730486.084 2.000
730486.105 4.000
730486.126 1.000
730486.140 0.100
730486.168 10.000]};
n = cellfun('size',A,1);
data = [cat(1,A{:}),repelem((1:numel(A))',n)];
[a,~,c] = unique(data(:,1));
out = [a,accumarray([c,data(:,3)],data(:,2),[],[],nan)];
out1 = (out(1,1):7e-3:out(end,1))';
out1(:,2:4) = nan;
out1(ismembertol(out1(:,1),out(:,1),eps),:) = out;
  3 Comments
Robert
Robert on 3 Mar 2017
Andrei, if I may, could I ask if you could guide with this related step? Originally the data comes in two files, file1 = columns are data, per site, while file2 = columns are dates, per site. How could I create a similar cell arrangement as A in your example, but having pair like [file1(:,1) file2(:,1) file2(:,2) file2(:,2) … etc]. I tried this and some modifications, but don’t do what I need:
for gg=1:size(file1,2)
B=([file2(:,gg) file1(:,gg)]);
G{gg,:}=num2cell(B);
end
Your comments are welcome.
Robert
Robert on 3 Mar 2017
I found the solution to this problem, see below. However I am having a size data problem when using accumarray. "Error using accumarray array exceeds maximum array size preference".
Wondering if you may have a solution to go around this problem by using maybe cell arrays or structures, when creating "out"?
gg=1:size(file1,2)
B=([file2(:,gg) file1(:,gg)]);
G{:,gg}=(B);
end

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!