Cut timetable data following certain conditions

8 views (last 30 days)
I have created a timetable that has one minute timesteps. I would like to take this timetable and create most likely a cell object of chunks of the data following these conditions listed below. If there is a better way to get this data, I'd love to hear suggestions. Thanks!
Criteria:
-to keep data
  1. At least 5 min of continuous data (so positive, non-zero values)
  2. Maximum of 5 minutes of no non-zero values between times whne data is present.
-to break up data into chunks.
  1. No data for at least 60 minutes.
So, if there are five or more records with data, it should be taken and saved until the continuous records stop or, less than 6 values of no data between records occur. Then, after finding all these groups of data, I'd like to separate the groups with 60 minutes or more of no data between them.
Here's some sample data (timetable format):
'06-Feb-2019 03:40:00' 0 0
'06-Feb-2019 03:41:00' 0 0
'06-Feb-2019 03:42:00' 0 0
'06-Feb-2019 03:43:00' 0 0
'06-Feb-2019 03:44:00' 0 0
'06-Feb-2019 03:45:00' 0 0
'06-Feb-2019 03:46:00' 0 0
'06-Feb-2019 03:47:00' 0 0
'06-Feb-2019 03:48:00' 0 0
'06-Feb-2019 03:49:00' 0 0
'06-Feb-2019 03:50:00' 0.000900000000000000 0.00266666666666667
'06-Feb-2019 03:51:00' 0.00218333333333333 0.0195000000000000
'06-Feb-2019 03:52:00' 0 0
'06-Feb-2019 03:53:00' 0.00213333333333333 0.0118333333333333
'06-Feb-2019 03:54:00' 0.00286666666666667 0.0161666666666667
'06-Feb-2019 03:55:00' 0.00311666666666667 0.0213333333333333
'06-Feb-2019 03:56:00' 0.00161666666666667 0.00833333333333333
'06-Feb-2019 03:57:00' 0.00313333333333333 0.0173333333333333
'06-Feb-2019 03:58:00' 0.000633333333333333 0.00200000000000000
'06-Feb-2019 03:59:00' 0.00451666666666667 0.0521666666666667
'06-Feb-2019 04:00:00' 0 0
'06-Feb-2019 04:01:00' 0.000616666666666667 0.00183333333333333
'06-Feb-2019 04:02:00' 0 0
'06-Feb-2019 04:03:00' 0.000466666666666667 0.00133333333333333
'06-Feb-2019 04:04:00' 0.000500000000000000 0.00116666666666667
'06-Feb-2019 04:05:00' 0.00281666666666667 0.0141666666666667
'06-Feb-2019 04:06:00' 0.00376666666666667 0.0195000000000000
'06-Feb-2019 04:07:00' 0.00371666666666667 0.0150000000000000
'06-Feb-2019 04:08:00' 0.00235000000000000 0.00800000000000000
'06-Feb-2019 04:09:00' 0.00370000000000000 0.0151666666666667
'06-Feb-2019 04:10:00' 0.00341666666666667 0.0245000000000000
'06-Feb-2019 04:11:00' 0.00320000000000000 0.0170000000000000
'06-Feb-2019 04:12:00' 0.00410000000000000 0.0181666666666667
'06-Feb-2019 04:13:00' 0.00535000000000000 0.0396666666666667
'06-Feb-2019 04:14:00' 0.0116666666666667 0.0598333333333333
'06-Feb-2019 04:15:00' 0.00908333333333333 0.0375000000000000
'06-Feb-2019 04:16:00' 0.00708333333333333 0.0265000000000000
'06-Feb-2019 04:17:00' 0.00408333333333333 0.0126666666666667
'06-Feb-2019 04:18:00' 0.00598333333333333 0.0451666666666667
'06-Feb-2019 04:19:00' 0.0410166666666667 0.397333333333333
'06-Feb-2019 04:20:00' 0.0596166666666667 0.680333333333333
'06-Feb-2019 04:21:00' 0.0482500000000000 0.554166666666667
'06-Feb-2019 04:22:00' 0.0347333333333333 0.447500000000000
'06-Feb-2019 04:23:00' 0.00551666666666667 0.0335000000000000
'06-Feb-2019 04:24:00' 0.00425000000000000 0.0196666666666667
'06-Feb-2019 04:25:00' 0.00553333333333333 0.0450000000000000
'06-Feb-2019 04:26:00' 0.00823333333333333 0.0691666666666667
'06-Feb-2019 04:27:00' 0.00661666666666667 0.0560000000000000
'06-Feb-2019 04:28:00' 0.00968333333333333 0.0676666666666667
'06-Feb-2019 04:29:00' 0.00733333333333333 0.0455000000000000
'06-Feb-2019 04:30:00' 0.00471666666666667 0.0340000000000000
'06-Feb-2019 04:31:00' 0.00295000000000000 0.0191666666666667
'06-Feb-2019 04:32:00' 0.00441666666666667 0.0326666666666667
'06-Feb-2019 04:33:00' 0.00618333333333333 0.0575000000000000
'06-Feb-2019 04:34:00' 0.00696666666666667 0.0793333333333333
'06-Feb-2019 04:35:00' 0 0
'06-Feb-2019 04:36:00' 0 0
'06-Feb-2019 04:37:00' 0.000966666666666667 0.00616666666666667
'06-Feb-2019 04:38:00' 0 0
'06-Feb-2019 04:39:00' 0 0
'06-Feb-2019 04:40:00' 0 0
The data above should all be grouped together becasue breaks from 3:50 to 4:37 are less than 5 minutes. Let's assume that between the first timestep and ending timestep there were 60 consecutive zero values. I should metion that in all cases if one Variable is = 0 the second one will as well.
  8 Comments
Eric Escoto
Eric Escoto on 6 Dec 2019
Edited: Eric Escoto on 10 Dec 2019
@Adam Danz, Hi, not yet. Ill try it out soon. Finals week! Had to push this back a few days. Soon.

Sign in to comment.

Accepted Answer

Adam Danz
Adam Danz on 3 Dec 2019
Edited: Adam Danz on 5 Dec 2019
Well, that took a little more time than I anticipated!
Here's a demo with inline comments explaining the details.
I added a few lines to the demo data from your question so that some data at the end would be excluded. Here I just convert the data into a timetable.
% Create demo data (Column vector)
t = {'06-Feb-2019 03:40:00' 0 0
'06-Feb-2019 03:41:00' 0 0
'06-Feb-2019 03:42:00' 0 0
'06-Feb-2019 03:43:00' 0 0
'06-Feb-2019 03:44:00' 0 0
'06-Feb-2019 03:45:00' 0 0
'06-Feb-2019 03:46:00' 0 0
'06-Feb-2019 03:47:00' 0 0
'06-Feb-2019 03:48:00' 0 0
'06-Feb-2019 03:49:00' 0 0
'06-Feb-2019 03:50:00' 0.000900000000000000 0.00266666666666667
'06-Feb-2019 03:51:00' 0.00218333333333333 0.0195000000000000
'06-Feb-2019 03:52:00' 0 0
'06-Feb-2019 03:53:00' 0.00213333333333333 0.0118333333333333
'06-Feb-2019 03:54:00' 0.00286666666666667 0.0161666666666667
'06-Feb-2019 03:55:00' 0.00311666666666667 0.0213333333333333
'06-Feb-2019 03:56:00' 0.00161666666666667 0.00833333333333333
'06-Feb-2019 03:57:00' 0.00313333333333333 0.0173333333333333
'06-Feb-2019 03:58:00' 0.000633333333333333 0.00200000000000000
'06-Feb-2019 03:59:00' 0.00451666666666667 0.0521666666666667
'06-Feb-2019 04:00:00' 0 0
'06-Feb-2019 04:01:00' 0.000616666666666667 0.00183333333333333
'06-Feb-2019 04:02:00' 0 0
'06-Feb-2019 04:03:00' 0.000466666666666667 0.00133333333333333
'06-Feb-2019 04:04:00' 0.000500000000000000 0.00116666666666667
'06-Feb-2019 04:05:00' 0.00281666666666667 0.0141666666666667
'06-Feb-2019 04:06:00' 0.00376666666666667 0.0195000000000000
'06-Feb-2019 04:07:00' 0.00371666666666667 0.0150000000000000
'06-Feb-2019 04:08:00' 0.00235000000000000 0.00800000000000000
'06-Feb-2019 04:09:00' 0.00370000000000000 0.0151666666666667
'06-Feb-2019 04:10:00' 0.00341666666666667 0.0245000000000000
'06-Feb-2019 04:11:00' 0.00320000000000000 0.0170000000000000
'06-Feb-2019 04:12:00' 0.00410000000000000 0.0181666666666667
'06-Feb-2019 04:13:00' 0.00535000000000000 0.0396666666666667
'06-Feb-2019 04:14:00' 0.0116666666666667 0.0598333333333333
'06-Feb-2019 04:15:00' 0.00908333333333333 0.0375000000000000
'06-Feb-2019 04:16:00' 0.00708333333333333 0.0265000000000000
'06-Feb-2019 04:17:00' 0.00408333333333333 0.0126666666666667
'06-Feb-2019 04:18:00' 0.00598333333333333 0.0451666666666667
'06-Feb-2019 04:19:00' 0.0410166666666667 0.397333333333333
'06-Feb-2019 04:20:00' 0.0596166666666667 0.680333333333333
'06-Feb-2019 04:21:00' 0.0482500000000000 0.554166666666667
'06-Feb-2019 04:22:00' 0.0347333333333333 0.447500000000000
'06-Feb-2019 04:23:00' 0.00551666666666667 0.0335000000000000
'06-Feb-2019 04:24:00' 0.00425000000000000 0.0196666666666667
'06-Feb-2019 04:25:00' 0.00553333333333333 0.0450000000000000
'06-Feb-2019 04:26:00' 0.00823333333333333 0.0691666666666667
'06-Feb-2019 04:27:00' 0.00661666666666667 0.0560000000000000
'06-Feb-2019 04:28:00' 0.00968333333333333 0.0676666666666667
'06-Feb-2019 04:29:00' 0.00733333333333333 0.0455000000000000
'06-Feb-2019 04:30:00' 0.00471666666666667 0.0340000000000000
'06-Feb-2019 04:31:00' 0.00295000000000000 0.0191666666666667
'06-Feb-2019 04:32:00' 0.00441666666666667 0.0326666666666667
'06-Feb-2019 04:33:00' 0.00618333333333333 0.0575000000000000
'06-Feb-2019 04:34:00' 0.00696666666666667 0.0793333333333333
'06-Feb-2019 04:35:00' 0 0
'06-Feb-2019 04:36:00' 0 0
'06-Feb-2019 04:37:00' 0.000966666666666667 0.00616666666666667
'06-Feb-2019 04:38:00' 0 0
'06-Feb-2019 04:39:00' 0 0
'06-Feb-2019 04:40:00' 0 0
'06-Feb-2019 04:41:00' 0 0
'06-Feb-2019 04:42:00' 0 0
'06-Feb-2019 04:43:00' 0 0
'06-Feb-2019 04:44:00' 0 0
'06-Feb-2019 04:45:00' 0 0
'06-Feb-2019 04:46:00' 0.000966666666666667 0.00616666666666667
'06-Feb-2019 04:47:00' 0.000966666666666667 0.00616666666666667
'06-Feb-2019 04:48:00' 0 0};
dates = datetime(t(:,1),'InputFormat','dd-MMM-yyyy HH:mm:ss'); % datetime vector
TT = array2timetable(cell2mat(t(:,2:end)),'RowTimes', dates); % timetable
From here forward the only input is TT which is the timetable produced above. This adds two new columns to the timetable:
  • Safe which is a logical vector identifying the groups of consecutive data +/- 5 rows.
  • Group which identifies the group number for each consecutive block (this example data only has 1 block).
% Identify rows with missing data in any column. Missing data
% are considered to be values of 0.
missingIdx = any(TT{:,:}==0,2);
% Identify groups of consecutive data
consecGroups = bwlabel(~missingIdx); %req image processing toolbox
consecGroups(consecGroups==0) = nan;
groupDuration = splitapply(@range, TT.Properties.RowTimes, consecGroups);
% mark grouped data that spans time greater than minTime as safe
minTime = minutes(5);
safe = false(size(consecGroups));
safe(ismember(consecGroups,find(groupDuration > minTime))) = true;
% Locate not-safe data that are within maxRows number of rows from safe data
maxRows = 5;
notSafeRows = find(~safe);
select = min(abs(find(safe)-notSafeRows.')) < maxRows; % implicit expansion requires >=r2016b; 'safe' is col vec
safe(notSafeRows(select)) = true;
% Assign safe vector to timetable and group number
TT.Safe = safe;
TT.Group = bwlabel(safe); %requires image proc toolbox
Now if you want to perform statistics on a certain group,
sum(TT.Var1(TT.Group==1))
  21 Comments
Adam Danz
Adam Danz on 5 Jan 2020
Ok.... I think this should do the trick.
Replace this section
% Locate not-safe data that are within maxRows number of rows from safe data
maxRows = 5;
notSafeRows = find(~safe);
select = min(abs(find(safe)-notSafeRows.')) < maxRows; % implicit expansion requires >=r2016b; 'safe' is col vec
safe(notSafeRows(select)) = true;
with this section
% Sliding window method
maxRow = 5;
for i = 1:numel(safe)-maxRow
tempWindow = safe(i:i+maxRow-1);
if tempWindow(1) && tempWindow(end)
safe(i:i+maxRow-1) = true;
end
end
The slidiing window is a window whose length is defined by maxRows. It starts at the top of the safe vector and slides through the vector marking everying within the window as TRUE as long as the first and last elements of the window are TRUE.
Eric Escoto
Eric Escoto on 7 Jan 2020
Edited: Eric Escoto on 7 Jan 2020
I've been playing around with these lines and still don't get what I'm after. I've attached a better example of data to illustrate the desired outcome.
In the attacehd data, we have a timetable with 4420 rows. Using the first set of code, it parses out into three groups as can be seen in the fourth column. In row 4413 there is data in column 1 and 2, but it's more than 60 rows away from the previous data, and is not sandwiched by four other consecutive rows so this will be ignored as a group.
The first three groups should be only one because there is less than 60 rows of no data between groups.
Data locations.
Group 1 in rows 535:550
Group 2 in rows 555:654
Group 3 rows in 692:741
Single record at row 4413
In this case, the safe vector would also be TRUE for rows where data is sandwiched by less than 60 rows of no data. For the atached data set, the outcome should be one grouping from rows 538:740.
It may wind up that the first four rows of each group would be included but I'm thinking that I could then just manually remove them from each group.

Sign in to comment.

More Answers (0)

Categories

Find more on Data Preprocessing in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!