How to convert dates and times from txt to numbers?

2 views (last 30 days)
kpl
kpl on 10 Jun 2017
Edited: dpb on 15 Jun 2017
I have two txt files! The first one contains these data. It displays the activities of a person for different times.
2011-11-28 02:27:59 2011-11-28 10:18:11 Sleeping
2011-11-28 10:21:24 2011-11-28 10:23:36 Toileting
2011-11-28 10:25:44 2011-11-28 10:33:00 Showering
2011-11-28 10:34:23 2011-11-28 10:43:00 Breakfast
2011-11-28 10:49:48 2011-11-28 10:51:13 Grooming
2011-11-28 10:51:41 2011-11-28 13:05:07 Spare_Time/TV
2011-11-28 13:06:04 2011-11-28 13:06:31 Toileting
2011-11-28 13:09:31 2011-11-28 13:29:09 Leaving
This is my second txt file.This txt displays the time where a sensor is open for different times.. I want to read the whole txts and convert the dates and times from each row to numbers, in order to compare them in the end. I assume the activities with these kind of settings: Leaving --> 1 Toileting --> 2 Showering --> 3 Sleeping --> 4 Breakfast --> 5 Lunch --> 6 Dinner --> 7 Snack --> 8 Spare_Time/TV --> 9 Grooming --> 10. So my array would look like this: target(p,:) = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]. I put 1, because the sensor(bed) is open for the activity sleeping the curent time and 0 where the sensor is not open.
Start time End time Location Type Place
-------------------- -------------------- -------- -------- -----
2011-11-28 02:27:59 2011-11-28 10:18:11 Bed Pressure Bedroom
2011-11-28 10:21:24 2011-11-28 10:21:31 Cabinet Magnetic Bathroom
2011-11-28 10:21:44 2011-11-28 10:23:31 Basin PIR Bathroom
2011-11-28 10:23:02 2011-11-28 10:23:36 Toilet Flush Bathroom
2011-11-28 10:25:44 2011-11-28 10:32:06 Shower PIR Bathroom
2011-11-28 10:34:23 2011-11-28 10:34:41 Fridge Magnetic Kitchen
2011-11-28 10:34:44 2011-11-28 10:37:17 Cupboard Magnetic Kitchen
2011-11-28 10:38:00 2011-11-28 10:42:41 Toaster Electric Kitchen
2011-11-28 10:38:33 2011-11-28 10:38:40 Fridge Magnetic Kitchen
2011-11-28 10:41:29 2011-11-28 10:41:36 Cupboard Magnetic Kitchen
2011-11-28 10:41:43 2011-11-28 10:41:59 Cooktop PIR Kitchen
2011-11-28 10:41:59 2011-11-28 10:42:55 Microwave Electric Kitchen
The sensors should be like this: Shower --> 1 Basin --> 2 Cooktop --> 3 Maindoor --> 4 Fridge --> 5 Cabinet --> 6 Cupboard --> 7 Toilet --> 8 Seat --> 9 Bed --> 10 Microwave --> 11 Toaster --> 12
This array should look like this: x(p,:) = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], because sensor bed is open this current time. I tried to use a different method, but i doesn't seem to work properly. Below is my code,but i want to use the above method.
clear; clc; close all;
fh = fopen('OrdonezA_ADLs.txt','r');
Data=textscan(fh, '%s%s%s%s%s');
fclose(fh);
n=size(Data,1);
n = n-4;
P = length(Data{1});
x = zeros(n,P);
for pat=1:P
for i=1:n
x(i,pat) = double(Data{i+4}(pat));
end
end
t = zeros(1,P); % targets
for pat=1:P
if strcmp(Data{1,5}(pat),'Sleeping')
t(pat) = 1;
elseif strcmp(Data{1,5}(pat),'Toileting')
t(pat) = 1;
elseif strcmp(Data{1,5}(pat),'Spare_Time/TV')
t(pat) = 1;
elseif strcmp(Data{1,5}(pat),'Grooming')
t(pat) = 1;
elseif strcmp(Data{1,5}(pat),'Snack')
t(pat) = 1;
elseif strcmp(Data{1,5}(pat),'Showering')
t(pat) = 1;
else
t(pat) = 0;
end
end
  1 Comment
dpb
dpb on 11 Jun 2017
"Sensors: Shower--> 1 Basin --> 2 Cooktop --> 3 ... Toaster --> 12"
>> locs={'Shower' 'Basin' 'Cooktop' 'Maindoor' 'Fridge' 'Cabinet' 'Cupboard' 'Toilet' 'Seat' 'Bed' 'Microwave' 'Toaster'}.';
>> idx=[1:length(sensorlocs)].';
>> sensors=categorical(idx,idx,locs)
sensors =
Shower
Basin
Cooktop
Maindoor
Fridge
Cabinet
Cupboard
Toilet
Seat
Bed
Microwave
Toaster
>> double(sensors)
ans =
1
2
3
4
5
6
7
8
9
10
11
12
>>
Best of both worlds--the variables are identified by something meaningful but the values can still be 1:N if you think they must or the outside data comes in as numeric instead of the string representation. But, that isn't what your data file indicates, it's using the string so really would be easier to use it as is. It doesn't really appear there's any need for the ordinal sequence number other than that's just what you started with...what does the number matter if you can keep track of them by name instead?

Sign in to comment.

Answers (2)

dpb
dpb on 10 Jun 2017
I'd suggest using table for such data/analyses. I put your first file into a local file 'kpl.txt' and manipulated it with a few simple machinations to achieve--
>> kpl=readtable('kpl.txt','delimiter','\t','readvariablenames',0); % read the data
>> kpl=kpl(:,~all(ismissing(kpl))); % eliminate empty from multiple delimiters
>> kpl.Properties.VariableNames={'Start' 'End' 'Activity'} % meaningful column names
kpl =
Start End Activity
_____________________ _____________________ _______________
'2011-11-28 02:27:59' '2011-11-28 10:18:11' 'Sleeping'
'2011-11-28 10:21:24' '2011-11-28 10:23:36' 'Toileting'
'2011-11-28 10:25:44' '2011-11-28 10:33:00' 'Showering'
'2011-11-28 10:34:23' '2011-11-28 10:43:00' 'Breakfast'
'2011-11-28 10:49:48' '2011-11-28 10:51:13' 'Grooming'
'2011-11-28 10:51:41' '2011-11-28 13:05:07' 'Spare_Time/TV'
'2011-11-28 13:06:04' '2011-11-28 13:06:31' 'Toileting'
'2011-11-28 13:09:31' '2011-11-28 13:29:09' 'Leaving'
>>
At this point all are cell arrays; let's turn into useful forms of datetime and categorical for the activities--
>> kpl.Start=datetime(kpl.Start,'inputformat','yyyy-MM-dd HH:mm:ss');
>> kpl.End=datetime(kpl.End,'inputformat','yyyy-MM-dd HH:mm:ss');
>> kpl.Activity=categorical(kpl.Activity);
>> kpl.Duration=(kpl.End-kpl.Start) % also would like the time each takes
kpl =
Start End Activity Duration
____________________ ____________________ _____________ ________
28-Nov-2011 02:27:59 28-Nov-2011 10:18:11 Sleeping 07:50:12
28-Nov-2011 10:21:24 28-Nov-2011 10:23:36 Toileting 00:02:12
28-Nov-2011 10:25:44 28-Nov-2011 10:33:00 Showering 00:07:16
28-Nov-2011 10:34:23 28-Nov-2011 10:43:00 Breakfast 00:08:37
28-Nov-2011 10:49:48 28-Nov-2011 10:51:13 Grooming 00:01:25
28-Nov-2011 10:51:41 28-Nov-2011 13:05:07 Spare_Time/TV 02:13:26
28-Nov-2011 13:06:04 28-Nov-2011 13:06:31 Toileting 00:00:27
28-Nov-2011 13:09:31 28-Nov-2011 13:29:09 Leaving 00:19:38
>>
At this point you've got the above table to query for any activity or interval as desired.
>> varfun(@sum,kpl,'groupingvariable','Activity','inputvariables','Duration')
ans =
Activity GroupCount sum_Duration
_____________ __________ ____________
Breakfast Breakfast 1 00:08:37
Grooming Grooming 1 00:01:25
Leaving Leaving 1 00:19:38
Showering Showering 1 00:07:16
Sleeping Sleeping 1 07:50:12
Spare_Time/TV Spare_Time/TV 1 02:13:26
Toileting Toileting 2 00:02:39
>>
Only slight modifications would be needed for the other file to create corollary table from it.
  12 Comments
kpl
kpl on 12 Jun 2017
It works.The only thing that matters now is table x. I want to fill x with all the values from the first file like this:
P = length(kpl{1,:});
n = size(kpl,1);
x = zeros(n,P);
for pat=1:P
for i=1:n
x(i,pat) =(kpl{i,:}(pat));
end
end
Although, i get an error on this line
x(i,pat) =(kpl{i,:}(pat));
I don't want to use this command
x = readtable('OrdonezA_ADLs.txt');
dpb
dpb on 12 Jun 2017
I can't even get the first line to work to determine P. We built the tables, let's use them; they're much more convenient than trying to remember indices and stuff.

Sign in to comment.


dpb
dpb on 12 Jun 2017
Edited: dpb on 12 Jun 2017
OK, with the updated information on what is wanted the overall code is accumulated here in one spot instead of in the string of comments piecemeal--
First, I'll presume you can clean up the data files to eliminate the duplicated delimiters and so on--that's outside the domain of solving the actual problem. There's only one detail that was omitted in creating the tables initially--maintaining the desired order of categorical variables as you're designated them to be; initially I really didn't think it would matter and it really doesn't other than you do need a correspondence so we can use the one you've already defined--
sensorlocs={'Shower' 'Basin' 'Cooktop' 'Maindoor' 'Fridge' 'Cabinet' 'Cupboard' 'Toilet' 'Seat' 'Bed' 'Microwave' 'Toaster'}.';
activities={'Leaving' 'Toileting' 'Showering' 'Sleeping' 'Breakfast' 'Lunch' 'Dinner' 'Snack' 'Spare_Time/TV' 'Grooming'}.';
act=readtable('kpl.txt','delimiter','\t','readvariablenames',0); % read the activity data
act=act(:,~all(ismissing(act))); % eliminate empty rows
act.Properties.VariableNames={'Start' 'End' 'Activity'}; % meaningful column names
act.Start=datetime(act.Start,'inputformat','yyyy-MM-dd HH:mm:ss');
act.End=datetime(act.End,'inputformat','yyyy-MM-dd HH:mm:ss');
act.Activity=categorical(act.Activity,activities); % use assigned numerical order
act.Duration=(act.End-kpl.Start);
I cleaned up the other dataset(*); with that I could simply
sens=readtable('OrdonezA_Sensors.txt');
sens.Location=categorical(sens.Location,sensorlocs);
sens.Type=categorical(sens.Type);
sens.Place=categorical(sens.Place);
and end up with the usable table. If there are specific orders for Location and Place, define and use them analogously as to Location; you didn't give those correlations. From this morning for the time table--
fnOverlap=@(ts,te,Ts,Te) (te>=Ts) & (ts<=Te); % t interval inside interval T
timtab=false(height(act),height(sens);
for i=1:height(act)
timtab(i,:)=fnOverlap(act{i,'Start'},act{i,'End'},sens.Start,sens.End).';
end
% compute the sensor truth table from the time sequencing
senstab=false(height(act),height(sens);
for i=1:height(act)
senstab(i,sens{timtab(i,:),'Location'))=true;
end
The result for the two datasets provided looks like:
>> senstab
senstab =
0 0 0 0 0 0 0 0 0 1 0 0
0 1 0 0 0 1 0 1 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 1 0 1 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
>>
NB: The two loops both operate over the first table rows and the locations of interest are found in the output from the anonymous function so there's really no reason one couldn't merge the two and even do away with the time overlap array entirely altho I figured it would likely be of interest somewhere on down the road.
Anyway, that seems to me far simpler than trying to deal with cell arrays and combinations of different data types and such explicitly--the table is really suited to handle such disparate types compactly and with highly readable (and hence maintainable) code plus expanding it to handle more sensors or locations or whatever is essentially trivial. Plus, the other analysis functions can be brought to bear to do much more in-depth probing of the data.
Granted, it will take a little bit of reading up on the table data class, but think the investment will save much more time going forward than the initial effort will take.
(*) See followup comment
  6 Comments
dpb
dpb on 13 Jun 2017
Edited: dpb on 13 Jun 2017
See the sections under the link at <DATETIME Operations>, particularly the two on date arithmetic and comparisons; it'll explain the concepts and show examples and you'll continue to beat your head against a dead end alley ignoring the concepts therein.
Similarly, read and study the sections under <Data Types> particularly the categorical and table types. With R2016b there's also a timetable type, but even if you have that late a release it is designed for a specific time per record, not intervals such as your data so don't think it'll help much; the table really is the best structure here imo, you need to allow it to help instead of fighting against it.
dpb
dpb on 14 Jun 2017
Edited: dpb on 15 Jun 2017
One last effort...
"The only code i managed to get it το work is this:"
fh = fopen('OrdonezA_ADLs.txt','r');
Data=textscan(fh, '%s%s%s%s%s');
fclose(fh);
There's no point in the above; readtable following reads the data and we subsequently massage it into the needed forms of datetime for the dates and categorical for the other variables of locations, activities, etc., etc., that they are that gives the best of both worlds as well as does the datetime class--a numeric representation internally that allows for comparisons and tests for equality easily plus the ability to actual have human-readable outputs for reporting and debugging.
Plus, it's simple to build the "regular" array of "when who does what and where" by just straight logical addressing making use of the fact the categorical arrays ARE numeric, their values are then the indices into the array directly.
So, as noted before, once you do read the data into the table, there's no reason at all to revert to trying to access cell arrays of strings with all the perils of only having numeric column indexing to keep track of--the table gives you meaningful names to use for each as well as for the values.
"Use the tools available at hand, Luke!!!" :)

Sign in to comment.

Categories

Find more on Data Type Identification in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!