How do I extract numbers from mixed data to create a table?

2 views (last 30 days)
I have a bunch of data in the following format.
190208:124540.835217 Tx:MOV x pos: 1637941µm prec: 1000 acc: 0 mode: 00 id: 1691
190208:124541.140924 Rx:MOV x status: 0 pos: 1637130µm id: 1691
190208:124541.517632 Tx:MOV x pos: 979337µm prec: 0 acc: 0 mode: 00 id: 1719
190208:124541.657078 Rx:MOV x status: 0 pos: 979341µm id: 1719
190208:124541.797484 Tx:MOV x pos: 1637941µm prec: 1000 acc: 0 mode: 00 id: 1741
190208:124542.104695 Rx:MOV x status: 0 pos: 1637113µm id: 1741
What I would like to do is to extract the time (in bold) and the position (underlined) and organize them in a table. Each row should contain the position responding to a specific time. Is there any way to code so that I don't have to do this manually?
  9 Comments
madhan ravi
madhan ravi on 11 Feb 2019
Edited: madhan ravi on 11 Feb 2019
Tx has eof in some places instead of pos: so what do you want to extract there?

Sign in to comment.

Answers (2)

madhan ravi
madhan ravi on 11 Feb 2019
s={'190208:123346.128737 Tx:MOV x pos: 1668224µm prec: 2 acc: 0 mode: 00 id: 7056'
'190208:123346.577784 Rx:MOV x status: 0 pos: 1668224µm id: 7056'
'190208:123347.191182 Tx:MOV x pos: 1668232µm prec: 2 acc: 0 mode: 00 id: 7066'
'190208:123347.233088 Rx:MOV x status: 0 pos: 1668233µm id: 7066'
'190208:123347.414860 Tx:MOV x pos: 1668231µm prec: 2 acc: 0 mode: 00 id: 7076'
'190208:123347.453385 Rx:MOV x status: 0 pos: 1668231µm id: 7076'
'190208:123347.512106 Tx:MOV x pos: 1552037µm prec: 2 acc: 0 mode: 00 id: 7111'
'190208:123347.691557 Rx:MOV x status: 0 pos: 1552036µm id: 7111'
'190208:123348.056873 Tx:MOV x pos: 1552114µm prec: 2 acc: 0 mode: 00 id: 7121'
'190208:123348.121175 Rx:MOV x status: 0 pos: 1552114µm id: 7121'
'190208:123348.288643 Tx:MOV x pos: 1552114µm prec: 2 acc: 0 mode: 00 id: 7131'
'190208:123348.328085 Rx:MOV x status: 0 pos: 1552114µm id: 7131'
'190208:123348.370362 Tx:MOV x pos: 1668316µm prec: 2 acc: 0 mode: 00 id: 7166'
'190208:123348.550054 Rx:MOV x status: 0 pos: 1668318µm id: 7166'
'190208:123348.698950 Tx:MOV x pos: 1668399µm prec: 2 acc: 0 mode: 00 id: 7176'};
% code starts here
Tx=table;
tx=s(contains(s,'Tx'));
Tx.TIME=str2double(regexp(tx,'\d+.?\d*(?=.Tx)','match','once'));
Tx.POSITION=str2double(regexp(tx,'(?<=pos:.*)\d*','match','once'));
Rx=table;
rx=s(contains(s,'Rx'));
Rx.TIME=str2double(regexp(rx,'\d+.?\d*(?=.Rx)','match','once'));
Rx.POSITION=str2double(regexp(rx,'(?<=pos:.*)\d*','match','once'));
Gives:
>> Rx
Rx =
7×2 table
TIME POSITION
_____________ ________
123346.577784 1668224
123347.233088 1668233
123347.453385 1668231
123347.691557 1552036
123348.121175 1552114
123348.328085 1552114
123348.550054 1668318
>> Tx
Tx =
8×2 table
TIME POSITION
_____________ ________
123346.128737 1668224
123347.191182 1668232
123347.41486 1668231
123347.512106 1552037
123348.056873 1552114
123348.288643 1552114
123348.370362 1668316
123348.69895 1668399
>>
  10 Comments
Guillaume
Guillaume on 12 Feb 2019
One thing to be aware of with this solution, is that table decode the file using your OS locale which is unlikely to be UTF8 (It certainly won't be on Windows). It will therefore mangle some characters at the file should be read as UTF8.

Sign in to comment.


Guillaume
Guillaume on 11 Feb 2019
Your log1.txt file is encoded in UTF8 and starts with a BOM (a bit unusual). Reading as text without telling matlab it's utf8 may mangle some text.
The following should decode your file correctly:
fid = fopen('log1.txt', 'rt', 'n', 'UTF8'); %tells matlab it's encoded in utf8
text = fread(fid, [1 Inf], '*char');
timepos = regexp(text, '(\d+:\d+\.\d+).*?pos: (\d+)', 'tokens');
timepos = vertcat(timepos{:});
time = datetime(timepos(:, 1), 'InputFormat', 'ddMMyy:HHmmss.SSSSSS'); %I'm assuming that's the correct format
position = str2double(timepos(:, 2));
result = table(time, position)
I'm assuming I've got the format for the date\time correct. As the year is encoded on two digits only, you may also want to specify the 'PivotYear' for the conversion. Otherwise, when decoding in 2019, it is assumed that any year above 69 is in the 19xx, and anything below is 20xx.
  6 Comments
Amanda Hillström
Amanda Hillström on 12 Feb 2019
Thanks. Could you help me organize it so that the three colums on the first row are: time, position (Tx), position (Rx)? Also, is there any way to include the decimals after seconds that were visible in my log?
yyMMmm:HHmmss.SSSSSS <- currently they do not appear in my table
Guillaume
Guillaume on 12 Feb 2019
You can specify the display format of the datetime during the import to whatever you want, e.g.:
time = datetime(timedirpos(:, 1), 'InputFormat', 'yyMMdd:HHmmss.SSSSSS', 'Format', 'dd-MMM-yyyy HH:mm:ss.SSSSSS');
or afterward change it to whatever you want, eg:
result.time.Format = 'dd-MMM-yyyy HH:mm:ss.SSSSSS';
With regards to "so that the three colums on the first row are: time, position (Tx), position (Rx)", I'm not really clear how that would work. No Tx and Rx have the same timing. I suppose you could round them up (to the nearest what?) but still, you don't always have a TX followed by a RX, so what should happen in that case? E.g. looking at the end of the table:
>> tail(result)
ans =
8×3 table
time direction position
___________________________ _________ __________
08-Feb-2019 12:34:33.111048 Tx 1.5161e+06
08-Feb-2019 12:34:33.122383 Rx 1.5161e+06
08-Feb-2019 12:34:33.375990 Tx 1.5161e+06
08-Feb-2019 12:34:33.381392 Rx 1.5161e+06
08-Feb-2019 12:34:33.563973 Tx 1.1615e+06
08-Feb-2019 12:34:33.822953 Tx 1.8105e+06
08-Feb-2019 12:34:33.822953 Tx 1.456e+06
08-Feb-2019 12:34:33.826743 Rx 7.7424e+05
You've got three consecutive TX, two of them at the same time but with different position. What would the corresponding RX for each?

Sign in to comment.

Categories

Find more on Tables in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!