How to parse text data
Life is Wonderful on 17 Jul 2019
Commented: Life is Wonderful on 2 Aug 2019
Hi,
I have log data in the formats below and need a way to parse it into the expected output shown for each format.
Input data format:
07/16 12:55:22.012 INFO | test_runner_utils:0812| Began logging to /tmp/test_that_results_hatch_deL3lZ
07/16 12:55:27.477 INFO | test_runner_utils:0259| autoserv| Processing control file
Expected Output format:
Define the level of message extraction based on the marker character |
-Step 1: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|
-Step 2: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>| and extract the full text into a variable, optionally capturing a variable if it has an associated value
-Step 3: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|
-Step 4: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f| and extract the full text into a variable, optionally capturing a variable if it has an associated value
-Step 5: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<string>|
-Step 6: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<string>| and extract the full text into a variable, optionally capturing a variable if it has an associated value
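A minimal sketch of Steps 1 to 6 for this first format, assuming every line has the shape "timestamp LEVEL| source:NNNN| message" and that any further | markers inside the message are split afterwards. The variable names (logLines, stamp, srcFile, lineNum, msgText, parts) are hypothetical. The number after the colon is treated as an integer because it looks like a source line number, although the question writes it as %1.3f; that is an assumption.

```matlab
% Sketch, assuming lines of the form
%   mm/dd HH:MM:SS.FFF LEVEL| source:NNNN| message
% Sample lines copied from the question (in practice, read them from the log).
logLines = {
    '07/16 12:55:22.012 INFO | test_runner_utils:0812| Began logging to /tmp/test_that_results_hatch_deL3lZ'
    '07/16 12:55:27.477 INFO | test_runner_utils:0259| autoserv| Processing control file'
};

% Tokens: timestamp, level, source name, number after ':', text after the 2nd '|'
pattern = '^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+(\w+)\s*\|\s*([\w.]+):(\d+)\|\s*(.*)$';

n = numel(logLines);
stamp   = cell(n, 1);
level   = cell(n, 1);
srcFile = cell(n, 1);
lineNum = nan(n, 1);   % assumption: integer (source line number), not %1.3f
msgText = cell(n, 1);
for k = 1:n
    tok = regexp(logLines{k}, pattern, 'tokens', 'once');
    if isempty(tok)
        continue;   % line does not follow this format
    end
    stamp{k}   = tok{1};
    level{k}   = tok{2};
    srcFile{k} = tok{3};
    lineNum(k) = str2double(tok{4});
    msgText{k} = tok{5};
end

% Deeper levels (Steps 5-6): split the remaining text on further '|' markers
parts = strtrim(strsplit(msgText{2}, '|'));
```

Lines that do not match the pattern are simply skipped, so the same loop can run over a mixed log before trying the other formats.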
Input data format:
07/16 12:55:27.620 DEBUG| utils:0287| [stdout] CHROMEOS_RELEASE_BOARD=hatch
07/16 13:28:58.330 INFO | mode_switcher:0673| -[FAFT]-[ start wait_for_client ]---
Expected Output format:
-Step 1: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<[string]>
-Step 2: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<[string]> and extract the full text into a variable, optionally capturing a variable if it has an associated value
-Step 3: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<[string]>
-Step 4: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<[string]> [string] and extract the full text into a variable, optionally capturing a variable if it has an associated value
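A sketch for this second format, assuming the message part (the text after the second |) has already been separated. It collects every bracketed tag, such as [stdout], [FAFT] and [ start wait_for_client ], and, as an assumption about what "a variable associated with a value" means, picks up a NAME=value pair such as CHROMEOS_RELEASE_BOARD=hatch. The names msgs, tags, name and value are illustrative only.

```matlab
% Sketch: message parts copied from the question (text after the second '|').
msgs = {
    '[stdout] CHROMEOS_RELEASE_BOARD=hatch'
    '-[FAFT]-[ start wait_for_client ]---'
};

n = numel(msgs);
tags  = cell(n, 1);   % all bracketed texts in each message
name  = cell(n, 1);   % NAME of a NAME=value pair, if any
value = cell(n, 1);   % value of that pair, if any
for k = 1:n
    % Every [ ... ] group, with the spaces just inside the brackets trimmed
    t = regexp(msgs{k}, '\[\s*([^\]]*?)\s*\]', 'tokens');
    tags{k} = cellfun(@(c) c{1}, t, 'UniformOutput', false);

    % Assumption: a "variable with a value" looks like NAME=value
    kv = regexp(msgs{k}, '([A-Za-z_]\w*)=(\S+)', 'tokens', 'once');
    if ~isempty(kv)
        name{k}  = kv{1};
        value{k} = kv{2};
    end
end
```

Messages without a NAME=value pair leave name{k} and value{k} empty, which makes them easy to filter with cellfun(@isempty, name).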
Input data format:
2019-07-16 12:55:30 > string
2019-07-16 12:55:30 powerbtn: released
Expected Output format:
Note the marker >
-Step 1: Extract Timestamp in YYYY-MM-DD HH:MM:SS > <string>
-Step 2: Extract Timestamp in YYYY-MM-DD HH:MM:SS <full string>
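A sketch for the > marker format, assuming the timestamp is always YYYY-MM-DD HH:MM:SS and the > marker is optional (the powerbtn line has none). The variable names are hypothetical.

```matlab
% Sketch for lines with an optional '>' marker after a YYYY-MM-DD HH:MM:SS stamp.
logLines = {
    '2019-07-16 12:55:30 > string'
    '2019-07-16 12:55:30 powerbtn: released'
};
pattern = '^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(>?)\s*(.*)$';

n = numel(logLines);
stamp     = cell(n, 1);
hasMarker = false(n, 1);   % true when the '>' marker is present (Step 1)
msgText   = cell(n, 1);    % text after the marker, or the full string (Step 2)
for k = 1:n
    tok = regexp(logLines{k}, pattern, 'tokens', 'once');
    if isempty(tok)
        continue;   % line does not follow this format
    end
    stamp{k}     = tok{1};
    hasMarker(k) = strcmp(tok{2}, '>');
    msgText{k}   = tok{3};
end
% In MATLAB R2014b or later the stamps can then be converted with
%   t = datetime(stamp, 'InputFormat', 'yyyy-MM-dd HH:mm:ss');
```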
Input data format:
2019-07-16 12:55:31 > [12074.734997 HC 0x121 err 1]
Expected Output format:
-Step 1: Extract Timestamp in YYYY-MM-DD HH:MM:SS > [<%1.3f> <string>], extracting the full text into a variable and optionally capturing a variable if it has an associated value
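A sketch for the bracketed form, assuming the bracket always starts with a decimal number followed by free text. The number is printed with %1.3f as the question requests, and the trailing "err 1" is read as a name/number pair, which is a hypothetical rule rather than anything the log format guarantees.

```matlab
% Sketch for a '>' line whose payload is [number text ...].
txt = '2019-07-16 12:55:31 > [12074.734997 HC 0x121 err 1]';
pattern = ['^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*>\s*' ...
           '\[\s*([-+]?\d+\.\d+)\s+([^\]]*?)\s*\]'];
tok = regexp(txt, pattern, 'tokens', 'once');

stamp  = tok{1};                   % timestamp text
value  = str2double(tok{2});       % leading number inside the brackets
detail = tok{3};                   % full remaining text inside the brackets
label  = sprintf('%1.3f', value);  % number formatted with %1.3f, as in the question

% Hypothetical rule: a trailing "name number" pair, e.g. "err 1"
kv = regexp(detail, '(\w+)\s+(\d+)$', 'tokens', 'once');
```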
Thanks a lot
Accepted Answer
Guillaume on 23 Jul 2019
Edited: Guillaume on 23 Jul 2019
Are you still on a very old version of MATLAB? (Please fill in the release field next to the question.) If you are on a modern version, the file can easily be read with:
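% Fixed-width layout: timestamp | level | '|' | source:line | '| ' | message
VariableNames = {'Date', 'Level', 'delim1', 'PID', 'delim2', 'Message'};
VariableWidths = [19, 5, 1, 23, 2, 5000];
VariableTypes = {'datetime', 'char', 'char', 'char', 'char', 'char'};
% Import only Date, Level, PID and Message; the delimiter columns are skipped
opts = fixedWidthImportOptions('VariableNames', VariableNames, 'VariableWidths', VariableWidths, 'VariableTypes', VariableTypes, 'SelectedVariableNames', [1, 2, 4, 6]);
% Month/day and 24-hour time (the log has no year, so the current year is used)
opts = setvaropts(opts, 'Date', 'InputFormat', 'MM/dd HH:mm:ss.SSS');
content = readtable('test_that.txt', opts);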
This results in a table with the variables Date, Level, PID and Message.

If you are on a version of MATLAB that doesn't have tables, use textscan with fixed-width fields:
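fid = fopen('test_that.txt', 'rt');
if fid < 0, error('Could not open test_that.txt'); end
% 18-char timestamp, skip 1, 5-char level, skip '|', 23-char source, skip '| ', rest of line
content = textscan(fid, '%18c%*c%5c%*c%23c%*2c%s', 'Delimiter', '', 'Whitespace', '');
fclose(fid);
% Convert the char matrices to cell arrays and combine into an N-by-4 cell array
content = [cellstr(content{1}), cellstr(content{2}), cellstr(content{3}), content{4}]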
More Answers (2)
Bob Thompson on 18 Jul 2019
I need the next steps:
◾Convert the data content into cells, e.g. timestamp, message data 1, message data 2
◾Put the cells in the proper format
◾Create MATLAB variables
◾Display the MATLAB variables for analysis
1) When called with the 'match' or 'tokens' option, regexp returns its results in a cell array: one character vector per match for 'match', and one cell array of captured tokens per match for 'tokens'.
2) You can convert strings to date/time values using datetime. datetime accepts a cell array of character vectors directly, so you can usually convert all of your regexp results in one call; otherwise, loop through them or use cellfun (which is really still a loop).
3) What exactly do you mean by this? Creating variables dynamically in MATLAB is possible (for example with eval), but it is discouraged, and I think you would be better served keeping the information in a cell array or making a table out of it. It is certainly possible to create new variables in a table from strings captured with regexp.
4) Displaying MATLAB variables is simply a matter of not suppressing the output (leaving off the trailing semicolon). To display them explicitly, use disp, or fprintf without a file identifier, which writes to the Command Window.
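The four steps above can be sketched end to end as follows. This assumes MATLAB R2014b or later (for datetime and table), the first log format from the question, and hypothetical variable names (txt, tok, when, T); it is one possible illustration, not the only way to do it.

```matlab
% Sample lines from the question; in practice: txt = fileread('test_that.txt');
txt = sprintf('%s\n', ...
    '07/16 12:55:22.012 INFO | test_runner_utils:0812| Began logging to /tmp/test_that_results_hatch_deL3lZ', ...
    '07/16 12:55:27.477 INFO | test_runner_utils:0259| autoserv| Processing control file');

% 1) regexp with 'tokens' returns one cell per matching line; each of those
%    cells holds the captured tokens as character vectors.
tok = regexp(txt, '^(\S+ \S+)\s+(\w+)\s*\|\s*([^|]*?)\|\s*(.*?)\s*$', ...
             'tokens', 'lineanchors', 'dotexceptnewline');
tok = vertcat(tok{:});   % N-by-4 cell array of character vectors

% 2) datetime converts the whole column at once, no loop needed
%    (the log has no year, so the current year is filled in)
when = datetime(tok(:,1), 'InputFormat', 'MM/dd HH:mm:ss.SSS');

% 3) Keep the fields together in a table instead of creating separate variables
T = table(when, tok(:,2), tok(:,3), tok(:,4), ...
          'VariableNames', {'Time', 'Level', 'Source', 'Message'});

% 4) Display: omit the semicolon, or use disp / fprintf
disp(T)
fprintf('%d lines parsed, first message: %s\n', height(T), T.Message{1});
```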
Bob Thompson on 19 Jul 2019
Are you only looking to capture the timestamp? It seems like the issue is more in the initial regexp processing than in the datetime conversion.
If you are only looking to capture the timestamp, I would suggest a regexp call like this:
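filecontent = fileread('test_that.txt');   % whole file as one char vector
% Capture the mm/dd HH:mm:ss.SSS timestamp at the start of each line
filedata = regexp(filecontent, '^(\d\d/\d\d \d\d:\d\d:\d\d\.\d\d\d)', 'tokens', 'lineanchors');
dates = datetime([filedata{:}], 'InputFormat', 'MM/dd HH:mm:ss.SSS');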
If you are looking to capture more than the timestamps, please explain further. I know you outline some more in your original post, but I'm not entirely sure what you're referring to.