How to match and take the part of the string between two specified characters

Hi,
I have a text file and read some items from it.
For the start time, the text file contains the line:
Start Time : 2020-06-08 10:12:01.02.653-starting VNA: FindSlot3
startTime: 2020-06-08 10:12:01.02.65
I used the following command:
startTime = strtrim( regexp( content, '(?<=Start Time\s+:\s*).*?(?=*-starting)', 'match', 'once' )) ;
but I do not get my required output. I use the following code:
% - Define output header.
header = {'RainFallID', 'IINT', 'Rain Result', 'Start Time', 'Param1.pipe', ...
    '10 Un Para2.pipe', 'Verti 2 mixing.dis', 'Rate.alarm times'} ;
nHeaderCols = numel( header ) ;
% - Build listing sub-folders of main folder.
D_main = dir( 'mainfolder' ) ;
D_main = D_main(3:end) ; % Eliminate "." and ".."
% - Iterate through sub-folders and process.
for dId = 1 : numel( D_main )
    % - Build listing files of sub-folder.
    D_sub = dir( fullfile( 'mainfolder', D_main(dId).name, '*.txt' )) ;
    nFiles = numel( D_sub ) ;
    % - Prealloc output cell array.
    data = cell( nFiles, nHeaderCols ) ;
    % - Iterate through files and process.
    for fId = 1 : nFiles
        % - Read input text file.
        inLocator = fullfile( 'mainfolder', D_main(dId).name, D_sub(fId).name ) ;
        content = fileread( inLocator ) ;
        % - Extract relevant data.
        rainfallId = str2double( regexp( content, '(?<=RainFallID\s+:\s*)\d+', 'match', 'once' )) ;
        iint = regexp( content, '(?<=IINT\s+:\s*)\S+', 'match', 'once' ) ;
        rainResult = regexp( content, '(?<=Rain Result\s+:\s*)\S+', 'match', 'once' ) ;
        startTime = strtrim( regexp( content, '(?<=Start Time\s+:\s*).*?(?=*-starting)', 'match', 'once' )) ;
        endTime = strtrim( regexp( content, '(?<=End Time\s+:\s*).*?(?= -)', 'match', 'once' )) ;
        chamber = regexp( content, '(?<=chamber\s+:\s*)\S+', 'match', 'once' ) ;
    end
    % - Output to XLSX.
    outLocator = fullfile( 'outputfolder', sprintf( '%s.xlsx', D_main(dId).name )) ;
    fprintf( 'Output XLSX: %s ..\n', outLocator ) ;
    xlswrite( outLocator, [header; data] ) ;
end
my desired output is:
  2 Comments
Mekala balaji on 2 Oct 2017
Edited: Stephen23 on 2 Oct 2017
Sir,
I want to search for "Start Time" and get its data: 2020-06-08 10:12:01.02.653. Similarly, for "Duration" I want its data: 00:01:00, and for "chamber" I want its data: 1 (Slot12) (line = 8).


Answers (2)

Cedric Wannaz on 2 Oct 2017
Edited: Cedric Wannaz on 2 Oct 2017
Replace the line that extracts the start time with:
startTime = strtrim( regexp( content, '(?<=Start Time\s+:\s*).*?(?= - )', 'match', 'once')) ;
and you can extract the duration with:
duration = strtrim( regexp( content, '(?<=Duration\s+:\s*)\S+', 'match', 'once' )) ;
where the pattern (?<=Duration\s+:\s*)\S+ extracts
  • one or more non-white-space characters: \S+
  • preceded by: (?<=...), which is a look-behind
  • whose content is the literal Duration, followed by one or more white-spaces \s+, followed by the literal :, followed by zero or more white-spaces \s*
Finally, for chamber, you almost had it! The problem is that there can be white-spaces in the middle of what you are trying to extract, and \S+ stops at the first white-space. There are several options for capturing everything up to the end of the line. One is based on anchoring the end of the line; the other is based on picking all characters until it finds a carriage return (\r) or a new line (\n) [which are not displayed in your text editor unless you ask for it, but we can use them]:
chamber = strtrim( regexp( content, '(?<=chamber\s+:\s*)[^\r\n]+', 'match', 'once' )) ;
where [^..] defines a set of characters not to match, [^..]+ matches one or more of anything that is not in this set, and \r and \n denote the carriage return and the new line. So the whole thing reads: match one or more of anything that is not a carriage return or a new line.
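As a quick check of the patterns above, here is a minimal sketch run on a made-up two-line sample. The sample text and its \r\n line endings are assumptions, built only from the values quoted in the question; the real file may differ. The anchored variant uses the 'lineanchors' option so that $ matches at the end of each line, which is one possible way to write the "anchoring" option mentioned above.

```matlab
% Hypothetical sample (assumed layout); \r\n line endings are an assumption.
content = sprintf( [ 'chamber : 1 (Slot12)\r\n', ...
                     'Duration : 00:01:00\r\n' ] ) ;

% Duration: one or more non-white-space characters after the look-behind.
duration = strtrim( regexp( content, '(?<=Duration\s+:\s*)\S+', 'match', 'once' )) ;

% chamber with \S+ stops at the first white-space, so only '1' is returned.
chamberShort = regexp( content, '(?<=chamber\s+:\s*)\S+', 'match', 'once' ) ;

% chamber, option 1: everything that is not a carriage return or a new line.
chamberSet = strtrim( regexp( content, '(?<=chamber\s+:\s*)[^\r\n]+', 'match', 'once' )) ;

% chamber, option 2: lazy match up to the end of the line; strtrim removes
% the leading space and any trailing \r.
chamberAnchored = strtrim( regexp( content, '(?<=chamber\s+:\s*).*?$', ...
                                   'match', 'once', 'lineanchors' )) ;

fprintf( 'duration = "%s"\n', duration ) ;
fprintf( 'chamber (\\S+) = "%s"\n', chamberShort ) ;
fprintf( 'chamber (set) = "%s"\n', chamberSet ) ;
fprintf( 'chamber (anchored) = "%s"\n', chamberAnchored ) ;
```

Both chamber options should give the same text here; the set-based one is probably the more robust choice when the line endings are not known in advance.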
  3 Comments
Cedric Wannaz on 4 Oct 2017
Edited: Cedric Wannaz on 4 Oct 2017
Well, let me make a correction, because I realize, looking a second time at your last example, that there is no white-space before the dash in
Start Time : 2020-06-08 10:12:01.02.653-starting VNA: FindSlot3
so one way to catch both '-starting' and ' -' is to look for a dash followed by a character that is not a digit. If these are the only cases present in all versions of your file, the following should work (untested):
'(?<=Start Time\s+:\s*).*?(?=-\D)'
where \D means "anything but a numeric digit" (it is the complement of \d).
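To try the corrected pattern, here is a small sketch that applies it to the line quoted above and to a second, hypothetical variant with a spaced dash (the second line is an assumption about what other versions of the file might contain). It also shows that the earlier look-ahead (?= - ) finds nothing on the unspaced line.

```matlab
% First line is quoted from the question; the second is a hypothetical variant.
lines = { 'Start Time : 2020-06-08 10:12:01.02.653-starting VNA: FindSlot3', ...
          'Start Time : 2020-06-08 10:12:01.02.653 - starting VNA: FindSlot3' } ;

% Dash followed by a non-digit: skips the dashes inside the date.
newPattern = '(?<=Start Time\s+:\s*).*?(?=-\D)' ;
startTimes = strtrim( regexp( lines, newPattern, 'match', 'once' )) ;

% Earlier pattern, which requires a space on both sides of the dash.
oldPattern = '(?<=Start Time\s+:\s*).*?(?= - )' ;
oldResult  = regexp( lines{1}, oldPattern, 'match', 'once' ) ;

fprintf( 'new pattern, line 1: "%s"\n', startTimes{1} ) ;
fprintf( 'new pattern, line 2: "%s"\n', startTimes{2} ) ;
fprintf( 'old pattern matched line 1: %d\n', ~isempty( oldResult )) ;
```

The dashes inside 2020-06-08 are each followed by a digit, so the lazy .*? runs past them and stops only at the dash that precedes a non-digit.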



Kian Azami on 2 Oct 2017
You can use the following code to extract the lines containing 'Start Time' and 'Duration' and then pick out the required data. The textscan function helps read data from text files.
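clc
clear all
close all
fid = fopen('RainFallReport5.txt');
% Skip 9 lines, then read one row of 8 white-space-delimited fields
% (the 'Start Time' line in the file layout assumed here).
Start = textscan(fid,'%q%q%q%q%q%q%q%q',1,'HeaderLines',9);
% From the current position, skip 2 more lines and read the 'Duration' line.
Duration = textscan(fid,'%q%q%q%q%q%q%q%q',1,'HeaderLines',2);
% Fields 4 and 5 are the date and time parts; field 3 is the duration value.
Start_Time = ['Start Time:' strcat(Start{1,[4 5]})]
Duration = ['Duration:' strcat(Duration{1,3})]
fclose(fid);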
