use regexp to extract index

Nit C on 10 Sep 2021
Commented: Nit C on 15 Sep 2021
trans_setup_2=ASSIGN {
# Blower speed in rpm
variable = SPEED
value = 1234
}
ASSIGN {
# Resulting time increment
variable = TIME_INCREMENT
value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE
}
trans_setup_2 is a cell array created from the text file containing the above text.
I would like to extract the index of the first occurrence of ASSIGN { (the block commented # Blower speed in rpm), along with the index of the line value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE.
I have tried the following:
ex='(?=ASSIGN).*|(?<=value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_B).*'
ones=~cellfun(@isempty,regexp(trans_setup_2, ex, 'match'))
How should I adapt the search pattern 'ex' so that it returns only the first occurrence of ASSIGN and then the line value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE?
Please help me with this.
  7 Comments
Walter Roberson on 13 Sep 2021
trans_setup_path = fullfile('D:\timpts' ,'trans_setup_2.txt');
S = fileread(trans_setup_path);
S = regexp(S, '^ASSIGN\s', 'split', 'lineanchors');
S = regexprep(S, '^{', 'ASSIGN {');
Now S should be a cell array of character vectors. The first one should start with
trans_setup_2=ASSIGN {
and the others should start with
ASSIGN {
and each of them should be an exact copy of a {} block of text.
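To see what the split step produces, here is a minimal sketch that runs the same two calls on an in-memory copy of the sample text instead of a file. It assumes the real file starts directly with ASSIGN { (so the first piece of the split is empty and is dropped); the variable name blocks and the escaped brace in the regexprep pattern are choices made for this illustration only.

```matlab
% Sample text standing in for fileread(trans_setup_path).
% Assumption: the file starts directly with "ASSIGN {".
S = sprintf('%s\n', ...
    'ASSIGN {', ...
    '# Blower speed in rpm', ...
    'variable = SPEED', ...
    'value = 1234', ...
    '}', ...
    'ASSIGN {', ...
    '# Resulting time increment', ...
    'variable = TIME_INCREMENT', ...
    'value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE', ...
    '}');

% Split at every line that begins with "ASSIGN " (the match itself is consumed).
blocks = regexp(S, '^ASSIGN\s', 'split', 'lineanchors');

% Put the consumed keyword back in front of each block.
blocks = regexprep(blocks, '^\{', 'ASSIGN {');

% The text before the first ASSIGN is empty here, so drop empty pieces.
blocks = blocks(~cellfun(@isempty, blocks));

fprintf('%d blocks found\n', numel(blocks));
```

Each element of blocks is then one complete ASSIGN { ... } block, ready to be parsed or copied.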
You probably do not need to know the line numbers to copy: you have the blocks of text right there, so you can copy out of the blocks.
You can parse each block,
vals = regexp(S, 'variable = (?<variable>\S+).*value = (?<value>[^\r\n]+)', 'names');
Because S is a cell array, this should return a cell array holding one struct per block, each with the fields 'variable' and 'value'. You can search those for the variable names you are looking for to decide whether a block is worth copying.
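As a rough illustration of searching the parsed results, the sketch below parses two sample blocks and looks up the one that defines TIME_INCREMENT. It assumes every block contains both a variable line and a value line, so the per-block results can be concatenated into one struct array; the names blocks, parsed and k, and the 'once' option, are choices made for this example.

```matlab
% Two blocks in the form produced by the split step (sample data).
blocks = { ...
    sprintf('ASSIGN {\n# Blower speed in rpm\nvariable = SPEED\nvalue = 1234\n}\n'), ...
    sprintf('ASSIGN {\n# Resulting time increment\nvariable = TIME_INCREMENT\nvalue = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE\n}\n')};

% One struct per block; 'once' keeps only the first match in each block.
vals = regexp(blocks, ...
    'variable = (?<variable>\S+).*value = (?<value>[^\r\n]+)', ...
    'names', 'once');

% Assumption: every block matched, so the structs share the same fields.
parsed = [vals{:}];

% Index of the first block whose variable is TIME_INCREMENT.
k = find(strcmp({parsed.variable}, 'TIME_INCREMENT'), 1);

fprintf('Block %d: %s = %s\n', k, parsed(k).variable, parsed(k).value);
```

The index k refers to a block, not a line number, which is usually enough when whole blocks are copied.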
Copying the block is
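% outfid is assumed to be the identifier of an output file already opened with fopen
number_of_blocks_written = 0;
for K = 1 : numel(S)
    % decide here whether block K should be copied; skip it with "continue" if not
    if number_of_blocks_written > 0
        fprintf(outfid, '\n');   % separator between blocks, none before the first
    end
    fwrite(outfid, S{K});
    number_of_blocks_written = number_of_blocks_written + 1;
end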
The care taken over whether to write \n is there to avoid extra newlines. A newline has probably been eaten by the process of finding the lines beginning with ASSIGN.
Nit C on 15 Sep 2021
@Walter Roberson, Thanks. This solved my problem.


Accepted Answer

Mathieu NOE on 13 Sep 2021
Hello,
My two cents: a suggestion using readlines and working on strings.
This simple code can be expanded or modified according to what you need.
rr = readlines('trans_setup_2.txt');
rr_strip = strip(rr,'left'); % remove left blanks
a = find(strcmp(rr_strip,'ASSIGN {'));
b = find(strcmp(rr_strip,'value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE'));
text_extract1 = rr(a(1):b);
  1 Comment
Nit C on 13 Sep 2021
I had already used strcmp. But I am interested in going with 'regexp' because there are many selections of text to make based on general patterns and keywords instead of exact text.
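A pattern-based variant of the strcmp lookup above could look like the following sketch. It works on a cell array of lines, so it does not need readlines, which is documented as introduced in R2020b while this question lists R2020a. The sample lines, the helper hits and the two patterns are assumptions made for illustration; with a real file, the lines could come from something like regexp(fileread(filename), '\r?\n', 'split').

```matlab
% Lines of the file as a cell array of character vectors (sample data).
txt = { ...
    'ASSIGN {'; '# Blower speed in rpm'; 'variable = SPEED'; 'value = 1234'; '}'; ...
    'ASSIGN {'; '# Resulting time increment'; 'variable = TIME_INCREMENT'; ...
    'value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE'; '}'};

% Helper: logical mask of the lines on which a pattern matches.
hits = @(pat) ~cellfun(@isempty, regexp(txt, pat, 'once'));

% First line that opens an ASSIGN block (leading blanks allowed).
first_assign = find(hits('^\s*ASSIGN\s*\{'), 1);

% First "value = ..." line that mentions NR_TIME_STEPS_PER_BLADE.
value_line = find(hits('^\s*value\s*=.*NR_TIME_STEPS_PER_BLADE'), 1);

% Everything from the first ASSIGN up to and including that value line.
text_extract = txt(first_assign:value_line);
```

Passing 1 as the second argument of find is what restricts the result to the first occurrence; the patterns can be loosened or tightened to select other keywords.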

Release

R2020a