
Read text file after a specific text line but avoiding only the next line

Hello, I am collecting the data that follow the line "# HHE HHN HHZ" (I copy only the first 3 rows after "# HHE HHN HHZ" as an example; there can be hundreds), and the position of these columns can vary. I have written a script for a specific text file (see Example 1).
Example1:
#
# 4. COMMENTS
# BASELINE CORRECTED
#
# 5. ACCELERATION DATA
# HHE HHN HHZ
-0.02104708 -0.02134472 0.00412299
-0.00340606 0.08357343 0.02083563
-0.02940362 0.00093856 0.00505147
The script below handles one combination of columns, defined as textline1, textline2 and so on; these are necessary so that the data can be rearranged into a fixed column order in the output:
textline1 = '# HNE HNN HNZ';
%First mixed data%
if index==0
index = strcmp(tline,textline1); %%EO NS UD
if index ==1; index=1; end
elseif index ==1
tmp=sscanf(tline,'%f %f %f %f');
tmp1 = [tmp(1); tmp(2); tmp(3)]; % rearrange to EO=X NS=Y UD=Z
Output = [Output; tmp1'];
end
However, some records have the following format, where there is a line containing "T" before the data to be collected (after "# HHE HHN HHZ"):
#
# 4. COMMENTS
# BASELINE CORRECTED
#
# 5. ACCELERATION DATA
# HHE HHN HHZ
T
-0.02104708 -0.02134472 0.00412299
-0.00340606 0.08357343 0.02083563
-0.02940362 0.00093856 0.00505147
Any help fixing the code for that case would be appreciated. Thank you very much.
  11 Comments
Jorge Luis Paredes Estacio
Thank you very much for your detailed explanation; I really appreciate it. I am going to modify the code as you suggested and try to fix the issue of getting more data.
dpb on 8 May 2023
NOTA BENE: In the initial code above there was a typo/mismatch between the returned indexing variable and the variable used as the return value in the sort call -- I fixed it above, but the original would have an issue...


Accepted Answer

dpb on 7 May 2023
Edited: dpb on 8 May 2023
fn='https://www.mathworks.com/matlabcentral/answers/uploaded_files/1376874/CISMID_SC_SCARQ_NEW_TOCHECH.txt';
data=readmatrix(fn,'CommentStyle',{'#','T'});
whos data
  Name          Size               Bytes  Class     Attributes
  data      49626x4             1588032  double
[data(1:5,:); nan(1,size(data,2)) ; data(end-4:end,:)]
ans = 11×4
       NaN       NaN       NaN       NaN
       NaN       NaN       NaN       NaN
       NaN       NaN       NaN       NaN
       NaN       NaN       NaN       NaN
       NaN  -20.2870       NaN       NaN
       NaN       NaN       NaN       NaN
    0.0448   -0.0758   -0.0541       NaN
   -0.0259   -0.0098    0.0058       NaN
   -0.0848    0.0277   -0.0031       NaN
   -0.0596    0.0094   -0.0153       NaN
Well, that's a spectacular failure in that the published/documented comment style didn't seem to work well at all... I would have to delve into that some more, but it may be worthy of a support ticket if there's no obvious cause; I don't see one just looking at the file in the browser.
BTW, since there isn't anything after the data section, you could shorten the file significantly before posting and not lose anything; I was presuming there were probably other sections after the data.
Anyways, let's do something a little different...
opt=detectImportOptions(fn,'Readvariablenames',0,'ExpectedNumVariables',3)
opt =
  DelimitedTextImportOptions with properties:

   Format Properties:
                    Delimiter: {'\t' ' '}
                   Whitespace: '\b'
                   LineEnding: {'\n' '\r' '\r\n'}
                 CommentStyle: {}
    ConsecutiveDelimitersRule: 'join'
        LeadingDelimitersRule: 'ignore'
       TrailingDelimitersRule: 'ignore'
                EmptyLineRule: 'skip'
                     Encoding: 'UTF-8'

   Replacement Properties:
                  MissingRule: 'fill'
              ImportErrorRule: 'fill'
             ExtraColumnsRule: 'ignore'

   Variable Import Properties: Set types by name using setvartype
                VariableNames: {'Var1', 'Var2', 'Var3'}
                VariableTypes: {'double', 'double', 'double'}
        SelectedVariableNames: {'Var1', 'Var2', 'Var3'}
              VariableOptions: Show all 3 VariableOptions
        Access VariableOptions sub-properties using setvaropts/getvaropts
           VariableNamingRule: 'modify'

   Location Properties:
                    DataLines: [15 Inf]
            VariableNamesLine: 0
               RowNamesColumn: 0
            VariableUnitsLine: 0
     VariableDescriptionsLine: 0
        To display a preview of the table, use preview
data=readmatrix(fn,opt);
whos data
  Name          Size               Bytes  Class     Attributes
  data      49631x3             1191144  double
data(1:5,:)
ans = 5×3
       NaN       NaN       NaN
       NaN    2.0000       NaN
       NaN       NaN       NaN
       NaN       NaN       NaN
       NaN       NaN  -20.2870
Well, now we've again illustrated that the import detection tool isn't all that great sometimes, particularly for text files... I always like to try the higher-level things first, but when they don't work, revert to brute force to find the header.
fid=fopen('CISMID_SC_SCAR...W_TOCHECH.txt','r'); % open the file for low-level i/o
n=0; % initialize line counter
l=''; % preset line content to nothing
while ~contains(l,'ACCELERATION DATA') % look for the acceleration data section
l=fgetl(fid);
n=n+1;
end
for i=1:3 % after finding it, step over the column-header line and the optional "T" record
l=fgetl(fid);
if ~ischar(l) || isempty(l) || l(1)==' ', break, end % stop at the first data record (it begins with a blank); bail out at end of file
n=n+1; % this line is still header, so count it
end
l = '# HNE HNN HNZ'
l = 'T'
l = ' 0.02165885 -0.06615625 0.00254670'
ans = 32
n = 37
fid=fclose(fid); % ok, close the file and do high-level read
data=readmatrix(fn,'NumHeaderLines',n);
whos data
  Name          Size               Bytes  Class     Attributes
  data      49608x3             1190592  double
data(1:5,:)
ans = 5×3
    0.0217   -0.0662    0.0025
    0.1372   -0.0853   -0.0040
    0.0745   -0.0395    0.0133
   -0.0195    0.0550    0.0390
   -0.0766    0.0929    0.0681
Could also use a low-level read to scan the rest of the file from that point on, but it's somewhat of a pain to resynch the file pointer to the beginning of the previous record to rescan it, so I just saved the header line count and read with the high-level routine.
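For completeness, here is a rough sketch of what that low-level alternative might look like. It is not the code used above: it assumes that every header line (the "#" lines and the optional "T" record) fails to parse as three numbers, and it writes a small hypothetical sample file to a temporary location so it can run on its own. The idea is to save the file position with ftell before each fgetl, so that fseek can return to the start of the first data record before textscan reads the rest.

```matlab
% Hypothetical sample file (made up for this sketch), written to a temp location
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'#\n# 5. ACCELERATION DATA\n# HHE HHN HHZ\nT\n');
fprintf(fid,'-0.02104708 -0.02134472 0.00412299\n');
fprintf(fid,'-0.00340606 0.08357343 0.02083563\n');
fprintf(fid,'-0.02940362 0.00093856 0.00505147\n');
fclose(fid);

fid = fopen(fname,'r');
pos = 0;                         % byte offset of the record about to be read
while true
    pos = ftell(fid);            % remember where this record starts
    l = fgetl(fid);
    if ~ischar(l), break, end    % end of file reached without finding data
    v = sscanf(l,'%f');          % header lines and "T" yield an empty result
    if numel(v)==3, break, end   % first record with three numbers: data begins here
end
fseek(fid,pos,'bof');            % resynch to the beginning of that record
c = textscan(fid,'%f %f %f');    % read the remaining records as three columns
fclose(fid);
data = [c{:}];                   % N-by-3 double array
delete(fname);                   % remove the temporary sample file
```

Because the test is "does this line hold three numbers" rather than "is this line a T", the same loop should cope with files with or without the extra record; a header line that happens to contain exactly three numbers would fool it, so a real file may need a stricter check.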

Release

R2021a