How to delete rows of characters in Text files?
I was trying to input the data from lots of TXT files, but there were rows of characters. How can I delete the rows with characters? How can I create a new txt file with just numerical data? The example of the txt data is as follows:
*****************************************
* Log File Started 11:29:05 Wed Dec 31 2014
* Using PFC3D 4.00-182 (64-bit)
* Serial Number: 262-000-0000-00000
* By:
*
*****************************************
Fish>
1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2 2
0 0 0 0 0 0 0 0 0
Fish>
*****************************************
* Log File Ended 11:29:07 Wed Dec 31 2014
*****************************************
I would like to delete all the headers and footers. I tried to use the fgetl function, but only the headers were deleted.
Answers (2)
Pourya Alinezhad
on 1 Jan 2015
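Hi there, you can use the following lines of code. The format needs one %f per column (nine in the example), and the eight header lines are skipped:
fid = fopen('txtfile.txt'); % replace with the name of your file
textdata = textscan(fid, '%f%f%f%f%f%f%f%f%f', 'HeaderLines', 8, 'CollectOutput', true);
fclose(fid);
num = textdata{1}; % numeric matrix with one row per data line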
per isakson
on 2 Jan 2015
Edited: per isakson
on 9 Jan 2015
There is no easy way to read blocks of numerical data that are embedded in text. That might not be quite true, as I just learned.
Here are three different functions that read and parse the numerical block of the example file of the question, cssm.txt.
cssm_1   is a straightforward use of textscan. It is unproblematic in this case because the number of header lines and the number of rows in the data block are easy to determine. Matlab evolves gradually and it is easy to miss new behavior: with R2013a it is not necessary to set rows_of_data, the number of times the formatSpec is applied. The sentence "[...] and stops when it cannot match formatSpec to the data." is new in the R2014a documentation.
cssm_2   is based on a different approach. The entire file is read to a string and regexp extracts the blocks of numerical data. str2num converts the blocks to numerical arrays. This function can handle many blocks.
cssm_3   Sometimes the beginning and end of the blocks of tabular data are indicated with special strings. In this case Fish> indicates both beginning and end. fileread reads the entire file to a string and regexp extracts the blocks between the beginning and end markers. textscan parses the blocks.
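To illustrate the remark under cssm_1 about textscan stopping by itself, here is a minimal sketch that omits the repeat count. The sample file, its contents, and the three-column format are made up for the illustration; only textscan, its HeaderLines and CollectOutput options, and the stopping behavior quoted from the documentation come from the answer.

```matlab
% Build a small sample file in a temporary location (contents are made up).
filespec = [ tempname, '.txt' ];
fid = fopen( filespec, 'w' );
fprintf( fid, '%s\n', '* header line 1', '* header line 2', 'Fish>', ...
    '1 1 1', '0 0 0', '2 2 2', 'Fish>', '* footer line' );
fclose( fid );

% Read the block without a repeat count. textscan is expected to stop at
% the second 'Fish>' because that line cannot be matched against %f.
fid = fopen( filespec, 'r' );
cac = textscan( fid, '%f%f%f', 'HeaderLines', 3, 'CollectOutput', true );
fclose( fid );
num = cac{1};   % numeric matrix, one row per data line that was read
delete( filespec );
```

If the footer were to start with something that does parse as a number, this shortcut would presumably read too far, so the marker-based approach of cssm_3 is the safer choice for such files.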
Run on R2013a
>> num = read_block_demo( )
num(:,:,1) =
1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2 2
0 0 0 0 0 0 0 0 0
num(:,:,2) =
1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2 2
0 0 0 0 0 0 0 0 0
num(:,:,3) =
1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2 2
0 0 0 0 0 0 0 0 0
>>
where
function num = read_block_demo()
    filespec = 'cssm.txt';
    data_frmt = '%f%f%f%f%f%f%f%f%f';
    rows_of_data = 5;
    header_lines = 8;
    begin_xpr = '\*{20,}\s+Fish>\s+';
    end_xpr = '\s+Fish>\s+\*{20,}';
    num(:,:,1) = cssm_1( filespec,data_frmt, rows_of_data, header_lines);
    num(:,:,2) = cssm_2( filespec, 50 );
    num(:,:,3) = cssm_3( filespec, data_frmt, begin_xpr, end_xpr );
    assert( all(all(num(:,:,2)==num(:,:,1))) ...
        && all(all(num(:,:,3)==num(:,:,1))) ...
        , 'The methods don''t return identical results' )
end
function num = cssm_1(filespec, data_frmt, rows_of_data, header_lines )
    fid = fopen( filespec );
    cac = textscan( fid, data_frmt, rows_of_data ...
        , 'Headerlines' , header_lines ...
        , 'CollectOutput' , true );
    fclose( fid );
    num = cac{1};
end
function num = cssm_2( filespec, block_size )
    cac = read_blocks_of_numerical_data( filespec, block_size );
    num = cac{1};
end
function num = cssm_3( filespec, data_frmt, begin_xpr, end_xpr )
    str = fileread( filespec );
    cac = regexp( str, ['(?<=',begin_xpr,').+(?=',end_xpr,')'], 'match' );
    cac = textscan( cac{1}, data_frmt, 'CollectOutput', true );
    num = cac{1};
end
function out=read_blocks_of_numerical_data(filespec,block_size,delimiter )
    % block_size lower limit of number of characters in numerical block
    %
    % Within a block all rows must have the same number of "columns".
    narginchk( 2, 3 )
    buffer = fileread( filespec );
    if nargin == 2
        del_xpr = '[ ]+';
        trl_xpr = '[ ]*';
    else
        del_xpr = ['([ ]*',delimiter,'[ ]*)'];
        trl_xpr = ['([ ]*',delimiter,'?[ ]*)'];
    end
    % The extra group makes the optional sign apply to both forms, 1.5 and .5
    num_xpr = '([+-]?((\d+(\.\d*)?)|(\.\d+)))';
    sen_xpr = '([EeDd](\+|-)\d{1,3})?'; % optional scientific E notation
    num_xpr = [ num_xpr, sen_xpr ];
    nl_xpr = '((\r\n)|\n)';
    row_xpr = cat( 2, '(^|', nl_xpr, ')[ ]*(' ...
        , num_xpr, del_xpr, ')*' ...
        , num_xpr, trl_xpr, '(?=' ...
        , nl_xpr,'|$)' );
    blk_xpr = ['(',row_xpr,')+'];
    blocks = regexp( buffer, blk_xpr, 'match' );
    is_long = cellfun( @(str) length(str)>=block_size, blocks );
    blocks(not(is_long)) = [];
    out = cell( 1, length( blocks ) );
    for jj = 1 : length( blocks )
        out{jj} = str2num( blocks{jj} );
    end
end
 
I learned that textscan in this case handles the free text at the end of a file better than I thought it would.
@Lei, regarding "[...] there is still only method for removing the headers, no footers actually": textscan actually removes (or rather ignores) the footer automagically in your example.
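The question also asks for a new text file that holds only the numerical data. A minimal sketch of one way to do that is given below, assuming that every row to keep consists solely of blank-separated numbers. The function name strip_log_text, the regular expression, and the output naming are hypothetical choices for this illustration, not part of the answers above.

```matlab
function strip_log_text( in_file, out_file )
% Hypothetical helper: copy only the purely numeric rows of in_file to out_file.
    str  = fileread( in_file );
    rows = regexp( str, '\r?\n', 'split' );
    % One number: optional sign, digits with optional fraction, optional exponent.
    one_num = '[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?';
    % A row is kept when it holds one or more such numbers and nothing else.
    row_xpr = [ '^\s*', one_num, '(\s+', one_num, ')*\s*$' ];
    is_num  = ~cellfun( @isempty, regexp( rows, row_xpr, 'once' ) );
    fid = fopen( out_file, 'w' );
    assert( fid ~= -1, 'Cannot open %s for writing', out_file )
    fprintf( fid, '%s\n', rows{is_num} );
    fclose( fid );
end
```

Saved as strip_log_text.m, it could be applied to all log files in the current folder along these lines. The output files get a different extension (an arbitrary choice) so that a second run does not pick them up as input:

```matlab
files = dir( '*.txt' );
for k = 1 : numel( files )
    [ ~, name ] = fileparts( files(k).name );
    strip_log_text( files(k).name, [ name, '_num.dat' ] );
end
```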