Removing unwanted lines from text file

Question

jgillis16 on 6 Aug 2015


https://nl.mathworks.com/matlabcentral/answers/232544-removing-unwanted-lines-from-text-file

Commented: Cedric on 6 Aug 2015

Accepted Answer: Cedric

virgorm.txt


I am trying to remove every line that has NaN in column 7 of the attached text file and move those lines into a new text file.

I have written the code below:

% - Read original.
 content = fileread( 'virgorm.txt' ) ;
 % - Match and eliminate lines without pattern matching.
 sepId = reshape( strfind( content, '|' ), 7, [] ) ;
 match = content(sepId(7,:)+1) == 'NaN' ;
 lines = strsplit( content, '\n' ) ;
 lines(match) = [] ;
 % - Export updated content.
 fId = fopen( 'virgormwou.txt', 'w' ) ;
 fprintf( fId, strjoin( lines, '\n' )) ;
 fclose( fId ) ;

But it doesn't seem to be working. I suspect the problem is this line:

match = content(sepId(7,:)+1) == 'NaN' ;

The error I get is:

 Error using reshape
 Product of known dimensions, 7, not divisible into total number of elements, 6492.


Answer 1

Cedric on 6 Aug 2015


https://nl.mathworks.com/matlabcentral/answers/232544-removing-unwanted-lines-from-text-file#answer_188402

Edited: Cedric on 6 Aug 2015


Not far! You actually made two small mistakes. The first is that you have 7 columns, and hence 6 separators per line, so the array of separator IDs must be reshaped using 6 rows:

sepId = reshape( strfind( content, '|' ), 6, [] ) ;

Second, you cannot test whether a single char/element equals 'NaN' the way you do. I would just check for the presence of 'N' after the 6th separator:

found = content(sepId(6,:)+1) == 'N' ;
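To illustrate why the comparison with 'NaN' cannot work element by element, here is a minimal sketch. The sample string and the variable names (sample, rows, lastField, isNan) are made up for illustration and are not taken from virgorm.txt. It also shows one possible whole-field test, assuming the field of interest is the last one on each line:

```matlab
% Hypothetical sample with the same 7-column, '|'-separated layout.
sample = sprintf('a|b|c|d|e|f|NaN\na|b|c|d|e|f|1.5\na|b|c|d|e|f|NaN');

% Element-wise comparison: one char against 'NaN' yields three results,
% one per character of 'NaN', not a single true/false answer.
perChar = 'N' == 'NaN';

% Whole-field alternative: split into lines, take what follows the last
% '|' on each line, and compare the full string with strcmp.
rows      = strsplit(sample, '\n');
lastField = regexp(rows, '[^|]*$', 'match', 'once');
isNan     = strcmp(strtrim(lastField), 'NaN');
```

Checking only the first character, as above, is shorter and enough as long as no other value in that column starts with 'N'.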

Finally, and I renamed the variable match to found for that purpose (which should remind you of one of your previous questions), you can split and export to two files as follows:

 lines = strsplit( content, '\n' ) ;   % UPDATED: I forgot to copy this line.
 fId = fopen( 'output_nan.txt', 'w' ) ;
 fprintf( fId, '%s', strjoin( lines(found), '\n' )) ;    % '%s' keeps any % or \ in the data literal
 fclose( fId ) ;
 fId = fopen( 'output_noNan.txt', 'w' ) ;
 fprintf( fId, '%s', strjoin( lines(~found), '\n' )) ;
 fclose( fId ) ;

4 Comments

Cedric on 6 Aug 2015

Did you change the sepId(7,:) into sepId(6,:) as well?

Cedric on 6 Aug 2015


You should actually work on a small example, to get a better understanding of what we do:

 >> buffer = sprintf( '1|3|2|~|7\n2|1|5|100|12\n3|2|28|~|137' )
 buffer =
        1|3|2|~|7
        2|1|5|100|12
        3|2|28|~|137

This creates a string of characters which has the same structure as your files. The \n is an escape code that creates a new line.

Now we can look for the positions/IDs of | in this string:

 >> strfind( buffer, '|' )
 ans =
     2     4     6     8    12    14    16    20    25    27    30    32

and you can check that it works if you count the new line as a single character. By the way, you can see the ASCII codes of all these characters by converting to numeric (adding 0 triggers an automatic conversion to numeric):

 >> buffer + 0 
 ans =
     49   124    51   124    50   124   126   124    55    10    50   124    49   124    53   124    49    48    48   124    49    50    10    51   124    50   124    50    56   124   126   124    49    51    55

Here, 49 is the ASCII code of '1', 51 is the ASCII code of '3', 124 is the ASCII code of '|', and 10 is the ASCII code for a new line. This shows that SPRINTF encodes '\n' as 10, which is a single character.
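As a small side check of the point about new lines, the sketch below compares the literal '\n' with what SPRINTF produces from it. The variable names are made up for illustration:

```matlab
raw      = '\n';             % literal text: a backslash followed by 'n'
newLine  = sprintf('\n');    % SPRINTF turns the escape code into one character

codesRaw = double(raw);      % numeric codes of the two literal characters
codesNew = double(newLine);  % numeric code of the new line character
```

Here codesRaw holds the two codes 92 and 110, while codesNew holds the single code 10, which is why each line break counts for one position in the output of STRFIND.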

Back to positions: accounting for the fact that new lines are single characters, you can check that the positions are correct. Now if we want to get the position of the 3rd | on each line, we can compute the start and the step for extracting the relevant positions. Another way is to create an array whose number of rows equals the number of | on a line, which means reshaping the vector of positions as follows:

 >> sepId = reshape( strfind( buffer, '|' ), 4, [] )
 sepId =
     2    12    25
     4    14    27
     6    16    30
     8    20    32

Here we get it transposed, but you can recognize in the first column all positions associated with line 1, in the second column all positions associated with line 2, etc. So getting positions/IDs associated with the 3rd | means extracting row 3 of this array:

 >> sepId(3,:)
 ans =
     6    16    30
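The start-and-step approach mentioned earlier gives the same positions without reshaping. This is only a sketch on the small example; nSep, pos and third are names chosen for illustration. The divisibility check is an added precaution, since RESHAPE fails with the "not divisible" error when the number of separators found is not a multiple of the assumed count per line:

```matlab
buffer = sprintf('1|3|2|~|7\n2|1|5|100|12\n3|2|28|~|137');
pos    = strfind(buffer, '|');   % positions of all separators

nSep = 4;                        % separators per line in this example

% Guard: every line must contribute exactly nSep separators.
assert(mod(numel(pos), nSep) == 0, 'Unexpected number of separators.');

third = pos(3:nSep:end);         % start at the 3rd, then every nSep-th
```

For this buffer, third holds 6, 16 and 30, the same values as sepId(3,:).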

Now we can get the character that follows immediately by extracting elements of buffer at these positions +1 :

 >> buffer(sepId(3,:)+1)
 ans =
 ~1~

and we can test whether these characters are '~' or not:

 >> found = buffer(sepId(3,:)+1) == '~'
 found =
     1     0     1

Note that found is a vector of logicals (booleans: true is displayed as 1 and false as 0):

 >> class( found )
 ans =
     logical

which means that we can create "not found":

 >> ~found
 ans =
     0     1     0

We can use both for indexing arrays (logical indexing). If we want to index lines, we have to split buffer into lines, which we do with STRSPLIT using the new line as delimiter:

 >> lines = strsplit( buffer, '\n' )
 lines = 
    '1|3|2|~|7'    '2|1|5|100|12'    '3|2|28|~|137'

This is a cell array of lines/strings:

 >> class( lines )
 ans =
     cell

and we can index its cells using a logical index (true=1 elements flag cells to extract):

 >> lines(found)
 ans = 
    '1|3|2|~|7'    '3|2|28|~|137'
 >> lines(~found)
 ans = 
    '2|1|5|100|12'

Now we can export these to files, but we have to join lines with a new line character:

 >> strjoin( lines(found), '\n' )
 ans =
     1|3|2|~|7
     3|2|28|~|137

and the rest you know well: opening files for writing, writing, and closing files.
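For completeness, here is a sketch of the whole small example from start to finish, including the export step. The output file names and the use of the temporary folder are assumptions made for illustration; adapt them to your own files:

```matlab
buffer = sprintf('1|3|2|~|7\n2|1|5|100|12\n3|2|28|~|137');
sepId  = reshape(strfind(buffer, '|'), 4, []);   % 4 separators per line
found  = buffer(sepId(3,:)+1) == '~';            % '~' right after the 3rd |
lines  = strsplit(buffer, '\n');

% Hypothetical output locations in the temporary folder.
fileFound    = fullfile(tempdir, 'example_found.txt');
fileNotFound = fullfile(tempdir, 'example_notFound.txt');

% Lines where the pattern was found.
fId = fopen(fileFound, 'w');
fprintf(fId, '%s', strjoin(lines(found), '\n'));
fclose(fId);

% All remaining lines.
fId = fopen(fileNotFound, 'w');
fprintf(fId, '%s', strjoin(lines(~found), '\n'));
fclose(fId);
```

Passing '%s' as the format means the joined text is written as is, even if the data contains % or \ characters.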
