Export matched lines from two text files

I need to identify the lines that two text files, mwithrm21.txt and virgomrmdist.txt, have in common, based on column 7 of each file. The matching lines should then be exported to a new text file and removed from mwithrm21.txt.
I have attached the text files.
I drafted the code below:
content1 = fileread( 'mwithrm21.txt' ) ;
content2_rows = strsplit( fileread( 'virgomrmdist.txt' ), sprintf( '\n' )) ;
found = cellfun( @(s)~isempty(strfind(content1, s)), content2_rows ) ;
output_rows = content2_rows(found) ;
fId = fopen( 'similarvclf.txt', 'w' ) ;
fprintf( fId, '%s\n', output_rows{:} ) ;
fclose( fId ) ;
output_rows = content2_rows(~found) ;
fId = fopen( 'mwithrm21_new.txt', 'w' ) ; % Remove the '_new' for overwriting original.
fprintf( fId, '%s\n', output_rows{:} ) ;
fclose( fId ) ;
But I do not know how to restrict the search to column 7 only and then export each entire matched line to a new text file.
  6 Comments
jgillis16 on 15 Aug 2015
I understand, which is why I worked through it (thanks for the reminder email!!).
Honestly, it doesn't matter, since the content of the matched lines is the same, just in a different order. But since my main goal is to export the matching lines of mwithrm21.txt to a new text file while removing them from the original mwithrm21.txt, I would like the exported lines to come from mwithrm21.txt.


Accepted Answer

Cedric Wannaz on 15 Aug 2015
Edited: Cedric Wannaz on 16 Aug 2015
Here is a first draft. Test it and let me know if anything is unclear or doesn't work.
% - Read files content as strings.
content1 = fileread( 'mwithrm21.txt' ) ;
content2 = fileread( 'virgomrmdist.txt' ) ;
% - Extract last column of each content.
codes1 = regexp( content1, '[^|]+(?=(\s|$))', 'match' ) ;
codes2 = regexp( content2, '[^|]+(?=(\s|$))', 'match' ) ;
% - Match codes.
[isMatch_1in2, match_posIn2] = ismember( codes1, codes2 ) ;
% - Split content. Careful, whatever generates these files still uses
% carriage returns (\r) only.
rows1 = strsplit( content1, char(13) ) ;
rows2 = strsplit( content2, char(13) ) ;
% - Output matches (version mwithrm21.txt). Use new line chars (\n) as
% joint instead of carriage returns, change if you prefer \r.
fId = fopen( 'matches.txt', 'w' ) ;
fwrite( fId, strjoin( rows1(isMatch_1in2), '\n' )) ;
fclose( fId ) ;
% - Output non-matching rows of file 1.
fId = fopen( 'mwithrm21_reduced.txt', 'w' ) ;
fwrite( fId, strjoin( rows1(~isMatch_1in2), '\n' )) ;
fclose( fId ) ;
% - Output non-matching rows of file 2. Eliminate matching rows first.
rows2(nonzeros( match_posIn2 )) = [] ;
fId = fopen( 'virgomrmdist_reduced.txt', 'w' ) ;
fwrite( fId, strjoin( rows2, '\n' )) ;
fclose( fId ) ;
EDITs :
  • Replaced match_posIn2(match_posIn2~=0) with nonzeros( match_posIn2 ) after reading an answer by Matt J in another thread that mentions NONZEROS.
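To see what the regular expression and ISMEMBER do in the code above, here is a minimal sketch on made-up two-row strings. The field values are hypothetical and not taken from the attached files; the rows are separated by \r, as in the real files.

```matlab
% Hypothetical two-row samples (not from the attached files), rows joined by \r.
content1 = sprintf( 'a|b|c|d|e|f|PGC1\ra|b|c|d|e|f|PGC2' ) ;
content2 = sprintf( 'x|y|z|u|v|w|PGC2\rx|y|z|u|v|w|PGC9' ) ;
% Last |-delimited field of each row: a run of non-| characters that is
% followed by white space (here \r) or by the end of the string.
codes1 = regexp( content1, '[^|]+(?=(\s|$))', 'match' ) ;  % {'PGC1','PGC2'}
codes2 = regexp( content2, '[^|]+(?=(\s|$))', 'match' ) ;  % {'PGC2','PGC9'}
% isMatch_1in2 flags rows of sample 1 whose code appears in sample 2,
% match_posIn2 gives the position of the match in sample 2 (0 if none).
[isMatch_1in2, match_posIn2] = ismember( codes1, codes2 ) ;
% Rows of sample 1 selected by the logical mask.
rows1   = strsplit( content1, char(13) ) ;
matches = rows1(isMatch_1in2) ;                            % {'a|b|c|d|e|f|PGC2'}
```

The logical mask indexes the rows of file 1, while the nonzero positions index the rows of file 2, which is why the answer uses one for mwithrm21.txt and the other for virgomrmdist.txt.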
  7 Comments
jgillis16 on 17 Aug 2015
OK! That was very helpful!!! Thanks!
Next time, I'll have a better idea of what I need to code, and I might get it done myself without bugging you guys :)


More Answers (2)

per isakson on 16 Aug 2015
Edited: per isakson on 16 Aug 2015
Here is an example of a different approach to the task. The two output files, mwithrm21_reduced.txt and matches.txt, are identical to those produced by the accepted answer, apart from the newline characters.
function et = cssm()
% et(1) = cssm_1();
et(2) = cssm_2();
end
function et = cssm_2()
tic
fid = fopen( 'mwithrm21.txt', 'rt' );
rows1 = textscan( fid, '%s', 'Delimiter','\n' );
fseek( fid, 0, 'bof' );
codes1 = textscan( fid, '%*s%*s%*s%*s%*s%*s%s', 'Delimiter','|' );
fclose( fid );
%
fid = fopen( 'virgomrmdist.txt', 'rt' );
codes2 = textscan( fid, '%*s%*s%*s%*s%*s%*s%s', 'Delimiter','|' );
fclose( fid );
%
ism = ismember( codes1{1}, codes2{1} );
%
fid = fopen( 'matches.txt', 'wt' );
fprintf( fid, '%s\n', rows1{1}{ism} );
fclose( fid ) ;
%
fid = fopen( 'mwithrm21_reduced.txt', 'wt' );
fprintf( fid, '%s\n', rows1{1}{not(ism)} );
fclose( fid );
et = toc;
end
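As a rough illustration of the format specifier used above, the sketch below runs TEXTSCAN on a char vector instead of a file. The sample text is hypothetical; each `%*s` reads and discards one |-delimited field, so only the seventh field is returned.

```matlab
% Hypothetical sample standing in for the content of a file.
txt = sprintf( 'a|b|c|d|e|f|PGC1\na|b|c|d|e|f|PGC2' ) ;
% Skip the first six fields of each row and keep the seventh.
codes = textscan( txt, '%*s%*s%*s%*s%*s%*s%s', 'Delimiter','|' ) ;
% codes{1} is a column cell array holding the seventh field of each row.
nCodes = numel( codes{1} ) ;
```

Because the result is a cell array of character vectors, it can be passed straight to ISMEMBER, as the function above does with codes1{1} and codes2{1}.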
  1 Comment
jgillis16 on 17 Aug 2015
Hey thanks! I appreciate the alternative approach!



r r on 11 May 2021
I have two files whose first columns contain numbers, some of which appear in both files. I want to print the lines that match, and the lines that differ, between the two files:
%%%%%%%%%%%%%%%%%%%%%%% File 1
fid1 = fopen( 'E1.txt', 'rt' );
T1 = textscan(fid1,'%s', 'delimiter', '\n');
%codes1 = textscan( fid1, '%*s%*s%*s%*s%*s%*s%s', 'Delimiter','|' );
fclose( fid1 );
%%%%%%%%%%%%%%%%%%%%%%% File 2
fid2 = fopen( 'G1.txt', 'rt' );
T2 = textscan(fid2,'%s', 'delimiter', '\n');
%codes2 = textscan( fid2, '%*s%*s%*s%*s%*s%*s%s', 'Delimiter','|' );
fclose( fid2 );
%%%%%%%%%%%%%%%%%%%%%%%
% Compare whole lines as cell arrays of character vectors.
T1s = T1{1};
T2s = T2{1};
% Lines common to both files (result is sorted, duplicates removed):
C = intersect(T1s, T2s);
% Lines of file 1 that do not appear in file 2. SETDIFF is used here
% because VISDIFF is a file comparison tool and does not return these outputs.
B = setdiff(T1s, T2s);
%%%%%%%%%%%%%%%%%%%% Print output
fid = fopen( 'Similar.txt', 'wt' ); % Print all similar lines
fprintf(fid, '%s\n', C{:});
fclose( fid );
fid = fopen( 'Different.txt', 'wt' ); % Print all different lines
fprintf(fid, '%s\n', B{:});
fclose( fid );
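If the comparison should use only the first column rather than whole lines, one possible sketch is shown below. It assumes the columns are separated by white space, which the post does not state, and the sample lines are hypothetical stand-ins for the content of E1.txt and G1.txt.

```matlab
% Hypothetical lines standing in for the content of E1.txt and G1.txt.
lines1 = { '101 alpha' ; '102 beta' ; '103 gamma' } ;
lines2 = { '102 delta' ; '104 epsilon' } ;
% First white-space-delimited field of each line (assumed column layout).
key1 = regexp( lines1, '^\S+', 'match', 'once' ) ;
key2 = regexp( lines2, '^\S+', 'match', 'once' ) ;
% Flag lines of file 1 whose first field also appears in file 2.
isCommon  = ismember( key1, key2 ) ;
similar   = lines1(isCommon) ;     % {'102 beta'}
different = lines1(~isCommon) ;    % {'101 alpha' ; '103 gamma'}
```

The two resulting cell arrays can then be written out with fprintf( fid, '%s\n', similar{:} ), in the same way as in the other answers.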
