Searching for a given line in a text file

Ram on 28 Feb 2011
The following is a text file in SDF format (chemical structures). It looks something like this:
7 9 1 0 0 0 0
7 14 1 0 0 0 0
8 10 1 0 0 0 0
8 15 1 0 0 0 0
9 10 2 0 0 0 0
9 16 1 0 0 0 0
10 17 1 0 0 0 0
12 13 1 0 0 0 0
13 18 1 0 0 0 0
13 19 1 0 0 0 0
13 20 1 0 0 0 0
M END
> <PUBCHEM_COMPOUND_CID>
2244
> <PUBCHEM_COMPOUND_CANONICALIZED>
1
> <PUBCHEM_CACTVS_COMPLEXITY>
212
I need to extract just the information under the CID number field, and there could be multiple CID number fields in a single file. How should I go about this? Any help would be appreciated.

Accepted Answer

Ram on 1 Mar 2011
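I tried something like this:
[A,B] = uigetfile('*.sdf','sdf');           % A = file name, B = folder
C = fopen(fullfile(B,A),'r');               % include the folder so the file opens from any location
nStruct = input('Number of structures: ');  % number of structures, obtained from the user
pubchem_id = [];
z = nStruct*300;                            % rough approximation: 300 lines for each structure
for j = 1:z
    D = fgetl(C);                           % returns -1 once the end of the file is reached
    if strcmp('> <PUBCHEM_COMPOUND_CID>',D)
        E = fgetl(C);                       % the CID is on the line after the tag
        E = str2double(E);
        pubchem_id = [pubchem_id; E];
    end
end
fclose(C);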
and it worked :)
  2 Comments
David Young on 1 Mar 2011
The for loop that looks at 300 lines only is a hostage to fortune: what if there are more than 300 lines for a structure? You could avoid this by using a while loop that kept looking until it either found a particular line, or came to the end of the file, and that would be far more robust.
Ram on 4 Mar 2011
I didn't use a while loop because nothing in an SDF file marks the end of the file. For instance, $$$$ marks the end of each structure, and there could be multiple $$$$ lines depending on the number of structures. A structure has about 180 lines on average, so 300 is actually more than enough, and when a structure has more than 300 lines it will be compensated for by the ones that have fewer than 300.
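For reference, here is a minimal sketch of the while-loop approach suggested in the comments above. It relies on the fact that fgetl returns -1 (a number, not text) once the end of the file is reached, so the file does not need its own end marker. The function name readCids is hypothetical, and the sketch assumes the tag line reads exactly > <PUBCHEM_COMPOUND_CID> with the value on the following line, as in the sample in the question.

```matlab
function ids = readCids(filename)
% readCids  Hypothetical helper: collect every PubChem CID in an SDF file.
fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
closer = onCleanup(@() fclose(fid));   % close the file even if an error occurs

ids = [];
txt = fgetl(fid);
while ischar(txt)                      % fgetl returns -1 (not char) at end of file
    if strcmp(strtrim(txt), '> <PUBCHEM_COMPOUND_CID>')
        txt = fgetl(fid);              % the value is on the line after the tag
        if ~ischar(txt)
            break;                     % the tag was the last line; nothing to read
        end
        ids(end+1, 1) = str2double(txt); %#ok<AGROW>
    end
    txt = fgetl(fid);
end
end
```

Saved as readCids.m, it could be called as ids = readCids(fullfile(B,A)) after uigetfile. No line-count estimate or structure count is needed, because the loop stops only when the file runs out.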


More Answers (1)

Walter Roberson on 28 Feb 2011
Not much you can do except fgetl() through the file until you encounter the M END line, and do the extraction work from there. The ease of extracting after that would depend upon the regularity of the data after that and upon which fields you were interested in.
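As a rough sketch of that idea, the function below reads with fgetl, waits for each structure's M END line, and then picks out one named data field until the $$$$ separator. The function name sdfFieldValues is hypothetical. The sketch assumes each tag line reads exactly > <FIELD_NAME> with the value on the next line, which matches the sample in the question but may not hold for every SDF file.

```matlab
function values = sdfFieldValues(filename, fieldName)
% sdfFieldValues  Hypothetical helper: text value of one data field per structure.
fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
closer = onCleanup(@() fclose(fid));   % close the file even if an error occurs

tag = ['> <' fieldName '>'];           % assumed layout of the tag line
values = {};
pastMEnd = false;                      % true once this structure's M END line is seen
txt = fgetl(fid);
while ischar(txt)                      % fgetl returns -1 at end of file
    txt = strtrim(txt);
    if ~isempty(regexp(txt, '^M\s+END$', 'once'))
        pastMEnd = true;               % data fields follow this line
    elseif strcmp(txt, '$$$$')
        pastMEnd = false;              % end of this structure
    elseif pastMEnd && strcmp(txt, tag)
        val = fgetl(fid);              % the value is on the next line
        if ischar(val)
            values{end+1, 1} = strtrim(val); %#ok<AGROW>
        end
    end
    txt = fgetl(fid);
end
end
```

For example, str2double(sdfFieldValues('my.sdf', 'PUBCHEM_COMPOUND_CID')) would give the CIDs as numbers, where my.sdf is a placeholder file name. Values are returned as text because not every field in an SDF file is numeric.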
  1 Comment
Ram on 1 Mar 2011
Thank you so much :) I built my code based on your reply :)
