How to extract part of a text file in MATLAB?
I have opened several XML files and want to extract the relevant text stored in them. The relevant text starts after a certain string of characters in each file, so I tried to use an if statement to find that string and then read the text from that point until another marker is reached. That would skip most of the meaningless markup and leave only the text that I want. I tried the following code:
File1 = fopen('Factual1.xml','r');
File2 = fopen('Factual2.xml','r');
File3 = fopen('Colloquial1.xml','r');
File4 = fopen('Colloquial2.xml','r');
File5 = fopen('Hello.xml','r');
File6 = fopen('Hello2.xml','r');
Filenames = {'File1';'File2';'File3';'File4';'File5';'File6'};
B = {0};
for i=File1:File6
    A = fscanf(i,'%s');
    if ~(strcmp(A,'<w:pw:rsidR="00E3286E"w:rsidRDefault="'))
        while((B = fscanf(i,'%c')) ~='\')
            B
        end
    end
end
But I keep getting an error saying that the statement B = fscanf(i,'%c') is not valid. Is there another way to scan the contents of each file, character by character, so that I can extract the text that I want?
Answers (2)
Ken Atwell
on 3 Jun 2013
I'm guessing you're a C programmer. You can't assign B in the while loop's conditional like you are attempting to do. Use two lines:
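B = fscanf(i, '%c', 1);        % read one character; without the size argument, fscanf reads the rest of the file
while ~isempty(B) && B ~= '\'  % stop at end of file or at a backslash
    ...
    B = fscanf(i, '%c', 1);
end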
BTW, I believe your for loop is working "accidentally" because MATLAB tends to assign file identifiers in numeric order, but that is perhaps not guaranteed.
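One way to avoid relying on the numeric order of file identifiers is to keep the file names in a cell array and open each file inside the loop. The following is a minimal sketch rather than a drop-in fix. It writes a small temporary file so that it runs on its own, and the names sampleName, fileNames, textOut, collected and ch are illustrative assumptions. Substitute your own file names, such as 'Factual1.xml'.

```matlab
% Create a small sample file so the sketch is self-contained
% (assumption: your real XML files would replace this).
sampleName = [tempname '.txt'];
fid = fopen(sampleName, 'w');
fprintf(fid, '%s', 'abc\def');        % file contains the 7 characters abc\def
fclose(fid);

fileNames = {sampleName};             % e.g. {'Factual1.xml'; 'Factual2.xml'}
textOut = cell(numel(fileNames), 1);  % one result per file

for k = 1:numel(fileNames)
    fid = fopen(fileNames{k}, 'r');
    if fid == -1
        error('Could not open %s', fileNames{k});
    end
    collected = '';
    ch = fscanf(fid, '%c', 1);        % read exactly one character
    while ~isempty(ch) && ch ~= '\'   % stop at end of file or at a backslash
        collected(end+1) = ch;        %#ok<AGROW>
        ch = fscanf(fid, '%c', 1);
    end
    fclose(fid);
    textOut{k} = collected;           % characters read before the backslash
end

delete(sampleName);                   % remove the temporary sample file
```

Looping over indices into a list of names, instead of over File1:File6, means the loop still visits the right files even if the identifiers returned by fopen are not consecutive.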
4 Comments
Walter Roberson
on 4 Jun 2013
MATLAB appears to follow what POSIX does, which is to allocate the first available (lowest numbered) file descriptor. But that does not mean that the results will always be consecutive.
fid1 = fopen('file1');
fid2 = fopen('file2');
fid3 = fopen('file3');
fclose(fid1);
fclose(fid2);
nfid1 = fopen('nfile1');
nfid2 = fopen('nfile2');
nfid3 = fopen('nfile3');
If we assume nothing had been opened before, fid1 will be 3, fid2 will be 4, and fid3 will be 5. Closing fid1 and fid2 releases 3 and 4, so nfid1 will be 3 and nfid2 will be 4, but nfid3 will be the next available identifier, 6, rather than the consecutive 5.
Paul Metcalf
on 4 Jun 2013
You are defining B as a cell matrix, then trying to replace B with a different data type which is invalid. Try first initializing B properly. E.g. B = cell(m,n); Then to assign data into each cell in the array use B{1,1} = 'first line of data'; etc... Your code is really poorly constructed in general. If I have time tonight I'll look at sending you some more tips.
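As a rough sketch of the initialization pattern described above, the following preallocates a cell array with one cell per file and fills each cell with the text found between two marker strings. It reads each file in one go with fileread instead of character by character. The sample content and the markers '<w:t>' and '</w:t>' are made up for illustration and are not taken from the original files, so substitute whatever strings delimit the text you need.

```matlab
% Sample file so the sketch runs on its own (assumption: made-up content).
sampleName = [tempname '.xml'];
fid = fopen(sampleName, 'w');
fprintf(fid, '%s', '<a>skip</a><w:t>wanted text</w:t><b>skip</b>');
fclose(fid);

fileNames   = {sampleName};          % your own file names go here
startMarker = '<w:t>';               % hypothetical start marker
endMarker   = '</w:t>';              % hypothetical end marker

B = cell(numel(fileNames), 1);       % preallocate: one cell per file
for k = 1:numel(fileNames)
    txt = fileread(fileNames{k});    % whole file as one char vector
    s = strfind(txt, startMarker);
    if isempty(s)
        B{k} = '';                   % start marker not found in this file
        continue
    end
    first = s(1) + numel(startMarker);        % first character after the marker
    e = strfind(txt(first:end), endMarker);
    if isempty(e)
        B{k} = txt(first:end);                % no end marker: keep the rest
    else
        B{k} = txt(first:first + e(1) - 2);   % text up to the end marker
    end
end

delete(sampleName);                  % remove the temporary sample file
```

Each B{k} then holds a char vector for the k-th file, which keeps the results separate and avoids overwriting B with a different type on every read.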