Finding a string in a file
Hello,
I need help with this problem:
How can I find files that have the same name in different folders and that contain the number "5800"? Furthermore, how can I list these files/folders and the line(s) on which "5800" appears in each file?
I would really appreciate the help.
Thanks.
2 Comments
Fangjun Jiang
on 22 Jul 2011
Is "5800" in the folder name or in the file name? Is the file an ASCII text file?
Walter Roberson
on 23 Jul 2011
"5800" is a target string to be found amongst several files.
Answers (8)
Walter Roberson
on 23 Jul 2011
CommonFileName = 'x3175.txt'; %or whatever the common name is
folderinfo = dir('*');
folderinfo(~[folderinfo.isdir]) = []; %keep only the folders
folderinfo(ismember({folderinfo.name},{'.','..'})) = []; %drop the current and parent folder entries
for FoIdx = 1 : length(folderinfo)
  specificname = fullfile(folderinfo(FoIdx).name, CommonFileName);
  if exist(specificname, 'file')
    %at this point, insert your code to examine specificname
  end
end
Jason Ross
on 21 Sep 2012
I realize this question is old, but in Windows Explorer (and not strictly a MATLAB question, either), you can put "content: " in the search box in the upper right hand corner, and it will search the files (including .doc files) for the string. So for this question, putting "content: 5800" would have searched and returned a list of the files that had "5800" in it.
Also, note that you can try finding something in Explorer, and then at the bottom it says "Search again in:", and one of the choices is "content". Not the most usable thing (the search dog in previous Windows versions exposed this more quickly), but it is there.
There is also a "find" command you can run from the Command Shell, just cd to the directory in question and type
find "5800" *
And the current directory will be searched. I checked a directory where I had some Word documents and it successfully searched them.
Honglei Chen
on 22 Jul 2011
1 vote
Hi osminbas,
You could write a script to achieve this. You can use cd to change directories, what to list all the files in the folder, then for each file, use textscan to read each line and strfind to find '5800'. You then just write the result to either the screen or a file.
HTH
Honglei
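The script described above could look roughly like the following. This is only a sketch under some assumptions: it uses dir and fileread rather than cd, what and textscan, and the scratch folder, the subfolder names runA and runB, and the shared file name data.txt are all made up so that the example runs on its own.

```matlab
% Build a small throwaway folder tree so the sketch is self-contained.
root = tempname;                 % hypothetical scratch location
commonName = 'data.txt';         % hypothetical shared file name
target = '5800';
mkdir(fullfile(root, 'runA'));
mkdir(fullfile(root, 'runB'));
fid = fopen(fullfile(root, 'runA', commonName), 'wt');
fprintf(fid, 'id 1\nvalue 5800\nid 2\n');
fclose(fid);
fid = fopen(fullfile(root, 'runB', commonName), 'wt');
fprintf(fid, 'nothing to see\n');
fclose(fid);

% List the subfolders of root, skipping '.' and '..'.
d = dir(root);
d = d([d.isdir] & ~ismember({d.name}, {'.', '..'}));

results = struct('file', {}, 'lines', {});
for k = 1:numel(d)
    fname = fullfile(root, d(k).name, commonName);
    if exist(fname, 'file') ~= 2
        continue                 % this folder has no file with the common name
    end
    txt = fileread(fname);                        % whole file as one char vector
    lines = regexp(txt, '\r?\n', 'split');        % one cell entry per line
    hitLines = find(~cellfun(@isempty, strfind(lines, target)));
    if ~isempty(hitLines)
        results(end+1) = struct('file', fname, 'lines', hitLines);
        fprintf('%s: line(s) %s\n', fname, mat2str(hitLines));
    end
end

rmdir(root, 's');                % remove the scratch tree again
```

For real data, replace root with the parent folder and commonName with the actual file name, and drop the lines that create and delete the demo tree.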
Walter Roberson
on 23 Jul 2011
On unix systems:
!grep -Hn 5800 */TheFileName.txt
Fangjun Jiang
on 23 Jul 2011
0 votes
How about this? Open the M-Editor and select the menu Edit->Find Files. Specify the text to find, which files to look in, and which folder/sub-folders to search. The results show all the files, folders, and lines in which the text appears.
Many other IDEs probably have the same capability too.
osminbas
on 25 Jul 2011
0 votes
6 Comments
Fangjun Jiang
on 25 Jul 2011
Does it have anything to do with MATLAB?
Walter Roberson
on 25 Jul 2011
All of the solutions presented are for finding "5800" in the _content_ of the file.
On the other hand, if it is a .doc file then it might not be encoded in ASCII or ISO-8859-1 or UTF-8: Microsoft has a fondness for storing strings in UTF-16BE. For characters that fit within the US-ASCII or ISO-8859-1 character sets, the difference is that each of those characters is represented in UTF-16BE as a pair of bytes, with the first byte of the pair being binary 0 and the second byte of the pair being the normal US-ASCII or ISO-8859-1 byte encoding. You could end up needing to search for the binary uint8([0 '5' 0 '8' 0 '0' 0 '0']), which is usually harder for routines to find. Sometimes the easiest method in such cases is to read the file as uint8 values, use native2unicode() with 'UTF-16BE' to convert the bytes to MATLAB char() data, and search that, as it is not uncommon for routines to think that the first uint8(0) in a string marks the end of the string (a holdover from C's string structure).
There can be additional complications in any structured file such as .DOC files: what we humans look at and see as '5800' in the file might happen to have (for example) '5' <change font face> '8' <end font weight> '0' <end font face> <change font color> '0' <end font color> . Integrating with the OpenOffice freeware may perhaps be the easiest solution to strip the markup out and allow search on plain characters.
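To illustrate the encoding point only, here is a minimal sketch of decoding UTF-16BE bytes before searching. It assumes the file holds nothing but UTF-16BE text with no byte order mark, which a real .doc file does not (it also carries binary structure and markup), and the demo file and its contents are invented for the example.

```matlab
% Demo input: text stored as UTF-16BE, i.e. a zero byte before each ASCII byte.
s = 'abc 5800 xyz';
raw = uint8(reshape([zeros(1, numel(s)); double(s)], 1, []));
demoFile = [tempname '.bin'];        % hypothetical scratch file
fid = fopen(demoFile, 'w');
fwrite(fid, raw, 'uint8');
fclose(fid);

% Read the raw bytes back as a row vector.
fid = fopen(demoFile, 'r');
bytes = fread(fid, Inf, '*uint8')';
fclose(fid);
delete(demoFile);

% Decode the byte pairs into ordinary MATLAB characters, then search.
txt = native2unicode(bytes, 'UTF-16BE');
idx = strfind(txt, '5800');          % character position(s) of the target
fprintf('found at character position %d\n', idx);
```

Searching the decoded char data avoids having to match the interleaved zero bytes directly.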
Walter Roberson
on 25 Jul 2011
Or just go through and save each .doc file as plain text without markup and then search that plain text.
osminbas
on 28 Jul 2011
Walter Roberson
on 28 Jul 2011
The exist(specificname, 'file') test that I put in the code checks whether the file exists in the subdirectory you are processing, and skips the string search if the file is not there.
For searching within a specific file, there are many ways. One way is:
fid = fopen(specificname, 'rt');
lines = textscan(fid,'%[^\n]'); %reads line by line
fclose(fid);
lines = lines{1}; %textscan wraps its result in an outer cell; unwrap it
L = find(~cellfun(@isempty,strfind(lines, '5800')),1,'first'); %first matching line only
if ~isempty(L)
  fprintf('found in %s at line %d\n', specificname, L);
end
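Since the original question asked for the line(s), here is another of the many ways, sketched with fgetl so that every matching line is reported and blank lines still count toward the line number. The demo file and its contents are made up so the example runs on its own; in practice the file name would come from the folder loop.

```matlab
% Create a small demo file (hypothetical contents, including a blank line).
demoFile = [tempname '.txt'];
fid = fopen(demoFile, 'wt');
fprintf(fid, 'alpha\n\nvalue 5800 here\nbeta\n5800\n');
fclose(fid);

target = '5800';
hits = [];                       % line numbers that contain the target
lineNo = 0;
fid = fopen(demoFile, 'rt');
tline = fgetl(fid);              % returns -1 (not char) at end of file
while ischar(tline)
    lineNo = lineNo + 1;
    if ~isempty(strfind(tline, target))
        hits(end+1) = lineNo;
    end
    tline = fgetl(fid);
end
fclose(fid);

for h = hits
    fprintf('found in %s at line %d\n', demoFile, h);
end
delete(demoFile);
```

Reading one line at a time also avoids loading a large file into memory all at once.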
K E
on 21 Sep 2012
I just used this code in another project, so thanks Walter
osminbas
on 28 Jul 2011
0 votes
venkat vasu
on 22 Sep 2012
Edited: Walter Roberson
on 22 Sep 2012
a1=dir;
l=length(a1);
for i1=3:l
files=dir(a1(i1).name);
nfiles = length(files);
for i=3:nfiles
currentfilename = files(i).name;
if currentfilename==5800
%whatever operation
end
end
end
this code surely will help you...
1 Comment
Walter Roberson
on 22 Sep 2012
The task was to search the content, not the file name.
Also, "currentfilename" from files(i).name will be a string, but you attempt to compare the string to the numeric value 5800. That comparison is done element by element against the character codes, so it is not going to have the result you expect.
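A short sketch of the difference, using a made-up name for illustration:

```matlab
name = '5800';                                  % hypothetical file name text

% == compares each character code (53 56 48 48) with the number 5800.
elementwise = (name == 5800);

% Comparing as text, or converting to a number first, behaves as intended.
sameText = strcmp(name, '5800');
sameNumber = (str2double(name) == 5800);

% Testing whether a longer name contains the digits somewhere.
inName = ~isempty(strfind('run5800.txt', '5800'));

disp(elementwise);
disp([sameText, sameNumber, inName]);
```

Because every element of the == result is false, an if statement built on it never executes its body.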