Finding a string in a file

Hello,
I need help with this problem:
How can I find the files with the same name in different folders that have the number "5800" in them? Furthermore, how can I list these files/folders and the line(s) in which "5800" appears?
I would really appreciate the help.
Thanks.

"5800" is in the folder name or in the file name? Is the file an ASCII text file?
"5800" is a target string to be found amongst several files.

CommonFileName = 'x3175.txt';   % or whatever the common name is
folderinfo = dir('*');
folderinfo(~[folderinfo.isdir]) = [];                         % keep only folders
folderinfo(ismember({folderinfo.name}, {'.', '..'})) = [];    % drop . and ..
for FoIdx = 1 : length(folderinfo)
  specificname = fullfile(folderinfo(FoIdx).name, CommonFileName);
  if exist(specificname, 'file')
    % at this point, insert your code to examine specificname
  end
end
I realize this question is old, but in Windows Explorer (this is not strictly a MATLAB answer, either) you can type "content:" in the search box in the upper right-hand corner, and it will search the contents of the files (including .doc files) for the string. For this question, typing "content: 5800" would have returned a list of the files that contain "5800".
Also, note that you can try finding something in Explorer, and then at the bottom it says "Search again in:", and one of the choices is "content". Not the most usable thing (the search dog in previous Windows versions exposed this more quickly), but it is there.
There is also a "find" command you can run from the Command Prompt: cd to the directory in question and type
find "5800" *
And the current directory will be searched. I checked a directory where I had some Word documents and it successfully searched them.
Honglei Chen on 22 Jul 2011
Hi osminbas,
You could write a script to do this. You can use cd to change directories and dir to list all the files in a folder (what lists only MATLAB-related files, such as .m and .mat files, so it would skip .txt or .doc files). Then, for each file, use textscan to read each line and strfind to find '5800'. Finally, write the results to either the screen or a file.
HTH
Honglei
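A minimal sketch of the script approach described above, written as a function so it can be reused. The name findStringInSubfolders and its arguments are hypothetical; it assumes plain-text files (not .doc) and looks only one folder level below the top folder. It uses dir with full paths instead of cd, and fgetl instead of textscan so that blank lines still count toward the reported line numbers. The matches go to a tab-separated text file, which Excel can open directly.

```matlab
function report = findStringInSubfolders(topFolder, fileName, target, outFile)
% FINDSTRINGINSUBFOLDERS  Hypothetical helper, not a MATLAB built-in.
% Looks in every immediate subfolder of TOPFOLDER for a file called
% FILENAME and records each line of that file that contains TARGET.
% Matches are returned as a struct array and also written to OUTFILE.
report = struct('folder', {}, 'line', {}, 'text', {});
entries = dir(topFolder);
entries = entries([entries.isdir]);                          % folders only
entries = entries(~ismember({entries.name}, {'.', '..'}));   % drop . and ..
for k = 1:numel(entries)
    candidate = fullfile(topFolder, entries(k).name, fileName);
    if exist(candidate, 'file') ~= 2          % 2 means "is a file"
        continue;                             % file not in this folder
    end
    fid = fopen(candidate, 'rt');
    lineNo = 0;
    tline = fgetl(fid);
    while ischar(tline)                       % fgetl returns -1 at end of file
        lineNo = lineNo + 1;
        if ~isempty(strfind(tline, target))
            report(end+1) = struct('folder', entries(k).name, ...
                                   'line', lineNo, 'text', tline); %#ok<AGROW>
        end
        tline = fgetl(fid);
    end
    fclose(fid);
end
% One tab-separated row per match: folder, line number, line text
fid = fopen(outFile, 'wt');
for r = 1:numel(report)
    fprintf(fid, '%s\t%d\t%s\n', report(r).folder, report(r).line, report(r).text);
end
fclose(fid);
end
```

For example, findStringInSubfolders(pwd, 'x3175.txt', '5800', 'matches.txt') would check every subfolder of the current folder for x3175.txt (the placeholder name used earlier in the thread). Folders without that file are skipped, so only folders that contain it can appear in matches.txt.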
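On Unix systems (the ! at the MATLAB prompt passes the rest of the line to the operating system; grep's -H option prints the file name and -n the line number of each matching line):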
!grep -Hn 5800 */TheFileName.txt
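To get grep's results back into MATLAB rather than just printing them, the same command can be run through system and its "file:line:text" output split apart. A minimal sketch for Unix only; grepLines is a hypothetical name, and the pattern is passed to the shell unquoted, so this assumes a simple pattern such as 5800.

```matlab
function matches = grepLines(pattern, fileGlob)
% GREPLINES  Hypothetical wrapper (Unix only) around "grep -Hn".
% Returns an N-by-3 cell array: file name, line number, line text.
matches = cell(0, 3);
[status, out] = system(sprintf('grep -Hn %s %s', pattern, fileGlob));
if status ~= 0        % grep exits with 1 when nothing matched, 2 on error
    return;
end
rows = regexp(strtrim(out), '\n', 'split');   % one "file:line:text" per row
for k = 1:numel(rows)
    tok = regexp(rows{k}, '^(.*?):(\d+):(.*)$', 'tokens', 'once');
    if ~isempty(tok)
        matches(end+1, :) = {tok{1}, str2double(tok{2}), tok{3}}; %#ok<AGROW>
    end
end
end
```

Run from the top folder, grepLines('5800', '*/TheFileName.txt') returns one row per matching line, with the subfolder visible in the file name column.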
Fangjun Jiang on 23 Jul 2011
How about this? Open the MATLAB Editor and select Edit -> Find Files. Specify the text to find, which files to search, and which folder (and subfolders) to search in; the results show every matching file, its folder, and the lines in which the text appears.
Many other IDEs probably have the same capability too.
osminbas on 25 Jul 2011
Thank you all. I realized that I misworded my question. I am trying to find the string "5800" in the file itself (it is a .doc file), not in the name of the file. Again, I appreciate your help.

Does it have anything to do with MATLAB?
All of the solutions presented are for finding "5800" in the _content_ of the file.
On the other hand, if it is a .doc file, it might not be encoded in ASCII, ISO-8859-1, or UTF-8: Microsoft has a fondness for storing strings in UTF-16BE. For characters that fit within the US-ASCII or ISO-8859-1 character sets, the difference is that each character is represented in UTF-16BE as a pair of bytes, the first byte of the pair being binary 0 and the second being the normal US-ASCII or ISO-8859-1 byte. You could end up needing to search for the binary uint8([0 '5' 0 '8' 0 '0' 0 '0']), which is usually harder for routines to find. Sometimes the easiest method in such cases is to open the file, read uint8 values, and then use native2unicode() with the UTF-16BE encoding to convert the bytes to MATLAB char data and search that, as it is not uncommon for routines to treat the first uint8(0) in a string as the end of the string (a holdover from C's string representation).
There can be additional complications in any structured file such as a .doc file: what we humans see as '5800' in the file might actually be stored as (for example) '5' <change font face> '8' <end font weight> '0' <end font face> <change font color> '0' <end font color>. Integrating with the free OpenOffice suite may be the easiest way to strip out the markup and search the plain characters.
Or just go through and save each .doc file as plain text without markup and then search that plain text.
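A minimal sketch of the byte-level search for the UTF-16 case described above. The helper name findUtf16Digits is hypothetical, and it assumes the text being searched for is plain ASCII. It reads the file as raw uint8 values and looks for the characters with a zero byte between each pair ('5' 0 '8' 0 '0' 0 '0'), which occurs inside both the big-endian and the little-endian layout. Because MATLAB char arrays can hold char(0), strfind is not stopped by the zero bytes. It will not find digits that Word has split up with formatting markup.

```matlab
function hits = findUtf16Digits(filename, target)
% FINDUTF16DIGITS  Hypothetical helper, not a MATLAB built-in.
% Returns the 1-based byte offsets in FILENAME where the ASCII text TARGET
% occurs stored as 16-bit characters (a zero byte between each character).
fid = fopen(filename, 'r');
if fid < 0
    error('findUtf16Digits:open', 'Cannot open %s', filename);
end
bytes = fread(fid, Inf, '*uint8')';   % raw bytes as a uint8 row vector
fclose(fid);
% Interleave zero bytes: '5800' becomes '5' 0 '8' 0 '0' 0 '0'.
% Leaving off the outer zero byte lets the pattern match inside both the
% big-endian (0 '5' 0 '8' ...) and little-endian ('5' 0 '8' 0 ...) layouts.
pattern = zeros(1, 2*numel(target) - 1);
pattern(1:2:end) = double(target);
% MATLAB char arrays may contain char(0), so strfind sees every byte.
hits = strfind(char(bytes), char(pattern));
end
```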
Thank you all. Especially, thank you, Walter. In your code, when I define folderinfo, it is the "big" folder that has all the subfolders in it. Some of these subfolders have the file I am looking for and some of them don't. How can I get only the ones that contain that file? And furthermore, how can I list these folder names in an Excel sheet or a text file (whichever is easier)?
Also, how do I search for the string "5800" inside the document? I know that you talked about the difficulties of reading inside .doc files but maybe you can answer my question assuming it is a txt file.
I really appreciate your help.
The line "if exist(specificname, 'file')" that I put in the code checks whether the file exists in the subdirectory being processed, and skips the string search if the file is not there.
For searching within a specific file, there are many ways. One way is:
fid = fopen(specificname, 'rt');
lineNo = 0;
tline = fgetl(fid);              % reads one line at a time
while ischar(tline)              % fgetl returns -1 at end of file
  lineNo = lineNo + 1;
  if ~isempty(strfind(tline, '5800'))
    fprintf('found in %s at line %d\n', specificname, lineNo);
  end
  tline = fgetl(fid);
end
fclose(fid);
K E on 21 Sep 2012
I just used this code in another project, so thanks, Walter.

osminbas on 28 Jul 2011
Thank you all. Especially, thank you, Walter. In your code, when I define folderinfo, it is the "big" folder that has all the subfolders in it. Some of these subfolders have the file I am looking for and some of them don't. How can I get only the ones that contain that file? And furthermore, how can I list these folder names in an Excel sheet or a text file (whichever is easier)?
Also, how do I search for the string "5800" inside the document? I know that you talked about the difficulties of reading inside .doc files but maybe you can answer my question assuming it is a txt file.
I really appreciate your help.
venkat vasu on 22 Sep 2012
Edited: Walter Roberson on 22 Sep 2012
a1=dir;
l=length(a1);
for i1=3:l
files=dir(a1(i1).name);
nfiles = length(files);
for i=3:nfiles
currentfilename = files(i).name;
if currentfilename==5800
%whatever operation
end
end
end
this code surely will help you...

The task was to search the content, not the file name.
Also, currentfilename (from files(i).name) will be a character vector, but you compare it to the numeric value 5800. With ==, MATLAB compares each character code to 5800, so the test is always false; strcmp (exact match) or strfind (substring) is what is needed for text.
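A short illustration of the point about comparing text with a number; the file name here is made up.

```matlab
currentfilename = 'report_5800.txt';          % made-up example name
currentfilename == 5800                       % compares each character code with 5800: all false
strcmp(currentfilename, '5800')               % whole-string comparison: false here
~isempty(strfind(currentfilename, '5800'))    % substring search: true
```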

Asked on 22 Jul 2011
