How to read text files from sub-sub folders

Hi,
I want to read text files from sub-sub folders:
Architecture:
Mainfolder
    Tool1
        sub-subFolder1
        sub-subFolder2
        .....
        .....
    Tool2
        sub-subFolder1
        sub-subFolder2
        .....
        .....
    ......
1. Read the text files under each sub-folder (i.e., Tool1, Tool2, etc.)
2. Output one file per sub-folder: Tool1.xlsx, Tool2.xlsx
I use the following code, but I cannot access the sub-sub folders.
% - Define output header.
header = {'RainFallID', 'IINT', 'Rain Result', 'Start Time', 'Param1.pipe', ...
'10 Un Para2.pipe', 'Verti 2 mixing.dis', 'Rate.alarm times'} ;
Mainfolder='Mainfolder';
outLocatorFolder='OutputFolder';
nHeaderCols = numel( header ) ;
% - Build listing sub-folders of main folder.
% D_main = dir( 'D:\Mekala_Backupdata\Matlab2010\Mainfolder' ) ;
D_main = dir(Mainfolder ) ;
D_main = D_main(3:end) ; % Eliminate "." and ".."
% - Iterate through sub-folders and process.
for dId = 1 : numel( D_main )
    % - Build listing files of sub-folder.
    D_sub = dir( fullfile(Mainfolder, D_main(dId).name, '*.txt' )) ;
    nFiles = numel( D_sub ) ;
    keyboard
    % - Prealloc output cell array.
    data = cell( nFiles, nHeaderCols ) ;
    % - Iterate through files and process.
    for fId = 1 : nFiles
        % - Read input text file.
        inLocator = fullfile(Mainfolder, D_main(dId).name, D_sub(fId).name ) ;
        content = fileread( inLocator ) ;
        % - Extract relevant data.
        rainfallId = str2double( regexp( content, '(?<=RainFallID\s+:\s*)\d+', 'match', 'once' )) ;
        iint = regexp( content, '(?<=IINT\s+:\s*)\S+', 'match', 'once' ) ;
        rainResult = regexp( content, '(?<=Rain Result\s+:\s*)\S+', 'match', 'once' ) ;
        startTime = strtrim( regexp( content, '(?<=Start Time\s+:\s*).*?(?= -)', 'match', 'once' )) ;
        param1Pipe = str2double( regexp( content, '(?<=Param1.pipe\s+[\d\.]+\s+\w+\s+)[\d\.]+', 'match', 'once' )) ;
        tenUn = str2double( regexp( content, '(?<=10 Un Para2.pipe\s+[\d\.]+\s+\w+\s+)[\d\.]+', 'match', 'once' )) ;
        verti2 = regexp( content, '(?<=Verti 2 mixing.dis\s+\S+\s%\s+)\S+', 'match', 'once' ) ;
        rateAlarm = strtrim( regexp( content, '(?<=Rate.alarm times\s+\S+\s+)[^\r\n]+', 'match', 'once' )) ;
        % - Populate data cell array.
        data(fId,:) = {rainfallId, iint, rainResult, startTime, ...
            param1Pipe, tenUn, verti2, rateAlarm} ;
    end
    % - Output to XLSX.
    % outLocator = fullfile( 'D:\Mekala_Backupdata\Matlab2010\OutputFolder', sprintf( '%s.xlsx', D_main(dId).name )) ;
    outLocator = fullfile(outLocatorFolder, sprintf( '%s.xlsx', D_main(dId).name )) ;
    fprintf( 'Output XLSX: %s ..\n', outLocator ) ;
    xlswrite( outLocator, [header; data] ) ;
end
Many thanks in advance.

Accepted Answer

Image Analyst on 4 Oct 2017
You need to use ** in dir() instead of *, so that dir() also searches the nested sub-folders (this needs R2016b or later). See attached demo.
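A minimal sketch of a recursive wildcard search with dir(), assuming R2016b or later. The temporary folder tree and the file names a.txt and b.txt are made up for illustration so that the snippet runs anywhere; with real data, only the dir() call and the loop are needed.

```matlab
% Build a small throwaway tree so the sketch runs anywhere (hypothetical names).
mainFolder = fullfile(tempname, 'Mainfolder');
mkdir(fullfile(mainFolder, 'Tool1', 'sub-subFolder1'));
mkdir(fullfile(mainFolder, 'Tool1', 'sub-subFolder2'));
fclose(fopen(fullfile(mainFolder, 'Tool1', 'sub-subFolder1', 'a.txt'), 'w'));
fclose(fopen(fullfile(mainFolder, 'Tool1', 'sub-subFolder2', 'b.txt'), 'w'));

% '**' matches any number of nested folders below Tool1.
D_txt = dir(fullfile(mainFolder, 'Tool1', '**', '*.txt'));

for fId = 1:numel(D_txt)
    % The folder field holds the full path of the folder containing each hit.
    inLocator = fullfile(D_txt(fId).folder, D_txt(fId).name);
    fprintf('%s\n', inLocator);
end
```

Running one such search per tool folder would give the list of text files for Tool1, Tool2, and so on, without an explicit loop over the sub-sub-folders.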

More Answers (1)

Cedric on 4 Oct 2017
Edited: Cedric on 4 Oct 2017
Look at the EDIT 4:09pm block in the thread:
update the pseudo-code
    Iterate through sub folders of 'Mainfolder'
        Iterate through files of sub folder
            Extract data from file and store in data array
        Export data array to relevant Excel file
specifically for your new problem, and it should show you how to restructure and update the former code. First, remove all the code that is not necessary for crawling through the folders and files, and run it to check that it crawls as desired.
Big hint: you should be able to add a level of FOR loop. Define D_sub at a strategic place:
    for dmId = 1 : numel( D_main )
        D_sub = dir( fullfile( Mainfolder, D_main(dmId).name )) ;
        D_sub = D_sub(3:end) ; % Eliminate "." and ".."
iterate through its elements (sub-sub-folders):
        for dsId = 1 : numel( D_sub )
            D_subsub = dir( fullfile( Mainfolder, D_main(dmId).name, D_sub(dsId).name, '*.txt' )) ;
            nFiles = numel( D_subsub ) ;
and finally iterate through D_subsub elements (the text files):
            for fId = 1 : nFiles
                inLocator = fullfile( Mainfolder, D_main(dmId).name, D_sub(dsId).name, D_subsub(fId).name ) ;
                content = fileread( inLocator ) ;
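Put together, the three fragments above could look like the following crawl-only sketch. The temporary tree, the sample file names, and the nVisited counter are assumptions added so that the snippet runs on its own. It also filters "." and ".." by name rather than with D(3:end); that is a substitution on my part, since the position of those two entries in the listing is not guaranteed.

```matlab
% Throwaway tree mirroring the layout in the question (hypothetical names).
Mainfolder = fullfile(tempname, 'Mainfolder');
mkdir(fullfile(Mainfolder, 'Tool1', 'sub-subFolder1'));
mkdir(fullfile(Mainfolder, 'Tool2', 'sub-subFolder1'));
samples = {fullfile(Mainfolder, 'Tool1', 'sub-subFolder1', 'a.txt'), ...
           fullfile(Mainfolder, 'Tool2', 'sub-subFolder1', 'b.txt')};
for k = 1:numel(samples)
    fid = fopen(samples{k}, 'w');
    fprintf(fid, 'sample');
    fclose(fid);
end

nVisited = 0;   % counts the text files reached, to check the crawl

% Level 1: sub-folders of the main folder (Tool1, Tool2, ...).
D_main = dir(Mainfolder);
D_main = D_main([D_main.isdir] & ~ismember({D_main.name}, {'.', '..'}));
for dmId = 1:numel(D_main)
    % Level 2: sub-sub-folders of the current tool folder.
    D_sub = dir(fullfile(Mainfolder, D_main(dmId).name));
    D_sub = D_sub([D_sub.isdir] & ~ismember({D_sub.name}, {'.', '..'}));
    for dsId = 1:numel(D_sub)
        % Level 3: text files inside the current sub-sub-folder.
        D_subsub = dir(fullfile(Mainfolder, D_main(dmId).name, D_sub(dsId).name, '*.txt'));
        nFiles = numel(D_subsub);
        for fId = 1:nFiles
            inLocator = fullfile(Mainfolder, D_main(dmId).name, D_sub(dsId).name, D_subsub(fId).name);
            content = fileread(inLocator);
            fprintf('%s (%d characters)\n', inLocator, numel(content));
            nVisited = nVisited + 1;
        end
    end
end
```

Once the printed list shows every expected file, the extraction and export code can be added back inside the loops.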
Note that if you have a recent version of MATLAB (R2016b or later, where DIR returns a folder field), you can replace most calls to FULLFILE by the value of the folder field of the relevant output of a former DIR, e.g.:
inLocator = fullfile( Mainfolder, D_main(dmId).name, D_sub(dsId).name, D_subsub(fId).name ) ;
could be replaced by:
inLocator = fullfile( D_subsub(fId).folder, D_subsub(fId).name ) ;
Finally, note that if you have a lot of different situations with varying depths of nested folders, a better approach would be to build a recursive crawler, but this is a bit more complex.
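One way to handle varying depths is sketched below. It is not a recursive function but an iterative equivalent that keeps an explicit list of folders still to visit, which lets it run as a plain script; a recursive function would visit the same folders. The temporary tree and the variable names pending and txtFiles are assumptions for illustration.

```matlab
% Throwaway tree with uneven depth (hypothetical names).
rootFolder = fullfile(tempname, 'Mainfolder');
mkdir(fullfile(rootFolder, 'Tool1', 'sub-subFolder1', 'deeper'));
mkdir(fullfile(rootFolder, 'Tool2'));
fclose(fopen(fullfile(rootFolder, 'Tool1', 'sub-subFolder1', 'deeper', 'a.txt'), 'w'));
fclose(fopen(fullfile(rootFolder, 'Tool2', 'b.txt'), 'w'));

pending  = {rootFolder};   % folders still to visit
txtFiles = {};             % full paths of the text files found so far

while ~isempty(pending)
    % Take the most recently queued folder.
    current = pending{end};
    pending(end) = [];

    % Collect the text files that sit directly in the current folder.
    D_txt = dir(fullfile(current, '*.txt'));
    for fId = 1:numel(D_txt)
        txtFiles{end+1} = fullfile(current, D_txt(fId).name); %#ok<SAGROW>
    end

    % Queue every real sub-folder, skipping "." and ".." by name.
    D_all = dir(current);
    isSub = [D_all.isdir] & ~ismember({D_all.name}, {'.', '..'});
    for dId = find(isSub)
        pending{end+1} = fullfile(current, D_all(dId).name); %#ok<SAGROW>
    end
end

fprintf('%s\n', txtFiles{:});
```

The order in which files are found depends on the order of the folder listing, so sort txtFiles if a stable order matters.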
  4 Comments
Cedric on 4 Oct 2017
Edited: Cedric on 4 Oct 2017
You should index D_main with dmId when you generate the output locator. When I wrote the hints above with an additional level of loop, I changed the name of the loop index variables to make them more consistent: dmId for "dir main ID" and dsId for "dir sub ID".
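A sketch of where the output locator could go once the extra loop level is in place. It makes several assumptions: the header is shortened to two columns, the sample files and temporary folders are made up, the data array is grown row by row because the file count now varies per sub-sub-folder, and writecell (R2019a or later) stands in for the xlswrite call used in the question.

```matlab
header = {'RainFallID', 'IINT'};   % shortened header for the sketch

% Throwaway input and output folders (hypothetical names and contents).
Mainfolder = fullfile(tempname, 'Mainfolder');
outLocatorFolder = fullfile(tempname, 'OutputFolder');
mkdir(fullfile(Mainfolder, 'Tool1', 'sub-subFolder1'));
mkdir(fullfile(Mainfolder, 'Tool1', 'sub-subFolder2'));
mkdir(outLocatorFolder);
fid = fopen(fullfile(Mainfolder, 'Tool1', 'sub-subFolder1', 'a.txt'), 'w');
fprintf(fid, 'RainFallID : 12\nIINT : ab\n');
fclose(fid);
fid = fopen(fullfile(Mainfolder, 'Tool1', 'sub-subFolder2', 'b.txt'), 'w');
fprintf(fid, 'RainFallID : 34\nIINT : cd\n');
fclose(fid);

D_main = dir(Mainfolder);
D_main = D_main([D_main.isdir] & ~ismember({D_main.name}, {'.', '..'}));
for dmId = 1:numel(D_main)
    % One table per tool, filled across all of its sub-sub-folders.
    data = cell(0, numel(header));
    D_sub = dir(fullfile(Mainfolder, D_main(dmId).name));
    D_sub = D_sub([D_sub.isdir] & ~ismember({D_sub.name}, {'.', '..'}));
    for dsId = 1:numel(D_sub)
        D_subsub = dir(fullfile(Mainfolder, D_main(dmId).name, D_sub(dsId).name, '*.txt'));
        for fId = 1:numel(D_subsub)
            inLocator = fullfile(Mainfolder, D_main(dmId).name, D_sub(dsId).name, D_subsub(fId).name);
            content = fileread(inLocator);
            rainfallId = str2double(regexp(content, '(?<=RainFallID\s+:\s*)\d+', 'match', 'once'));
            iint = regexp(content, '(?<=IINT\s+:\s*)\S+', 'match', 'once');
            data(end+1, :) = {rainfallId, iint}; %#ok<SAGROW>
        end
    end
    % The output file is named after the tool folder, hence D_main(dmId).
    outLocator = fullfile(outLocatorFolder, sprintf('%s.xlsx', D_main(dmId).name));
    fprintf('Output XLSX: %s ..\n', outLocator);
    writecell([header; data], outLocator);
end
```

On releases older than R2019a, the last call can be swapped back to xlswrite( outLocator, [header; data] ) as in the question.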
