Selective columns from multiple text folders
1 view (last 30 days)
Show older comments
Afshin sadeghi
on 10 Mar 2021
Commented: Afshin sadeghi
on 15 Mar 2021
Hey guys,
I would really appreciate your help here.
I have a dataset containing 400 folders whit one text file inside. the text file has 13 columns and I want one of them! at the end, I want a text file with 400 columns. so far, I have it for one folder (by help of importer!) but I dont know how to implement the loop. folders and the text files (same name) have this order:
m0000000
m0000001
m0000002
...
m0000399
Here is the code so far ...
filename = 'D:\work\1ST EXP\4.2\m0000000\m0000000.TXT';
delimiter = '\t';
formatSpec = '%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'ReturnOnError', false);
fclose(fileID);
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
raw(1:length(dataArray{col}),col) = dataArray{col};
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
rawData = dataArray{1};
for row=1:size(rawData, 1);
regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
try
result = regexp(rawData{row}, regexstr, 'names');
numbers = result.numbers;
invalidThousandsSeparator = false;
if any(numbers==',');
thousandsRegExp = '^\d+?(\,\d{3})*\.{0,1}\d*$';
if isempty(regexp(numbers, thousandsRegExp, 'once'));
numbers = NaN;
invalidThousandsSeparator = true;
end
end
if ~invalidThousandsSeparator;
numbers = textscan(strrep(numbers, ',', ''), '%f');
numericData(row, 1) = numbers{1};
raw{row, 1} = numbers{1};
end
catch me
end
end
%% Replace non-numeric cells with NaN
R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),raw); % Find non-numeric cells
raw(R) = {NaN}; % Replace non-numeric cells
%% Allocate imported array to column variable names
Rho = cell2mat(raw(:, 1));
%clearvars filename delimiter formatSpec fileID dataArray ans raw col numericData rawData row regexstr result numbers invalidThousandsSeparator thousandsRegExp me R;
diary merged.txt
Rho
2 Comments
ANKUR KUMAR
on 10 Mar 2021
Could you please attach one of the text files, it would help us to help you.
Accepted Answer
Stephen23
on 11 Mar 2021
Edited: Stephen23
on 11 Mar 2021
Here is one simple and efficient approach (untested, but should get you started):
P = 'D:\work\1ST EXP\4.2';
V = 0:399;
N = numel(V);
C = cell(1,N);
for k = 1:N
F = sprintf('m%07d',V(k));
F = fullfile(P,F,sprintf('%s.txt',F));
T = readtable(F,'VariableNamingRule','preserve');
C{k} = T.Rho;
end
M = [C{:}] % only if all files have the same number of rows.
You can then save matrix M, e.g.:
writematrix(M,'myfile.txt')
Just out of interests sake, why do you need that complex string parsing and thousand's separator handling? The sample file does not include any thousands separators that I can see:
T = readtable('m0000000.txt','VariableNamingRule','preserve')
6 Comments
More Answers (1)
ANKUR KUMAR
on 10 Mar 2021
Edited: ANKUR KUMAR
on 10 Mar 2021
Since you have not attached a text file, I just put a sample text file having multiple rows in multiple folders, and use the below code to have a matrix containing the first columns from all files.
files = dir('D:\matlab_ask')
dirFlags = [files.isdir];
subFolders = files(dirFlags);
for i =3:length(subFolders)
cd(subFolders(i).name)
filename = 'test.txt';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, '%f%*s%[^\n\r]', 'Delimiter', ',');
merged_matrix(:,i-2) = [dataArray{1:end-1}];
cd ..
end
merged_matrix
2 Comments
Stephen23
on 11 Mar 2021
Edited: Stephen23
on 11 Mar 2021
Note that assuming that the first two results from DIR should be discarded is buggy: consider what would happen if the user quite reasonably decides to add a wildcard and fileextension to the DIR match string. Consider also what happens to names of files/folders that might be in the same folder and whos names start with '+' or '.'. This can be easily avoided by writing more robust code:
Note that calling CD like that should be avoided. It is more efficient and more robust to use absolute/relative filenames to access data files. The MATLAB documentation states "Avoid programmatic use of cd, addpath, and rmpath, when possible. Changing the MATLAB path during run time results in code recompilation."
ANKUR KUMAR
on 12 Mar 2021
I am still at learning stage. Thanks for your suggestions. I appreciate it.
See Also
Categories
Find more on Characters and Strings in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!