Line numbers when files are being saved in cell array

I previously asked a question here on MathWorks about how to remove the line numbers from a file. In that case, I was creating smaller files from a larger one and removing the line number on each row. My question today is how I might adapt that code when I am creating a cell array rather than new text files. I still have the code I used previously, but the command that removes the line numbers is built into an fprintf statement. I now want to apply the same approach to a cell array. The code I had been using was:
str = fgetl(fid); % read the file line by line removing line breaks
[row,~,~,idx] = sscanf(str,'%f');
fprintf(f2d,'%s\n',str(idx+1:end)); %ignore the line number of each row
Obviously this code was more integrated into the original code. My problem arises when trying to insert the '%s\n',str(idx+1:end) part into the generation of the cell array. Is it possible to continue using textscan, given that fgetl and sscanf are already reading the data, which makes textscan superfluous? Since there is no longer a command that creates text files, I am unsure where '%s\n',str(idx+1:end) can go. textscan seems to be a much faster way of reading the data than fgetl, as it does not need to go through the data line by line. Is there a way to combine the line number removal with textscan, or must I develop a new way for the code to remove the line numbers?
filePattern = fullfile(myFolder, '*.asc'); % Call all files with '.asc' from the chosen folder
Files = dir(filePattern); % list folder contents
finishCell = cell(length(Files));
for K = 1 : length(Files) % for all files files in the folder
baseFileName = Files(K).name;
FileName = fullfile(myFolder, baseFileName);
fid = fopen(FileName); % open the file from chosen folder
str = fgetl(fid); % read the file line by line removing line breaks
[row,~,~,idx] = sscanf(str,'%f'); % ignore the line number of each row
Cell = textscan( fid, '%f', 'delimiter', ';'); % scanning data from files
fclose(fid); % close file from chosen folder
data = cell2mat(Cell); % convert the cell data to matrix
N = 1024; % Number of numbers per row
Finish0 = reshape(data, N, [])'; % reshape the data into the correct format
finishCell{K} = Finish0;
end
Essentially, is it possible to remove line numbers from files and then save them in a cell array without making new text files?
  4 Comments
Stephen23 on 12 Jun 2017
Edited: Stephen23 on 12 Jun 2017
@Aaron Smith: I think you should approach this as a new task, rather than trying to adapt the code that I gave you earlier. Your original file was, if my memory serves me correctly, a text file of about 200 megabytes, with 1025*1024 values in each row. Processing all 200 MB by importing it as numeric data proved to be troublesome, which is why I showed you how to split it into smaller files by reading and writing each line as text: not particularly fast, but it avoided all of the "out of memory" errors.
Now you have smaller files that could conceivably be handled by some numeric importation operator, and depending on what your goals are this might be a better solution for the smaller files.
Summary: do not adapt the old code (it had a very different purpose). Trying the standard MATLAB numeric importation functions would be worthwhile. Or using tall arrays.
Could you please:
  1. upload two cut-down (not full size) files in a new comment.
  2. describe how you want the data to be once it is in MATLAB memory: do you want it as numeric data, or kept as string data?
Aaron Smith on 13 Jun 2017
I have attached two cut-down files, basically a few full rows of data from two files. I need the data to be numeric, saved as matrices inside a cell array so that I do not need to create new text files. This will help with saving memory and with not cluttering my desktop with multiple similar versions of the same data. The code I added in the original question is doing what I need it to do; I just need there to be one less value on each row.


Answers (1)

Stephen23 on 13 Jun 2017
Edited: Stephen23 on 26 Apr 2021
This simple code works on the cut-down files; you can try it on the full-size files and see what happens. I used dlmread to read the entire numeric array (this is fast and efficient), and then simply ignored the first column using indexing:
P = 'absolute/relative path to where the files are saved';
S = dir(fullfile(P,'cut down *.txt'));
S = natsortfiles(S); % optional, see below.
for k = 1:numel(S)
M = dlmread(fullfile(P,S(k).name),';');
S(k).data = M(:,2:end);
end
Giving:
size(S)
ans = 1×2
2 1
size(S(1).data)
ans = 1×2
8 1025
size(S(2).data)
ans = 1×2
15 1025
S(1).data
ans = 8×1025
765.44 0 0 1148.2 765.44 382.72 382.72 1148.2 382.72 1148.2 1913.6 382.72 765.44 382.72 382.72 1148.2 382.72 0 0 382.72 1148.2 0 765.44 382.72 765.44 382.72 382.72 765.44 382.72 382.72 0 382.72 382.72 3827.2 0 382.72 382.72 765.44 382.72 765.44 765.44 765.44 382.72 0 765.44 765.44 382.72 765.44 382.72 1148.2 382.72 765.44 765.44 382.72 1148.2 382.72 0 1530.9 382.72 382.72 382.72 0 382.72 382.72 765.44 382.72 382.72 0 765.44 382.72 382.72 382.72 0 382.72 765.44 382.72 0 382.72 382.72 382.72 1148.2 0 0 765.44 765.44 1148.2 765.44 1148.2 382.72 382.72 0 0 0 382.72 382.72 0 382.72 0 382.72 382.72 765.44 382.72 765.44 382.72 382.72 382.72 382.72 382.72 0 22038 765.44 765.44 382.72 382.72 382.72 765.44 382.72 382.72 0 765.44 765.44 0 0 382.72 765.44 0 1148.2 382.72 382.72 382.72 382.72 0 382.72 0 1148.2 0 1148.2 382.72 382.72 765.44 382.72 382.72 382.72 0 765.44 1530.9 382.72 765.44 0 382.72 382.72 382.72 765.44 765.44 382.72 382.72 382.72 0 1148.2 765.44 382.72 382.72 765.44 0 382.72 382.72 765.44 382.72 765.44 765.44 382.72 382.72 0 765.44 382.72 765.44 765.44 382.72 765.44 765.44 0 1530.9 0 382.72 765.44 765.44 382.72 382.72 382.72 382.72 382.72 382.72 0 382.72 765.44 0 0 765.44 382.72 765.44 765.44 382.72 382.72 765.44 382.72 382.72 382.72 765.44 382.72 0 0 0 382.72 382.72 0 382.72 382.72 382.72 765.44 382.72 382.72 765.44 382.72 382.72 1148.2 382.72 765.44 765.44 765.44 765.44 765.44 382.72 1148.2 765.44 382.72 765.44 1148.2 382.72 382.72 765.44
Note that I used my File Exchange (FEX) submission natsortfiles to sort the filenames into numeric order. natsortfiles can be downloaded from the MATLAB File Exchange.
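Since the question asks for matrices stored in a cell array, the same idea can be sketched with a cell array as the container. This is only a sketch under assumptions: it swaps dlmread for readmatrix (available from R2019a), and the folder name, file names and file contents below are hypothetical stand-ins created just so the example runs on its own.

```matlab
% Hypothetical demo data: two small ';'-delimited files whose first
% value on each row is the line number to be discarded.
P = fullfile(tempdir, 'linenum_demo');          % assumed demo folder
if ~exist(P, 'dir'), mkdir(P); end
fid = fopen(fullfile(P, 'demo 1.txt'), 'w');
fprintf(fid, '1;10;20;30\n2;40;50;60\n');
fclose(fid);
fid = fopen(fullfile(P, 'demo 2.txt'), 'w');
fprintf(fid, '1;70;80;90\n2;11;12;13\n');
fclose(fid);

% Read each file as one numeric matrix and drop the first column.
S = dir(fullfile(P, 'demo *.txt'));
C = cell(numel(S), 1);                          % one cell per file
for k = 1:numel(S)
    M = readmatrix(fullfile(P, S(k).name), ...
        'FileType', 'text', 'Delimiter', ';', 'NumHeaderLines', 0);
    C{k} = M(:, 2:end);                         % ignore the line numbers
end
```

No new text files are written for the real data in this approach; each C{k} holds the numeric matrix of one file. If the rows of the real files end with a trailing delimiter, an extra column may appear, so it is worth checking size(C{k}) on one file first.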
  20 Comments
Aaron Smith on 29 Jun 2017
File = uigetfile('C:\Users\c13459232\Documents\MATLAB\Fixing this\Bulk');
fid = fopen(File);
N = 1025;
formatspec = '%f';
k= 0;
C = cell(size(k));
while ~feof(fid)
k = k+1;
vec = textscan(fid,formatSpec,N,'Delimiter',';');
C{k}(end+1,:) = vec(2:end);
end
fclose(fid);
Error using feof
Invalid file identifier. Use fopen to generate a valid file identifier.
I tried reading the file in blocks with textscan and combining that with the code you wrote, in order to create the cell array while removing the line numbers.
I am not sure how I could remove the line numbers from the data when using tall arrays, unless I remove them beforehand and then save the data as tall arrays. That still runs into the problem of running out of memory before the data can be fully processed and formatted.
Stephen23 on 29 Jun 2017
Edited: Stephen23 on 26 Apr 2021
@Aaron Smith: the error that you show above has exactly the same cause as all the other times that you have shown this error. You need to pass the full path (folder and filename) to any function that you use to open or read that file.
(Hint one: uigetfile has three outputs. Hint two: read the documentation for the functions you are using.)
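As a rough illustration of both hints, the sketch below builds the full name with fullfile before calling fopen, and then reads the file in blocks with textscan. It rests on assumptions rather than on the real data: the demo file, its name, the three values per row and the block size are all hypothetical, and the `%*f` field specifier is used to skip the leading line number instead of indexing it away afterwards.

```matlab
% In interactive use the two names would come from the dialog:
%   [fileName, filePath] = uigetfile('*.txt');
% A small demo file (hypothetical) stands in here so the sketch runs
% without opening a dialog.
filePath = tempdir;
fileName = 'blockread_demo.txt';
fid = fopen(fullfile(filePath, fileName), 'w');
fprintf(fid, '1;10;20;30\n2;40;50;60\n3;70;80;90\n');
fclose(fid);

fullName = fullfile(filePath, fileName);    % folder + name, not the name alone
fid = fopen(fullName, 'r');
assert(fid ~= -1, 'Could not open %s', fullName);

nVals = 3;                                  % assumed values per row after the line number
fmt = ['%*f' repmat('%f', 1, nVals)];       % %*f reads and discards the line number
blockRows = 2;                              % assumed number of rows per block
C = {};
while ~feof(fid)
    blk = textscan(fid, fmt, blockRows, 'Delimiter', ';', 'CollectOutput', true);
    if isempty(blk{1}), break; end          % nothing left to read
    C{end+1, 1} = blk{1};                   % one numeric matrix per block
end
fclose(fid);
```

Each cell of C then holds one block of rows without the line-number column. The format string has to match the real row layout exactly, so rows that end in a trailing delimiter or have a different number of values would need an adjusted fmt.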
