Editing a Tall table and writing it into a csv file

Hi,
I have a really large csv file (about 6 million rows and 30 columns). I want to edit specific columns of this file and save the changes.
I tried creating a tall table from a datastore, extracting and manipulating the relevant columns, and then assigning them back into the table. However, when I attempt to write the new tall table, I get the following error:
Error using tall/subsasgn (line 29)
Incompatible tall array arguments. The first dimension in each tall array
must have the same size, or have a size of 1.
This happens even though I had no problems editing the table before attempting to write it.
Relevant code:
%get csv file
[file,path] = uigetfile('*.csv');
source = [path,file];
%% create tall table
ds = datastore(source);
ds.TextscanFormats{1} = '%s';
ds.Delimiter = ',';
tTable = tall(ds);
%retrieve relevant column data
colX = gather(tTable.colX);
Flag = gather(tTable.Flag);
combinedFlag = colX2flag(Flag,colX); %this is a function that manipulates the data
combinedFlag = tall(combinedFlag);
%%
% put data back into table
tTable.colX(:) = combinedFlag;
tTable.colY(:) = combinedFlag;
tTable.colZ(:) = combinedFlag;
%%
write('C:\Users\.......\test_*.csv',tTable); %obviously no ..... in the actual code
In addition, if I try to write tTable without any manipulation, it splits the result into many csv files. Is there a way to save all the data into just one file?
  4 Comments
Guillaume on 17 Mar 2020
Right, so the problem is actually from the line:
tTable.colX(:) = combinedFlag;
MATLAB only performs the actual assignment once you call write, which is why you don't get an error on that line itself, but that is where the problem lies. It seems that your combinedFlag doesn't have the same number of rows as the original array, which is indeed a problem.
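To illustrate the deferred evaluation described above, here is a minimal sketch using a small in-memory tall array (the variable names are only for illustration and are not from the original code). Operations on tall arrays are queued, and nothing is computed until gather or write is called, so an error caused by an earlier line may only surface at that point.

```matlab
% Create a tall array from a small in-memory column vector (illustration only)
t = tall((1:5)');

% This line is only queued; no computation happens yet
u = t + 1;

% The queued operations are evaluated here, so this is where
% any error from the lines above would be reported
result = gather(u);
disp(result)
```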
Rachel Leber on 17 Mar 2020
Thank you for the quick responses.
It is actually the exact same size. I was able to solve this by gathering the entire table, editing it as a regular table and then calling tall and write. Regarding writing the entire table into one csv file, I'm rather unfamiliar with tall tables and parallel processing. Would you kindly elaborate on your suggestion to ignore filenames?


Answers (2)

Guillaume on 17 Mar 2020
"It is actually the exact same size"
If I recall correctly, you do indeed get misleading error messages when you try to combine tall arrays that come from different datastores, which is the case here (your combinedFlag tall array is completely disconnected from the original tall array since it has been through a gather). My understanding is that combining tall arrays like that is not supported.
To fix the problem, you would have to get rid of the gather and modify your colX2flag function so that it can operate directly on tall arrays.
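As a rough sketch of what that could look like: the code below keeps everything tall and never calls gather before writing. Since the real colX2flag isn't shown, an elementwise OR of the two columns is used as a purely hypothetical stand-in, and the columns are assumed to be numeric. The small demo csv and the temporary folders are also assumptions, only there so the snippet runs on its own.

```matlab
% Build a small csv so the sketch runs on its own (stand-in for the real file)
T = table([0;1;0;2], [1;0;0;3], [0;0;0;0], [0;0;0;0], ...
    'VariableNames', {'Flag','colX','colY','colZ'});
source = fullfile(tempdir, 'tall_demo.csv');
writetable(T, source);

% Create the tall table from a datastore, as in the question
ds = tabularTextDatastore(source, 'Delimiter', ',');
tTable = tall(ds);

% Hypothetical stand-in for colX2flag: works on the tall columns directly,
% so the result stays tied to the same datastore as tTable
combinedFlag = (tTable.Flag ~= 0) | (tTable.colX ~= 0);

% Assign the whole columns; these assignments are still deferred
tTable.colX = combinedFlag;
tTable.colY = combinedFlag;
tTable.colZ = combinedFlag;

% Write to a fresh, empty folder; evaluation actually happens here
outDir = tempname;
mkdir(outDir);
write(fullfile(outDir, 'test_*.csv'), tTable);
```

Whether your actual colX2flag can be rewritten this way depends on what it does; anything that needs all the rows at once may not translate directly to tall arrays.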
However, since you have enough memory to gather the entire table, there's no point in using tall arrays. You can just use regular tables which would solve all your problems:
%get csv file
[file,path] = uigetfile('*.csv');
source = fullfile(path, file); %prefer fullfile to concatenation
tTable = readtable(source, 'Delimiter', ',');
combinedFlag = colX2flag(tTable.Flag, tTable.colX);
tTable.colX(:) = combinedFlag;
tTable.colY(:) = combinedFlag;
tTable.colZ(:) = combinedFlag;
writetable(tTable, 'C:\somewhere\test.csv');

Tom W on 7 Jan 2021
Did you figure it out?
