For loop only working/filling cell array for half of data

I am trying to use a for loop to fill a cell array containing tables with various statistics (e.g. mean, median ...) for sites within a large dataset.
The aim is to end up with a 1x42 cell array, with one table for each variable.
The loop seems to work only for the first 16 variables; the remaining tables are empty. However, if I run the same loop body for a single variable (e.g. i = 20), the code works and the output is a filled table.
Code and input data are attached.
clear variables; clc; load x.mat;
for i = 1:(size(x,2))
    x = x(~isnan(table2array(x(:,i))),:);
    [site_num,ia,obs_count] = unique(x.site_num,'sorted');
    ans_mean = accumarray(obs_count,table2array(x(:,i)),[],@(x)mean(x,'omitnan')); ans_mean = [array2table(ans_mean)];
    txt1 = x(:,i).Properties.VariableNames; txt2 = ans_mean.Properties.VariableNames; header = strcat(txt1,{'_'},txt2); ans_mean = renamevars(ans_mean,'ans_mean',header);
    ans_median = accumarray(obs_count,table2array(x(:,i)),[],@(x)median(x,'omitnan')); ans_median = [array2table(ans_median)];
    txt1 = x(:,i).Properties.VariableNames; txt2 = ans_median.Properties.VariableNames; header = strcat(txt1,{'_'},txt2); ans_median = renamevars(ans_median,'ans_median',header);
    ans_std = accumarray(obs_count,table2array(x(:,i)),[],@(x)std(x,'omitnan')); ans_std = [array2table(ans_std)];
    txt1 = x(:,i).Properties.VariableNames; txt2 = ans_std.Properties.VariableNames; header = strcat(txt1,{'_'},txt2); ans_std = renamevars(ans_std,'ans_std',header);
    ans_lq = accumarray(obs_count,table2array(x(:,i)),[],@(x)quantile(x,0.25)); ans_lq = [array2table(ans_lq)];
    txt1 = x(:,i).Properties.VariableNames; txt2 = ans_lq.Properties.VariableNames; header = strcat(txt1,{'_'},txt2); ans_lq = renamevars(ans_lq,'ans_lq',header);
    ans_uq = accumarray(obs_count,table2array(x(:,i)),[],@(x)quantile(x,0.75)); ans_uq = [array2table(ans_uq)];
    txt1 = x(:,i).Properties.VariableNames; txt2 = ans_uq.Properties.VariableNames; header = strcat(txt1,{'_'},txt2); ans_uq = renamevars(ans_uq,'ans_uq',header);
    obs_count = array2table(accumarray(obs_count,1)); txt1 = x(:,i).Properties.VariableNames; header = strcat(txt1,{'_'},{'obs_count'}); obs_count = renamevars(obs_count,'Var1',header);
    all{i} = [array2table(site_num) ans_mean ans_median ans_std ans_lq ans_uq obs_count];
end
Any thoughts/help/tips would be greatly appreciated! Thank you!
Apologies if my code is quite inefficient, I'm still in the learning process :)

Accepted Answer

Karim on 11 Nov 2022
Edited: Karim on 12 Nov 2022
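One issue was the reuse of the variable name "x" directly after entering the loop: each iteration overwrites your original data by removing the rows where the current variable is NaN. Because those removals accumulate from one iteration to the next, after a few loops you are left with no data.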
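As a rough illustration of that shrinking effect, here is a minimal sketch on a made-up table. The table T, its variable names, and its values are hypothetical and are not taken from x.mat; only the filtering line mirrors the pattern in the question.

```matlab
% Hypothetical toy table (NOT the poster's data): NaNs sit in different rows per variable
T = table([1;2;3;4], [NaN;5;6;7], [8;NaN;9;NaN], [10;11;NaN;NaN], ...
    'VariableNames', {'site_num','a','b','c'});

% Same pattern as in the question: the table is overwritten on every pass,
% so rows dropped for one variable stay dropped for all later variables
S = T;
rowsLeft = zeros(1, width(S));
for i = 1:width(S)
    S = S(~isnan(S{:,i}), :);   % permanently removes rows with NaN in column i
    rowsLeft(i) = height(S);    % rows still available after this pass
end

disp(rowsLeft)   % 4 3 1 0
```

Column "c" actually has two valid values in T, yet the loop sees none of them, because the rows holding them were already removed while processing "a" and "b".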
It's better to create a temporary variable (I called it "currData") to hold the data you are working on in the current iteration. I shortened the code a bit and added a few comments.
% load mat file
load(websave('myFile', ""));
% allocate a cell array for the output data
AllData = cell(1,size(x,2));
for i = 1:size(x,2)
    % extract data for current loop, and convert to array
    % EDIT: included Stephen23's proposal to extract the data
    currData = x{:,i};
    % figure out which values are a number
    NumIdx = ~isnan( currData );
    % only keep the numbers for further processing
    currData = currData(NumIdx);
    % sort the "site num" for the numbers in the array
    [site_num,~,obs_count] = unique(x.site_num(NumIdx) ,'sorted');
    % get the name of the current variable
    currVarName = x(:,i).Properties.VariableNames + "_";
    % do the processing
    ans_mean = accumarray(obs_count,currData,[],@(x)mean(x,'omitnan'));
    ans_median = accumarray(obs_count,currData,[],@(x)median(x,'omitnan'));
    ans_std = accumarray(obs_count,currData,[],@(x)std(x,'omitnan'));
    ans_lq = accumarray(obs_count,currData,[],@(x)quantile(x,0.25));
    ans_uq = accumarray(obs_count,currData,[],@(x)quantile(x,0.75));
    % create the table names for the current variable
    varNames = [ currVarName + "site_num";
                 currVarName + "ans_mean";
                 currVarName + "ans_median";
                 currVarName + "ans_std";
                 currVarName + "ans_lq";
                 currVarName + "ans_uq";
                 currVarName + "obs_count" ];
    % gather the data in a table
    % (the continuation line is cut off in the post; passing varNames as the
    % 'VariableNames' is the assumed ending)
    currTable = table(site_num, ans_mean, ans_median, ans_std, ans_lq, ans_uq, accumarray(obs_count,1),...
        'VariableNames', varNames);
    % store the table in the output cell array
    AllData{i} = currTable;
end
% have a look at the data in the output cell
AllData = 1×42 cell array
{130×7 table} {130×7 table} {130×7 table} {130×7 table} {92×7 table} {76×7 table} {57×7 table} {104×7 table} {99×7 table} {53×7 table} {130×7 table} {104×7 table} {67×7 table} {102×7 table} {98×7 table} {45×7 table} {26×7 table} {26×7 table} {18×7 table} {68×7 table} {81×7 table} {69×7 table} {27×7 table} {62×7 table} {9×7 table} {0×7 table} {66×7 table} {66×7 table} {65×7 table} {65×7 table} {15×7 table} {8×7 table} {46×7 table} {27×7 table} {51×7 table} {29×7 table} {51×7 table} {29×7 table} {29×7 table} {17×7 table} {16×7 table} {9×7 table}
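The tables have different heights because a site only shows up for a variable if it has at least one non-NaN observation of it. A minimal sketch of the unique/accumarray grouping step, using made-up vectors (site and vals are hypothetical and not taken from x.mat), might look like this:

```matlab
% Hypothetical toy data (NOT from x.mat): one value per observation, grouped by site
site = [10; 20; 10; 30; 20; 10];
vals = [ 1;  2;  3; NaN;  6;  5];

% Apply the same NaN mask to the values AND to the site labels
keep = ~isnan(vals);

% grp maps every kept observation to the row of its site in siteIds
[siteIds, ~, grp] = unique(site(keep));

% accumarray collects the values that share a group index and reduces them
siteMean  = accumarray(grp, vals(keep), [], @mean);
siteCount = accumarray(grp, 1);

result = table(siteIds, siteMean, siteCount);
disp(result)   % two rows: site 10 (mean 3, count 3) and site 20 (mean 4, count 2)
```

Site 30 is absent from the result because its only observation is NaN. In the same way, a variable with no valid observations at all yields an empty table, like the 0×7 entry in the output above.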
  1 Comment
Claudia on 11 Nov 2022
Thank you soooooo much Karim! You have literally saved the day :)
Thank you for your detailed, thoughtful and super helpful answer! I really appreciate it!
