How to create separate tables from a data set
    7 views (last 30 days)
  
       Show older comments
    
Working with gait analysis data, the exported raw file separates the data into four "tables" on top of each other (Joints, Model outputs, Segments and Trajectories).
When I use the readtable function, it identifies the variables from the top table (Joints) but not the other tables and instead has 5 rows of NaN separating the tables.
How do I get my code to identify and separate the data into easier to read tables.
(Attached is the file as a .txt, the file was too large to attach as .csv which is what I've been working with)
3 Comments
  Image Analyst
      
      
 on 10 Feb 2021
				Please explain what column numbers of that are supposed to be what table.  And do you want an array of tables (one table for each patient) or do you want all patients in the same table with the patient ID as one of the columns (so there is just a single set of 4 tables)?
Accepted Answer
  dpb
      
      
 on 10 Feb 2021
        
      Edited: dpb
      
      
 on 11 Feb 2021
  
      Boy! is that a mess!  Why people/vendors do such things is beyond ken.
Another start...
G=readcell('Level Reduced.csv');            % bring in whole thing as cell array
SECTIONS={'Joints','Model','Segments','Trajectories'};  % the looked for data sections
ich=find(cellfun(@ischar,G(:,1)));          % the records that have char() data first column
isec=ich(contains(G(ich,1),SECTIONS));      % the locations of each section beginning
isec=[isec;size(G,1)+2];                    % add last line for indexing sections first:last lines
for i=1:numel(SECTIONS)
  vn=string(G(isec(i)+3,:));
  in=find(ismissing(vn));
  vn(in)="Var"+in;
  cmd=sprintf('t%s=cell2table(G(isec(%d)+5:isec(%d)-2,:))',SECTIONS{i},i,i+1);
  eval(cmd)
end
The above results in four tables (I cut the size of the sample file down even further to just three records per group) named per the SECTIONS array; normally I would never use eval to "poof" variables into the workplace, but this seems to be the unusual case where it does make some sense.  Code from here on would need to (and would presume would want to) use the explicit table names as it appears each group is different-enough as that code would not be duplicated.
tJoints =
  3×25 table
     Var1     Var2    Var3         Var4          Var5     Var6      Var7     Var8         Var9         Var10    Var11     Var12    Var13     Var14     Var15    Var16    Var17    Var18    Var19    Var20     Var21    Var22    Var23     Var24      Var25 
    ______    ____    _____    _____________    ______    _____    ______    _____    _____________    _____    ______    _____    ______    ______    _____    _____    _____    _____    _____    ______    _____    _____    ______    ______    _______
    123.00    0.00    41.07    [1×1 missing]    -23.61    -0.50    -52.06    11.76    [1×1 missing]    33.19    -23.97    26.34    -24.32    -19.67    0.78     -9.17    -4.21    3.50     23.78    -70.89    19.20    48.85    897.59    211.86        NaN
    124.00    0.00    40.01    [1×1 missing]    -23.56    -0.31    -51.82    11.77    [1×1 missing]    33.21    -23.97    26.64    -24.50    -19.55    0.99     -9.05    -4.14    3.77     23.71    -70.82    18.92    49.97    896.71    211.33    -106.03
    125.00    0.00    38.65    [1×1 missing]    -23.43    -0.10    -51.42    11.78    [1×1 missing]    33.20    -23.99    26.82    -24.56    -19.54    1.27     -8.98    -4.06    4.09     23.83    -70.81    18.81    50.99    896.06    210.57    -136.10
tModel =
  3×25 table
     Var1     Var2    Var3    Var4    Var5    Var6    Var7     Var8        Var9             Var10            Var11            Var12            Var13            Var14            Var15            Var16            Var17        Var18    Var19     Var20     Var21    Var22     Var23     Var24     Var25 
    ______    ____    ____    ____    ____    ____    _____    ____    _____________    _____________    _____________    _____________    _____________    _____________    _____________    _____________    _____________    _____    ______    ______    _____    ______    ______    ______    ______
    123.00    0.00    8.94    0.00    0.00    8.97    -0.53    4.86    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    7.14     -10.27    -11.83    79.67    314.58    492.79    400.52    400.52
    124.00    0.00    8.97    0.00    0.00    9.00    -0.47    4.71    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    7.11     -10.41    -11.56    79.93    314.35    493.12    400.61    400.61
    125.00    0.00    8.93    0.00    0.00    8.96    -0.45    4.66    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    7.06     -10.53    -11.23    80.12    313.99    493.47    400.64    400.64
tSegments =
  3×25 table
     Var1     Var2    Var3     Var4     Var5      Var6      Var7      Var8     Var9     Var10     Var11     Var12     Var13     Var14     Var15    Var16    Var17    Var18    Var19     Var20     Var21    Var22     Var23    Var24    Var25 
    ______    ____    _____    _____    _____    ______    ______    ______    _____    ______    _____    _______    ______    ______    _____    _____    _____    _____    ______    ______    _____    ______    _____    _____    ______
    123.00    0.00    59.21    54.50    77.89    -78.23    302.03    892.00    37.75    100.37    28.41    -122.44    343.55    135.89    89.18    85.27    58.57    41.15    322.18    500.28    37.75    100.37    28.41    79.86    342.85
    124.00    0.00    58.96    54.49    78.23    -79.49    301.43    890.79    37.24     99.29    29.23    -114.01    343.27    131.59    88.12    84.58    59.54    41.39    322.36    499.57    37.24     99.29    29.23    89.46    344.06
    125.00    0.00    58.88    54.47    78.42    -80.55    300.64    889.76    36.97     98.03    30.11    -105.09    343.29    126.97    87.06    83.63    60.52    40.94    322.21    498.77    36.97     98.03    30.11    99.54    345.12
tTrajectories =
  3×25 table
     Var1     Var2    Var3      Var4      Var5     Var6     Var7      Var8      Var9      Var10     Var11      Var12     Var13     Var14     Var15    Var16     Var17     Var18    Var19     Var20     Var21     Var22     Var23     Var24     Var25 
    ______    ____    _____    ______    ______    _____    _____    ______    _______    ______    ______    _______    ______    ______    _____    ______    ______    _____    ______    ______    ______    ______    ______    ______    ______
    123.00    0.00    50.81    359.27    970.07    42.66    46.04    946.57    -172.71    252.49    978.63    -173.44    164.97    976.32    56.60    392.62    730.53    92.55    379.65    502.24    -20.02    369.62    289.29    -75.38    361.01
    124.00    0.00    50.13    358.90    970.59    42.10    45.82    946.35    -173.70    252.02    978.65    -174.31    164.49    976.29    55.85    392.45    730.88    92.50    379.48    502.53    -15.05    369.16    286.28    -67.52    361.31
    125.00    0.00    49.65    358.41    971.07    41.52    45.60    946.12    -174.70    251.56    978.75    -175.16    164.03    976.30    55.03    392.30    731.20    92.30    379.21    502.81    -10.04    368.90    283.43    -59.52    361.58
>> 
I tried to use the records beginning with 'Frame' as variable names, but that's fraught with issues -- and didn't have time to sort them all out...that's what the code for variable vn in the loop does; one would use
tName.Properties.VariableNames=vn;
% or
,'VariableNames',vn     % add to argument list of |cell2table| for variable names
on creation, but
>> tTrajectories.Properties.VariableNames=vn;
Duplicate table variable name: 'X'. 
>> 
so one will have have some naming logic to build a unique set of names for each group of repeated names for each table.  I didn't have the time to create that code at the moment.
The above code for vn does create a default VarN name for missing values that occur in at least the first section, but the duplicates was where I then had to resign from the fray somewhat bloodied but mostly just out of time to spare just now.
3 Comments
  dpb
      
      
 on 10 Jun 2021
				The file you attached ran to completion here...created the four expected tables.
Wherever the problem is, it must be in the part you didn't include in the posting.
More Answers (1)
  Mathieu NOE
      
 on 10 Feb 2021
        hello 
here below my suggestion to split the large csv file into smaller txt files 
they may be smarter use of regexp but this is what I can offer quick and dirty
seems you can now easily do a second loop to load these files with readtable and generate your multiples tables from there;
hope it helps,  even if it can certainly be further refined 
% Tell it what folder you want to put the files in
outdir = cd;
% Read the initial file in all at once
filename = 'Level Reduced.csv';
fid = fopen(filename, 'r');
data = fread(fid, '*char').';
fclose(fid);
% Break it into pieces based upon ',,,,,,,,,,,,,,,,,,,,,,,,' lines
pieces = regexp(data, '\n\s*\n', 'split');
[match,piece,startIndex,endIndex] = regexp(data,',,,,,,,,,,,,,,,,,,,,,,,,','match','split');
% Now loop through and save each one
n = 0;
for k = 1:numel(piece)
    if size(piece{k},2) > 100 % so avoid storing empty files or single line (title) 
        n = n + 1;
        filename = fullfile(outdir, ['out' num2str(n), '.txt']);
        % Now write the piece to the file
        fid = fopen(filename, 'w');
        fwrite(fid, piece{k});
        fclose(fid);
    end
end
1 Comment
  Mathieu NOE
      
 on 10 Feb 2021
				just noticed this line has to be removed : 
pieces = regexp(data, '\n\s*\n', 'split');
See Also
Categories
				Find more on Tables in Help Center and File Exchange
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!