How to create separate tables from a data set

Question

0 votes

Working with gait analysis data, the exported raw file separates the data into four "tables" on top of each other (Joints, Model outputs, Segments and Trajectories).

When I use the readtable function, it identifies the variables from the top table (Joints) but not the other tables and instead has 5 rows of NaN separating the tables.

How do I get my code to identify and separate the data into easier to read tables.

(Attached is the file as a .txt, the file was too large to attach as .csv which is what I've been working with)

3 Comments
Show 1 older comment Hide 1 older comment

Jak on 10 Feb 2021

Level Reduced.csv

Yeh you're right, that is much simpler. Thanks.

You can find it attached here

Image Analyst on 10 Feb 2021

Please explain what column numbers of that are supposed to be what table. And do you want an array of tables (one table for each patient) or do you want all patients in the same table with the patient ID as one of the columns (so there is just a single set of 4 tables)?

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

dpb on 10 Feb 2021

Edited: dpb on 11 Feb 2021

Open in MATLAB Online

1 vote

Boy! is that a mess! Why people/vendors do such things is beyond ken.

Another start...

G=readcell('Level Reduced.csv');            % bring in whole thing as cell array
SECTIONS={'Joints','Model','Segments','Trajectories'};  % the looked for data sections
ich=find(cellfun(@ischar,G(:,1)));          % the records that have char() data first column
isec=ich(contains(G(ich,1),SECTIONS));      % the locations of each section beginning
isec=[isec;size(G,1)+2];                    % add last line for indexing sections first:last lines
for i=1:numel(SECTIONS)
  vn=string(G(isec(i)+3,:));
  in=find(ismissing(vn));
  vn(in)="Var"+in;
  cmd=sprintf('t%s=cell2table(G(isec(%d)+5:isec(%d)-2,:))',SECTIONS{i},i,i+1);
  eval(cmd)
end

The above results in four tables (I cut the size of the sample file down even further to just three records per group) named per the SECTIONS array; normally I would never use eval to "poof" variables into the workplace, but this seems to be the unusual case where it does make some sense. Code from here on would need to (and would presume would want to) use the explicit table names as it appears each group is different-enough as that code would not be duplicated.

tJoints =
  3×25 table
     Var1     Var2    Var3         Var4          Var5     Var6      Var7     Var8         Var9         Var10    Var11     Var12    Var13     Var14     Var15    Var16    Var17    Var18    Var19    Var20     Var21    Var22    Var23     Var24      Var25 
    ______    ____    _____    _____________    ______    _____    ______    _____    _____________    _____    ______    _____    ______    ______    _____    _____    _____    _____    _____    ______    _____    _____    ______    ______    _______
    123.00    0.00    41.07    [1×1 missing]    -23.61    -0.50    -52.06    11.76    [1×1 missing]    33.19    -23.97    26.34    -24.32    -19.67    0.78     -9.17    -4.21    3.50     23.78    -70.89    19.20    48.85    897.59    211.86        NaN
    124.00    0.00    40.01    [1×1 missing]    -23.56    -0.31    -51.82    11.77    [1×1 missing]    33.21    -23.97    26.64    -24.50    -19.55    0.99     -9.05    -4.14    3.77     23.71    -70.82    18.92    49.97    896.71    211.33    -106.03
    125.00    0.00    38.65    [1×1 missing]    -23.43    -0.10    -51.42    11.78    [1×1 missing]    33.20    -23.99    26.82    -24.56    -19.54    1.27     -8.98    -4.06    4.09     23.83    -70.81    18.81    50.99    896.06    210.57    -136.10
tModel =
  3×25 table
     Var1     Var2    Var3    Var4    Var5    Var6    Var7     Var8        Var9             Var10            Var11            Var12            Var13            Var14            Var15            Var16            Var17        Var18    Var19     Var20     Var21    Var22     Var23     Var24     Var25 
    ______    ____    ____    ____    ____    ____    _____    ____    _____________    _____________    _____________    _____________    _____________    _____________    _____________    _____________    _____________    _____    ______    ______    _____    ______    ______    ______    ______
    123.00    0.00    8.94    0.00    0.00    8.97    -0.53    4.86    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    7.14     -10.27    -11.83    79.67    314.58    492.79    400.52    400.52
    124.00    0.00    8.97    0.00    0.00    9.00    -0.47    4.71    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    7.11     -10.41    -11.56    79.93    314.35    493.12    400.61    400.61
    125.00    0.00    8.93    0.00    0.00    8.96    -0.45    4.66    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    [1×1 missing]    7.06     -10.53    -11.23    80.12    313.99    493.47    400.64    400.64
tSegments =
  3×25 table
     Var1     Var2    Var3     Var4     Var5      Var6      Var7      Var8     Var9     Var10     Var11     Var12     Var13     Var14     Var15    Var16    Var17    Var18    Var19     Var20     Var21    Var22     Var23    Var24    Var25 
    ______    ____    _____    _____    _____    ______    ______    ______    _____    ______    _____    _______    ______    ______    _____    _____    _____    _____    ______    ______    _____    ______    _____    _____    ______
    123.00    0.00    59.21    54.50    77.89    -78.23    302.03    892.00    37.75    100.37    28.41    -122.44    343.55    135.89    89.18    85.27    58.57    41.15    322.18    500.28    37.75    100.37    28.41    79.86    342.85
    124.00    0.00    58.96    54.49    78.23    -79.49    301.43    890.79    37.24     99.29    29.23    -114.01    343.27    131.59    88.12    84.58    59.54    41.39    322.36    499.57    37.24     99.29    29.23    89.46    344.06
    125.00    0.00    58.88    54.47    78.42    -80.55    300.64    889.76    36.97     98.03    30.11    -105.09    343.29    126.97    87.06    83.63    60.52    40.94    322.21    498.77    36.97     98.03    30.11    99.54    345.12
tTrajectories =
  3×25 table
     Var1     Var2    Var3      Var4      Var5     Var6     Var7      Var8      Var9      Var10     Var11      Var12     Var13     Var14     Var15    Var16     Var17     Var18    Var19     Var20     Var21     Var22     Var23     Var24     Var25 
    ______    ____    _____    ______    ______    _____    _____    ______    _______    ______    ______    _______    ______    ______    _____    ______    ______    _____    ______    ______    ______    ______    ______    ______    ______
    123.00    0.00    50.81    359.27    970.07    42.66    46.04    946.57    -172.71    252.49    978.63    -173.44    164.97    976.32    56.60    392.62    730.53    92.55    379.65    502.24    -20.02    369.62    289.29    -75.38    361.01
    124.00    0.00    50.13    358.90    970.59    42.10    45.82    946.35    -173.70    252.02    978.65    -174.31    164.49    976.29    55.85    392.45    730.88    92.50    379.48    502.53    -15.05    369.16    286.28    -67.52    361.31
    125.00    0.00    49.65    358.41    971.07    41.52    45.60    946.12    -174.70    251.56    978.75    -175.16    164.03    976.30    55.03    392.30    731.20    92.30    379.21    502.81    -10.04    368.90    283.43    -59.52    361.58
>> 

I tried to use the records beginning with 'Frame' as variable names, but that's fraught with issues -- and didn't have time to sort them all out...that's what the code for variable vn in the loop does; one would use

tName.Properties.VariableNames=vn;
% or
,'VariableNames',vn     % add to argument list of |cell2table| for variable names

on creation, but

>> tTrajectories.Properties.VariableNames=vn;
Duplicate table variable name: 'X'. 
>> 

so one will have have some naming logic to build a unique set of names for each group of repeated names for each table. I didn't have the time to create that code at the moment.

The above code for vn does create a default VarN name for missing values that occur in at least the first section, but the duplicates was where I then had to resign from the fray somewhat bloodied but mostly just out of time to spare just now.

3 Comments
Show 1 older comment Hide 1 older comment

Jak on 10 Jun 2021

Open in MATLAB Online

LevelWalk reduced.txt

Hi again,

The code you sent was brilliant, and worked wonders on the dataset i sent. However it seems to be struggling with this other dataset and I'm not sure why.

I'm getting the error

Index in position 1 exceeds array bounds (must not exceed 12895).
Error in Raw_Data_Split (line 8)
    vn=string(G(isec(i)+3,:));

I figure its to do with the length of the tables they are creating as the structure of the dataset is the same the previous one. Whilst probably not ideal, I usually use the "try" function when the array bounds are exceeded, however in this instance it means only the first table (tJoints) is created.

Attached is the new dataset for reference. It has been reduced though.

dpb on 10 Jun 2021

The file you attached ran to completion here...created the four expected tables.

Wherever the problem is, it must be in the part you didn't include in the posting.

Sign in to comment.

Answer 2

Mathieu NOE on 10 Feb 2021

Open in MATLAB Online

1 vote

hello

here below my suggestion to split the large csv file into smaller txt files

they may be smarter use of regexp but this is what I can offer quick and dirty

seems you can now easily do a second loop to load these files with readtable and generate your multiples tables from there;

hope it helps, even if it can certainly be further refined

% Tell it what folder you want to put the files in
outdir = cd;
% Read the initial file in all at once
filename = 'Level Reduced.csv';
fid = fopen(filename, 'r');
data = fread(fid, '*char').';
fclose(fid);
% Break it into pieces based upon ',,,,,,,,,,,,,,,,,,,,,,,,' lines
pieces = regexp(data, '\n\s*\n', 'split');
[match,piece,startIndex,endIndex] = regexp(data,',,,,,,,,,,,,,,,,,,,,,,,,','match','split');
% Now loop through and save each one
n = 0;
for k = 1:numel(piece)
    if size(piece{k},2) > 100 % so avoid storing empty files or single line (title) 
        n = n + 1;
        filename = fullfile(outdir, ['out' num2str(n), '.txt']);
        % Now write the piece to the file
        fid = fopen(filename, 'w');
        fwrite(fid, piece{k});
        fclose(fid);
    end
end

1 Comment
Show -1 older comments Hide -1 older comments

Mathieu NOE on 10 Feb 2021

Open in MATLAB Online

just noticed this line has to be removed :

pieces = regexp(data, '\n\s*\n', 'split');

Sign in to comment.

How to create separate tables from a data set

3 Comments
Show 1 older comment Hide 1 older comment

Accepted Answer

3 Comments
Show 1 older comment Hide 1 older comment

More Answers (1)

1 Comment
Show -1 older comments Hide -1 older comments

Categories

Tags

Community Treasure Hunt

How to create separate tables from a data set

3 Comments Show 1 older comment Hide 1 older comment

Accepted Answer

3 Comments Show 1 older comment Hide 1 older comment

More Answers (1)

1 Comment Show -1 older comments Hide -1 older comments

Categories

Tags

See Also

Community Treasure Hunt

3 Comments
Show 1 older comment Hide 1 older comment

3 Comments
Show 1 older comment Hide 1 older comment

1 Comment
Show -1 older comments Hide -1 older comments