MATLAB Answers

Error with splitting table into separate columns.

1 view (last 30 days)
I am working with a 616x91 table that I am reorganising using:
x = table1(1:48:end,:);
y = x(:)
a = table1(2:48:end,:);
b = a(:)
and so on.... until
c = table1(48:48:end,:);
d = x(:)
Each data point is repeated (not an exact repition of data, just the data point) every 48 rows, hence the 48.
My issue is that each output variable (y,b,d etc.) variable should be a 1183x1 column however, the last 8 output variables output a 1092x1 column. I.e. the last 8 variables are:
m = table1(41:48:end,:); q = table1(48:48:end,:);
n = m(:) ......to..... p = q(:)
Can anyone explain why this is happening?
  12 Comments
ANDREW Feenan
ANDREW Feenan on 29 Jul 2021
I have attached the file, the first sheet is the data now. The seconds sheet is how I need the data do be displayed.

Sign in to comment.

Accepted Answer

Peter Perkins
Peter Perkins on 29 Jul 2021
This is an application of unstack and then stack. The code looks a lot more complicated than it might if I hard-coded in sizes, as you might. It's really just a call to unstack, a call to stack, and some bookkeeping. Also, you've used the word "Variable", which is gonna get confused with how that word is used in the tabular lexicon, so I'm using "Measurement".
First, your file is somewhat broken. Easy enough to read, but the Person column is not complete. And there are mostly NaNs in the data. Here's what I did:
t1 = readtable("Example.xlsx", ...
"TextType","string", "ReadVariableNames",true, "PreserveVariableNames",true)
people = "Person " + (1:13)';
t1.("Person (First & Surname)") = repelem(people,48);
t1{:,3:end} = rand(height(t1),91);
But let me illustrate with something smaller.
>> people = ["Person1";"Person2";"Person3";"Person4"];
>> meas = ["Measurement1";"Measurement2";"Measurement3"];
>> dates = datetime(2121,7,29:34);
>> X = rand(length(people)*length(meas),length(dates));
>> t0 = table(repelem(people,length(meas),1), repmat(meas,length(people),1));
>> t1 = [t0 array2table(X)];
>> t1.Properties.VariableNames = ["Person" "Measurement" string(dates)]
t1 =
12×8 table
Person Measurement 29-Jul-2121 30-Jul-2121 31-Jul-2121 01-Aug-2121 02-Aug-2121 03-Aug-2121
_________ ______________ ___________ ___________ ___________ ___________ ___________ ___________
"Person1" "Measurement1" 0.86958 0.56376 0.96026 0.42636 0.75018 0.82838
"Person1" "Measurement2" 0.30749 0.26297 0.43406 0.53727 0.86967 0.39818
"Person1" "Measurement3" 0.30016 0.16764 0.36816 0.09737 0.15583 0.407
"Person2" "Measurement1" 0.26145 0.57806 0.70349 0.013457 0.86016 0.81714
"Person2" "Measurement2" 0.093242 0.33359 0.74725 0.077764 0.444 0.73318
"Person2" "Measurement3" 0.40713 0.54292 0.72929 0.42199 0.32273 0.12627
"Person3" "Measurement1" 0.58666 0.31563 0.25499 0.62665 0.22856 0.21707
"Person3" "Measurement2" 0.54294 0.36551 0.48211 0.11886 0.34332 0.077295
"Person3" "Measurement3" 0.55193 0.34819 0.58987 0.13769 0.81084 0.20005
"Person4" "Measurement1" 0.087758 0.14986 0.076465 0.47683 0.36707 0.25788
"Person4" "Measurement2" 0.13081 0.91179 0.58878 0.10389 0.53224 0.44917
"Person4" "Measurement3" 0.99408 0.81306 0.60691 0.14631 0.43607 0.72003
Person and Measurement are categorical data, they should be stored in categorical variables.
>> t1.("Person") = categorical(t1.("Person"));
>> t1.Measurement = categorical(t1.Measurement);
OK, this has data separately for each date, and data for each measurement stacked up within each date. You want the opposite. So unstack on Variable:
>> meas = unique(t1.Measurement,"stable");
>> nmeas = length(meas);
>> dates = datetime(t1.Properties.VariableNames(3:end));
>> ndates = length(dates);
>> nvarsUnstacked = nmeas * length(dates);
>> t2 = unstack(t1,2+(1:ndates),"Measurement")
t2 =
4×19 table
Person 29-Jul-2121_Measurement1 29-Jul-2121_Measurement2 29-Jul-2121_Measurement3 30-Jul-2121_Measurement1 30-Jul-2121_Measurement2 30-Jul-2121_Measurement3 31-Jul-2121_Measurement1 31-Jul-2121_Measurement2 31-Jul-2121_Measurement3 01-Aug-2121_Measurement1 01-Aug-2121_Measurement2 01-Aug-2121_Measurement3 02-Aug-2121_Measurement1 02-Aug-2121_Measurement2 02-Aug-2121_Measurement3 03-Aug-2121_Measurement1 03-Aug-2121_Measurement2 03-Aug-2121_Measurement3
_______ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________
Person1 0.5417 0.10085 0.43765 0.25821 0.2993 0.2469 0.2124 0.66636 0.77015 0.43363 0.46433 0.49833 0.87624 0.49025 0.90014 0.44137 0.33786 0.49158
Person2 0.90169 0.47628 0.84116 0.43142 0.77569 0.59875 0.25473 0.83978 0.99591 0.34608 0.56922 0.33262 0.18738 0.50787 0.070031 0.62675 0.96446 0.50531
Person3 0.93431 0.57748 0.11194 0.51276 0.33596 0.25611 0.4042 0.92685 0.026795 0.82401 0.52511 0.90661 0.23808 0.42686 0.18892 0.28795 0.37948 0.38927
Person4 0.75668 0.79903 0.49847 0.14754 0.95279 0.18124 0.93929 0.72575 0.99809 0.35544 0.080383 0.14538 0.71777 0.13216 0.26415 0.94446 0.40165 0.36561
Now we just need to restack, this time dates within measurements. To do that, we'll need to group the variables in the table by the measurements they correspond to. There are other less arcane ways to do this, but this way is short.
>> varGroups = arrayfun(@(i) {1 + (i:nmeas:nvarsUnstacked)},1:nmeas)
varGroups =
1×3 cell array
{[2 5 8 11 14 17]} {[3 6 9 12 15 18]} {[4 7 10 13 16 19]}
Now we stack those groups of table variables.
>> t3 = stack(t2,varGroups,"NewDataVariableName",string(meas),"IndexVariableName","Date")
t3 =
24×5 table
Person Date Measurement1 Measurement2 Measurement3
_______ ____ ____________ ____________ ____________
Person1 2 0.5417 0.10085 0.43765
Person1 5 0.25821 0.2993 0.2469
Person1 8 0.2124 0.66636 0.77015
Person1 11 0.43363 0.46433 0.49833
Person1 14 0.87624 0.49025 0.90014
Person1 17 0.44137 0.33786 0.49158
Person2 2 0.90169 0.47628 0.84116
Person2 5 0.43142 0.77569 0.59875
Person2 8 0.25473 0.83978 0.99591
Person2 11 0.34608 0.56922 0.33262
Person2 14 0.18738 0.50787 0.070031
Person2 17 0.62675 0.96446 0.50531
Person3 2 0.93431 0.57748 0.11194
Person3 5 0.51276 0.33596 0.25611
Person3 8 0.4042 0.92685 0.026795
Person3 11 0.82401 0.52511 0.90661
Person3 14 0.23808 0.42686 0.18892
Person3 17 0.28795 0.37948 0.38927
Person4 2 0.75668 0.79903 0.49847
Person4 5 0.14754 0.95279 0.18124
Person4 8 0.93929 0.72575 0.99809
Person4 11 0.35544 0.080383 0.14538
Person4 14 0.71777 0.13216 0.26415
Person4 17 0.94446 0.40165 0.36561
Where's the dates go? They are there, or at least their var indices into t2 are. Fix that up, then add the weekday name, and sort on date.
>> dateInds = ceil(t3.Date/nmeas);
t3.Date = dates(dateInds)';
t3 = sortrows(t3,"Date");
t3 = addvars(t3,categorical(day(t3.Date,"name")),'After',"Date",'NewVariableName',"Weekday")
t3 =
24×6 table
Person Date Weekday Measurement1 Measurement2 Measurement3
_______ ___________ _________ ____________ ____________ ____________
Person1 29-Jul-2121 Tuesday 0.5417 0.10085 0.43765
Person2 29-Jul-2121 Tuesday 0.90169 0.47628 0.84116
Person3 29-Jul-2121 Tuesday 0.93431 0.57748 0.11194
Person4 29-Jul-2121 Tuesday 0.75668 0.79903 0.49847
Person1 30-Jul-2121 Wednesday 0.25821 0.2993 0.2469
Person2 30-Jul-2121 Wednesday 0.43142 0.77569 0.59875
Person3 30-Jul-2121 Wednesday 0.51276 0.33596 0.25611
Person4 30-Jul-2121 Wednesday 0.14754 0.95279 0.18124
Person1 31-Jul-2121 Thursday 0.2124 0.66636 0.77015
Person2 31-Jul-2121 Thursday 0.25473 0.83978 0.99591
Person3 31-Jul-2121 Thursday 0.4042 0.92685 0.026795
Person4 31-Jul-2121 Thursday 0.93929 0.72575 0.99809
Person1 01-Aug-2121 Friday 0.43363 0.46433 0.49833
Person2 01-Aug-2121 Friday 0.34608 0.56922 0.33262
Person3 01-Aug-2121 Friday 0.82401 0.52511 0.90661
Person4 01-Aug-2121 Friday 0.35544 0.080383 0.14538
Person1 02-Aug-2121 Saturday 0.87624 0.49025 0.90014
Person2 02-Aug-2121 Saturday 0.18738 0.50787 0.070031
Person3 02-Aug-2121 Saturday 0.23808 0.42686 0.18892
Person4 02-Aug-2121 Saturday 0.71777 0.13216 0.26415
Person1 03-Aug-2121 Sunday 0.44137 0.33786 0.49158
Person2 03-Aug-2121 Sunday 0.62675 0.96446 0.50531
Person3 03-Aug-2121 Sunday 0.28795 0.37948 0.38927
Person4 03-Aug-2121 Sunday 0.94446 0.40165 0.36561
  4 Comments
Peter Perkins
Peter Perkins on 30 Jul 2021
As I said, you have issues in your spreadsheet. I can only assume that you are now using something where "13" and "48" are not the right values.

Sign in to comment.

More Answers (0)

Products


Release

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!