Grouping Variables using rowfun Function Rearranging Order to Alphabetical

How does one keep the original sequencing of variables used for matching in the rowfun?
% Guillaume's Code from Prior Question
data = readtable('ML_Q_3.xlsx', 'ReadVariableNames', false);
%optionally, give better name to the variables of the table
%data.Properties.VariableNames = {'???', '????'}
data = rmmissing(data); %get rid of empty rows
sorted = rowfun(@(v) {v}, data, 'GroupingVariables', 1); %group column 2 by column 1
The resulting sorted names are in alphabetical order (col 1) ['A','B','F','R','X'] .
Where I require it to be per the original ['B','F','R','X','A'] .
I can not find anything in the documentation rowfun how to specify to keep the original order.

 Accepted Answer

unique() with 'stable' option. Take the third output and use it as the grouping for splitapply()

8 Comments

Hi Walter,
Can you please elaborate what you mean?
I tried ammending the
sorted = rowfun(@(v) {v}, data, 'GroupingVariables', 1);
to
sorted = rowfun(@(v) {v}, unique(data, 'stable'), 'GroupingVariables', 1);
which does not resolve the problem.
I also require the "csorted" table for the "Group Count" as a requirement that is currently outputted from the "csorted" return (if that makes any difference).
I think I understand what you mean after reading the documentation several times.
Sequence of commands used:
1. Use Unique(data,'stable') ordering the 2nd col by the first (treated as the "key variables")
2. Use the 3rd output (sorted) which puts the key variables in alphabetical order.
3. Use splitapply(func,sorted,data(:,1)) to assign the values in the sorted array to same order to that of column 1 of the data table.
Problem is the unique() function does not allow duplicate values (omits them).
I require duplicates values, is there a command that is similar to unique() but does not remove duplicate values?
"when the command unique (data,'stable') is executed, any variable with the same value is omited in the output."
Nothing is "omitted" from unique's third output. Take a look at this simple example:
>> C = {'B';'A';'C';'A'}
C =
'B'
'A'
'C'
'A'
>> [~,~,X] = unique(C) % sorted
X =
2
1
3
1
>> [~,~,X] = unique(C,'stable')
X =
1
2
3
2
"I require duplicates values so what command will do this in unique's place?"
The third output of unique gives you exactly the indices that you need to group the data in the order that you want (which you can then use as splitapply's input G).
My appologies Walter,
Using:
data = rmmissing(data); %get rid of empty rows
% "rowfun" applies function to table or timetable rows.
%%
% ORIGINAL DON'T TOUCH
%sorted = rowfun(@(v) {v}, data, 'GroupingVariables', 1); %group column 2 by column 1
%
%%
sorted = unique(data,'stable')
sorted2 = rowfun(@(v) {v}, data, 'GroupingVariables', 1); %group column 2 by column 1
"sorted" has a different row value (less [22 x2]) than the "sorted2" (27 rows) value.
When reviewing which rows were being removed (sorted) in the main program, the first key variable was being removed, but all key variables (sorted2), GroupCount (summed) and Var3 have the same number rows as the data table.
Read Walter Roberson's answer again (here is the relevant part): "Take the third output..."
Look at my comment, where I used the third output:
[~,~,X] = unique(...)
% ^ The THIRD output.
Now take a look at your code:
sorted = unique(data,'stable')
Do you see the difference? You used the first output, and ignored the third output completely. You also used rowfun rather than splitapply that Walter Roberson recommended (which requires only two lines of code).
PS: Note that you should not sort the entire table, just the table variable (column) that is relevant (you mentioned the first column).
Hi Stephen,
I asked Walter to please elaborate on his instuctions as I did not understand what he was advising.
"unique() with 'stable' option. Take the third output and use it as the grouping for splitapply()" did not make sense to me as to what he was instructing.
Based on nothing more than Walters instructions I had an array created using the unique function, I.E.
sorted = unique(data,'stable')
I had no idea I had to then specify an array the likes of
[~,~,X] = unique(...)
% ^ The THIRD output.
to obtain "index vectors".
I did what I did (using the commands recommended) based on what I thought were the variables needed (after reading the commands documentation and trial and error).
My apologies for the miscommunication.
Stephen's elaboration of what I wrote is correct. You need the third of the three possible outputs from unique() in order to use as the grouping variable for splitapply()
data = readtable('ML_Q_2_2.xlsx', 'ReadVariableNames', false);
data = rmmissing(data); %get rid of empty rows
[~,~,X] = unique(data,'stable')
output = splitapply(@sum,data, X) % Error returned
Error returned for "output":
Group numbers must be a vector of positive integers, and cannot be a sparse vector.
Also, "X" is returning a vector based on the values in the second column and not the key variables of the first column (data):
I.E.
[1 2 3 4 5 1 6 7 8 9 1 10 11 12 13 1 14 15 16 1 17 18 19 1 20 21 22 ].
Not [1 2 3 4 5].
I resolved this by specifying the first column
[~,~,X] = unique(data(:,1),'stable')
I then used the following code to sum the values of the second column to their correlating key variables.
grouped = splitapply(@sum,data(:,2), X)
Thank you for pointing me in the right direction Walter.

Sign in to comment.

More Answers (0)

Asked:

Jay
on 24 Jul 2019

Commented:

Jay
on 25 Jul 2019

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!