Grouping Variables using rowfun Function Rearranging Order to Alphabetical
You are now following this question
- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.
An Error Occurred
Unable to complete the action because of changes made to the page. Reload the page to see its updated state.
Show older comments
0 votes
How does one keep the original sequencing of variables used for matching in the rowfun?
% Guillaume's Code from Prior Question
data = readtable('ML_Q_3.xlsx', 'ReadVariableNames', false);
%optionally, give better name to the variables of the table
%data.Properties.VariableNames = {'???', '????'}
data = rmmissing(data); %get rid of empty rows
sorted = rowfun(@(v) {v}, data, 'GroupingVariables', 1); %group column 2 by column 1
The resulting sorted names are in alphabetical order (col 1) ['A','B','F','R','X'] .
Where I require it to be per the original ['B','F','R','X','A'] .
Accepted Answer
Walter Roberson
on 24 Jul 2019
unique() with 'stable' option. Take the third output and use it as the grouping for splitapply()
8 Comments
Jay
on 24 Jul 2019
Hi Walter,
Can you please elaborate what you mean?
I tried ammending the
sorted = rowfun(@(v) {v}, data, 'GroupingVariables', 1);
to
sorted = rowfun(@(v) {v}, unique(data, 'stable'), 'GroupingVariables', 1);
which does not resolve the problem.
I also require the "csorted" table for the "Group Count" as a requirement that is currently outputted from the "csorted" return (if that makes any difference).
I think I understand what you mean after reading the documentation several times.
Sequence of commands used:
1. Use Unique(data,'stable') ordering the 2nd col by the first (treated as the "key variables")
2. Use the 3rd output (sorted) which puts the key variables in alphabetical order.
3. Use splitapply(func,sorted,data(:,1)) to assign the values in the sorted array to same order to that of column 1 of the data table.
Problem is the unique() function does not allow duplicate values (omits them).
I require duplicates values, is there a command that is similar to unique() but does not remove duplicate values?
"when the command unique (data,'stable') is executed, any variable with the same value is omited in the output."
Nothing is "omitted" from unique's third output. Take a look at this simple example:
>> C = {'B';'A';'C';'A'}
C =
'B'
'A'
'C'
'A'
>> [~,~,X] = unique(C) % sorted
X =
2
1
3
1
>> [~,~,X] = unique(C,'stable')
X =
1
2
3
2
"I require duplicates values so what command will do this in unique's place?"
The third output of unique gives you exactly the indices that you need to group the data in the order that you want (which you can then use as splitapply's input G).
My appologies Walter,
Using:
data = rmmissing(data); %get rid of empty rows
% "rowfun" applies function to table or timetable rows.
%%
% ORIGINAL DON'T TOUCH
%sorted = rowfun(@(v) {v}, data, 'GroupingVariables', 1); %group column 2 by column 1
%
%%
sorted = unique(data,'stable')
sorted2 = rowfun(@(v) {v}, data, 'GroupingVariables', 1); %group column 2 by column 1
"sorted" has a different row value (less [22 x2]) than the "sorted2" (27 rows) value.
When reviewing which rows were being removed (sorted) in the main program, the first key variable was being removed, but all key variables (sorted2), GroupCount (summed) and Var3 have the same number rows as the data table.
[~,~,X] = unique(...)
% ^ The THIRD output.
Now take a look at your code:
sorted = unique(data,'stable')
Do you see the difference? You used the first output, and ignored the third output completely. You also used rowfun rather than splitapply that Walter Roberson recommended (which requires only two lines of code).
PS: Note that you should not sort the entire table, just the table variable (column) that is relevant (you mentioned the first column).
Hi Stephen,
I asked Walter to please elaborate on his instuctions as I did not understand what he was advising.
"unique() with 'stable' option. Take the third output and use it as the grouping for splitapply()" did not make sense to me as to what he was instructing.
Based on nothing more than Walters instructions I had an array created using the unique function, I.E.
sorted = unique(data,'stable')
I had no idea I had to then specify an array the likes of
[~,~,X] = unique(...)
% ^ The THIRD output.
to obtain "index vectors".
I did what I did (using the commands recommended) based on what I thought were the variables needed (after reading the commands documentation and trial and error).
My apologies for the miscommunication.
Walter Roberson
on 24 Jul 2019
Stephen's elaboration of what I wrote is correct. You need the third of the three possible outputs from unique() in order to use as the grouping variable for splitapply()
Jay
on 25 Jul 2019
data = readtable('ML_Q_2_2.xlsx', 'ReadVariableNames', false);
data = rmmissing(data); %get rid of empty rows
[~,~,X] = unique(data,'stable')
output = splitapply(@sum,data, X) % Error returned
Error returned for "output":
Group numbers must be a vector of positive integers, and cannot be a sparse vector.
Also, "X" is returning a vector based on the values in the second column and not the key variables of the first column (data):
I.E.
[1 2 3 4 5 1 6 7 8 9 1 10 11 12 13 1 14 15 16 1 17 18 19 1 20 21 22 ].
Not [1 2 3 4 5].
I resolved this by specifying the first column
[~,~,X] = unique(data(:,1),'stable')
I then used the following code to sum the values of the second column to their correlating key variables.
grouped = splitapply(@sum,data(:,2), X)
Thank you for pointing me in the right direction Walter.
More Answers (0)
Categories
Find more on Logical in Help Center and File Exchange
Products
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)