Sorting table row variables with number and letters

Hello,
I am trying to sort a table with values which looks something like this as an example below, my actual data has hundreds of rows of data.
And I want to make it look like this:
A = struct2table(data); %Data imported as struct and converted to table (50 x 6)
B = A(2,2:end); %Deletion of extra rows and keeping columns with variable names (1 x 5)
arrayB = table2array(B);
TableB = array2table(arrayB.'); % Transposing so variablenames are in rows (5 x 1)
TableB.Properties.RowNames = B.Properties.VariableNames; %Assigning Variable names to rows
TableC = TableB(:,sort(B.Properties.VariableNames));
After running the code I get the error:
Unrecognized variable name 'abc_0_xyz'.
I'm stuck at this error since the variable is clearly in the table already so something is probably not right.
I'm thinking I need to remove either "_xyz" or "abc_" for the sort to work or use delimiter both of which I don't know how or what function to use in my case. It's my first post so apologies in advance if I missed anything. Any help is greatly appreciated
Thanks,

 Accepted Answer

This option also appears in the OP's 2017b documentation.

9 Comments

Just an update: The number of variables i have can go upto a thousand, I tried the sortrows but it gives an output with row names starting from 0 then goes 1,100,101,102 ... and so on instead on 1,2,3,4. Shoudl've mentioned the large number of variables. Post updated. Thanks
If the numeric part of the row labels abc_###_xyz are the only parts that vary, you could extract the numeric parts using regular expressions, then sort them and use the sorted index similar to Voss' answer. Otherwise, my first thought was also the link @Walter Roberson shared.
rownum = str2double(regexp(A.Properties.RowNames, '\d+', 'once'));
[~,idx] = sort(rownum);
newA = A(idx,:);
rownum = str2double(regexp(A.Properties.RowNames, '\d+', 'once'));
[~,idx] = sort(rownum);
newA = A(idx,:);
With this it created just a newA with properties and nothing else. I changed just "A" to "TableB" and it returned the same.
Also used natsortrows(TableB, [ ] , 'RowNames'); No change in output it still remains the same.
@Adam Danz yes in abc_###_xyz "###" are the only values that change, but in this instance of Voss answer will I have to write all hunderds of rownames and corresponding values or is there a simpler way?
A = table({'abc_0_xyz';'abc_4_xyz';'abc_3_xyz';'abc_1_xyz';'abc_2_xyz'},[10;50;30;90;150])
Thanks,
Note that:
  • your example creates a table with two variables and no row names,
  • your explanation shows a table with one variable and rows names.
So it is unclear what your actual data table is like. Descriptions of data by OPs are always much less accurate that being provided with actual sample data, which you can do by clicking the paperclip button.
"Also used natsortrows(TableB, [ ] , 'RowNames'); No change in output it still remains the same."
Lets check that right now (using a table that matches your description, but not your example):
A = table([10;50;30;90;150], 'RowNames',{'abc_0_xyz';'abc_20_xyz';'abc_10_xyz';'abc_2_xyz';'abc_1_xyz'})
A = 5×1 table
Var1 ____ abc_0_xyz 10 abc_20_xyz 50 abc_10_xyz 30 abc_2_xyz 90 abc_1_xyz 150
B = natsortrows(A,[],'RowNames')
B = 5×1 table
Var1 ____ abc_0_xyz 10 abc_1_xyz 150 abc_2_xyz 90 abc_10_xyz 30 abc_20_xyz 50
So, the function NATSORTROWS() correctly sorted the RowNames alphanumerically, just as it is documented and tested to do, using an example table that matches your description. In contrast you tried some code (which you did not show) using a table (which you did not show) and got a result (which you did not show), so we can only conclude that the problem lies with the user, not with the function. Not showing what you did is the perfect way to make it really difficult for people to help you.
@RD, this is how I imagine your data to look
data.Name = ["A";"B";"C"];
data.abc0xyz = randi(9,3,1);
data.abc10xyz = randi(9,3,1);
data.abc20xyz = randi(9,3,1);
data.abc1xyz = randi(9,3,1);
data.abc2xyz = randi(9,3,1);
data.abc3xyz = randi(9,3,1);
data.abc22xyz = randi(9,3,1);
data.abc12xyz = randi(9,3,1);
disp(data)
Name: [3×1 string] abc0xyz: [3×1 double] abc10xyz: [3×1 double] abc20xyz: [3×1 double] abc1xyz: [3×1 double] abc2xyz: [3×1 double] abc3xyz: [3×1 double] abc22xyz: [3×1 double] abc12xyz: [3×1 double]
A = struct2table(data); %Data imported as struct and converted to table (50 x 6)
B = A(2,2:end); %Deletion of extra rows and keeping columns with variable names (1 x 5)
arrayB = table2array(B);
TableB = array2table(arrayB.'); % Transposing so variablenames are in rows (5 x 1)
TableB.Properties.RowNames = B.Properties.VariableNames;
disp(TableB)
Var1 ____ abc0xyz 4 abc10xyz 2 abc20xyz 9 abc1xyz 2 abc2xyz 2 abc3xyz 1 abc22xyz 9 abc12xyz 1
Extract numeric values and sort
To follow through on the idea of using regular expressions to extract the numeric part of the row name:
% Assuming numeric part is the only section to change in abc###xyz
nstr = regexp(TableB.Properties.RowNames, '\d+', 'match','once');
n = str2double(nstr);
[~, sortIdx] = sort(n);
% Sort table
TableC = TableB(sortIdx,:)
TableC = 8×1 table
Var1 ____ abc0xyz 4 abc1xyz 2 abc2xyz 2 abc3xyz 1 abc10xyz 2 abc12xyz 1 abc20xyz 9 abc22xyz 9
RD
RD on 28 Jun 2022
Edited: RD on 28 Jun 2022
% Assuming numeric part is the only section to change in abc###xyz
nstr = regexp(TableB.Properties.RowNames, '\d+', 'match','once');
n = str2double(nstr);
[~, sortIdx] = sort(n);
% Sort table
TableC = TableB(sortIdx,:)
@Adam Danz Thanks this worked. Read little bit more on regexp to understand how the function works.
Thanks again everyone else! I will mark this as solved.
Simpler:
TableC = natsortrows(TableB,[],'RowNames')

Sign in to comment.

More Answers (2)

Your question and examples are confusing: sometimes you show that you want to sort RowNames, and sometimes that you want to sort by first variable/column. However both of them are very easy to achieve using my FEX submission NATSORTROWS():
Sort by RowNames:
A = table([10;50;30;90;150], 'RowNames',{'abc_0_xyz';'abc_20_xyz';'abc_10_xyz';'abc_2_xyz';'abc_1_xyz'})
A = 5×1 table
Var1 ____ abc_0_xyz 10 abc_20_xyz 50 abc_10_xyz 30 abc_2_xyz 90 abc_1_xyz 150
B = natsortrows(A,[],'RowNames')
B = 5×1 table
Var1 ____ abc_0_xyz 10 abc_1_xyz 150 abc_2_xyz 90 abc_10_xyz 30 abc_20_xyz 50
Sort by any variable:
A = table({'abc_0_xyz';'abc_20_xyz';'abc_10_xyz';'abc_2_xyz';'abc_1_xyz'},[10;50;30;90;150])
A = 5×2 table
Var1 Var2 ______________ ____ {'abc_0_xyz' } 10 {'abc_20_xyz'} 50 {'abc_10_xyz'} 30 {'abc_2_xyz' } 90 {'abc_1_xyz' } 150
B = natsortrows(A,[],'Var1') % using variable name
B = 5×2 table
Var1 Var2 ______________ ____ {'abc_0_xyz' } 10 {'abc_1_xyz' } 150 {'abc_2_xyz' } 90 {'abc_10_xyz'} 30 {'abc_20_xyz'} 50
B = natsortrows(A,[],1) % using index
B = 5×2 table
Var1 Var2 ______________ ____ {'abc_0_xyz' } 10 {'abc_1_xyz' } 150 {'abc_2_xyz' } 90 {'abc_10_xyz'} 30 {'abc_20_xyz'} 50

1 Comment

Thanks Stephen, this was exactly what I was looking for! :)

Sign in to comment.

It seems like you can rearrange the rows by using the second output from sort.
If the abc_0_xyz, etc., are the RowNames of the original table:
A = table([10;50;30;90;150],'RowNames',{'abc_0_xyz';'abc_4_xyz';'abc_3_xyz';'abc_1_xyz';'abc_2_xyz'})
A = 5×1 table
Var1 ____ abc_0_xyz 10 abc_4_xyz 50 abc_3_xyz 30 abc_1_xyz 90 abc_2_xyz 150
[~,idx] = sort(A.Properties.RowNames);
newA = A(idx,:)
newA = 5×1 table
Var1 ____ abc_0_xyz 10 abc_1_xyz 90 abc_2_xyz 150 abc_3_xyz 30 abc_4_xyz 50
Or if the abc_0_xyz, etc., are a variable in the original table:
A = table({'abc_0_xyz';'abc_4_xyz';'abc_3_xyz';'abc_1_xyz';'abc_2_xyz'},[10;50;30;90;150])
A = 5×2 table
Var1 Var2 _____________ ____ {'abc_0_xyz'} 10 {'abc_4_xyz'} 50 {'abc_3_xyz'} 30 {'abc_1_xyz'} 90 {'abc_2_xyz'} 150
[~,idx] = sort(A.Var1);
newA = A(idx,:)
newA = 5×2 table
Var1 Var2 _____________ ____ {'abc_0_xyz'} 10 {'abc_1_xyz'} 90 {'abc_2_xyz'} 150 {'abc_3_xyz'} 30 {'abc_4_xyz'} 50

3 Comments

RD
RD on 25 Jun 2022
Edited: RD on 25 Jun 2022
Post updated. I missed a big detail.. the number of RowNames/Variables are in several hundereds.
A = table({'abc_0_xyz';'abc_4_xyz';'abc_3_xyz';'abc_1_xyz';'abc_2_xyz'},[10;50;30;90;150])
Writing all of them would be time consuming process inside the { 'x'.......'n' }, is there a way to automatically define them with respective [10;50;30......n] values?
Thanks for the help,
My answer works the same whether you have a table with 5 rows or 500 rows.
You do not need to construct the table A like I did, since the premise of the question was that you already have the table. I only constructed A explicitly in my answer to show how it works.
"I missed a big detail.. the number of RowNames/Variables are in several hundereds.... is there a way to automatically define them with respective [10;50;30......n] values?"
In your question you wrote that you created that table from some imported data:
"%Data imported as struct and converted to table (50 x 6)"
What is stopping you from using that? Why do you suddenly want to write everything out by hand?

Sign in to comment.

Products

Release

R2017b

Asked:

RD
on 24 Jun 2022

Commented:

BL
on 13 May 2024

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!