# How to sort strings by length?

rfgdrg on 31 Oct 2014
Edited: per isakson on 11 Nov 2017
Hi, I want to do this:
'bertil'
'cesar '
'berit '];
>> Asort = charsort(A)
Asort =
cesar
berit
bertil
I use strtrim to remove the blanks, but I have no idea how I could sort my list by length. Any suggestions?

Farook Sadarudeen on 10 Nov 2017
Edited: per isakson on 10 Nov 2017
If your list is as below:
You can sort it based on the length as below:
[~,stringLength] = sort(cellfun(@length,A),'descend');
OutputListSorted = A(stringLength);
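A minimal sketch of this approach, assuming a cell array of char vectors. The list `names` is a hypothetical example (not the original poster's data), and the index variable is given a more descriptive name, `sortIdx`. Omitting `'descend'` puts the shortest strings first.

```matlab
% Hypothetical example list (not from the original post)
names = {'adam', 'bertil', 'cesar', 'eva', 'berit'};

% Length of each string; cellfun returns a numeric row vector here
len = cellfun(@length, names);

% sortIdx is the permutation that orders the lengths ascending
[~, sortIdx] = sort(len);
namesSorted = names(sortIdx);
```

Because `sort` is stable, strings of equal length keep their original relative order.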
Jan on 10 Nov 2017
"stringLength" is a confusing name for a sorting index.

Geoff Hayes on 31 Oct 2014
Since you asked for suggestions, here is an almost-complete solution. Rather than using a matrix of strings, convert it to a cell array, as
Then in your function, charsort, remove (as you already said) the whitespace (blanks). So you have a cell array of strings, each of which you want to find the length of. You can use cellfun to apply a function to each element of your cell array. In this case, we want to apply length as
stringLengthsOfA = cellfun(@(x)length(x),A)';
which returns a column vector (I added the apostrophe so that the result of cellfun is a column; this assumes that A is a row cell array) like
stringLengthsOfA =
4
6
5
3
5
We have the appropriate length of each string which we can now sort. But as soon as we sort this vector, we lose the original order. So just prepend a column that has the indices of each of these lengths (so the first element 4 is for row 1, the second element 6 is for row 2, etc.) as
1 4
2 6
3 5
4 3
5 5
(The above is easy to do.) Now consider using sortrows which will allow you to specify which column to sort on and preserve the relationship between elements of each row. See what happens when you sort (the above matrix) on the second column. You will get the lengths (second column) sorted in ascending order with their corresponding indices (first column). You can then use the first column to grab the (now) sorted data from A.
Try putting the above together and see what happens!
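One way the steps above might be put together is sketched below. The cell array `A` is a hypothetical example whose string lengths match the ones shown earlier; the names `indexed`, `sortedRows` and `Asorted` are illustrative only.

```matlab
% Hypothetical row cell array with string lengths 4, 6, 5, 3, 5
A = {'adam', 'bertil', 'cesar', 'eva', 'berit'};

% Column vector of string lengths
stringLengthsOfA = cellfun(@(x) length(x), A)';

% Prepend the original indices: column 1 = index, column 2 = length
indexed = [(1:numel(A))', stringLengthsOfA];

% Sort the rows on the second column (the lengths)
sortedRows = sortrows(indexed, 2);

% Column 1 now lists the original indices in order of length
Asorted = A(sortedRows(:, 1));
```

The second output of `sort` gives the same index vector directly, so the index column is mainly useful for seeing how the bookkeeping works.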

per isakson on 10 Nov 2017
Edited: per isakson on 10 Nov 2017
Without using a cell array
'bertil'
'cesar '
'berit '];
>> len = arrayfun( @(jj) length(strtrim(A(jj,:))), [1:size(A,1)] );
>> [ ~, ix ] = sort( len );
>> A(ix,:)
ans =
cesar
berit
bertil
I guess this is faster and uses less memory than the solutions based on a cell array.

Stephen23 on 10 Nov 2017
Edited: Stephen23 on 10 Nov 2017
Simpler:
>> [~,idx] = sortrows(A~=32);
>> A(idx,:)
ans =
cesar
berit
bertil
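A short sketch of why this works, under the assumption that the only spaces in the matrix are trailing padding. The comparison `A~=32` is true wherever a character is not a space (32 is the character code of a space), so each row becomes a run of ones followed by zeros, and `sortrows` places rows with fewer leading ones first. The matrix below is a hypothetical example.

```matlab
% Hypothetical padded char matrix (each row is 6 characters wide)
A = ['bertil'; 'cesar '; 'eva   '; 'berit '];

% true where the character is not a space (char code 32)
mask = (A ~= 32);

% Rows with fewer leading ones sort first, i.e. shorter strings first
[~, idx] = sortrows(mask);
Asorted = A(idx, :);
```

If a string contains an embedded space, for example 'a b', the mask no longer encodes the length alone and the order may differ from a sort by length.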

Jan on 10 Nov 2017
Edited: Jan on 10 Nov 2017
'bertil'
'cesar '
'berit '];
C = cellstr(A);   % cellstr removes the trailing padding spaces
[~, Index] = sort(cellfun('length', C));
Cs = C(Index);
Similar answers have been given already, but it is worth mentioning that cellfun('length') is faster than cellfun(@length) or cellfun(@(x)length(x)).
This assumes that trailing spaces do not belong to the strings. Note that CHAR matrices are a really bad method for storing multiple strings. Using a cell string or a modern STRING object is much better.
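As a sketch of the string-object alternative, assuming R2016b or later, where `string` and `strlength` are available. The char matrix is a hypothetical example, and the variable names are illustrative.

```matlab
% Hypothetical padded char matrix
A = ['bertil'; 'cesar '; 'eva   '; 'berit '];

% cellstr drops the trailing padding; string converts to a string array
S = string(cellstr(A));

% strlength returns the number of characters in each string
[~, idx] = sort(strlength(S));
Ssorted = S(idx);
```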
##### 3 Comments
Jan on 11 Nov 2017
@per: Char matrices have the disadvantage that the padding with spaces implies that trailing spaces are not significant. This is a magic number problem (see https://www.mathworks.com/matlabcentral/answers/83075-magic-strings-and-numbers-in-matlab), and here even the non-exotic space character got a special job. Using char(0) would have been a smarter choice.
This problem is severe. Implementing general algorithms like sorting or string matching will always be impeded by the requirement to treat trailing spaces separately.
So you can compare the speed for sorting and the memory consumption:
CStr = sprintfc('%6d', 0:1e6-1);
CStr = CStr(randperm(1e6));
S = string(CStr);
CharMat = char(CStr);
tic; CStrS = sort(CStr); toc
tic; SS = sort(S); toc
tic; CharMatS = sortrows(CharMat); toc
R2016b/Win64:
Elapsed time is 1.261178 seconds.
Elapsed time is 1.605999 seconds.
Elapsed time is 1.202420 seconds.
Memory consumption:
whos CStr S CharMat
Name          Size          Bytes        Class
CStr          1x1000000     124000000    cell
CharMat       1000000x6     12000000     char
S             1x1000000     62000072     string
But what happens if you want to sort the strings '1', '1 ', '1  ' (none, one and two trailing spaces)? As soon as you try this with a CHAR matrix, you need much more complicated methods.
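A small sketch of the problem, using three strings that differ only in their trailing spaces. The variable names are illustrative only.

```matlab
% Three strings with none, one and two trailing spaces
C = {'1', '1 ', '1  '};

% In a cell array the lengths stay distinct
lenCell = cellfun('length', C);          % [1 2 3]

% char pads every row to the longest length, so the rows become identical
M = char(C);
rowsIdentical = isequal(M(1, :), M(2, :), M(3, :));

% Trimming the rows again cannot recover the original lengths
lenTrimmed = cellfun('length', cellstr(M));
```

After the round trip through the char matrix, all three strings have the same length, so the information needed to sort them by length is gone.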
Imagine the same situation for storing numbers: Would you ever store a list of vectors with different lengths using a matrix padded with the value 32? Any other value could collide with the input data also, even Inf or NaN. Any algorithm based on such a data representation would suffer from this design. Even if the actual processing is very fast, it takes time to check the user inputs or results of computations for conflicts with the padding with magic numbers.
I use CHAR matrices for tabular output to the command window. For all other applications and algorithms, I cannot reliably rule out that a trailing space is significant, and therefore I consider this data representation flawed.
Where do you see benefits for "search, match, select, ..." for char matrices? Are these benefits worth restricting the usage of trailing spaces?
per isakson on 11 Nov 2017
Edited: per isakson on 11 Nov 2017
• Magic strings and numbers in Matlab is a good read.
• Yes, the "padding with spaces" requires precaution.
• column/row-major also requires precaution. To take advantage of CHAR when working with zillions of names, the characters of each name must be stored contiguously in memory.
• ... but then there is no function, sortcolumns
• btw: I finally concatenated my zillions of short strings into a long string with char(31) as separator. Now, I keep my fingers crossed.
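The last point could look roughly like the sketch below. This is only an assumption about how such a separator-based layout might be built and read back; the names are hypothetical, and it relies on char(31) never occurring inside a name.

```matlab
% Hypothetical short names
names = {'adam', 'bertil', 'cesar'};

% Join into one long char vector with char(31) as the separator
sep = char(31);
joined = strjoin(names, sep);

% Split again when the individual names are needed
recovered = strsplit(joined, sep);
```

Note that `strsplit` collapses consecutive delimiters by default, so empty names would not survive this round trip unless 'CollapseDelimiters' is set to false.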