Sorting excel data (numbers and strings) and outputting sorted data to another excel file.

37 views (last 30 days)
I have a rather large excel data file that I would like to sort. My goal is to sort the data in three different columns. Two columns are based on keywords within large sentences and one column is just numbers. There are 10 different keywords and 20 different numbers I need to sort with. I would then like the entire row of each row that meets the requirements of keywords and numbers to be placed in a separate excel file.
I am very new to MATLAB and have been struggling with this for some time now. Any help provided would be greatly appreciated.
Kind regards,
  14 Comments

Sign in to comment.

Accepted Answer

Cedric Wannaz
Cedric Wannaz on 13 Sep 2017
Edited: Cedric Wannaz on 27 Sep 2017
EDIT 09/27 @ 22:57 UTC
I updated your file and attached it to my answer (Data_original.xlsx). I picked on number (093321, defined as a string to keep the trailing zero) and one keyword (F25B1-00) from the content. I copied the cell that you updated to all colored rows, and I altered either the number or the keyword in rows highlighted in dark brown (which leaves rows in yellow untouched).
The following code looks for rows that have either at least one number or at least one keyword, and exports them to an output Excel document. It is just one small application for starting the discussion. See if you understand. For this, display arrays numStrPos, keywordPos, hasNumStr, hasKeyword, and at vector hasOneOfEach, and see if you understand their sizes, content, type, the logical operations &, etc.
%- Read input file.
[~,~,rawData] = xlsread( 'Data_original.xlsx' ) ;
nRows = size( rawData, 1 ) ;
% - Define arrays of numbers (defined as strings) and keywords to match.
numStr = {'093821', '093321'} ;
keywords = {'F25B1-00', 'A25D1-00', 'F25B1-99'} ;
% - Prealloc cell arrays for storing the output of find in column.
nNumStr = numel( numStr ) ;
numStrPos = cell( nRows, nNumStr ) ;
nKeyword = numel( keywords ) ;
keywordPos = cell( nRows, nKeyword ) ;
% - Iterate through elements of numStr and find in all data rows, and store.
% in column.
for nsId = 1 : nNumStr
numStrPos(:,nsId) = strfind( rawData(:,21), numStr{nsId} ).' ;
end
% - Iterate through elements of keywords and find in all data rows.
for kId = 1 : nKeyword
keywordPos(:,kId) = strfind( rawData(:,21), keywords{kId} ).' ;
end
% - Convert to arrays of logicals flagging "found at least one", which
% translates into "check not empty".
hasNumStr = ~cellfun( @isempty, numStrPos ) ;
hasKeyword = ~cellfun( @isempty, keywordPos ) ;
% - Create a column vector of logicals that flag rows that have e.g. at
% least one number AND one keyword.
hasOneOfEach = any( hasNumStr, 2 ) & any( hasKeyword, 2 ) ;
% - Use this vector to pick relevant rows of rawData for export.
xlswrite( 'Data_selection.xlsx', rawData(hasOneOfEach,:) ) ;
===[ FORMER ANSWER, READ IT FOR TRAINING ] =============================
Here is an example if you cannot attach a slice of data.
Say you load some Excel file content
[~,~,raw] = xlsread( 'MyData.xlsx', 'RelevantSheetName' ) ;
where the 3rd output of XLSREAD is raw data (contains both string and numeric data), and you get the following cell array:
raw =
4×3 cell array
'A,C' 'X' [20]
'B,C,E,F' 'X,Y,Z' [35]
'F,A' 'X,Z' [10]
'C,B,A' 'Y' [30]
where the letters represent distinct keywords. Now you want to extract rows that have at least keyword A in column 1, at least keyword X in column 2, and then sort them by number in column 3.
Proceed by little steps. First, are we able to spot rows that have A in column 1? There is string comparison/finding and/or pattern matching. If we google a little, we find MATLAB functions STRCMP, STRCMPI, STRFIND, REGEXP, REGEXPI, etc. For strict comparison, we would have to split using ', ' as a separator and probably iterate through rows for doing that .. maybe STRFIND is easier to use. Can we use it on a single cell content?
>> strfind( raw{1,1}, 'A' )
ans =
1
>> strfind( raw{1,1}, 'C' )
ans =
3
>> strfind( raw{1,1}, 'F' )
ans =
[]
apparently it outputs the position of the match and an empty array otherwise. Can STRFIND operate on a cell array (so we don't have to iterate)? Let's test passing the whole first column of raw:
>> strfind( raw(:,1), 'A' )
ans =
4×1 cell array
[1]
[]
[3]
[5]
apparently we can. Now it would be great if we could use this output to index relevant rows: rows for which there was a match. There are several way to index arrays, e.g. by position/index or using vectors of logicals (true/false) that flag relevant positions. Are we able to generate one by hand?
>> lIndex = [true, false, true, true] ; % or lIndex = logical( [1, 0, 1, 1] ) ;
>> class( lIndex )
ans =
logical
Are we able to use it to extract relevant rows of raw, keeping all columns?
>> raw(lIndex,:)
ans =
3×3 cell array
'A,C' 'X' [20]
'F,A' 'X,Z' [10]
'C,B,A' 'Y' [30]
So now are we able to build such a vector of logicals using the output of STRFIND? First, are we able to test if a single element is empty?
>> matchCol1 = strfind( raw(:,1), 'A' )
matchCol1 =
4×1 cell array
[1]
[]
[3]
[5]
>> isempty( matchCol1{1} )
ans =
logical
0
>> isempty( matchCol1{2} )
ans =
logical
1
Notice that the output of ISEMPTY is a logical. So are we able to apply ISEMPTY to all elements of |matchCol1? Googling would lead us to CELLFUN:
>> lIndexCol1 = ~cellfun( @isempty, matchCol1 )
lIndexCol1 =
4×1 logical array
1
0
1
1
where the ~ is the logical NOT (as the output of CELLFUN is [0;1;0;0]). Putting everything together, we can write:
>> lIndexCol1 = ~cellfun( @isempty, strfind( raw(:,1), 'A' ))
lIndexCol1 =
4×1 logical array
1
0
1
1
>> lIndexCol2 = ~cellfun( @isempty, strfind( raw(:,2), 'X' ))
lIndexCol2 =
4×1 logical array
1
1
1
0
Now we want rows that satisfy conditions on both columns:
>> lIndex = lIndexCol1 & lIndexCol2
lIndex =
4×1 logical array
1
0
1
0
that we can use to extract relevant rows:
>> selected = raw(lIndex,:)
selected =
2×3 cell array
'A,C' 'X' [20]
'F,A' 'X,Z' [10]
Then we need to sort based on column 3. Again, there are several options. Looking at SORT we see that it has a second argument that we could use to re-order selected based on the sorting order of selected(:,3), but SORTROWS may do that in one shot:
>> sorted = sortrows( selected, 3 )
sorted =
2×3 cell array
'F,A' 'X,Z' [10]
'A,C' 'X' [20]
which can be exported to an Excel file using XLSWRITE.
Chance is that the solution to your problem will involve a little bit of all these concepts.
  8 Comments
connor  mcgarry
connor mcgarry on 4 Apr 2018
Hey Cedric, may I include your name in a paper I am writing? I would like to cite you and give credit where it is due.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!