Reading multiple excel files with data store when the columns of the files change position
    2 views (last 30 days)
  
       Show older comments
    
    Juan Carlos Pozuelos Buezo
 on 30 Oct 2019
  
    
    
    
    
    Commented: Juan Carlos Pozuelos Buezo
 on 31 Oct 2019
            Hi. I am trying to read hundreds of excel files with data store, but the files vary the position of the columns. Data Store reads the position of the columns in the first file and assumes that it is the same for the rest. when reviewing the data of the last excel files does not match, because its position was different. how do i solve it?
thanks.
2 Comments
  Guillaume
      
      
 on 31 Oct 2019
				Unfortunately, it's not something that's supported by the SpreadsheetDatastore. In theory, you would have to write your own custom datastore to handle this but it would be a lot of work.
However, have all the files got the exact same columns just in different position? If so, you could handle it with a TransformedDatastore. If you can attach two sample files (actual content can be rubbish as long as they've got the correct format), I can show you how to do that.
Accepted Answer
  Guillaume
      
      
 on 31 Oct 2019
        This works with the very simple test case I built (of two excel files with columns in different order and the 2nd file with extra columns).
First create your datastore as normal, e.g:
ds = datastore('C:\somewhere\*.xlsx');
ds.SelectedVariableNames = {'Something', 'SomethingElse', 'Whatever'};
Then create a TransformedDatastore with that underlying datastore:
tds = transform(ds, @reorderds, 'IncludeInfo', true);
with the following reorderds.m file:
function [tout, infoin] = reorderds(tin, infoin)
    %reorder the variables of a table read by datastore.read so that they match what is actually in the file
    %works with a spreadsheetdatastore only
    currentvars = tin.Properties.VariableNames;  %these names come from the first file in the datastore and may not match the actual names in the file
    opts = detectImportOptions(infoin.Filename, 'Sheet', infoin.SheetNames{1});   %so get the actual names with detectimportoptions
    actualvars = opts.VariableNames;
    [found, where] = ismember(currentvars, actualvars); %find actual location of the variables in the file
    assert(all(found), 'Some expected variables not found in %s', infoin.Filename);
    tout = tin(:, where);  %reorder columns
    tout.Properties.VariableNames = tin.Properties.VariableNames; %and set the name back to what they should be
end
and use tds instead of ds. Any read or readall operation with tds will automatically reorder the columns to their true location.
There will be a performance penalty to this since in effect each file is parsed twice, once by the internal datastore read and another time in reorderds by the detectImportOptions.
More Answers (1)
  Roofus Milton
      
 on 31 Oct 2019
        I would use a cell array populated by readtable then simply reference the colums by name.
0 Comments
See Also
Categories
				Find more on Spreadsheets in Help Center and File Exchange
			
	Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!
