Why does 'readtable' generate NaN values for text data imported from a file in R2020a or later?

275 views (last 30 days)
I'm reading in an Excel spreadsheet file that has columns containing numerical and text data.
Depending on the order of the values, 'readtable' appears to choose different data types.
The data in the spreadsheet is:
COL_1 | COL_2
TEXT_1 | 2
TEXT_2 | TEXT
-1 | 2
'readtable' returns the following data:
>> readtable('MixedInputs.xlsx')
ans =
  3×2 table
     COL_1      COL_2
    ________    _____
    {'TEXT_1'}       2 
    {'TEXT_2'}     NaN 
    {'-1'  }       2 
COL_1 imports all data as character vectors.
COL_2 has broken data 'NaN' where "TEXT" should be read.
Is this expected behavior? I was expecting 'readtable' to revert all data to character vectors or strings, for any column containing mixed data types.

Accepted Answer

MathWorks Support Team
MathWorks Support Team on 23 Nov 2021
Edited: MathWorks Support Team on 23 Nov 2021
This is the expected behavior in R2020a or later. The 'readtable' function determined that the data type appropriate for the values detected in the second column is a numeric array. However "TEXT" is not numeric. Then, 'readtable' fills in the gaps with appropriate missing values — a 'NaN' value for numeric variables, and an empty character vector for text.
Option 1:
To get a table with character vector columns or string columns, use the 'setvartype' file import option:
filename = 'MixedInputs.xlsx';
opts = detectImportOptions(filename);
opts = setvartype(opts,'char'); % or 'string'
T = readtable(filename,opts)
T =
  3×2 table
     COL_1       COL_2  
    ________    ________
    {'TEXT_1'}    {'2'   }
    {'TEXT_2'}    {'TEXT'}
    {'-1'  }     {'2'   }
Refer to the following documentation page for more information regarding 'setvartype':
Option 2:
If your table data has text headers, as is the case here, you can restore the default behavior from previous releases by specifying the ('Format','auto') name-value pair argument. 'readtable' then reads the headers as data, and as a result, it converts all the table variables to text:
>> T = readtable('MixedInputs.xlsx','Format','auto')
T =
3×2 table
COL_1 COL_2
________ ________
{'TEXT_1'} {'2' }
{'TEXT_2'} {'TEXT'}
{'-1' } {'2' }
Refer to the following documentation page for more information regarding name-value pair arguments for text files:
Option 3:
If there are strings and numbers mixed in a single column, then 'readtable' may not be the right function. 'readcell' might be better to represent the data as mixed cell array:
>> C = readcell('MixedInputs.xlsx')
C =
4×2 cell array
{'COL_1'} {'COL_2'}
{'TEXT_1' } {[ 2]}
{'TEXT_2' } {'TEXT' }
{[ -1]} {[ 2]}
Refer to the following documentation page for more information about 'readcell':

More Answers (0)

Products


Release

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!