Difference between readmatrix() and readtable()

Question

Robert Kugler on 24 Apr 2019

3
Link

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/458100-difference-between-readmatrix-and-readtable

Commented: Jim Nicholson on 25 May 2022

Hi community,

After reading that xlsread() is not recommended anymore, I tried to get my excel table read through the readmatrix() function. However, I don't seem to get my excel file read in the correct way. The output that readmatrix gives me is never what what I can read in the excel file. I assume this has something to do with the coding, so I tried to get the output right through the "OutputType" option, but this did not yield satisfactory results either.

Later then, I found the readtable() function, and this seems to give me what I want. However, I still need to understand a bit more why it does so, and that brings me to my basic question:

I read the documentations on both functions and from what I understand the most basic difference between them seems to be that readmatrix() returns an array and readtable() returns a table. But why does this cause me trouble in getting the data from the excel file returned correctly/the way I need it when I use readmatrix()? What is the underlying difference in how Matlab handles the data in this case? And, if readtable() actually turns out to the function that is better suited in this case, for what prupose would readmatrix() be better suited then? Could you give an example?

Note on the data I am intending to import from the excel file: it is a relatively simple database-like structured sheet where each row is an item and each column represents a propoerty of that item (e.g.: item may be a certain material, and the colums can be "price", "weight", etc.). The data contained in the property colums can be of both text or numeric type, or empty.

Thanks in advance for your kind support!

4 Comments
Show 2 older commentsHide 2 older comments

dpb on 24 Apr 2019

Edited: dpb on 24 Apr 2019

Most excellent question, indeed! As remarks in earlier answer indicate, I think it's incomprehensible what TMW's intent is in the longer view and this continual switching horses in midstream is most difficult to keep up with..it's one thing when some new specialized data class shows up in a toolbox and then almost immediately gets picked up in the base product excepting by a different name and syntax (dataset and table in particular come to mind) because at least the one is in the toolbox so may not pollute everybody's code, but when duplicated functionality of this nature and threats of deprecation/potential removal are made in base product functions, that's a whole different kettle of fish.

One thing I'd note in relation to your particular spreadsheet and that you say fields can have variable data or be missing...a weakness in the import object and how readtable works and I presume the new readmatrix is that it uses the data in what if finds to be the first record containing data and not header info to set the type of the variable for the column. Thus, even if a column is actually numeric, if the first record is missing that data, the import data will come in as a cellstr because it couldn't convert that field to a number successfully.

So, even if you use detectimportOptions, you still may need to edit the options object it returns manually to match what you know/want the data types to be.

This is particularly frustrating in a case such as this because two different files may well return different variable types for the same column owing to simply data in the files being different between cases even though both files are perfectly correct.

What it means is that the only real way to use the importoptions object reliably for a class of files is to create one for the file structure and save it for reuse; NOT to scan each file in a sequence of processing files. This adds complexity and another place for code to break or for the corollary options file to get lost and, while helpful in some ways, the introduction of the external object creates additional troubles besides. "There ain't no free lunch!"

Robert Kugler on 6 May 2019

Hi Jeremy, thanks for the additional explanation! This helps a lot

dpb on 6 May 2019

If there are some user tools for defining and then cleaning up import option objects, that's a big step, potentially...(I'm stuck at R2017b for the time being for other reasons so can't go investigate just now).

While it's doable as is, the interface to the object through the supplied functions is very painful to deal with for complex files with large numbers of variables that may have issues such as outlined earlier (been there, done that number of times on the forum). Being able to make such changes interactively to specific variables would be great.

The same caveat as above must still be observed, however, and isn't documented nor discussed at all that I am aware of -- once one does this, one MUST retain that import object for use with all files of the type; rescanning another file may break the association thus so carefully and painfully created.

Sign in to comment.

Sign in to answer this question.

Answer 1

dpb on 24 Apr 2019

1
Link

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/458100-difference-between-readmatrix-and-readtable#answer_371942

Edited: dpb on 26 Apr 2019

It appears readmatrix is limited to returning one type of data in the output array as the 'OutputType' named parameter is limited to a scalar string/cell string. Therefore, there's no way to tell it to return the various types of data that may be in different columns in a spreadsheet(*).

readtable on the other hand, can and does support variables of different type by column and so can handle a spreadsheet similar to your description and is certainly the best option for that type of a spreadsheet when you want all the data and not just a subset of a given data type. A table, however, is limited to have the same number of rows in every variable so if the spreadsheet isn't perfectly regular, it will fill in a bunch of missing values down to the furtherest extent in the spreadsheet unless you limit the range. Sometimes this is ok, other times it's a pain.

xlsread is slow but has the facility to return all of what is in the spreadsheet; it does so by returning numeric and text fields separately, however, whereas a table contains them in one data structure retaining their type (with some help, sometimes from an import object with, particularly, dates). If you want the full spreadsheet unadulterated, the last optional raw output array is a cell array of the whole spreadsheet from which one can retrieve any piece of information from any row/col cell regardless of its type or whether it is consistent with the type of those cells around it. Sometimes, this feature is also invaluable. xlsread is the one option that existed before the introduction of any of the more general data classes except for cell arrays which is why it returns separate variables; there was no other option at the time.

The doc implies readmatrix will return a table if one passes the optional opts structure but I see little reason then for it in lieu of readtable which also can use an opts object.

Seems to me TMW is continuing to just add stuff just to be adding it without any clearly defined (to the user base, anyway) direction of where they're headed and why they don't just improve existing functions instead of overloading namespace and bloating code size with multiple essentially duplicated functionalities.

(*) And, of course, for an array other than a cell array, there can only be one data class for the array, anyway. Without the importoptions optional input, readmatrix just punts and returns only what numeric data it can find without offering the option of returning the text data or the combined that does xlsread. This, somehow, seems a step backwards and why not just rewrite xlsread if want to improve it? Instead, for historical reasons, it can never be actually removed and we've just got yet another eventually more or less orphaned function hanging around like textread (which, incidentally, still has some features that make it more useful on occasion than the favored textscan).

2 Comments
Show NoneHide None

Jeremy Hughes on 3 May 2019

I think what you want is READCELL.

Jim Nicholson on 25 May 2022

I find 'readtable' noticeably slow, and I to would ask why a useful function such as 'xlsread' is being retired. It doesn't benefit me, the user.

Sign in to comment.

Difference between readmatrix() and readtable()

4 Comments
Show 2 older commentsHide 2 older comments

Accepted Answer

2 Comments
Show NoneHide None

More Answers (0)

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

Difference between readmatrix() and readtable()

4 Comments Show 2 older commentsHide 2 older comments

Accepted Answer

2 Comments Show NoneHide None

More Answers (0)

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

4 Comments
Show 2 older commentsHide 2 older comments

2 Comments
Show NoneHide None