readtable on xml with single line

Arthur on 14 Oct 2025 at 12:12
Edited: Kevin Gurney 19 minutes ago
I store a table in an XML file with writetable.
When I read this file back with readtable, it parses the XML into a MATLAB table only if the table has multiple rows.
Otherwise, I have to use readstruct and then struct2table to read a one-row table. If I use readtable instead, it returns an empty table.
Is there a trick to use readtable anyway when the table has only one row?
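For reference, the readstruct/struct2table workaround described above might look roughly like the sketch below. The file name OneRow.xml and the sample values are made up for illustration, and the 'AsArray' option is an assumption added so that a scalar struct is treated as a single row.

```matlab
% Build a one-row table and write it to XML (file name is illustrative).
T = table("Lepidoptera","Nymphalidae",false, ...
    'VariableNames',{'InsectOrder','InsectFamily','PredatoryInsect'});
writetable(T,'OneRow.xml',"RowNodeName","Insect")

% Workaround: import the file as a struct, then convert the row node.
S = readstruct('OneRow.xml');                  % S.Insect holds the single row
T1 = struct2table(S.Insect,'AsArray',true);    % treat the scalar struct as one row
disp(T1)
```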
Thank you very much,
Arthur

Answers (2)

dpb on 14 Oct 2025 at 15:17
Edited: dpb on 15 Oct 2025 at 16:06
Using the example to construct a sample table
InsectSpecies = {'Monarch Butterfly';'Seven-spot Ladybird';'Orchid Mantis'; ...
'American Bumblebee';'Blue Dasher Dragonfly'};
InsectOrder = {'Lepidoptera';'Coleoptera';'Mantodea';'Hymenoptera';'Odonata'};
InsectFamily = {'Nymphalidae';'Coccinellidae';'Hymenopodidae'; ...
'Apidae';'Libellulidae'};
PredatoryInsect = logical([0;1;1;0;1]);
T = table(InsectOrder,InsectFamily,PredatoryInsect);
T.Properties.RowNames = InsectSpecies;
T % display table content
T = 5×3 table
                               InsectOrder        InsectFamily       PredatoryInsect
                             _______________    _________________    _______________
    Monarch Butterfly        {'Lepidoptera'}    {'Nymphalidae'  }         false
    Seven-spot Ladybird      {'Coleoptera' }    {'Coccinellidae'}         true
    Orchid Mantis            {'Mantodea'   }    {'Hymenopodidae'}         true
    American Bumblebee       {'Hymenoptera'}    {'Apidae'       }         false
    Blue Dasher Dragonfly    {'Odonata'    }    {'Libellulidae' }         true
writetable(T,'InsectCollectionAll.xml',"WriteRowNames",true,"RowNodeName","Insect")
type InsectCollectionAll.xml % show what was written
<?xml version="1.0" encoding="UTF-8"?>
<table>
    <Insect Row="Monarch Butterfly">
        <InsectOrder>Lepidoptera</InsectOrder>
        <InsectFamily>Nymphalidae</InsectFamily>
        <PredatoryInsect>false</PredatoryInsect>
    </Insect>
    <Insect Row="Seven-spot Ladybird">
        <InsectOrder>Coleoptera</InsectOrder>
        <InsectFamily>Coccinellidae</InsectFamily>
        <PredatoryInsect>true</PredatoryInsect>
    </Insect>
    <Insect Row="Orchid Mantis">
        <InsectOrder>Mantodea</InsectOrder>
        <InsectFamily>Hymenopodidae</InsectFamily>
        <PredatoryInsect>true</PredatoryInsect>
    </Insect>
    <Insect Row="American Bumblebee">
        <InsectOrder>Hymenoptera</InsectOrder>
        <InsectFamily>Apidae</InsectFamily>
        <PredatoryInsect>false</PredatoryInsect>
    </Insect>
    <Insect Row="Blue Dasher Dragonfly">
        <InsectOrder>Odonata</InsectOrder>
        <InsectFamily>Libellulidae</InsectFamily>
        <PredatoryInsect>true</PredatoryInsect>
    </Insect>
</table>
T=T(1,:); % keep only one row
writetable(T,'InsectCollectionOne.xml',"WriteRowNames",true,"RowNodeName","Insect")
type InsectCollectionOne.xml % show what was written
<?xml version="1.0" encoding="UTF-8"?>
<table>
    <Insect Row="Monarch Butterfly">
        <InsectOrder>Lepidoptera</InsectOrder>
        <InsectFamily>Nymphalidae</InsectFamily>
        <PredatoryInsect>false</PredatoryInsect>
    </Insect>
</table>
clearvars % clean workspace
T=readtable('InsectCollectionAll.xml') % try read table back
T = 5×4 table
         RowAttribute           InsectOrder      InsectFamily      PredatoryInsect
    _______________________    _____________    _______________    _______________
    "Monarch Butterfly"        "Lepidoptera"    "Nymphalidae"      "false"
    "Seven-spot Ladybird"      "Coleoptera"     "Coccinellidae"    "true"
    "Orchid Mantis"            "Mantodea"       "Hymenopodidae"    "true"
    "American Bumblebee"       "Hymenoptera"    "Apidae"           "false"
    "Blue Dasher Dragonfly"    "Odonata"        "Libellulidae"     "true"
clearvars
T=readtable('InsectCollectionOne.xml') % try one row read table back
Warning: Unable to automatically detect a table in the file: 'InsectCollectionOne.xml'.

To specify the location of a table in the XML file, use a name-value pair, such as 'TableNodeName' or 'RowSelector'.

Consider using READSTRUCT to import heterogeneous structured data.
T = 0×0 empty table
T=readtable('InsectCollectionOne.xml','TableNodeName','Insect') % try one row read table back
T = 0×0 empty table
Specifying 'TableNodeName' removes the warning, but readtable still doesn't find a table.
There isn't anything different between the two xml files' content other than having only one record in one and more than one in the other; whatever code logic figures out the one should be able to figure out the other.
It appears to me to be a real bug, agreed.
Submit this to MathWorks as an official support request/bug at <Product Support Page>.
ADDENDUM
I don't much use xml so not particularly familiar with expectations, but the parsing also doesn't preserve the type of the logical variable; I don't know whether this should be considered a bug or not(?).
  1 Comment
Kevin Gurney about 14 hours ago
@dpb - regarding your addendum, readtable does not detect logical variables by default, so this is expected (although, this is something we have considered enhancing in the past). To import logical variables, you can use setvartype on the associated XMLImportOptions object to force import of a particular variable as a logical array.
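A minimal sketch of the setvartype approach mentioned above, assuming a multi-row file written the same way as in the answer. The file name Insects.xml is illustrative, and passing "RowNodeName" to detectImportOptions is an added assumption to make the row detection explicit.

```matlab
% Sample data with a logical variable, written to XML (file name is illustrative).
InsectOrder = ["Lepidoptera";"Coleoptera";"Mantodea";"Hymenoptera";"Odonata"];
InsectFamily = ["Nymphalidae";"Coccinellidae";"Hymenopodidae";"Apidae";"Libellulidae"];
PredatoryInsect = logical([0;1;1;0;1]);
T = table(InsectOrder,InsectFamily,PredatoryInsect);
writetable(T,'Insects.xml',"RowNodeName","Insect")

% Detect the import options, then force one variable to be read as logical.
opts = detectImportOptions('Insects.xml',"RowNodeName","Insect");
opts = setvartype(opts,'PredatoryInsect','logical');
T2 = readtable('Insects.xml',opts);
class(T2.PredatoryInsect)    % check the imported type
```

The other variables keep whatever type detection assigned them; only PredatoryInsect is overridden.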



Kevin Gurney about 19 hours ago
We are sorry to hear that you ran into this limitation @Arthur.
It looks like you are encountering a case where the detection algorithm used under the hood by readtable by default isn't able to reliably detect a table.
The detection algorithm looks for "uniform repeating structure" in a file. When the frequency and uniformity of this structure is high, the algorithm is generally more reliable at detecting "tabular" structure. This is why readtable works when there are multiple rows in the file, but does not when there is only one.
The fact that an empty table is returned when you specify "TableNodeName" explicitly is definitely unintuitive. This is something we will look into.
As a workaround, you can specify "RowNodeName" as "Insect" instead of "TableNodeName", and this should reliably detect the table regardless of the number of rows.
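A short sketch of this workaround on a one-row file, using the "RowNodeName" argument of readtable. The file name OneRow.xml and the sample values are illustrative only.

```matlab
% Write a one-row table to XML, naming each row node "Insect".
T = table("Lepidoptera","Nymphalidae",false, ...
    'VariableNames',{'InsectOrder','InsectFamily','PredatoryInsect'});
writetable(T,'OneRow.xml',"RowNodeName","Insect")

% Tell readtable which element represents a row instead of relying on detection.
T1 = readtable('OneRow.xml',"RowNodeName","Insect");
disp(T1)
```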
You can also specify full "XPath Selectors" to each tabular "component" in the file if you want to be even more precise when importing XML data as a table.
Please refer to the xmlImportOptions documentation for examples of how to specify XPath Selectors.
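As a rough illustration of the XPath approach, the sketch below passes "RowSelector" and "VariableSelectors" directly to readtable. The XPath expressions assume the default root node name table and the row node name Insect seen in the files above; the file name is illustrative, and the exact selectors may need adjusting for other files.

```matlab
% Write a one-row table to XML (root node defaults to "table").
T = table("Lepidoptera","Nymphalidae",false, ...
    'VariableNames',{'InsectOrder','InsectFamily','PredatoryInsect'});
writetable(T,'OneRow.xml',"RowNodeName","Insect")

% Select rows and variables explicitly with XPath expressions.
T1 = readtable('OneRow.xml', ...
    "RowSelector","/table/Insect", ...
    "VariableSelectors",["/table/Insect/InsectOrder", ...
                         "/table/Insect/InsectFamily", ...
                         "/table/Insect/PredatoryInsect"]);
disp(T1)
```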
I hope that helps.
  2 Comments
dpb about 11 hours ago
@Kevin Gurney - thanks for the background. As noted, I know virtually nothing about the internals of XML files, but naively I'd have thought that the fact that the section was labelled
<table>
....
</table>
would have been pertinent?
Kevin Gurney 22 minutes ago
Edited: Kevin Gurney 19 minutes ago
@dpb - that's a very reasonable assumption to make! However, the XML detection algorithm actually doesn't do any kind of "semantic analysis". Rather, it's doing a form of "frequency analysis", where it is essentially just looking for groups of consistent repeating structure in the file. So, the fact that the element is named "table" doesn't help the algorithm identify where the table is. While we could consider leveraging the names of nodes as another heuristic in the algorithm, it starts to become unclear where the algorithm should draw the line (e.g. should "MyTable" also count? What about "tab"? Or "data"?). What we have seen in practice is that tables embedded in an XML file often have names that are "descriptive", like "Animals" or "Cars". Sometimes they happen to contain the name "table", but it is hard to make a broad generalization about how a "table node" is typically named.


Release

R2025b
