readtable on xml with single line

InsectSpecies = {'Monarch Butterfly';'Seven-spot Ladybird';'Orchid Mantis'; ...
    'American Bumblebee';'Blue Dasher Dragonfly'};
InsectOrder = {'Lepidoptera';'Coleoptera';'Mantodea';'Hymenoptera';'Odonata'};
InsectFamily = {'Nymphalidae';'Coccinellidae';'Hymenopodidae'; ...
    'Apidae';'Libellulidae'};
PredatoryInsect = logical([0;1;1;0;1]); 
T = table(InsectOrder,InsectFamily,PredatoryInsect);
T.Properties.RowNames = InsectSpecies;
T                                       % display table content
T = 5×3 table
                               InsectOrder        InsectFamily       PredatoryInsect
                             _______________    _________________    _______________

    Monarch Butterfly        {'Lepidoptera'}    {'Nymphalidae'  }         false     
    Seven-spot Ladybird      {'Coleoptera' }    {'Coccinellidae'}         true      
    Orchid Mantis            {'Mantodea'   }    {'Hymenopodidae'}         true      
    American Bumblebee       {'Hymenoptera'}    {'Apidae'       }         false     
    Blue Dasher Dragonfly    {'Odonata'    }    {'Libellulidae' }         true      
writetable(T,'InsectCollectionAll.xml',"WriteRowNames",true,"RowNodeName","Insect")
type InsectCollectionAll.xml            % show what was written
<?xml version="1.0" encoding="UTF-8"?>
<table>
    <Insect Row="Monarch Butterfly">
        <InsectOrder>Lepidoptera</InsectOrder>
        <InsectFamily>Nymphalidae</InsectFamily>
        <PredatoryInsect>false</PredatoryInsect>
    </Insect>
    <Insect Row="Seven-spot Ladybird">
        <InsectOrder>Coleoptera</InsectOrder>
        <InsectFamily>Coccinellidae</InsectFamily>
        <PredatoryInsect>true</PredatoryInsect>
    </Insect>
    <Insect Row="Orchid Mantis">
        <InsectOrder>Mantodea</InsectOrder>
        <InsectFamily>Hymenopodidae</InsectFamily>
        <PredatoryInsect>true</PredatoryInsect>
    </Insect>
    <Insect Row="American Bumblebee">
        <InsectOrder>Hymenoptera</InsectOrder>
        <InsectFamily>Apidae</InsectFamily>
        <PredatoryInsect>false</PredatoryInsect>
    </Insect>
    <Insect Row="Blue Dasher Dragonfly">
        <InsectOrder>Odonata</InsectOrder>
        <InsectFamily>Libellulidae</InsectFamily>
        <PredatoryInsect>true</PredatoryInsect>
    </Insect>
</table>
T=T(1,:);                               % keep only one row
writetable(T,'InsectCollectionOne.xml',"WriteRowNames",true,"RowNodeName","Insect")
type InsectCollectionOne.xml            % show what was written
<?xml version="1.0" encoding="UTF-8"?>
<table>
    <Insect Row="Monarch Butterfly">
        <InsectOrder>Lepidoptera</InsectOrder>
        <InsectFamily>Nymphalidae</InsectFamily>
        <PredatoryInsect>false</PredatoryInsect>
    </Insect>
</table>
clearvars                               % clean workspace
T=readtable('InsectCollectionAll.xml')  % try read table back
T = 5×4 table
         RowAttribute           InsectOrder      InsectFamily      PredatoryInsect
    _______________________    _____________    _______________    _______________

    "Monarch Butterfly"        "Lepidoptera"    "Nymphalidae"          "false"    
    "Seven-spot Ladybird"      "Coleoptera"     "Coccinellidae"        "true"     
    "Orchid Mantis"            "Mantodea"       "Hymenopodidae"        "true"     
    "American Bumblebee"       "Hymenoptera"    "Apidae"               "false"    
    "Blue Dasher Dragonfly"    "Odonata"        "Libellulidae"         "true"     

clearvars

T=readtable('InsectCollectionOne.xml') % try one row read table back

Warning: Unable to automatically detect a table in the file: 'InsectCollectionOne.xml'.

To specify the location of a table in the XML file, use a name-value pair, such as 'TableNodeName' or 'RowSelector'.

Consider using READSTRUCT to import heterogeneous structured data.

T = 0×0 empty table

T=readtable('InsectCollectionOne.xml','TableNodeName','Insect') % try one row read table back

T = 0×0 empty table

Specifying 'TableNodeName' removes the warning, but readtable still doesn't find a table.

There isn't anything different between the two xml files' content other than having only one record in one and more than one in the other; whatever code logic figures out the one should be able to figure out the other.

It appears to me to be a real bug, agreed.

Submit this to MathWorks as an official support request/bug at <Product Support Page>

ADDENDUM

I don't use xml much, so I'm not particularly familiar with expectations, but the parsing also doesn't preserve the type of the logical variable; I don't know whether this should be considered a bug or not.

1 Comment

Kevin Gurney about 14 hours ago

@dpb - regarding your addendum, readtable does not detect logical variables by default, so this is expected (although, this is something we have considered enhancing in the past). To import logical variables, you can use setvartype on the associated XMLImportOptions object to force import of a particular variable as a logical array.
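A minimal sketch of that suggestion, assuming the five-row file from the question and assuming detectImportOptions detects the table in it the same way readtable did above. The file is recreated first so the snippet runs on its own; T2 is just an illustrative variable name.

```matlab
% Recreate the five-row file from the question so the snippet is self-contained.
InsectSpecies = {'Monarch Butterfly';'Seven-spot Ladybird';'Orchid Mantis'; ...
    'American Bumblebee';'Blue Dasher Dragonfly'};
InsectOrder = {'Lepidoptera';'Coleoptera';'Mantodea';'Hymenoptera';'Odonata'};
InsectFamily = {'Nymphalidae';'Coccinellidae';'Hymenopodidae'; ...
    'Apidae';'Libellulidae'};
PredatoryInsect = logical([0;1;1;0;1]);
T = table(InsectOrder,InsectFamily,PredatoryInsect,'RowNames',InsectSpecies);
writetable(T,'InsectCollectionAll.xml',"WriteRowNames",true,"RowNodeName","Insect")

% Detect the import options (an XMLImportOptions object for an .xml file),
% then override the detected type of one variable before importing.
opts = detectImportOptions('InsectCollectionAll.xml');
opts = setvartype(opts,'PredatoryInsect','logical');
T2 = readtable('InsectCollectionAll.xml',opts);
class(T2.PredatoryInsect)               % check the imported type
```

The other variables keep whatever type was detected; only PredatoryInsect is forced.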


Answer 2

Kevin Gurney about 19 hours ago

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/2180584-readtable-on-xml-with-single-line#answer_1571196

We are sorry to hear that you ran into this limitation @Arthur.

It looks like you are encountering a case where the detection algorithm used under the hood by readtable by default isn't able to reliably detect a table.

The detection algorithm looks for "uniform repeating structure" in a file. When the frequency and uniformity of this structure are high, the algorithm is generally more reliable at detecting "tabular" structure. This is why readtable works when there are multiple rows in the file, but does not when there is only one.

The fact that an empty table is returned when you specify "TableNodeName" explicitly is definitely unintuitive. This is something we will look into.

As a workaround, you can specify "RowNodeName" as "Insect" instead of "TableNodeName", and this should reliably detect the table regardless of the number of rows.
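A short sketch of this workaround, using readtable's "RowNodeName" name-value argument on the one-row file from the question. The file is recreated here so the snippet stands alone, and T1 is an illustrative variable name.

```matlab
% Recreate the one-row file from the question.
InsectOrder = {'Lepidoptera'};
InsectFamily = {'Nymphalidae'};
PredatoryInsect = false;
T = table(InsectOrder,InsectFamily,PredatoryInsect, ...
    'RowNames',{'Monarch Butterfly'});
writetable(T,'InsectCollectionOne.xml',"WriteRowNames",true,"RowNodeName","Insect")

% Name the row element explicitly instead of relying on automatic detection.
T1 = readtable('InsectCollectionOne.xml','RowNodeName','Insect')
```

The same call should work unchanged on InsectCollectionAll.xml, since it no longer depends on how many <Insect> elements the file contains.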

You can also specify full "XPath Selectors" to each tabular "component" in the file if you want to be even more precise when importing XML data as a table.

Please refer to the xmlImportOptions documentation for examples of how to specify XPath Selectors:

https://www.mathworks.com/help/matlab/ref/matlab.io.xml.xmlimportoptions.html#mw_f759a73a-720f-4239-9d92-a11187fea1b5
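As a rough illustration of the XPath route, the sketch below passes a "RowSelector" directly to readtable for the one-row file from the question. The selector '/table/Insect' is written for that file's layout. Whether the child elements are then picked up as variables without further help is an assumption here; if they are not, "VariableSelectors" can be added as described in the linked documentation.

```matlab
% Recreate the one-row file from the question.
InsectOrder = {'Lepidoptera'};
InsectFamily = {'Nymphalidae'};
PredatoryInsect = false;
T = table(InsectOrder,InsectFamily,PredatoryInsect, ...
    'RowNames',{'Monarch Butterfly'});
writetable(T,'InsectCollectionOne.xml',"WriteRowNames",true,"RowNodeName","Insect")

% XPath to the row elements: every <Insect> directly under the root <table>.
T1 = readtable('InsectCollectionOne.xml','RowSelector','/table/Insect')
```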

I hope that helps.

2 Comments

dpb about 11 hours ago

@Kevin Gurney - thanks for the background. As noted, I know virtually nothing about the internals of xml files, but naively I'd have thought the fact that the section is labelled

<table>
....
</table>

would have been pertinent?

Kevin Gurney 22 minutes ago

Edited: Kevin Gurney 19 minutes ago

@dpb - that's a very reasonable assumption to make! However, the XML detection algorithm actually doesn't do any kind of "semantic analysis". Rather, it's doing a form of "frequency analysis" where it is essentially just looking for groups of consistent repeating structure in the file. So, the fact that the element is named "table" doesn't help the algorithm identify where the table is. While we could consider leveraging the names of nodes as another heuristic in the algorithm, it starts to become unclear where the algorithm should draw the line (e.g. should "MyTable" also count? What about "tab"? Or "data"?). What we have seen in practice is that tables embedded in an XML file often have names that are "descriptive", like "Animals" or "Cars". Sometimes they happen to contain the name "table", but it is hard to make a broad generalization about how a "table node" is typically named.
