DOCTYPE error from xmlread

KAE on 14 Nov 2019
Edited: KAE on 18 Nov 2019
I have been given an XML file and am trying to read it with xmlread. If I call it either of the following ways,
DOMNode = xmlread(fileXml, 'AllowDoctype', true);
DOMNode = xmlread(fileXml, 'AllowDoctype', false);
xmlread throws an error at the line marked below, which is inside xmlread's own source,
try
    parseResult = p.parse(fileObj);
catch ME
    % If trying to parse an XML document containing a DOCTYPE declaration
    % with 'AllowDoctype' set to false, then throw an appropriate error
    % message.
    if isa(ME, 'matlab.exception.JavaException') && ...
            contains(char(ME.ExceptionObject.getLocalizedMessage), ...
            'http://apache.org/xml/features/disallow-doctype-decl')
        error(message('MATLAB:xmlread:DoctypeDisabled', filename));
    end
    rethrow(ME); % the error is thrown here
end

Accepted Answer

KAE on 14 Nov 2019
Edited: KAE on 18 Nov 2019
It turns out this was not an xmlread issue and had nothing to do with AllowDoctype; the problem was in the XML file itself. Here are the details in case they help someone.
The XML file contains international place names. The non-ASCII characters in them appear as question marks, and each question mark also replaces the '<' of the closing tag that follows it. For example, xmlread fails on this line
<field name="geocity">Matar?/field>
for Mataró (in Spain) but not if it is manually edited to
<field name="geocity">Mataro</field>
Incidentally, a good way to find problem lines in an XML file is to open it in a web browser, which reports the line it could not parse. For a long XML file, scroll to the top once it has opened in the browser to see the error message.
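The same kind of damage can also be searched for from MATLAB. The following is only a minimal sketch, assuming the damage always looks like the example above (a '?' sitting where the '<' of a closing tag should be). The sample file and the variable names are made up for illustration; point fileread at your own file instead.

```matlab
% Sketch: flag lines where a '?' sits directly before '/tag>', i.e. the '<'
% of a closing tag was lost together with a non-ASCII character.
% sampleFile and its contents are hypothetical, for illustration only.
sampleFile = [tempname '.xml'];
fid = fopen(sampleFile, 'w');
fprintf(fid, '%s\n', ...
    '<?xml version="1.0"?>', ...
    '<doc>', ...
    '<field name="geocity">Matar?/field>', ...
    '<field name="geocity">Mataro</field>', ...
    '</doc>');
fclose(fid);

% Read the whole file and split it into lines.
txt = fileread(sampleFile);
lines = regexp(txt, '\r?\n', 'split');

% A literal '?' followed by '/name>' marks a damaged closing tag.
% The '?>' that ends the XML declaration does not match this pattern.
hits = regexp(lines, '\?/\w+>', 'once');
badLines = find(~cellfun(@isempty, hits));

% Report each suspicious line with its line number.
for k = badLines
    fprintf('line %d: %s\n', k, lines{k});
end

delete(sampleFile);
```

This only finds the one pattern described here; a file damaged in other ways would need a different search.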
I will mention that the first line of the XML file does not specify the encoding, so a parser assumes UTF-8 unless the file starts with a byte order mark. I believe this can cause problems with non-English characters, but I was never able to find an encoding choice that eliminated the errors,
<?xml version="1.0"?>
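For anyone who wants to test an encoding guess, here is a rough sketch of one way to do it: read the raw bytes, decode them under the guessed encoding, and save a UTF-8 copy with an explicit declaration before calling xmlread. The file names, the sample content, and the ISO-8859-1 guess are all assumptions for illustration. This can only help when the original bytes are still intact; if the file already contains literal '?' characters, as in the line shown earlier, the information is gone and no encoding choice will bring it back.

```matlab
% Sketch: re-decode a file under a guessed encoding and save it as UTF-8
% with an explicit encoding declaration. All names here are hypothetical.
srcFile = [tempname '.xml'];
dstFile = [tempname '.xml'];

% Build a sample file stored as ISO-8859-1 (o-acute is the single byte 243).
body = ['<?xml version="1.0"?>' newline ...
        '<field name="geocity">Matar' char(243) '</field>' newline];
fid = fopen(srcFile, 'w');
fwrite(fid, unicode2native(body, 'ISO-8859-1'), 'uint8');
fclose(fid);

% Read raw bytes so that nothing guesses the encoding on our behalf.
fid = fopen(srcFile, 'r');
bytes = fread(fid, Inf, '*uint8')';
fclose(fid);

% Decode under the guessed encoding (an assumption; try others if it fails).
guess = 'ISO-8859-1';
txt = native2unicode(bytes, guess);

% Replace the declaration with one that states the new encoding.
txt = regexprep(txt, '^<\?xml[^>]*\?>', ...
    '<?xml version="1.0" encoding="UTF-8"?>', 'once');

% Write the text back out as UTF-8 bytes.
fid = fopen(dstFile, 'w');
fwrite(fid, unicode2native(txt, 'UTF-8'), 'uint8');
fclose(fid);

% Parse the converted copy and read the element text back.
doc = xmlread(dstFile);
root = doc.getDocumentElement;
city = char(root.getTextContent);

delete(srcFile);
delete(dstFile);
```

If the guess is right, city holds the place name with its accented character; if it is wrong, the text will come back with the wrong characters or xmlread will still fail.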
