[lxml-dev] how to get line,col position
Stefan Behnel
stefan_ml at behnel.de
Thu May 7 07:17:23 CEST 2009
Hi,
Mary Lei wrote:
> How can I get dtd.validate to return the
> line, column number for the xhtml in error?
You can't if you use the HTML parser; that's a known bug in libxml2:
http://bugzilla.gnome.org/show_bug.cgi?id=580705
Note that this bug has a patch associated with it, which you can apply to
libxml2 to get what you want.
Otherwise, for parsing XHTML you should use the XML parser anyway, which
will track line numbers correctly.
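A minimal sketch of that approach with lxml.etree, assuming a small made-up DTD and document in place of xhtml1-transitional.dtd and the real XHTML file. The helper name validate_with_lines is hypothetical. It parses with etree.XMLParser (not etree.HTMLParser), validates the finished tree with an etree.DTD object, and reads line numbers from the validator's error_log. Tree nodes only remember a line, so the column in these entries may not be meaningful.

```python
from io import StringIO

from lxml import etree

# Hypothetical stand-ins for xhtml1-transitional.dtd and a real XHTML file.
EXAMPLE_DTD = "<!ELEMENT root (item*)>\n<!ELEMENT item (#PCDATA)>"
EXAMPLE_DOC = b"<root>\n  <item/>\n  <bad/>\n</root>"


def validate_with_lines(xml_bytes, dtd_text):
    """Parse with the XML parser, then validate the tree against a DTD.

    Returns (line, column, message) tuples from the validator's error log.
    Hypothetical helper, not part of lxml.
    """
    parser = etree.XMLParser()  # the XML parser records each element's line
    root = etree.fromstring(xml_bytes, parser)
    dtd = etree.DTD(StringIO(dtd_text))  # user-provided DTD, not referenced by the document
    if dtd.validate(root):
        return []
    return [(entry.line, entry.column, entry.message) for entry in dtd.error_log]


if __name__ == "__main__":
    # Every element parsed by the XML parser knows the line it started on.
    for el in etree.fromstring(EXAMPLE_DOC).iter():
        print(el.tag, el.sourceline)
    for line, col, msg in validate_with_lines(EXAMPLE_DOC, EXAMPLE_DTD):
        print(f"line {line}, col {col}: {msg}")
```

Because the line numbers live on the elements themselves, you can also look up el.sourceline for any element you find interesting after validation.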
> But if I apply xmllint, it gives the same messages but with positional info:
> /home/lei/python-stuff/CoRoTHome.html:83: HTML parser error :
> htmlParseStartTag: invalid element name
> dedicated to asteroseismology of bright stars (typically V<10mag) and
> ^
> /home/lei/python-stuff/CoRoTHome.html:23: element tr: validity error :
> standalone: tr declared in the external subset contains white spaces nodes
> ...
> Document /home/lei/python-stuff/CoRoTHome.html does not validate against
> xhtml1-transitional.dtd
You didn't say whether you used the HTML parser or the XML parser in xmllint.
In any case, xmllint does the DTD validation at parse time, where the line
information is still available. It only gets lost when building the tree,
so a validator that runs on the finished tree cannot report line numbers
anymore.
lxml.etree does not currently support parse-time validation against a
user-provided DTD (i.e. one that is not referenced by the document itself).
Might be worth a bug report.
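For a DTD that the document does reference itself, parse-time validation is available through the dtd_validation option of etree.XMLParser. A rough sketch, assuming a made-up document that carries its DTD in an internal subset; the helper name parse_time_errors is hypothetical. An invalid document is expected to abort the parse with etree.XMLSyntaxError, and the details, including columns taken from the parser position, stay in parser.error_log.

```python
from lxml import etree

# Hypothetical document that declares its own DTD in an internal subset.
DOC_WITH_DTD = b"""<!DOCTYPE root [
<!ELEMENT root (item*)>
<!ELEMENT item (#PCDATA)>
]>
<root>
  <item/>
  <bad/>
</root>"""


def parse_time_errors(xml_bytes):
    """Validate while parsing and return (line, column, message) tuples.

    Hypothetical helper. Only works for a DTD referenced by the document.
    """
    parser = etree.XMLParser(dtd_validation=True)
    try:
        etree.fromstring(xml_bytes, parser)
    except etree.XMLSyntaxError:
        pass  # validity errors abort the parse; the log below keeps the details
    return [(entry.line, entry.column, entry.message) for entry in parser.error_log]


if __name__ == "__main__":
    for line, col, msg in parse_time_errors(DOC_WITH_DTD):
        print(f"line {line}, col {col}: {msg}")
```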
Stefan