[lxml-dev] how to get line,col position

Stefan Behnel stefan_ml at behnel.de
Thu May 7 07:17:23 CEST 2009


Hi,

Mary Lei wrote:
> How can I get dtd.validate to return the
> line, column number for the xhtml in error?

You can't if you use the HTML parser; that's a known bug in libxml2:

http://bugzilla.gnome.org/show_bug.cgi?id=580705

Note that this bug has a patch associated with it, which you can apply to
libxml2 to get what you want.

Otherwise, for parsing XHTML you should use the XML parser anyway, which
will track line numbers correctly.
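A rough sketch of the XML-parser route, assuming a current lxml install. The snippet parses a small document with `etree.XMLParser`, validates the finished tree against a user-provided DTD with `etree.DTD` and `DTD.validate()`, and reads `line`/`column` from the entries in `dtd.error_log`. The DTD text, the document and the helper `validate_with_positions` are hypothetical stand-ins for xhtml1-transitional.dtd and the real page. How precise the logged positions are may depend on the libxml2 version, so `Element.sourceline` is also shown as a direct way to locate a node.

```python
import io

from lxml import etree

# Hypothetical stand-in for xhtml1-transitional.dtd, kept in memory so the
# example runs without network access.
DTD_TEXT = """\
<!ELEMENT table (tr+)>
<!ELEMENT tr (td+)>
<!ELEMENT td (#PCDATA)>
"""

# Hypothetical document: the second <tr> (line 3) contains an undeclared <p>.
DOC_TEXT = """\
<table>
  <tr><td>ok</td></tr>
  <tr><p>bad</p></tr>
</table>"""


def validate_with_positions(doc_text, dtd_text):
    """Parse with the XML parser, validate afterwards, collect error positions."""
    dtd = etree.DTD(io.StringIO(dtd_text))
    # The XML parser records a source line on every element it creates.
    root = etree.fromstring(doc_text, etree.XMLParser())
    is_valid = dtd.validate(root)
    # Each log entry has line/column attributes; the column may be 0 when
    # libxml2 only knows the line of the offending element.
    errors = [(e.line, e.column, e.message) for e in dtd.error_log]
    return root, is_valid, errors


root, is_valid, errors = validate_with_positions(DOC_TEXT, DTD_TEXT)
print("valid:", is_valid)
for line, column, message in errors:
    print(f"{line}:{column}: {message}")
# Element.sourceline is a fallback for locating a node yourself.
print("<p> starts on line", root[1][0].sourceline)
```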


> But if I apply xmllint, it gives the same messages but with positional info:
> /home/lei/python-stuff/CoRoTHome.html:83: HTML parser error : 
> htmlParseStartTag: invalid element name
> dedicated to asteroseismology of bright stars (typically V<10mag) and
>                                                             ^
> /home/lei/python-stuff/CoRoTHome.html:23: element tr: validity error : 
> standalone: tr declared in the external subset contains white spaces nodes
> ...
> Document /home/lei/python-stuff/CoRoTHome.html does not validate against 
> xhtml1-transitional.dtd

You didn't say whether you used the HTML parser or the XML parser in
xmllint. In any case, xmllint does the DTD validation at parse time, while
the line information is still available. It only gets lost when building
the tree, so a validator that runs over the finished tree can no longer
report line numbers.

lxml.etree does not currently support parse-time validation against a
user-provided DTD (i.e. one that is not referenced by the document itself).
Might be worth a bug report.
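For the case that is supported, a DTD referenced by the document itself, validation can happen during parsing by passing `dtd_validation=True` to `etree.XMLParser`. An invalid document then makes the parser raise `etree.XMLSyntaxError`, and `parser.error_log` holds the entries from that run together with their positions. A minimal sketch, assuming lxml is installed; the internal DTD subset is a hypothetical stand-in for a real DOCTYPE (so nothing is fetched from the network), and `parse_and_validate` is a hypothetical helper.

```python
from lxml import etree

# Hypothetical document whose DOCTYPE carries an internal subset, so the DTD
# is referenced by the document itself and can be checked at parse time.
DOC_TEXT = """\
<!DOCTYPE table [
  <!ELEMENT table (tr+)>
  <!ELEMENT tr (td+)>
  <!ELEMENT td (#PCDATA)>
]>
<table>
  <tr><td>ok</td></tr>
  <tr><p>bad</p></tr>
</table>"""


def parse_and_validate(doc_text):
    """Return (line, column, message) tuples for errors found while parsing."""
    parser = etree.XMLParser(dtd_validation=True)
    try:
        etree.fromstring(doc_text, parser)
    except etree.XMLSyntaxError:
        # Validation failures abort the parse; the parser keeps the log
        # of its last run, including where each error was detected.
        return [(e.line, e.column, e.message) for e in parser.error_log]
    return []


for line, column, message in parse_and_validate(DOC_TEXT):
    print(f"{line}:{column}: {message}")
```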

Stefan

