[lxml-dev] how to get line,col position
Mary Lei
lei at ipac.caltech.edu
Thu May 7 02:27:10 CEST 2009
How can I get dtd.validate to report the line and
column number of the XHTML that is in error?
Here is my code to validate an xhtml doc
against the dtd using lxml:
from lxml import etree

# no need to write a temp html file
CoRotHomeFile = open('CoRoTHome.html', 'r')
contents = CoRotHomeFile.read()
CoRotHomeFile.close()
# the .ent files are present next to the DTD
dtd1 = etree.DTD(file='xhtml1-transitional.dtd')
etree.clear_error_log()
root1 = etree.HTML(contents)
try:
    # validate() returns True or False; details end up in dtd1.error_log
    rc = dtd1.validate(root1)
except (etree.DTDValidateError, etree.DTDError) as e:
    print("e %s" % e)
print("dtd errors")
num_errors = len(dtd1.error_log)  # renamed so the builtin len() is not shadowed
error = dtd1.error_log[0]
print("line %s" % error.line)
print("column %s" % error.column)
print(dtd1.error_log)
The output is the following, without any position info
(line and column are both 0):
/home/lei/lxml-2.2/src:.:/home/lei/python-stuff/BeautifulSoup-3.1.0.1
<lxml.etree.DTD object at 0x2ede10>
dtd errors
line 0
column 0
<string>:0:0:ERROR:VALID:DTD_STANDALONE_WHITE_SPACE: standalone: tr
declared in the external subset contains white spaces nodes
<string>:0:0:ERROR:VALID:DTD_STANDALONE_WHITE_SPACE: standalone: tr
declared in the external subset contains white spaces nodes
<string>:0:0:ERROR:VALID:DTD_STANDALONE_WHITE_SPACE: standalone: tr
declared in the external subset contains white spaces nodes
<string>:0:0:ERROR:VALID:DTD_STANDALONE_WHITE_SPACE: standalone: tr
declared in the external subset contains white spaces nodes
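Each entry above appears to follow the layout
filename:line:column:level:domain:type: message. As a rough sketch, assuming
that layout holds (the names LogFields and split_log_line are made up for
illustration and are not part of lxml), one such line can be split into its
fields like this:

```python
from collections import namedtuple

# Hypothetical record type; the field names are chosen for illustration only.
LogFields = namedtuple(
    "LogFields", "filename line column level domain type_name message"
)


def split_log_line(text):
    """Split 'file:line:col:LEVEL:DOMAIN:TYPE: message' into its fields."""
    # maxsplit=6 keeps any later colons inside the message itself.
    # Assumption: the filename contains no colon of its own.
    filename, line, column, level, domain, type_name, message = text.split(":", 6)
    return LogFields(
        filename, int(line), int(column), level, domain, type_name, message.strip()
    )


entry = split_log_line(
    "<string>:0:0:ERROR:VALID:DTD_STANDALONE_WHITE_SPACE: standalone: tr "
    "declared in the external subset contains white spaces nodes"
)
print(entry.line, entry.column, entry.type_name)
```

This only makes the zero line and column values easy to see in a script; it
does not recover position info that the log does not contain.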
But if I run xmllint on the same file, it gives the same messages with positional info:
/home/lei/python-stuff/CoRoTHome.html:83: HTML parser error :
htmlParseStartTag: invalid element name
dedicated to asteroseismology of bright stars (typically V<10mag) and
^
/home/lei/python-stuff/CoRoTHome.html:23: element tr: validity error :
standalone: tr declared in the external subset contains white spaces nodes
/home/lei/python-stuff/CoRoTHome.html:93: element tr: validity error :
standalone: tr declared in the external subset contains white spaces nodes
/home/lei/python-stuff/CoRoTHome.html:99: element tr: validity error :
standalone: tr declared in the external subset contains white spaces nodes
...
Document /home/lei/python-stuff/CoRoTHome.html does not validate against
xhtml1-transitional.dtd
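For comparison, the line numbers in the xmllint output above can be pulled
out with a small regular expression. This is only a sketch: it assumes each
diagnostic has the shape "path:line: kind : message" on a single line (the
line breaks shown above may come from mail wrapping), and the names
XMLLINT_LINE and parse_xmllint_line are hypothetical, not part of xmllint or
lxml.

```python
import re

# Assumed shape of one xmllint diagnostic: "<path>:<line>: <kind> : <message>"
XMLLINT_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+): (?P<kind>.+?) : (?P<message>.*)$"
)


def parse_xmllint_line(text):
    """Return (path, line, kind, message), or None if the line does not match."""
    m = XMLLINT_LINE.match(text)
    if m is None:
        # Context lines such as the quoted source text or the '^' marker.
        return None
    return m.group("path"), int(m.group("line")), m.group("kind"), m.group("message")


sample = (
    "/home/lei/python-stuff/CoRoTHome.html:23: element tr: validity error : "
    "standalone: tr declared in the external subset contains white spaces nodes"
)
parsed = parse_xmllint_line(sample)
print(parsed[1], parsed[2])
```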
Thanks.
--
Mary Lei
Software Testing
IPAC-NExScl
Rm: KS-233
MS: 220-6
Phone: 395-1998