[lxml-dev] finding the line number of a parsed element
Stephan Richter
srichter at cosmos.phy.tufts.edu
Fri Mar 16 20:08:41 CET 2007
On Friday 16 March 2007 11:49, Stefan Behnel wrote:
> Hi,
>
> Stephan Richter wrote:
> > I have recently reimplemented RML (Reportlab's XML format to generate
> > PDFs) using lxml. All works well.
>
> Interesting. Any chance you could provide a link?
Sure: http://svn.zope.org/z3c.rml/trunk/src/z3c/rml/
> > Now, I would like to give my users some more information when an error
> > occurs. For a pure XML parsing error, everything is fine (though I found
> > the failure points hard to interpret at times). But what if the XML
> > parses correctly, but while working with the element tree an error
> > occurs? In this case I would like to tell the user not only the error
> > message, but also the line/column and filename of the point of failure.
>
> This sounds a lot like a problem you could try to solve with validation.
No, I cannot, since some stuff cannot be decided until I do Python calls. For
example, I look up colors by names, but this is not a static list.
> > Ideally I would have the filename, start row and start column of each
> > element available as part of the etree Element. I have tried to find this
> > information or hooks for it, unsuccessfully.
>
> There is no API for it, but internally, we have this information for parsed
> trees, at least the line number - note that exceptions contain the line
> number already. So we could easily add a property "_line" to elements that
> returns the line number at which the element was parsed (*if* it was
> parsed). I don't like the fact so much that libxml2 puts a zero there if
> the node was created by hand, but I assume that is not too much of a
> problem either.
I think a zero is no problem. None would be better. :-)
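For reference, lxml releases later than this thread expose the parser's line number as the `sourceline` attribute on elements, and it reports None rather than zero for elements that were never parsed. A minimal sketch of that behaviour follows; the two-line input and its tag names are made up for illustration.

```python
from lxml import etree

# Hypothetical RML-like input; the tag and attribute names are illustrative only.
xml = b"<document>\n  <para color='red'>Hello</para>\n</document>"
root = etree.fromstring(xml)
para = root[0]

# For parsed elements, sourceline is the line on which the start tag appeared.
parsed_line = para.sourceline

# An element created by hand carries no line information.
manual = etree.Element("para")
manual_line = manual.sourceline
```

Here `parsed_line` is 2 and `manual_line` is None, so callers can tell parsed elements from hand-built ones without treating zero as a special case.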
> I personally prefer "_line" over "line", as this only applies to parsed
> elements, not all of them, so this is more of a half-working API.
That would be perfect.
> Additionally, any additional attribute there goes off the list of children
> accessible in objectify.
I don't understand this sentence. :-)
> We could also consider adding an external utility module to provide helpers
> like this that are not really worth polluting the API. Something like
>
> lxml.tools.lineof(element)
That would be icing on the cake; either way is fine. If you consider such a
tool, I would probably call it "parseInfo" or so, where maybe the filename,
endline, and column info is available too.
> Any comments?
How fast can you do this? :-)
Regards,
Stephan
--
Stephan Richter
CBU Physics & Chemistry (B.S.) / Tufts Physics (Ph.D. student)
Web2k - Web Software Design, Development and Training