[lxml-dev] finding the line number of a parsed element

Stefan Behnel stefan_ml at behnel.de
Fri Mar 16 20:29:13 CET 2007


Hi,

Stephan Richter wrote:
>>> Ideally I would have the filename, start row and start column of each
>>> element available as part of the etree Element. I have tried to find this
>>> information or hooks for it, unsuccessfully.
>> There is no API for it, but internally, we have this information for parsed
>> trees, at least the line number - note that exceptions contain the line
>> number already. So we could easily add a property "_line" to elements that
>> returns the line number at which the element was parsed (*if* it was
>> parsed). I don't much like the fact that libxml2 puts a zero there if
>> the node was created by hand, but I assume that is not too much of a
>> problem either.
> 
> I think a zero is no problem. None would be better. :-)

Problem is: how would you distinguish 'parsed in line 0' from 'not parsed at
all' in this case?


>> Additionally, every new attribute there takes a name away from the children
>> accessible in objectify.
> 
> I don't understand this sentence. :-)

I was talking about lxml.objectify that uses Python object attributes to
access XML element children (sort of like data binding to an object tree).
Every name that is used as a Python attribute of the _Element class shadows
XML children that would otherwise be accessible under that name. Check out the
objectify docs to see what I mean.
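
A toy illustration of the shadowing problem, not lxml.objectify's actual implementation: the hypothetical ChildAccess class resolves unknown attribute names to children with that tag through __getattr__, which Python only calls when normal attribute lookup fails. Any name the class itself defines, such as a stand-in for the proposed _line property, therefore hides a child element with the same tag.

```python
class ChildAccess:
    """Toy objectify-style node: unknown attribute names resolve to
    child nodes with that tag (hypothetical, not lxml.objectify)."""

    def __init__(self, tag, children=()):
        self.tag = tag                   # a real instance attribute
        self._children = list(children)  # child nodes, in document order

    @property
    def _line(self):
        # Stand-in for the proposed "_line" property; always 0 here.
        return 0

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup has failed,
        # so names defined on the class or instance never get here.
        for child in self._children:
            if child.tag == name:
                return child
        raise AttributeError(name)


doc = ChildAccess("root", [ChildAccess("price"),
                           ChildAccess("_line"),
                           ChildAccess("tag")])
print(doc.price.tag)  # → price  (the <price> child is reachable)
print(doc._line)      # → 0      (the property wins; <_line> is hidden)
print(doc.tag)        # → root   (the attribute wins; <tag> is hidden)
```

Every attribute added to the element class widens the set of tag names that can no longer be reached this way.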


>> We could also consider adding an external utility module to provide helpers
>> like this that are not really worth polluting the API with. Something like
>>
>> lxml.tools.lineof(element)
> 
> That would be icing on the cake; either way is fine. If you consider such a
> tool, I would probably call it "parseInfo" or similar, and maybe make the
> filename, endline, and column info available too.

The filename would be available from the document. I don't know what you mean
by "endline" (the last line number?), and the parser column is not available
from libxml2 (at least not once the parser has passed the element...)

So, what about an 'lxml.docinfo' module then that provides helper functions for
this kind of information? I was never really happy with the DocInfo class, so it
might be a good idea to just move such information to a separate module that
people can use if they need it.

I'm pretty confident that there is even more that we could provide at that
level. And it would help us in keeping the already bigger-than-big-enough API
of lxml at least a little smaller.
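
A minimal sketch of what such a separate helper module could look like. It uses only the standard library's expat bindings and ElementTree rather than lxml/libxml2, so it illustrates the design, not lxml's API; the names parse_with_info, parse_info and ParseInfo are hypothetical. Expat reports the line and column while each tag is being processed, and the end tag's line is taken as one reading of "endline". Positions live in a side table keyed by element, so nothing is added to the element's attribute namespace (the objectify concern above), and elements that were not parsed map to None rather than 0.

```python
import xml.parsers.expat
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class ParseInfo(NamedTuple):
    filename: Optional[str]
    line: int     # line of the start tag (expat counts from 1)
    column: int   # column of the start tag (expat counts from 0)
    endline: int  # line of the end tag (same line for <empty/> tags)


def parse_with_info(text, filename=None):
    """Parse XML text; return (root, info), where info maps each parsed
    element to its ParseInfo. The table holds strong references to the
    elements, which is acceptable for a sketch."""
    builder = ET.TreeBuilder()
    info = {}
    starts = []  # stack of (line, column) for currently open elements
    parser = xml.parsers.expat.ParserCreate()

    def start(tag, attrs):
        builder.start(tag, attrs)
        starts.append((parser.CurrentLineNumber, parser.CurrentColumnNumber))

    def end(tag):
        elem = builder.end(tag)
        line, column = starts.pop()
        info[elem] = ParseInfo(filename, line, column, parser.CurrentLineNumber)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = builder.data
    parser.Parse(text, True)
    return builder.close(), info


def parse_info(elem, info):
    """Return the ParseInfo for elem, or None if it was not parsed
    (e.g. created by hand) instead of a 0 placeholder."""
    return info.get(elem)


root, info = parse_with_info("<a>\n  <b/>\n</a>", filename="example.xml")
print(parse_info(root, info))
print(parse_info(ET.Element("c"), info))  # → None
```

Keeping parse_info() a plain function, in the spirit of the lxml.tools.lineof(element) idea above, means the element class gains no new attribute names.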

Stefan

