[lxml-dev] finding the line number of a parsed element

Martijn Faassen faassen at startifact.com
Sat Mar 17 11:24:29 CET 2007


Stefan Behnel wrote:
[snip]
>>> We could also consider adding an external utility module to provide helpers
>>> like this that are not really worth polluting the API. Something like
>>>
>>> lxml.tools.lineof(element)
>> That would be icing on the cake; either way is fine. If you consider such a 
>> tool, I would probably call it "parseInfo" or so, where maybe the filename, 
>> endline, and column info is available too.
> 
> The filename would be available from documents, I don't know what you mean
> with "endline" (the last line number?) and the parser column is not available
> from libxml2 (at least not once the parser has passed the element...)
> 
> So, what about an 'lxml.docinfo' module then that provides this kind of info
> helper functions? I was never really happy with the DocInfo class, so it might
> be a good idea to just move this kind of information to a separate module that
> people can use if they need it.
> 
> I'm pretty confident that there is even more that we could provide at that
> level. And it would help us in keeping the already bigger-than-big-enough API
> of lxml at least a little smaller.

I really think this is overkill. I think an attribute 'line' is fine. 
lxml has an explicit mission to take ElementTree and expand its API with 
more functionality. We do this with namespaces, we do this with xpath, 
and why wouldn't we do this with line numbers? I don't understand how 
line numbers are different.

By the way, even if 0 is used both for line 0 and for elements whose 
line number is unknown, it actually seems possible to distinguish 
between the two! When 'line 0' is found, go backwards in document order 
until a text node containing a newline turns up. If there is one, the 
answer is None. If there is none (and this can be checked quickly), the 
answer is 0. Oh, possibly even more efficient would be to look for 
*another* node. If that node has a line number that's non-0, you know 
you can return None. That would make the 'line' API pretty reliable.
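
A rough sketch of that check, just to make the idea concrete. It does 
not use lxml or libxml2 at all: Node, raw_line and resolve_line are 
made-up names, and the document is modelled as a flat list of nodes in 
document order. For the "other node" shortcut it assumes the node in 
question is an *earlier* one, which is only one way to read the idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parsed node (not an lxml class)."""
    kind: str           # "element" or "text"
    text: str = ""      # character data, only meaningful for text nodes
    raw_line: int = 0   # line as reported by the parser; 0 is ambiguous


def resolve_line(nodes: List[Node], index: int) -> Optional[int]:
    """Return the line of nodes[index], or None if it is unknown."""
    node = nodes[index]
    if node.raw_line != 0:
        return node.raw_line
    # Raw line is 0: walk backwards in document order.
    for earlier in reversed(nodes[:index]):
        # Shortcut: an earlier node already sits on a later line,
        # so this node cannot really be on line 0.
        if earlier.raw_line != 0:
            return None
        # A newline before this node rules out line 0 as well.
        if earlier.kind == "text" and "\n" in earlier.text:
            return None
    # Nothing before this node crossed a line boundary.
    return 0


doc = [
    Node("element"),               # root, really on line 0
    Node("text", "\n  "),          # whitespace with a newline
    Node("element"),               # raw line 0, but cannot be line 0
    Node("element", raw_line=3),   # line number known
]
print(resolve_line(doc, 0))  # 0
print(resolve_line(doc, 2))  # None
print(resolve_line(doc, 3))  # 3
```

In a real tree the backwards walk would have to follow libxml2's node 
links instead of a list, but the decision logic would stay the same.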

Regards,

Martijn


