[lxml-dev] .base and docinfo.URL

Stefan Behnel stefan_ml at behnel.de
Tue Mar 11 08:54:03 CET 2008


Hi Ian,

Ian Bicking wrote:
> Does .base inherit from docinfo.URL?  It doesn't seem like it does.  I 
> tried changing .base_url to just return self.base, but if I do:
> 
>  >>> from lxml.html import parse
>  >>> doc = parse('http://python.org').getroot()
>  >>> print doc.base
> None
>  >>> doc.getroottree().docinfo.URL
> 'http://python.org'

I just checked the libxml2 source; it actually behaves completely differently
for HTML documents. Here, it looks for

    <html><head><base href="...">

and takes that. It completely ignores the document URL for HTML.

I think it would be good to override that (directly in etree), so that it
returns the document URL if nothing is returned from the base search. That
way, it's consistent with the fallback in XML.
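
To illustrate the lookup order I have in mind, here is a rough sketch in plain
Python. It is not the etree change itself and does not call libxml2: it uses
the standard library's html.parser as a stand-in, and the names
_BaseHrefFinder and effective_base are made up for this example. It also
accepts a <base> tag anywhere in the document rather than only under
<html><head>, which is a simplification.

```python
from html.parser import HTMLParser


class _BaseHrefFinder(HTMLParser):
    """Hypothetical helper: records the href of the first <base> tag seen."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "base" and self.href is None:
            self.href = dict(attrs).get("href")


def effective_base(html_text, document_url):
    """Return the <base href> if one is present, else the document URL.

    Simplification: the search is not restricted to <html><head>.
    """
    finder = _BaseHrefFinder()
    finder.feed(html_text)
    finder.close()
    # Fall back to the document URL when the base search finds nothing.
    return finder.href or document_url
```

With this, a page without a <base> tag that was parsed from
'http://python.org' would report that URL as its base instead of None.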

Stefan


