[lxml-dev] .base and docinfo.URL

Stefan Behnel stefan_ml at behnel.de
Tue Mar 11 10:35:01 CET 2008


Hi,

this is fixed on the trunk.

Stefan

Stefan Behnel wrote:
> Ian Bicking wrote:
>> Does .base inherit from docinfo.URL?  It doesn't seem like it does.  I 
>> tried changing .base_url to just return self.base, but if I do:
>>
>>  >>> from lxml.html import parse
>>  >>> doc = parse('http://python.org').getroot()
>>  >>> print(doc.base)
>> None
>>  >>> doc.getroottree().docinfo.URL
>> 'http://python.org'
> 
> I just checked the libxml2 source, and it actually behaves completely
> differently for HTML documents. There, it looks for
> 
>     <html><head><base href="...">
> 
> and takes that. It completely ignores the document URL for HTML.
> 
> I think it would be good to override that (directly in etree), so that it
> returns the document URL if nothing is returned from the base search. That
> way, it's consistent with the fallback in XML.
> 
> Stefan
> 
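For readers of the archive, here is a rough sketch of the fallback proposed in the quoted message: use <base href="..."> from the HTML head if there is one, otherwise return the document URL. It is not the code that went into etree. The helper name effective_base is made up for illustration, the example.org URL is a placeholder, and the sketch uses the standard library's ElementTree on well-formed markup so that it runs without lxml installed.

```python
import xml.etree.ElementTree as ET


def effective_base(root, document_url):
    """Hypothetical helper: prefer <head><base href>, else the document URL."""
    # Look for <base> directly under <head>, relative to the <html> root.
    base = root.find("head/base")
    href = base.get("href") if base is not None else None
    # An absent or empty href falls back to the URL the document came from.
    return href or document_url


# Two small well-formed documents: one with a <base> element, one without.
with_base = ET.fromstring(
    '<html><head><base href="http://example.org/docs/"/></head><body/></html>'
)
without_base = ET.fromstring(
    "<html><head><title>t</title></head><body/></html>"
)

print(effective_base(with_base, "http://python.org"))
print(effective_base(without_base, "http://python.org"))
```

With an lxml tree, the second argument would presumably be doc.getroottree().docinfo.URL, as in the session quoted above.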

