[lxml-dev] Setting URL from lxml.html.fromstring, etc

Stefan Behnel stefan_ml at behnel.de
Sat Mar 1 09:33:56 CET 2008


Hi Ian,

Ian Bicking wrote:
> OK.  Would the html base attribute then just be a read-only property?
> Like:
> 
>   def base(self):
>       return super(HtmlElement, self).base
>   base = property(base)
>
> I'm not terribly concerned about whether it is read-only or not.  It's a
> little fuzzy, since HTML is parsed to the lxml representation, and
> though it will probably be serialized to HTML again (if it is serialized
> at all) and HTML doesn't have anything like xml:base, the lxml
> representation is not itself exactly HTML.  And if you serialize to
> XHTML, then xml:base is available.

Hmm, true. However, if you use lxml.html, you're likely to stay in the HTML
world, so I would prefer making this read-only. If you really want an xml:base
attribute, you can set it yourself, and if you really want to change the
document URL, it's better to do that explicitly than to set it through an
Element.
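
To make the read-only idea concrete, here is a minimal sketch in plain Python.
It does not use lxml itself: ElementSketch and HtmlElementSketch are
hypothetical stand-ins, and the example URLs are made up. It only shows how a
subclass can re-expose a writable parent property as read-only by defining a
getter and no setter.

```python
class ElementSketch:
    """Hypothetical stand-in for an element class with a writable base URL."""

    def __init__(self, base=None):
        self._base = base

    @property
    def base(self):
        return self._base

    @base.setter
    def base(self, url):
        self._base = url


class HtmlElementSketch(ElementSketch):
    """Hypothetical HTML element class: 'base' is exposed read-only."""

    @property
    def base(self):
        # Delegate the read to the parent property. No setter is defined
        # here, so assigning to .base raises AttributeError.
        return super().base


el = HtmlElementSketch("http://example.org/page.html")
print(el.base)

try:
    el.base = "http://example.org/other.html"
except AttributeError:
    print("base is read-only on the HTML element")
```

The parent class keeps its setter, so code that deliberately works at that
level can still change the value; only the HTML-level property refuses
assignment.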


> Also translating HTML to XHTML is kind of an outstanding issue for
> lxml.html, and it seems reasonable to me that XHTML could be parsed into
> the same classes as HTML.  The only real caveat there is that XHTML uses
> different (namespaced) tag names.  If you remove the tag names, then the
> classes and the lookup applies just fine.  (Presumably the lookup could
> be changed to support XHTML fairly easily.)

That's a different topic, so I think we should discuss that in a separate thread.

Stefan


