[lxml-dev] Setting URL from lxml.html.fromstring, etc
Stefan Behnel
stefan_ml at behnel.de
Sat Mar 1 09:33:56 CET 2008
Hi Ian,
Ian Bicking wrote:
> OK. Would the html base attribute then just be a read-only property?
> Like:
>
> def base(self):
>     return super(HtmlElement, self).base
> base = property(base)
>
> I'm not terribly concerned about whether it is read-only or not. It's a
> little fuzzy, since HTML is parsed to the lxml representation, and
> though it will probably be serialized to HTML again (if it is serialized
> at all) and HTML doesn't have anything like xml:base, the lxml
> representation is not itself exactly HTML. And if you serialize to
> XHTML, then xml:base is available.
Hmm, true. However, if you use lxml.html, you're likely to stay in the HTML
world, so I would prefer making this read-only. If you really want an xml:base
attribute, you can set it yourself, and if you really want to set the document
URL, it's better to do so explicitly than to set it through an Element.
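
To illustrate the read-only idea, here is a minimal sketch in plain Python. It
does not use lxml: the Element class below is a hypothetical stand-in for a
base class with a writable "base" property, and only the override pattern is
the point. Redefining the property in the subclass with a getter alone makes
assignment fail there, while setting the xml:base attribute explicitly still
works.

```python
# Clark-notation name of the xml:base attribute.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


class Element(object):
    """Hypothetical stand-in for an element class; not lxml's implementation."""

    def __init__(self, tag):
        self.tag = tag
        self.attrib = {}

    def set(self, name, value):
        # Explicitly set an attribute on the element.
        self.attrib[name] = value

    @property
    def base(self):
        # Assumption for this sketch: the base is simply the xml:base attribute.
        return self.attrib.get(XML_BASE)

    @base.setter
    def base(self, url):
        self.attrib[XML_BASE] = url


class HtmlElement(Element):
    @property
    def base(self):
        # Getter only: the subclass property has no setter, so assigning
        # to .base on an HtmlElement raises AttributeError.
        return super(HtmlElement, self).base


el = HtmlElement("div")
try:
    el.base = "http://example.com/"
    assignment_allowed = True
except AttributeError:
    assignment_allowed = False

# Setting the attribute yourself remains possible and is explicit.
el.set(XML_BASE, "http://example.com/")
```

With this layout, reading el.base keeps working through the inherited getter,
and the only way to change the value is the explicit attribute call.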
> Also translating HTML to XHTML is kind of an outstanding issue for
> lxml.html, and it seems reasonable to me that XHTML could be parsed into
> the same classes as HTML. The only real caveat there is that XHTML uses
> different (namespaced) tag names. If you strip the namespace from the tag
> names, then the classes and the lookup apply just fine. (Presumably the lookup could
> be changed to support XHTML fairly easily.)
That's a different topic, so I think we should discuss that in a separate thread.
Stefan