[lxml-dev] ET 1.3

Stefan Behnel stefan_ml at behnel.de
Wed Sep 12 21:59:43 CEST 2007


Ian Bicking wrote:
> I was just reading the ElementTree 1.3 release notes:
> 
>    http://effbot.org/zone/elementtree-13-intro.htm

Ah, good to know. I already had a few discussions with Fredrik about a couple
of features or changes in lxml.etree or ET 1.3, so both are continuously
getting closer (especially now that parsers are almost compatible :).


> Generally I like the changes.  The change from Element as a factory 
> function to Element as a subclassable class (akin to ElementBase), is 
> nice

Hmm, I'm not even sure we could do that in Cython. It sounds like he's been
playing with __new__, and I'm not sure Cython supports that.


> -- I never understood why there was a distinction.  Except... 
> because "el = Element(tag)" doesn't necessarily mean that "el.__class__ 
> is Element"...?

At least in lxml that's getting pretty rare these days...
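
For reference, a minimal sketch of the subclassable Element that Ian
describes. It uses the standard library's xml.etree.ElementTree (which later
incorporated ET 1.3) rather than lxml, and the Paragraph class and its
word_count() method are made up purely for illustration.

```python
import xml.etree.ElementTree as ET


class Paragraph(ET.Element):
    """Hypothetical subclass: Element is a real class here, not a factory."""

    def word_count(self):
        # Count whitespace-separated words in the element's own text.
        return len((self.text or "").split())


p = Paragraph("p")
p.text = "hello lxml world"

# The instance really is of the subclass, and still serialises normally.
is_subclass_instance = type(p) is Paragraph
serialised = ET.tostring(p, encoding="unicode")
```

In lxml the comparable hook is ElementBase, as mentioned in the quoted text.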


> getiterator to iter is a simple seeming change.  Since getiterator 
> actually returns an iterable, not an iterator, it's also just a little 
> more accurate.  Looks like it also moves to an iterator, not a list.

That's one of the changes Fredrik mentioned a while ago, so lxml.etree already
has it in 1.3.
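
A small sketch of what the change means in practice, assuming the standard
library's xml.etree.ElementTree (the sample document is invented for the
example):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b/><c><b/></c></a>")

# iter() walks the tree lazily in document order, optionally filtered by tag.
it = root.iter("b")
first_pass = [el.tag for el in it]

# A second pass over the same object yields nothing: it is a one-shot
# iterator, not a list that can be traversed repeatedly.
second_pass = list(it)
```

Code that needs to walk the result more than once should wrap the call in
list() explicitly.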


> I don't have much of an opinion on the parser and serializer stuff, 
> though I'd love it if there was a proper serializer for HTML (not the 
> dumb XSLT-based thing I put in lxml.html).

I know. Actually, libxml2 distinguishes between HTML documents and XML
documents internally, so we could already take that as a serialisation hint.
So, if you parse stuff with HTML() or an HTMLParser, you'd get an HTML
document on serialisation; otherwise you'd get an XML document.

I could also imagine something like a separate ElementTree class in lxml.html
that you could wrap any Element in to make sure it gets serialised as plain
HTML (and not XHTML).
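
For comparison, ET 1.3 handles this with a method argument on the serialiser.
The sketch below uses the standard library's xml.etree.ElementTree and assumes
its tostring() behaves as in current Python 3. It is not the libxml2
document-type approach described above, and lxml's own behaviour may differ.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>line<br/>break</p>")

# Default XML serialisation keeps the empty element self-closed.
as_xml = ET.tostring(root, encoding="unicode")

# method="html" writes HTML void elements such as <br> without a closing tag.
as_html = ET.tostring(root, encoding="unicode", method="html")
```

Here the choice is made per call, whereas the idea above would take the hint
from how the document was parsed.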


> I notice that elements now give warnings when treated as booleans.  I 
> like this a lot, as I've found many bugs in my code where I did "if el" 
> where I should have done "if el is not None".  And an element with no 
> children doesn't feel falsish at all to me.  I've actually already taken 
> to using len(el) to test for children, just because I can't get myself 
> to commit to this weird-seeming behavior.

I guess lxml.etree will just follow in 2.0.
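
The two explicit tests Ian mentions can be sketched as follows, again with the
standard library's xml.etree.ElementTree and an invented sample document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><empty/><full><child/></full></root>")

empty = root.find("empty")    # exists, but has no children
missing = root.find("nope")   # find() returns None when nothing matches

# "Was the element found?" is a None check, not a truth test.
found_empty = empty is not None
found_missing = missing is not None

# "Does it have children?" is a length check.
empty_has_children = len(empty) > 0
full_has_children = len(root.find("full")) > 0
```

Writing "if empty:" here would skip an element that was in fact found, which
is exactly the class of bug described in the quoted text.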

I'll also take a look through the other changes. There were a few that I had
not yet heard of. I like the fact that ET 1.3 and lxml 2.0 share a common
alpha phase. That makes additions and learning from each other pretty easy.

Stefan

