[lxml-dev] ET 1.3
Stefan Behnel
stefan_ml at behnel.de
Wed Sep 12 21:59:43 CEST 2007
Ian Bicking wrote:
> I was just reading the ElementTree 1.3 release notes:
>
> http://effbot.org/zone/elementtree-13-intro.htm
Ah, good to know. I already had a few discussions with Fredrik about a couple
of features or changes in lxml.etree or ET 1.3, so both are continuously
getting closer (especially now that parsers are almost compatible :).
> Generally I like the changes. The change from Element as a factory
> function to Element as a subclassable class (akin to ElementBase), is
> nice
Hmm, I'm not even sure we could do that in Cython. Sounds like he's been
playing with __new__, not sure Cython supports that.
> -- I never understood why there was a distinction. Except...
> because "el = Element(tag)" doesn't necessarily mean that "el.__class__
> is Element"...?
At least in lxml that's getting pretty rare these days...
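
For illustration, here is a minimal sketch of what "Element as a subclassable class" looks like. It assumes a Python whose standard library ships ET 1.3 or later as xml.etree.ElementTree; lxml's own mechanism (ElementBase) is separate and may behave differently. The Paragraph class and its word_count() method are made up for this example.

```python
import xml.etree.ElementTree as ET


class Paragraph(ET.Element):
    """Hypothetical subclass, only here to show that Element is a class."""

    def word_count(self):
        # self.text is None for an element without text content
        return len((self.text or "").split())


plain = ET.Element("p")
para = Paragraph("p")
para.text = "hello wide world"

# With Element as a class, the factory/class distinction disappears:
plain_is_element = type(plain) is ET.Element
para_is_element = isinstance(para, ET.Element)
words = para.word_count()
```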
> getiterator to iter is a simple-seeming change. Since getiterator
> actually returns an iterable, not an iterator, it's also just a little
> more accurate. Looks like it also moves to an iterator, not a list.
That's one of the changes Fredrik mentioned a while ago, so lxml.etree already
has it in 1.3.
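
A small sketch of the iter() behaviour discussed above, using the standard library's xml.etree.ElementTree as a stand-in (an assumption; lxml.etree exposes the same method name). The sample document is invented.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b/><c><b/></c></a>")

# iter() hands back an iterator, so elements are produced lazily
walker = root.iter("b")
first = next(walker)   # consume one match
rest = list(walker)    # whatever is left after the first match

# Without a tag argument, iter() walks the whole subtree in document order,
# starting with the element itself
all_tags = [el.tag for el in root.iter()]
```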
> I don't have much of an opinion on the parser and serializer stuff,
> though I'd love it if there was a proper serializer for HTML (not the
> dumb XSLT-based thing I put in lxml.html).
I know. Actually, libxml2 distinguishes between HTML documents and XML
documents internally, so we could already take that as a serialisation hint.
So, if you parse stuff with HTML() or an HTMLParser, you'd get an HTML
document on serialisation, otherwise you'd get an XML document.
I could also imagine something like a separate ElementTree class in lxml.html
that you could wrap any Element in to make sure it gets serialised as plain
HTML (and not XHTML).
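
To show the difference between the two output flavours, here is a rough sketch. It uses the method="html" option of the standard library's xml.etree.ElementTree serializer, not lxml and not the libxml2 document-type hint described above, so take it only as an illustration of XML versus plain HTML output for the same tree.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring("<p>line<br/>two</p>")

# XML serialisation keeps the empty element self-closed
xml_text = ET.tostring(para, encoding="unicode", method="xml")

# HTML serialisation writes void elements such as <br> without a close
html_text = ET.tostring(para, encoding="unicode", method="html")

print(xml_text)
print(html_text)
```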
> I notice that elements now give warnings when treated as booleans. I
> like this a lot, as I've found many bugs in my code where I did "if el"
> where I should have done "if el is not None". And an element with no
> children doesn't feel falsish at all to me. I've actually already taken
> to using len(el) to test for children, just because I can't get myself
> to commit to this weird-seeming behavior.
I guess lxml.etree will just follow suit in 2.0.
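
A short sketch of the two explicit tests Ian describes, again assuming the standard library's xml.etree.ElementTree; the element names are invented. It deliberately never evaluates an element in a boolean context.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><empty/><parent><child/></parent></root>")

empty = root.find("empty")      # exists, but has no children
missing = root.find("nowhere")  # find() returns None when nothing matches
parent = root.find("parent")

# "Does the element exist?" is an identity test against None
empty_exists = empty is not None
missing_exists = missing is not None

# "Does the element have children?" is a length test
empty_children = len(empty)
parent_children = len(parent)
```

An "if empty:" test would conflate the two questions, which is the bug pattern described above.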
I'll also take a look through the other changes. There were a few that I had
not yet heard of. I like the fact that ET 1.3 and lxml 2.0 share a common
alpha phase. That makes additions and learning from each other pretty easy.
Stefan