[lxml-dev] Python 3 changes in lxml 2.1
Stefan Behnel
stefan_ml at behnel.de
Sat May 24 11:55:39 CEST 2008
Hi,
As it currently looks, lxml 2.1 will support Python 2.6 and Python 3 out of
the box.
While fixing up lxml 2.1beta to make this work, I found a couple of things
that I needed to change. Here's an (incomplete) list, so that people can start
shouting at me for breaking their code. ;)
One major change is that, in Py3, the API now always returns unicode strings
for data that is not a byte stream (.text, .tag, namespaces, ...), whereas in
Py2 it continues to return a byte string for plain ASCII data.
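Code that has to cope with both behaviours can normalise the values it reads. The helper below, `to_text()`, is a hypothetical name and not part of lxml; it is a minimal sketch that assumes byte strings only ever carry plain ASCII, as described above for Py2.

```python
def to_text(value, encoding="ascii"):
    """Return value as a text (unicode) string.

    Hypothetical helper, not an lxml API: byte strings (Py2, ASCII-only
    data) are decoded, text strings are returned unchanged, and None
    (e.g. an element without .text) is passed through as-is.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

Callers then only ever deal with one string type, regardless of which Python version produced the value.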
Two things have become a bit quirky now. First, we currently return a subclass
of ElementTree from XSLT, and you can call str(tree) on it to get the
serialised result. In Py3, returning a byte string from __str__() raises an
exception, so str(result) now behaves as unicode(result) did before, i.e. it
returns a Python unicode string. To get the result as a byte string, people
will have to use the new buffer protocol instead (memoryview and friends).
This also means that
bytes(xslt_result)
will work as expected. Sadly, this means that there isn't a portable way to
get the result as a byte string in both Py2 and Py3. I'm thinking about adding
a .tobytes() method, but I'm not sure this is really helpful.
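The underlying Py3 rule can be reproduced without lxml: `__str__()` must return text, while `bytes()` on an object can yield its encoded form. `FakeXSLTResult` and `BytesFromStr` below are hypothetical stand-ins, not lxml classes. lxml provides the byte form through the C-level buffer protocol, whereas this sketch uses the Python 3 `__bytes__()` hook instead.

```python
class FakeXSLTResult(object):
    """Hypothetical stand-in for an XSLT result tree (not lxml's class)."""

    def __init__(self, serialised, encoding="UTF-8"):
        self._encoding = encoding
        self._data = serialised.encode(encoding)  # keep the encoded bytes

    def __str__(self):
        # Py3: __str__() must return text, so decode the stored bytes
        return self._data.decode(self._encoding)

    def __bytes__(self):
        # Py3: bytes(result) returns the encoded serialisation
        return self._data


class BytesFromStr(object):
    """Hypothetical class showing what returning bytes from __str__() does."""

    def __str__(self):
        return b"<out/>"  # Py3 rejects this with a TypeError


result = FakeXSLTResult("<out>\u00e4</out>")
text = str(result)    # a text string: '<out>\u00e4</out>'
data = bytes(result)  # a byte string: b'<out>\xc3\xa4</out>'
print(type(text).__name__, type(data).__name__)  # str bytes

try:
    str(BytesFromStr())
except TypeError as exc:
    print("TypeError:", exc)
```

The split mirrors the description above: text via str(), encoded bytes via bytes().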
The second quirk is serialisation to a unicode string. Since Py3 no longer has
a unicode builtin, instead of
tostring(root, encoding=unicode)
you now have to write
tostring(root, encoding=str)
so this requires source adaptation. Then again, this is (hopefully) a rare
usage, and most Python code will need changes for Py3 anyway. I haven't
checked, but the 2to3 tool should normally take care of this.
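Where the same source must run unchanged on both versions, one option is an alias for the text type that is resolved at import time. The names `text_type` and `tostring_text()` are hypothetical; the sketch assumes, as described above, that lxml accepts `encoding=unicode` in Py2 and `encoding=str` in Py3.

```python
# Pick the text type portably: unicode in Py2, str in Py3.
try:
    text_type = unicode  # noqa: F821 (this name only exists in Py2)
except NameError:
    text_type = str  # Py3: str is the unicode string type


def tostring_text(root):
    """Serialise an lxml element to a text (unicode) string on Py2 and Py3."""
    from lxml import etree  # imported here so the sketch loads without lxml
    return etree.tostring(root, encoding=text_type)
```

With lxml installed, `tostring_text(etree.fromstring("<root>text</root>"))` should then return the markup as a text string on either version.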
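The ugliest problem I found so far is with doctests. There just isn't a way to
write a Py2/Py3 portable doctest that expects exactly a byte string or a
unicode string as output, as their reprs look different in Py2 and Py3 ('abc'
vs. b'abc' for byte strings, u'abc' vs. 'abc' for unicode strings). Also,
exception names are now fully qualified in Py3 tracebacks (e.g.
lxml.etree.XMLSyntaxError instead of XMLSyntaxError), so the expected
tracebacks differ as well. Tons of failing tests for nothing...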
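One possible workaround is to avoid showing reprs in doctests at all: printing or comparing a value produces the same output on both versions. For exceptions, doctest's IGNORE_EXCEPTION_DETAIL flag helps, and since Python 3.2 it also ignores the module prefix of the exception name. The functions `get_tag()`, `parse()`, `run_docstring_tests()` and the `ParseError` exception are hypothetical examples, not lxml code.

```python
import doctest


class ParseError(Exception):
    """Hypothetical module-level exception, standing in for e.g. XMLSyntaxError."""


def get_tag():
    """Return a tag name as text.

    Showing the repr is not portable (the u'...' / b'...' prefixes differ
    between Py2 and Py3), but printing or comparing the value is:

    >>> print(get_tag())
    root
    >>> get_tag() == 'root'
    True
    """
    return "root"


def parse(value):
    """Parse a non-negative integer, raising this module's own exception.

    IGNORE_EXCEPTION_DETAIL ignores the message and, since Python 3.2,
    also the module prefix of the exception name:

    >>> parse("x")  # doctest: +IGNORE_EXCEPTION_DETAIL
    Traceback (most recent call last):
    ParseError: not a number
    """
    if not value.isdigit():
        raise ParseError("not a number: %r" % (value,))
    return int(value)


def run_docstring_tests(func):
    """Run the doctests in func's docstring and return the failure count."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, dict(globals()), func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False).failed
```

This does not remove the need to touch existing tests, but the rewritten examples then pass on both versions.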
Stefan