[lxml-dev] lxml 2.1 released
Stefan Behnel
stefan_ml at behnel.de
Wed Jul 9 15:27:21 CEST 2008
Hi,
lxml 2.1 finally made it to PyPI!
This is a major new release that follows the 2.0 series with a couple of
cleanups and tons of new features. The complete changelog follows below.
This is also the first version that officially supports Python 3, as released
in 3.0beta1.
Have fun,
Stefan
2.1 (2008-07-09)
================
Features added
--------------
* Smart strings can be switched off in XPath (``smart_string`` keyword
option).
* ``lxml.html.rewrite_links()`` strips whitespace from links to work
around documents with whitespace in URL attributes.
* Pickling ``ElementTree`` objects in lxml.objectify.
* Major overhaul of ``tools/xpathgrep.py`` script.
* Support for parsing from file-like objects that return unicode
strings.
* New function ``etree.cleanup_namespaces(el)`` that removes unused
namespace declarations from a (sub)tree (experimental).
* XSLT results support the buffer protocol in Python 3.
* Polymorphic functions in ``lxml.html`` that accept either a tree or
a parsable string will return either a UTF-8 encoded byte string, a
unicode string or a tree, based on the type of the input.
Previously, the result was always a byte string or a tree.
* Support for Python 2.6 and 3.0 beta.
* File name handling now uses a heuristic to convert between byte
strings (usually filenames) and unicode strings (usually URLs).
* Parsing from a plain file object frees the GIL under Python 2.x.
* Running ``iterparse()`` on a plain file (or filename) frees the GIL
on reading under Python 2.x.
* Conversion functions ``html_to_xhtml()`` and ``xhtml_to_html()`` in
lxml.html (experimental).
* Most features in lxml.html work for XHTML namespaced tag names
(experimental).
* All parse functions in lxml.html take a ``parser`` keyword argument.
* lxml.html has a new parser class ``XHTMLParser`` and a module
attribute ``xhtml_parser`` that provide XML parsers that are
pre-configured for the lxml.html package.
* Error logging in Schematron (requires libxml2 2.6.32 or later).
* Parser option ``strip_cdata`` for normalising or keeping CDATA
sections. Defaults to ``True`` as before, thus replacing CDATA
sections by their text content.
* ``CDATA()`` factory to wrap string content as CDATA section.
* New event types 'comment' and 'pi' in ``iterparse()``.
* ``XSLTAccessControl`` instances have a property ``options`` that
returns a dict of access configuration options.
* Constant instances ``DENY_ALL`` and ``DENY_WRITE`` on
``XSLTAccessControl`` class.
* Extension elements for XSLT (experimental!).
* ``Element.base`` property returns the xml:base or HTML base URL of
an Element.
* ``docinfo.URL`` property is writable.
Bugs fixed
----------
* Custom resolvers were not used for XMLSchema includes/imports and
XInclude processing.
* CSS selector parser dropped remaining expression after a function
with parameters.
* Descending dot-separated classes in CSS selectors were not resolved
correctly.
* ``ElementTree.parse()`` didn't handle target parser result.
* Potential threading problem in XInclude.
* Crash in Element class lookup classes when the __init__() method of
the super class is not called from Python subclasses.
* A number of problems related to unicode/byte string conversion of
filenames and error messages were fixed.
* Building on MacOS-X now passes the "flat_namespace" option to the C
compiler, which reportedly prevents build quirks and crashes on this
platform.
* Windows build was broken.
* Rare crash when serialising to a file object with certain encodings.
* Incorrect evaluation of ``el.find("tag[child]")``.
* Moving a subtree from a document created in one thread into a
document of another thread could crash when the rest of the source
document is deleted while the subtree is still in use.
* Passing an nsmap when creating an Element will no longer strip
redundantly defined namespace URIs. This prevented the definition
of more than one prefix for a namespace on the same Element.
* Resolving to a filename in custom resolvers didn't work.
* lxml did not honour libxslt's second error state "STOPPED", which
let some XSLT errors pass silently.
* Memory leak in Schematron with libxml2 >= 2.6.31.
* lxml.etree accepted non well-formed namespace prefix names.
* Hanging thread in conjunction with GTK threading.
* Crash bug in iterparse when moving elements into other documents.
* HTML elements' ``.cssselect()`` method was broken.
* ``ElementTree.find*()`` didn't accept QName objects.
* Default encoding for plain text serialisation was different from
that of XML serialisation (UTF-8 instead of ASCII).
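Two of the fixes above concern the ``find*()`` methods. The following sketch shows the behaviour one would expect after the fix; the documents and the ``http://example.com/ns`` namespace are hypothetical and only serve as an illustration.

```python
from lxml import etree

# A predicate such as "item[child]" selects only items that have a <child>.
root = etree.fromstring("<root><item/><item><child/></item></root>")
with_child = root.find("item[child]")

# ElementTree.find*() accepts a QName object in place of a "{ns}tag" string.
ns_root = etree.fromstring('<r xmlns:n="http://example.com/ns"><n:item/></r>')
qname = etree.QName("http://example.com/ns", "item")
found = etree.ElementTree(ns_root).find(qname)
```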
Other changes
-------------
* ``objectify.enableRecursiveStr()`` was removed, use
``objectify.enable_recursive_str()`` instead.
* Speed-up when running XSLTs on documents from other threads.
* Non-ASCII characters in attribute values are no longer escaped on
serialisation.
* Passing non-ASCII byte strings or invalid unicode strings as .tag,
namespaces, etc. will result in a ValueError instead of an
AssertionError (just like the tag well-formedness check).
* Up to several times faster attribute access (i.e. tree traversal) in
lxml.objectify.
* lxml should now build without problems on MacOS-X.
* If the default namespace is redundantly defined with a prefix on the
same Element, the prefix will now be preferred for subelements and
attributes. This allows users to work around a problem in libxml2
where attributes from the default namespace could serialise without
a prefix even when they appear on an Element with a different
namespace (i.e. they would end up in the wrong namespace).
* Major cleanup in internal ``moveNodeToDocument()`` function, which
takes care of namespace cleanup when moving elements between
different namespace contexts.
* New Elements created through the ``makeelement()`` method of an HTML
parser or through lxml.html now end up in a new HTML document
(doctype HTML 4.01 Transitional) instead of a generic XML document.
This mostly impacts the serialisation and the availability of a DTD
context.
* Minor API speed-ups.
* The benchmark suite now uses tail text in the trees, which makes the
absolute numbers incomparable to previous results.
* Generating the HTML documentation now requires Pygments_, which is
used to enable syntax highlighting for the doctest examples.
.. _Pygments: http://pygments.org/
Most long-time deprecated functions and methods were removed:
- ``etree.clearErrorLog()``, use ``etree.clear_error_log()``
- ``etree.useGlobalPythonLog()``, use
``etree.use_global_python_log()``
- ``etree.ElementClassLookup.setFallback()``, use
``etree.ElementClassLookup.set_fallback()``
- ``etree.getDefaultParser()``, use ``etree.get_default_parser()``
- ``etree.setDefaultParser()``, use ``etree.set_default_parser()``
- ``etree.setElementClassLookup()``, use
``etree.set_element_class_lookup()``
Note that ``parser.setElementClassLookup()`` has not been removed
yet, although ``parser.set_element_class_lookup()`` should be used
instead.
- ``xpath_evaluator.registerNamespace()``, use
``xpath_evaluator.register_namespace()``
- ``xpath_evaluator.registerNamespaces()``, use
``xpath_evaluator.register_namespaces()``
- ``objectify.setPytypeAttributeTag``, use
``objectify.set_pytype_attribute_tag``
- ``objectify.setDefaultParser()``, use
``objectify.set_default_parser()``
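Code that still uses the removed camelCase names needs to switch to the underscore spellings. A short sketch of some of the replacements in use, assuming a current lxml installation; the sample documents and the ``http://example.com/ns`` namespace are made up for illustration.

```python
from lxml import etree, objectify

# Formerly etree.clearErrorLog().
etree.clear_error_log()

# Formerly etree.setDefaultParser(); calling it without arguments
# restores the original default parser.
etree.set_default_parser(etree.XMLParser(remove_blank_text=True))
try:
    root = etree.fromstring("<root>  <a/>  </root>")
finally:
    etree.set_default_parser()

# Formerly xpath_evaluator.registerNamespace().
ns_root = etree.fromstring('<r xmlns="http://example.com/ns"><item/></r>')
evaluator = etree.XPathEvaluator(ns_root)
evaluator.register_namespace("n", "http://example.com/ns")
items = evaluator("//n:item")

# Formerly objectify.enableRecursiveStr(); switched off again here so
# that the global setting stays at its default.
objectify.enable_recursive_str(True)
objectify.enable_recursive_str(False)
```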