[lxml-dev] lxml 2.0alpha5 released

Stefan Behnel stefan_ml at behnel.de
Sat Nov 24 12:51:34 CET 2007


Hi all,

lxml 2.0alpha5 made it to PyPI. This is (hopefully) the last alpha in the
pre-2.0 series, so please report any remaining API quirks, weirdnesses and
bugs now to make sure they get fixed before 2.0 gets its API freeze during the
beta cycle. If all works out well, there should not be more than one beta
release before the final version.

This release features a major overhaul of the target parser, including an
internal SAX parser framework and an ET compatible TreeBuilder implementation.
The complete Changelog follows below.

Note that the API now enforces keyword-only arguments in a couple of places.
This can require some syntactic changes in existing code.

Have fun,
Stefan


2.0alpha5 (2007-11-24)
Features added

    * Rich comparison of element.attrib proxies.
    * ElementTree compatible TreeBuilder class.
    * Use default prefixes for some common XML namespaces.
    * lxml.html.clean.Cleaner now allows for a host_whitelist, and two
      overridable methods: allow_embedded_url(el, url) and the more general
      allow_element(el).
    * Extended slicing of Elements as in element[1:-1:2], both in etree and in
      objectify.
    * Resolvers can now provide a base_url keyword argument when resolving a
      document as string data.
    * When using lxml.doctestcompare you can give the doctest option
      NOPARSE_MARKUP (like # doctest: +NOPARSE_MARKUP) to suppress the special
      checking for one test.
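
As a rough illustration of two of the features above (extended slicing and
comparing element.attrib), a minimal sketch might look like the following.
The sample document and variable names are made up for the example, and the
fallback import of the standard library's ElementTree is only an assumption
so that the snippet also runs where lxml is not installed. lxml.objectify is
not shown.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the stdlib ElementTree API
    import xml.etree.ElementTree as etree

# Hypothetical sample document with five children.
root = etree.fromstring('<root><a/><b id="1"/><c/><d/><e/></root>')

# Extended slice: skip the first and last child, then take every second one.
picked = [child.tag for child in root[1:-1:2]]

# Compare the attribute mapping of <b> against a plain dict.
b = root[1]
same_attrs = (b.attrib == {"id": "1"})

print(picked, same_attrs)
```

The slice selects the children at positions 1 and 3, so only <b> and <d>
are picked here.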

Bugs fixed

    * Target parser failed to report comments.
    * In the lxml.html iter_links() method, links in <object> tags weren't
      recognized. (Note: plugin-specific link parameters still aren't
      recognized.) Also, the <embed> tag, though not standard, is now included
      in lxml.html.defs.special_inline_tags.
    * Using custom resolvers on XSLT stylesheets parsed from a string could
      request ill-formed URLs.
    * With lxml.doctestcompare, if you write <tag xmlns="..."> in your output,
      the comparison is now namespace-neutral (previously, the ellipsis was
      treated as a real namespace).
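
To make the target parser fix more concrete, here is a small sketch of a
parser target that records start tags and comments. The class name
EventCollector and the sample input are hypothetical, and the stdlib
fallback import is an assumption for environments without lxml (there, the
comment() callback needs Python 3.8 or later).

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the stdlib ElementTree API
    import xml.etree.ElementTree as etree


class EventCollector:
    """Hypothetical parser target that records start tags and comments."""

    def __init__(self):
        self.tags = []
        self.comments = []

    def start(self, tag, attrib):
        self.tags.append(tag)

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def comment(self, text):
        # Called once per comment encountered in the input.
        self.comments.append(text)

    def close(self):
        # Whatever close() returns is handed back by parser.close().
        return self.tags, self.comments


parser = etree.XMLParser(target=EventCollector())
parser.feed("<root><!-- note --><item/></root>")
tags, comments = parser.close()
print(tags, comments)
```

The parser builds no tree in this setup; the target alone decides what to
keep from the event stream.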

Other changes

    * The module source files were renamed to "lxml.*.pyx", such as
      "lxml.etree.pyx". This was changed for consistency with the way Pyrex
      commonly handles package imports. The main effect is that classes now
      know about their fully qualified class name, including the package name
      of their module.
    * Some API functions now take keyword-only arguments, especially in the
      parsers and serialisers.
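
The kind of syntactic change this can require is sketched below with a
made-up function. serialise() and its parameters are hypothetical and not
part of lxml; the sketch only shows how a call that passes an option
positionally is rejected once that option is keyword-only, and how the call
has to be rewritten.

```python
def serialise(element_name, *, pretty_print=False, encoding="utf-8"):
    """Hypothetical serialiser whose options are keyword-only."""
    text = "<%s/>" % element_name
    if pretty_print:
        text += "\n"
    return text.encode(encoding)


# Passing the option by keyword works.
ok = serialise("root", pretty_print=True)

# Passing the option positionally raises a TypeError.
try:
    serialise("root", True)
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False

print(ok, positional_rejected)
```

Existing code that passed such options by position would need to name them
explicitly.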


