[Lxml-checkins] r46156 - lxml/trunk/doc

scoder at codespeak.net
Wed Aug 29 14:28:29 CEST 2007


Author: scoder
Date: Wed Aug 29 14:28:29 2007
New Revision: 46156

Added:
   lxml/trunk/doc/lxml2.txt
Modified:
   lxml/trunk/doc/mkhtml.py
Log:
new doc file: what's new in lxml 2.0

Added: lxml/trunk/doc/lxml2.txt
==============================================================================
--- (empty file)
+++ lxml/trunk/doc/lxml2.txt	Wed Aug 29 14:28:29 2007
@@ -0,0 +1,150 @@
+=======================
+What's new in lxml 2.0?
+=======================
+
+.. contents::
+..
+   1  Changes in etree and objectify
+     1.1  Incompatible changes
+     1.2  Enhancements
+     1.3  Other changes
+   2  New modules
+     2.1  lxml.usedoctest
+     2.2  lxml.html
+     2.3  lxml.cssselect
+
+
+During the development of the lxml 1.x series, a couple of quirks were
+discovered in the design that made the API less obvious and its future
+extensions harder than necessary. lxml 2.0 is a soft evolution of lxml 1.x
+towards a simpler, more consistent and more powerful API - with some major
+extensions.  Wherever possible, lxml 1.3 comes close to the semantics of lxml
+2.0, so that migrating should be easier for code that currently runs with 1.3.
+
+
+Changes in etree and objectify
+==============================
+
+A gradual move towards a more consistent API cannot go without a certain
+amount of incompatible changes.  The following is a list of those differences
+that applications need to take into account when migrating from lxml 1.x to
+lxml 2.0.
+
+Incompatible changes
+--------------------
+
+* lxml 0.9 introduced a feature called `namespace implementation`_.  The
+  global ``Namespace`` factory was added to register custom element classes
+  and have lxml.etree look them up automatically.  However, the later
+  development of further class lookup mechanisms made it appear less and less
+  adequate to register this mapping at a global level, so lxml 1.1 first
+  removed the namespace based lookup from the default setup and lxml 2.0
+  finally removes the global namespace registry completely.  Like all other
+  lookup mechanisms, the namespace lookup is now local to a parser, including
+  the registry itself.  Applications that use a module-level parser can easily
+  map its ``get_namespace()`` method to a global ``Namespace`` function to
+  mimic the old behaviour.
+
+  .. _`namespace implementation`: element_classes.html#implementing-namespaces
+
+* XPath now raises exceptions specific to the part of the execution that
+  failed: ``XPathSyntaxError`` for parser errors and ``XPathEvalError`` for
+  errors that occurred during the evaluation.  Note that the distinction only
+  works for the ``XPath()`` class.  The other two evaluators only have a
+  single evaluation call that includes the parsing step, and will therefore
+  only raise an ``XPathEvalError``.  Applications can catch both exceptions
+  through the common base class ``XPathError`` (which also exists in earlier
+  lxml versions).
+
+* Network access in parsers is now disabled by default, i.e. the
+  ``no_network`` option defaults to True.  Due to a somewhat 'interesting'
+  implementation in libxml2, this does not affect the first document (i.e. the
+  URL that is parsed), but only subsequent documents, such as a DTD when
+  parsing with validation.  This means that you will have to check the URL you
+  pass, instead of relying on lxml to prevent *any* access to external
+  resources.  As this can be helpful in some use cases, lxml does not work
+  around it.
+
+* The type annotations in lxml.objectify (the ``pytype`` attribute) now use
+  ``NoneType`` for the None value as this is the correct Python type name.
+  Previously, lxml 1.x used a lower case ``none``.
+
+* Another change in objectify concerns the way it deals with ambiguous types.
+  Previously, setting a value like the string ``"3"`` through normal attribute
+  access would let it come back as an integer when reading the object
+  attribute.  lxml 2.0 prevents this by always setting the ``pytype``
+  attribute to the type the user passed in, so ``"3"`` will come back as a
+  string, while the number ``3`` will come back as a number.  To remove the
+  type annotation on serialisation, you can use the ``deannotate()`` function.
+
+* The C-API function ``findOrBuildNodeNs()`` was replaced by the more generic
+  ``findOrBuildNodeNsPrefix()``.
+
+
+Enhancements
+------------
+
+Most of the enhancements of lxml 2.0 were made under the hood.  Most people
+won't even notice them, but they make the maintenance of lxml easier and thus
+facilitate further enhancements and an improved integration between lxml's
+features.
+
+* lxml.objectify now has its own implementation of the ``E factory``.  It uses
+  the built-in type lookup mechanism of lxml.objectify, thus removing the need
+  for an additional type registry mechanism (as previously available through
+  the ``typemap`` parameter).
+
+* XML entities are supported through the ``Entity()`` factory, an Entity
+  element class and a parser option ``resolve_entities`` that allows keeping
+  entities in the element tree when set to False.  Also, the parser will now
+  report undefined entities as errors if it needs to resolve them (which is
+  still the default, as in lxml 1.x).
+
+* A major part of the XPath code was rewritten and can now benefit from a
+  bigger overlap with the XSLT code.  The main benefits are improved thread
+  safety in the XPath evaluators and Python RegExp support in standard XPath.
+
+
+New modules
+===========
+
+The most visible changes in lxml 2.0 are the new modules that were added.
+
+
+lxml.usedoctest
+---------------
+
+A very useful module for doctests based on XML or HTML is
+``lxml.doctestcompare``.  It provides a relaxed comparison mechanism for XML
+and HTML in doctests.  Using it is as simple as::
+
+    >>> import lxml.usedoctest
+
+for XML comparisons and::
+
+    >>> import lxml.html.usedoctest
+
+for HTML comparisons.
+
+
+lxml.html
+---------
+
+The largest new package that was added to lxml 2.0 is `lxml.html`_.  It
+contains various tools and modules for HTML handling.  The major features
+include support for cleaning up HTML (removing unwanted content), a readable
+HTML diff and various tools for working with links.
+
+.. _`lxml.html`: lxmlhtml.html
+
+
+lxml.cssselect
+--------------
+
+The Cascading Style Sheets language (CSS_) has a very short and generic path
+language for pointing at elements in XML/HTML trees (`CSS selectors`_).  The
+module lxml.cssselect_ provides an implementation based on XPath.
+
+.. _lxml.cssselect: cssselect.html
+.. _CSS: http://www.w3.org/Style/CSS/
+.. _`CSS selectors`: http://www.w3.org/TR/CSS21/selector.html

Modified: lxml/trunk/doc/mkhtml.py
==============================================================================
--- lxml/trunk/doc/mkhtml.py	(original)
+++ lxml/trunk/doc/mkhtml.py	Wed Aug 29 14:28:29 2007
@@ -2,8 +2,8 @@
 import os, shutil, re, sys, copy, time
 
 SITE_STRUCTURE = [
-    ('lxml', ('main.txt', 'intro.txt', 'FAQ.txt', 'compatibility.txt',
-              'performance.txt', 'build.txt')),
+    ('lxml', ('main.txt', 'intro.txt', 'lxml2.txt', 'FAQ.txt',
+              'compatibility.txt', 'performance.txt', 'build.txt')),
     ('Developing with lxml', ('tutorial.txt', 'api.txt', 'parsing.txt',
                               'validation.txt', 'xpathxslt.txt',
                               'objectify.txt', 'lxmlhtml.txt',

