[Lxml-checkins] r46156 - lxml/trunk/doc
scoder at codespeak.net
Wed Aug 29 14:28:29 CEST 2007
Author: scoder
Date: Wed Aug 29 14:28:29 2007
New Revision: 46156
Added:
lxml/trunk/doc/lxml2.txt
Modified:
lxml/trunk/doc/mkhtml.py
Log:
new doc file: what's new in lxml 2.0
Added: lxml/trunk/doc/lxml2.txt
==============================================================================
--- (empty file)
+++ lxml/trunk/doc/lxml2.txt Wed Aug 29 14:28:29 2007
@@ -0,0 +1,150 @@
+=======================
+What's new in lxml 2.0?
+=======================
+
+.. contents::
+..
+ 1 Changes in etree and objectify
+ 1.1 Incompatible changes
+ 1.2 Enhancements
+ 2 New modules
+ 2.1 lxml.usedoctest
+ 2.2 lxml.html
+ 2.3 lxml.cssselect
+
+
+During the development of the lxml 1.x series, a couple of quirks were
+discovered in the design that made the API less obvious and its future
+extensions harder than necessary. lxml 2.0 is a soft evolution of lxml 1.x
+towards a simpler, more consistent and more powerful API - with some major
+extensions. Wherever possible, lxml 1.3 comes close to the semantics of lxml
+2.0, so that migrating should be easier for code that currently runs with 1.3.
+
+
+Changes in etree and objectify
+==============================
+
+Moving gradually towards a more consistent API cannot be done without a certain
+amount of incompatible changes. The following list covers the differences that
+applications need to take into account when migrating from lxml 1.x to lxml
+2.0.
+
+Incompatible changes
+--------------------
+
+* lxml 0.9 introduced a feature called `namespace implementation`_. The
+ global ``Namespace`` factory was added to register custom element classes
+ and have lxml.etree look them up automatically. However, the later
+ development of further class lookup mechanisms made it appear less and less
+ adequate to register this mapping at a global level, so lxml 1.1 first
+ removed the namespace based lookup from the default setup and lxml 2.0
+ finally removes the global namespace registry completely. Like all other
+ lookup mechanisms, the namespace lookup is now local to a parser, including
+ the registry itself. Applications that use a module-level parser can easily
+ map its ``get_namespace()`` method to a global ``Namespace`` function to
+ mimic the old behaviour.
+
+ .. _`namespace implementation`: element_classes.html#implementing-namespaces
+
+* XPath now raises exceptions specific to the part of the execution that
+ failed: ``XPathSyntaxError`` for parser errors and ``XPathEvalError`` for
+ errors that occurred during the evaluation. Note that the distinction only
+ works for the ``XPath()`` class. The other two evaluators only have a
+ single evaluation call that includes the parsing step, and will therefore
+ only raise an ``XPathEvalError``. Applications can catch both exceptions
+ through the common base class ``XPathError`` (which also exists in earlier
+ lxml versions).
+
+* Network access in parsers is now disabled by default, i.e. the
+ ``no_network`` option defaults to True. Due to a somewhat 'interesting'
+ implementation in libxml2, this does not affect the first document (i.e. the
+ URL that is parsed), but only subsequent documents, such as a DTD when
+ parsing with validation. This means that you will have to check the URL you
+ pass, instead of relying on lxml to prevent *any* access to external
+ resources. As this can be helpful in some use cases, lxml does not work
+ around it.
+
+* The type annotations in lxml.objectify (the ``pytype`` attribute) now use
+ ``NoneType`` for the None value as this is the correct Python type name.
+ Previously, lxml 1.x used a lower case ``none``.
+
+* Another change in objectify concerns the way it deals with ambiguous types.
+ Previously, setting a value like the string ``"3"`` through normal attribute
+ access would let it come back as an integer when reading the object
+ attribute. lxml 2.0 prevents this by always setting the ``pytype``
+ attribute to the type the user passed in, so ``"3"`` will come back as a
+ string, while the number ``3`` will come back as a number. To remove the
+ type annotation on serialisation, you can use the ``deannotate()`` function.
+
+* The C-API function ``findOrBuildNodeNs()`` was replaced by the more generic
+ ``findOrBuildNodeNsPrefix()``.
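+
+The split between the two XPath exceptions described in the list above can be
+tried out with a minimal sketch like the following. It assumes lxml 2.0 or
+later; ``no_such_function`` is a made-up name whose only purpose is to provoke
+an error at evaluation time.

```python
from lxml import etree

root = etree.XML("<root><a>1</a></root>")

# A malformed expression already fails when the XPath object is compiled.
try:
    etree.XPath("//a[")
except etree.XPathSyntaxError as exc:
    syntax_error = exc

# A call to an unknown function compiles, but fails on evaluation.
# "no_such_function" is a hypothetical name used only to trigger the error.
find = etree.XPath("no_such_function()")
try:
    find(root)
except etree.XPathEvalError as exc:
    eval_error = exc

# Both exceptions can be caught through the common XPathError base class.
both_are_xpath_errors = (
    isinstance(syntax_error, etree.XPathError)
    and isinstance(eval_error, etree.XPathError)
)
```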
+
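+The objectify change regarding ambiguous types might look roughly like this
+in practice. This is only a sketch, assuming lxml 2.0 or later; the child
+names ``as_string`` and ``as_number`` are arbitrary and chosen for
+illustration.

```python
from lxml import objectify

root = objectify.Element("root")
root.as_string = "3"   # assigned as a string
root.as_number = 3     # assigned as a number

# The values come back with the type that was passed in.
string_value = root.as_string.pyval
number_value = root.as_number.pyval

# The type the user passed in is recorded in the pytype attribute.
annotation_before = root.as_string.get(objectify.PYTYPE_ATTRIBUTE)

# deannotate() strips the type annotations, e.g. before serialisation.
objectify.deannotate(root)
annotation_after = root.as_string.get(objectify.PYTYPE_ATTRIBUTE)
```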
+
+Enhancements
+------------
+
+Most of the enhancements of lxml 2.0 were made under the hood. Most people
+won't even notice them, but they make the maintenance of lxml easier and thus
+facilitate further enhancements and an improved integration between lxml's
+features.
+
+* lxml.objectify now has its own implementation of the ``E factory``. It uses
+ the built-in type lookup mechanism of lxml.objectify, thus removing the need
+ for an additional type registry mechanism (as previously available through
+ the ``typemap`` parameter).
+
+* XML entities are supported through the ``Entity()`` factory, an Entity
+ element class and a parser option ``resolve_entities`` that keeps entities
+ in the element tree when set to False. Also, the parser will now
+ report undefined entities as errors if it needs to resolve them (which is
+ still the default, as in lxml 1.x).
+
+* A major part of the XPath code was rewritten and can now benefit from a
+ bigger overlap with the XSLT code. The main benefits are improved thread
+ safety in the XPath evaluators and Python RegExp support in standard XPath.
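+
+As a rough illustration of the entity support mentioned in the list above,
+the following sketch parses the same document with and without entity
+resolution and builds an entity reference by hand. The entity name ``tasty``
+and its replacement text are invented for the example.

```python
from lxml import etree

xml = '<!DOCTYPE root [<!ENTITY tasty "eggs">]><root>&tasty;</root>'

# With resolve_entities=False the reference stays in the tree as an Entity.
kept = etree.fromstring(xml, etree.XMLParser(resolve_entities=False))
entity = kept[0]
entity_name = entity.name
kept_serialised = etree.tostring(kept)

# With entity resolution enabled, the replacement text is inserted instead.
resolved = etree.fromstring(xml, etree.XMLParser(resolve_entities=True))
resolved_text = resolved.text

# The Entity() factory creates an entity reference programmatically.
built = etree.Element("root")
built.append(etree.Entity("tasty"))
built_serialised = etree.tostring(built)
```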
+
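+The regular expression support in XPath can be sketched as follows. Note the
+assumption: the text above does not say how the functions are exposed, and
+this example relies on the EXSLT regular expressions namespace, which is how
+released lxml versions provide them.

```python
from lxml import etree

# Namespace of the EXSLT regular expression functions (an assumption about
# how the RegExp support is reached, see the note above).
REGEXP_NS = "http://exslt.org/regular-expressions"

root = etree.XML("<root><b>xml</b><b>html</b><b>xslt</b></root>")

# Select all <b> elements whose text starts with "x", ignoring case.
matches = root.xpath(
    "//b[re:test(., '^x', 'i')]",
    namespaces={"re": REGEXP_NS},
)
texts = [element.text for element in matches]
```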
+
+New modules
+===========
+
+The most visible changes in lxml 2.0 are the new modules that were added.
+
+
+lxml.usedoctest
+---------------
+
+A very useful module for doctests based on XML or HTML is
+``lxml.doctestcompare``. It provides a relaxed comparison mechanism for XML
+and HTML in doctests. Using it is as simple as::
+
+ >>> import lxml.usedoctest
+
+for XML comparisons and::
+
+ >>> import lxml.html.usedoctest
+
+for HTML comparisons.
+
+
+lxml.html
+---------
+
+The largest new package that was added to lxml 2.0 is `lxml.html`_. It
+contains various tools and modules for HTML handling. The major features
+include support for cleaning up HTML (removing unwanted content), a readable
+HTML diff and various tools for working with links.
+
+.. _`lxml.html`: lxmlhtml.html
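+
+A small sketch of the link tools, assuming the ``make_links_absolute()`` and
+``iterlinks()`` methods of released lxml.html versions. The markup and the
+``example.com`` base URL are placeholders.

```python
import lxml.html

doc = lxml.html.fromstring(
    '<div><a href="/docs/">docs</a> <img src="logo.png"></div>'
)

# Rewrite every link in the fragment relative to a (placeholder) base URL.
doc.make_links_absolute("http://example.com/base/")

# iterlinks() yields (element, attribute, link, pos) tuples in document order.
links = [link for element, attribute, link, pos in doc.iterlinks()]
```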
+
+
+lxml.cssselect
+--------------
+
+Cascading Style Sheets (CSS_) come with a very short and generic path
+language for pointing at elements in XML/HTML trees (`CSS selectors`_). The
+module lxml.cssselect_ provides an implementation based on XPath.
+
+.. _lxml.cssselect: cssselect.html
+.. _CSS: http://www.w3.org/Style/CSS/
+.. _`CSS selectors`: http://www.w3.org/TR/CSS21/selector.html
Modified: lxml/trunk/doc/mkhtml.py
==============================================================================
--- lxml/trunk/doc/mkhtml.py (original)
+++ lxml/trunk/doc/mkhtml.py Wed Aug 29 14:28:29 2007
@@ -2,8 +2,8 @@
import os, shutil, re, sys, copy, time
SITE_STRUCTURE = [
- ('lxml', ('main.txt', 'intro.txt', 'FAQ.txt', 'compatibility.txt',
- 'performance.txt', 'build.txt')),
+ ('lxml', ('main.txt', 'intro.txt', 'lxml2.txt', 'FAQ.txt',
+ 'compatibility.txt', 'performance.txt', 'build.txt')),
('Developing with lxml', ('tutorial.txt', 'api.txt', 'parsing.txt',
'validation.txt', 'xpathxslt.txt',
'objectify.txt', 'lxmlhtml.txt',