From scoder at codespeak.net Sat Mar 1 17:59:51 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 1 Mar 2008 17:59:51 +0100 (CET)
Subject: [Lxml-checkins] r52006 - in lxml/trunk: . doc
Message-ID: <20080301165951.77B341684EC@codespeak.net>

Author: scoder
Date: Sat Mar 1 17:59:49 2008
New Revision: 52006

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/doc/FAQ.txt
Log:
 r3651 at delle: sbehnel | 2008-03-01 17:00:46 +0100
 clarification on MacOS-X crashes

Modified: lxml/trunk/doc/FAQ.txt
==============================================================================
--- lxml/trunk/doc/FAQ.txt (original)
+++ lxml/trunk/doc/FAQ.txt Sat Mar 1 17:59:49 2008
@@ -408,15 +408,15 @@
    the FAQ section on threading_ to check if you touch on one of the
    potential pitfalls.

-b) If you are on Mac-OS X, make sure lxml uses the correct libraries. If you
-   have updated the old system libraries (e.g. through fink), this is best
-   achieved by building lxml statically to prevent the different library
-   versions from interfering. If you choose to use a dynamically linked
-   version, make sure the ``DYLD_LIBRARY_PATH`` environment variable contains
-   the directory where you installed the libraries. To make sure the correct
-   libraries are used, print the module level version numbers that
-   ``lxml.etree`` provides from *within* your application rather than relying
-   on what your operating system tells you.
+b) If you are on Mac-OS X, make sure lxml uses the correct libraries.
+   Since the normal system libraries are pretty much outdated, you
+   likely have installed newer versions through a package management
+   system like fink or macports. In this case, please make sure the
+   ``DYLD_LIBRARY_PATH`` environment variable contains the directory
+   where you installed the libraries. There are other Python packages
+   that depend on libxml2, so it is up to you to make sure that *all*
+   packages that dynamically load libxml2 load the *same* library
+   version. Loading conflicting versions *will* lead to a crash.

 In any case, try to reproduce the problem with the latest versions of
 libxml2 and libxslt. From time to time, bugs and race conditions are found

From scoder at codespeak.net Sat Mar 1 17:59:55 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 1 Mar 2008 17:59:55 +0100 (CET)
Subject: [Lxml-checkins] r52007 - in lxml/trunk: . doc
Message-ID: <20080301165955.934AB1684EC@codespeak.net>

Author: scoder
Date: Sat Mar 1 17:59:55 2008
New Revision: 52007

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/doc/FAQ.txt
Log:
 r3652 at delle: sbehnel | 2008-03-01 17:58:57 +0100
 more clarifications on MacOS-X crashes

Modified: lxml/trunk/doc/FAQ.txt
==============================================================================
--- lxml/trunk/doc/FAQ.txt (original)
+++ lxml/trunk/doc/FAQ.txt Sat Mar 1 17:59:55 2008
@@ -396,33 +396,53 @@
 My application crashes!
 -----------------------

-One of the goals of lxml is "no segfaults", so if there is no clear warning in
-the documentation that you were doing something potentially harmful, you have
-found a bug and we would like to hear about it. Please report this bug to the
-`mailing list`_. See the next section on how to do that.
-
-However, there are a few things to try first, to make sure the problem is
-really within lxml (or libxml2 or libxslt):
-
-a) If your application (or e.g. your web container) uses threads, please see
-   the FAQ section on threading_ to check if you touch on one of the
-   potential pitfalls.
-
-b) If you are on Mac-OS X, make sure lxml uses the correct libraries.
-   Since the normal system libraries are pretty much outdated, you
-   likely have installed newer versions through a package management
-   system like fink or macports. In this case, please make sure the
-   ``DYLD_LIBRARY_PATH`` environment variable contains the directory
-   where you installed the libraries. There are other Python packages
-   that depend on libxml2, so it is up to you to make sure that *all*
-   packages that dynamically load libxml2 load the *same* library
-   version. Loading conflicting versions *will* lead to a crash.
+One of the goals of lxml is "no segfaults", so if there is no clear
+warning in the documentation that you were doing something potentially
+harmful, you have found a bug and we would like to hear about it.
+Please report this bug to the `mailing list`_. See the section on bug
+reporting to learn how to do that.
+
+If your application (or e.g. your web container) uses threads, please
+see the FAQ section on threading_ to check if you touch on one of the
+potential pitfalls.

 In any case, try to reproduce the problem with the latest versions of
 libxml2 and libxslt. From time to time, bugs and race conditions are found
 in these libraries, so a more recent version might already contain a
 fix for your problem.

+Remember: even if you see lxml appear in a crash stack trace, it is
+not necessarily lxml that *caused* the crash.
+
+
+My application crashes on MacOS-X!
+----------------------------------
+
+Since the normal system libraries are pretty much outdated, you likely
+have installed newer versions through a package management system like
+fink or macports in addition to the system libraries. Chances are
+high that your system is confused by the conflicting library versions.
+
+To work around this, please set the ``DYLD_LIBRARY_PATH`` environment
+variable *at runtime* to the directory where you installed the newer
+libraries. There are other Python packages that depend on libxml2, so
+it is up to you to make sure that *all* packages that dynamically load
+libxml2 load the *same* library version. Loading conflicting versions
+*will* lead to a crash and has confused a lot of MacOS users already.
+
+Please understand that if your system uses conflicting library
+versions, there is nothing lxml can do about it. It is up to you as a
+user to make sure you have a sane execution environment.
+
+See `bug 197243`_ for more information.
+
+.. _`bug 197243`: https://bugs.launchpad.net/lxml/+bug/197243
+
+If you want a sane, reliable execution environment, especially for
+production systems, `using a buildout`_ might be a good idea.
+
+.. _`using a buildout`: http://comments.gmane.org/gmane.comp.python.lxml.devel/3297?set_lines=100000
+

 I think I have found a bug in lxml. What should I do?
 -----------------------------------------------------
@@ -604,11 +624,13 @@
   more robust against possible pitfalls. So newer versions might
   already fix your problem in a reliable way.

-* make sure the library versions you installed are really used. Do not rely
-  on what your operating system tells you! Print the version constants in
-  ``lxml.etree`` from within your runtime environment to make sure it is the
-  case. This is especially a problem under MacOS-X when newer library
-  versions were installed in addition to the outdated system libraries.
+* make sure the library versions you installed are really used. Do
+  not rely on what your operating system tells you! Print the version
+  constants in ``lxml.etree`` from within your runtime environment to
+  make sure it is the case. This is especially a problem under
+  MacOS-X when newer library versions were installed in addition to
+  the outdated system libraries. Please read the bugs section
+  regarding MacOS-X in this FAQ.

 * if you use ``mod_python``, try setting this option:

From scoder at codespeak.net Sun Mar 2 09:31:22 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 2 Mar 2008 09:31:22 +0100 (CET)
Subject: [Lxml-checkins] r52025 - in lxml/trunk: . src/lxml src/lxml/tests
Message-ID: <20080302083122.37468168538@codespeak.net>

Author: scoder
Date: Sun Mar 2 09:31:20 2008
New Revision: 52025

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/TODO.txt
   lxml/trunk/src/lxml/readonlytree.pxi
   lxml/trunk/src/lxml/tests/test_xslt.py
   lxml/trunk/src/lxml/xslt.pxd
Log:
 r3664 at delle: sbehnel | 2008-03-02 08:56:19 +0100
  r3650 at delle: sbehnel | 2008-02-29 21:39:21 +0100
  initial import: will use read-only elements to access the XSLT tree, the input tree and the output tree

Modified: lxml/trunk/TODO.txt
==============================================================================
--- lxml/trunk/TODO.txt (original)
+++ lxml/trunk/TODO.txt Sun Mar 2 09:31:20 2008
@@ -45,6 +45,22 @@
   by libxml2 (patch exists)


+XSLT extension elements
+-----------------------
+
+* implementation: one base class that represents the result parent
+
+  - .append(), .extend() and .text will add to the result tree (no .tail)
+
+  - difference: Elements should be copied, not moved? (will break
+    later changes, but this just means that Elements in the result
+    tree are immutable, including those that were added)
+
+  - how to make input tree read-only? maybe just document?
+
+  - docs: "once in the result tree, Elements must no longer be changed"?
+
+
 lxml 2.0
 ========
Modified: lxml/trunk/src/lxml/readonlytree.pxi
==============================================================================
--- lxml/trunk/src/lxml/readonlytree.pxi (original)
+++ lxml/trunk/src/lxml/readonlytree.pxi Sun Mar 2 09:31:20 2008
@@ -207,17 +207,21 @@
     cdef _ReadOnlyElementProxy NEW_RO_PROXY "PY_NEW" (object t)

 cdef _ReadOnlyElementProxy _newReadOnlyProxy(
-    _ReadOnlyElementProxy sourceProxy, xmlNode* c_node):
+    _ReadOnlyElementProxy source_proxy, xmlNode* c_node):
     cdef _ReadOnlyElementProxy el
     el = NEW_RO_PROXY(_ReadOnlyElementProxy)
     el._c_node = c_node
-    if sourceProxy is None:
+    _initReadOnlyProxy(el, source_proxy)
+    return el
+
+cdef inline _initReadOnlyProxy(_ReadOnlyElementProxy el,
+                               _ReadOnlyElementProxy source_proxy):
+    if source_proxy is None:
         el._source_proxy = el
         el._dependent_proxies = [el]
     else:
-        el._source_proxy = sourceProxy
-        python.PyList_Append(sourceProxy._dependent_proxies, el)
-    return el
+        el._source_proxy = source_proxy
+        python.PyList_Append(source_proxy._dependent_proxies, el)

 cdef _freeReadOnlyProxies(_ReadOnlyElementProxy sourceProxy):
     cdef _ReadOnlyElementProxy el
@@ -228,3 +232,71 @@
     for el in sourceProxy._dependent_proxies:
         el._c_node = NULL
     del sourceProxy._dependent_proxies[:]
+
+
+cdef class _ReadOnlyRootElementProxy(_ReadOnlyElementProxy):
+    """A read-only element that frees the subtree on deallocation.
+    """
+    def __dealloc__(self):
+        if self._c_node is not NULL:
+            tree.xmlFreeNode(self._c_node)
+
+cdef class _AppendOnlyElementProxy(_ReadOnlyElementProxy):
+    """A read-only element that allows adding children and changing the
+    text content (i.e. everything that adds to the subtree).
+    """
+    cpdef append(self, other_element):
+        """Append a copy of an Element to the list of children.
+        """
+        cdef xmlNode* c_next
+        cdef xmlNode* c_node
+        self._assertNode()
+        c_node = _roNodeOf(other_element)
+        c_node = _copyNodeToDoc(c_node, self._c_node.doc)
+        c_next = c_node.next
+        tree.xmlAddChild(self._c_node, c_node)
+        _moveTail(c_next, c_node)
+
+    def extend(self, elements):
+        """Append a copy of all Elements from a sequence to the list of
+        children.
+        """
+        self._assertNode()
+        for element in elements:
+            self.append(element)
+
+    property text:
+        """Text before the first subelement. This is either a string or the
+        value None, if there was no text.
+        """
+        def __get__(self):
+            self._assertNode()
+            return _collectText(self._c_node.children)
+
+        def __set__(self, value):
+            self._assertNode()
+            if isinstance(value, QName):
+                value = python.PyUnicode_FromEncodedObject(
+                    _resolveQNameText(self, value), 'UTF-8', 'strict')
+            _setNodeText(self._c_node, value)
+
+cdef _AppendOnlyElementProxy _newAppendOnlyProxy(
+    _ReadOnlyElementProxy source_proxy, xmlNode* c_node):
+    cdef _AppendOnlyElementProxy el
+    el = <_AppendOnlyElementProxy>NEW_RO_PROXY(_AppendOnlyElementProxy)
+    el._c_node = c_node
+    _initReadOnlyProxy(el, source_proxy)
+    return el
+
+cdef xmlNode* _roNodeOf(element) except NULL:
+    cdef xmlNode* c_node
+    if isinstance(element, _Element):
+        c_node = (<_Element>element)._c_node
+    elif isinstance(element, _ReadOnlyElementProxy):
+        c_node = (<_ReadOnlyElementProxy>element)._c_node
+    else:
+        raise TypeError("invalid value to append()")
+
+    if c_node is NULL:
+        raise TypeError("invalid element")
+    return c_node
Modified: lxml/trunk/src/lxml/tests/test_xslt.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_xslt.py (original)
+++ lxml/trunk/src/lxml/tests/test_xslt.py Sun Mar 2 09:31:20 2008
@@ -604,6 +604,26 @@
         self.assertEquals(self._rootstring(result),
                           'X')

+    def test_extension_element(self):
+        tree = self.parse('B')
+        style = self.parse('''\
+
+
+    b
+
+''')
+
+        class mytext(etree.XSLTExtension):
+            pass
+
+        result = tree.xslt(style, extensions={})
+        self.assertEquals(self._rootstring(result),
+                          'X')
     def test_xslt_document_XML(self):
         # make sure document('') works from parsed strings
         xslt = etree.XSLT(etree.XML("""\
Modified: lxml/trunk/src/lxml/xslt.pxd
==============================================================================
--- lxml/trunk/src/lxml/xslt.pxd (original)
+++ lxml/trunk/src/lxml/xslt.pxd Sun Mar 2 09:31:20 2008
@@ -1,4 +1,4 @@
-from tree cimport xmlDoc, xmlDict
+from tree cimport xmlDoc, xmlNode, xmlDict
 from xpath cimport xmlXPathContext, xmlXPathFunction

 cdef extern from "libxslt/xslt.h":
@@ -22,6 +22,11 @@
         void* _private
         xmlDict* dict
         int profile
+        xmlNode* node
+        xmlDoc* output
+        xmlNode* insert
+
+    ctypedef struct xsltStackElem

     cdef xsltStylesheet* xsltParseStylesheetDoc(xmlDoc* doc) nogil
     cdef void xsltFreeStylesheet(xsltStylesheet* sheet) nogil
@@ -59,6 +64,9 @@
                                 char** params, char* output,
                                 void* profile,
                                 xsltTransformContext* context) nogil
+    cdef void xsltProcessOneNode(xsltTransformContext* ctxt,
+                                 xmlNode* contextNode,
+                                 xsltStackElem* params)
     cdef xsltTransformContext* xsltNewTransformContext(xsltStylesheet* style,
                                                        xmlDoc* doc) nogil
     cdef void xsltFreeTransformContext(xsltTransformContext* context) nogil

From scoder at codespeak.net Sun Mar 2 09:31:28 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 2 Mar 2008 09:31:28 +0100 (CET)
Subject: [Lxml-checkins] r52026 - in lxml/trunk: . doc src/lxml src/lxml/tests
Message-ID: <20080302083128.96BF3168539@codespeak.net>

Author: scoder
Date: Sun Mar 2 09:31:28 2008
New Revision: 52026

Added:
   lxml/trunk/src/lxml/xsltext.pxi
Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/CHANGES.txt
   lxml/trunk/doc/extensions.txt
   lxml/trunk/doc/xpathxslt.txt
   lxml/trunk/src/lxml/lxml.etree.pyx
   lxml/trunk/src/lxml/readonlytree.pxi
   lxml/trunk/src/lxml/tests/test_xslt.py
   lxml/trunk/src/lxml/xslt.pxd
   lxml/trunk/src/lxml/xslt.pxi
Log:
 r3665 at delle: sbehnel | 2008-03-02 08:56:20 +0100
  r3655 at delle: sbehnel | 2008-03-01 22:27:17 +0100
  partial reimplementation of the extension element mechanism, works now

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Sun Mar 2 09:31:28 2008
@@ -8,6 +8,8 @@
 Features added
 --------------

+* Extension elements for XSLT
+
 * ``Element.base`` property returns the xml:base or HTML base URL of
   an Element.
Modified: lxml/trunk/doc/extensions.txt
==============================================================================
--- lxml/trunk/doc/extensions.txt (original)
+++ lxml/trunk/doc/extensions.txt Sun Mar 2 09:31:28 2008
@@ -1,15 +1,24 @@
-Extension functions for XPath and XSLT
-======================================
+Python extensions for XPath and XSLT
+====================================

-This document describes how to use Python extension functions in XPath and
-XSLT. They allow you to do things like this::
+This document describes how to use Python extension functions in XPath
+and XSLT like this::

-Here is how such a function looks like. As the first argument, it always
-receives a context object (see below). The other arguments are provided by
-the respective call in the XPath expression, one in the following examples.
-Any number of arguments is allowed::
+It also describes how to use Python extension elements in XSLT like
+this::
+
+
+
+
+
+
+
+Here is how an extension function looks like. As the first argument,
+it always receives a context object (see below). The other arguments
+are provided by the respective call in the XPath expression, one in
+the following examples. Any number of arguments is allowed::

     >>> def hello(dummy, a):
     ...     return "Hello %s" % a
@@ -18,14 +27,23 @@
     >>> def loadsofargs(dummy, *args):
     ...     return "Got %d arguments." % len(args)

+And here is how an extension element looks like::
+
+    >>> from lxml import etree
+    >>> class MyExtElement(etree.XSLTExtension):
+    ...     def execute(self, context, self_node, input_node, output_parent):
+    ...         # just copy own content input to output
+    ...         output_parent.extend( list(self_node) )
+
 .. contents::

 ..
    1 The FunctionNamespace
    2 Global prefix assignment
-   3 Evaluators and XSLT
-   4 Evaluator-local extensions
-   5 What to return from a function
+   3 The XPath context
+   4 Evaluators and XSLT
+   5 Evaluator-local extensions
+   6 What to return from a function


 The FunctionNamespace
@@ -36,7 +54,6 @@
 FunctionNamespace class. For simplicity, we choose the empty namespace
 (None)::

-    >>> from lxml import etree
     >>> ns = etree.FunctionNamespace(None)
     >>> ns['hello'] = hello
     >>> ns['countargs'] = loadsofargs
Modified: lxml/trunk/doc/xpathxslt.txt
==============================================================================
--- lxml/trunk/doc/xpathxslt.txt (original)
+++ lxml/trunk/doc/xpathxslt.txt Sun Mar 2 09:31:28 2008
@@ -454,6 +454,14 @@
     '\nText\n'


+Extension elements
+------------------
+
+Just like `custom extension functions`_, lxml supports custom
+extension *elements*.
+
+
+
 The ``xslt()`` tree method
 --------------------------
Modified: lxml/trunk/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.etree.pyx (original)
+++ lxml/trunk/src/lxml/lxml.etree.pyx Sun Mar 2 09:31:28 2008
@@ -2578,9 +2578,15 @@
 include "iterparse.pxi" # incremental XML parsing
 include "xmlid.pxi" # XMLID and IDDict
 include "xinclude.pxi" # XInclude
+
+
+################################################################################
+# Include submodules for XPath and XSLT
+
 include "extensions.pxi" # XPath/XSLT extension functions
 include "xpath.pxi" # XPath evaluation
 include "xslt.pxi" # XSL transformations
+include "xsltext.pxi" # XSL extension elements


 ################################################################################
Modified: lxml/trunk/src/lxml/readonlytree.pxi
==============================================================================
--- lxml/trunk/src/lxml/readonlytree.pxi (original)
+++ lxml/trunk/src/lxml/readonlytree.pxi Sun Mar 2 09:31:28 2008
@@ -2,6 +2,7 @@

 cdef class _ReadOnlyElementProxy:
     "The main read-only Element proxy class (for internal use only!)."
+    cdef bint _free_after_use
     cdef xmlNode* _c_node
     cdef object _source_proxy
     cdef object _dependent_proxies
@@ -12,6 +13,11 @@
         assert self._c_node is not NULL, "Proxy invalidated!"
         return 0

+    cdef void free_after_use(self):
+        """Should the xmlNode* be freed when releasing the proxy?
+        """
+        self._free_after_use = 1
+
     property tag:
         """Element tag
         """
@@ -216,6 +222,7 @@

 cdef inline _initReadOnlyProxy(_ReadOnlyElementProxy el,
                                _ReadOnlyElementProxy source_proxy):
+    el._free_after_use = 0
     if source_proxy is None:
         el._source_proxy = el
         el._dependent_proxies = [el]
@@ -224,23 +231,19 @@
         python.PyList_Append(source_proxy._dependent_proxies, el)

 cdef _freeReadOnlyProxies(_ReadOnlyElementProxy sourceProxy):
+    cdef xmlNode* c_node
     cdef _ReadOnlyElementProxy el
     if sourceProxy is None:
         return
     if sourceProxy._dependent_proxies is None:
         return
     for el in sourceProxy._dependent_proxies:
+        c_node = el._c_node
         el._c_node = NULL
+        if el._free_after_use:
+            tree.xmlFreeNode(c_node)
     del sourceProxy._dependent_proxies[:]

-
-cdef class _ReadOnlyRootElementProxy(_ReadOnlyElementProxy):
-    """A read-only element that frees the subtree on deallocation.
-    """
-    def __dealloc__(self):
-        if self._c_node is not NULL:
-            tree.xmlFreeNode(self._c_node)
-
 cdef class _AppendOnlyElementProxy(_ReadOnlyElementProxy):
     """A read-only element that allows adding children and changing the
     text content (i.e. everything that adds to the subtree).
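The readonlytree.pxi changes in r52025 and r52026 rely on one bookkeeping pattern: every proxy created while an extension element runs registers with a source proxy, and `_freeReadOnlyProxies()` later invalidates all of them at once, freeing only the nodes flagged with `free_after_use()`. The following is a minimal pure-Python model of that pattern, not lxml code: the names `ReadOnlyProxy`, `free_read_only_proxies` and `ProxyInvalidated` are hypothetical, a plain dict stands in for the C `xmlNode*`, `None` plays the role of `NULL`, and "freeing" a node is simulated by appending it to a list.

```python
class ProxyInvalidated(AssertionError):
    """Raised when a proxy is used after its source proxy was released.

    Subclasses AssertionError to mirror the ``assert`` in _assertNode().
    """


class ReadOnlyProxy:
    """Hypothetical pure-Python model of lxml's _ReadOnlyElementProxy.

    `node` stands in for the C-level xmlNode* pointer; None plays NULL.
    """

    def __init__(self, node, source_proxy=None):
        self.node = node
        self.free_after_use_flag = False
        if source_proxy is None:
            # A root proxy is its own source and tracks every dependent.
            self.source_proxy = self
            self.dependent_proxies = [self]
        else:
            # Dependent proxies register with the root so that they can
            # all be invalidated together; they track nothing themselves.
            self.source_proxy = source_proxy
            self.dependent_proxies = None
            source_proxy.dependent_proxies.append(self)

    def free_after_use(self):
        # Mark the wrapped node as owned by this proxy (e.g. a detached
        # result node), so releasing the proxies also "frees" the node.
        self.free_after_use_flag = True

    def assert_node(self):
        if self.node is None:
            raise ProxyInvalidated("Proxy invalidated!")

    @property
    def tag(self):
        self.assert_node()
        return self.node["tag"]


def free_read_only_proxies(source_proxy, freed_nodes):
    """Invalidate every proxy registered with `source_proxy`.

    Nodes flagged via free_after_use() are appended to `freed_nodes`,
    which stands in for the xmlFreeNode() call in the Cython code.
    """
    if source_proxy is None or source_proxy.dependent_proxies is None:
        return
    for proxy in source_proxy.dependent_proxies:
        node = proxy.node
        proxy.node = None
        if proxy.free_after_use_flag:
            freed_nodes.append(node)
    del source_proxy.dependent_proxies[:]


if __name__ == "__main__":
    root = ReadOnlyProxy({"tag": "myext"})           # the extension element
    result = ReadOnlyProxy({"tag": "result"}, root)  # a detached result node
    result.free_after_use()
    freed = []
    free_read_only_proxies(root, freed)
    print(freed)  # [{'tag': 'result'}]
```

The point of the design, as far as the diff shows, is that Python code holding on to a proxy after `_callExtensionElement()` has returned fails with "Proxy invalidated!" instead of touching a node that libxslt or `xmlFreeNode()` has already released.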
Modified: lxml/trunk/src/lxml/tests/test_xslt.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_xslt.py (original)
+++ lxml/trunk/src/lxml/tests/test_xslt.py Sun Mar 2 09:31:28 2008
@@ -613,17 +613,68 @@
     extension-element-prefixes="myns"
     exclude-result-prefixes="myns">
-    b
+    b
 ''')

-        class mytext(etree.XSLTExtension):
-            pass
+        class MyExt(etree.XSLTExtension):
+            def execute(self, context, self_node, input_node, output_parent):
+                child = etree.Element(self_node.text)
+                child.text = 'X'
+                output_parent.append(child)
+
+        extensions = { ('testns', 'myext') : MyExt() }

-        result = tree.xslt(style, extensions={})
+        result = tree.xslt(style, extensions=extensions)
         self.assertEquals(self._rootstring(result),
                           'X')

+    def test_extension_element_content(self):
+        tree = self.parse('B')
+        style = self.parse('''\
+
+
+    XY
+
+''')
+
+        class MyExt(etree.XSLTExtension):
+            def execute(self, context, self_node, input_node, output_parent):
+                output_parent.extend(list(self_node)[1:])
+
+        extensions = { ('testns', 'myext') : MyExt() }
+
+        result = tree.xslt(style, extensions=extensions)
+        self.assertEquals(self._rootstring(result),
+                          'Y')
+
+    def test_extension_element_raise(self):
+        tree = self.parse('B')
+        style = self.parse('''\
+
+
+    b
+
+''')
+
+        class MyError(Exception):
+            pass
+
+        class MyExt(etree.XSLTExtension):
+            def execute(self, context, self_node, input_node, output_parent):
+                raise MyError("expected!")
+
+        extensions = { ('testns', 'myext') : MyExt() }
+        self.assertRaises(MyError, tree.xslt, style, extensions=extensions)
+
     def test_xslt_document_XML(self):
         # make sure document('') works from parsed strings
         xslt = etree.XSLT(etree.XML("""\
Modified: lxml/trunk/src/lxml/xslt.pxd
==============================================================================
--- lxml/trunk/src/lxml/xslt.pxd (original)
+++ lxml/trunk/src/lxml/xslt.pxd Sun Mar 2 09:31:28 2008
@@ -8,6 +8,11 @@
     cdef int LIBXSLT_VERSION

 cdef extern from "libxslt/xsltInternals.h":
+    ctypedef enum xsltTransformState:
+        XSLT_STATE_OK # 0
+        XSLT_STATE_ERROR # 1
+        XSLT_STATE_STOPPED # 2
+
     ctypedef struct xsltDocument:
         xmlDoc* doc

@@ -25,6 +30,7 @@
         xmlNode* node
         xmlDoc* output
         xmlNode* insert
+        xsltTransformState state

     ctypedef struct xsltStackElem

@@ -32,6 +38,11 @@
     cdef void xsltFreeStylesheet(xsltStylesheet* sheet) nogil

 cdef extern from "libxslt/extensions.h":
+    ctypedef void (*xsltTransformFunction)(xsltTransformContext* ctxt,
+                                           xmlNode* context_node,
+                                           xmlNode* inst,
+                                           void* precomp_unused)
+
     cdef int xsltRegisterExtFunction(xsltTransformContext* ctxt,
                                      char* name,
                                      char* URI,
@@ -43,6 +54,9 @@
                                        char* name, char* URI) nogil
     cdef int xsltRegisterExtPrefix(xsltStylesheet* style,
                                    char* prefix, char* URI) nogil
+    cdef int xsltRegisterExtElement(xsltTransformContext* ctxt,
+                                    char* name, char* URI,
+                                    xsltTransformFunction function) nogil

 cdef extern from "libxslt/documents.h":
     ctypedef enum xsltLoadType:
@@ -82,7 +96,9 @@
     cdef void xsltSetTransformErrorFunc(
         xsltTransformContext*, void* ctxt,
         void (*handler)(void* ctxt, char* msg, ...)) nogil
-
+    cdef void xsltTransformError(xsltTransformContext* ctxt,
+                                 xsltStylesheet* style,
+                                 xmlNode* node, char* msg, ...)
 cdef extern from "libxslt/security.h":
     ctypedef struct xsltSecurityPrefs
     ctypedef enum xsltSecurityOption:
Modified: lxml/trunk/src/lxml/xslt.pxi
==============================================================================
--- lxml/trunk/src/lxml/xslt.pxi (original)
+++ lxml/trunk/src/lxml/xslt.pxi Sun Mar 2 09:31:28 2008
@@ -229,15 +229,34 @@

 cdef class _XSLTContext(_BaseContext):
     cdef xslt.xsltTransformContext* _xsltCtxt
+    cdef object _extension_elements
+    cdef _ReadOnlyElementProxy _extension_element_proxy
     def __init__(self, namespaces, extensions, enable_regexp):
         self._xsltCtxt = NULL
-        if extensions is not None:
-            for ns, prefix in extensions:
-                if ns is None:
+        self._extension_elements = EMPTY_READ_ONLY_DICT
+        if extensions is not None and extensions:
+            for ns_name_tuple, extension in extensions.items():
+                if ns_name_tuple[0] is None:
                     raise XSLTExtensionError(
                         "extensions must not have empty namespaces")
+                if isinstance(extension, XSLTExtension):
+                    if self._extension_elements is EMPTY_READ_ONLY_DICT:
+                        self._extension_elements = {}
+                        extensions = python.PyDict_Copy(extensions)
+                    ns_utf = _utf8(ns_name_tuple[0])
+                    name_utf = _utf8(ns_name_tuple[1])
+                    python.PyDict_SetItem(
+                        self._extension_elements, (ns_utf, name_utf),
+                        extension)
+                    python.PyDict_DelItem(extensions, ns_name_tuple)
         _BaseContext.__init__(self, namespaces, extensions, enable_regexp)

+    cdef _BaseContext _copy(self):
+        cdef _XSLTContext context
+        context = <_XSLTContext>_BaseContext._copy(self)
+        context._extension_elements = self._extension_elements
+        return context
+
     cdef register_context(self, xslt.xsltTransformContext* xsltCtxt,
                                _Document doc):
         self._xsltCtxt = xsltCtxt
@@ -245,6 +264,7 @@
         self._register_context(doc)
         self.registerLocalFunctions(xsltCtxt, _register_xslt_function)
         self.registerGlobalFunctions(xsltCtxt, _register_xslt_function)
+        _registerXSLTExtensions(xsltCtxt, self._extension_elements)

     cdef free_context(self):
         self._cleanup_context()
@@ -437,6 +457,11 @@
                 tree.xmlFreeDoc(c_result)
             resolver_context._raise_if_stored()

+        if context._exc._has_raised():
+            if c_result is not NULL:
+                tree.xmlFreeDoc(c_result)
+            context._exc._raise_if_stored()
+
         if c_result is NULL:
             # last error seems to be the most accurate here
             error = self._error_log.last_error
Added: lxml/trunk/src/lxml/xsltext.pxi
==============================================================================
--- (empty file)
+++ lxml/trunk/src/lxml/xsltext.pxi Sun Mar 2 09:31:28 2008
@@ -0,0 +1,111 @@
+# XSLT extension elements
+
+cdef class XSLTExtension:
+    """Base class of an XSLT extension element.
+    """
+    def execute(self, context, self_node, input_node, output_parent):
+        """execute(self, context, self_node, input_node, output_parent)
+        Execute this extension element.
+
+        Subclasses may append elements to the `output_parent` element
+        here, or set its text content. To this end, the `input_node`
+        provides read-only access to the current node in the input
+        document, and the `self_node` points to the extension element
+        in the stylesheet.
+        """
+        pass
+
+    def apply_templates(self, _XSLTContext context not None, node):
+        """apply_templates(self, context, node)
+
+        Call this method to continue applying templates to the input
+        document. Starts at the
+
+        The return value is a list of elements that were generated.
+        """
+        cdef xmlNode* c_parent
+        cdef xmlNode* c_node
+        cdef xmlNode* c_next
+        cdef xmlNode* c_context_node
+        cdef _ReadOnlyElementProxy proxy
+        c_context_node = _roNodeOf(node)
+        #assert c_context_node.doc is context._xsltContext.node.doc, \
+        #    "switching input documents during transformation is not currently supported"
+
+        c_parent = tree.xmlNewDocNode(
+            context._xsltCtxt.output, NULL, "fake-parent", NULL)
+
+        c_node = context._xsltCtxt.insert
+        context._xsltCtxt.insert = c_parent
+        xslt.xsltProcessOneNode(
+            context._xsltCtxt, c_context_node, NULL)
+        context._xsltCtxt.insert = c_node
+
+        results = []
+        c_node = c_parent.children
+        try:
+            while c_node is not NULL:
+                c_next = c_node.next
+                tree.xmlUnlinkNode(c_node)
+                proxy = _newReadOnlyProxy(
+                    context._extension_element_proxy, c_node)
+                proxy.free_after_use()
+                python.PyList_Append(results, proxy)
+                c_node = c_next
+        finally:
+            tree.xmlFreeNode(c_parent)
+        return results
+
+
+cdef _registerXSLTExtensions(xslt.xsltTransformContext* c_ctxt,
+                             extension_dict):
+    for ns, name in extension_dict:
+        xslt.xsltRegisterExtElement(
+            c_ctxt, _cstr(name), _cstr(ns), _callExtensionElement)
+
+cdef void _callExtensionElement(xslt.xsltTransformContext* c_ctxt,
+                                xmlNode* c_context_node,
+                                xmlNode* c_inst_node,
+                                void* dummy) with gil:
+    cdef _XSLTContext context
+    cdef XSLTExtension extension
+    cdef python.PyObject* dict_result
+    cdef char* c_uri
+    cdef _ReadOnlyElementProxy context_node, self_node, output_parent
+    c_uri = _getNs(c_inst_node)
+    if c_uri is NULL:
+        # not allowed, and should never happen
+        return
+    if c_ctxt.xpathCtxt.userData is NULL:
+        # just for safety, should never happen
+        return
+    context = <_XSLTContext>c_ctxt.xpathCtxt.userData
+    try:
+        dict_result = python.PyDict_GetItem(
+            context._extension_elements, (c_uri, c_inst_node.name))
+        if dict_result is NULL:
+            raise KeyError("extension element %s not found",
+                           c_inst_node.name)
+        extension = dict_result
+
+        try:
+            self_node = _newReadOnlyProxy(None, c_inst_node)
+            context_node = _newReadOnlyProxy(self_node, c_context_node)
+            output_parent = _newAppendOnlyProxy(self_node, c_ctxt.insert)
+
+            context._extension_element_proxy = self_node
+            extension.execute(context, self_node, context_node, output_parent)
+        finally:
+            context._extension_element_proxy = None
+            if self_node is not None:
+                _freeReadOnlyProxies(self_node)
+    except Exception, e:
+        message = "Error executing extension element '%s': %s" % (
+            c_inst_node.name, e)
+        xslt.xsltTransformError(c_ctxt, NULL, c_inst_node, message)
+        context._exc._store_raised()
+    except:
+        # just in case
+        message = "Error executing extension element '%s'" % c_inst_node.name
+        xslt.xsltTransformError(c_ctxt, NULL, c_inst_node, message)
+        context._exc._store_raised()

From scoder at codespeak.net Sun Mar 2 09:31:32 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 2 Mar 2008 09:31:32 +0100 (CET)
Subject: [Lxml-checkins] r52027 - in lxml/trunk: . doc
Message-ID: <20080302083132.EE45016853E@codespeak.net>

Author: scoder
Date: Sun Mar 2 09:31:32 2008
New Revision: 52027

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/doc/extensions.txt
Log:
 r3666 at delle: sbehnel | 2008-03-02 08:56:20 +0100
  r3656 at delle: sbehnel | 2008-03-02 07:52:50 +0100
  reverted doc changes

Modified: lxml/trunk/doc/extensions.txt
==============================================================================
--- lxml/trunk/doc/extensions.txt (original)
+++ lxml/trunk/doc/extensions.txt Sun Mar 2 09:31:32 2008
@@ -1,20 +1,11 @@
-Python extensions for XPath and XSLT
-====================================
+Extension functions for XPath and XSLT
+======================================

 This document describes how to use Python extension functions in XPath
 and XSLT like this::

-It also describes how to use Python extension elements in XSLT like
-this::
-
-
-
-
-
-
-
 Here is how an extension function looks like. As the first argument,
 it always receives a context object (see below). The other arguments
 are provided by the respective call in the XPath expression, one in
@@ -27,14 +18,6 @@
     >>> def loadsofargs(dummy, *args):
     ...     return "Got %d arguments." % len(args)

-And here is how an extension element looks like::
-
-    >>> from lxml import etree
-    >>> class MyExtElement(etree.XSLTExtension):
-    ...     def execute(self, context, self_node, input_node, output_parent):
-    ...         # just copy own content input to output
-    ...         output_parent.extend( list(self_node) )
-
 .. contents::

 ..
@@ -54,6 +37,7 @@
 FunctionNamespace class. For simplicity, we choose the empty namespace
 (None)::

+    >>> from lxml import etree
     >>> ns = etree.FunctionNamespace(None)
     >>> ns['hello'] = hello
     >>> ns['countargs'] = loadsofargs

From scoder at codespeak.net Sun Mar 2 09:31:36 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 2 Mar 2008 09:31:36 +0100 (CET)
Subject: [Lxml-checkins] r52028 - in lxml/trunk: . src/lxml
Message-ID: <20080302083136.6577216853F@codespeak.net>

Author: scoder
Date: Sun Mar 2 09:31:36 2008
New Revision: 52028

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/xsltext.pxi
Log:
 r3667 at delle: sbehnel | 2008-03-02 08:56:21 +0100
  r3657 at delle: sbehnel | 2008-03-02 07:53:17 +0100
  support text nodes as XSLT result of apply_templates()

Modified: lxml/trunk/src/lxml/xsltext.pxi
==============================================================================
--- lxml/trunk/src/lxml/xsltext.pxi (original)
+++ lxml/trunk/src/lxml/xsltext.pxi Sun Mar 2 09:31:36 2008
@@ -47,10 +47,16 @@
             while c_node is not NULL:
                 c_next = c_node.next
                 tree.xmlUnlinkNode(c_node)
-                proxy = _newReadOnlyProxy(
-                    context._extension_element_proxy, c_node)
-                proxy.free_after_use()
-                python.PyList_Append(results, proxy)
+                if c_node.type == tree.XML_TEXT_NODE:
+                    python.PyList_Append(results, _collectText(c_node))
+                elif c_node.type == tree.XML_ELEMENT_NODE:
+                    proxy = _newReadOnlyProxy(
+                        context._extension_element_proxy, c_node)
+                    proxy.free_after_use()
+                    python.PyList_Append(results, proxy)
+                else:
+                    raise TypeError("unsupported XSLT result type: %d" %
+                                    c_node.type)
                 c_node = c_next
         finally:
             tree.xmlFreeNode(c_parent)

From scoder at codespeak.net Sun Mar 2 09:31:40 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 2 Mar 2008 09:31:40 +0100 (CET)
Subject: [Lxml-checkins] r52029 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080302083140.AE6FC168538@codespeak.net>

Author: scoder
Date: Sun Mar 2 09:31:40 2008
New Revision: 52029

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/tests/test_xslt.py
Log:
 r3668 at delle: sbehnel | 2008-03-02 08:56:21 +0100
  r3658 at delle: sbehnel | 2008-03-02 07:55:07 +0100
  test case for apply_templates()

Modified: lxml/trunk/src/lxml/tests/test_xslt.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_xslt.py (original)
+++ lxml/trunk/src/lxml/tests/test_xslt.py Sun Mar 2 09:31:40 2008
@@ -652,6 +652,38 @@
         self.assertEquals(self._rootstring(result),
                           'Y')

+    def test_extension_element_apply_templates(self):
+        tree = self.parse('B')
+        style = self.parse('''\
+
+
+    XY
+
+
+    XYZ
+''')
+
+        class MyExt(etree.XSLTExtension):
+            def execute(self, context, self_node, input_node, output_parent):
+                for child in self_node:
+                    for result in self.apply_templates(context, child):
+                        if isinstance(result, basestring):
+                            el = etree.Element("T")
+                            el.text = result
+                        else:
+                            el = result
+                        output_parent.append(el)
+
+        extensions = { ('testns', 'myext') : MyExt() }
+
+        result = tree.xslt(style, extensions=extensions)
+        self.assertEquals(self._rootstring(result),
+                          'YXYZ')
+
     def test_extension_element_raise(self):
         tree = self.parse('B')
         style = self.parse('''\

From scoder at codespeak.net Sun Mar 2 09:31:44 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 2 Mar 2008 09:31:44 +0100 (CET)
Subject: [Lxml-checkins] r52030 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080302083144.B1A4B168540@codespeak.net>

Author: scoder
Date: Sun Mar 2 09:31:44 2008
New Revision: 52030

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/tests/test_xslt.py
Log:
 r3669 at delle: sbehnel | 2008-03-02 08:56:21 +0100
  r3659 at delle: sbehnel | 2008-03-02 08:40:19 +0100
  cleanup

Modified: lxml/trunk/src/lxml/tests/test_xslt.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_xslt.py (original)
+++ lxml/trunk/src/lxml/tests/test_xslt.py Sun Mar 2 09:31:44 2008
@@ -635,8 +635,7 @@
+    extension-element-prefixes="myns">
     XY
@@ -658,8 +657,7 @@
+    extension-element-prefixes="myns">
     XY

From scoder at codespeak.net Sun Mar 2 09:31:48 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 2 Mar 2008 09:31:48 +0100 (CET)
Subject: [Lxml-checkins] r52031 - in lxml/trunk: . src/lxml
Message-ID: <20080302083148.5B325168540@codespeak.net>

Author: scoder
Date: Sun Mar 2 09:31:48 2008
New Revision: 52031

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/xsltext.pxi
Log:
 r3670 at delle: sbehnel | 2008-03-02 08:56:21 +0100
  r3660 at delle: sbehnel | 2008-03-02 08:40:41 +0100
  docstring fix

Modified: lxml/trunk/src/lxml/xsltext.pxi
==============================================================================
--- lxml/trunk/src/lxml/xsltext.pxi (original)
+++ lxml/trunk/src/lxml/xsltext.pxi Sun Mar 2 09:31:48 2008
@@ -18,10 +18,11 @@
     def apply_templates(self, _XSLTContext context not None, node):
         """apply_templates(self, context, node)

-        Call this method to continue applying templates to the input
-        document. Starts at the
+        Call this method to retrieve the result of applying templates
+        to an element.

-        The return value is a list of elements that were generated.
+        The return value is a list of elements or text strings that
+        were generated by the XSLT processor.
""" cdef xmlNode* c_parent cdef xmlNode* c_node From scoder at codespeak.net Sun Mar 2 09:31:52 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 2 Mar 2008 09:31:52 +0100 (CET) Subject: [Lxml-checkins] r52032 - in lxml/trunk: . src/lxml Message-ID: <20080302083152.3D3A4168540@codespeak.net> Author: scoder Date: Sun Mar 2 09:31:51 2008 New Revision: 52032 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/readonlytree.pxi Log: r3671 at delle: sbehnel | 2008-03-02 08:56:22 +0100 r3661 at delle: sbehnel | 2008-03-02 08:41:06 +0100 support for deep copying read-only Elements Modified: lxml/trunk/src/lxml/readonlytree.pxi ============================================================================== --- lxml/trunk/src/lxml/readonlytree.pxi (original) +++ lxml/trunk/src/lxml/readonlytree.pxi Sun Mar 2 09:31:51 2008 @@ -119,6 +119,28 @@ c_node = _findChildBackwards(self._c_node, 0) return c_node != NULL + def __deepcopy__(self, memo): + "__deepcopy__(self, memo)" + return self.__copy__() + + def __copy__(self): + "__copy__(self)" + cdef xmlDoc* c_doc + cdef xmlNode* c_node + cdef _Document new_doc + c_doc = _copyDocRoot(self._c_node.doc, self._c_node) # recursive + new_doc = _documentFactory(c_doc, None) + root = new_doc.getroot() + if root is not None: + return root + # Comment/PI + c_node = c_doc.children + while c_node is not NULL and c_node.type != self._c_node.type: + c_node = c_node.next + if c_node is NULL: + return None + return _elementFactory(new_doc, c_node) + def __iter__(self): return iter(self.getchildren()) From scoder at codespeak.net Sun Mar 2 09:31:56 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 2 Mar 2008 09:31:56 +0100 (CET) Subject: [Lxml-checkins] r52033 - in lxml/trunk: . 
doc/html Message-ID: <20080302083156.254A3168541@codespeak.net> Author: scoder Date: Sun Mar 2 09:31:55 2008 New Revision: 52033 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/html/style.css Log: r3672 at delle: sbehnel | 2008-03-02 08:56:22 +0100 r3662 at delle: sbehnel | 2008-03-02 08:55:38 +0100 web site styling Modified: lxml/trunk/doc/html/style.css ============================================================================== --- lxml/trunk/doc/html/style.css (original) +++ lxml/trunk/doc/html/style.css Sun Mar 2 09:31:55 2008 @@ -190,6 +190,16 @@ background-color: transparent; } +dt { + line-height: 1.5em; + margin-left: 1em; + content: "\00BB" " "; +} + +dt:before { + content: "\00BB" " "; +} + ul { line-height: 1.5em; margin-left: 1em; From scoder at codespeak.net Sun Mar 2 09:32:00 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 2 Mar 2008 09:32:00 +0100 (CET) Subject: [Lxml-checkins] r52034 - in lxml/trunk: . doc Message-ID: <20080302083200.2C89E168541@codespeak.net> Author: scoder Date: Sun Mar 2 09:31:59 2008 New Revision: 52034 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/xpathxslt.txt Log: r3673 at delle: sbehnel | 2008-03-02 08:56:22 +0100 r3663 at delle: sbehnel | 2008-03-02 08:56:13 +0100 doc section on XSLT extension elements Modified: lxml/trunk/doc/xpathxslt.txt ============================================================================== --- lxml/trunk/doc/xpathxslt.txt (original) +++ lxml/trunk/doc/xpathxslt.txt Sun Mar 2 09:31:59 2008 @@ -458,8 +458,127 @@ ------------------ Just like `custom extension functions`_, lxml supports custom -extension *elements*. +extension *elements* in XSLT. This means, you can write XSLT code +like this:: + + + + + + +And then you can implement the element in Python like this:: + + >>> class MyExtElement(etree.XSLTExtension): + ... def execute(self, context, self_node, input_node, output_parent): + ... print "Hello from XSLT!" + ... output_parent.text = "I did it!" + ... 
# just copy own content input to output + ... output_parent.extend( list(self_node) ) + +The arguments passed to this function are + +context + The opaque evaluation context. You need this when calling back + into the XSLT processor. + +self_node + A read-only Element object that represents the extension element + in the stylesheet. + +input_node + The current context Element in the input document (also read-only). + +output_parent + The current insertion point in the output document. You can + append elements or set the text value (not the tail). Apart from + that, the Element is read-only. + +In XSLT, extension elements can be used like any other XSLT element, +except that they must be declared as extensions using the standard +XSLT ``extension-element-prefixes`` option:: + + >>> xslt_ext_tree = etree.XML(''' + ... + ... + ... XYZ + ... + ... + ... --xyz-- + ... + ... ''') + +To register the extension, add its name and namespace to the extension +mapping of the XSLT object:: + + >>> my_extension = MyExtElement() + >>> extensions = { ('testns', 'ext') : my_extension } + >>> transform = etree.XSLT(xslt_ext_tree, extensions = extensions) + +Note how we pass an instance here, not the class of the extension. +Now we can run the transformation and see how our extension is +called:: + + >>> root = etree.XML('') + >>> result = transform(root) + Hello from XSLT! + >>> str(result) + '\nI did it!XYZ\n' + +XSLT extensions are a very powerful feature that allows you to +interact directly with the XSLT processor. You have full access to +the input document and the stylesheet, and you can even call back into +the XSLT processor to process templates. Here is an example that +passes an Element into the ``.apply_templates()`` method of the +``XSLTExtension`` instance:: + + >>> class MyExtElement(etree.XSLTExtension): + ... def execute(self, context, self_node, input_node, output_parent): + ... child = self_node[0] + ... results = self.apply_templates(context, child) + ... 
output_parent.append(results[0]) + + >>> my_extension = MyExtElement() + >>> extensions = { ('testns', 'ext') : my_extension } + >>> transform = etree.XSLT(xslt_ext_tree, extensions = extensions) + + >>> root = etree.XML('') + >>> result = transform(root) + >>> str(result) + '\n--xyz--\n' + +Note how we applied the templates to a child of the extension element +itself, i.e. to an element inside the stylesheet instead of an element +of the input document. + +There is one important thing to keep in mind: all Elements that the +``execute()`` method gets to deal with are read-only Elements, so you +cannot modify them. They also will not easily work in the API. For +example, you cannot pass them to the ``tostring()`` function or wrap +them in an ``ElementTree``. + +What you can do, however, is to deepcopy them to make them normal +Elements, and then modify them using the normal etree API. So this +will work:: + + >>> from copy import deepcopy + >>> class MyExtElement(etree.XSLTExtension): + ... def execute(self, context, self_node, input_node, output_parent): + ... child = deepcopy(self_node[0]) + ... child.text = "NEW TEXT" + ... 
output_parent.append(child) + + >>> my_extension = MyExtElement() + >>> extensions = { ('testns', 'ext') : my_extension } + >>> transform = etree.XSLT(xslt_ext_tree, extensions = extensions) + + >>> root = etree.XML('') + >>> result = transform(root) + >>> str(result) + '\nNEW TEXT\n' The ``xslt()`` tree method From scoder at codespeak.net Sun Mar 2 09:32:04 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 2 Mar 2008 09:32:04 +0100 (CET) Subject: [Lxml-checkins] r52035 - lxml/trunk Message-ID: <20080302083204.3479C168541@codespeak.net> Author: scoder Date: Sun Mar 2 09:32:03 2008 New Revision: 52035 Modified: lxml/trunk/ (props changed) lxml/trunk/TODO.txt Log: r3675 at delle: sbehnel | 2008-03-02 09:22:29 +0100 cleanup Modified: lxml/trunk/TODO.txt ============================================================================== --- lxml/trunk/TODO.txt (original) +++ lxml/trunk/TODO.txt Sun Mar 2 09:32:03 2008 @@ -45,22 +45,6 @@ by libxml2 (patch exists) -XSLT extension elements ------------------------ - -* implementation: one base class that represents the result parent - - - .append(), .extend() and .text will add to the result tree (no .tail) - - - difference: Elements should be copied, not moved? (will break - later changes, but this just means that Elements in the result - tree are immutable, including those that were added) - - - how to make input tree read-only? maybe just document? - - - docs: "once in the result tree, Elements must no longer be changed"? 
-
-
 lxml 2.0
 ========

From scoder at codespeak.net Sun Mar 2 09:32:08 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 2 Mar 2008 09:32:08 +0100 (CET)
Subject: [Lxml-checkins] r52036 - lxml/trunk
Message-ID: <20080302083208.541C7168539@codespeak.net>

Author: scoder
Date: Sun Mar 2 09:32:07 2008
New Revision: 52036

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/CHANGES.txt
Log:
 r3676 at delle: sbehnel | 2008-03-02 09:23:16 +0100
 mark extension elements experimental

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Sun Mar 2 09:32:07 2008
@@ -8,7 +8,7 @@
 Features added
 --------------

-* Extension elements for XSLT
+* Extension elements for XSLT (experimental!)

 * ``Element.base`` property returns the xml:base or HTML base URL of an
   Element.

From scoder at codespeak.net Sun Mar 2 09:32:59 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 2 Mar 2008 09:32:59 +0100 (CET)
Subject: [Lxml-checkins] r52037 - lxml/trunk
Message-ID: <20080302083259.65249168538@codespeak.net>

Author: scoder
Date: Sun Mar 2 09:32:59 2008
New Revision: 52037

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/CHANGES.txt
   lxml/trunk/version.txt
Log:
 r3690 at delle: sbehnel | 2008-03-02 09:32:14 +0100
 make next trunk version 2.1

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Sun Mar 2 09:32:59 2008
@@ -2,8 +2,8 @@
 lxml changelog
 ==============

-2.0.3 (Under development)
-=========================
+2.1alpha1 (Under development)
+=============================

 Features added
 --------------

Modified: lxml/trunk/version.txt
==============================================================================
--- lxml/trunk/version.txt (original)
+++ lxml/trunk/version.txt Sun Mar 2 09:32:59 2008
@@ -1 +1 @@
-2.0.2
+2.1alpha1 From scoder at codespeak.net Sun Mar 2 09:40:30 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 2 Mar 2008 09:40:30 +0100 (CET) Subject: [Lxml-checkins] r52038 - in lxml/branch/lxml-2.0: doc src/lxml src/lxml/html Message-ID: <20080302084030.3F5F6168538@codespeak.net> Author: scoder Date: Sun Mar 2 09:40:29 2008 New Revision: 52038 Added: lxml/branch/lxml-2.0/src/lxml/saxparser.pxi - copied unchanged from r51849, lxml/trunk/src/lxml/saxparser.pxi Modified: lxml/branch/lxml-2.0/doc/lxml-source-howto.txt lxml/branch/lxml-2.0/src/lxml/html/__init__.py lxml/branch/lxml-2.0/src/lxml/lxml.etree.pyx lxml/branch/lxml-2.0/src/lxml/parser.pxi Log: trunk merge Modified: lxml/branch/lxml-2.0/doc/lxml-source-howto.txt ============================================================================== --- lxml/branch/lxml-2.0/doc/lxml-source-howto.txt (original) +++ lxml/branch/lxml-2.0/doc/lxml-source-howto.txt Sun Mar 2 09:40:29 2008 @@ -114,6 +114,9 @@ ... element = _elementFactory(doc, c_node) +A good place to see how this factory is used are the Element methods +``getparent()``, ``getnext()`` and ``getprevious()``. + The documentation ----------------- @@ -216,12 +219,15 @@ modules at the C level. For example, ``lxml.objectify`` makes use of these. See the `C-level API` documentation. +saxparser.pxi + SAX-like parser interfaces as known from ElementTree's TreeBuilder. + serializer.pxi XML output functions. Basically everything that creates byte sequences from XML trees. xinclude.pxi - XInclude implementation. + XInclude support. xmlerror.pxi Error log handling. 
All error messages that libxml2 generates Modified: lxml/branch/lxml-2.0/src/lxml/html/__init__.py ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/html/__init__.py (original) +++ lxml/branch/lxml-2.0/src/lxml/html/__init__.py Sun Mar 2 09:40:29 2008 @@ -713,11 +713,11 @@ You can use this like:: - >>> form = doc.forms[0] # doctest: +SKIP - >>> form.inputs['foo'].value = 'bar' # etc # doctest: +SKIP - >>> response = form.submit() # doctest: +SKIP - >>> doc = parse(response) # doctest: +SKIP - >>> doc.make_links_absolute(response.geturl()) # doctest: +SKIP + form = doc.forms[0] + form.inputs['foo'].value = 'bar' # etc + response = form.submit() + doc = parse(response) + doc.make_links_absolute(response.geturl()) To change the HTTP requester, pass a function as ``open_http`` keyword argument that opens the URL for you. The function must have the following Modified: lxml/branch/lxml-2.0/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/lxml.etree.pyx (original) +++ lxml/branch/lxml-2.0/src/lxml/lxml.etree.pyx Sun Mar 2 09:40:29 2008 @@ -1110,8 +1110,7 @@ c_node = _parentElement(self._c_node) if c_node is NULL: return None - else: - return _elementFactory(self._doc, c_node) + return _elementFactory(self._doc, c_node) def getnext(self): """getnext(self) @@ -1120,9 +1119,9 @@ """ cdef xmlNode* c_node c_node = _nextElement(self._c_node) - if c_node is not NULL: - return _elementFactory(self._doc, c_node) - return None + if c_node is NULL: + return None + return _elementFactory(self._doc, c_node) def getprevious(self): """getprevious(self) @@ -1131,9 +1130,9 @@ """ cdef xmlNode* c_node c_node = _previousElement(self._c_node) - if c_node is not NULL: - return _elementFactory(self._doc, c_node) - return None + if c_node is NULL: + return None + return _elementFactory(self._doc, c_node) def itersiblings(self, tag=None, *, 
preceding=False): """itersiblings(self, tag=None, preceding=False) @@ -2534,6 +2533,7 @@ include "nsclasses.pxi" # Namespace implementation and registry include "docloader.pxi" # Support for custom document loaders include "parser.pxi" # XML Parser +include "saxparser.pxi" # SAX-like Parser interface and tree builder include "parsertarget.pxi" # ET Parser target include "serializer.pxi" # XML output functions include "iterparse.pxi" # incremental XML parsing Modified: lxml/branch/lxml-2.0/src/lxml/parser.pxi ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/parser.pxi (original) +++ lxml/branch/lxml-2.0/src/lxml/parser.pxi Sun Mar 2 09:40:29 2008 @@ -1011,441 +1011,6 @@ return htmlparser.htmlParseChunk(c_ctxt, c_data, buffer_len, 0) return 0 - -############################################################ -## SAX event handler -############################################################ - -ctypedef enum _SaxParserEvents: - SAX_EVENT_START = 1 - SAX_EVENT_END = 2 - SAX_EVENT_DATA = 4 - SAX_EVENT_DOCTYPE = 8 - SAX_EVENT_PI = 16 - SAX_EVENT_COMMENT = 32 - -cdef class _SaxParserTarget: - cdef int _sax_event_filter - cdef int _sax_event_propagate - cdef _handleSaxStart(self, tag, attrib, nsmap): - return None - cdef _handleSaxEnd(self, tag): - return None - cdef int _handleSaxData(self, data) except -1: - return 0 - cdef int _handleSaxDoctype(self, root_tag, public_id, system_id) except -1: - return 0 - cdef _handleSaxPi(self, target, data): - return None - cdef _handleSaxComment(self, comment): - return None - -cdef class _SaxParserContext(_ParserContext): - """This class maps SAX2 events to method calls. 
- """ - cdef _SaxParserTarget _target - cdef xmlparser.startElementNsSAX2Func _origSaxStart - cdef xmlparser.endElementNsSAX2Func _origSaxEnd - cdef xmlparser.startElementSAXFunc _origSaxStartNoNs - cdef xmlparser.endElementSAXFunc _origSaxEndNoNs - cdef xmlparser.charactersSAXFunc _origSaxData - cdef xmlparser.internalSubsetSAXFunc _origSaxDoctype - cdef xmlparser.commentSAXFunc _origSaxComment - cdef xmlparser.processingInstructionSAXFunc _origSaxPi - - cdef void _setSaxParserTarget(self, _SaxParserTarget target): - self._target = target - - cdef void _initParserContext(self, xmlparser.xmlParserCtxt* c_ctxt): - "wrap original SAX2 callbacks" - cdef xmlparser.xmlSAXHandler* sax - _ParserContext._initParserContext(self, c_ctxt) - sax = c_ctxt.sax - if self._target._sax_event_propagate & SAX_EVENT_START: - # propagate => keep orig callback - self._origSaxStart = sax.startElementNs - self._origSaxStartNoNs = sax.startElement - else: - # otherwise: never call orig callback - self._origSaxStart = sax.startElementNs = NULL - self._origSaxStartNoNs = sax.startElement = NULL - if self._target._sax_event_filter & SAX_EVENT_START: - # intercept => overwrite orig callback - if sax.initialized == xmlparser.XML_SAX2_MAGIC: - sax.startElementNs = _handleSaxStart - sax.startElement = _handleSaxStartNoNs - - if self._target._sax_event_propagate & SAX_EVENT_END: - self._origSaxEnd = sax.endElementNs - self._origSaxEndNoNs = sax.endElement - else: - self._origSaxEnd = sax.endElementNs = NULL - self._origSaxEndNoNs = sax.endElement = NULL - if self._target._sax_event_filter & SAX_EVENT_END: - if sax.initialized == xmlparser.XML_SAX2_MAGIC: - sax.endElementNs = _handleSaxEnd - sax.endElement = _handleSaxEndNoNs - - if self._target._sax_event_propagate & SAX_EVENT_DATA: - self._origSaxData = sax.characters - else: - self._origSaxData = sax.characters = NULL - if self._target._sax_event_filter & SAX_EVENT_DATA: - sax.characters = _handleSaxData - - if self._target._sax_event_propagate 
& SAX_EVENT_DOCTYPE: - self._origSaxDoctype = sax.internalSubset - else: - self._origSaxDoctype = sax.internalSubset = NULL - if self._target._sax_event_filter & SAX_EVENT_DOCTYPE: - sax.internalSubset = _handleSaxDoctype - - if self._target._sax_event_propagate & SAX_EVENT_PI: - self._origSaxPi = sax.processingInstruction - else: - self._origSaxPi = sax.processingInstruction = NULL - if self._target._sax_event_filter & SAX_EVENT_PI: - sax.processingInstruction = _handleSaxPI - - if self._target._sax_event_propagate & SAX_EVENT_COMMENT: - self._origSaxComment = sax.comment - else: - self._origSaxComment = sax.comment = NULL - if self._target._sax_event_filter & SAX_EVENT_COMMENT: - sax.comment = _handleSaxComment - - cdef void _handleSaxException(self, xmlparser.xmlParserCtxt* c_ctxt): - self._store_raised() - if c_ctxt.errNo == xmlerror.XML_ERR_OK: - c_ctxt.errNo = xmlerror.XML_ERR_INTERNAL_ERROR - c_ctxt.disableSAX = 1 - -cdef void _handleSaxStart(void* ctxt, char* c_localname, char* c_prefix, - char* c_namespace, int c_nb_namespaces, - char** c_namespaces, - int c_nb_attributes, int c_nb_defaulted, - char** c_attributes) with gil: - cdef _SaxParserContext context - cdef xmlparser.xmlParserCtxt* c_ctxt - cdef _Element element - cdef int i - c_ctxt = ctxt - if c_ctxt._private is NULL: - return - context = <_SaxParserContext>c_ctxt._private - if context._origSaxStart is not NULL: - context._origSaxStart(c_ctxt, c_localname, c_prefix, c_namespace, - c_nb_namespaces, c_namespaces, c_nb_attributes, - c_nb_defaulted, c_attributes) - try: - tag = _namespacedNameFromNsName(c_namespace, c_localname) - if c_nb_defaulted > 0: - # only add default attributes if we asked for them - if c_ctxt.loadsubset & xmlparser.XML_COMPLETE_ATTRS == 0: - c_nb_attributes = c_nb_attributes - c_nb_defaulted - if c_nb_attributes == 0: - attrib = EMPTY_READ_ONLY_DICT - else: - attrib = {} - for i from 0 <= i < c_nb_attributes: - name = _namespacedNameFromNsName( - c_attributes[2], 
c_attributes[0]) - if c_attributes[3] is NULL: - value = "" - else: - value = python.PyUnicode_DecodeUTF8( - c_attributes[3], c_attributes[4] - c_attributes[3], - "strict") - python.PyDict_SetItem(attrib, name, value) - c_attributes = c_attributes + 5 - if c_nb_namespaces == 0: - nsmap = EMPTY_READ_ONLY_DICT - else: - nsmap = {} - for i from 0 <= i < c_nb_namespaces: - if c_namespaces[0] is NULL: - prefix = None - else: - prefix = funicode(c_namespaces[0]) - python.PyDict_SetItem( - nsmap, prefix, funicode(c_namespaces[1])) - c_namespaces = c_namespaces + 2 - element = context._target._handleSaxStart(tag, attrib, nsmap) - if element is not None and c_ctxt.input is not NULL: - if c_ctxt.input.line < 65535: - element._c_node.line = c_ctxt.input.line - else: - element._c_node.line = 65535 - except: - context._handleSaxException(c_ctxt) - -cdef void _handleSaxStartNoNs(void* ctxt, char* c_name, - char** c_attributes) with gil: - cdef _SaxParserContext context - cdef xmlparser.xmlParserCtxt* c_ctxt - cdef _Element element - c_ctxt = ctxt - if c_ctxt._private is NULL: - return - context = <_SaxParserContext>c_ctxt._private - if context._origSaxStartNoNs is not NULL: - context._origSaxStartNoNs(c_ctxt, c_name, c_attributes) - try: - tag = funicode(c_name) - if c_attributes is NULL: - attrib = EMPTY_READ_ONLY_DICT - else: - attrib = {} - while c_attributes[0] is not NULL: - name = funicode(c_attributes[0]) - if c_attributes[1] is NULL: - value = "" - else: - value = funicode(c_attributes[1]) - c_attributes = c_attributes + 2 - python.PyDict_SetItem(attrib, name, value) - element = context._target._handleSaxStart( - tag, attrib, EMPTY_READ_ONLY_DICT) - if element is not None and c_ctxt.input is not NULL: - if c_ctxt.input.line < 65535: - element._c_node.line = c_ctxt.input.line - else: - element._c_node.line = 65535 - except: - context._handleSaxException(c_ctxt) - -cdef void _handleSaxEnd(void* ctxt, char* c_localname, char* c_prefix, - char* c_namespace) with gil: - cdef 
_SaxParserContext context - cdef xmlparser.xmlParserCtxt* c_ctxt - c_ctxt = ctxt - if c_ctxt._private is NULL: - return - context = <_SaxParserContext>c_ctxt._private - if context._origSaxEnd is not NULL: - context._origSaxEnd(c_ctxt, c_localname, c_prefix, c_namespace) - try: - tag = _namespacedNameFromNsName(c_namespace, c_localname) - context._target._handleSaxEnd(tag) - except: - context._handleSaxException(c_ctxt) - -cdef void _handleSaxEndNoNs(void* ctxt, char* c_name) with gil: - cdef _SaxParserContext context - cdef xmlparser.xmlParserCtxt* c_ctxt - c_ctxt = ctxt - if c_ctxt._private is NULL: - return - context = <_SaxParserContext>c_ctxt._private - if context._origSaxEndNoNs is not NULL: - context._origSaxEndNoNs(c_ctxt, c_name) - try: - context._target._handleSaxEnd(funicode(c_name)) - except: - context._handleSaxException(c_ctxt) - -cdef void _handleSaxData(void* ctxt, char* c_data, int data_len) with gil: - cdef _SaxParserContext context - cdef xmlparser.xmlParserCtxt* c_ctxt - c_ctxt = ctxt - if c_ctxt._private is NULL: - return - context = <_SaxParserContext>c_ctxt._private - if context._origSaxData is not NULL: - context._origSaxData(c_ctxt, c_data, data_len) - try: - context._target._handleSaxData( - python.PyUnicode_DecodeUTF8(c_data, data_len, NULL)) - except: - context._handleSaxException(c_ctxt) - -cdef void _handleSaxDoctype(void* ctxt, char* c_name, char* c_public, - char* c_system) with gil: - cdef _SaxParserContext context - cdef xmlparser.xmlParserCtxt* c_ctxt - c_ctxt = ctxt - if c_ctxt._private is NULL: - return - context = <_SaxParserContext>c_ctxt._private - if context._origSaxDoctype is not NULL: - context._origSaxDoctype(c_ctxt, c_name, c_public, c_system) - try: - if c_public is not NULL: - public_id = funicode(c_public) - if c_system is not NULL: - system_id = funicode(c_system) - context._target._handleSaxDoctype( - funicode(c_name), public_id, system_id) - except: - context._handleSaxException(c_ctxt) - -cdef void 
_handleSaxPI(void* ctxt, char* c_target, char* c_data) with gil: - cdef _SaxParserContext context - cdef xmlparser.xmlParserCtxt* c_ctxt - c_ctxt = ctxt - if c_ctxt._private is NULL: - return - context = <_SaxParserContext>c_ctxt._private - if context._origSaxPi is not NULL: - context._origSaxPi(c_ctxt, c_target, c_data) - try: - if c_data is not NULL: - data = funicode(c_data) - context._target._handleSaxPi(funicode(c_target), data) - except: - context._handleSaxException(c_ctxt) - -cdef void _handleSaxComment(void* ctxt, char* c_data) with gil: - cdef _SaxParserContext context - cdef xmlparser.xmlParserCtxt* c_ctxt - c_ctxt = ctxt - if c_ctxt._private is NULL: - return - context = <_SaxParserContext>c_ctxt._private - if context._origSaxComment is not NULL: - context._origSaxComment(c_ctxt, c_data) - try: - context._target._handleSaxComment(funicode(c_data)) - except: - context._handleSaxException(c_ctxt) - - -############################################################ -## ET compatible XML tree builder -############################################################ - -cdef class TreeBuilder(_SaxParserTarget): - """TreeBuilder(self, element_factory=None, parser=None) - Parser target that builds a tree. - - The final tree is returned by the ``close()`` method. 
- """ - cdef _BaseParser _parser - cdef object _factory - cdef object _data - cdef object _element_stack - cdef object _element_stack_pop - cdef _Element _last - cdef bint _in_tail - - def __init__(self, *, element_factory=None, parser=None): - self._sax_event_filter = \ - SAX_EVENT_START | SAX_EVENT_END | SAX_EVENT_DATA | \ - SAX_EVENT_PI | SAX_EVENT_COMMENT - self._data = [] # data collector - self._element_stack = [] # element stack - self._element_stack_pop = self._element_stack.pop - self._last = None # last element - self._in_tail = 0 # true if we're after an end tag - self._factory = element_factory - self._parser = parser - - cdef int _flush(self) except -1: - if python.PyList_GET_SIZE(self._data) > 0: - if self._last is not None: - text = "".join(self._data) - if self._in_tail: - assert self._last.tail is None, "internal error (tail)" - self._last.tail = text - else: - assert self._last.text is None, "internal error (text)" - self._last.text = text - del self._data[:] - return 0 - - # Python level event handlers - - def close(self): - """close(self) - - Flushes the builder buffers, and returns the toplevel document - element. - """ - assert python.PyList_GET_SIZE(self._element_stack) == 0, "missing end tags" - assert self._last is not None, "missing toplevel element" - return self._last - - def data(self, data): - """data(self, data) - - Adds text to the current element. The value should be either an - 8-bit string containing ASCII text, or a Unicode string. - """ - self._handleSaxData(data) - - def start(self, tag, attrs, nsmap=None): - """start(self, tag, attrs, nsmap=None) - - Opens a new element. - """ - if nsmap is None: - nsmap = EMPTY_READ_ONLY_DICT - return self._handleSaxStart(tag, attrs, nsmap) - - def end(self, tag): - """end(self, tag) - - Closes the current element. 
- """ - element = self._handleSaxEnd(tag) - assert self._last.tag == tag,\ - "end tag mismatch (expected %s, got %s)" % ( - self._last.tag, tag) - return element - - def pi(self, target, data): - """pi(self, target, data) - """ - return self._handleSaxPi(target, data) - - def comment(self, comment): - """comment(self, comment) - """ - return self._handleSaxComment(comment) - - # internal SAX event handlers - - cdef _handleSaxStart(self, tag, attrib, nsmap): - self._flush() - if self._factory is not None: - self._last = self._factory(tag, attrib) - if python.PyList_GET_SIZE(self._element_stack) > 0: - _appendChild(self._element_stack[-1], self._last) - elif python.PyList_GET_SIZE(self._element_stack) > 0: - self._last = _makeSubElement( - self._element_stack[-1], tag, None, None, attrib, nsmap, None) - else: - self._last = _makeElement( - tag, NULL, None, self._parser, None, None, attrib, nsmap, None) - python.PyList_Append(self._element_stack, self._last) - self._in_tail = 0 - return self._last - - cdef _handleSaxEnd(self, tag): - self._flush() - self._last = self._element_stack_pop() - self._in_tail = 1 - return self._last - - cdef int _handleSaxData(self, data) except -1: - python.PyList_Append(self._data, data) - - cdef _handleSaxPi(self, target, data): - self._flush() - self._last = ProcessingInstruction(target, data) - if python.PyList_GET_SIZE(self._element_stack) > 0: - _appendChild(self._element_stack[-1], self._last) - self._in_tail = 1 - return self._last - - cdef _handleSaxComment(self, comment): - self._flush() - self._last = Comment(comment) - if python.PyList_GET_SIZE(self._element_stack) > 0: - _appendChild(self._element_stack[-1], self._last) - self._in_tail = 1 - return self._last - ############################################################ ## XML parser ############################################################ From scoder at codespeak.net Mon Mar 3 19:41:03 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 
19:41:03 +0100 (CET) Subject: [Lxml-checkins] r52103 - in lxml/trunk: . src/lxml Message-ID: <20080303184103.BC9C8169ECF@codespeak.net> Author: scoder Date: Mon Mar 3 19:41:03 2008 New Revision: 52103 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/readonlytree.pxi Log: r3694 at delle: sbehnel | 2008-03-03 08:51:05 +0100 tag fix in read-only tree Modified: lxml/trunk/src/lxml/readonlytree.pxi ============================================================================== --- lxml/trunk/src/lxml/readonlytree.pxi (original) +++ lxml/trunk/src/lxml/readonlytree.pxi Mon Mar 3 19:41:03 2008 @@ -150,7 +150,7 @@ Iterate over the children of this element. """ children = self.getchildren() - if tag is not None: + if tag is not None and tag != '*': children = [ el for el in children if el.tag == tag ] if reversed: children = children[::-1] From scoder at codespeak.net Mon Mar 3 19:41:33 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:41:33 +0100 (CET) Subject: [Lxml-checkins] r52104 - in lxml/trunk: . doc Message-ID: <20080303184133.7EEF3169EB3@codespeak.net> Author: scoder Date: Mon Mar 3 19:41:32 2008 New Revision: 52104 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/build.txt lxml/trunk/doc/lxml-source-howto.txt lxml/trunk/doc/main.txt Log: r3695 at delle: sbehnel | 2008-03-03 08:51:17 +0100 doc updates Modified: lxml/trunk/doc/build.txt ============================================================================== --- lxml/trunk/doc/build.txt (original) +++ lxml/trunk/doc/build.txt Mon Mar 3 19:41:32 2008 @@ -58,11 +58,13 @@ svn co http://codespeak.net/svn/lxml/trunk lxml -This will create a directory ``lxml`` and download the source into it. You -can also `browse the repository through the web`_ or use your favourite SVN -client to access it. +This will create a directory ``lxml`` and download the source into it. 
+You can also browse the `Subversion repository`_ through the web, use +your favourite SVN client to access it, or browse the `Subversion +history`_. -.. _`browse the repository through the web`: http://codespeak.net/svn/lxml +.. _`Subversion repository`: http://codespeak.net/svn/lxml/ +.. _`Subversion history`: https://codespeak.net/viewvc/lxml/ Setuptools Modified: lxml/trunk/doc/lxml-source-howto.txt ============================================================================== --- lxml/trunk/doc/lxml-source-howto.txt (original) +++ lxml/trunk/doc/lxml-source-howto.txt Mon Mar 3 19:41:32 2008 @@ -153,8 +153,9 @@ lxml.etree ========== -The main module, ``lxml.etree``, is in the file **lxml.etree.pyx**. -It implements the main functions and types of the ElementTree API, as +The main module, ``lxml.etree``, is in the file `lxml.etree.pyx +`_. It +implements the main functions and types of the ElementTree API, as well as all the factory functions for proxies. It is the best place to start if you want to find out how a specific feature is implemented. @@ -219,6 +220,12 @@ modules at the C level. For example, ``lxml.objectify`` makes use of these. See the `C-level API` documentation. +readonlytree.pxi + A separate read-only implementation of the Element API. This is + used in places where non-intrusive access to a tree is required, + such as the ``PythonElementClassLookup`` or XSLT extension + elements. + saxparser.pxi SAX-like parser interfaces as known from ElementTree's TreeBuilder. @@ -295,15 +302,8 @@ A Cython implemented extension module that uses the public C-API of lxml.etree. It provides a Python object-like interface to XML trees. - - -lxml.pyclasslookup -================== - -A Cython implemented extension module that uses the public C-API of -lxml.etree. It provides a class lookup scheme that duplicates lxml's -ElementTree API in a very simple way to provide Python access to the -tree *before* instantiating the real Python proxies in lxml.etree. 
+The implementation resides in the file `lxml.objectify.pyx +`_. lxml.html Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Mon Mar 3 19:41:32 2008 @@ -159,13 +159,16 @@ svn co http://codespeak.net/svn/lxml/trunk lxml -You can also `browse it through the web`_. Please read `how to build lxml -from source`_ first. The `latest CHANGES`_ of the developer version are also -accessible. You can check there if a bug you found has been fixed or a -feature you want has been implemented in the latest trunk version. +You can also browse the `Subversion repository`_ through the web, or +take a look at the `Subversion history`_. Please read `how to build lxml +from source`_ first. The `latest CHANGES`_ of the developer version +are also accessible. You can check there if a bug you found has been +fixed or a feature you want has been implemented in the latest trunk +version. .. _`how to build lxml from source`: build.html -.. _`browse it through the web`: http://codespeak.net/svn/lxml +.. _`Subversion repository`: http://codespeak.net/svn/lxml/ +.. _`Subversion history`: https://codespeak.net/viewvc/lxml/ .. _`latest CHANGES`: http://codespeak.net/svn/lxml/trunk/CHANGES.txt From scoder at codespeak.net Mon Mar 3 19:41:43 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:41:43 +0100 (CET) Subject: [Lxml-checkins] r52105 - in lxml/trunk: . 
doc Message-ID: <20080303184143.EF4E2169EAE@codespeak.net> Author: scoder Date: Mon Mar 3 19:41:43 2008 New Revision: 52105 Modified: lxml/trunk/ (props changed) lxml/trunk/INSTALL.txt lxml/trunk/doc/main.txt lxml/trunk/doc/mkhtml.py Log: r3696 at delle: sbehnel | 2008-03-03 10:17:04 +0100 doc cleanup and fixes Modified: lxml/trunk/INSTALL.txt ============================================================================== --- lxml/trunk/INSTALL.txt (original) +++ lxml/trunk/INSTALL.txt Mon Mar 3 19:41:43 2008 @@ -40,12 +40,12 @@ Building lxml from sources -------------------------- -If you want to build lxml from SVN you should read `how to build lxml from -source`_ (or the file ``build.txt`` in the ``doc`` directory of the source -tree). Building from Subversion sources or from modified distribution sources -requires Cython_ to translate the lxml sources into C code. The source -distribution ships with pre-generated C source files, so you do not need -Cython installed to build from release sources. +If you want to build lxml from SVN you should read `how to build lxml +from source`_ (or the file ``doc/build.txt`` in the source tree). +Building from Subversion sources or from modified distribution sources +requires Cython_ to translate the lxml sources into C code. The +source distribution ships with pre-generated C source files, so you do +not need Cython installed to build from release sources. .. _Cython: http://www.cython.org .. _`how to build lxml from source`: build.html @@ -60,10 +60,10 @@ MS Windows ---------- -For MS Windows, the `binary egg distribution of lxml`_ is statically built -against the libraries, i.e. it already includes them. There is no need to -install the external libraries if you use an official lxml build from -cheeseshop. +For MS Windows, the `binary egg distribution of lxml`_ is statically +built against the libraries, i.e. it already includes them. 
There is +no need to install the external libraries if you use an official lxml +build from PyPI. If you want to upgrade the libraries and/or compile lxml from sources, you should install a `binary distribution`_ of libxml2 and libxslt. You need both @@ -76,13 +76,17 @@ MacOS-X ------- -On MacOS-X 10.4, you can try to use the installed system libraries when you -build lxml yourself. However, the library versions on this system are older -than the required versions, so you may encounter certain differences in -behaviour or even crashes. A number of users reported success with updated -libraries (e.g. using fink_), but needed to set the environment variable +The system libraries of libxml2 and libxslt installed under MacOS-X +tend to be rather outdated. In any case, they are older than the +required versions for lxml 2.x, so you will have a hard time getting +lxml to work without installing newer libraries. + +A number of users reported success with updated libraries (e.g. using +fink_ or macports), but needed to set the runtime environment variable ``DYLD_LIBRARY_PATH`` to the directory where fink keeps the libraries. +See the `FAQ entry on MacOS-X`_ for more information. .. _fink: http://finkproject.org/ +.. _`FAQ entry on MacOS-X`: FAQ.html#my-application-crashes-on-macos-x -A MacPort of lxml is available. Try ``port install py25-lxml``. +A macport of lxml is available. Try ``port install py25-lxml``. Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Mon Mar 3 19:41:43 2008 @@ -140,19 +140,17 @@ The source distribution is signed with `this key`_. Binary builds for MS Windows usually become available through PyPI a few days after a source release. If you can't wait, consider trying a less recent -version first. - -.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ -.. _`this key`: pubkey.asc +release version first. 
The latest version is `lxml 2.0.2`_, released 2008-02-22 (`changes for 2.0.2`_). `Older versions`_ are listed below. -.. _`Older versions`: #old-versions - Please take a look at the `installation instructions`_! -.. _`installation instructions`: installation.html +This complete web site (including the generated API documentation) is +part of the source distribution, so if you want to download the +documentation for offline use, take the source archive and copy the +``doc/html`` directory out of the source tree. It's also possible to check out the latest development version of lxml from svn directly, using a command like this:: @@ -166,6 +164,10 @@ fixed or a feature you want has been implemented in the latest trunk version. +.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ +.. _`this key`: pubkey.asc +.. _`Older versions`: #old-versions +.. _`installation instructions`: installation.html .. _`how to build lxml from source`: build.html .. _`Subversion repository`: http://codespeak.net/svn/lxml/ .. 
_`Subversion history`: https://codespeak.net/viewvc/lxml/ Modified: lxml/trunk/doc/mkhtml.py ============================================================================== --- lxml/trunk/doc/mkhtml.py (original) +++ lxml/trunk/doc/mkhtml.py Mon Mar 3 19:41:43 2008 @@ -3,8 +3,8 @@ import os, shutil, re, sys, copy, time SITE_STRUCTURE = [ - ('lxml', ('main.txt', 'intro.txt', 'lxml2.txt', 'FAQ.txt', - 'compatibility.txt', 'performance.txt')), + ('lxml', ('main.txt', 'intro.txt', '../INSTALL.txt', 'lxml2.txt', + 'FAQ.txt', 'compatibility.txt', 'performance.txt')), ('Developing with lxml', ('tutorial.txt', '@API reference', 'api.txt', 'parsing.txt', 'validation.txt', 'xpathxslt.txt', @@ -12,7 +12,8 @@ 'cssselect.txt', 'elementsoup.txt')), ('Extending lxml', ('resolvers.txt', 'extensions.txt', 'element_classes.txt', 'sax.txt', 'capi.txt')), - ('Developing lxml', ('build.txt', 'lxml-source-howto.txt')), + ('Developing lxml', ('build.txt', 'lxml-source-howto.txt', + '@Release Changelog')), ] RST2HTML_OPTIONS = " ".join([ @@ -26,6 +27,11 @@ "API reference" : "api/index.html" } +BASENAME_MAP = { + 'main' : 'index', + 'INSTALL' : 'installation', +} + htmlnsmap = {"h" : "http://www.w3.org/1999/xhtml"} find_title = XPath("/h:html/h:head/h:title/text()", namespaces=htmlnsmap) @@ -51,7 +57,7 @@ if page_title: page_title = page_title[0] else: - page_title = replace_invalid(' ', basename.capitalize()) + page_title = replace_invalid('', basename.capitalize()) build_menu_entry(page_title, basename+".html", section_head, headings=find_headings(tree)) @@ -78,7 +84,7 @@ tag = el.tag if tag[0] != '{': el.tag = "{http://www.w3.org/1999/xhtml}" + tag - current_menu = find_menu(menu_root, name=name) + current_menu = find_menu(menu_root, name=replace_invalid('', name)) if current_menu: for submenu in current_menu: submenu.set("class", submenu.get("class", ""). 
@@ -102,6 +108,10 @@ shutil.copy(pubkey, dirname) + href_map = HREF_MAP.copy() + changelog_basename = 'changes-%s' % release + href_map['Release Changelog'] = changelog_basename + '.html' + trees = {} menu = Element("div", {"class":"sidemenu"}) # build HTML pages and parse them back @@ -111,13 +121,12 @@ if filename.startswith('@'): # special menu entry page_title = filename[1:] - url = HREF_MAP[page_title] + url = href_map[page_title] build_menu_entry(page_title, url, section_head) else: path = os.path.join(doc_dir, filename) - basename = os.path.splitext(filename)[0] - if basename == 'main': - basename = 'index' + basename = os.path.splitext(os.path.basename(filename))[0] + basename = BASENAME_MAP.get(basename, basename) outname = basename + '.html' outpath = os.path.join(dirname, outname) @@ -128,20 +137,16 @@ build_menu(tree, basename, section_head) - # integrate menu - for tree, basename, outpath in trees.itervalues(): - new_tree = merge_menu(tree, menu, basename) - new_tree.write(outpath) - # also convert INSTALL.txt and CHANGES.txt rest2html(script, - os.path.join(lxml_path, 'INSTALL.txt'), - os.path.join(dirname, 'installation.html'), - stylesheet_url) - rest2html(script, os.path.join(lxml_path, 'CHANGES.txt'), os.path.join(dirname, 'changes-%s.html' % release), stylesheet_url) + # integrate menu + for tree, basename, outpath in trees.itervalues(): + new_tree = merge_menu(tree, menu, basename) + new_tree.write(outpath) + if __name__ == '__main__': publish(sys.argv[1], sys.argv[2], sys.argv[3]) From scoder at codespeak.net Mon Mar 3 19:41:50 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:41:50 +0100 (CET) Subject: [Lxml-checkins] r52106 - in lxml/trunk: . 
doc Message-ID: <20080303184150.F15D7169EB3@codespeak.net> Author: scoder Date: Mon Mar 3 19:41:50 2008 New Revision: 52106 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt lxml/trunk/doc/mkhtml.py Log: r3697 at delle: sbehnel | 2008-03-03 10:52:47 +0100 FAQ fixes Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Mon Mar 3 19:41:50 2008 @@ -63,16 +63,17 @@ important concepts in ``lxml.etree``. If you want to help out, improving the tutorial is a very good place to start. -There is also a `tutorial for ElementTree`_ which works for ``lxml.etree``. -The `API documentation`_ also contains many examples for ``lxml.etree``. To -learn using ``lxml.objectify``, read the `objectify documentation`_. +There is also a `tutorial for ElementTree`_ which works for +``lxml.etree``. The documentation of the `extended etree API`_ also +contains many examples for ``lxml.etree``. To learn using +``lxml.objectify``, read the `objectify documentation`_. John Shipman has written another tutorial called `Python XML processing with lxml`_ that contains lots of examples. .. _`lxml.etree Tutorial`: tutorial.html .. _`tutorial for ElementTree`: http://effbot.org/zone/element.htm -.. _`API documentation`: api.html +.. _`extended etree API`: api.html .. _`objectify documentation`: objectify.html .. _`Python XML processing with lxml`: http://www.nmt.edu/tcc/help/pubs/pylxml/ @@ -80,33 +81,56 @@ Where can I find more documentation about lxml? ----------------------------------------------- -There is a lot of documentation as lxml implements the well-known `ElementTree -API`_ and tries to follow its documentation as closely as possible. There are -a couple of issues where lxml cannot keep up compatibility. They are -described in the compatibility_ documentation. 
The lxml specific extensions -to the API are described by individual files in the ``doc`` directory of the -distribution and on `the web page`_. +There is a lot of documentation on the web and also in the Python +standard library documentation, as lxml implements the well-known +`ElementTree API`_ and tries to follow its documentation as closely as +possible. There are a couple of issues where lxml cannot keep up +compatibility. They are described in the compatibility_ +documentation. + +The lxml specific extensions to the API are described by individual +files in the ``doc`` directory of the source distribution and on `the +web page`_. + +The `generated API documentation`_ is a comprehensive API reference +for the lxml package. .. _`ElementTree API`: http://effbot.org/zone/element-index.htm .. _`the web page`: http://codespeak.net/lxml/#documentation +.. _`generated API documentation`: api/index.html What standards does lxml implement? ----------------------------------- The compliance to XML Standards depends on the support in libxml2 and libxslt. -Here is a quote from `http://xmlsoft.org/`: +Here is a quote from `http://xmlsoft.org/ `_: In most cases libxml2 tries to implement the specifications in a relatively strictly compliant way. As of release 2.4.16, libxml2 passed all 1800+ tests from the OASIS XML Tests Suite. -lxml currently supports libxml2 2.6.20 or later, which has even better support -for various XML standards. Some of the more important ones are: HTML, XML -namespaces, XPath, XInclude, XSLT, XML catalogs, canonical XML, RelaxNG, -XML:ID. Support for XML Schema and especially Schematron is currently -incomplete in libxml2, but is definitely usable and actively being worked on. -libxml2 also supports loading documents through HTTP and FTP. +lxml currently supports libxml2 2.6.20 or later, which has even better +support for various XML standards. 
The important ones are: + +* XML 1.0 +* HTML 4 +* XML namespaces +* XML Schema 1.0 +* XPath 1.0 +* XInclude 1.0 +* XSLT 1.0 +* EXSLT +* XML catalogs +* canonical XML +* RelaxNG +* xml:id +* xml:base + +Support for XML Schema is currently not 100% complete in libxml2, but +is definitely very close to compliance. Schematron is supported, +although not necessarily complete. libxml2 also supports loading +documents through HTTP and FTP. Who uses lxml? Modified: lxml/trunk/doc/mkhtml.py ============================================================================== --- lxml/trunk/doc/mkhtml.py (original) +++ lxml/trunk/doc/mkhtml.py Mon Mar 3 19:41:50 2008 @@ -4,7 +4,7 @@ SITE_STRUCTURE = [ ('lxml', ('main.txt', 'intro.txt', '../INSTALL.txt', 'lxml2.txt', - 'FAQ.txt', 'compatibility.txt', 'performance.txt')), + 'performance.txt', 'compatibility.txt', 'FAQ.txt')), ('Developing with lxml', ('tutorial.txt', '@API reference', 'api.txt', 'parsing.txt', 'validation.txt', 'xpathxslt.txt', From scoder at codespeak.net Mon Mar 3 19:41:59 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:41:59 +0100 (CET) Subject: [Lxml-checkins] r52107 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20080303184159.B2D87169EAE@codespeak.net> Author: scoder Date: Mon Mar 3 19:41:59 2008 New Revision: 52107 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/TODO.txt lxml/trunk/src/lxml/tests/test_xslt.py lxml/trunk/src/lxml/xslt.pxi Log: r3698 at delle: sbehnel | 2008-03-03 11:49:57 +0100 constant instances DENY_ALL/DENY_WRITE on XSLTAccessControl class Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Mon Mar 3 19:41:59 2008 @@ -8,6 +8,9 @@ Features added -------------- +* Constant instances ``DENY_ALL`` and ``DENY_WRITE`` on + ``XSLTAccessControl`` class. + * Extension elements for XSLT (experimental!) 
* ``Element.base`` property returns the xml:base or HTML base URL of Modified: lxml/trunk/TODO.txt ============================================================================== --- lxml/trunk/TODO.txt (original) +++ lxml/trunk/TODO.txt Mon Mar 3 19:41:59 2008 @@ -45,6 +45,13 @@ by libxml2 (patch exists) +XSLT +---- + +* Support subclassing XSLTAccessControl to provide custom per-URL + access check methods + + lxml 2.0 ======== Modified: lxml/trunk/src/lxml/tests/test_xslt.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xslt.py (original) +++ lxml/trunk/src/lxml/tests/test_xslt.py Mon Mar 3 19:41:59 2008 @@ -819,6 +819,29 @@ self.assertEquals(root[3].get("value"), 'B') + def test_xslt_document_parse_allow(self): + access_control = etree.XSLTAccessControl(read_file=True) + xslt = etree.XSLT(etree.parse(fileInTestDir("test-document.xslt")), + access_control = access_control) + result = xslt(etree.XML('')) + root = result.getroot() + self.assertEquals(root.tag, + 'test') + self.assertEquals(root[0].tag, + '{http://www.w3.org/1999/XSL/Transform}stylesheet') + + def test_xslt_document_parse_deny(self): + access_control = etree.XSLTAccessControl(read_file=False) + xslt = etree.XSLT(etree.parse(fileInTestDir("test-document.xslt")), + access_control = access_control) + self.assertRaises(etree.XSLTApplyError, xslt, etree.XML('')) + + def test_xslt_document_parse_deny_all(self): + access_control = etree.XSLTAccessControl.DENY_ALL + xslt = etree.XSLT(etree.parse(fileInTestDir("test-document.xslt")), + access_control = access_control) + self.assertRaises(etree.XSLTApplyError, xslt, etree.XML('')) + def test_xslt_move_result(self): root = etree.XML('''\ Modified: lxml/trunk/src/lxml/xslt.pxi ============================================================================== --- lxml/trunk/src/lxml/xslt.pxi (original) +++ lxml/trunk/src/lxml/xslt.pxi Mon Mar 3 19:41:59 2008 @@ -180,6 +180,11 @@ - read_network - 
write_network + For convenience, there is also a class member `DENY_ALL` that + provides an XSLTAccessControl instance that is readily configured + to deny everything, and a `DENY_WRITE` member that denies all + write access but allows read access. + See `XSLT`. """ cdef xslt.xsltSecurityPrefs* _prefs @@ -194,6 +199,14 @@ self._setAccess(xslt.XSLT_SECPREF_READ_NETWORK, read_network) self._setAccess(xslt.XSLT_SECPREF_WRITE_NETWORK, write_network) + DENY_ALL = XSLTAccessControl( + read_file=False, write_file=False, create_dir=False, + read_network=False, write_network=False) + + DENY_WRITE = XSLTAccessControl( + read_file=True, write_file=False, create_dir=False, + read_network=True, write_network=False) + def __dealloc__(self): if self._prefs is not NULL: xslt.xsltFreeSecurityPrefs(self._prefs) From scoder at codespeak.net Mon Mar 3 19:42:05 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:42:05 +0100 (CET) Subject: [Lxml-checkins] r52108 - lxml/trunk Message-ID: <20080303184205.9FB3A169ECE@codespeak.net> Author: scoder Date: Mon Mar 3 19:42:05 2008 New Revision: 52108 Modified: lxml/trunk/ (props changed) lxml/trunk/Makefile Log: r3699 at delle: sbehnel | 2008-03-03 12:01:18 +0100 docclean target in Makefile Modified: lxml/trunk/Makefile ============================================================================== --- lxml/trunk/Makefile (original) +++ lxml/trunk/Makefile Mon Mar 3 19:42:05 2008 @@ -41,7 +41,6 @@ $(PYTHON) test.py -f $(TESTFLAGS) $(TESTOPTS) html: inplace - mkdir -p doc/html PYTHONPATH=src $(PYTHON) doc/mkhtml.py doc/html . `cat version.txt` rm -fr doc/html/api @[ -x "`which epydoc`" ] \ @@ -65,7 +64,11 @@ find . \( -name '*.o' -o -name '*.so' -o -name '*.py[cod]' -o -name '*.dll' \) -exec rm -f {} \; rm -rf build -realclean: clean +docclean: + rm -f doc/html/*.html + rm -fr doc/html/api + +realclean: clean docclean find . 
-name '*.c' -exec rm -f {} \; rm -f TAGS $(PYTHON) setup.py clean -a From scoder at codespeak.net Mon Mar 3 19:42:25 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:42:25 +0100 (CET) Subject: [Lxml-checkins] r52109 - in lxml/trunk: . doc src/lxml src/lxml/tests Message-ID: <20080303184225.B1D03169EAE@codespeak.net> Author: scoder Date: Mon Mar 3 19:42:19 2008 New Revision: 52109 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/doc/api.txt lxml/trunk/doc/extensions.txt lxml/trunk/src/lxml/classlookup.pxi lxml/trunk/src/lxml/lxml.objectify.pyx lxml/trunk/src/lxml/parser.pxi lxml/trunk/src/lxml/tests/test_classlookup.py lxml/trunk/src/lxml/tests/test_etree.py lxml/trunk/src/lxml/tests/test_htmlparser.py lxml/trunk/src/lxml/tests/test_nsclasses.py lxml/trunk/src/lxml/tests/test_xslt.py lxml/trunk/src/lxml/xmlerror.pxi lxml/trunk/src/lxml/xpath.pxi lxml/trunk/src/lxml/xslt.pxi Log: r3700 at delle: sbehnel | 2008-03-03 12:30:47 +0100 removed most deprecated functions and methods Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Mon Mar 3 19:42:19 2008 @@ -24,6 +24,8 @@ Other changes ------------- +* Most deprecated functions and methods were removed. + 2.0.2 (2008-02-22) ================== Modified: lxml/trunk/doc/api.txt ============================================================================== --- lxml/trunk/doc/api.txt (original) +++ lxml/trunk/doc/api.txt Mon Mar 3 19:42:19 2008 @@ -208,7 +208,7 @@ errors that occured and "might have" lead to the problem from the error log copy attached to the exception:: - >>> etree.clearErrorLog() + >>> etree.clear_error_log() >>> broken_xml = ''' ... ... 
Modified: lxml/trunk/doc/extensions.txt ============================================================================== --- lxml/trunk/doc/extensions.txt (original) +++ lxml/trunk/doc/extensions.txt Mon Mar 3 19:42:19 2008 @@ -176,7 +176,7 @@ register the namespace with the evaluator, however, we can access it via a prefix:: - >>> e.registerNamespace('foo', 'http://mydomain.org/myfunctions') + >>> e.register_namespace('foo', 'http://mydomain.org/myfunctions') >>> e.evaluate('/foo:a')[0].tag '{http://mydomain.org/myfunctions}a' Modified: lxml/trunk/src/lxml/classlookup.pxi ============================================================================== --- lxml/trunk/src/lxml/classlookup.pxi (original) +++ lxml/trunk/src/lxml/classlookup.pxi Mon Mar 3 19:42:19 2008 @@ -107,13 +107,6 @@ """ self._setFallback(lookup) - def setFallback(self, ElementClassLookup lookup not None): - """Sets the fallback scheme for this lookup method. - - :deprecated: use ``set_fallback()`` instead. - """ - self._setFallback(lookup) - cdef object _callFallback(self, _Document doc, xmlNode* c_node): return self._fallback_function(self.fallback, doc, c_node) @@ -408,10 +401,6 @@ ELEMENT_CLASS_LOOKUP_STATE = state LOOKUP_ELEMENT_CLASS = function -def setElementClassLookup(ElementClassLookup lookup = None): - ":deprecated: use ``set_element_class_lookup(lookup)`` instead" - set_element_class_lookup(lookup) - def set_element_class_lookup(ElementClassLookup lookup = None): """set_element_class_lookup(lookup = None) Modified: lxml/trunk/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.objectify.pyx (original) +++ lxml/trunk/src/lxml/lxml.objectify.pyx Mon Mar 3 19:42:19 2008 @@ -81,11 +81,6 @@ PYTYPE_ATTRIBUTE = cetree.namespacedNameFromNsName( _PYTYPE_NAMESPACE, _PYTYPE_ATTRIBUTE_NAME) -def setPytypeAttributeTag(attribute_tag=None): - """:deprecated: use ``set_pytype_attribute_tag()`` instead. 
- """ - set_pytype_attribute_tag(attribute_tag) - set_pytype_attribute_tag() @@ -1685,10 +1680,6 @@ cdef object objectify_parser objectify_parser = __DEFAULT_PARSER -def setDefaultParser(new_parser = None): - ":deprecated: use ``set_default_parser()`` instead." - set_default_parser(new_parser) - def set_default_parser(new_parser = None): """set_default_parser(new_parser = None) Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Mon Mar 3 19:42:19 2008 @@ -678,7 +678,7 @@ return "libxml2 %d.%d.%d" % LIBXML_VERSION def setElementClassLookup(self, ElementClassLookup lookup = None): - "@deprecated: use ``parser.set_element_class_lookup(lookup)`` instead." + ":deprecated: use ``parser.set_element_class_lookup(lookup)`` instead." self.set_element_class_lookup(lookup) def set_element_class_lookup(self, ElementClassLookup lookup = None): @@ -1130,14 +1130,6 @@ __GLOBAL_PARSER_CONTEXT.setDefaultParser(__DEFAULT_XML_PARSER) -def setDefaultParser(parser=None): - ":deprecated: please use set_default_parser instead." - set_default_parser(parser) - -def getDefaultParser(): - ":deprecated: please use get_default_parser instead." 
- return get_default_parser() - def set_default_parser(_BaseParser parser=None): """set_default_parser(parser=None) Modified: lxml/trunk/src/lxml/tests/test_classlookup.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_classlookup.py (original) +++ lxml/trunk/src/lxml/tests/test_classlookup.py Mon Mar 3 19:42:19 2008 @@ -25,7 +25,7 @@ etree = etree def tearDown(self): - etree.setElementClassLookup() + etree.set_element_class_lookup() super(ClassLookupTestCase, self).tearDown() def test_namespace_lookup(self): @@ -33,7 +33,7 @@ FIND_ME = "namespace class" lookup = etree.ElementNamespaceClassLookup() - etree.setElementClassLookup(lookup) + etree.set_element_class_lookup(lookup) ns = lookup.get_namespace("myNS") ns[None] = TestElement @@ -57,7 +57,7 @@ lookup = etree.ElementDefaultClassLookup( element=TestElement, comment=TestComment, pi=TestPI) - parser.setElementClassLookup(lookup) + parser.set_element_class_lookup(lookup) root = etree.XML(""" @@ -78,7 +78,7 @@ lookup = etree.AttributeBasedElementClassLookup( "a1", class_dict) - etree.setElementClassLookup(lookup) + etree.set_element_class_lookup(lookup) root = etree.XML(xml_str) self.assertFalse(hasattr(root, 'FIND_ME')) @@ -95,7 +95,7 @@ if name == 'c1': return TestElement - etree.setElementClassLookup( MyLookup() ) + etree.set_element_class_lookup( MyLookup() ) root = etree.XML(xml_str) self.assertFalse(hasattr(root, 'FIND_ME')) @@ -116,7 +116,7 @@ return TestElement1 lookup = etree.ElementNamespaceClassLookup( MyLookup() ) - etree.setElementClassLookup(lookup) + etree.set_element_class_lookup(lookup) ns = lookup.get_namespace("otherNS") ns[None] = TestElement2 @@ -134,14 +134,14 @@ FIND_ME = "parser_based" lookup = etree.ParserBasedElementClassLookup() - etree.setElementClassLookup(lookup) + etree.set_element_class_lookup(lookup) class MyLookup(etree.CustomElementClassLookup): def lookup(self, t, d, ns, name): return TestElement parser = 
etree.XMLParser() - parser.setElementClassLookup( MyLookup() ) + parser.set_element_class_lookup( MyLookup() ) root = etree.parse(StringIO(xml_str), parser).getroot() self.assertEquals(root.FIND_ME, Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Mon Mar 3 19:42:19 2008 @@ -254,7 +254,7 @@ parse = self.etree.parse # from StringIO f = StringIO('') - self.etree.clearErrorLog() + self.etree.clear_error_log() try: parse(f) logs = None Modified: lxml/trunk/src/lxml/tests/test_htmlparser.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_htmlparser.py (original) +++ lxml/trunk/src/lxml/tests/test_htmlparser.py Mon Mar 3 19:42:19 2008 @@ -27,7 +27,7 @@ def tearDown(self): super(HtmlParserTestCase, self).tearDown() - self.etree.setDefaultParser() + self.etree.set_default_parser() def test_module_HTML(self): element = self.etree.HTML(self.html_str) @@ -235,13 +235,13 @@ self.assertRaises(self.etree.XMLSyntaxError, self.etree.parse, StringIO(self.broken_html_str)) - self.etree.setDefaultParser( self.etree.HTMLParser() ) + self.etree.set_default_parser( self.etree.HTMLParser() ) tree = self.etree.parse(StringIO(self.broken_html_str)) self.assertEqual(self.etree.tostring(tree.getroot()), self.html_str) - self.etree.setDefaultParser() + self.etree.set_default_parser() self.assertRaises(self.etree.XMLSyntaxError, self.etree.parse, StringIO(self.broken_html_str)) Modified: lxml/trunk/src/lxml/tests/test_nsclasses.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_nsclasses.py (original) +++ lxml/trunk/src/lxml/tests/test_nsclasses.py Mon Mar 3 19:42:19 2008 @@ -25,11 +25,11 @@ lookup = etree.ElementNamespaceClassLookup() self.Namespace = lookup.get_namespace parser = 
etree.XMLParser() - parser.setElementClassLookup(lookup) - etree.setDefaultParser(parser) + parser.set_element_class_lookup(lookup) + etree.set_default_parser(parser) def tearDown(self): - etree.setDefaultParser() + etree.set_default_parser() del self.Namespace super(ETreeNamespaceClassesTestCase, self).tearDown() Modified: lxml/trunk/src/lxml/tests/test_xslt.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xslt.py (original) +++ lxml/trunk/src/lxml/tests/test_xslt.py Mon Mar 3 19:42:19 2008 @@ -29,7 +29,7 @@ B ''', - st.tostring(res)) + str(res)) def test_xslt_elementtree_error(self): self.assertRaises(ValueError, etree.XSLT, etree.ElementTree()) @@ -298,7 +298,7 @@ Bar ''', - st.tostring(res)) + str(res)) if etree.LIBXSLT_VERSION < (1,1,18): # later versions produce no error @@ -335,7 +335,7 @@ BarBaz ''', - st.tostring(res)) + str(res)) def test_xslt_parameter_xpath(self): tree = self.parse('BC') @@ -354,7 +354,7 @@ B ''', - st.tostring(res)) + str(res)) def test_xslt_default_parameters(self): @@ -375,13 +375,13 @@ Bar ''', - st.tostring(res)) + str(res)) res = st.apply(tree) self.assertEquals('''\ Default ''', - st.tostring(res)) + str(res)) def test_xslt_html_output(self): tree = self.parse('BC') @@ -471,7 +471,6 @@ styledoc = self.parse(xslt) style = etree.XSLT(styledoc) result = style.apply(source) - self.assertEqual('', style.tostring(result)) self.assertEqual('', str(result)) def test_xslt_message(self): @@ -488,7 +487,6 @@ styledoc = self.parse(xslt) style = etree.XSLT(styledoc) result = style.apply(source) - self.assertEqual('', style.tostring(result)) self.assertEqual('', str(result)) self.assert_("TEST TEST TEST" in [entry.message for entry in style.error_log]) @@ -507,7 +505,6 @@ styledoc = self.parse(xslt) style = etree.XSLT(styledoc) result = style.apply(source) - self.assertEqual('', style.tostring(result)) self.assertEqual('', str(result)) self.assert_("TEST TEST TEST" in 
[entry.message for entry in style.error_log]) @@ -907,7 +904,7 @@ B ''', - st.tostring(res)) + str(res)) def test_xslt_pi_embedded_id(self): # test XPath lookup mechanism @@ -941,7 +938,7 @@ B ''', - st.tostring(res)) + str(res)) def test_xslt_pi_get(self): tree = self.parse('''\ Modified: lxml/trunk/src/lxml/xmlerror.pxi ============================================================================== --- lxml/trunk/src/lxml/xmlerror.pxi (original) +++ lxml/trunk/src/lxml/xmlerror.pxi Mon Mar 3 19:42:19 2008 @@ -12,14 +12,6 @@ """ __GLOBAL_ERROR_LOG.clear() -def clearErrorLog(): - """Clear the global error log. Note that this log is already bound to a - fixed size. - - :deprecated: use ``clear_error_log()`` instead. - """ - __GLOBAL_ERROR_LOG.clear() - # dummy function: no debug output at all cdef void _nullGenericErrorFunc(void* ctxt, char* msg, ...): pass @@ -411,17 +403,6 @@ "Helper function for properties in exceptions." return __GLOBAL_ERROR_LOG.copy() -def useGlobalPythonLog(PyErrorLog log not None): - """Replace the global error log by an etree.PyErrorLog that uses the - standard Python logging package. - - Note that this disables access to the global error log from exceptions. - Parsers, XSLT etc. will continue to provide their normal local error log. - - :deprecated: use ``use_global_python_log()`` instead. - """ - use_global_python_log(log) - def use_global_python_log(PyErrorLog log not None): """use_global_python_log(log) Modified: lxml/trunk/src/lxml/xpath.pxi ============================================================================== --- lxml/trunk/src/lxml/xpath.pxi (original) +++ lxml/trunk/src/lxml/xpath.pxi Mon Mar 3 19:42:19 2008 @@ -235,26 +235,11 @@ python.PyErr_NoMemory() self.set_context(xpathCtxt) - def registerNamespace(self, prefix, uri): - """Register a namespace with the XPath context. 
- - :deprecated: use ``register_namespace()`` instead - """ - self._context.addNamespace(prefix, uri) - def register_namespace(self, prefix, uri): """Register a namespace with the XPath context. """ self._context.addNamespace(prefix, uri) - def registerNamespaces(self, namespaces): - """Register a prefix -> uri dict. - - :deprecated: use ``register_namespaces()`` instead - """ - for prefix, uri in namespaces.items(): - self._context.addNamespace(prefix, uri) - def register_namespaces(self, namespaces): """Register a prefix -> uri dict. """ Modified: lxml/trunk/src/lxml/xslt.pxi ============================================================================== --- lxml/trunk/src/lxml/xslt.pxi (original) +++ lxml/trunk/src/lxml/xslt.pxi Mon Mar 3 19:42:19 2008 @@ -387,15 +387,6 @@ :deprecated: call the object, not this method.""" return self(_input, profile_run=profile_run, **_kw) - def tostring(self, _ElementTree result_tree): - """tostring(self, result_tree) - - Save result doc to string based on stylesheet output method. - - :deprecated: use str(result_tree) instead. - """ - return str(result_tree) - def __deepcopy__(self, memo): return self.__copy__() From scoder at codespeak.net Mon Mar 3 19:42:27 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:42:27 +0100 (CET) Subject: [Lxml-checkins] r52110 - in lxml/trunk: . 
src/lxml Message-ID: <20080303184227.226BB169EAE@codespeak.net> Author: scoder Date: Mon Mar 3 19:42:26 2008 New Revision: 52110 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/parser.pxi Log: r3701 at delle: sbehnel | 2008-03-03 13:13:15 +0100 dropped one more Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Mon Mar 3 19:42:26 2008 @@ -677,10 +677,6 @@ def __get__(self): return "libxml2 %d.%d.%d" % LIBXML_VERSION - def setElementClassLookup(self, ElementClassLookup lookup = None): - ":deprecated: use ``parser.set_element_class_lookup(lookup)`` instead." - self.set_element_class_lookup(lookup) - def set_element_class_lookup(self, ElementClassLookup lookup = None): """set_element_class_lookup(self, lookup = None) From scoder at codespeak.net Mon Mar 3 19:42:39 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:42:39 +0100 (CET) Subject: [Lxml-checkins] r52111 - in lxml/trunk: . 
doc src/lxml src/lxml/html src/lxml/tests Message-ID: <20080303184239.DDCB7169ECE@codespeak.net> Author: scoder Date: Mon Mar 3 19:42:39 2008 New Revision: 52111 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/element_classes.txt lxml/trunk/doc/extensions.txt lxml/trunk/src/lxml/html/__init__.py lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/parser.pxi lxml/trunk/src/lxml/tests/test_objectify.py lxml/trunk/src/lxml/tests/test_pyclasslookup.py lxml/trunk/src/lxml/tests/test_xpathevaluator.py lxml/trunk/src/lxml/tests/test_xslt.py Log: r3702 at delle: sbehnel | 2008-03-03 13:32:50 +0100 tons of API usage fixes in the docs Modified: lxml/trunk/doc/element_classes.txt ============================================================================== --- lxml/trunk/doc/element_classes.txt (original) +++ lxml/trunk/doc/element_classes.txt Mon Mar 3 19:42:39 2008 @@ -89,7 +89,7 @@ >>> parser_lookup = etree.ElementDefaultClassLookup(element=HonkElement) >>> parser = etree.XMLParser() - >>> parser.setElementClassLookup(parser_lookup) + >>> parser.set_element_class_lookup(parser_lookup) There is one drawback of the parser based scheme: the ``Element()`` factory does not know about your specialised parser and creates a new document that @@ -153,7 +153,7 @@ >>> lookup = etree.ElementDefaultClassLookup() >>> parser = etree.XMLParser() - >>> parser.setElementClassLookup(lookup) + >>> parser.set_element_class_lookup(lookup) Note that the default for new parsers is to use the global fallback, which is also the default lookup (if not configured otherwise). 
@@ -167,7 +167,7 @@ False >>> lookup = etree.ElementDefaultClassLookup(element=HonkElement) - >>> parser.setElementClassLookup(lookup) + >>> parser.set_element_class_lookup(lookup) >>> el = parser.makeelement("myelement") >>> print isinstance(el, HonkElement) @@ -189,7 +189,7 @@ >>> lookup = etree.ElementNamespaceClassLookup() >>> parser = etree.XMLParser() - >>> parser.setElementClassLookup(lookup) + >>> parser.set_element_class_lookup(lookup) See the separate section on `implementing namespaces`_ below to learn how to make use of it. @@ -203,7 +203,7 @@ >>> fallback = etree.ElementDefaultClassLookup(element=HonkElement) >>> lookup = etree.ElementNamespaceClassLookup(fallback) - >>> parser.setElementClassLookup(lookup) + >>> parser.set_element_class_lookup(lookup) Attribute based lookup @@ -217,7 +217,7 @@ >>> lookup = etree.AttributeBasedElementClassLookup( ... 'id', id_class_mapping) >>> parser = etree.XMLParser() - >>> parser.setElementClassLookup(lookup) + >>> parser.set_element_class_lookup(lookup) Instead of a global setup of this scheme, you should consider using a per-parser setup. @@ -230,7 +230,7 @@ >>> lookup = etree.AttributeBasedElementClassLookup( ... 'id', id_class_mapping, fallback) >>> parser = etree.XMLParser() - >>> parser.setElementClassLookup(lookup) + >>> parser.set_element_class_lookup(lookup) Custom element class lookup @@ -244,7 +244,7 @@ ... return MyElementClass # defined elsewhere >>> parser = etree.XMLParser() - >>> parser.setElementClassLookup(MyLookup()) + >>> parser.set_element_class_lookup(MyLookup()) The ``lookup()`` method must either return None (which triggers the fallback mechanism) or a subclass of ``lxml.etree.ElementBase``. It can take any @@ -270,7 +270,7 @@ ... 
return MyElementClass # defined elsewhere >>> parser = etree.XMLParser() - >>> parser.setElementClassLookup(MyLookup()) + >>> parser.set_element_class_lookup(MyLookup()) As before, the first argument to the ``lookup()`` method is the opaque document instance that contains the Element. The second arguments is a @@ -305,7 +305,7 @@ >>> lookup = etree.ElementNamespaceClassLookup() >>> parser = etree.XMLParser() - >>> parser.setElementClassLookup(lookup) + >>> parser.set_element_class_lookup(lookup) >>> namespace = lookup.get_namespace('http://hui.de/honk') Modified: lxml/trunk/doc/extensions.txt ============================================================================== --- lxml/trunk/doc/extensions.txt (original) +++ lxml/trunk/doc/extensions.txt Mon Mar 3 19:42:39 2008 @@ -141,12 +141,12 @@ XSL transformations:: >>> e = etree.XPathEvaluator(doc) - >>> print e.evaluate('es:hello(local-name(/a))') + >>> print e('es:hello(local-name(/a))') Ola a >>> namespaces = {'f' : 'http://mydomain.org/myfunctions'} >>> e = etree.XPathEvaluator(doc, namespaces=namespaces) - >>> print e.evaluate('f:hello(local-name(/a))') + >>> print e('f:hello(local-name(/a))') Hello a >>> xslt = etree.XSLT(etree.ElementTree(etree.XML(''' @@ -169,7 +169,7 @@ >>> f = StringIO('') >>> ns_doc = etree.parse(f) >>> e = etree.XPathEvaluator(ns_doc) - >>> e.evaluate('/a') + >>> e('/a') [] This returns nothing, as we did not ask for the right namespace. When we @@ -177,14 +177,14 @@ prefix:: >>> e.register_namespace('foo', 'http://mydomain.org/myfunctions') - >>> e.evaluate('/foo:a')[0].tag + >>> e('/foo:a')[0].tag '{http://mydomain.org/myfunctions}a' Note that this prefix mapping is only known to this evaluator, as opposed to the global mapping of the FunctionNamespace objects:: >>> e2 = etree.XPathEvaluator(ns_doc) - >>> e2.evaluate('/foo:a') + >>> e2('/foo:a') Traceback (most recent call last): ... 
XPathEvalError: Undefined namespace prefix @@ -202,7 +202,7 @@ >>> namespaces = {'l' : 'local-ns'} >>> e = etree.XPathEvaluator(doc, namespaces=namespaces, extensions=extensions) - >>> print e.evaluate('l:local-hello(string(b))') + >>> print e('l:local-hello(string(b))') Hello Haegar For larger numbers of extension functions, you can define classes or modules @@ -221,7 +221,7 @@ >>> extensions = etree.Extension( ext_module, functions, ns='local-ns' ) >>> e = etree.XPathEvaluator(doc, namespaces=namespaces, extensions=extensions) - >>> print e.evaluate('l:function1(string(b))') + >>> print e('l:function1(string(b))') 1Haegar The optional second argument to ``Extension`` can either be be a @@ -237,17 +237,17 @@ >>> functions = ('function1', 'function2', 'function3') >>> extensions = etree.Extension( ext_module, functions ) >>> e = etree.XPathEvaluator(doc, extensions=extensions) - >>> print e.evaluate('function1(function2(function3(string(b))))') + >>> print e('function1(function2(function3(string(b))))') 123Haegar >>> extensions = etree.Extension( ext_module, functions, ns=None ) >>> e = etree.XPathEvaluator(doc, extensions=extensions) - >>> print e.evaluate('function1(function2(function3(string(b))))') + >>> print e('function1(function2(function3(string(b))))') 123Haegar >>> extensions = etree.Extension(ext_module) >>> e = etree.XPathEvaluator(doc, extensions=extensions) - >>> print e.evaluate('function1(function2(function3(string(b))))') + >>> print e('function1(function2(function3(string(b))))') 123Haegar >>> functions = { @@ -257,7 +257,7 @@ ... 
} >>> extensions = etree.Extension(ext_module, functions) >>> e = etree.XPathEvaluator(doc, extensions=extensions) - >>> print e.evaluate('function1(function2(function3(string(b))))') + >>> print e('function1(function2(function3(string(b))))') 123Haegar For convenience, you can also pass a sequence of extensions:: @@ -266,7 +266,7 @@ >>> extensions2 = etree.Extension(ext_module, ns='local-ns') >>> e = etree.XPathEvaluator(doc, extensions=[extensions1, extensions2], ... namespaces=namespaces) - >>> print e.evaluate('function1(l:function2(function3(string(b))))') + >>> print e('function1(l:function2(function3(string(b))))') 123Haegar @@ -296,15 +296,15 @@ >>> ns['first'] = returnFirstNode >>> e = etree.XPathEvaluator(doc) - >>> e.evaluate("float()") + >>> e("float()") 1.7 - >>> e.evaluate("int()") + >>> e("int()") 1.0 - >>> int( e.evaluate("int()") ) + >>> int( e("int()") ) 1 - >>> e.evaluate("bool()") + >>> e("bool()") True - >>> e.evaluate("count(first(//b))") + >>> e("count(first(//b))") 1.0 As the last example shows, you can pass the results of functions back into @@ -327,11 +327,11 @@ >>> e = etree.XPathEvaluator(doc) - >>> r = e.evaluate("new-node-set()/result") + >>> r = e("new-node-set()/result") >>> print [ t.text for t in r ] ['Alpha', 'Beta', 'Gamma', 'Delta'] - >>> r = e.evaluate("new-node-set()") + >>> r = e("new-node-set()") >>> print [ t.tag for t in r ] ['results1', 'results2', 'subresult'] >>> print [ len(t) for t in r ] Modified: lxml/trunk/src/lxml/html/__init__.py ============================================================================== --- lxml/trunk/src/lxml/html/__init__.py (original) +++ lxml/trunk/src/lxml/html/__init__.py Mon Mar 3 19:42:39 2008 @@ -1339,7 +1339,7 @@ class HTMLParser(etree.HTMLParser): def __init__(self, **kwargs): super(HTMLParser, self).__init__(**kwargs) - self.setElementClassLookup(HtmlElementClassLookup()) + self.set_element_class_lookup(HtmlElementClassLookup()) def Element(*args, **kw): v = 
html_parser.makeelement(*args, **kw) Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Mon Mar 3 19:42:39 2008 @@ -1326,7 +1326,7 @@ """ evaluator = XPathElementEvaluator(self, namespaces=namespaces, extensions=extensions) - return evaluator.evaluate(_path, **_variables) + return evaluator(_path, **_variables) cdef python.PyThread_type_lock ELEMENT_CREATION_LOCK @@ -1742,7 +1742,7 @@ self._assertHasRoot() evaluator = XPathDocumentEvaluator(self, namespaces=namespaces, extensions=extensions) - return evaluator.evaluate(_path, **_variables) + return evaluator(_path, **_variables) def xslt(self, _xslt, extensions=None, access_control=None, **_kw): """xslt(self, _xslt, extensions=None, access_control=None, **_kw) Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Mon Mar 3 19:42:39 2008 @@ -677,6 +677,10 @@ def __get__(self): return "libxml2 %d.%d.%d" % LIBXML_VERSION + def setElementClassLookup(self, ElementClassLookup lookup = None): + ":deprecated: use ``parser.set_element_class_lookup(lookup)`` instead." 
+ self.set_element_class_lookup(lookup) + def set_element_class_lookup(self, ElementClassLookup lookup = None): """set_element_class_lookup(self, lookup = None) Modified: lxml/trunk/src/lxml/tests/test_objectify.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_objectify.py (original) +++ lxml/trunk/src/lxml/tests/test_objectify.py Mon Mar 3 19:42:39 2008 @@ -77,7 +77,7 @@ self.parser = self.etree.XMLParser(remove_blank_text=True) self.lookup = etree.ElementNamespaceClassLookup( objectify.ObjectifyElementClassLookup() ) - self.parser.setElementClassLookup(self.lookup) + self.parser.set_element_class_lookup(self.lookup) self.Element = self.parser.makeelement Modified: lxml/trunk/src/lxml/tests/test_pyclasslookup.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_pyclasslookup.py (original) +++ lxml/trunk/src/lxml/tests/test_pyclasslookup.py Mon Mar 3 19:42:39 2008 @@ -32,14 +32,14 @@ Element = parser.makeelement def tearDown(self): - self.parser.setElementClassLookup(None) + self.parser.set_element_class_lookup(None) super(PyClassLookupTestCase, self).tearDown() def _setClassLookup(self, lookup_function): class Lookup(PythonElementClassLookup): def lookup(self, *args): return lookup_function(*args) - self.parser.setElementClassLookup( Lookup() ) + self.parser.set_element_class_lookup( Lookup() ) def _buildElementClass(self): class LocalElement(etree.ElementBase): Modified: lxml/trunk/src/lxml/tests/test_xpathevaluator.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xpathevaluator.py (original) +++ lxml/trunk/src/lxml/tests/test_xpathevaluator.py Mon Mar 3 19:42:39 2008 @@ -192,7 +192,7 @@ root = tree.getroot() self.assertEquals( [root], - e.evaluate('//a')) + e('//a')) def test_xpath_evaluator_tree(self): tree = self.parse('') @@ -200,11 +200,11 @@ e = 
etree.XPathEvaluator(child_tree) self.assertEquals( [], - e.evaluate('a')) + e('a')) root = child_tree.getroot() self.assertEquals( [root[0]], - e.evaluate('c')) + e('c')) def test_xpath_evaluator_tree_absolute(self): tree = self.parse('') @@ -212,14 +212,14 @@ e = etree.XPathEvaluator(child_tree) self.assertEquals( [], - e.evaluate('/a')) + e('/a')) root = child_tree.getroot() self.assertEquals( [root], - e.evaluate('/b')) + e('/b')) self.assertEquals( [], - e.evaluate('/c')) + e('/c')) def test_xpath_evaluator_element(self): tree = self.parse('') @@ -227,7 +227,7 @@ e = etree.XPathEvaluator(root[0]) self.assertEquals( [root[0][0]], - e.evaluate('c')) + e('c')) def test_xpath_extensions(self): def foo(evaluator, a): @@ -236,7 +236,7 @@ tree = self.parse('') e = etree.XPathEvaluator(tree, extensions=[extension]) self.assertEquals( - "hello you", e.evaluate("foo('you')")) + "hello you", e("foo('you')")) def test_xpath_extensions_wrong_args(self): def foo(evaluator, a, b): @@ -244,7 +244,7 @@ extension = {(None, 'foo'): foo} tree = self.parse('') e = etree.XPathEvaluator(tree, extensions=[extension]) - self.assertRaises(TypeError, e.evaluate, "foo('you')") + self.assertRaises(TypeError, e, "foo('you')") def test_xpath_extensions_error(self): def foo(evaluator, a): @@ -252,7 +252,7 @@ extension = {(None, 'foo'): foo} tree = self.parse('') e = etree.XPathEvaluator(tree, extensions=[extension]) - self.assertRaises(ZeroDivisionError, e.evaluate, "foo('test')") + self.assertRaises(ZeroDivisionError, e, "foo('test')") def test_xpath_extensions_nodes(self): def f(evaluator, arg): @@ -265,7 +265,7 @@ x = self.parse('') e = etree.XPathEvaluator(x, extensions=[{(None, 'foo'): f}]) - r = e.evaluate("foo('World')/result") + r = e("foo('World')/result") self.assertEquals(2, len(r)) self.assertEquals('Hoi', r[0].text) self.assertEquals('Dag', r[1].text) @@ -281,7 +281,7 @@ x = self.parse('') e = etree.XPathEvaluator(x, extensions=[{(None, 'foo'): f}]) - r = 
e.evaluate("foo(/*)/result") + r = e("foo(/*)/result") self.assertEquals(2, len(r)) self.assertEquals('Hoi', r[0].text) self.assertEquals('Dag', r[1].text) @@ -298,7 +298,7 @@ x = self.parse('Honk') e = etree.XPathEvaluator(x, extensions=[{(None, 'foo'): f}]) - r = e.evaluate("foo(/*)/result") + r = e("foo(/*)/result") self.assertEquals(3, len(r)) self.assertEquals('Hoi', r[0].text) self.assertEquals('Dag', r[1].text) @@ -375,14 +375,14 @@ e = etree.XPathEvaluator(x) expr = "/a[@attr=$aval]" - r = e.evaluate(expr, aval=1) + r = e(expr, aval=1) self.assertEquals(0, len(r)) - r = e.evaluate(expr, aval="true") + r = e(expr, aval="true") self.assertEquals(1, len(r)) self.assertEquals("true", r[0].get('attr')) - r = e.evaluate(expr, aval=True) + r = e(expr, aval=True) self.assertEquals(1, len(r)) self.assertEquals("true", r[0].get('attr')) @@ -393,7 +393,7 @@ element = etree.Element("test-el") etree.SubElement(element, "test-sub") expr = "$value" - r = e.evaluate(expr, value=element) + r = e(expr, value=element) self.assertEquals(1, len(r)) self.assertEquals(element.tag, r[0].tag) self.assertEquals(element[0].tag, r[0][0].tag) @@ -424,41 +424,41 @@ e = etree.XPathEvaluator(x, extensions=[extension]) del x - self.assertRaises(LocalException, e.evaluate, "foo(., 0)") - self.assertRaises(LocalException, e.evaluate, "foo(., $value)", value=0) + self.assertRaises(LocalException, e, "foo(., 0)") + self.assertRaises(LocalException, e, "foo(., $value)", value=0) - r = e.evaluate("foo(., $value)", value=1) + r = e("foo(., $value)", value=1) self.assertEqual(len(r), 0) - r = e.evaluate("foo(., 1)") + r = e("foo(., 1)") self.assertEqual(len(r), 0) - r = e.evaluate("foo(., $value)", value=2) + r = e("foo(., $value)", value=2) self.assertEqual(len(r), 0) - r = e.evaluate("foo(., $value)", value=3) + r = e("foo(., $value)", value=3) self.assertEqual(len(r), 1) self.assertEqual(r[0].tag, "test") - r = e.evaluate("foo(., $value)", value="false") + r = e("foo(., $value)", value="false") 
self.assertEqual(len(r), 1) self.assertEqual(r[0].tag, "NODE") - r = e.evaluate("foo(., 'false')") + r = e("foo(., 'false')") self.assertEqual(len(r), 1) self.assertEqual(r[0].tag, "NODE") - r = e.evaluate("foo(., 'true')") + r = e("foo(., 'true')") self.assertEqual(len(r), 1) self.assertEqual(r[0].tag, "a") self.assertEqual(r[0][0].tag, "test") - r = e.evaluate("foo(., $value)", value="true") + r = e("foo(., $value)", value="true") self.assertEqual(len(r), 1) self.assertEqual(r[0].tag, "a") - self.assertRaises(LocalException, e.evaluate, "foo(., 0)") - self.assertRaises(LocalException, e.evaluate, "foo(., $value)", value=0) + self.assertRaises(LocalException, e, "foo(., 0)") + self.assertRaises(LocalException, e, "foo(., $value)", value=0) class ETreeXPathClassTestCase(HelperTestCase): @@ -467,15 +467,15 @@ x = self.parse('') expr = etree.XPath("/a[@attr != 'true']") - r = expr.evaluate(x) + r = expr(x) self.assertEquals(0, len(r)) expr = etree.XPath("/a[@attr = 'true']") - r = expr.evaluate(x) + r = expr(x) self.assertEquals(1, len(r)) expr = etree.XPath( expr.path ) - r = expr.evaluate(x) + r = expr(x) self.assertEquals(1, len(r)) def test_xpath_compile_element(self): @@ -483,22 +483,22 @@ root = x.getroot() expr = etree.XPath("./b") - r = expr.evaluate(root) + r = expr(root) self.assertEquals(1, len(r)) self.assertEquals('b', r[0].tag) expr = etree.XPath("./*") - r = expr.evaluate(root) + r = expr(root) self.assertEquals(2, len(r)) def test_xpath_compile_vars(self): x = self.parse('') expr = etree.XPath("/a[@attr=$aval]") - r = expr.evaluate(x, aval=False) + r = expr(x, aval=False) self.assertEquals(0, len(r)) - r = expr.evaluate(x, aval=True) + r = expr(x, aval=True) self.assertEquals(1, len(r)) def test_xpath_compile_error(self): @@ -513,12 +513,12 @@ x = self.parse('') expr = etree.ETXPath("/a/{nsa}b") - r = expr.evaluate(x) + r = expr(x) self.assertEquals(1, len(r)) self.assertEquals('{nsa}b', r[0].tag) expr = etree.ETXPath("/a/{nsb}b") - r = 
expr.evaluate(x) + r = expr(x) self.assertEquals(1, len(r)) self.assertEquals('{nsb}b', r[0].tag) @@ -526,12 +526,12 @@ x = self.parse(u'') expr = etree.ETXPath(u"/a/{nsa\uf8d2}b") - r = expr.evaluate(x) + r = expr(x) self.assertEquals(1, len(r)) self.assertEquals(u'{nsa\uf8d2}b', r[0].tag) expr = etree.ETXPath(u"/a/{nsb\uf8d1}b") - r = expr.evaluate(x) + r = expr(x) self.assertEquals(1, len(r)) self.assertEquals(u'{nsb\uf8d1}b', r[0].tag) @@ -595,32 +595,32 @@ >>> root = SAMPLE_XML >>> e = etree.XPathEvaluator(root, extensions=[extension]) - >>> e.evaluate("stringTest('you')") + >>> e("stringTest('you')") 'Hello you' - >>> e.evaluate(u"stringTest('\xe9lan')") + >>> e(u"stringTest('\xe9lan')") u'Hello \\xe9lan' - >>> e.evaluate("stringTest('you','there')") + >>> e("stringTest('you','there')") Traceback (most recent call last): ... TypeError: stringTest() takes exactly 2 arguments (3 given) - >>> e.evaluate("floatTest(2)") + >>> e("floatTest(2)") 6.0 - >>> e.evaluate("booleanTest(true())") + >>> e("booleanTest(true())") False - >>> map(tag, e.evaluate("setTest(/body/tag)")) + >>> map(tag, e("setTest(/body/tag)")) ['tag'] - >>> map(tag, e.evaluate("setTest2(/body/*)")) + >>> map(tag, e("setTest2(/body/*)")) ['tag', 'section'] - >>> e.evaluate("argsTest1('a',1.5,true(),/body/tag)") + >>> e("argsTest1('a',1.5,true(),/body/tag)") "a, 1.5, True, ['tag', 'tag', 'tag']" - >>> map(tag, e.evaluate("argsTest2(/body/tag, /body/section)")) + >>> map(tag, e("argsTest2(/body/tag, /body/section)")) ['tag', 'section', 'tag', 'tag'] - >>> e.evaluate("resultTypesTest()") + >>> e("resultTypesTest()") Traceback (most recent call last): ... XPathResultError: This is not a node: 'x' >>> try: - ... e.evaluate("resultTypesTest2()") + ... e("resultTypesTest2()") ... except etree.XPathResultError: ... 
print "Got error" Got error Modified: lxml/trunk/src/lxml/tests/test_xslt.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xslt.py (original) +++ lxml/trunk/src/lxml/tests/test_xslt.py Mon Mar 3 19:42:39 2008 @@ -24,7 +24,7 @@ ''') st = etree.XSLT(style) - res = st.apply(tree) + res = st(tree) self.assertEquals('''\ B @@ -97,7 +97,7 @@ ''') st = etree.XSLT(style) - res = st.apply(tree) + res = st(tree) expected = u'''\ \uF8D2 @@ -117,7 +117,7 @@ ''') st = etree.XSLT(style) - res = st.apply(tree) + res = st(tree) expected = u'''\ \uF8D2 @@ -137,7 +137,7 @@ ''') st = etree.XSLT(style) - res = st.apply(tree) + res = st(tree) expected = u"""\ \ \uF8D2""" @@ -160,7 +160,7 @@ ''') st = etree.XSLT(style) - res = st.apply(tree) + res = st(tree) expected = u'''\ \uF8D2 @@ -293,7 +293,7 @@ ''') st = etree.XSLT(style) - res = st.apply(tree, bar="'Bar'") + res = st(tree, bar="'Bar'") self.assertEquals('''\ Bar @@ -330,7 +330,7 @@ ''') st = etree.XSLT(style) - res = st.apply(tree, bar="'Bar'", baz="'Baz'") + res = st(tree, bar="'Bar'", baz="'Baz'") self.assertEquals('''\ BarBaz @@ -349,7 +349,7 @@ ''') st = etree.XSLT(style) - res = st.apply(tree, bar="/a/b/text()") + res = st(tree, bar="/a/b/text()") self.assertEquals('''\ B @@ -370,13 +370,13 @@ ''') st = etree.XSLT(style) - res = st.apply(tree, bar="'Bar'") + res = st(tree, bar="'Bar'") self.assertEquals('''\ Bar ''', str(res)) - res = st.apply(tree) + res = st(tree) self.assertEquals('''\ Default @@ -422,14 +422,14 @@ source = self.parse(xml) styledoc = self.parse(xslt) style = etree.XSLT(styledoc) - result = style.apply(source) + result = style(source) etree.tostring(result.getroot()) source = self.parse(xml) styledoc = self.parse(xslt) style = etree.XSLT(styledoc) - result = style.apply(source) + result = style(source) etree.tostring(result.getroot()) @@ -445,10 +445,10 @@ source = self.parse(xml) styledoc = self.parse(xslt) transform = 
etree.XSLT(styledoc) - result = transform.apply(source) - result = transform.apply(source) + result = transform(source) + result = transform(source) etree.tostring(result.getroot()) - result = transform.apply(source) + result = transform(source) etree.tostring(result.getroot()) str(result) @@ -470,7 +470,7 @@ source = self.parse(xml) styledoc = self.parse(xslt) style = etree.XSLT(styledoc) - result = style.apply(source) + result = style(source) self.assertEqual('', str(result)) def test_xslt_message(self): @@ -486,7 +486,7 @@ source = self.parse(xml) styledoc = self.parse(xslt) style = etree.XSLT(styledoc) - result = style.apply(source) + result = style(source) self.assertEqual('', str(result)) self.assert_("TEST TEST TEST" in [entry.message for entry in style.error_log]) @@ -504,7 +504,7 @@ source = self.parse(xml) styledoc = self.parse(xslt) style = etree.XSLT(styledoc) - result = style.apply(source) + result = style(source) self.assertEqual('', str(result)) self.assert_("TEST TEST TEST" in [entry.message for entry in style.error_log]) @@ -899,7 +899,7 @@ style_root.tag) st = etree.XSLT(style_root) - res = st.apply(tree) + res = st(tree) self.assertEquals('''\ B @@ -933,7 +933,7 @@ style_root.tag) st = etree.XSLT(style_root) - res = st.apply(tree) + res = st(tree) self.assertEquals('''\ B From scoder at codespeak.net Mon Mar 3 19:42:46 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:42:46 +0100 (CET) Subject: [Lxml-checkins] r52112 - in lxml/trunk: . src/lxml Message-ID: <20080303184246.66422169ECE@codespeak.net> Author: scoder Date: Mon Mar 3 19:42:45 2008 New Revision: 52112 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/xslt.pxi Log: r3703 at delle: sbehnel | 2008-03-03 14:19:09 +0100 keep deprecated method ... 
Modified: lxml/trunk/src/lxml/xslt.pxi ============================================================================== --- lxml/trunk/src/lxml/xslt.pxi (original) +++ lxml/trunk/src/lxml/xslt.pxi Mon Mar 3 19:42:45 2008 @@ -387,6 +387,15 @@ :deprecated: call the object, not this method.""" return self(_input, profile_run=profile_run, **_kw) + def tostring(self, _ElementTree result_tree): + """tostring(self, result_tree) + + Save result doc to string based on stylesheet output method. + + :deprecated: use str(result_tree) instead. + """ + return str(result_tree) + def __deepcopy__(self, memo): return self.__copy__() From scoder at codespeak.net Mon Mar 3 19:42:51 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:42:51 +0100 (CET) Subject: [Lxml-checkins] r52113 - in lxml/trunk: . src/lxml Message-ID: <20080303184251.8F321169EB3@codespeak.net> Author: scoder Date: Mon Mar 3 19:42:51 2008 New Revision: 52113 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.objectify.pyx Log: r3704 at delle: sbehnel | 2008-03-03 15:37:22 +0100 deprecate objectify.enableRecursiveStr() Modified: lxml/trunk/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.objectify.pyx (original) +++ lxml/trunk/src/lxml/lxml.objectify.pyx Mon Mar 3 19:42:51 2008 @@ -1243,6 +1243,17 @@ Enable a recursively generated tree representation for str(element), based on objectify.dump(element). + + :deprecated: use enable_recursive_str() instead + """ + global __RECURSIVE_STR + __RECURSIVE_STR = on + +def enable_recursive_str(on=True): + """enableRecursiveStr(on=True) + + Enable a recursively generated tree representation for str(element), + based on objectify.dump(element). 
""" global __RECURSIVE_STR __RECURSIVE_STR = on From scoder at codespeak.net Mon Mar 3 19:42:57 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:42:57 +0100 (CET) Subject: [Lxml-checkins] r52114 - lxml/trunk Message-ID: <20080303184257.DF3C9169EAE@codespeak.net> Author: scoder Date: Mon Mar 3 19:42:57 2008 New Revision: 52114 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3705 at delle: sbehnel | 2008-03-03 15:45:42 +0100 changelog: removed deprecated APIs Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Mon Mar 3 19:42:57 2008 @@ -24,7 +24,38 @@ Other changes ------------- -* Most deprecated functions and methods were removed. +Most long-time deprecated functions and methods were removed: + +- ``etree.clearErrorLog()``, use ``etree.clear_error_log()`` + +- ``etree.useGlobalPythonLog()``, use + ``etree.use_global_python_log()`` + +- ``etree.ElementClassLookup.setFallback()``, use + ``etree.ElementClassLookup.set_fallback()`` + +- ``etree.getDefaultParser()``, use ``etree.get_default_parser()`` + +- ``etree.setDefaultParser()``, use ``etree.set_default_parser()`` + +- ``etree.setElementClassLookup()``, use + ``etree.set_element_class_lookup()`` + + Note that ``parser.setElementClassLookup()`` has not been removed + yet, although ``parser.set_element_class_lookup()`` should be used + instead. 
+ +- ``xpath_evaluator.registerNamespace()``, use + ``xpath_evaluator.register_namespace()`` + +- ``xpath_evaluator.registerNamespaces()``, use + ``xpath_evaluator.register_namespaces()`` + +- ``objectify.setPytypeAttributeTag``, use + ``objectify.set_pytype_attribute_tag`` + +- ``objectify.setDefaultParser()``, use + ``objectify.set_default_parser()`` 2.0.2 (2008-02-22) From scoder at codespeak.net Mon Mar 3 19:43:16 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:43:16 +0100 (CET) Subject: [Lxml-checkins] r52115 - in lxml/trunk: . doc doc/html Message-ID: <20080303184316.48691169EB3@codespeak.net> Author: scoder Date: Mon Mar 3 19:43:14 2008 New Revision: 52115 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/doc/FAQ.txt lxml/trunk/doc/api.txt lxml/trunk/doc/capi.txt lxml/trunk/doc/compatibility.txt lxml/trunk/doc/cssselect.txt lxml/trunk/doc/element_classes.txt lxml/trunk/doc/elementsoup.txt lxml/trunk/doc/extensions.txt lxml/trunk/doc/html/style.css lxml/trunk/doc/lxml2.txt lxml/trunk/doc/lxmlhtml.txt lxml/trunk/doc/mkhtml.py lxml/trunk/doc/objectify.txt lxml/trunk/doc/parsing.txt lxml/trunk/doc/performance.txt lxml/trunk/doc/resolvers.txt lxml/trunk/doc/rest2html.py lxml/trunk/doc/sax.txt lxml/trunk/doc/tutorial.txt lxml/trunk/doc/validation.txt lxml/trunk/doc/xpathxslt.txt Log: r3706 at delle: sbehnel | 2008-03-03 15:53:29 +0100 use Pygments to enable syntax highlighting in docs Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Mon Mar 3 19:43:14 2008 @@ -24,6 +24,11 @@ Other changes ------------- +* Generating the HTML documentation now requires Pygments_, which is + used to enable syntax highlighting for the doctest examples. + +.. 
_Pygments: http://pygments.org/ + Most long-time deprecated functions and methods were removed: - ``etree.clearErrorLog()``, use ``etree.clear_error_log()`` Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Mon Mar 3 19:43:14 2008 @@ -227,7 +227,9 @@ The ElementTree tree model defines an Element as a container with a tag name, contained text, child Elements and a tail text. This means that whenever you -serialise an Element, you will get all parts of that Element:: +serialise an Element, you will get all parts of that Element: + +.. sourcecode:: pycon >>> from lxml import etree >>> root = etree.XML("texttail") @@ -235,7 +237,9 @@ texttail Here is an example that shows why not serialising the tail would be -even more surprising from an object point of view:: +even more surprising from an object point of view: + +.. sourcecode:: pycon >>> root = etree.Element("test") @@ -484,7 +488,9 @@ You should always try to reproduce the problem with the latest versions of libxml2 and libxslt - and make sure they are used. -``lxml.etree`` can tell you what it runs with:: +``lxml.etree`` can tell you what it runs with: + +.. sourcecode:: python from lxml import etree print "lxml.etree: ", etree.LXML_VERSION @@ -705,7 +711,9 @@ only added between nodes that do not contain data. This is always the case for trees constructed element-by-element, so no problems should be expected here. For parsed trees, a good way to assure that no conflicting whitespace -is left in the tree is the ``remove_blank_text`` option:: +is left in the tree is the ``remove_blank_text`` option: + +.. 
sourcecode:: pycon >>> parser = etree.XMLParser(remove_blank_text=True) >>> tree = etree.parse(file, parser) Modified: lxml/trunk/doc/api.txt ============================================================================== --- lxml/trunk/doc/api.txt (original) +++ lxml/trunk/doc/api.txt Mon Mar 3 19:43:14 2008 @@ -57,7 +57,9 @@ AttributeError in older versions. The versions of libxml2 and libxslt are available through the attributes ``LIBXML_VERSION`` and ``LIBXSLT_VERSION``. -The following examples usually assume this to be executed first:: +The following examples usually assume this to be executed first: + +.. sourcecode:: pycon >>> from lxml import etree >>> from StringIO import StringIO @@ -85,7 +87,9 @@ ------------------- Compared to the original ElementTree API, lxml.etree has an extended tree -model. It knows about parents and siblings of elements:: +model. It knows about parents and siblings of elements: + +.. sourcecode:: pycon >>> root = etree.Element("root") >>> a = etree.SubElement(root, "a") @@ -102,14 +106,18 @@ Elements always live within a document context in lxml. This implies that there is also a notion of an absolute document root. You can retrieve an -ElementTree for the root node of a document from any of its elements:: +ElementTree for the root node of a document from any of its elements. + +.. sourcecode:: pycon >>> tree = d.getroottree() >>> print tree.getroot().tag root Note that this is different from wrapping an Element in an ElementTree. You -can use ElementTrees to create XML trees with an explicit root node:: +can use ElementTrees to create XML trees with an explicit root node: + +.. sourcecode:: pycon >>> tree = etree.ElementTree(d) >>> print tree.getroot().tag @@ -123,7 +131,9 @@ All operations that you run on such an ElementTree (like XPath, XSLT, etc.) will understand the explicitly chosen root as root node of a document. They will not see any elements outside the ElementTree. 
However, ElementTrees do -not modify their Elements:: +not modify their Elements: + +.. sourcecode:: pycon >>> element = tree.getroot() >>> print element.tag @@ -143,7 +153,9 @@ --------- The ElementTree API makes Elements iterable to supports iteration over their -children. Using the tree defined above, we get:: +children. Using the tree defined above, we get: + +.. sourcecode:: pycon >>> [ child.tag for child in root ] ['a', 'b', 'c', 'd'] @@ -151,14 +163,18 @@ To iterate in the opposite direction, use the ``reversed()`` function that exists in Python 2.4 and later. -Tree traversal should use the ``element.iter()`` method:: +Tree traversal should use the ``element.iter()`` method: + +.. sourcecode:: pycon >>> [ el.tag for el in root.iter() ] ['root', 'a', 'b', 'c', 'd', 'e'] lxml.etree also supports this, but additionally features an extended API for iteration over the children, following/preceding siblings, ancestors and -descendants of an element, as defined by the respective XPath axis:: +descendants of an element, as defined by the respective XPath axis: + +.. sourcecode:: pycon >>> [ child.tag for child in root.iterchildren() ] ['a', 'b', 'c', 'd'] @@ -178,7 +194,9 @@ implements the 'descendant-or-self' axis in XPath. All of these iterators support an additional ``tag`` keyword argument that -filters the generated elements by tag name:: +filters the generated elements by tag name: + +.. sourcecode:: pycon >>> [ child.tag for child in root.iterchildren(tag='a') ] ['a'] @@ -206,7 +224,9 @@ However, lxml also keeps a global error log of all errors that occurred at the application level. Whenever an exception is raised, you can retrieve the errors that occured and "might have" lead to the problem from the error log -copy attached to the exception:: +copy attached to the exception: + +.. 
sourcecode:: pycon >>> etree.clear_error_log() >>> broken_xml = ''' @@ -221,7 +241,9 @@ Once you have caught this exception, you can access its ``error_log`` property to retrieve the log entries or filter them by a specific type, error domain or -error level:: +error level: + +.. sourcecode:: pycon >>> log = e.error_log.filter_from_level(etree.ErrorLevels.FATAL) >>> print log @@ -234,14 +256,18 @@ parsing (PARSER) lines 4, column 8 and line 5, column 1 of a string (, or the filename if available). Here, PARSER is the so-called error domain, see ``lxml.etree.ErrorDomains`` for that. You can get it from a log entry -like this:: +like this: + +.. sourcecode:: pycon >>> entry = log[0] >>> print entry.domain_name, entry.type_name, entry.filename PARSER ERR_TAG_NAME_MISMATCH There is also a convenience attribute ``last_error`` that returns the last -error or fatal error that occurred:: +error or fatal error that occurred: + +.. sourcecode:: pycon >>> entry = e.error_log.last_error >>> print entry.domain_name, entry.type_name, entry.filename @@ -264,7 +290,9 @@ lxml.etree has direct support for pretty printing XML output. Functions like ``ElementTree.write()`` and ``tostring()`` support it through a keyword -argument:: +argument: + +.. sourcecode:: pycon >>> root = etree.XML("") >>> print etree.tostring(root) @@ -279,7 +307,9 @@ output. It was added in lxml 2.0. By default, lxml (just as ElementTree) outputs the XML declaration only if it -is required by the standard:: +is required by the standard: + +.. sourcecode:: pycon >>> unicode_root = etree.Element(u"t\u3120st") >>> unicode_root.text = u"t\u0A0Ast" @@ -295,7 +325,9 @@ .. _`Unicode support`: parsing.html#python-unicode-strings You can enable or disable the declaration explicitly by passing another -keyword argument for the serialisation:: +keyword argument for the serialisation: + +.. 
sourcecode:: pycon >>> print etree.tostring(root, xml_declaration=True) @@ -308,7 +340,9 @@ Note that a standard compliant XML parser will not consider the last line well-formed XML if the encoding is not explicitly provided somehow, e.g. in an -underlying transport protocol:: +underlying transport protocol: + +.. sourcecode:: pycon >>> notxml = etree.tostring(unicode_root, encoding="UTF-16LE", ... xml_declaration=False) @@ -322,7 +356,9 @@ --------------------------- You can let lxml process xinclude statements in a document by calling the -xinclude() method on a tree:: +xinclude() method on a tree: + +.. sourcecode:: pycon >>> data = StringIO('''\ ... @@ -353,7 +389,9 @@ The lxml.etree.ElementTree class has a method write_c14n, which takes a file object as argument. This file object will receive an UTF-8 representation of the canonicalized form of the XML, following the W3C C14N recommendation. For -example:: +example: + +.. sourcecode:: pycon >>> f = StringIO('') >>> tree = etree.parse(f) Modified: lxml/trunk/doc/capi.txt ============================================================================== --- lxml/trunk/doc/capi.txt (original) +++ lxml/trunk/doc/capi.txt Mon Mar 3 19:43:14 2008 @@ -61,7 +61,9 @@ If you really feel like it, you can also interface with lxml.etree straight from C code. All you have to do is include the header file for the public -API, import the ``lxml.etree`` module and then call the import function:: +API, import the ``lxml.etree`` module and then call the import function: + +.. 
sourcecode:: c /* My C extension */ Modified: lxml/trunk/doc/compatibility.txt ============================================================================== --- lxml/trunk/doc/compatibility.txt (original) +++ lxml/trunk/doc/compatibility.txt Mon Mar 3 19:43:14 2008 @@ -7,7 +7,9 @@ * Importing etree is obviously different; etree uses a lower-case package name, while ElementTree uses a combination of upper-case and - lower case in imports:: + lower case in imports: + + .. sourcecode:: python # etree from lxml.etree import Element @@ -19,7 +21,9 @@ from xml.etree.ElementTree import Element When switching over code from ElementTree to lxml.etree, and you're using - the package name prefix 'ElementTree', you can do the following:: + the package name prefix 'ElementTree', you can do the following: + + .. sourcecode:: python # instead of from elementtree import ElementTree @@ -46,18 +50,24 @@ strings. * ElementTree allows you to place an Element in two different trees at the - same time. Thus, this:: + same time. Thus, this: + + .. sourcecode:: python a = Element('a') b = SubElement(a, 'b') c = Element('c') c.append(b) - will result in the following tree a:: + will result in the following tree a: + + .. sourcecode:: xml - and the following tree c:: + and the following tree c: + + .. sourcecode:: xml @@ -66,11 +76,15 @@ an element can only exist in a single tree at the same time. Adding an element in some tree to another tree will cause this element to be moved. - So, for tree a we will get:: + So, for tree a we will get: + + .. sourcecode:: xml - and for tree c we will get:: + and for tree c we will get: + + .. sourcecode:: xml Modified: lxml/trunk/doc/cssselect.txt ============================================================================== --- lxml/trunk/doc/cssselect.txt (original) +++ lxml/trunk/doc/cssselect.txt Mon Mar 3 19:43:14 2008 @@ -25,7 +25,9 @@ The most important class in the ``cssselect`` module is ``CSSSelector``. 
It provides the same interface as the XPath_ class, but accepts a CSS selector -expression as input:: +expression as input: + +.. sourcecode:: pycon >>> from lxml.cssselect import CSSSelector >>> sel = CSSSelector('div.content') @@ -35,13 +37,17 @@ 'div.content' The selector actually compiles to XPath, and you can see the -expression by inspecting the object:: +expression by inspecting the object: + +.. sourcecode:: pycon >>> sel.path "descendant-or-self::div[contains(concat(' ', normalize-space(@class), ' '), ' content ')]" To use the selector, simply call it with a document or element -object:: +object: + +.. sourcecode:: pycon >>> from lxml.etree import fromstring >>> h = fromstring('''
Modified: lxml/trunk/doc/element_classes.txt ============================================================================== --- lxml/trunk/doc/element_classes.txt (original) +++ lxml/trunk/doc/element_classes.txt Mon Mar 3 19:43:14 2008 @@ -8,7 +8,9 @@ specific namespace or for an exact element at a specific position in the tree. Custom Elements must inherit from the ``lxml.etree.ElementBase`` class, which -provides the Element interface for subclasses:: +provides the Element interface for subclasses: + +.. sourcecode:: pycon >>> from lxml import etree >>> class HonkElement(etree.ElementBase): @@ -85,7 +87,9 @@ class. For example, setting a different default element class for a parser works as -follows:: +follows: + +.. sourcecode:: pycon >>> parser_lookup = etree.ElementDefaultClassLookup(element=HonkElement) >>> parser = etree.XMLParser() @@ -93,7 +97,9 @@ There is one drawback of the parser based scheme: the ``Element()`` factory does not know about your specialised parser and creates a new document that -deploys the default parser:: +deploys the default parser: + +.. sourcecode:: pycon >>> el = etree.Element("root") >>> print isinstance(el, HonkElement) @@ -101,20 +107,26 @@ You should therefore avoid using this function in code that uses custom classes. The ``makeelement()`` method of parsers provides a simple -replacement:: +replacement: + +.. sourcecode:: pycon >>> el = parser.makeelement("root") >>> print isinstance(el, HonkElement) True If you use a parser at the module level, you can easily redirect a module -level ``Element()`` factory to the parser method by adding code like this:: +level ``Element()`` factory to the parser method by adding code like this: + +.. 
sourcecode:: pycon >>> MODULE_PARSER = etree.XMLParser() >>> Element = MODULE_PARSER.makeelement While the ``XML()`` and ``HTML()`` factories also depend on the default -parser, you can pass them a different parser as second argument:: +parser, you can pass them a different parser as second argument: + +.. sourcecode:: pycon >>> element = etree.XML("") >>> print isinstance(element, HonkElement) @@ -126,7 +138,9 @@ Whenever you create a document with a parser, it will inherit the lookup scheme and all subsequent element instantiations for this document will use -it:: +it: + +.. sourcecode:: pycon >>> element = etree.fromstring("", parser) >>> print isinstance(element, HonkElement) @@ -149,7 +163,9 @@ element class. Consequently, no further fallbacks are supported, but this scheme is a good fallback for other custom lookup mechanisms. -Usage:: +Usage: + +.. sourcecode:: pycon >>> lookup = etree.ElementDefaultClassLookup() >>> parser = etree.XMLParser() @@ -160,7 +176,9 @@ To change the default element implementation, you can pass your new class to the constructor. While it accepts classes for ``element``, ``comment`` and -``pi`` nodes, most use cases will only override the element class:: +``pi`` nodes, most use cases will only override the element class: + +.. sourcecode:: pycon >>> el = parser.makeelement("myelement") >>> print isinstance(el, HonkElement) @@ -185,7 +203,9 @@ ---------------------- This is an advanced lookup mechanism that supports namespace/tag-name specific -element classes. You can select it by calling:: +element classes. You can select it by calling: + +.. sourcecode:: pycon >>> lookup = etree.ElementNamespaceClassLookup() >>> parser = etree.XMLParser() @@ -199,7 +219,9 @@ This scheme supports a fallback mechanism that is used in the case where the namespace is not found or no class was registered for the element name. Normally, the default class lookup is used here. 
To change it, pass the -desired fallback lookup scheme to the constructor:: +desired fallback lookup scheme to the constructor: + +.. sourcecode:: pycon >>> fallback = etree.ElementDefaultClassLookup(element=HonkElement) >>> lookup = etree.ElementNamespaceClassLookup(fallback) @@ -211,7 +233,9 @@ This scheme uses a mapping from attribute values to classes. An attribute name is set at initialisation time and is then used to find the corresponding -value. It is set up as follows:: +value. It is set up as follows: + +.. sourcecode:: pycon >>> id_class_mapping = {} # maps attribute values to element classes >>> lookup = etree.AttributeBasedElementClassLookup( @@ -224,7 +248,9 @@ This class uses its fallback if the attribute is not found or its value is not in the mapping. Normally, the default class lookup is used here. If you want -to use the namespace lookup, for example, you can use this code:: +to use the namespace lookup, for example, you can use this code: + +.. sourcecode:: pycon >>> fallback = etree.ElementNamespaceClassLookup() >>> lookup = etree.AttributeBasedElementClassLookup( @@ -237,7 +263,9 @@ --------------------------- This is the most customisable way of finding element classes on a per-element -basis. It allows you to implement a custom lookup scheme in a subclass:: +basis. It allows you to implement a custom lookup scheme in a subclass: + +.. sourcecode:: pycon >>> class MyLookup(etree.CustomElementClassLookup): ... def lookup(self, node_type, document, namespace, name): @@ -263,7 +291,9 @@ elements in the tree have been instantiated as Python Element objects. Luckily, there is a way to do this. The ``PythonElementClassLookup`` -works similar to the custom lookup scheme:: +works similar to the custom lookup scheme: + +.. sourcecode:: pycon >>> class MyLookup(etree.PythonElementClassLookup): ... def lookup(self, document, element): @@ -301,7 +331,9 @@ lxml allows you to implement namespaces, in a rather literal sense. 
After setting up the namespace class lookup mechanism as described above, you can build a new element namespace (or retrieve an existing one) by calling the -``get_namespace(uri)`` method of the lookup:: +``get_namespace(uri)`` method of the lookup: + +.. sourcecode:: pycon >>> lookup = etree.ElementNamespaceClassLookup() >>> parser = etree.XMLParser() @@ -310,19 +342,25 @@ >>> namespace = lookup.get_namespace('http://hui.de/honk') and then register the new element type with that namespace, say, under the tag -name ``honk``:: +name ``honk``: + +.. sourcecode:: pycon >>> namespace['honk'] = HonkElement After this, you create and use your XML elements through the normal API of -lxml:: +lxml: + +.. sourcecode:: pycon >>> xml = '' >>> honk_element = etree.XML(xml, parser) >>> print honk_element.honking True -The same works when creating elements by hand:: +The same works when creating elements by hand: + +.. sourcecode:: pycon >>> honk_element = parser.makeelement('{http://hui.de/honk}honk', ... honking='true') @@ -339,7 +377,9 @@ In the setup example above, we associated the HonkElement class only with the 'honk' element. If an XML tree contains different elements in the same -namespace, they do not pick up the same implementation:: +namespace, they do not pick up the same implementation: + +.. sourcecode:: pycon >>> xml = '' >>> honk_element = etree.XML(xml, parser) @@ -360,7 +400,9 @@ You may consider following an object oriented approach here. If you build a class hierarchy of element classes, you can also implement a base class for a namespace that is used if no specific element class is provided. Again, you -can just pass None as an element name:: +can just pass None as an element name: + +.. sourcecode:: pycon >>> class HonkNSElement(etree.ElementBase): ... 
def honk(self): @@ -374,7 +416,9 @@ >>> namespace['honk'] = HonkElement Now you can rely on lxml to always return objects of type HonkNSElement or its -subclasses for elements of this namespace:: +subclasses for elements of this namespace: + +.. sourcecode:: pycon >>> xml = '' >>> honk_element = etree.XML(xml, parser) Modified: lxml/trunk/doc/elementsoup.txt ============================================================================== --- lxml/trunk/doc/elementsoup.txt (original) +++ lxml/trunk/doc/elementsoup.txt Mon Mar 3 19:43:14 2008 @@ -17,17 +17,23 @@ parse a file using BeautifulSoup, and `convert_tree()` to convert a BeautifulSoup tree into a list of top-level Elements. -Here is a document full of tag soup, similar to, but not quite like, HTML:: +Here is a document full of tag soup, similar to, but not quite like, HTML: + +.. sourcecode:: pycon >>> tag_soup = 'Hello</head<body onload=crash()>Hi all<p>' -all you need to do is pass it to the `parse()` function:: +all you need to do is pass it to the `parse()` function: + +.. sourcecode:: pycon >>> from lxml.html.ElementSoup import parse >>> from StringIO import StringIO >>> root = parse(StringIO(tag_soup)) -To see what we have here, you can serialise it:: +To see what we have here, you can serialise it: + +.. sourcecode:: pycon >>> from lxml.etree import tostring >>> print tostring(root, pretty_print=True), Modified: lxml/trunk/doc/extensions.txt ============================================================================== --- lxml/trunk/doc/extensions.txt (original) +++ lxml/trunk/doc/extensions.txt Mon Mar 3 19:43:14 2008 @@ -2,14 +2,18 @@ ====================================== This document describes how to use Python extension functions in XPath -and XSLT like this:: +and XSLT like this: + +.. sourcecode:: xml <xsl:value-of select="f:myPythonFunction(.//sometag)" /> Here is how an extension function looks like. As the first argument, it always receives a context object (see below). 
The other arguments are provided by the respective call in the XPath expression, one in -the following examples. Any number of arguments is allowed:: +the following examples. Any number of arguments is allowed: + +.. sourcecode:: pycon >>> def hello(dummy, a): ... return "Hello %s" % a @@ -35,7 +39,9 @@ In order to use a function in XPath/XSLT, it needs to have a (namespaced) name by which it can be called during evaluation. This is done using the FunctionNamespace class. For simplicity, we choose the empty namespace -(None):: +(None): + +.. sourcecode:: pycon >>> from lxml import etree >>> ns = etree.FunctionNamespace(None) @@ -45,7 +51,9 @@ This registers the function `hello` with the name `hello` in the default namespace (None), and the function `loadsofargs` with the name `countargs`. Now we're going to create a document that we can run XPath expressions -against:: +against: + +.. sourcecode:: pycon >>> from lxml import etree >>> from StringIO import StringIO @@ -53,7 +61,9 @@ >>> doc = etree.parse(f) >>> root = doc.getroot() -Done. Now we can have XPath expressions call our new function:: +Done. Now we can have XPath expressions call our new function: + +.. sourcecode:: pycon >>> print root.xpath("hello('world')") Hello world @@ -67,7 +77,9 @@ Note how we call both a Python function (`hello`) and an XPath built-in function (`string`) in exactly the same way. Normally, however, you would want to separate the two in different namespaces. The FunctionNamespace class -allows you to do this:: +allows you to do this: + +.. sourcecode:: pycon >>> ns = etree.FunctionNamespace('http://mydomain.org/myfunctions') >>> ns['hello'] = hello @@ -81,7 +93,9 @@ In the last example, you had to specify a prefix for the function namespace. If you always use the same prefix for a function namespace, you can also -register it with the namespace:: +register it with the namespace: + +.. 
sourcecode:: pycon >>> ns = etree.FunctionNamespace('http://mydomain.org/myother/functions') >>> ns.prefix = 'es' @@ -106,7 +120,9 @@ Functions get a context object as first parameter. In lxml 1.x, this value was None, but since lxml 2.0 it provides two properties: ``eval_context`` and ``context_node``. The context node is the Element where the current function -is called:: +is called: + +.. sourcecode:: pycon >>> def print_tag(context, nodes): ... print context.context_node.tag, [ n.tag for n in nodes ] @@ -120,7 +136,9 @@ b [] The ``eval_context`` is a dictionary that is local to the evaluation. It -allows functions to keep state:: +allows functions to keep state: + +.. sourcecode:: pycon >>> def print_context(context): ... context.eval_context[context.context_node.tag] = "done" @@ -138,7 +156,9 @@ ------------------- Extension functions work for all ways of evaluating XPath expressions and for -XSL transformations:: +XSL transformations: + +.. sourcecode:: pycon >>> e = etree.XPathEvaluator(doc) >>> print e('es:hello(local-name(/a))') @@ -164,7 +184,9 @@ It is also possible to register namespaces with a single evaluator after its creation. While the following example involves no functions, the idea should -still be clear:: +still be clear: + +.. sourcecode:: pycon >>> f = StringIO('<a xmlns="http://mydomain.org/myfunctions" />') >>> ns_doc = etree.parse(f) @@ -174,14 +196,18 @@ This returns nothing, as we did not ask for the right namespace. When we register the namespace with the evaluator, however, we can access it via a -prefix:: +prefix: + +.. sourcecode:: pycon >>> e.register_namespace('foo', 'http://mydomain.org/myfunctions') >>> e('/foo:a')[0].tag '{http://mydomain.org/myfunctions}a' Note that this prefix mapping is only known to this evaluator, as opposed to -the global mapping of the FunctionNamespace objects:: +the global mapping of the FunctionNamespace objects: + +.. 
sourcecode:: pycon >>> e2 = etree.XPathEvaluator(ns_doc) >>> e2('/foo:a') @@ -196,7 +222,9 @@ Apart from the global registration of extension functions, there is also a way of making extensions known to a single Evaluator or XSLT. All evaluators and the XSLT object accept a keyword argument ``extensions`` in their constructor. -The value is a dictionary mapping (namespace, name) tuples to functions:: +The value is a dictionary mapping (namespace, name) tuples to functions: + +.. sourcecode:: pycon >>> extensions = {('local-ns', 'local-hello') : hello} >>> namespaces = {'l' : 'local-ns'} @@ -206,7 +234,9 @@ Hello Haegar For larger numbers of extension functions, you can define classes or modules -and use the ``Extension`` helper:: +and use the ``Extension`` helper: + +.. sourcecode:: pycon >>> class MyExt: ... def function1(self, _, arg): @@ -232,7 +262,9 @@ The additional ``ns`` keyword argument takes a namespace URI or ``None`` (also if left out) for the default namespace. The following -examples will therefore all do the same thing:: +examples will therefore all do the same thing: + +.. sourcecode:: pycon >>> functions = ('function1', 'function2', 'function3') >>> extensions = etree.Extension( ext_module, functions ) @@ -260,7 +292,9 @@ >>> print e('function1(function2(function3(string(b))))') 123Haegar -For convenience, you can also pass a sequence of extensions:: +For convenience, you can also pass a sequence of extensions: + +.. sourcecode:: pycon >>> extensions1 = etree.Extension(ext_module) >>> extensions2 = etree.Extension(ext_module, ns='local-ns') @@ -278,7 +312,9 @@ Extension functions can return any data type for which there is an XPath equivalent (see the documentation on `XPath return values`). This includes numbers, boolean values, elements and lists of elements. Note that integers -will also be returned as floats:: +will also be returned as floats: + +.. sourcecode:: pycon >>> def returnsFloat(_): ... 
return 1.7 @@ -309,7 +345,9 @@ As the last example shows, you can pass the results of functions back into the XPath expression. Elements and sequences of elements are treated as -XPath node-sets:: +XPath node-sets: + +.. sourcecode:: pycon >>> def returnsNodeSet(_): ... results1 = etree.Element('results1') @@ -352,7 +390,9 @@ Only the elements and their children are passed on, no outlying parents or tail texts will be available in the result. This also means that in the above example, the `subresult` elements in `results2` and `results3` are no longer -identical within the node-set, they belong to independent trees:: +identical within the node-set, they belong to independent trees: + +.. sourcecode:: pycon >>> print r[1][-1].tag, r[2].tag subresult subresult Modified: lxml/trunk/doc/html/style.css ============================================================================== --- lxml/trunk/doc/html/style.css (original) +++ lxml/trunk/doc/html/style.css Mon Mar 3 19:43:14 2008 @@ -242,7 +242,7 @@ code { color: Black; - background-color: #cccccc; + background-color: #f0f0f0; font-family: "Courier New", Courier, monospace; } @@ -250,6 +250,69 @@ padding: 0.5em; border: 1px solid #8cacbb; color: Black; - background-color: #cccccc; + background-color: #f0f0f0; font-family: "Courier New", Courier, monospace; } + +/* Syntax highlighting */ + +.syntax { background: #f0f0f0; } +.syntax .c { color: #60a0b0; font-style: italic } /* Comment */ +.syntax .err { border: 1px solid #FF0000 } /* Error */ +.syntax .k { color: #007020; font-weight: bold } /* Keyword */ +.syntax .o { color: #666666 } /* Operator */ +.syntax .cm { color: #60a0b0; font-style: italic } /* Comment.Multiline */ +.syntax .cp { color: #007020 } /* Comment.Preproc */ +.syntax .c1 { color: #60a0b0; font-style: italic } /* Comment.Single */ +.syntax .cs { color: #60a0b0; background-color: #fff0f0 } /* Comment.Special */ +.syntax .gd { color: #A00000 } /* Generic.Deleted */ +.syntax .ge { font-style: italic } /* 
Generic.Emph */ +.syntax .gr { color: #FF0000 } /* Generic.Error */ +.syntax .gh { color: #000080; font-weight: bold } /* Generic.Heading */ +.syntax .gi { color: #00A000 } /* Generic.Inserted */ +.syntax .go { color: #404040 } /* Generic.Output */ +.syntax .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */ +.syntax .gs { font-weight: bold } /* Generic.Strong */ +.syntax .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ +.syntax .gt { color: #0040D0 } /* Generic.Traceback */ +.syntax .kc { color: #007020; font-weight: bold } /* Keyword.Constant */ +.syntax .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */ +.syntax .kp { color: #007020 } /* Keyword.Pseudo */ +.syntax .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */ +.syntax .kt { color: #902000 } /* Keyword.Type */ +.syntax .m { color: #40a070 } /* Literal.Number */ +.syntax .s { color: #4070a0 } /* Literal.String */ +.syntax .na { color: #4070a0 } /* Name.Attribute */ +.syntax .nb { color: #007020 } /* Name.Builtin */ +.syntax .nc { color: #0e84b5; font-weight: bold } /* Name.Class */ +.syntax .no { color: #60add5 } /* Name.Constant */ +.syntax .nd { color: #555555; font-weight: bold } /* Name.Decorator */ +.syntax .ni { color: #d55537; font-weight: bold } /* Name.Entity */ +.syntax .ne { color: #007020 } /* Name.Exception */ +.syntax .nf { color: #06287e } /* Name.Function */ +.syntax .nl { color: #002070; font-weight: bold } /* Name.Label */ +.syntax .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */ +.syntax .nt { color: #062873; font-weight: bold } /* Name.Tag */ +.syntax .nv { color: #bb60d5 } /* Name.Variable */ +.syntax .ow { color: #007020; font-weight: bold } /* Operator.Word */ +.syntax .w { color: #bbbbbb } /* Text.Whitespace */ +.syntax .mf { color: #40a070 } /* Literal.Number.Float */ +.syntax .mh { color: #40a070 } /* Literal.Number.Hex */ +.syntax .mi { color: #40a070 } /* Literal.Number.Integer */ +.syntax .mo { color: 
#40a070 } /* Literal.Number.Oct */ +.syntax .sb { color: #4070a0 } /* Literal.String.Backtick */ +.syntax .sc { color: #4070a0 } /* Literal.String.Char */ +.syntax .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */ +.syntax .s2 { color: #4070a0 } /* Literal.String.Double */ +.syntax .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */ +.syntax .sh { color: #4070a0 } /* Literal.String.Heredoc */ +.syntax .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */ +.syntax .sx { color: #c65d09 } /* Literal.String.Other */ +.syntax .sr { color: #235388 } /* Literal.String.Regex */ +.syntax .s1 { color: #4070a0 } /* Literal.String.Single */ +.syntax .ss { color: #517918 } /* Literal.String.Symbol */ +.syntax .bp { color: #007020 } /* Name.Builtin.Pseudo */ +.syntax .vc { color: #bb60d5 } /* Name.Variable.Class */ +.syntax .vg { color: #bb60d5 } /* Name.Variable.Global */ +.syntax .vi { color: #bb60d5 } /* Name.Variable.Instance */ +.syntax .il { color: #40a070 } /* Literal.Number.Integer.Long */ \ No newline at end of file Modified: lxml/trunk/doc/lxml2.txt ============================================================================== --- lxml/trunk/doc/lxml2.txt (original) +++ lxml/trunk/doc/lxml2.txt Mon Mar 3 19:43:14 2008 @@ -208,11 +208,15 @@ A very useful module for doctests based on XML or HTML is ``lxml.doctestcompare``. It provides a relaxed comparison mechanism for XML and HTML in doctests. Using it for XML comparisons is as -simple as:: +simple as: + +.. sourcecode:: pycon >>> import lxml.usedoctest -and for HTML comparisons:: +and for HTML comparisons: + +.. 
sourcecode:: pycon >>> import lxml.html.usedoctest Modified: lxml/trunk/doc/lxmlhtml.txt ============================================================================== --- lxml/trunk/doc/lxmlhtml.txt (original) +++ lxml/trunk/doc/lxmlhtml.txt Mon Mar 3 19:43:14 2008 @@ -150,12 +150,16 @@ Luckily, lxml provides the ``lxml.doctestcompare`` module that supports relaxed comparison of XML and HTML pages and provides a readable diff in the output when a test fails. The HTML comparison is -most easily used by importing the ``usedoctest`` module in a doctest:: +most easily used by importing the ``usedoctest`` module in a doctest: + +.. sourcecode:: pycon >>> import lxml.html.usedoctest Now, if you have a HTML document and want to compare it to an expected result -document in a doctest, you can do the following:: +document in a doctest, you can do the following: + +.. sourcecode:: pycon >>> import lxml.html >>> html = lxml.html.fromstring('''\ @@ -195,7 +199,9 @@ lxml.html comes with a predefined HTML vocabulary for the `E-factory`_, originally written by Fredrik Lundh. This allows you to quickly generate HTML -pages and fragments:: +pages and fragments: + +.. sourcecode:: pycon >>> from lxml.html import builder as E >>> from lxml.html import usedoctest @@ -387,7 +393,9 @@ Note that you can change any of these attributes (values, method, action, etc) and then serialize the form to see the updated values. -You can, for instance, do:: +You can, for instance, do: + +.. sourcecode:: pycon >>> from lxml.html import fromstring, tostring >>> form_page = fromstring('''<html><body><form> @@ -441,7 +449,9 @@ argument, which is a function with the signature ``open_http(method, url, values)``. -Example:: +Example: + +.. sourcecode:: pycon >>> from lxml.html import parse, submit_form >>> page = parse('http://tinyurl.com').getroot() @@ -458,7 +468,9 @@ CSS style annotations and much more. 
Say, you have an evil web page from an untrusted source that contains lots of -content that upsets browsers and tries to run evil code on the client side:: +content that upsets browsers and tries to run evil code on the client side: + +.. sourcecode:: pycon >>> html = '''\ ... <html> @@ -488,7 +500,9 @@ ... </html>''' To remove the all suspicious content from this unparsed document, use the -``clean_html`` function:: +``clean_html`` function: + +.. sourcecode:: pycon >>> from lxml.html.clean import clean_html @@ -511,7 +525,9 @@ </html> The ``Cleaner`` class supports several keyword arguments to control exactly -which content is removed:: +which content is removed: + +.. sourcecode:: pycon >>> from lxml.html.clean import Cleaner @@ -637,6 +653,8 @@ Example of ``htmldiff``: +.. sourcecode:: pycon + >>> from lxml.html.diff import htmldiff, html_annotate >>> doc1 = '''<p>Here is some text.</p>''' >>> doc2 = '''<p>Here is <b>a lot</b> of <i>text</i>.</p>''' @@ -658,7 +676,9 @@ argument, ``markup``. This is a function like ``markup(text, version)`` that returns the given text marked up with the given version. The default version, the output of which you see in the -example, looks like:: +example, looks like: + +.. sourcecode:: python def default_markup(text, version): return '<span title="%s">%s</span>' % ( @@ -673,7 +693,9 @@ This example parses the `hCard <http://microformats.org/wiki/hcard>`_ microformat. -First we get the page:: +First we get the page: + +.. sourcecode:: pycon >>> import urllib >>> from lxml.html import fromstring @@ -682,7 +704,9 @@ >>> doc = fromstring(content) >>> doc.make_links_absolute(url) -Then we create some objects to put the information in:: +Then we create some objects to put the information in: + +.. sourcecode:: pycon >>> class Card(object): ... def __init__(self, **kw): @@ -692,7 +716,9 @@ ... def __init__(self, phone, types=()): ... 
self.phone, self.types = phone, types -And some generally handy functions for microformats:: +And some generally handy functions for microformats: + +.. sourcecode:: pycon >>> def get_text(el, class_name): ... els = el.find_class(class_name) @@ -708,7 +734,9 @@ ... # Ideally this would parse street, etc. ... return el.find_class('adr') -Then the parsing:: +Then the parsing: + +.. sourcecode:: pycon >>> for el in doc.find_class('hcard'): ... card = Card() Modified: lxml/trunk/doc/mkhtml.py ============================================================================== --- lxml/trunk/doc/mkhtml.py (original) +++ lxml/trunk/doc/mkhtml.py Mon Mar 3 19:43:14 2008 @@ -84,7 +84,7 @@ tag = el.tag if tag[0] != '{': el.tag = "{http://www.w3.org/1999/xhtml}" + tag - current_menu = find_menu(menu_root, name=replace_invalid('', name)) + current_menu = find_menu(menu_root, name=replace_invalid(' ', name)) if current_menu: for submenu in current_menu: submenu.set("class", submenu.get("class", ""). @@ -137,11 +137,11 @@ build_menu(tree, basename, section_head) - # also convert INSTALL.txt and CHANGES.txt + # also convert CHANGES.txt rest2html(script, os.path.join(lxml_path, 'CHANGES.txt'), os.path.join(dirname, 'changes-%s.html' % release), - stylesheet_url) + '') # integrate menu for tree, basename, outpath in trees.itervalues(): Modified: lxml/trunk/doc/objectify.txt ============================================================================== --- lxml/trunk/doc/objectify.txt (original) +++ lxml/trunk/doc/objectify.txt Mon Mar 3 19:43:14 2008 @@ -42,7 +42,9 @@ To set up and use ``objectify``, you need both the ``lxml.etree`` -module and ``lxml.objectify``:: +module and ``lxml.objectify``: + +.. sourcecode:: pycon >>> from lxml import etree >>> from lxml import objectify @@ -56,7 +58,9 @@ code using lxml.objectify. To make the doctests in this document look a little nicer, we also use -this:: +this: + +.. 
sourcecode:: pycon >>> import lxml.usedoctest @@ -77,7 +81,9 @@ As with ``lxml.etree``, you can either create an ``objectify`` tree by parsing an XML document or by building one from scratch. To parse a document, just use the ``parse()`` or ``fromstring()`` functions of -the module:: +the module: + +.. sourcecode:: pycon >>> from StringIO import StringIO >>> fileobject = StringIO('<test/>') @@ -91,14 +97,18 @@ True To build a new tree in memory, ``objectify`` replicates the standard -factory function ``Element()`` from ``lxml.etree``:: +factory function ``Element()`` from ``lxml.etree``: + +.. sourcecode:: pycon >>> obj_el = objectify.Element("new") >>> print isinstance(obj_el, objectify.ObjectifiedElement) True After creating such an Element, you can use the `usual API`_ of -lxml.etree to add SubElements to the tree:: +lxml.etree to add SubElements to the tree: + +.. sourcecode:: pycon >>> child = etree.SubElement(obj_el, "newchild", attr="value") @@ -107,7 +117,9 @@ New subelements will automatically inherit the objectify behaviour from their tree. However, all independent elements that you create through the ``Element()`` factory of lxml.etree (instead of objectify) -will not support the ``objectify`` API by themselves:: +will not support the ``objectify`` API by themselves: + +.. sourcecode:: pycon >>> subel = etree.SubElement(obj_el, "sub") >>> print isinstance(subel, objectify.ObjectifiedElement) @@ -123,7 +135,9 @@ The main idea behind the ``objectify`` API is to hide XML element access behind the usual object attribute access pattern. Asking an element for an -attribute will return the sequence of children with corresponding tag names:: +attribute will return the sequence of children with corresponding tag names: + +.. 
sourcecode:: pycon >>> root = objectify.Element("root") >>> b = etree.SubElement(root, "b") @@ -139,7 +153,9 @@ >>> root.index(root.b[1]) 1 -For convenience, you can omit the index '0' to access the first child:: +For convenience, you can omit the index '0' to access the first child: + +.. sourcecode:: pycon >>> print root.b.tag b @@ -147,7 +163,9 @@ 0 >>> del root.b -Iteration and slicing also obey the requested tag:: +Iteration and slicing also obey the requested tag: + +.. sourcecode:: pycon >>> x1 = etree.SubElement(root, "x") >>> x2 = etree.SubElement(root, "x") @@ -168,7 +186,9 @@ If you want to iterate over all children or need to provide a specific namespace for the tag, use the ``iterchildren()`` method. Like the other -methods for iteration, it supports an optional tag keyword argument:: +methods for iteration, it supports an optional tag keyword argument: + +.. sourcecode:: pycon >>> [ el.tag for el in root.iterchildren() ] ['b', 'x', 'x'] @@ -179,7 +199,9 @@ >>> [ el.tag for el in root.b ] ['b'] -XML attributes are accessed as in the normal ElementTree API:: +XML attributes are accessed as in the normal ElementTree API: + +.. sourcecode:: pycon >>> c = etree.SubElement(root, "c", myattr="someval") >>> print root.c.get("myattr") @@ -192,7 +214,9 @@ In addition to the normal ElementTree API for appending elements to trees, subtrees can also be added by assigning them to object attributes. In this case, the subtree is automatically deep copied and the tag name of its root is -updated to match the attribute name:: +updated to match the attribute name: + +.. sourcecode:: pycon >>> el = objectify.Element("yet_another_child") >>> root.new_child = el @@ -205,13 +229,17 @@ >>> [ el.tag for el in root.y ] ['y', 'y'] -The latter is a short form for operations on the full slice:: +The latter is a short form for operations on the full slice: + +.. 
sourcecode:: pycon >>> root.y[:] = [ objectify.Element("y") ] >>> [ el.tag for el in root.y ] ['y'] -You can also replace children that way:: +You can also replace children that way: + +.. sourcecode:: pycon >>> child1 = etree.SubElement(root, "child") >>> child2 = etree.SubElement(root, "child") @@ -228,7 +256,9 @@ >>> print root.child[2].sub.tag sub -Note that special care must be taken when changing the tag name of an element:: +Note that special care must be taken when changing the tag name of an element: + +.. sourcecode:: pycon >>> print root.b.tag b @@ -244,7 +274,9 @@ Tree generation with the E-factory ---------------------------------- -To simplify the generation of trees even further, you can use the E-factory:: +To simplify the generation of trees even further, you can use the E-factory: + +.. sourcecode:: pycon >>> E = objectify.E >>> root = E.root( @@ -262,7 +294,9 @@ <d py:pytype="str" tell="me">how</d> </root> -This allows you to write up a specific language in tags:: +This allows you to write up a specific language in tags: + +.. sourcecode:: pycon >>> ROOT = objectify.E.root >>> TITLE = objectify.E.title @@ -282,7 +316,9 @@ ``objectify.E`` is an instance of ``objectify.ElementMaker``. By default, it creates pytype annotated Elements without a namespace. You can switch off the pytype annotation by passing False to the ``annotate`` keyword argument of the -constructor. You can also pass a default namespace and an ``nsmap``:: +constructor. You can also pass a default namespace and an ``nsmap``: + +.. sourcecode:: pycon >>> myE = objectify.ElementMaker(annotate=False, ... namespace="http://my/ns", nsmap={None : "http://my/ns"}) @@ -300,7 +336,9 @@ Namespaces are handled mostly behind the scenes. If you access a child of an Element without specifying a namespace, the lookup will use the namespace of -the parent:: +the parent: + +.. 
sourcecode:: pycon >>> root = objectify.Element("{ns}root") >>> b = etree.SubElement(root, "{ns}b") @@ -313,18 +351,24 @@ ... AttributeError: no such child: {ns}c -You can access elements with different namespaces via ``getattr()``:: +You can access elements with different namespaces via ``getattr()``: + +.. sourcecode:: pycon >>> print getattr(root, "{other}c").tag {other}c -For convenience, there is also a quick way through item access:: +For convenience, there is also a quick way through item access: + +.. sourcecode:: pycon >>> print root["{other}c"].tag {other}c The same approach must be used to access children with tag names that are not -valid Python identifiers:: +valid Python identifiers: + +.. sourcecode:: pycon >>> el = etree.SubElement(root, "{ns}tag-name") >>> print root["tag-name"].tag @@ -348,7 +392,9 @@ >>> print root["tag-name"][1].child.tag {ns}child -or for names that have a special meaning in lxml.objectify:: +or for names that have a special meaning in lxml.objectify: + +.. sourcecode:: pycon >>> root = objectify.XML("<root><text>TEXT</text></root>") @@ -375,7 +421,9 @@ .. _`documentation on validation`: validation.html First of all, we need a parser that knows our schema, so let's say we -parse the schema from a file-like object (or file or filename):: +parse the schema from a file-like object (or file or filename): + +.. sourcecode:: pycon >>> from StringIO import StringIO >>> f = StringIO('''\ @@ -392,13 +440,17 @@ When creating the validating parser, we must make sure it `returns objectify trees`_. This is best done with the ``makeparser()`` -function:: +function: + +.. sourcecode:: pycon >>> parser = objectify.makeparser(schema = schema) .. _`returns objectify trees`: #advance-element-class-lookup -Now we can use it to parse a valid document:: +Now we can use it to parse a valid document: + +.. 
sourcecode:: pycon >>> xml = "<a><b>test</b></a>" >>> a = objectify.fromstring(xml, parser) @@ -406,7 +458,9 @@ >>> print a.b test -Or an invalid document:: +Or an invalid document: + +.. sourcecode:: pycon >>> xml = "<a><b>test</b><c/></a>" >>> a = objectify.fromstring(xml, parser) @@ -421,7 +475,9 @@ ========== For both convenience and speed, objectify supports its own path language, -represented by the ``ObjectPath`` class:: +represented by the ``ObjectPath`` class: + +.. sourcecode:: pycon >>> root = objectify.Element("{ns}root") >>> b1 = etree.SubElement(root, "{ns}b") @@ -465,7 +521,9 @@ >>> print find(root).tag {ns}b -Apart from strings, ObjectPath also accepts lists of path segments:: +Apart from strings, ObjectPath also accepts lists of path segments: + +.. sourcecode:: pycon >>> find = objectify.ObjectPath(['root', 'b', 'c']) >>> print find(root).tag @@ -476,7 +534,9 @@ {ns}b You can also use relative paths starting with a '.' to ignore the actual root -element and only inherit its namespace:: +element and only inherit its namespace: + +.. sourcecode:: pycon >>> find = objectify.ObjectPath(".b[1]") >>> print find(root).tag @@ -498,13 +558,17 @@ ... AttributeError: no such child: {other}unknown -For convenience, a single dot represents the empty ObjectPath (identity):: +For convenience, a single dot represents the empty ObjectPath (identity): + +.. sourcecode:: pycon >>> find = objectify.ObjectPath(".") >>> print find(root).tag {ns}root -ObjectPath objects can be used to manipulate trees:: +ObjectPath objects can be used to manipulate trees: + +.. sourcecode:: pycon >>> root = objectify.Element("{ns}root") @@ -532,7 +596,9 @@ >>> [ el.text for el in path.find(root) ] ['my value', 'my new value'] -As with attribute assignment, ``setattr()`` accepts lists:: +As with attribute assignment, ``setattr()`` accepts lists: + +.. 
sourcecode:: pycon >>> path.setattr(root, ["v1", "v2", "v3"]) >>> [ el.text for el in path.find(root) ] @@ -541,7 +607,9 @@ Note, however, that indexing is only supported in this context if the children exist. Indexing of non existing children will not extend or create a list of -such children but raise an exception:: +such children but raise an exception: + +.. sourcecode:: pycon >>> path = objectify.ObjectPath(".{non}existing[1]") >>> path.setattr(root, "my value") @@ -559,7 +627,9 @@ The objectify module knows about Python data types and tries its best to let element content behave like them. For example, they support the normal math -operators:: +operators: + +.. sourcecode:: pycon >>> root = objectify.fromstring( ... "<root><a>5</a><b>11</b><c>true</c><d>hoi</d></root>") @@ -591,7 +661,9 @@ However, data elements continue to provide the objectify API. This means that sequence operations such as ``len()``, slicing and indexing (e.g. of strings) cannot behave as the Python types. Like all other tree elements, they show -the normal slicing behaviour of objectify elements:: +the normal slicing behaviour of objectify elements: + +.. sourcecode:: pycon >>> root = objectify.fromstring("<root><a>test</a><b>toast</b></root>") >>> print root.a + ' me' # behaves like a string, right? @@ -611,7 +683,9 @@ If you need to run sequence operations on data types, you must ask the API for the *real* Python value. The string value is always available through the normal ElementTree ``.text`` attribute. Additionally, all data classes -provide a ``.pyval`` attribute that returns the value as plain Python type:: +provide a ``.pyval`` attribute that returns the value as plain Python type: + +.. sourcecode:: pycon >>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>") >>> root.a.text @@ -625,7 +699,9 @@ 5 Note, however, that both attributes are read-only in objectify. 
If you want -to change values, just assign them directly to the attribute:: +to change values, just assign them directly to the attribute: + +.. sourcecode:: pycon >>> root.a.text = "25" Traceback (most recent call last): @@ -652,7 +728,9 @@ To see the data types that are currently used, you can call the module level ``dump()`` function that returns a recursive string representation for -elements:: +elements: + +.. sourcecode:: pycon >>> root = objectify.fromstring(""" ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> @@ -677,7 +755,9 @@ d = None [NoneElement] * xsi:nil = 'true' -You can freely switch between different types for the same child:: +You can freely switch between different types for the same child: + +.. sourcecode:: pycon >>> root = objectify.fromstring("<root><a>5</a></root>") >>> print objectify.dump(root) @@ -722,7 +802,9 @@ Normally, elements use the standard string representation for str() that is provided by lxml.etree. You can enable a pretty-print representation for -objectify elements like this:: +objectify elements like this: + +.. sourcecode:: pycon >>> objectify.enableRecursiveStr() @@ -749,7 +831,9 @@ d = None [NoneElement] * xsi:nil = 'true' -This behaviour can be switched off in the same way:: +This behaviour can be switched off in the same way: + +.. sourcecode:: pycon >>> objectify.enableRecursiveStr(False) @@ -796,7 +880,9 @@ The "type hint" mechanism deploys an XML attribute defined as ``lxml.objectify.PYTYPE_ATTRIBUTE``. It may contain any of the following -string values: int, long, float, str, unicode, NoneType:: +string values: int, long, float, str, unicode, NoneType: + +.. sourcecode:: pycon >>> print objectify.PYTYPE_ATTRIBUTE {http://codespeak.net/lxml/objectify/pytype}pytype @@ -821,7 +907,9 @@ attribute through the ``set_pytype_attribute_tag(tag)`` module function, in case your application ever needs to. 
There is also a utility function ``annotate()`` that recursively generates this -attribute for the elements of a tree:: +attribute for the elements of a tree: + +.. sourcecode:: pycon >>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>") >>> print objectify.dump(root) @@ -844,7 +932,9 @@ A second way of specifying data type information uses XML Schema types as element annotations. Objectify knows those that can be mapped to normal -Python types:: +Python types: + +.. sourcecode:: pycon >>> root = objectify.fromstring('''\ ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" @@ -864,7 +954,9 @@ * xsi:type = 'xsd:string' Again, there is a utility function ``xsiannotate()`` that recursively -generates the "xsi:type" attribute for the elements of a tree:: +generates the "xsi:type" attribute for the elements of a tree: + +.. sourcecode:: pycon >>> root = objectify.fromstring('''\ ... <root><a>test</a><b>5</b><c>true</c></root> @@ -891,7 +983,9 @@ `Defining additional data classes`_. The utility function ``deannotate()`` can be used to get rid of 'py:pytype' -and/or 'xsi:type' information:: +and/or 'xsi:type' information: + +.. sourcecode:: pycon >>> root = objectify.fromstring('''\ ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" @@ -925,7 +1019,9 @@ For convenience, the ``DataElement()`` factory creates an Element with a Python value in one step. You can pass the required Python type name or the -XSI type name:: +XSI type name: + +.. sourcecode:: pycon >>> root = objectify.Element("root") >>> root.x = objectify.DataElement(5, _pytype="long") @@ -949,7 +1045,9 @@ * xsi:type = 'xsd:integer' XML Schema types reside in the XML schema namespace thus ``DataElement()`` -tries to correctly prefix the xsi:type attribute value for you:: +tries to correctly prefix the xsi:type attribute value for you: + +.. 
sourcecode:: pycon >>> root = objectify.Element("root") >>> root.s = objectify.DataElement(5, _xsi="string") @@ -960,7 +1058,9 @@ <s xsi:type="xsd:string">5</s> </root> -``DataElement()`` uses a default nsmap to set these prefixes:: +``DataElement()`` uses a default nsmap to set these prefixes: + +.. sourcecode:: pycon >>> el = objectify.DataElement('5', _xsi='string') >>> for prefix, namespace in el.nsmap.items(): @@ -973,7 +1073,9 @@ xsd:string While you can set custom namespace prefixes, it is necessary to provide valid -namespace information if you choose to do so:: +namespace information if you choose to do so: + +.. sourcecode:: pycon >>> el = objectify.DataElement('5', _xsi='foo:string', ... nsmap={'foo': 'http://www.w3.org/2001/XMLSchema'}) @@ -987,7 +1089,9 @@ foo:string Note how lxml chose a default prefix for the XML Schema Instance -namespace. We can override it as in the following example:: +namespace. We can override it as in the following example: + +.. sourcecode:: pycon >>> el = objectify.DataElement('5', _xsi='foo:string', ... nsmap={'foo': 'http://www.w3.org/2001/XMLSchema', @@ -1004,7 +1108,9 @@ Care must be taken if different namespace prefixes have been used for the same namespace. Namespace information gets merged to avoid duplicate definitions when adding a new sub-element to a tree, but this mechanism does not adapt the -prefixes of attribute values:: +prefixes of attribute values: + +.. sourcecode:: pycon >>> root = objectify.fromstring("""<root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>""") >>> print etree.tostring(root, pretty_print=True) @@ -1022,7 +1128,9 @@ It is your responsibility to fix the prefixes of attribute values if you choose to deviate from the standard prefixes. A convenient way to do this for -xsi:type attributes is to use the ``xsiannotate()`` utility:: +xsi:type attributes is to use the ``xsiannotate()`` utility: + +.. 
sourcecode:: pycon >>> objectify.xsiannotate(root) >>> print etree.tostring(root, pretty_print=True) @@ -1045,7 +1153,9 @@ to set their type conversion function (string -> numeric Python type). This call should be placed into the element ``_init()`` method. -The registration of data classes uses the ``PyType`` class:: +The registration of data classes uses the ``PyType`` class: + +.. sourcecode:: pycon >>> class ChristmasDate(objectify.ObjectifiedDataElement): ... def call_santa(self): @@ -1067,14 +1177,18 @@ does not raise a ValueError/TypeError exception when applied to the element text. -If you want, you can also register this type under an XML Schema type name:: +If you want, you can also register this type under an XML Schema type name: + +.. sourcecode:: pycon >>> xmas_type.xmlSchemaTypes = ("date",) XML Schema types will be considered if the element has an ``xsi:type`` attribute that specifies its data type. The line above binds the XSD type ``date`` to the newly defined Python type. Note that this must be done before -the next step, which is to register the type. Then you can use it:: +the next step, which is to register the type. Then you can use it: + +.. sourcecode:: pycon >>> xmas_type.register() @@ -1096,7 +1210,9 @@ the dependencies of already registered types. If you provide XML Schema type information, this will override the type check -function defined above:: +function defined above: + +.. sourcecode:: pycon >>> root = objectify.fromstring('''\ ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> @@ -1108,7 +1224,9 @@ >>> root.a.call_santa() Ho ho ho! -To unregister a type, call its ``unregister()`` method:: +To unregister a type, call its ``unregister()`` method: + +.. sourcecode:: pycon >>> root.a.call_santa() Ho ho ho! @@ -1142,11 +1260,15 @@ this alters the document infoset, so if you consider the removed spaces as data in your specific use case, you should go with a normal parser and just set the element class lookup. 
Most applications, -however, will work fine with the following setup:: +however, will work fine with the following setup: + +.. sourcecode:: pycon >>> parser = objectify.makeparser(remove_blank_text=True) -What this does internally, is:: +What this does internally, is: + +.. sourcecode:: pycon >>> parser = etree.XMLParser(remove_blank_text=True) @@ -1159,7 +1281,9 @@ however, you have to take care that the namespace classes inherit from ``objectify.ObjectifiedElement``, not only from the normal ``lxml.etree.ElementBase``, so that they support the ``objectify`` -API. The above setup code then becomes:: +API. The above setup code then becomes: + +.. sourcecode:: pycon >>> lookup = etree.ElementNamespaceClassLookup( ... objectify.ObjectifyElementClassLookup() ) Modified: lxml/trunk/doc/parsing.txt ============================================================================== --- lxml/trunk/doc/parsing.txt (original) +++ lxml/trunk/doc/parsing.txt Mon Mar 3 19:43:14 2008 @@ -21,7 +21,9 @@ 4.1 Serialising to Unicode strings -The usual setup procedure:: +The usual setup procedure: + +.. sourcecode:: pycon >>> from lxml import etree >>> from StringIO import StringIO @@ -33,7 +35,9 @@ Parsers are represented by parser objects. There is support for parsing both XML and (broken) HTML. Note that XHTML is best parsed as XML, parsing it with the HTML parser can lead to unexpected results. Here is a simple example for -parsing XML from an in-memory string:: +parsing XML from an in-memory string: + +.. sourcecode:: pycon >>> xml = '<a xmlns="test"><b xmlns="test"/></a>' @@ -42,7 +46,9 @@ <a xmlns="test"><b xmlns="test"/></a> To read from a file or file-like object, you can use the ``parse()`` function, -which returns an ``ElementTree`` object:: +which returns an ``ElementTree`` object: + +.. sourcecode:: pycon >>> tree = etree.parse(StringIO(xml)) >>> print etree.tostring(tree.getroot()) @@ -50,7 +56,9 @@ Note how the ``parse()`` function reads from a file-like object here. 
If parsing is done from a real file, it is more common (and also somewhat more -efficient) to pass a filename:: +efficient) to pass a filename: + +.. sourcecode:: pycon >>> tree = etree.parse("doc/test.xml") @@ -59,7 +67,9 @@ If you want to parse from memory and still provide a base URL for the document (e.g. to support relative paths in an XInclude), you can pass the ``base_url`` -keyword argument:: +keyword argument: + +.. sourcecode:: pycon >>> root = etree.fromstring(xml, base_url="http://where.it/is/from.xml") @@ -68,7 +78,9 @@ -------------- The parsers accept a number of setup options as keyword arguments. The above -example is easily extended to clean up namespaces during parsing:: +example is easily extended to clean up namespaces during parsing: + +.. sourcecode:: pycon >>> parser = etree.XMLParser(ns_clean=True) >>> tree = etree.parse(StringIO(xml), parser) @@ -105,7 +117,9 @@ --------- Parsers have an ``error_log`` property that lists the errors of the -last parser run:: +last parser run: + +.. sourcecode:: pycon >>> parser = etree.XMLParser() >>> print len(parser.error_log) @@ -132,7 +146,9 @@ HTML parsing is similarly simple. The parsers have a ``recover`` keyword argument that the HTMLParser sets by default. It lets libxml2 try its best to return something usable without raising an exception. You should use libxml2 -version 2.6.21 or newer to take advantage of this feature:: +version 2.6.21 or newer to take advantage of this feature: + +.. sourcecode:: pycon >>> broken_html = "<html><head><title>test<body><h1>page title</h3>" @@ -150,7 +166,9 @@ </html> Lxml has an HTML function, similar to the XML shortcut known from -ElementTree:: +ElementTree: + +.. sourcecode:: pycon >>> html = etree.HTML(broken_html) >>> print etree.tostring(html, pretty_print=True), @@ -178,7 +196,9 @@ The use of the libxml2 parsers makes some additional information available at the API level. 
Currently, ElementTree objects can access the DOCTYPE information provided by a parsed document, as well as the XML version and the -original encoding:: +original encoding: + +.. sourcecode:: pycon >>> pub_id = "-//W3C//DTD XHTML 1.0 Transitional//EN" >>> sys_url = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" @@ -207,7 +227,9 @@ .. _`As in ElementTree`: http://effbot.org/elementtree/elementtree-xmlparser.htm `As in ElementTree`_, and similar to a SAX event handler, you can pass -a target object to the parser:: +a target object to the parser: + +.. sourcecode:: pycon >>> class EchoTarget: ... def start(self, tag, attrib): @@ -248,7 +270,9 @@ .. _`ElementTree parsers`: http://effbot.org/elementtree/elementtree-xmlparser.htm -To start parsing with a feed parser, just call its ``feed()`` method:: +To start parsing with a feed parser, just call its ``feed()`` method: + +.. sourcecode:: pycon >>> parser = etree.XMLParser() @@ -257,7 +281,9 @@ When you are done parsing, you **must** call the ``close()`` method to retrieve the root Element of the parse result document, and to unlock the -parser:: +parser: + +.. sourcecode:: pycon >>> root = parser.close() @@ -280,7 +306,9 @@ ``feed_error_log``. Errors in the feed parser do not show up in the normal ``error_log`` and vice versa. -You can also combine the feed parser interface with the target parser:: +You can also combine the feed parser interface with the target parser: + +.. sourcecode:: pycon >>> parser = etree.XMLParser(target = EchoTarget()) @@ -312,7 +340,9 @@ The 'start' and 'end' events represent opening and closing elements and are accompanied by the respective element. By default, only 'end' events are -generated:: +generated: + +.. sourcecode:: pycon >>> xml = '''\ ... <root> @@ -330,12 +360,16 @@ end {testns}empty-element end root -The resulting tree is available through the ``root`` property of the iterator:: +The resulting tree is available through the ``root`` property of the iterator: + +.. 
sourcecode:: pycon >>> context.root.tag 'root' -The other event types can be activated with the ``events`` keyword argument:: +The other event types can be activated with the ``events`` keyword argument: + +.. sourcecode:: pycon >>> events = ("start", "end") >>> context = etree.iterparse(StringIO(xml), events=events) @@ -356,7 +390,9 @@ As an extension over ElementTree, lxml.etree accepts a ``tag`` keyword argument just like ``element.iter(tag)``. This restricts events to a -specific tag or namespace:: +specific tag or namespace: + +.. sourcecode:: pycon >>> context = etree.iterparse(StringIO(xml), tag="element") >>> for action, elem in context: @@ -378,7 +414,9 @@ You can modify the element and its descendants when handling the 'end' event. To save memory, for example, you can remove subtrees that are no longer -needed:: +needed: + +.. sourcecode:: pycon >>> context = etree.iterparse(StringIO(xml)) >>> for action, elem in context: @@ -398,7 +436,9 @@ If you have elements with a long list of children in your XML file and want to save more memory during parsing, you can clean up the preceding siblings of -the current element:: +the current element: + +.. sourcecode:: pycon >>> for event, element in etree.iterparse(StringIO(xml)): ... # ... do something with the element @@ -415,7 +455,9 @@ code. The 'start-ns' and 'end-ns' events notify about namespace declarations and -generate tuples ``(prefix, URI)``:: +generate tuples ``(prefix, URI)``: + +.. sourcecode:: pycon >>> events = ("start-ns", "end-ns") >>> context = etree.iterparse(StringIO(xml), events=events) @@ -432,7 +474,9 @@ -------- A second extension over ElementTree is the ``iterwalk()`` function. It -behaves exactly like ``iterparse()``, but works on Elements and ElementTrees:: +behaves exactly like ``iterparse()``, but works on Elements and ElementTrees: + +.. sourcecode:: pycon >>> root = etree.XML(xml) @@ -464,7 +508,9 @@ library. 
First of all, where ElementTree would raise an exception, the parsers in lxml.etree can handle unicode strings straight away. This is most helpful for XML snippets embedded in source code using the ``XML()`` -function:: +function: + +.. sourcecode:: pycon >>> uxml = u'<test> \uf8d1 + \uf8d2 </test>' >>> uxml @@ -472,7 +518,9 @@ >>> root = etree.XML(uxml) This requires, however, that unicode strings do not specify a conflicting -encoding themselves and thus lie about their real encoding:: +encoding themselves and thus lie about their real encoding: + +.. sourcecode:: pycon >>> etree.XML(u'<?xml version="1.0" encoding="ASCII"?>\n' + uxml) Traceback (most recent call last): @@ -490,7 +538,9 @@ To serialize the result, you would normally use the ``tostring()`` module function, which serializes to plain ASCII by default or a -number of other byte encodings if asked for:: +number of other byte encodings if asked for: + +.. sourcecode:: pycon >>> etree.tostring(root) '<test> &#63697; + &#63698; </test>' @@ -499,7 +549,9 @@ '<test> \xef\xa3\x91 + \xef\xa3\x92 </test>' As an extension, lxml.etree recognises the unicode type as encoding to -build a Python unicode representation of a tree:: +build a Python unicode representation of a tree: + +.. sourcecode:: pycon >>> etree.tostring(root, encoding=unicode) u'<test> \uf8d1 + \uf8d2 </test>' Modified: lxml/trunk/doc/performance.txt ============================================================================== --- lxml/trunk/doc/performance.txt (original) +++ lxml/trunk/doc/performance.txt Mon Mar 3 19:43:14 2008 @@ -470,7 +470,9 @@ .. _`benchmark proposal`: http://www.onlamp.com/pub/wlg/6291 .. _`Old Testament`: http://www.ibiblio.org/bosak/xml/eg/religion.2.00.xml.zip -Now, Uche's original proposal was more or less the following:: +Now, Uche's original proposal was more or less the following: + +.. 
sourcecode:: python def bench_ET(): tree = ElementTree.parse("ot.xml") @@ -482,7 +484,9 @@ return len(result) which takes about one second on my machine today. The faster ``iterparse()`` -variant looks like this:: +variant looks like this: + +.. sourcecode:: python def bench_ET_iterparse(): result = [] @@ -507,7 +511,9 @@ One of the many great tools in lxml is XPath, a swiss army knife for finding things in XML documents. It is possible to move the whole thing to a pure -XPath implementation, which looks like this:: +XPath implementation, which looks like this: + +.. sourcecode:: python def bench_lxml_xpath_all(): tree = etree.parse("ot.xml") @@ -518,7 +524,9 @@ implementation (in lines of Python code) that I could come up with. Now, this is already a rather complex XPath expression compared to the simple "//v" ElementPath expression we started with. Since this is also valid XPath, let's -try this instead:: +try this instead: + +.. sourcecode:: python def bench_lxml_xpath(): tree = etree.parse("ot.xml") @@ -536,7 +544,9 @@ what we had in the beginning. Under lxml, this runs in the same 0.12 seconds. But there is one thing left to try. We can replace the simple ElementPath -expression with a native tree iterator:: +expression with a native tree iterator: + +.. sourcecode:: python def bench_lxml_getiterator(): tree = etree.parse("ot.xml") @@ -633,11 +643,15 @@ A way to improve the normal attribute access time is static instantiation of the Python objects, thus trading memory for speed. Just create a cache -dictionary and run:: +dictionary and run: + +.. sourcecode:: python cache[root] = list(root.iter()) -after parsing and:: +after parsing and: + +.. 
sourcecode:: python del cache[root] Modified: lxml/trunk/doc/resolvers.txt ============================================================================== --- lxml/trunk/doc/resolvers.txt (original) +++ lxml/trunk/doc/resolvers.txt Mon Mar 3 19:43:14 2008 @@ -16,7 +16,9 @@ Resolvers --------- -Here is an example of a custom resolver:: +Here is an example of a custom resolver: + +.. sourcecode:: pycon >>> from lxml import etree @@ -44,14 +46,18 @@ terminates if ``resolve()`` returns the result of any of the above ``resolve_*()`` methods. -Resolvers are registered local to a parser:: +Resolvers are registered local to a parser: + +.. sourcecode:: pycon >>> parser = etree.XMLParser(load_dtd=True) >>> parser.resolvers.add( DTDResolver() ) Note that we instantiate a parser that loads the DTD. This is not done by the default parser, which does no validation. When we use this parser to parse a -document that requires resolving a URL, it will call our custom resolver:: +document that requires resolving a URL, it will call our custom resolver: + +.. sourcecode:: pycon >>> xml = u'<!DOCTYPE doc SYSTEM "MissingDTD.dtd"><doc>&myentity;</doc>' >>> from StringIO import StringIO @@ -71,7 +77,9 @@ XML documents memorise their initial parser (and its resolvers) during their life-time. This means that a lookup process related to a document will use the resolvers of the document's parser. We can demonstrate this with a -resolver that only responds to a specific prefix:: +resolver that only responds to a specific prefix: + +.. sourcecode:: pycon >>> class PrefixResolver(etree.Resolver): ... def __init__(self, prefix): @@ -87,7 +95,9 @@ ... print "Resolved url %s as prefix %s" % (url, self.prefix) ... return self.resolve_string(self.result_xml, context) -We demonstrate this in XSLT and use the following stylesheet as an example:: +We demonstrate this in XSLT and use the following stylesheet as an example: + +.. sourcecode:: pycon >>> xml_text = """\ ... 
<xsl:stylesheet version="1.0" @@ -105,7 +115,9 @@ document (i.e. when resolving ``xsl:import`` and ``xsl:include`` elements) and ``hoi:test`` at transformation time, when calls to the ``document`` function are resolved. If we now register different resolvers with two different -parsers, we can parse our document twice in different resolver contexts:: +parsers, we can parse our document twice in different resolver contexts: + +.. sourcecode:: pycon >>> hoi_parser = etree.XMLParser() >>> normal_doc = etree.parse(StringIO(xml_text), hoi_parser) @@ -121,44 +133,52 @@ memorise their original parser so that the correct set of resolvers is used in subsequent lookups. To compile the stylesheet, XSLT must resolve the ``honk:test`` URI in the ``xsl:include`` element. The ``hoi`` resolver cannot -do that:: +do that: + +.. sourcecode:: pycon >>> transform = etree.XSLT(normal_doc) Traceback (most recent call last): - [...] + ... XSLTParseError: Cannot resolve URI honk:test >>> transform = etree.XSLT(hoi_doc) Traceback (most recent call last): - [...] + ... XSLTParseError: Cannot resolve URI honk:test However, if we use the ``honk`` resolver associated with the respective -document, everything works fine:: +document, everything works fine: + +.. sourcecode:: pycon >>> transform = etree.XSLT(honk_doc) Resolved url honk:test as prefix honk Running the transform accesses the same parser context again, but since it now needs to resolve the ``hoi`` URI in the call to the document function, its -``honk`` resolver will fail to do so:: +``honk`` resolver will fail to do so: + +.. sourcecode:: pycon >>> result = transform(normal_doc) Traceback (most recent call last): - [...] + ... XSLTApplyError: Cannot resolve URI hoi:test >>> result = transform(hoi_doc) Traceback (most recent call last): - [...] + ... XSLTApplyError: Cannot resolve URI hoi:test >>> result = transform(honk_doc) Traceback (most recent call last): - [...] + ... 
XSLTApplyError: Cannot resolve URI hoi:test -This can only be solved by adding a ``hoi`` resolver to the original parser:: +This can only be solved by adding a ``hoi`` resolver to the original parser: + +.. sourcecode:: pycon >>> honk_parser.resolvers.add( PrefixResolver("hoi") ) >>> result = transform(honk_doc) @@ -170,7 +190,9 @@ We can see that the ``hoi`` resolver was called to generate a document that was then inserted into the result document by the XSLT transformation. Note that this is completely independent of the XML file you transform, as the URI -is resolved from within the stylesheet context:: +is resolved from within the stylesheet context: + +.. sourcecode:: pycon >>> result = transform(normal_doc) Resolved url hoi:test as prefix hoi @@ -200,7 +222,9 @@ Access control is configured using the ``XSLTAccessControl`` class. It can be called with a number of keyword arguments that allow or deny specific -operations:: +operations: + +.. sourcecode:: pycon >>> transform = etree.XSLT(honk_doc) Resolved url honk:test as prefix honk @@ -212,7 +236,7 @@ Resolved url honk:test as prefix honk >>> result = transform(normal_doc) Traceback (most recent call last): - [...] + ... XSLTApplyError: xsltLoadDocument: read rights for hoi:test denied There are a few things to keep in mind: Modified: lxml/trunk/doc/rest2html.py ============================================================================== --- lxml/trunk/doc/rest2html.py (original) +++ lxml/trunk/doc/rest2html.py Mon Mar 3 19:43:14 2008 @@ -1,23 +1,61 @@ #!/usr/bin/python -# Author: David Goodger -# Contact: goodger at python.org -# Revision: $Revision: 3901 $ -# Date: $Date: 2005-09-25 17:49:54 +0200 (Sun, 25 Sep 2005) $ -# Copyright: This module has been placed in the public domain. - """ -A minimal front end to the Docutils Publisher, producing HTML. +A minimal front end to the Docutils Publisher, producing HTML with +Pygments syntax highlighting. 
""" +# Set to True if you want inline CSS styles instead of classes +INLINESTYLES = False + + try: import locale locale.setlocale(locale.LC_ALL, '') except: pass -from docutils.core import publish_cmdline, default_description +# set up Pygments + +from pygments.formatters import HtmlFormatter + +# The default formatter +DEFAULT = HtmlFormatter(noclasses=INLINESTYLES, cssclass='syntax') + +# Add name -> formatter pairs for every variant you want to use +VARIANTS = { + # 'linenos': HtmlFormatter(noclasses=INLINESTYLES, linenos=True), +} + +from docutils import nodes +from docutils.parsers.rst import directives + +from pygments import highlight +from pygments.lexers import get_lexer_by_name, TextLexer + +def pygments_directive(name, arguments, options, content, lineno, + content_offset, block_text, state, state_machine): + try: + lexer = get_lexer_by_name(arguments[0]) + except ValueError, e: + # no lexer found - use the text one instead of an exception + lexer = TextLexer() + # take an arbitrary option if more than one is given + formatter = options and VARIANTS[options.keys()[0]] or DEFAULT + parsed = highlight(u'\n'.join(content), lexer, formatter) + return [nodes.raw('', parsed, format='html')] + +pygments_directive.arguments = (1, 0, 1) +pygments_directive.content = 1 +pygments_directive.options = dict([(key, directives.flag) for key in VARIANTS]) + +directives.register_directive('sourcecode', pygments_directive) + + +# run the generation + +from docutils.core import publish_cmdline, default_description description = ('Generates (X)HTML documents from standalone reStructuredText ' 'sources. ' + default_description) Modified: lxml/trunk/doc/sax.txt ============================================================================== --- lxml/trunk/doc/sax.txt (original) +++ lxml/trunk/doc/sax.txt Mon Mar 3 19:43:14 2008 @@ -19,12 +19,16 @@ First of all, lxml has support for building a new tree given SAX events. 
To do this, we use the special SAX content handler defined by lxml named -``lxml.sax.ElementTreeContentHandler``:: +``lxml.sax.ElementTreeContentHandler``: + +.. sourcecode:: pycon >>> import lxml.sax >>> handler = lxml.sax.ElementTreeContentHandler() -Now let's fire some SAX events at it:: +Now let's fire some SAX events at it: + +.. sourcecode:: pycon >>> handler.startElementNS((None, 'a'), 'a', {}) >>> handler.startElementNS((None, 'b'), 'b', {(None, 'foo'): 'bar'}) @@ -33,7 +37,9 @@ >>> handler.endElementNS((None, 'a'), 'a') This constructs an equivalent tree. You can access it through the ``etree`` -property of the handler:: +property of the handler: + +.. sourcecode:: pycon >>> tree = handler.etree >>> lxml.etree.tostring(tree.getroot()) @@ -47,14 +53,18 @@ Producing SAX events from an ElementTree or Element --------------------------------------------------- -Let's make a tree we can generate SAX events for:: +Let's make a tree we can generate SAX events for: + +.. sourcecode:: pycon >>> from StringIO import StringIO >>> f = StringIO('<a><b>Text</b></a>') >>> tree = lxml.etree.parse(f) To see whether the correct SAX events are produced, we'll write a custom -content handler.:: +content handler.: + +.. sourcecode:: pycon >>> from xml.sax.handler import ContentHandler >>> class MyContentHandler(ContentHandler): @@ -77,12 +87,16 @@ The SAX event generator in lxml.sax currently only supports namespace-aware processing. -To test the content handler, we can produce SAX events from the tree:: +To test the content handler, we can produce SAX events from the tree: + +.. sourcecode:: pycon >>> handler = MyContentHandler() >>> lxml.sax.saxify(tree, handler) -This is what we expect:: +This is what we expect: + +.. sourcecode:: pycon >>> handler.a_amount 1 @@ -99,13 +113,17 @@ Python library. Note, however, that this is a one-way solution, as Python's DOM implementation connot generate SAX events from a DOM tree. 
-You can use xml.dom.pulldom to build a minidom from lxml:: +You can use xml.dom.pulldom to build a minidom from lxml: + +.. sourcecode:: pycon >>> from xml.dom.pulldom import SAX2DOM >>> handler = SAX2DOM() >>> lxml.sax.saxify(tree, handler) -PullDOM makes the result available through the ``document`` attribute:: +PullDOM makes the result available through the ``document`` attribute: + +.. sourcecode:: pycon >>> dom = handler.document >>> print dom.firstChild.localName Modified: lxml/trunk/doc/tutorial.txt ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/trunk/doc/tutorial.txt Mon Mar 3 19:43:14 2008 @@ -37,13 +37,17 @@ 6 ElementPath -A common way to import ``lxml.etree`` is as follows:: +A common way to import ``lxml.etree`` is as follows: + +.. sourcecode:: pycon >>> from lxml import etree If your code only uses the ElementTree API and does not rely on any functionality that is specific to ``lxml.etree``, you can also use (any part -of) the following import chain as a fall-back to the original ElementTree:: +of) the following import chain as a fall-back to the original ElementTree: + +.. sourcecode:: python try: from lxml import etree @@ -84,28 +88,38 @@ An ``Element`` is the main container object for the ElementTree API. Most of the XML tree functionality is accessed through this class. Elements are -easily created through the ``Element`` factory:: +easily created through the ``Element`` factory: + +.. sourcecode:: pycon >>> root = etree.Element("root") -The XML tag name of elements is accessed through the ``tag`` property:: +The XML tag name of elements is accessed through the ``tag`` property: + +.. sourcecode:: pycon >>> print root.tag root Elements are organised in an XML tree structure. To create child elements and -add them to a parent element, you can use the ``append()`` method:: +add them to a parent element, you can use the ``append()`` method: + +.. 
sourcecode:: pycon >>> root.append( etree.Element("child1") ) However, this is so common that there is a shorter and much more efficient way to do this: the ``SubElement`` factory. It accepts the same arguments as the -``Element`` factory, but additionally requires the parent as first argument:: +``Element`` factory, but additionally requires the parent as first argument: + +.. sourcecode:: pycon >>> child2 = etree.SubElement(root, "child2") >>> child3 = etree.SubElement(root, "child3") -To see that this is really XML, you can serialise the tree you have created:: +To see that this is really XML, you can serialise the tree you have created: + +.. sourcecode:: pycon >>> print etree.tostring(root, pretty_print=True), <root> @@ -119,7 +133,9 @@ ------------------ To make the access to these subelements as easy and straight forward as -possible, elements behave like normal Python lists:: +possible, elements behave like normal Python lists: + +.. sourcecode:: pycon >>> child = root[0] >>> print child.tag @@ -169,7 +185,9 @@ If you want to *copy* an element to a different position, consider creating an independent *deep copy* using the ``copy`` module from Python's standard -library:: +library: + +.. sourcecode:: pycon >>> from copy import deepcopy @@ -181,13 +199,17 @@ >>> print [ c.tag for c in root ] ['child3', 'child1', 'child2'] -The way up in the tree is provided through the ``getparent()`` method:: +The way up in the tree is provided through the ``getparent()`` method: + +.. sourcecode:: pycon >>> root is root[0].getparent() # lxml.etree only! True The siblings (or neighbours) of an element are accessed as next and previous -elements:: +elements: + +.. sourcecode:: pycon >>> root[0] is root[1].getprevious() # lxml.etree only! True @@ -199,14 +221,18 @@ ------------------------- XML elements support attributes. You can create them directly in the Element -factory:: +factory: + +.. 
sourcecode:: pycon >>> root = etree.Element("root", interesting="totally") >>> print etree.tostring(root) <root interesting="totally"/> Fast and direct access to these attributes is provided by the ``set()`` and -``get()`` methods of elements:: +``get()`` methods of elements: + +.. sourcecode:: pycon >>> print root.get("interesting") totally @@ -216,7 +242,9 @@ somewhat However, a very convenient way of dealing with them is through the dictionary -interface of the ``attrib`` property:: +interface of the ``attrib`` property: + +.. sourcecode:: pycon >>> attributes = root.attrib @@ -236,7 +264,9 @@ Elements contain text --------------------- -Elements can contain text:: +Elements can contain text: + +.. sourcecode:: pycon >>> root = etree.Element("root") >>> root.text = "TEXT" @@ -252,14 +282,18 @@ tree hierarchy. However, if XML is used for tagged text documents such as (X)HTML, text can -also appear between different elements, right in the middle of the tree:: +also appear between different elements, right in the middle of the tree: + +.. sourcecode:: html <html><body>Hello<br/>World</body></html> Here, the ``<br/>`` tag is surrounded by text. This is often referred to as *document-style* or *mixed-content* XML. Elements support this through their ``tail`` property. It contains the text that directly follows the element, up -to the next element in the XML tree:: +to the next element in the XML tree: + +.. sourcecode:: pycon >>> html = etree.Element("html") >>> body = etree.SubElement(html, "body") @@ -286,7 +320,9 @@ For example, when you serialise an Element from within the tree, you do not always want its tail text in the result (although you would still want the tail text of its children). For this purpose, the -``tostring()`` function accepts the keyword argument ``with_tail``:: +``tostring()`` function accepts the keyword argument ``with_tail``: + +.. 
sourcecode:: pycon >>> print etree.tostring(br) <br/>TAIL @@ -299,7 +335,9 @@ If you want to read *only* the text, i.e. without any intermediate tags, you have to recursively concatenate all ``text`` and ``tail`` attributes in the correct order. Again, the ``tostring()`` function -comes to the rescue, this time using the ``method`` keyword:: +comes to the rescue, this time using the ``method`` keyword: + +.. sourcecode:: pycon >>> print etree.tostring(html, method="text") TEXTTAIL @@ -311,14 +349,18 @@ .. _XPath: xpathxslt.html#xpath Another way to extract the text content of a tree is XPath_, which -also allows you to extract the separate text chunks into a list:: +also allows you to extract the separate text chunks into a list: + +.. sourcecode:: pycon >>> print html.xpath("string()") # lxml.etree only! TEXTTAIL >>> print html.xpath("//text()") # lxml.etree only! ['TEXT', 'TAIL'] -If you want to use this more often, you can wrap it in a function:: +If you want to use this more often, you can wrap it in a function: + +.. sourcecode:: pycon >>> build_text_list = etree.XPath("//text()") # lxml.etree only! >>> print build_text_list(html) @@ -327,7 +369,9 @@ Note that a string result returned by XPath is a special 'smart' object that knows about its origins. You can ask it where it came from through its ``getparent()`` method, just as you would with -Elements:: +Elements: + +.. sourcecode:: pycon >>> texts = build_text_list(html) >>> print texts[0] @@ -341,7 +385,9 @@ >>> print texts[1].getparent().tag br -You can also find out if it's normal text content or tail text:: +You can also find out if it's normal text content or tail text: + +.. sourcecode:: pycon >>> print texts[0].is_text True @@ -352,7 +398,9 @@ While this works for the results of the ``text()`` function, lxml will not to tell you the origin of a string value that was constructed by -the XPath functions ``string()`` or ``concat()``:: +the XPath functions ``string()`` or ``concat()``: + +.. 
sourcecode:: pycon >>> stringify = etree.XPath("string()") >>> print stringify(html) @@ -368,7 +416,9 @@ and do something with its elements, tree iteration is a very convenient solution. Elements provide a tree iterator for this purpose. It yields elements in *document order*, i.e. in the order their tags would appear if you -serialised the tree to XML:: +serialised the tree to XML: + +.. sourcecode:: pycon >>> root = etree.Element("root") >>> etree.SubElement(root, "child").text = "Child 1" @@ -390,7 +440,9 @@ another - Child 3 If you know you are only interested in a single tag, you can pass its name to -``iter()`` to have it filter for you:: +``iter()`` to have it filter for you: + +.. sourcecode:: pycon >>> for element in root.iter("child"): ... print element.tag, '-', element.text @@ -400,7 +452,9 @@ By default, iteration yields all nodes in the tree, including ProcessingInstructions, Comments and Entity instances. If you want to make sure only Element objects are returned, you can pass the -``Element`` factory as tag parameter:: +``Element`` factory as tag parameter: + +.. sourcecode:: pycon >>> root.append(etree.Entity("#234")) >>> root.append(etree.Comment("some comment")) @@ -441,7 +495,9 @@ string, or the ``ElementTree.write()`` method that writes to a file or file-like object. Both accept the same keyword arguments like ``pretty_print`` for formatted output or ``encoding`` to select a -specific output encoding other than plain ASCII:: +specific output encoding other than plain ASCII: + +.. sourcecode:: pycon >>> root = etree.XML('<root><a><b/></a></root>') @@ -468,7 +524,9 @@ Since lxml 2.0 (and ElementTree 1.3), the serialisation functions can do more than XML serialisation. You can serialise to HTML or extract -the text content by passing the ``method`` keyword:: +the text content by passing the ``method`` keyword: + +.. 
sourcecode:: pycon >>> root = etree.XML('<html><head/><body><p>Hello<br/>World</p></body></html>') @@ -491,7 +549,9 @@ HelloWorld For the plain text output, serialising to a Python unicode string -might become handy. Just pass the ``unicode`` type as encoding:: +might become handy. Just pass the ``unicode`` type as encoding: + +.. sourcecode:: pycon >>> etree.tostring(root, encoding=unicode, method='text') u'HelloWorld' @@ -505,7 +565,9 @@ and general document handling. One of the bigger differences is that it serialises as a complete document, as opposed to a single ``Element``. This includes top-level processing instructions and -comments, as well as a DOCTYPE and other DTD content in the document:: +comments, as well as a DOCTYPE and other DTD content in the document: + +.. sourcecode:: pycon >>> from StringIO import StringIO >>> tree = etree.parse(StringIO('''\ @@ -562,7 +624,9 @@ The fromstring() function ------------------------- -The ``fromstring()`` function is the easiest way to parse a string:: +The ``fromstring()`` function is the easiest way to parse a string: + +.. sourcecode:: pycon >>> some_xml_data = "<root>data</root>" @@ -577,7 +641,9 @@ ------------------ The ``XML()`` function behaves like the ``fromstring()`` function, but is -commonly used to write XML literals right into the source:: +commonly used to write XML literals right into the source: + +.. sourcecode:: pycon >>> root = etree.XML("<root>data</root>") >>> print root.tag @@ -589,7 +655,9 @@ The parse() function -------------------- -The ``parse()`` function is used to parse from files and file-like objects:: +The ``parse()`` function is used to parse from files and file-like objects: + +.. sourcecode:: pycon >>> some_file_like = StringIO("<root>data</root>") @@ -599,7 +667,9 @@ <root>data</root> Note that ``parse()`` returns an ElementTree object, not an Element object as -the string parser functions:: +the string parser functions: + +.. 
sourcecode:: pycon >>> root = tree.getroot() >>> print root.tag @@ -630,13 +700,17 @@ -------------- By default, ``lxml.etree`` uses a standard parser with a default setup. If -you want to configure the parser, you can create a you instance:: +you want to configure the parser, you can create a you instance: + +.. sourcecode:: pycon >>> parser = etree.XMLParser(remove_blank_text=True) # lxml.etree only! This creates a parser that removes empty text between tags while parsing, which can reduce the size of the tree and avoid dangling tail text if you know -that whitespace-only content is not meaningful for your data. An example:: +that whitespace-only content is not meaningful for your data. An example: + +.. sourcecode:: pycon >>> root = etree.XML("<root> <a/> <b> </b> </root>", parser) @@ -645,7 +719,9 @@ Note that the whitespace content inside the ``<b>`` tag was not removed, as content at leaf elements tends to be data content (even if blank). You can -easily remove it in an additional step by traversing the tree:: +easily remove it in an additional step by traversing the tree: + +.. sourcecode:: pycon >>> for element in root.iter("*"): ... if element.text is not None and not element.text.strip(): @@ -664,7 +740,9 @@ through file-like objects, where it calls the ``read()`` method repeatedly. This is best used where the data arrives from a source like ``urllib`` or any other file-like object that can provide data on request. Note that the parser -will block and wait until data becomes available in this case:: +will block and wait until data becomes available in this case: + +.. sourcecode:: pycon >>> class DataSource: ... data = iter(["<roo", "t><", "a/", "><", "/root>"]) @@ -680,7 +758,9 @@ <root><a/></root> The second way is through a feed parser interface, given by the ``feed(data)`` -and ``close()`` methods:: +and ``close()`` methods: + +.. 
sourcecode:: pycon >>> parser = etree.XMLParser() @@ -703,7 +783,9 @@ After calling the ``close()`` method (or when an exception was raised by the parser), you can reuse the parser by calling its ``feed()`` -method again:: +method again: + +.. sourcecode:: pycon >>> parser.feed("<root/>") >>> root = parser.close() @@ -722,7 +804,9 @@ at all, and instead calls feedback methods on a target object in a SAX-like fashion. -Here is a simple ``iterparse()`` example:: +Here is a simple ``iterparse()`` example: + +.. sourcecode:: pycon >>> some_file_like = StringIO("<root><a>data</a></root>") @@ -732,7 +816,9 @@ end, root, None By default, ``iterparse()`` only generates events when it is done parsing an -element, but you can control this through the ``events`` keyword argument:: +element, but you can control this through the ``events`` keyword argument: + +.. sourcecode:: pycon >>> some_file_like = StringIO("<root><a>data</a></root>") @@ -752,7 +838,9 @@ If memory is a real bottleneck, or if building the tree is not desired at all, the target parser interface of ``lxml.etree`` can be used. It creates SAX-like events by calling the methods of a target object. By implementing -some or all of these methods, you can control which events are generated:: +some or all of these methods, you can control which events are generated: + +.. sourcecode:: pycon >>> class ParserTarget: ... events = [] @@ -776,7 +864,9 @@ ========== The ElementTree API avoids `namespace prefixes`_ wherever possible and deploys -the real namespaces instead:: +the real namespaces instead: + +.. sourcecode:: pycon >>> xhtml = etree.Element("{http://www.w3.org/1999/xhtml}html") >>> body = etree.SubElement(xhtml, "{http://www.w3.org/1999/xhtml}body") @@ -794,7 +884,9 @@ names. And retyping or copying a string over and over again is error prone. It is therefore common practice to store a namespace URI in a global variable. 
To adapt the namespace prefixes for serialisation, you can also pass a mapping -to the Element factory, e.g. to define the default namespace:: +to the Element factory, e.g. to define the default namespace: + +.. sourcecode:: pycon >>> XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml" >>> XHTML = "{%s}" % XHTML_NAMESPACE @@ -810,7 +902,9 @@ <body>Hello World</body> </html> -Namespaces on attributes work alike:: +Namespaces on attributes work alike: + +.. sourcecode:: pycon >>> body.set(XHTML + "bgcolor", "#CCFFAA") @@ -824,7 +918,9 @@ >>> body.get(XHTML + "bgcolor") '#CCFFAA' -You can also use XPath in this way:: +You can also use XPath in this way: + +.. sourcecode:: pycon >>> find_xhtml_body = etree.ETXPath( # lxml only ! ... "//{%s}body" % XHTML_NAMESPACE) @@ -838,7 +934,9 @@ ============= The ``E-factory`` provides a simple and compact syntax for generating XML and -HTML:: +HTML: + +.. sourcecode:: pycon >>> from lxml.builder import E @@ -877,7 +975,9 @@ </html> The Element creation based on attribute access makes it easy to build up a -simple vocabulary for an XML language:: +simple vocabulary for an XML language: + +.. sourcecode:: pycon >>> from lxml.builder import ElementMaker # lxml only ! @@ -946,25 +1046,33 @@ * ``findtext()`` returns the ``.text`` content of the first match -Here are some examples:: +Here are some examples: + +.. sourcecode:: pycon >>> root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>") -Find a child of an Element:: +Find a child of an Element: + +.. sourcecode:: pycon >>> print root.find("b") None >>> print root.find("a").tag a -Find an Element anywhere in the tree:: +Find an Element anywhere in the tree: + +.. sourcecode:: pycon >>> print root.find(".//b").tag b >>> [ b.tag for b in root.iterfind(".//b") ] ['b', 'b'] -Find Elements with a certain attribute:: +Find Elements with a certain attribute: + +.. 
sourcecode:: pycon >>> print root.findall(".//a[@x]")[0].tag a Modified: lxml/trunk/doc/validation.txt ============================================================================== --- lxml/trunk/doc/validation.txt (original) +++ lxml/trunk/doc/validation.txt Mon Mar 3 19:43:14 2008 @@ -25,7 +25,9 @@ 4 XMLSchema 5 Schematron -The usual setup procedure:: +The usual setup procedure: + +.. sourcecode:: pycon >>> from lxml import etree >>> from StringIO import StringIO @@ -37,7 +39,9 @@ The parser in lxml can do on-the-fly validation of a document against a DTD or an XML schema. The DTD is retrieved automatically based on the DOCTYPE of the parsed document. All you have to do is use a -parser that has DTD validation enabled:: +parser that has DTD validation enabled: + +.. sourcecode:: pycon >>> parser = etree.XMLParser(dtd_validation=True) @@ -56,7 +60,9 @@ performed unless explicitly requested. XML schema is supported in a similar way, but requires an explicit -schema to be provided:: +schema to be provided: + +.. sourcecode:: pycon >>> schema_root = etree.XML('''\ ... <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> @@ -69,7 +75,9 @@ >>> root = etree.fromstring("<a>5</a>", parser) If the validation fails (be it for a DTD or an XML schema), the parser -will raise an exception:: +will raise an exception: + +.. sourcecode:: pycon >>> root = etree.fromstring("<a>no int</a>", parser) Traceback (most recent call last): @@ -90,12 +98,16 @@ referenced by the document itself, you can use the ``DTD`` class. To use the ``DTD`` class, you must first pass a filename or file-like object -into the constructor to parse a DTD:: +into the constructor to parse a DTD: + +.. sourcecode:: pycon >>> f = StringIO("<!ELEMENT b EMPTY>") >>> dtd = etree.DTD(f) -Now you can use it to validate documents:: +Now you can use it to validate documents: + +.. 
sourcecode:: pycon >>> root = etree.XML("<b/>") >>> print dtd.validate(root) @@ -105,7 +117,9 @@ >>> print dtd.validate(root) False -The reason for the validation failure can be found in the error log:: +The reason for the validation failure can be found in the error log: + +.. sourcecode:: pycon >>> print dtd.error_log.filter_from_errors()[0] <string>:1:0:ERROR:VALID:DTD_NOT_EMPTY: Element b was declared EMPTY this one has content @@ -113,7 +127,9 @@ As an alternative to parsing from a file, you can use the ``external_id`` keyword argument to parse from a catalog. The following example reads the DocBook DTD in version 4.2, if available -in the system catalog:: +in the system catalog: + +.. sourcecode:: python dtd = etree.DTD(external_id = "-//OASIS//DTD DocBook XML V4.2//EN") @@ -122,7 +138,9 @@ ------- The ``RelaxNG`` class takes an ElementTree object to construct a Relax NG -validator:: +validator: + +.. sourcecode:: pycon >>> f = StringIO('''\ ... <element name="a" xmlns="http://relaxng.org/ns/structure/1.0"> @@ -142,7 +160,9 @@ You can then validate some ElementTree document against the schema. You'll get back True if the document is valid against the Relax NG schema, and False if -not:: +not: + +.. sourcecode:: pycon >>> valid = StringIO('<a><b></b></a>') >>> doc = etree.parse(valid) @@ -155,7 +175,9 @@ False Calling the schema object has the same effect as calling its validate -method. This is sometimes used in conditional statements:: +method. This is sometimes used in conditional statements: + +.. sourcecode:: pycon >>> invalid = StringIO('<a><c></c></a>') >>> doc2 = etree.parse(invalid) @@ -164,21 +186,25 @@ invalid! If you prefer getting an exception when validating, you can use the -``assert_`` or ``assertValid`` methods:: +``assert_`` or ``assertValid`` methods: + +.. sourcecode:: pycon >>> relaxng.assertValid(doc2) Traceback (most recent call last): - [...] + ... 
DocumentInvalid: Did not expect element c there, line 1 >>> relaxng.assert_(doc2) Traceback (most recent call last): - [...] + ... AssertionError: Did not expect element c there, line 1 If you want to find out why the validation failed in the second case, you can look up the error log of the validation process and check it for relevant -messages:: +messages: + +.. sourcecode:: pycon >>> log = relaxng.error_log >>> print log.last_error @@ -186,7 +212,9 @@ You can see that the error (ERROR) happened during RelaxNG validation (RELAXNGV). The message then tells you what went wrong. You can also -look at the error domain and its type directly:: +look at the error domain and its type directly: + +.. sourcecode:: pycon >>> error = log.last_error >>> print error.domain_name @@ -198,7 +226,9 @@ contain log entries that appeared during the validation. Similar to XSLT, there's also a less efficient but easier shortcut method to -do one-shot RelaxNG validation:: +do one-shot RelaxNG validation: + +.. sourcecode:: pycon >>> doc.relaxng(relaxng_doc) True @@ -218,7 +248,9 @@ lxml.etree also has XML Schema (XSD) support, using the class lxml.etree.XMLSchema. The API is very similar to the Relax NG and DTD -classes. Pass an ElementTree object to construct a XMLSchema validator:: +classes. Pass an ElementTree object to construct a XMLSchema validator: + +.. sourcecode:: pycon >>> f = StringIO('''\ ... <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> @@ -235,7 +267,9 @@ You can then validate some ElementTree document with this. Like with RelaxNG, you'll get back true if the document is valid against the XML schema, and -false if not:: +false if not: + +.. sourcecode:: pycon >>> valid = StringIO('<a><b></b></a>') >>> doc = etree.parse(valid) @@ -248,7 +282,9 @@ False Calling the schema object has the same effect as calling its validate method. -This is sometimes used in conditional statements:: +This is sometimes used in conditional statements: + +.. 
sourcecode:: pycon >>> invalid = StringIO('<a><c></c></a>') >>> doc2 = etree.parse(invalid) @@ -257,19 +293,23 @@ invalid! If you prefer getting an exception when validating, you can use the -``assert_`` or ``assertValid`` methods:: +``assert_`` or ``assertValid`` methods: + +.. sourcecode:: pycon >>> xmlschema.assertValid(doc2) Traceback (most recent call last): - [...] + ... DocumentInvalid: Element 'c': This element is not expected. Expected is ( b )., line 1 >>> xmlschema.assert_(doc2) Traceback (most recent call last): - [...] + ... AssertionError: Element 'c': This element is not expected. Expected is ( b )., line 1 -Error reporting works as for the RelaxNG class:: +Error reporting works as for the RelaxNG class: + +.. sourcecode:: pycon >>> log = xmlschema.error_log >>> error = log.last_error @@ -285,7 +325,9 @@ <string>:1:ERROR::SCHEMAV_ELEMENT_CONTENT: Element 'c': This element is not expected. Expected is ( b ). Similar to XSLT and RelaxNG, there's also a less efficient but easier shortcut -method to do XML Schema validation:: +method to do XML Schema validation: + +.. sourcecode:: pycon >>> doc.xmlschema(xmlschema_doc) True @@ -299,7 +341,9 @@ Since version 2.0, lxml.etree features Schematron_ support, using the class lxml.etree.Schematron. It requires at least libxml2 2.6.21 to work. The API is the same as for the other validators. Pass an -ElementTree object to construct a Schematron validator:: +ElementTree object to construct a Schematron validator: + +.. sourcecode:: pycon >>> f = StringIO('''\ ... <schema xmlns="http://www.ascc.net/xml/schematron" > @@ -316,7 +360,9 @@ You can then validate some ElementTree document with this. Like with RelaxNG, you'll get back true if the document is valid against the schema, and false if -not:: +not: + +.. sourcecode:: pycon >>> valid = StringIO('''\ ... <Total> @@ -336,7 +382,9 @@ False Calling the schema object has the same effect as calling its validate method. 
-This is sometimes used in conditional statements:: +This is sometimes used in conditional statements: + +.. sourcecode:: pycon >>> is_valid = etree.Schematron(sct_doc) Modified: lxml/trunk/doc/xpathxslt.txt ============================================================================== --- lxml/trunk/doc/xpathxslt.txt (original) +++ lxml/trunk/doc/xpathxslt.txt Mon Mar 3 19:43:14 2008 @@ -23,7 +23,9 @@ 2.5 Profiling -The usual setup procedure:: +The usual setup procedure: + +.. sourcecode:: pycon >>> from lxml import etree >>> from StringIO import StringIO @@ -55,7 +57,9 @@ ---------------------- For ElementTree, the xpath method performs a global XPath query against the -document (if absolute) or against the root node (if relative):: +document (if absolute) or against the root node (if relative): + +.. sourcecode:: pycon >>> f = StringIO('<foo><bar></bar></foo>') >>> tree = etree.parse(f) @@ -71,7 +75,9 @@ 'bar' When ``xpath()`` is used on an Element, the XPath expression is evaluated -against the element (if relative) or against the root tree (if absolute):: +against the element (if relative) or against the root tree (if absolute): + +.. sourcecode:: pycon >>> root = tree.getroot() >>> r = root.xpath('bar') @@ -88,7 +94,9 @@ >>> r[0].tag 'bar' -The ``xpath()`` method has support for XPath variables:: +The ``xpath()`` method has support for XPath variables: + +.. sourcecode:: pycon >>> expr = "//*[local-name() = $name]" @@ -103,7 +111,9 @@ Optionally, you can provide a ``namespaces`` keyword argument, which should be a dictionary mapping the namespace prefixes used in the XPath expression to -namespace URIs:: +namespace URIs: + +.. sourcecode:: pycon >>> f = StringIO('''\ ... 
<a:foo xmlns:a="http://codespeak.net/ns/test1" @@ -169,7 +179,9 @@ ---------------------------- ElementTree objects have a method ``getpath(element)``, which returns a -structural, absolute XPath expression to find that element:: +structural, absolute XPath expression to find that element: + +.. sourcecode:: pycon >>> a = etree.Element("a") >>> b = etree.SubElement(a, "b") @@ -187,7 +199,9 @@ The ``XPath`` class ------------------- -The ``XPath`` class compiles an XPath expression into a callable function:: +The ``XPath`` class compiles an XPath expression into a callable function: + +.. sourcecode:: pycon >>> root = etree.XML("<root><a><b/></a><b/></root>") @@ -200,7 +214,9 @@ for repeated evaluation of the same XPath expression. Just like the ``xpath()`` method, the ``XPath`` class supports XPath -variables:: +variables: + +.. sourcecode:: pycon >>> count_elements = etree.XPath("count(//*[local-name() = $name])") @@ -212,7 +228,9 @@ This supports very efficient evaluation of modified versions of an XPath expression, as compilation is still only required once. -Prefix-to-namespace mappings can be passed as second parameter:: +Prefix-to-namespace mappings can be passed as second parameter: + +.. sourcecode:: pycon >>> root = etree.XML("<root xmlns='NS'><a><b/></a><b/></root>") @@ -220,7 +238,9 @@ >>> print find(root)[0].tag {NS}b -By default, ``XPath`` supports regular expressions in the EXSLT_ namespace:: +By default, ``XPath`` supports regular expressions in the EXSLT_ namespace: + +.. sourcecode:: pycon >>> regexpNS = "http://exslt.org/regular-expressions" >>> find = etree.XPath("//*[re:test(., '^abc$', 'i')]", @@ -242,7 +262,9 @@ lxml.etree provides two other efficient XPath evaluators that work on ElementTrees or Elements respectively: ``XPathDocumentEvaluator`` and ``XPathElementEvaluator``. They are automatically selected if you use the -XPathEvaluator helper for instantiation:: +XPathEvaluator helper for instantiation: + +.. 
sourcecode:: pycon >>> root = etree.XML("<root><a><b/></a><b/></root>") >>> xpatheval = etree.XPathEvaluator(root) @@ -274,7 +296,9 @@ lxml.etree bridges this gap through the class ``ETXPath``, which accepts XPath expressions with namespaces in Clark notation. It is identical to the ``XPath`` class, except for the namespace notation. Normally, you would -write:: +write: + +.. sourcecode:: pycon >>> root = etree.XML("<root xmlns='ns'><a><b/></a><b/></root>") @@ -282,7 +306,9 @@ >>> print find(root)[0].tag {ns}b -``ETXPath`` allows you to change this to:: +``ETXPath`` allows you to change this to: + +.. sourcecode:: pycon >>> find = etree.ETXPath("//{ns}b") >>> print find(root)[0].tag @@ -293,7 +319,9 @@ -------------- lxml.etree raises exceptions when errors occur while parsing or evaluating an -XPath expression:: +XPath expression: + +.. sourcecode:: pycon >>> find = etree.XPath("\\") Traceback (most recent call last): @@ -301,14 +329,18 @@ XPathSyntaxError: Invalid expression lxml will also try to give you a hint what went wrong, so if you pass a more -complex expression, you may get a somewhat more specific error:: +complex expression, you may get a somewhat more specific error: + +.. sourcecode:: pycon >>> find = etree.XPath("//*[1.1.1]") Traceback (most recent call last): ... XPathSyntaxError: Invalid predicate -During evaluation, lxml will emit an XPathEvalError on errors:: +During evaluation, lxml will emit an XPathEvalError on errors: + +.. sourcecode:: pycon >>> find = etree.XPath("//ns:a") >>> find(root) @@ -318,7 +350,9 @@ This works for the ``XPath`` class, however, the other evaluators (including the ``xpath()`` method) are one-shot operations that do parsing and evaluation -in one step. They therefore raise evaluation exceptions in all cases:: +in one step. They therefore raise evaluation exceptions in all cases: + +.. 
sourcecode:: pycon >>> root = etree.Element("test") >>> find = root.xpath("//*[1.1.1]") @@ -345,7 +379,9 @@ ==== lxml.etree introduces a new class, lxml.etree.XSLT. The class can be -given an ElementTree object to construct an XSLT transformer:: +given an ElementTree object to construct an XSLT transformer: + +.. sourcecode:: pycon >>> f = StringIO('''\ ... <xsl:stylesheet version="1.0" @@ -358,7 +394,9 @@ >>> transform = etree.XSLT(xslt_doc) You can then run the transformation on an ElementTree document by simply -calling it, and this results in another ElementTree object:: +calling it, and this results in another ElementTree object: + +.. sourcecode:: pycon >>> f = StringIO('<a><b>Text</b></a>') >>> doc = etree.parse(f) @@ -379,7 +417,9 @@ ------------------- The result of an XSL transformation can be accessed like a normal ElementTree -document:: +document: + +.. sourcecode:: pycon >>> f = StringIO('<a><b>Text</b></a>') >>> doc = etree.parse(f) @@ -389,7 +429,9 @@ 'Text' but, as opposed to normal ElementTree objects, can also be turned into an (XML -or text) string by applying the str() function:: +or text) string by applying the str() function: + +.. sourcecode:: pycon >>> str(result) '<?xml version="1.0"?>\n<foo>Text</foo>\n' @@ -398,13 +440,17 @@ ``xsl:output`` element in the stylesheet. If you want a Python unicode string instead, you should set this encoding to ``UTF-8`` (unless the `ASCII` default is sufficient). This allows you to call the builtin ``unicode()`` function on -the result:: +the result: + +.. sourcecode:: pycon >>> unicode(result) u'<?xml version="1.0"?>\n<foo>Text</foo>\n' You can use other encodings at the cost of multiple recoding. Encodings that -are not supported by Python will result in an error:: +are not supported by Python will result in an error: + +.. sourcecode:: pycon >>> xslt_tree = etree.XML('''\ ... 
<xsl:stylesheet version="1.0" @@ -419,7 +465,7 @@ >>> result = transform(doc) >>> unicode(result) Traceback (most recent call last): - [...] + ... LookupError: unknown encoding: UCS4 @@ -427,7 +473,9 @@ --------------------- It is possible to pass parameters, in the form of XPath expressions, to the -XSLT template:: +XSLT template: + +.. sourcecode:: pycon >>> xslt_tree = etree.XML('''\ ... <xsl:stylesheet version="1.0" @@ -441,13 +489,17 @@ >>> doc = etree.parse(f) The parameters are passed as keyword parameters to the transform call. First -let's try passing in a simple string expression:: +let's try passing in a simple string expression: + +.. sourcecode:: pycon >>> result = transform(doc, a="'A'") >>> str(result) '<?xml version="1.0"?>\n<foo>A</foo>\n' -Let's try a non-string XPath expression now:: +Let's try a non-string XPath expression now: + +.. sourcecode:: pycon >>> result = transform(doc, a="/a/b/text()") >>> str(result) @@ -459,7 +511,9 @@ Just like `custom extension functions`_, lxml supports custom extension *elements* in XSLT. This means, you can write XSLT code -like this:: +like this: + +.. sourcecode:: xml <xsl:template match="*"> <my:python-extension> @@ -467,7 +521,9 @@ </my:python-extension> </xsl:template> -And then you can implement the element in Python like this:: +And then you can implement the element in Python like this: + +.. sourcecode:: pycon >>> class MyExtElement(etree.XSLTExtension): ... def execute(self, context, self_node, input_node, output_parent): @@ -496,7 +552,9 @@ In XSLT, extension elements can be used like any other XSLT element, except that they must be declared as extensions using the standard -XSLT ``extension-element-prefixes`` option:: +XSLT ``extension-element-prefixes`` option: + +.. sourcecode:: pycon >>> xslt_ext_tree = etree.XML(''' ... <xsl:stylesheet version="1.0" @@ -512,7 +570,9 @@ ... 
</xsl:stylesheet>''') To register the extension, add its name and namespace to the extension -mapping of the XSLT object:: +mapping of the XSLT object: + +.. sourcecode:: pycon >>> my_extension = MyExtElement() >>> extensions = { ('testns', 'ext') : my_extension } @@ -520,7 +580,9 @@ Note how we pass an instance here, not the class of the extension. Now we can run the transformation and see how our extension is -called:: +called: + +.. sourcecode:: pycon >>> root = etree.XML('<dummy/>') >>> result = transform(root) @@ -533,7 +595,9 @@ the input document and the stylesheet, and you can even call back into the XSLT processor to process templates. Here is an example that passes an Element into the ``.apply_templates()`` method of the -``XSLTExtension`` instance:: +``XSLTExtension`` instance: + +.. sourcecode:: pycon >>> class MyExtElement(etree.XSLTExtension): ... def execute(self, context, self_node, input_node, output_parent): @@ -562,7 +626,9 @@ What you can do, however, is to deepcopy them to make them normal Elements, and then modify them using the normal etree API. So this -will work:: +will work: + +.. sourcecode:: pycon >>> from copy import deepcopy >>> class MyExtElement(etree.XSLTExtension): @@ -587,13 +653,17 @@ There's also a convenience method on ElementTree objects for doing XSL transformations. This is less efficient if you want to apply the same XSL transformation to multiple documents, but is shorter to write for one-shot -operations, as you do not have to instantiate a stylesheet yourself:: +operations, as you do not have to instantiate a stylesheet yourself: + +.. sourcecode:: pycon >>> result = doc.xslt(xslt_tree, a="'A'") >>> str(result) '<?xml version="1.0"?>\n<foo>A</foo>\n' -This is a shortcut for the following code:: +This is a shortcut for the following code: + +.. 
sourcecode:: pycon >>> transform = etree.XSLT(xslt_tree) >>> result = transform(doc, a="'A'") @@ -635,13 +705,17 @@ --------- If you want to know how your stylesheet performed, pass the ``profile_run`` -keyword to the transform:: +keyword to the transform: + +.. sourcecode:: pycon >>> result = transform(doc, a="/a/b/text()", profile_run=True) >>> profile = result.xslt_profile The value of the ``xslt_profile`` property is an ElementTree with profiling -data about each template, similar to the following:: +data about each template, similar to the following: + +.. sourcecode:: xml <profile> <template rank="1" match="/" name="" mode="" calls="1" time="1" average="1"/> @@ -649,6 +723,8 @@ Note that this is a read-only document. You must not move any of its elements to other documents. Please deep-copy the document if you need to modify it. -If you want to free it from memory, just do:: +If you want to free it from memory, just do: + +.. sourcecode:: pycon >>> del result.xslt_profile From scoder at codespeak.net Mon Mar 3 19:43:23 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:43:23 +0100 (CET) Subject: [Lxml-checkins] r52116 - in lxml/trunk: . 
doc/html Message-ID: <20080303184323.473F3169ED2@codespeak.net> Author: scoder Date: Mon Mar 3 19:43:21 2008 New Revision: 52116 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/html/style.css Log: r3707 at delle: sbehnel | 2008-03-03 17:58:04 +0100 cleanup Modified: lxml/trunk/doc/html/style.css ============================================================================== --- lxml/trunk/doc/html/style.css (original) +++ lxml/trunk/doc/html/style.css Mon Mar 3 19:43:21 2008 @@ -193,7 +193,6 @@ dt { line-height: 1.5em; margin-left: 1em; - content: "\00BB" " "; } dt:before { @@ -315,4 +314,4 @@ .syntax .vc { color: #bb60d5 } /* Name.Variable.Class */ .syntax .vg { color: #bb60d5 } /* Name.Variable.Global */ .syntax .vi { color: #bb60d5 } /* Name.Variable.Instance */ -.syntax .il { color: #40a070 } /* Literal.Number.Integer.Long */ \ No newline at end of file +.syntax .il { color: #40a070 } /* Literal.Number.Integer.Long */ From scoder at codespeak.net Mon Mar 3 19:43:29 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 3 Mar 2008 19:43:29 +0100 (CET) Subject: [Lxml-checkins] r52117 - in lxml/trunk: . src/lxml Message-ID: <20080303184329.A07C7169ECF@codespeak.net> Author: scoder Date: Mon Mar 3 19:43:29 2008 New Revision: 52117 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/xslt.pxd lxml/trunk/src/lxml/xslt.pxi Log: r3708 at delle: sbehnel | 2008-03-03 17:59:44 +0100 'options' property on XSLTAccessControl class Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Mon Mar 3 19:43:29 2008 @@ -8,6 +8,9 @@ Features added -------------- +* ``XSLTAccessControl`` instances have a property ``options`` that + returns a dict of access configuration options. + * Constant instances ``DENY_ALL`` and ``DENY_WRITE`` on ``XSLTAccessControl`` class. 
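The ``options`` property recorded in this changelog entry can be checked interactively. The sketch below assumes a current lxml release on Python 3, where ``XSLTAccessControl`` still has the ``options`` property and the ``DENY_ALL`` / ``DENY_WRITE`` constants. The keyword names it uses are the same as the dict keys in the ``xslt.pxi`` change. It is an illustration, not part of the commit.

```python
from lxml import etree

# Custom policy: forbid network access and keep the default (allow) for
# file access and directory creation.
policy = etree.XSLTAccessControl(read_network=False, write_network=False)

# ``options`` maps each preference to True (allowed), False (forbidden),
# or None (some other security check function is installed).
print(policy.options)

# The predefined constant policies can be inspected the same way.
print(etree.XSLTAccessControl.DENY_WRITE.options)
print(etree.XSLTAccessControl.DENY_ALL.options)
```

To apply such a policy to a stylesheet, pass it as ``etree.XSLT(xslt_doc, access_control=policy)``.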
Modified: lxml/trunk/src/lxml/xslt.pxd ============================================================================== --- lxml/trunk/src/lxml/xslt.pxd (original) +++ lxml/trunk/src/lxml/xslt.pxd Mon Mar 3 19:43:29 2008 @@ -123,6 +123,9 @@ cdef int xsltSetSecurityPrefs(xsltSecurityPrefs* sec, xsltSecurityOption option, xsltSecurityCheck func) nogil + cdef xsltSecurityCheck xsltGetSecurityPrefs( + xsltSecurityPrefs* sec, + xsltSecurityOption option) nogil cdef int xsltSetCtxtSecurityPrefs(xsltSecurityPrefs* sec, xsltTransformContext* ctxt) nogil cdef xmlDoc* xsltGetProfileInformation(xsltTransformContext* ctxt) nogil Modified: lxml/trunk/src/lxml/xslt.pxi ============================================================================== --- lxml/trunk/src/lxml/xslt.pxi (original) +++ lxml/trunk/src/lxml/xslt.pxi Mon Mar 3 19:43:29 2008 @@ -222,6 +222,34 @@ cdef void _register_in_context(self, xslt.xsltTransformContext* ctxt): xslt.xsltSetCtxtSecurityPrefs(self._prefs, ctxt) + property options: + "The access control configuration as a map of options." 
+ def __get__(self): + return { + 'read_file': self._optval(xslt.XSLT_SECPREF_READ_FILE), + 'write_file': self._optval(xslt.XSLT_SECPREF_WRITE_FILE), + 'create_dir': self._optval(xslt.XSLT_SECPREF_CREATE_DIRECTORY), + 'read_network': self._optval(xslt.XSLT_SECPREF_READ_NETWORK), + 'write_network': self._optval(xslt.XSLT_SECPREF_WRITE_NETWORK), + } + + cdef _optval(self, xslt.xsltSecurityOption option): + cdef xslt.xsltSecurityCheck function + function = xslt.xsltGetSecurityPrefs(self._prefs, option) + if function is <xslt.xsltSecurityCheck>xslt.xsltSecurityAllow: + return True + elif function is <xslt.xsltSecurityCheck>xslt.xsltSecurityForbid: + return False + else: + return None + + def __repr__(self): + items = self.options.items() + items.sort() + return "%s(%s)" % ( + python._fqtypename(self).split('.')[-1], + ', '.join(["%s=%r" % item for item in items])) + ################################################################################ # XSLT From scoder at codespeak.net Tue Mar 4 19:31:08 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 4 Mar 2008 19:31:08 +0100 (CET) Subject: [Lxml-checkins] r52168 - in lxml/trunk: . doc Message-ID: <20080304183108.0D073169EF9@codespeak.net> Author: scoder Date: Tue Mar 4 19:31:08 2008 New Revision: 52168 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/tutorial.txt Log: r3724 at delle: sbehnel | 2008-03-03 20:15:13 +0100 doc fix Modified: lxml/trunk/doc/tutorial.txt ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/trunk/doc/tutorial.txt Tue Mar 4 19:31:08 2008 @@ -397,8 +397,8 @@ True While this works for the results of the ``text()`` function, lxml will -not to tell you the origin of a string value that was constructed by -the XPath functions ``string()`` or ``concat()``: +not tell you the origin of a string value that was constructed by the +XPath functions ``string()`` or ``concat()``: .. 
sourcecode:: pycon From scoder at codespeak.net Tue Mar 4 19:31:16 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 4 Mar 2008 19:31:16 +0100 (CET) Subject: [Lxml-checkins] r52169 - in lxml/trunk: . doc Message-ID: <20080304183116.6CE38169F85@codespeak.net> Author: scoder Date: Tue Mar 4 19:31:15 2008 New Revision: 52169 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/tutorial.txt Log: r3725 at delle: sbehnel | 2008-03-04 17:52:27 +0100 tutorial update Modified: lxml/trunk/doc/tutorial.txt ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/trunk/doc/tutorial.txt Tue Mar 4 19:31:15 2008 @@ -520,7 +520,15 @@ </root> Note the newline that is appended at the end when pretty printing the -output. +output: + +.. sourcecode:: pycon + + >>> etree.tostring(root, pretty_print=True) + '<root>\n <a>\n <b/>\n </a>\n</root>\n' + + >>> etree.tostring(root) + '<root><a><b/></a></root>' Since lxml 2.0 (and ElementTree 1.3), the serialisation functions can do more than XML serialisation. You can serialise to HTML or extract @@ -528,7 +536,8 @@ .. sourcecode:: pycon - >>> root = etree.XML('<html><head/><body><p>Hello<br/>World</p></body></html>') + >>> root = etree.XML( + ... '<html><head/><body><p>Hello<br/>World</p></body></html>') >>> print etree.tostring(root) # default: method = 'xml' <html><head/><body><p>Hello<br/>World</p></body></html> @@ -548,13 +557,23 @@ >>> print etree.tostring(root, method='text') HelloWorld -For the plain text output, serialising to a Python unicode string +Note that the default encoding for plain text serialisation is UTF-8: + +.. sourcecode:: pycon + + >>> br = root.find('.//br') + >>> br.tail = u'W\xf6rld' + + >>> etree.tostring(root, method='text') + 'HelloW\xc3\xb6rld' + +Here, serialising to a Python unicode string instead of a byte string might become handy. Just pass the ``unicode`` type as encoding: .. 
sourcecode:: pycon >>> etree.tostring(root, encoding=unicode, method='text') - u'HelloWorld' + u'HelloW\xf6rld' The ElementTree class @@ -605,8 +624,8 @@ <a>eggs</a> </root> -Note that this has changed in lxml 1.3.4 to match the behaviour of the -upcoming lxml 2.0. Before, both would serialise without DTD content, which +Note that this has changed in lxml 1.3.4 to match the behaviour of +lxml 2.0. Before, both would serialise without DTD content, which made lxml loose DTD information in an input-output cycle. From scoder at codespeak.net Tue Mar 4 19:31:22 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 4 Mar 2008 19:31:22 +0100 (CET) Subject: [Lxml-checkins] r52170 - in lxml/trunk: . doc Message-ID: <20080304183122.32934169F61@codespeak.net> Author: scoder Date: Tue Mar 4 19:31:21 2008 New Revision: 52170 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/tutorial.txt Log: r3726 at delle: sbehnel | 2008-03-04 17:55:47 +0100 doc fix Modified: lxml/trunk/doc/tutorial.txt ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/trunk/doc/tutorial.txt Tue Mar 4 19:31:21 2008 @@ -625,8 +625,8 @@ </root> Note that this has changed in lxml 1.3.4 to match the behaviour of -lxml 2.0. Before, both would serialise without DTD content, which -made lxml loose DTD information in an input-output cycle. +lxml 2.0. Before, the examples were serialised without DTD content, +which made lxml loose DTD information in an input-output cycle. Parsing from strings and files From scoder at codespeak.net Wed Mar 5 22:17:57 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 5 Mar 2008 22:17:57 +0100 (CET) Subject: [Lxml-checkins] r52195 - in lxml/trunk: . 
benchmark doc src/lxml src/lxml/tests Message-ID: <20080305211757.B7253169EE7@codespeak.net> Author: scoder Date: Wed Mar 5 22:17:55 2008 New Revision: 52195 Modified: lxml/trunk/ (props changed) lxml/trunk/benchmark/bench_etree.py lxml/trunk/benchmark/benchbase.py lxml/trunk/doc/tutorial.txt lxml/trunk/src/lxml/serializer.pxi lxml/trunk/src/lxml/tests/test_etree.py lxml/trunk/src/lxml/tree.pxd Log: r3730 at delle: sbehnel | 2008-03-04 22:44:06 +0100 rewrite of 'text' serialiser: fix default encoding, faster .tail adding and 'unicode' encoding Modified: lxml/trunk/benchmark/bench_etree.py ============================================================================== --- lxml/trunk/benchmark/bench_etree.py (original) +++ lxml/trunk/benchmark/bench_etree.py Wed Mar 5 22:17:55 2008 @@ -34,6 +34,35 @@ for i in range(1000): child = root[pos] + @with_attributes(False) + @with_text(text=True) + @onlylib('lxe', 'ET') + def bench_tostring_text_ascii(self, root): + self.etree.tostring(root, method="text") + + @with_attributes(False) + @with_text(text=True, utext=True) + @onlylib('lxe') + def bench_tostring_text_utf16(self, root): + self.etree.tostring(root, method="text", encoding='UTF-16') + + @with_attributes(False) + @with_text(text=True, utext=True) + @onlylib('lxe', 'ET') + @children + def bench_tostring_text_utf8_with_tail(self, children): + for child in children: + self.etree.tostring(child, method="text", + encoding='UTF-8', with_tail=True) + + @with_attributes(False) + @with_text(text=True, utext=True) + @onlylib('lxe') + @children + def bench_tostring_text_unicode(self, children): + for child in children: + self.etree.tostring(child, method="text", encoding=unicode) + @with_attributes(True, False) @with_text(text=True, utext=True) def bench_tostring_utf8(self, root): Modified: lxml/trunk/benchmark/benchbase.py ============================================================================== --- lxml/trunk/benchmark/benchbase.py (original) +++ 
lxml/trunk/benchmark/benchbase.py Wed Mar 5 22:17:55 2008 @@ -200,7 +200,7 @@ el.text = text for ch2 in atoz: for i in range(20 * TREE_FACTOR): - SubElement(el, "{cdefg}%s%05d" % (ch2, i)) + SubElement(el, "{cdefg}%s%05d" % (ch2, i)).tail = text t = current_time() - t return (root, t) @@ -216,7 +216,7 @@ el = SubElement(root, "{abc}"+ch1*5, attributes) el.text = text for ch2 in atoz: - SubElement(el, "{cdefg}%s%05d" % (ch2, i)) + SubElement(el, "{cdefg}%s%05d" % (ch2, i)).tail = text t = current_time() - t return (root, t) @@ -231,8 +231,9 @@ tag_no = count().next children = [ SubElement(c, "{cdefg}a%05d" % i, attributes) for i,c in enumerate(chain(children, children, children)) ] - for child in root: + for child in children: child.text = text + child.tail = text t = current_time() - t return (root, t) @@ -246,8 +247,8 @@ for ch1 in self.atoz: el = SubElement(root, "{abc}"+ch1*5, attributes) el.text = text - SubElement(el, "{cdefg}a00001", attributes) - SubElement(el, "{cdefg}z00000", attributes) + SubElement(el, "{cdefg}a00001", attributes).tail = text + SubElement(el, "{cdefg}z00000", attributes).tail = text t = current_time() - t return (root, t) Modified: lxml/trunk/doc/tutorial.txt ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/trunk/doc/tutorial.txt Wed Mar 5 22:17:55 2008 @@ -557,14 +557,20 @@ >>> print etree.tostring(root, method='text') HelloWorld -Note that the default encoding for plain text serialisation is UTF-8: +As for XML serialisation, the default encoding for plain text +serialisation is ASCII: .. sourcecode:: pycon >>> br = root.find('.//br') >>> br.tail = u'W\xf6rld' - >>> etree.tostring(root, method='text') + >>> etree.tostring(root, method='text') # doctest: +ELLIPSIS + Traceback (most recent call last): + ... + UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' ... 
+ + >>> etree.tostring(root, method='text', encoding="UTF-8") 'HelloW\xc3\xb6rld' Here, serialising to a Python unicode string instead of a byte string Modified: lxml/trunk/src/lxml/serializer.pxi ============================================================================== --- lxml/trunk/src/lxml/serializer.pxi (original) +++ lxml/trunk/src/lxml/serializer.pxi Wed Mar 5 22:17:55 2008 @@ -18,28 +18,46 @@ raise ValueError("unknown output method %r" % method) cdef _textToString(xmlNode* c_node, encoding, bint with_tail): + cdef bint needs_conversion cdef char* c_text + cdef xmlNode* c_text_node + cdef tree.xmlBuffer* c_buffer + + c_buffer = tree.xmlBufferCreate() + if c_buffer is NULL: + return python.PyErr_NoMemory() + with nogil: - c_text = tree.xmlNodeGetContent(c_node) - if c_text is NULL: - python.PyErr_NoMemory() - - text = c_text - tree.xmlFree(c_text) - - if with_tail and _hasTail(c_node): - tail = _collectText(c_node.next) - if tail: - text = text + tail + tree.xmlNodeBufGetContent(c_buffer, c_node) + if with_tail: + c_text_node = _textNodeOrSkip(c_node.next) + while c_text_node is not NULL: + tree.xmlBufferWriteChar(c_buffer, c_text_node.content) + c_text_node = _textNodeOrSkip(c_text_node.next) + c_text = tree.xmlBufferContent(c_buffer) - if encoding is None: - return text - encoding = encoding.upper() - if encoding == 'UTF-8' or encoding == 'ASCII': - return text + try: + needs_conversion = 0 + if encoding is not None: + encoding = encoding.upper() + if encoding != 'UTF-8': + if encoding == 'ASCII': + if isutf8(c_text): + # will raise a decode error below + needs_conversion = 1 + else: + needs_conversion = 1 + + if needs_conversion: + text = python.PyUnicode_DecodeUTF8( + c_text, tree.xmlBufferLength(c_buffer), 'strict') + text = python.PyUnicode_AsEncodedString(text, encoding, 'strict') + else: + text = c_text + finally: + tree.xmlBufferFree(c_buffer); + return text - text = python.PyUnicode_FromEncodedObject(text, 'utf-8', 'strict') - return 
python.PyUnicode_AsEncodedString(text, encoding, 'strict') cdef _tostring(_Element element, encoding, method, bint write_xml_declaration, bint write_complete_document, Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Wed Mar 5 22:17:55 2008 @@ -1849,6 +1849,27 @@ self.assertEquals(u'ABSøk på nettetCtail'.encode("UTF-16"), result) + def test_tostring_method_text_unicode(self): + tostring = self.etree.tostring + Element = self.etree.Element + SubElement = self.etree.SubElement + + a = Element('a') + a.text = u'Søk på nettetA' + a.tail = "tail" + b = SubElement(a, 'b') + b.text = "B" + b.tail = u'Søk på nettetB' + c = SubElement(a, 'c') + c.text = "C" + + self.assertRaises(UnicodeEncodeError, + tostring, a, method="text") + + self.assertEquals( + u'Søk på nettetABSøk på nettetBCtail'.encode('utf-8'), + tostring(a, encoding="UTF-8", method="text")) + def test_tounicode(self): tounicode = self.etree.tounicode Element = self.etree.Element Modified: lxml/trunk/src/lxml/tree.pxd ============================================================================== --- lxml/trunk/src/lxml/tree.pxd (original) +++ lxml/trunk/src/lxml/tree.pxd Wed Mar 5 22:17:55 2008 @@ -212,6 +212,7 @@ cdef xmlAttr* xmlHasProp(xmlNode* node, char* name) nogil cdef xmlAttr* xmlHasNsProp(xmlNode* node, char* name, char* nameSpace) nogil cdef char* xmlNodeGetContent(xmlNode* cur) nogil + cdef char* xmlNodeBufGetContent(xmlBuffer* buffer, xmlNode* cur) nogil cdef xmlNs* xmlSearchNs(xmlDoc* doc, xmlNode* node, char* prefix) nogil cdef xmlNs* xmlSearchNsByHref(xmlDoc* doc, xmlNode* node, char* href) nogil cdef int xmlIsBlankNode(xmlNode* node) nogil @@ -229,6 +230,7 @@ cdef int xmlReconciliateNs(xmlDoc* doc, xmlNode* tree) nogil cdef xmlNs* xmlNewReconciliedNs(xmlDoc* doc, xmlNode* tree, xmlNs* ns) nogil cdef xmlBuffer*
xmlBufferCreate() nogil + cdef void xmlBufferWriteChar(xmlBuffer* buf, char* string) nogil cdef void xmlBufferFree(xmlBuffer* buf) nogil cdef char* xmlBufferContent(xmlBuffer* buf) nogil cdef int xmlBufferLength(xmlBuffer* buf) nogil @@ -249,7 +251,6 @@ xmlNotationTable* table) nogil cdef extern from "libxml/xmlIO.h": - cdef void xmlBufferWriteQuotedString(xmlOutputBuffer* out, char* str) nogil cdef int xmlOutputBufferWriteString(xmlOutputBuffer* out, char* str) nogil cdef int xmlOutputBufferWrite(xmlOutputBuffer* out, int len, char* str) nogil From scoder at codespeak.net Wed Mar 5 22:18:01 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 5 Mar 2008 22:18:01 +0100 (CET) Subject: [Lxml-checkins] r52196 - lxml/trunk Message-ID: <20080305211801.7FD0A169F10@codespeak.net> Author: scoder Date: Wed Mar 5 22:18:00 2008 New Revision: 52196 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3731 at delle: sbehnel | 2008-03-04 22:45:35 +0100 changelog Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Mar 5 22:18:00 2008 @@ -24,9 +24,15 @@ Bugs fixed ---------- +* Default encoding for plain text serialisation was different from + that of XML serialisation (UTF-8 instead of ASCII). + Other changes ------------- +* The benchmark suite now uses tail text in the trees, which makes the + absolute numbers incomparable to previous results. + * Generating the HTML documentation now requires Pygments_, which is used to enable syntax highlighting for the doctest examples. From scoder at codespeak.net Wed Mar 5 22:18:05 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 5 Mar 2008 22:18:05 +0100 (CET) Subject: [Lxml-checkins] r52197 - in lxml/trunk: . 
src/lxml Message-ID: <20080305211805.D8406169F10@codespeak.net> Author: scoder Date: Wed Mar 5 22:18:05 2008 New Revision: 52197 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/serializer.pxi Log: r3732 at delle: sbehnel | 2008-03-04 23:26:02 +0100 merged internal _tounicode() function into _tostring() function Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Wed Mar 5 22:18:05 2008 @@ -2534,11 +2534,11 @@ :deprecated: use ``tostring(el, encoding=unicode)`` instead. """ if isinstance(element_or_tree, _Element): - return _tounicode(<_Element>element_or_tree, method, 0, pretty_print, - with_tail) + return _tostring(<_Element>element_or_tree, _unicode, method, + 0, 0, pretty_print, with_tail) elif isinstance(element_or_tree, _ElementTree): - return _tounicode((<_ElementTree>element_or_tree)._context_node, - method, 1, pretty_print, with_tail) + return _tostring((<_ElementTree>element_or_tree)._context_node, + _unicode, method, 0, 1, pretty_print, with_tail) else: raise TypeError("Type '%s' cannot be serialized." 
% type(element_or_tree)) Modified: lxml/trunk/src/lxml/serializer.pxi ============================================================================== --- lxml/trunk/src/lxml/serializer.pxi (original) +++ lxml/trunk/src/lxml/serializer.pxi Wed Mar 5 22:18:05 2008 @@ -38,7 +38,9 @@ try: needs_conversion = 0 - if encoding is not None: + if encoding is _unicode: + needs_conversion = 1 + elif encoding is not None: encoding = encoding.upper() if encoding != 'UTF-8': if encoding == 'ASCII': @@ -51,7 +53,9 @@ if needs_conversion: text = python.PyUnicode_DecodeUTF8( c_text, tree.xmlBufferLength(c_buffer), 'strict') - text = python.PyUnicode_AsEncodedString(text, encoding, 'strict') + if encoding is not _unicode: + text = python.PyUnicode_AsEncodedString( + text, encoding, 'strict') else: text = c_text finally: @@ -73,21 +77,18 @@ cdef int c_method if element is None: return None - if encoding is None: + c_method = _findOutputMethod(method) + if c_method == OUTPUT_METHOD_TEXT: + return _textToString(element._c_node, encoding, with_tail) + if encoding is None or encoding is _unicode: c_enc = NULL - elif encoding is _unicode: - return _tounicode(element, method, write_complete_document, - pretty_print, with_tail) else: encoding = _utf8(encoding) c_enc = _cstr(encoding) - c_method = _findOutputMethod(method) - if c_method == OUTPUT_METHOD_TEXT: - return _textToString(element._c_node, encoding, with_tail) # it is necessary to *and* find the encoding handler *and* use # encoding during output enchandler = tree.xmlFindCharEncodingHandler(c_enc) - if enchandler is NULL: + if enchandler is NULL and c_enc is not NULL: raise LookupError(python.PyString_FromFormat( "unknown encoding: '%s'", c_enc)) c_buffer = tree.xmlAllocOutputBuffer(enchandler) @@ -106,45 +107,15 @@ c_result_buffer = c_buffer.buffer try: - result = python.PyString_FromStringAndSize( - tree.xmlBufferContent(c_result_buffer), - tree.xmlBufferLength(c_result_buffer)) - finally: - tree.xmlOutputBufferClose(c_buffer) - 
return result - -cdef _tounicode(_Element element, method, bint write_complete_document, - bint pretty_print, bint with_tail): - """Serialize an element to the Python unicode representation of its XML - tree. - """ - cdef tree.xmlOutputBuffer* c_buffer - cdef tree.xmlBuffer* c_result_buffer - cdef int c_method - if element is None: - return None - c_method = _findOutputMethod(method) - if c_method == OUTPUT_METHOD_TEXT: - text = _textToString(element._c_node, None, with_tail) - return python.PyUnicode_FromEncodedObject(text, 'utf-8', 'strict') - c_buffer = tree.xmlAllocOutputBuffer(NULL) - if c_buffer is NULL: - return python.PyErr_NoMemory() - - with nogil: - _writeNodeToBuffer(c_buffer, element._c_node, NULL, c_method, 0, - write_complete_document, pretty_print, with_tail) - tree.xmlOutputBufferFlush(c_buffer) - if c_buffer.conv is not NULL: - c_result_buffer = c_buffer.conv + if encoding is _unicode: + result = python.PyUnicode_DecodeUTF8( + tree.xmlBufferContent(c_result_buffer), + tree.xmlBufferLength(c_result_buffer), + 'strict') else: - c_result_buffer = c_buffer.buffer - - try: - result = python.PyUnicode_DecodeUTF8( - tree.xmlBufferContent(c_result_buffer), - tree.xmlBufferLength(c_result_buffer), - 'strict') + result = python.PyString_FromStringAndSize( + tree.xmlBufferContent(c_result_buffer), + tree.xmlBufferLength(c_result_buffer)) finally: tree.xmlOutputBufferClose(c_buffer) return result From scoder at codespeak.net Wed Mar 5 22:18:09 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 5 Mar 2008 22:18:09 +0100 (CET) Subject: [Lxml-checkins] r52198 - in lxml/trunk: . 
src/lxml Message-ID: <20080305211809.C6339169F6D@codespeak.net> Author: scoder Date: Wed Mar 5 22:18:09 2008 New Revision: 52198 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/apihelpers.pxi Log: r3733 at delle: sbehnel | 2008-03-04 23:53:12 +0100 cleanup Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Wed Mar 5 22:18:09 2008 @@ -361,26 +361,26 @@ If there was no text to collect, return None """ cdef Py_ssize_t scount - cdef char* text + cdef char* c_text cdef xmlNode* c_node_cur # check for multiple text nodes scount = 0 - text = NULL + c_text = NULL c_node_cur = c_node = _textNodeOrSkip(c_node) while c_node_cur is not NULL: if c_node_cur.content[0] != c'\0': - text = c_node_cur.content + c_text = c_node_cur.content scount = scount + 1 c_node_cur = _textNodeOrSkip(c_node_cur.next) # handle two most common cases first - if text is NULL: + if c_text is NULL: if scount > 0: return '' else: return None if scount == 1: - return funicode(text) + return funicode(c_text) # the rest is not performance critical anymore result = '' From scoder at codespeak.net Wed Mar 5 22:18:14 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 5 Mar 2008 22:18:14 +0100 (CET) Subject: [Lxml-checkins] r52199 - in lxml/trunk: . 
src/lxml Message-ID: <20080305211814.8838E169EE7@codespeak.net> Author: scoder Date: Wed Mar 5 22:18:13 2008 New Revision: 52199 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/xsltext.pxi Log: r3734 at delle: sbehnel | 2008-03-05 00:02:01 +0100 memory leak in .apply_templates() of XSLT extensions Modified: lxml/trunk/src/lxml/xsltext.pxi ============================================================================== --- lxml/trunk/src/lxml/xsltext.pxi (original) +++ lxml/trunk/src/lxml/xsltext.pxi Wed Mar 5 22:18:13 2008 @@ -47,19 +47,22 @@ try: while c_node is not NULL: c_next = c_node.next - tree.xmlUnlinkNode(c_node) if c_node.type == tree.XML_TEXT_NODE: - python.PyList_Append(results, _collectText(c_node)) + python.PyList_Append( + results, funicode(c_node.content)) elif c_node.type == tree.XML_ELEMENT_NODE: proxy = _newReadOnlyProxy( context._extension_element_proxy, c_node) - proxy.free_after_use() python.PyList_Append(results, proxy) + # unlink node and make sure it will be freed later on + tree.xmlUnlinkNode(c_node) + proxy.free_after_use() else: raise TypeError("unsupported XSLT result type: %d" % c_node.type) c_node = c_next finally: + # free all intermediate nodes that will not be freed by proxies tree.xmlFreeNode(c_parent) return results From scoder at codespeak.net Wed Mar 5 22:18:18 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 5 Mar 2008 22:18:18 +0100 (CET) Subject: [Lxml-checkins] r52200 - in lxml/trunk: . 
src/lxml Message-ID: <20080305211818.20BBD169F6F@codespeak.net> Author: scoder Date: Wed Mar 5 22:18:17 2008 New Revision: 52200 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/xsltext.pxi Log: r3735 at delle: sbehnel | 2008-03-05 00:14:55 +0100 docstring Modified: lxml/trunk/src/lxml/xsltext.pxi ============================================================================== --- lxml/trunk/src/lxml/xsltext.pxi (original) +++ lxml/trunk/src/lxml/xsltext.pxi Wed Mar 5 22:18:17 2008 @@ -7,11 +7,11 @@ """execute(self, context, self_node, input_node, output_parent) Execute this extension element. - Subclasses may append elements to the `output_parent` element - here, or set its text content. To this end, the `input_node` - provides read-only access to the current node in the input - document, and the `self_node` points to the extension element - in the stylesheet. + Subclasses must override this method. They may append + elements to the `output_parent` element here, or set its text + content. To this end, the `input_node` provides read-only + access to the current node in the input document, and the + `self_node` points to the extension element in the stylesheet. """ pass From scoder at codespeak.net Wed Mar 5 22:18:21 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 5 Mar 2008 22:18:21 +0100 (CET) Subject: [Lxml-checkins] r52201 - in lxml/trunk: . 
src/lxml Message-ID: <20080305211821.D2E0D169F57@codespeak.net> Author: scoder Date: Wed Mar 5 22:18:21 2008 New Revision: 52201 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/xsltext.pxi Log: r3736 at delle: sbehnel | 2008-03-05 00:26:13 +0100 comment Modified: lxml/trunk/src/lxml/xsltext.pxi ============================================================================== --- lxml/trunk/src/lxml/xsltext.pxi (original) +++ lxml/trunk/src/lxml/xsltext.pxi Wed Mar 5 22:18:21 2008 @@ -42,7 +42,7 @@ context._xsltCtxt, c_context_node, NULL) context._xsltCtxt.insert = c_node - results = [] + results = [] # or maybe _collectAttributes(c_parent, 2) ? c_node = c_parent.children try: while c_node is not NULL: From scoder at codespeak.net Thu Mar 6 12:26:29 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 6 Mar 2008 12:26:29 +0100 (CET) Subject: [Lxml-checkins] r52211 - in lxml/trunk: . src/lxml/tests Message-ID: <20080306112629.01AAD169FA7@codespeak.net> Author: scoder Date: Thu Mar 6 12:26:29 2008 New Revision: 52211 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/tests/test_xslt.py Log: r3744 at delle: sbehnel | 2008-03-06 12:18:10 +0100 fix test case for older libexslt versions Modified: lxml/trunk/src/lxml/tests/test_xslt.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xslt.py (original) +++ lxml/trunk/src/lxml/tests/test_xslt.py Thu Mar 6 12:26:29 2008 @@ -176,8 +176,7 @@ xmlns:xsl="http://www.w3.org/1999/XSL/Transform" exclude-result-prefixes="str xsl"> <xsl:template match="text()"> - <xsl:value-of select="str:replace( - str:align(string(.), '---', 'center'), '-', '*')" /> + <xsl:value-of select="str:align(string(.), '***', 'center')" /> </xsl:template> <xsl:template match="*"> <xsl:copy> From scoder at codespeak.net Sun Mar 9 17:52:54 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 9 Mar 2008 17:52:54 +0100 (CET) Subject: 
[Lxml-checkins] r52344 - lxml/trunk Message-ID: <20080309165254.0725A168414@codespeak.net> Author: scoder Date: Sun Mar 9 17:52:52 2008 New Revision: 52344 Modified: lxml/trunk/ (props changed) lxml/trunk/TODO.txt Log: r3746 at delle: sbehnel | 2008-03-08 09:33:26 +0100 TODO cleanup Modified: lxml/trunk/TODO.txt ============================================================================== --- lxml/trunk/TODO.txt (original) +++ lxml/trunk/TODO.txt Sun Mar 9 17:52:52 2008 @@ -5,12 +5,6 @@ lxml ==== -Exposing libxml2 functionalities --------------------------------- - -* Test XML entities, also in an ElementTree context. - - In general ---------- @@ -24,6 +18,11 @@ * more testing on input/output of encoded filenames, including custom resolvers, relative XSLT imports, ... +* always use '<string>' as URL when tree was parsed from string? (can libxml2 + handle this?) + +* follow PEP 8 in API naming (avoidCamelCase in_favour_of_underscores) + QName ----- @@ -31,6 +30,12 @@ * expose prefix support? +Entities +-------- + +* clean support for entities (is the Entity element class enough?) + + Objectify --------- @@ -52,17 +57,8 @@ access check methods -lxml 2.0 -======== - -* always use '<string>' as URL when tree was parsed from string? (can libxml2 - handle this?) - -* clean up (and remove?) duplicated API for extension functions - -* follow PEP 8 in API naming (avoidCamelCase in_favour_of_underscores) - -* clean support for entities (is the Entity element class enough?) +Maybe +----- * rewrite iterparse() to accept a parser as argument instead of being one (or maybe not: iterparse() can't deal with all parser options From scoder at codespeak.net Sun Mar 9 17:52:59 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 9 Mar 2008 17:52:59 +0100 (CET) Subject: [Lxml-checkins] r52345 - in lxml/trunk: . 
src/lxml Message-ID: <20080309165259.548B416841E@codespeak.net> Author: scoder Date: Sun Mar 9 17:52:58 2008 New Revision: 52345 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx Log: r3747 at delle: sbehnel | 2008-03-08 19:26:49 +0100 cleanup Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Sun Mar 9 17:52:58 2008 @@ -332,15 +332,15 @@ cdef buildNewPrefix(self): ns = python.PyString_FromFormat("ns%d", self._ns_counter) if self._prefix_tail is not None: - ns = ns + self._prefix_tail - self._ns_counter = self._ns_counter + 1 + ns += self._prefix_tail + self._ns_counter += 1 if self._ns_counter < 0: # overflow! self._ns_counter = 0 if self._prefix_tail is None: self._prefix_tail = "A" else: - self._prefix_tail = self._prefix_tail + "A" + self._prefix_tail += "A" return ns cdef xmlNs* _findOrBuildNodeNs(self, xmlNode* c_node, From scoder at codespeak.net Sun Mar 9 17:53:09 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 9 Mar 2008 17:53:09 +0100 (CET) Subject: [Lxml-checkins] r52347 - in lxml/trunk: . 
src/lxml Message-ID: <20080309165309.4D114168430@codespeak.net> Author: scoder Date: Sun Mar 9 17:53:08 2008 New Revision: 52347 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx Log: r3749 at delle: sbehnel | 2008-03-09 12:12:23 +0100 reuse namespace prefixes instead of building them on each request Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Sun Mar 9 17:53:08 2008 @@ -330,7 +330,11 @@ return (version, encoding) cdef buildNewPrefix(self): - ns = python.PyString_FromFormat("ns%d", self._ns_counter) + if self._ns_counter < python.PyTuple_GET_SIZE(_PREFIX_CACHE): + ns = python.PyTuple_GET_ITEM(_PREFIX_CACHE, self._ns_counter) + python.Py_INCREF(ns) + else: + ns = python.PyString_FromFormat("ns%d", self._ns_counter) if self._prefix_tail is not None: ns += self._prefix_tail self._ns_counter += 1 @@ -364,17 +368,15 @@ dict_result = python.PyDict_GetItemString( _DEFAULT_NAMESPACE_PREFIXES, c_href) if dict_result is not NULL: - c_prefix = _cstr(<object>dict_result) - - if c_prefix is NULL or \ - tree.xmlSearchNs(self._c_doc, c_node, c_prefix) is not NULL: - # try to simulate ElementTree's namespace prefix creation - while 1: + prefix = <object>dict_result + else: prefix = self.buildNewPrefix() - c_prefix = _cstr(prefix) - # make sure it's not used already - if tree.xmlSearchNs(self._c_doc, c_node, c_prefix) is NULL: - break + c_prefix = _cstr(prefix) + + # make sure the prefix is not in use already + while tree.xmlSearchNs(self._c_doc, c_node, c_prefix) is not NULL: + prefix = self.buildNewPrefix() + c_prefix = _cstr(prefix) c_ns = tree.xmlNewNs(c_node, c_href, c_prefix) if c_ns is NULL: @@ -424,6 +426,14 @@ self._setNodeNs(c_node, _cstr(node_ns_utf)) return 0 +cdef __initPrefixCache(): + cdef int i + return tuple([ python.PyString_FromFormat("ns%d", i) + for i from 0 <= i < 30 ]) + 
+cdef object _PREFIX_CACHE +_PREFIX_CACHE = __initPrefixCache() + cdef extern from "etree_defs.h": # macro call to 't->tp_new()' for fast instantiation cdef _Document NEW_DOCUMENT "PY_NEW" (object t) From scoder at codespeak.net Sun Mar 9 17:53:10 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 9 Mar 2008 17:53:10 +0100 (CET) Subject: [Lxml-checkins] r52346 - in lxml/trunk: . src/lxml Message-ID: <20080309165310.14190168433@codespeak.net> Author: scoder Date: Sun Mar 9 17:53:04 2008 New Revision: 52346 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/apihelpers.pxi lxml/trunk/src/lxml/etree_defs.h Log: r3748 at delle: sbehnel | 2008-03-08 19:54:32 +0100 faster _isString(), faster _getNsTag() Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Sun Mar 9 17:53:04 2008 @@ -30,6 +30,8 @@ Other changes ------------- +* General API speed-ups. + * The benchmark suite now uses tail text in the trees, which makes the absolute numbers incomparable to previous results. 
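The prefix handling changed in r52347 above (a precomputed tuple of "ns0" to "ns29" strings, a fallback to string formatting, and a loop that keeps generating prefixes until one is unused) can be sketched in plain Python. This is a hypothetical illustration, not lxml code: `PrefixBuilder`, `build_new_prefix`, `find_free_prefix` and the `in_use` set (standing in for libxml2's `xmlSearchNs()` lookup) are invented names, and the C integer overflow that the Cython version detects is simulated with a `max_counter` parameter.

```python
import sys

# Hypothetical stand-in for the module-level _PREFIX_CACHE tuple.
PREFIX_CACHE = tuple("ns%d" % i for i in range(30))


class PrefixBuilder:
    """Illustrative sketch of the buildNewPrefix() logic; not lxml API."""

    def __init__(self, max_counter=sys.maxsize):
        self._ns_counter = 0
        self._prefix_tail = None
        # Simulates the point at which the C int counter would overflow.
        self._max_counter = max_counter

    def build_new_prefix(self):
        if self._ns_counter < len(PREFIX_CACHE):
            # Reuse a prebuilt string instead of formatting a new one.
            ns = PREFIX_CACHE[self._ns_counter]
        else:
            ns = "ns%d" % self._ns_counter
        if self._prefix_tail is not None:
            ns += self._prefix_tail
        self._ns_counter += 1
        if self._ns_counter > self._max_counter:
            # "Overflow": restart the counter, but keep prefixes unique
            # by growing a suffix of "A" characters.
            self._ns_counter = 0
            if self._prefix_tail is None:
                self._prefix_tail = "A"
            else:
                self._prefix_tail += "A"
        return ns

    def find_free_prefix(self, in_use, default=None):
        """Return `default` or a generated prefix that is not in `in_use`."""
        prefix = default if default is not None else self.build_new_prefix()
        while prefix in in_use:
            prefix = self.build_new_prefix()
        return prefix


builder = PrefixBuilder()
print(builder.find_free_prefix({"ns0", "ns1"}))         # prints ns2
print(builder.find_free_prefix(set(), default="xsi"))   # prints xsi
```

In this sketch the cache only saves string formatting for the first 30 prefixes; it is the collision loop, mirroring the `while tree.xmlSearchNs(...) is not NULL` loop in the diff, that guarantees a newly declared prefix does not shadow one already in scope.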
Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Sun Mar 9 17:53:04 2008 @@ -1002,7 +1002,8 @@ cdef char* c_ns_end cdef Py_ssize_t taglen cdef Py_ssize_t nslen - if isinstance(tag, QName): + # _isString() is much faster than isinstance() + if not _isString(tag) and isinstance(tag, QName): tag = (<QName>tag).text tag = _utf8(tag) c_tag = _cstr(tag) Modified: lxml/trunk/src/lxml/etree_defs.h ============================================================================== --- lxml/trunk/src/lxml/etree_defs.h (original) +++ lxml/trunk/src/lxml/etree_defs.h Sun Mar 9 17:53:04 2008 @@ -119,7 +119,9 @@ (__PY_NEW_GLOBAL_EMPTY_TUPLE)), \ NULL)) -#define _isString(obj) PyObject_TypeCheck(obj, &PyBaseString_Type) +#define _isString(obj) (PyString_CheckExact(obj) || \ + PyUnicode_CheckExact(obj) || \ + PyObject_TypeCheck(obj, &PyBaseString_Type)) #define _isElement(c_node) \ (((c_node)->type == XML_ELEMENT_NODE) || \
From scoder at codespeak.net Tue Mar 11 10:34:25 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 11 Mar 2008 10:34:25 +0100 (CET) Subject: [Lxml-checkins] r52369 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20080311093425.CD4BF168450@codespeak.net> Author: scoder Date: Tue Mar 11 10:34:21 2008 New Revision: 52369 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/tests/test_etree.py Log: r3754 at delle: sbehnel | 2008-03-11 10:33:26 +0100 let el.base property fall back to document URL also for HTML documents Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Tue Mar 11 10:34:21 2008 @@ -930,7 +930,9 @@ cdef char* c_base c_base = tree.xmlNodeGetBase(self._doc._c_doc, self._c_node) if c_base is NULL: - return None + if self._doc._c_doc.URL is NULL: + return None + return self._doc._c_doc.URL # FIXME: this might be UTF-8 or any other 8-bit encoding base = c_base tree.xmlFree(c_base) Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Tue Mar 11 10:34:21 2008 @@ -1715,6 +1715,17 @@ root.get('{http://www.w3.org/XML/1998/namespace}base'), "https://secret/url") + def test_html_base(self): + etree = self.etree + root = etree.HTML("<html><body></body></html>", + base_url="http://no/such/url") + self.assertEquals(root.base, "http://no/such/url") + + def test_html_base_tag(self): + etree = self.etree + root = etree.HTML('<html><head><base href="http://no/such/url"></head></html>') + self.assertEquals(root.base, "http://no/such/url") + def test_dtd_io(self): # check that
DTDs that go in also go back out xml = '''\ From scoder at codespeak.net Tue Mar 11 18:38:32 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 11 Mar 2008 18:38:32 +0100 (CET) Subject: [Lxml-checkins] r52383 - lxml/trunk Message-ID: <20080311173832.B5189168564@codespeak.net> Author: scoder Date: Tue Mar 11 18:38:32 2008 New Revision: 52383 Modified: lxml/trunk/ (props changed) lxml/trunk/setupinfo.py Log: r3756 at delle: sbehnel | 2008-03-11 18:37:41 +0100 print library version and library dirs at build time as a hint (not only) to MacOS-X users Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Tue Mar 11 18:38:32 2008 @@ -29,12 +29,25 @@ else: modules = EXT_MODULES + lib_version = libxslt_version() + print("Using build configuration of libxslt %s" % lib_version) + _include_dirs = include_dirs(static_include_dirs) _library_dirs = library_dirs(static_library_dirs) _cflags = cflags(static_cflags) _define_macros = define_macros() _libraries = libraries() + if _library_dirs: + message = "Building against libxml2/libxslt in " + if len(_library_dirs) > 1: + print(message + "one of the following directories:") + for dir in _library_dirs: + print(" " + dir) + else: + print(message + "the following directory: " + + _library_dirs[0]) + if OPTION_AUTO_RPATH: runtime_library_dirs = _library_dirs else: @@ -128,8 +141,7 @@ macros.append(('WITHOUT_THREADING', None)) return macros -def flags(option): - cmd = "%s --%s" % (find_xslt_config(), option) +def run_command(cmd): try: import subprocess except ImportError: @@ -141,10 +153,21 @@ stdout=subprocess.PIPE, stderr=subprocess.PIPE) rf, ef = p.stdout, p.stderr errors = ef.read() + output = rf.read() + return output or '', errors or '' + +def libxslt_version(): + cmd = "%s --version" % find_xslt_config() + output, errors = run_command(cmd) if errors: print("ERROR: %s" % errors) 
print("** make sure the development packages of libxml2 and libxslt are installed **\n") - return str(rf.read()).split() + return output.strip() + +def flags(option): + cmd = "%s --%s" % (find_xslt_config(), option) + output, _ = run_command(cmd) + return output.split() XSLT_CONFIG = None From scoder at codespeak.net Tue Mar 11 18:40:07 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 11 Mar 2008 18:40:07 +0100 (CET) Subject: [Lxml-checkins] r52384 - lxml/branch/lxml-2.0 Message-ID: <20080311174007.15F8216856E@codespeak.net> Author: scoder Date: Tue Mar 11 18:40:07 2008 New Revision: 52384 Modified: lxml/branch/lxml-2.0/CHANGES.txt lxml/branch/lxml-2.0/setupinfo.py lxml/branch/lxml-2.0/version.txt Log: merge -c 52383 Modified: lxml/branch/lxml-2.0/CHANGES.txt ============================================================================== --- lxml/branch/lxml-2.0/CHANGES.txt (original) +++ lxml/branch/lxml-2.0/CHANGES.txt Tue Mar 11 18:40:07 2008 @@ -2,6 +2,19 @@ lxml changelog ============== +2.0.3 (Under development) +========================= + +Features added +-------------- + +Bugs fixed +---------- + +Other changes +------------- + + 2.0.2 (2008-02-22) ================== Modified: lxml/branch/lxml-2.0/setupinfo.py ============================================================================== --- lxml/branch/lxml-2.0/setupinfo.py (original) +++ lxml/branch/lxml-2.0/setupinfo.py Tue Mar 11 18:40:07 2008 @@ -29,12 +29,25 @@ else: modules = EXT_MODULES + lib_version = libxslt_version() + print("Using build configuration of libxslt %s" % lib_version) + _include_dirs = include_dirs(static_include_dirs) _library_dirs = library_dirs(static_library_dirs) _cflags = cflags(static_cflags) _define_macros = define_macros() _libraries = libraries() + if _library_dirs: + message = "Building against libxml2/libxslt in " + if len(_library_dirs) > 1: + print(message + "one of the following directories:") + for dir in _library_dirs: + print(" " + dir) + 
else: + print(message + "the following directory: " + + _library_dirs[0]) + if OPTION_AUTO_RPATH: runtime_library_dirs = _library_dirs else: @@ -128,8 +141,7 @@ macros.append(('WITHOUT_THREADING', None)) return macros -def flags(option): - cmd = "%s --%s" % (find_xslt_config(), option) +def run_command(cmd): try: import subprocess except ImportError: @@ -141,10 +153,21 @@ stdout=subprocess.PIPE, stderr=subprocess.PIPE) rf, ef = p.stdout, p.stderr errors = ef.read() + output = rf.read() + return output or '', errors or '' + +def libxslt_version(): + cmd = "%s --version" % find_xslt_config() + output, errors = run_command(cmd) if errors: print("ERROR: %s" % errors) print("** make sure the development packages of libxml2 and libxslt are installed **\n") - return str(rf.read()).split() + return output.strip() + +def flags(option): + cmd = "%s --%s" % (find_xslt_config(), option) + output, _ = run_command(cmd) + return output.split() XSLT_CONFIG = None Modified: lxml/branch/lxml-2.0/version.txt ============================================================================== --- lxml/branch/lxml-2.0/version.txt (original) +++ lxml/branch/lxml-2.0/version.txt Tue Mar 11 18:40:07 2008 @@ -1 +1 @@ -2.0.2 +2.0.3 From scoder at codespeak.net Tue Mar 11 18:45:18 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 11 Mar 2008 18:45:18 +0100 (CET) Subject: [Lxml-checkins] r52385 - lxml/trunk Message-ID: <20080311174518.014D3168519@codespeak.net> Author: scoder Date: Tue Mar 11 18:45:17 2008 New Revision: 52385 Modified: lxml/trunk/ (props changed) lxml/trunk/setupinfo.py Log: r3758 at delle: sbehnel | 2008-03-11 18:44:25 +0100 later Cython version output during build Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Tue Mar 11 18:45:17 2008 @@ -4,7 +4,6 @@ try: from Cython.Distutils import build_ext as build_pyx import 
Cython.Compiler.Version - print("Building with Cython %s." % Cython.Compiler.Version.version) CYTHON_INSTALLED = True except ImportError: CYTHON_INSTALLED = False @@ -18,6 +17,7 @@ def ext_modules(static_include_dirs, static_library_dirs, static_cflags): if CYTHON_INSTALLED: source_extension = ".pyx" + print("Building with Cython %s." % Cython.Compiler.Version.version) else: print ("NOTE: Trying to build without Cython, pre-generated " "'src/lxml/etree.c' needs to be available.") From scoder at codespeak.net Tue Mar 11 18:45:43 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 11 Mar 2008 18:45:43 +0100 (CET) Subject: [Lxml-checkins] r52386 - lxml/branch/lxml-2.0 Message-ID: <20080311174543.4F0D9168515@codespeak.net> Author: scoder Date: Tue Mar 11 18:45:42 2008 New Revision: 52386 Modified: lxml/branch/lxml-2.0/setupinfo.py Log: merge -c 52385 Modified: lxml/branch/lxml-2.0/setupinfo.py ============================================================================== --- lxml/branch/lxml-2.0/setupinfo.py (original) +++ lxml/branch/lxml-2.0/setupinfo.py Tue Mar 11 18:45:42 2008 @@ -4,7 +4,6 @@ try: from Cython.Distutils import build_ext as build_pyx import Cython.Compiler.Version - print("Building with Cython %s." % Cython.Compiler.Version.version) CYTHON_INSTALLED = True except ImportError: CYTHON_INSTALLED = False @@ -18,6 +17,7 @@ def ext_modules(static_include_dirs, static_library_dirs, static_cflags): if CYTHON_INSTALLED: source_extension = ".pyx" + print("Building with Cython %s." % Cython.Compiler.Version.version) else: print ("NOTE: Trying to build without Cython, pre-generated " "'src/lxml/etree.c' needs to be available.") From scoder at codespeak.net Tue Mar 11 19:00:21 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 11 Mar 2008 19:00:21 +0100 (CET) Subject: [Lxml-checkins] r52388 - in lxml/trunk: . 
doc Message-ID: <20080311180021.92D9A16856A@codespeak.net> Author: scoder Date: Tue Mar 11 19:00:19 2008 New Revision: 52388 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/objectify.txt Log: r3762 at delle: sbehnel | 2008-03-11 18:59:24 +0100 doc typo Modified: lxml/trunk/doc/objectify.txt ============================================================================== --- lxml/trunk/doc/objectify.txt (original) +++ lxml/trunk/doc/objectify.txt Tue Mar 11 19:00:19 2008 @@ -1250,7 +1250,7 @@ have to do is configure a different `class lookup`_ mechanism (or write one yourself). -.. _`class lookup`: element-classes.html +.. _`class lookup`: element_classes.html The first step for the setup is to create a new parser that builds objectify documents. The objectify API is meant for data-centric XML From scoder at codespeak.net Tue Mar 11 19:00:37 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 11 Mar 2008 19:00:37 +0100 (CET) Subject: [Lxml-checkins] r52389 - lxml/branch/lxml-2.0/doc Message-ID: <20080311180037.0D12916856A@codespeak.net> Author: scoder Date: Tue Mar 11 19:00:37 2008 New Revision: 52389 Modified: lxml/branch/lxml-2.0/doc/objectify.txt Log: merge -c 52388 Modified: lxml/branch/lxml-2.0/doc/objectify.txt ============================================================================== --- lxml/branch/lxml-2.0/doc/objectify.txt (original) +++ lxml/branch/lxml-2.0/doc/objectify.txt Tue Mar 11 19:00:37 2008 @@ -1132,7 +1132,7 @@ have to do is configure a different `class lookup`_ mechanism (or write one yourself). -.. _`class lookup`: element-classes.html +.. _`class lookup`: element_classes.html The first step for the setup is to create a new parser that builds objectify documents. The objectify API is meant for data-centric XML From scoder at codespeak.net Tue Mar 11 20:32:22 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 11 Mar 2008 20:32:22 +0100 (CET) Subject: [Lxml-checkins] r52391 - in lxml/trunk: . 
doc Message-ID: <20080311193222.4A2C0168561@codespeak.net> Author: scoder Date: Tue Mar 11 20:32:21 2008 New Revision: 52391 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/build.txt lxml/trunk/setupinfo.py Log: r3764 at delle: sbehnel | 2008-03-11 20:31:24 +0100 use xml2-config in setupinfo.py and support passing the command name explicitly Modified: lxml/trunk/doc/build.txt ============================================================================== --- lxml/trunk/doc/build.txt (original) +++ lxml/trunk/doc/build.txt Tue Mar 11 20:32:21 2008 @@ -188,7 +188,8 @@ best way to make sure the right version is used is by passing the path to the script as an option to setup.py:: - python setup.py build --with-xslt-config=/path/to/xslt-config + python setup.py build --with-xslt-config=/path/to/xslt-config \ + --with-xml2-config=/path/to/xml2-config To make sure the newer libxml2 and libxslt versions are used at *runtime*, you should add *all* directories where the newer libraries Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Tue Mar 11 20:32:21 2008 @@ -29,8 +29,9 @@ else: modules = EXT_MODULES - lib_version = libxslt_version() - print("Using build configuration of libxslt %s" % lib_version) + lib_versions = get_library_versions() + print("Using build configuration of libxml2 %s and libxslt %s" % + lib_versions) _include_dirs = include_dirs(static_include_dirs) _library_dirs = library_dirs(static_library_dirs) @@ -141,6 +142,8 @@ macros.append(('WITHOUT_THREADING', None)) return macros +_ERROR_PRINTED = False + def run_command(cmd): try: import subprocess @@ -153,23 +156,49 @@ stdout=subprocess.PIPE, stderr=subprocess.PIPE) rf, ef = p.stdout, p.stderr errors = ef.read() + global _ERROR_PRINTED + if errors and not _ERROR_PRINTED: + _ERROR_PRINTED = True + print("ERROR: %s" % errors) + print("** make sure the development packages of 
libxml2 and libxslt are installed **\n") output = rf.read() - return output or '', errors or '' + return (output or '').strip() -def libxslt_version(): +def get_library_versions(): + cmd = "%s --version" % find_xml2_config() + xml2_version = run_command(cmd) cmd = "%s --version" % find_xslt_config() - output, errors = run_command(cmd) - if errors: - print("ERROR: %s" % errors) - print("** make sure the development packages of libxml2 and libxslt are installed **\n") - return output.strip() + xslt_version = run_command(cmd) + return xml2_version, xslt_version def flags(option): + cmd = "%s --%s" % (find_xml2_config(), option) + xml2_flags = run_command(cmd) cmd = "%s --%s" % (find_xslt_config(), option) - output, _ = run_command(cmd) - return output.split() + xslt_flags = run_command(cmd) + + flag_list = xml2_flags.split() + for flag in xslt_flags.split(): + if flag not in flag_list: + flag_list.append(flag) + return flag_list XSLT_CONFIG = None +XML2_CONFIG = None + +def find_xml2_config(): + global XML2_CONFIG + if XML2_CONFIG: + return XML2_CONFIG + option = '--with-xml2-config=' + for arg in sys.argv: + if arg.startswith(option): + sys.argv.remove(arg) + XML2_CONFIG = arg[len(option):] + return XML2_CONFIG + else: + XML2_CONFIG = os.getenv('XML2_CONFIG', 'xml2-config') + return XML2_CONFIG def find_xslt_config(): global XSLT_CONFIG @@ -182,7 +211,7 @@ XSLT_CONFIG = arg[len(option):] return XSLT_CONFIG else: - XSLT_CONFIG = 'xslt-config' + XSLT_CONFIG = os.getenv('XSLT_CONFIG', 'xslt-config') return XSLT_CONFIG def has_option(name): From scoder at codespeak.net Tue Mar 11 20:32:48 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 11 Mar 2008 20:32:48 +0100 (CET) Subject: [Lxml-checkins] r52392 - in lxml/branch/lxml-2.0: . 
doc Message-ID: <20080311193248.2447C168561@codespeak.net> Author: scoder Date: Tue Mar 11 20:32:47 2008 New Revision: 52392 Modified: lxml/branch/lxml-2.0/doc/build.txt lxml/branch/lxml-2.0/setupinfo.py Log: merge -c 52391 Modified: lxml/branch/lxml-2.0/doc/build.txt ============================================================================== --- lxml/branch/lxml-2.0/doc/build.txt (original) +++ lxml/branch/lxml-2.0/doc/build.txt Tue Mar 11 20:32:47 2008 @@ -186,7 +186,8 @@ best way to make sure the right version is used is by passing the path to the script as an option to setup.py:: - python setup.py build --with-xslt-config=/path/to/xslt-config + python setup.py build --with-xslt-config=/path/to/xslt-config \ + --with-xml2-config=/path/to/xml2-config To make sure the newer libxml2 and libxslt versions are used at *runtime*, you should add *all* directories where the newer libraries Modified: lxml/branch/lxml-2.0/setupinfo.py ============================================================================== --- lxml/branch/lxml-2.0/setupinfo.py (original) +++ lxml/branch/lxml-2.0/setupinfo.py Tue Mar 11 20:32:47 2008 @@ -29,8 +29,9 @@ else: modules = EXT_MODULES - lib_version = libxslt_version() - print("Using build configuration of libxslt %s" % lib_version) + lib_versions = get_library_versions() + print("Using build configuration of libxml2 %s and libxslt %s" % + lib_versions) _include_dirs = include_dirs(static_include_dirs) _library_dirs = library_dirs(static_library_dirs) @@ -141,6 +142,8 @@ macros.append(('WITHOUT_THREADING', None)) return macros +_ERROR_PRINTED = False + def run_command(cmd): try: import subprocess @@ -153,23 +156,49 @@ stdout=subprocess.PIPE, stderr=subprocess.PIPE) rf, ef = p.stdout, p.stderr errors = ef.read() + global _ERROR_PRINTED + if errors and not _ERROR_PRINTED: + _ERROR_PRINTED = True + print("ERROR: %s" % errors) + print("** make sure the development packages of libxml2 and libxslt are installed **\n") output = rf.read() - 
return output or '', errors or '' + return (output or '').strip() -def libxslt_version(): +def get_library_versions(): + cmd = "%s --version" % find_xml2_config() + xml2_version = run_command(cmd) cmd = "%s --version" % find_xslt_config() - output, errors = run_command(cmd) - if errors: - print("ERROR: %s" % errors) - print("** make sure the development packages of libxml2 and libxslt are installed **\n") - return output.strip() + xslt_version = run_command(cmd) + return xml2_version, xslt_version def flags(option): + cmd = "%s --%s" % (find_xml2_config(), option) + xml2_flags = run_command(cmd) cmd = "%s --%s" % (find_xslt_config(), option) - output, _ = run_command(cmd) - return output.split() + xslt_flags = run_command(cmd) + + flag_list = xml2_flags.split() + for flag in xslt_flags.split(): + if flag not in flag_list: + flag_list.append(flag) + return flag_list XSLT_CONFIG = None +XML2_CONFIG = None + +def find_xml2_config(): + global XML2_CONFIG + if XML2_CONFIG: + return XML2_CONFIG + option = '--with-xml2-config=' + for arg in sys.argv: + if arg.startswith(option): + sys.argv.remove(arg) + XML2_CONFIG = arg[len(option):] + return XML2_CONFIG + else: + XML2_CONFIG = os.getenv('XML2_CONFIG', 'xml2-config') + return XML2_CONFIG def find_xslt_config(): global XSLT_CONFIG @@ -182,7 +211,7 @@ XSLT_CONFIG = arg[len(option):] return XSLT_CONFIG else: - XSLT_CONFIG = 'xslt-config' + XSLT_CONFIG = os.getenv('XSLT_CONFIG', 'xslt-config') return XSLT_CONFIG def has_option(name): From scoder at codespeak.net Tue Mar 11 20:36:35 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 11 Mar 2008 20:36:35 +0100 (CET) Subject: [Lxml-checkins] r52393 - lxml/branch/lxml-2.0 Message-ID: <20080311193635.66297168561@codespeak.net> Author: scoder Date: Tue Mar 11 20:36:34 2008 New Revision: 52393 Modified: lxml/branch/lxml-2.0/CHANGES.txt Log: changelog Modified: lxml/branch/lxml-2.0/CHANGES.txt 
============================================================================== --- lxml/branch/lxml-2.0/CHANGES.txt (original) +++ lxml/branch/lxml-2.0/CHANGES.txt Tue Mar 11 20:36:34 2008 @@ -14,6 +14,15 @@ Other changes ------------- +* Setting the XSLT_CONFIG and XML2_CONFIG environment variables at + build time will let setup.py pick up the ``xml2-config`` and + ``xslt-config`` scripts from the supplied path name. + +* Passing ``--with-xml2-config=/path/to/xml2-config`` to setup.py will + override the ``xml2-config`` script that is used to determine the C + compiler options. The same applies for the ``--with-xslt-config`` + option. + 2.0.2 (2008-02-22) ================== From scoder at codespeak.net Tue Mar 11 20:39:24 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 11 Mar 2008 20:39:24 +0100 (CET) Subject: [Lxml-checkins] r52394 - lxml/trunk Message-ID: <20080311193924.B24E6168571@codespeak.net> Author: scoder Date: Tue Mar 11 20:39:24 2008 New Revision: 52394 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3767 at delle: sbehnel | 2008-03-11 20:35:06 +0100 changelog Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Tue Mar 11 20:39:24 2008 @@ -30,7 +30,16 @@ Other changes ------------- -* General API speed-ups. +* Setting the XSLT_CONFIG and XML2_CONFIG environment variables at + build time will let setup.py pick up the ``xml2-config`` and + ``xslt-config`` scripts from the supplied path name. + +* Passing ``--with-xml2-config=/path/to/xml2-config`` to setup.py will + override the ``xml2-config`` script that is used to determine the C + compiler options. The same applies for the ``--with-xslt-config`` + option. + +* Minor API speed-ups. * The benchmark suite now uses tail text in the trees, which makes the absolute numbers incomparable to previous results. 
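The changelog entries above describe a lookup order for the config scripts. An explicit `--with-xml2-config=` / `--with-xslt-config=` option wins. Otherwise the `XML2_CONFIG` / `XSLT_CONFIG` environment variable is used. Failing both, the plain script name is looked up on the PATH. The sketch below illustrates that order, along with the order-preserving flag merge done by `flags()` in the setupinfo.py diff.

It is a simplified, standalone illustration, not lxml's actual code. The function names `find_config_script` and `merge_flags` are hypothetical. The module-level caching that setupinfo.py does through its `XML2_CONFIG`/`XSLT_CONFIG` globals is omitted.

```python
import os


def find_config_script(argv, option_name, env_var, default):
    """Hypothetical helper mirroring find_xml2_config()/find_xslt_config():
    1. an explicit --<option_name>=PATH argument wins (and is removed from argv
       so that distutils does not reject it as an unknown option),
    2. otherwise the environment variable env_var is used,
    3. otherwise the bare script name `default` is returned (found via PATH)."""
    prefix = "--%s=" % option_name
    for arg in list(argv):  # iterate over a copy because argv is modified
        if arg.startswith(prefix):
            argv.remove(arg)
            return arg[len(prefix):]
    return os.getenv(env_var, default)


def merge_flags(xml2_flags, xslt_flags):
    """Combine the whitespace-separated output of `xml2-config --<opt>` and
    `xslt-config --<opt>`, keeping the first occurrence of each flag in order."""
    flag_list = xml2_flags.split()
    for flag in xslt_flags.split():
        if flag not in flag_list:
            flag_list.append(flag)
    return flag_list


if __name__ == "__main__":
    argv = ["setup.py", "build", "--with-xslt-config=/opt/libxslt/bin/xslt-config"]
    print(find_config_script(argv, "with-xslt-config", "XSLT_CONFIG", "xslt-config"))
    print(argv)  # the option has been stripped: ['setup.py', 'build']
    print(merge_flags("-I/usr/include/libxml2 -lxml2",
                      "-I/usr/include/libxml2 -lxslt -lxml2"))
    # ['-I/usr/include/libxml2', '-lxml2', '-lxslt']
```

The `xml2-config` default shown here follows r52391. Commit r52425 later changed the xml2-config default to an empty string, so that only `xslt-config` is consulted unless `xml2-config` is requested explicitly.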
From scoder at codespeak.net Wed Mar 12 18:05:00 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 12 Mar 2008 18:05:00 +0100 (CET) Subject: [Lxml-checkins] r52425 - lxml/trunk Message-ID: <20080312170500.DF1B7169F21@codespeak.net> Author: scoder Date: Wed Mar 12 18:05:00 2008 New Revision: 52425 Modified: lxml/trunk/ (props changed) lxml/trunk/setupinfo.py Log: r3771 at delle: sbehnel | 2008-03-12 18:04:09 +0100 default to not calling xml2-config as xslt-config may already know better what needs to be done Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Wed Mar 12 18:05:00 2008 @@ -30,8 +30,12 @@ modules = EXT_MODULES lib_versions = get_library_versions() - print("Using build configuration of libxml2 %s and libxslt %s" % - lib_versions) + if lib_versions[0]: + print("Using build configuration of libxml2 %s and libxslt %s" % + lib_versions) + else: + print("Using build configuration of libxslt %s" % + lib_versions[1]) _include_dirs = include_dirs(static_include_dirs) _library_dirs = library_dirs(static_library_dirs) @@ -144,7 +148,11 @@ _ERROR_PRINTED = False -def run_command(cmd): +def run_command(cmd, *args): + if not cmd: + return '' + if args: + cmd = ' '.join((cmd,) + args) try: import subprocess except ImportError: @@ -165,17 +173,13 @@ return (output or '').strip() def get_library_versions(): - cmd = "%s --version" % find_xml2_config() - xml2_version = run_command(cmd) - cmd = "%s --version" % find_xslt_config() - xslt_version = run_command(cmd) + xml2_version = run_command(find_xml2_config(), "--version") + xslt_version = run_command(find_xslt_config(), "--version") return xml2_version, xslt_version def flags(option): - cmd = "%s --%s" % (find_xml2_config(), option) - xml2_flags = run_command(cmd) - cmd = "%s --%s" % (find_xslt_config(), option) - xslt_flags = run_command(cmd) + xml2_flags = 
run_command(find_xml2_config(), "--%s" % option) + xslt_flags = run_command(find_xslt_config(), "--%s" % option) flag_list = xml2_flags.split() for flag in xslt_flags.split(): @@ -197,7 +201,8 @@ XML2_CONFIG = arg[len(option):] return XML2_CONFIG else: - XML2_CONFIG = os.getenv('XML2_CONFIG', 'xml2-config') + # default: do nothing, rely only on xslt-config + XML2_CONFIG = os.getenv('XML2_CONFIG', '') return XML2_CONFIG def find_xslt_config(): From scoder at codespeak.net Wed Mar 12 19:48:54 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 12 Mar 2008 19:48:54 +0100 (CET) Subject: [Lxml-checkins] r52430 - in lxml/trunk: . src/lxml Message-ID: <20080312184854.68582169EFD@codespeak.net> Author: scoder Date: Wed Mar 12 19:48:52 2008 New Revision: 52430 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.objectify.pyx Log: r3773 at delle: sbehnel | 2008-03-12 19:46:51 +0100 fix: attribute assignment of custom PyTypes must call the type's own stringify() Modified: lxml/trunk/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.objectify.pyx (original) +++ lxml/trunk/src/lxml/lxml.objectify.pyx Wed Mar 12 19:48:52 2008 @@ -101,6 +101,9 @@ XML_SCHEMA_INSTANCE_TYPE_ATTR = "{%s}type" % XML_SCHEMA_INSTANCE_NS +# Forward declaration +cdef class PyType + ################################################################################ # Element class for the main API @@ -489,25 +492,27 @@ _setElementValue(new_element, value) cdef _setElementValue(_Element element, value): - cdef python.PyObject* dict_result + cdef python.PyObject* _pytype if value is None: cetree.setAttributeValue( element, XML_SCHEMA_INSTANCE_NIL_ATTR, "true") elif isinstance(value, _Element): _replaceElement(element, value) + return else: cetree.delAttributeFromNsName( element._c_node, _XML_SCHEMA_INSTANCE_NS, "nil") if python._isString(value): pytype_name = "str" + _pytype = 
python.PyDict_GetItem(_PYTYPE_DICT, "str") else: pytype_name = _typename(value) - if isinstance(value, bool): - value = _lower_bool(value) + _pytype = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) + if _pytype is not NULL: + value = (<PyType>_pytype).stringify(value) else: value = str(value) - dict_result = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) - if dict_result is not NULL: + if _pytype is not NULL: cetree.setAttributeValue(element, PYTYPE_ATTRIBUTE, pytype_name) else: cetree.delAttributeFromNsName(element._c_node, PYTYPE_NAMESPACE, @@ -884,7 +889,7 @@ """ cdef readonly object name cdef readonly object type_check - cdef object _add_text + cdef readonly object stringify cdef object _type cdef object _schema_types def __init__(self, name, type_check, type_class, stringify=None): @@ -900,9 +905,8 @@ self._type = type_class self.type_check = type_check if stringify is None: - self._add_text = _StringValueSetter(str) - else: - self._add_text = _StringValueSetter(stringify) + stringify = str + self.stringify = stringify self._schema_types = [] def __repr__(self): @@ -973,14 +977,6 @@ def __set__(self, types): self._schema_types = list(types) -cdef class _StringValueSetter: - cdef object _stringify - def __init__(self, stringify): - self._stringify = stringify - - def __call__(self, elem, value): - _add_text(elem, self._stringify(value)) - cdef object _PYTYPE_DICT _PYTYPE_DICT = {} @@ -1204,7 +1200,7 @@ pytype_name = _typename(child) pytype = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) if pytype is not NULL: - (<PyType>pytype)._add_text(element, child) + _add_text(element, (<PyType>pytype).stringify(child)) else: has_string_value = 1 child = str(child) From scoder at codespeak.net Wed Mar 12 19:48:59 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 12 Mar 2008 19:48:59 +0100 (CET) Subject: [Lxml-checkins] r52431 - lxml/trunk Message-ID: <20080312184859.75DDC169F09@codespeak.net> Author: scoder Date: Wed Mar 12 19:48:58 2008 New 
Revision: 52431 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3774 at delle: sbehnel | 2008-03-12 19:47:58 +0100 changelog Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Mar 12 19:48:58 2008 @@ -24,6 +24,9 @@ Bugs fixed ---------- +* Attribute assignment of custom PyTypes in objectify could fail to + correctly serialise the value to a string. + * Default encoding for plain text serialisation was different from that of XML serialisation (UTF-8 instead of ASCII). From scoder at codespeak.net Wed Mar 12 19:50:45 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 12 Mar 2008 19:50:45 +0100 (CET) Subject: [Lxml-checkins] r52432 - in lxml/branch/lxml-2.0: . src/lxml Message-ID: <20080312185045.9EA39169EFD@codespeak.net> Author: scoder Date: Wed Mar 12 19:50:45 2008 New Revision: 52432 Modified: lxml/branch/lxml-2.0/CHANGES.txt lxml/branch/lxml-2.0/src/lxml/lxml.objectify.pyx Log: merged fix from trunk rev 52430 Modified: lxml/branch/lxml-2.0/CHANGES.txt ============================================================================== --- lxml/branch/lxml-2.0/CHANGES.txt (original) +++ lxml/branch/lxml-2.0/CHANGES.txt Wed Mar 12 19:50:45 2008 @@ -11,6 +11,9 @@ Bugs fixed ---------- +* Attribute assignment of custom PyTypes in objectify could fail to + correctly serialise the value to a string. 
+ Other changes ------------- Modified: lxml/branch/lxml-2.0/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/lxml.objectify.pyx (original) +++ lxml/branch/lxml-2.0/src/lxml/lxml.objectify.pyx Wed Mar 12 19:50:45 2008 @@ -106,6 +106,9 @@ XML_SCHEMA_INSTANCE_TYPE_ATTR = "{%s}type" % XML_SCHEMA_INSTANCE_NS +# Forward declaration +cdef class PyType + ################################################################################ # Element class for the main API @@ -494,25 +497,27 @@ _setElementValue(new_element, value) cdef _setElementValue(_Element element, value): - cdef python.PyObject* dict_result + cdef python.PyObject* _pytype if value is None: cetree.setAttributeValue( element, XML_SCHEMA_INSTANCE_NIL_ATTR, "true") elif isinstance(value, _Element): _replaceElement(element, value) + return else: cetree.delAttributeFromNsName( element._c_node, _XML_SCHEMA_INSTANCE_NS, "nil") if python._isString(value): pytype_name = "str" + _pytype = python.PyDict_GetItem(_PYTYPE_DICT, "str") else: pytype_name = _typename(value) - if isinstance(value, bool): - value = _lower_bool(value) + _pytype = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) + if _pytype is not NULL: + value = (<PyType>_pytype).stringify(value) else: value = str(value) - dict_result = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) - if dict_result is not NULL: + if _pytype is not NULL: cetree.setAttributeValue(element, PYTYPE_ATTRIBUTE, pytype_name) else: cetree.delAttributeFromNsName(element._c_node, PYTYPE_NAMESPACE, @@ -889,7 +894,7 @@ """ cdef readonly object name cdef readonly object type_check - cdef object _add_text + cdef readonly object stringify cdef object _type cdef object _schema_types def __init__(self, name, type_check, type_class, stringify=None): @@ -905,9 +910,8 @@ self._type = type_class self.type_check = type_check if stringify is None: - self._add_text = _StringValueSetter(str) - else: - 
self._add_text = _StringValueSetter(stringify) + stringify = str + self.stringify = stringify self._schema_types = [] def __repr__(self): @@ -978,14 +982,6 @@ def __set__(self, types): self._schema_types = list(types) -cdef class _StringValueSetter: - cdef object _stringify - def __init__(self, stringify): - self._stringify = stringify - - def __call__(self, elem, value): - _add_text(elem, self._stringify(value)) - cdef object _PYTYPE_DICT _PYTYPE_DICT = {} @@ -1209,7 +1205,7 @@ pytype_name = _typename(child) pytype = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) if pytype is not NULL: - (<PyType>pytype)._add_text(element, child) + _add_text(element, (<PyType>pytype).stringify(child)) else: has_string_value = 1 child = str(child) From scoder at codespeak.net Wed Mar 12 19:58:18 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 12 Mar 2008 19:58:18 +0100 (CET) Subject: [Lxml-checkins] r52433 - in lxml/trunk: . src/lxml Message-ID: <20080312185818.3AE46169F09@codespeak.net> Author: scoder Date: Wed Mar 12 19:58:15 2008 New Revision: 52433 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.objectify.pyx Log: r3777 at delle: sbehnel | 2008-03-12 19:57:13 +0100 cleanup Modified: lxml/trunk/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.objectify.pyx (original) +++ lxml/trunk/src/lxml/lxml.objectify.pyx Wed Mar 12 19:58:15 2008 @@ -504,7 +504,7 @@ element._c_node, _XML_SCHEMA_INSTANCE_NS, "nil") if python._isString(value): pytype_name = "str" - _pytype = python.PyDict_GetItem(_PYTYPE_DICT, "str") + _pytype = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) else: pytype_name = _typename(value) _pytype = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) From scoder at codespeak.net Wed Mar 12 19:59:27 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 12 Mar 2008 19:59:27 +0100 (CET) Subject: [Lxml-checkins] r52434 - 
lxml/branch/lxml-2.0/src/lxml Message-ID: <20080312185927.29443169E97@codespeak.net> Author: scoder Date: Wed Mar 12 19:59:26 2008 New Revision: 52434 Modified: lxml/branch/lxml-2.0/src/lxml/lxml.objectify.pyx Log: merge -c 52433 Modified: lxml/branch/lxml-2.0/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/lxml.objectify.pyx (original) +++ lxml/branch/lxml-2.0/src/lxml/lxml.objectify.pyx Wed Mar 12 19:59:26 2008 @@ -509,7 +509,7 @@ element._c_node, _XML_SCHEMA_INSTANCE_NS, "nil") if python._isString(value): pytype_name = "str" - _pytype = python.PyDict_GetItem(_PYTYPE_DICT, "str") + _pytype = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) else: pytype_name = _typename(value) _pytype = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) From scoder at codespeak.net Wed Mar 12 20:34:27 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 12 Mar 2008 20:34:27 +0100 (CET) Subject: [Lxml-checkins] r52435 - in lxml/trunk: . 
src/lxml/tests Message-ID: <20080312193427.18D61169F3B@codespeak.net> Author: scoder Date: Wed Mar 12 20:34:26 2008 New Revision: 52435 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/tests/test_objectify.py Log: r3780 at delle: sbehnel | 2008-03-12 20:28:21 +0100 extended test case on custom types in objectify Modified: lxml/trunk/src/lxml/tests/test_objectify.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_objectify.py (original) +++ lxml/trunk/src/lxml/tests/test_objectify.py Wed Mar 12 20:34:26 2008 @@ -84,13 +84,23 @@ ns = self.lookup.get_namespace("otherNS") ns[None] = self.etree.ElementBase + self._orig_types = objectify.getRegisteredTypes() + def tearDown(self): self.lookup.get_namespace("otherNS").clear() objectify.set_pytype_attribute_tag() del self.lookup del self.parser + + for pytype in objectify.getRegisteredTypes(): + pytype.unregister() + for pytype in self._orig_types: + pytype.register() + del self._orig_types + super(ObjectifyTestCase, self).tearDown() + def test_element_nsmap_default(self): elt = objectify.Element("test") self.assertEquals(elt.nsmap, DEFAULT_NSMAP) @@ -1814,39 +1824,80 @@ def test_registered_types(self): orig_types = objectify.getRegisteredTypes() + orig_types[0].unregister() + self.assertEquals(orig_types[1:], objectify.getRegisteredTypes()) - try: - orig_types[0].unregister() - self.assertEquals(orig_types[1:], objectify.getRegisteredTypes()) - - class NewType(objectify.ObjectifiedDataElement): - pass + class NewType(objectify.ObjectifiedDataElement): + pass - def checkMyType(s): - return True + def checkMyType(s): + return True - pytype = objectify.PyType("mytype", checkMyType, NewType) - pytype.register() - self.assert_(pytype in objectify.getRegisteredTypes()) - pytype.unregister() + pytype = objectify.PyType("mytype", checkMyType, NewType) + self.assert_(pytype not in objectify.getRegisteredTypes()) + pytype.register() + self.assert_(pytype in 
objectify.getRegisteredTypes()) + pytype.unregister() + self.assert_(pytype not in objectify.getRegisteredTypes()) + + pytype.register(before = [objectify.getRegisteredTypes()[0].name]) + self.assertEquals(pytype, objectify.getRegisteredTypes()[0]) + pytype.unregister() + + pytype.register(after = [objectify.getRegisteredTypes()[0].name]) + self.assertNotEqual(pytype, objectify.getRegisteredTypes()[0]) + pytype.unregister() + + self.assertRaises(ValueError, pytype.register, + before = [objectify.getRegisteredTypes()[0].name], + after = [objectify.getRegisteredTypes()[1].name]) + + def test_registered_type_stringify(self): + from datetime import datetime + def parse_date(value): + if len(value) != 14: + raise ValueError(value) + Y = int(value[0:4]) + M = int(value[4:6]) + D = int(value[6:8]) + h = int(value[8:10]) + m = int(value[10:12]) + s = int(value[12:14]) + return datetime(Y, M, D, h, m, s) + + def stringify_date(date): + return date.strftime("%Y%m%d%H%M%S") + + class DatetimeElement(objectify.ObjectifiedDataElement): + @property + def pyval(self): + return parse_date(self.text) + + datetime_type = objectify.PyType( + "datetime", parse_date, DatetimeElement, stringify_date) + datetime_type.xmlSchemaTypes = "dateTime" + datetime_type.register() + + NAMESPACE = "http://foo.net/xmlns" + NAMESPACE_MAP = {'ns': NAMESPACE} + + r = objectify.Element("{%s}root" % NAMESPACE, nsmap=NAMESPACE_MAP) + time = datetime.now() + r.date = time + + self.assert_(isinstance(r.date, DatetimeElement)) + self.assert_(isinstance(r.date.pyval, datetime)) + + self.assertEquals(r.date.pyval, parse_date(stringify_date(time))) + self.assertEquals(r.date.text, stringify_date(time)) - pytype.register(before = [objectify.getRegisteredTypes()[0].name]) - self.assertEquals(pytype, objectify.getRegisteredTypes()[0]) - pytype.unregister() + r.date = objectify.E.date(time) - pytype.register(after = [objectify.getRegisteredTypes()[0].name]) - self.assertNotEqual(pytype, 
objectify.getRegisteredTypes()[0]) - pytype.unregister() + self.assert_(isinstance(r.date, DatetimeElement)) + self.assert_(isinstance(r.date.pyval, datetime)) - self.assertRaises(ValueError, pytype.register, - before = [objectify.getRegisteredTypes()[0].name], - after = [objectify.getRegisteredTypes()[1].name]) - - finally: - for pytype in objectify.getRegisteredTypes(): - pytype.unregister() - for pytype in orig_types: - pytype.register() + self.assertEquals(r.date.pyval, parse_date(stringify_date(time))) + self.assertEquals(r.date.text, stringify_date(time)) def test_object_path(self): root = self.XML(xml_str) From scoder at codespeak.net Fri Mar 14 08:08:47 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 14 Mar 2008 08:08:47 +0100 (CET) Subject: [Lxml-checkins] r52473 - in lxml/trunk: . benchmark Message-ID: <20080314070847.69BBC169EE7@codespeak.net> Author: scoder Date: Fri Mar 14 08:08:45 2008 New Revision: 52473 Modified: lxml/trunk/ (props changed) lxml/trunk/benchmark/bench_etree.py lxml/trunk/benchmark/bench_objectify.py lxml/trunk/benchmark/bench_xpath.py lxml/trunk/benchmark/bench_xslt.py lxml/trunk/benchmark/benchbase.py Log: r3783 at delle: sbehnel | 2008-03-14 08:07:45 +0100 benchmark fixes Modified: lxml/trunk/benchmark/bench_etree.py ============================================================================== --- lxml/trunk/benchmark/bench_etree.py (original) +++ lxml/trunk/benchmark/bench_etree.py Fri Mar 14 08:08:45 2008 @@ -12,7 +12,7 @@ # Benchmarks ############################################################ -class BenchMark(benchbase.BenchMarkBase): +class BenchMark(benchbase.TreeBenchMark): def bench_iter_children(self, root): for child in root: pass Modified: lxml/trunk/benchmark/bench_objectify.py ============================================================================== --- lxml/trunk/benchmark/bench_objectify.py (original) +++ lxml/trunk/benchmark/bench_objectify.py Fri Mar 14 08:08:45 2008 @@ -3,13 +3,14 @@ 
from StringIO import StringIO import benchbase -from benchbase import with_attributes, with_text, onlylib, serialized +from benchbase import with_attributes, with_text, onlylib, serialized, children ############################################################ # Benchmarks ############################################################ -class BenchMark(benchbase.BenchMarkBase): +class BenchMark(benchbase.TreeBenchMark): + repeat100 = range(100) repeat1000 = range(1000) repeat3000 = range(3000) @@ -96,6 +97,17 @@ for i in self.repeat1000: el.getchildren() + @children + def bench_elementmaker(self, children): + E = self.objectify.E + for child in children: + root = E.this( + "test", + E.will( + E.do("nothing"), + E.special, + ) + ) if __name__ == '__main__': benchbase.main(BenchMark) Modified: lxml/trunk/benchmark/bench_xpath.py ============================================================================== --- lxml/trunk/benchmark/bench_xpath.py (original) +++ lxml/trunk/benchmark/bench_xpath.py Fri Mar 14 08:08:45 2008 @@ -9,7 +9,7 @@ # Benchmarks ############################################################ -class XPathBenchMark(benchbase.BenchMarkBase): +class XPathBenchMark(benchbase.TreeBenchMark): @onlylib('lxe') @children def bench_xpath_class(self, children): Modified: lxml/trunk/benchmark/bench_xslt.py ============================================================================== --- lxml/trunk/benchmark/bench_xslt.py (original) +++ lxml/trunk/benchmark/bench_xslt.py Fri Mar 14 08:08:45 2008 @@ -9,7 +9,7 @@ # Benchmarks ############################################################ -class XSLTBenchMark(benchbase.BenchMarkBase): +class XSLTBenchMark(benchbase.TreeBenchMark): @onlylib('lxe') def bench_xslt_extensions_old(self, root): tree = self.etree.XML("""\ Modified: lxml/trunk/benchmark/benchbase.py ============================================================================== --- lxml/trunk/benchmark/benchbase.py (original) +++ lxml/trunk/benchmark/benchbase.py 
Fri Mar 14 08:08:45 2008 @@ -90,7 +90,7 @@ class SkippedTest(Exception): pass -class BenchMarkBase(object): +class TreeBenchMark(object): atoz = string.ascii_lowercase _LIB_NAME_MAP = { @@ -123,8 +123,8 @@ setattr(self, fname + '_xml', lambda : xml) setattr(self, fname + '_children', lambda : root[:]) - attribute_list = list(izip(count(), ({}, _ATTRIBUTES))) - text_list = list(izip(count(), (None, _TEXT, _UTEXT))) + attribute_list = list(enumerate( [{}, _ATTRIBUTES] )) + text_list = list(enumerate( [None, _TEXT, _UTEXT] )) build_name = self._tree_builder_name self.setup_times = [] @@ -447,9 +447,8 @@ pass else: # use fast element creation in lxml.etree - from lxml.elements import classlookup - classlookup.setElementClassLookup( - classlookup.ElementDefaultClassLookup()) + etree.set_element_class_lookup( + etree.ElementDefaultClassLookup()) if len(sys.argv) > 1: if '-a' in sys.argv or '-c' in sys.argv: From scoder at codespeak.net Sun Mar 16 11:50:15 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 16 Mar 2008 11:50:15 +0100 (CET) Subject: [Lxml-checkins] r52573 - lxml/trunk Message-ID: <20080316105015.146BB16A13C@codespeak.net> Author: scoder Date: Sun Mar 16 11:50:14 2008 New Revision: 52573 Modified: lxml/trunk/ (props changed) lxml/trunk/MANIFEST.in Log: r3785 at delle: sbehnel | 2008-03-14 16:34:52 +0100 removed reference to unused file Modified: lxml/trunk/MANIFEST.in ============================================================================== --- lxml/trunk/MANIFEST.in (original) +++ lxml/trunk/MANIFEST.in Sun Mar 16 11:50:14 2008 @@ -12,4 +12,3 @@ recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png recursive-include fake_pyrex *.py include doc/mkhtml.py doc/rest2html.py -exclude doc/pyrex.txt From scoder at codespeak.net Sun Mar 16 11:50:19 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 16 Mar 2008 11:50:19 +0100 (CET) Subject: [Lxml-checkins] r52574 - in lxml/trunk: . 
doc Message-ID: <20080316105019.DF07F16A13D@codespeak.net> Author: scoder Date: Sun Mar 16 11:50:19 2008 New Revision: 52574 Removed: lxml/trunk/doc/pyrex.txt Modified: lxml/trunk/ (props changed) Log: r3786 at delle: sbehnel | 2008-03-14 16:36:24 +0100 outdated doc file Deleted: /lxml/trunk/doc/pyrex.txt ============================================================================== --- /lxml/trunk/doc/pyrex.txt Sun Mar 16 11:50:19 2008 +++ (empty file) @@ -1,25 +0,0 @@ -Notes on Pyrex -============== - -The lxml wrapper around libxml2 and libxslt is written in Pyrex_. However, -there are known issues with the current version of Pyrex (0.9.3.1) and version -4.x of gcc. Most Linux distributions have the necessary patches applied, but -there is still a certain chance yours hasn't. Also, MacOS-X is known to ship -with GCC 4, so users may run into problems when compiling Pyrex generated code -on this system. If the C compiler fails to compile the file src/lxml/etree.c, -you likely have used an unpatched version of Pyrex to build it. - -There are two ways to get around this problem. First of all, if you are using -a release version of lxml, it should come with the generated C file in the -source distribution. There is no need to regenerate it using Pyrex. - -However, if you want to use more recent SVN versions of lxml or want to work -on the code, you will need Pyrex to regenerate the C-code. If your version of -Pyrex is not patched, you may try to apply the patch that ships with lxml and -is also part of the SVN checkouts. It should fix the remaining problems. -Apply it to the 0.9.3.1 version of Pyrex, rebuild and install it. If the -problems persist, please report to the lxml mailing list. Try to provide a -clear description of what you did to run into the problems and provide the -compiler output that shows the error. - -.. 
_Pyrex: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/ From scoder at codespeak.net Sun Mar 16 11:50:23 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 16 Mar 2008 11:50:23 +0100 (CET) Subject: [Lxml-checkins] r52575 - lxml/trunk Message-ID: <20080316105023.2693C16A13E@codespeak.net> Author: scoder Date: Sun Mar 16 11:50:22 2008 New Revision: 52575 Modified: lxml/trunk/ (props changed) lxml/trunk/setupinfo.py Log: r3787 at delle: sbehnel | 2008-03-16 11:03:01 +0100 check dependencies on rebuild (if Cython supports it) Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Sun Mar 16 11:50:22 2008 @@ -1,4 +1,4 @@ -import sys, os +import sys, os, os.path from distutils.core import Extension try: @@ -10,6 +10,8 @@ EXT_MODULES = ["lxml.etree", "lxml.objectify"] +PACKAGE_PATH = "src/lxml/" + def env_var(name): value = os.getenv(name, '') return value.split(os.pathsep) @@ -20,12 +22,12 @@ print("Building with Cython %s." % Cython.Compiler.Version.version) else: print ("NOTE: Trying to build without Cython, pre-generated " - "'src/lxml/etree.c' needs to be available.") + "'%setree.c' needs to be available." 
% PACKAGE_PATH) source_extension = ".c" if OPTION_WITHOUT_OBJECTIFY: modules = [ entry for entry in EXT_MODULES - if 'objectify' not in entry[0] ] + if 'objectify' not in entry ] else: modules = EXT_MODULES @@ -60,10 +62,12 @@ result = [] for module in modules: + main_module_source = PACKAGE_PATH + module + source_extension + dependencies = find_dependencies(module) result.append( Extension( module, - sources = ["src/lxml/" + module + source_extension], + sources = [main_module_source] + dependencies, extra_compile_args = ['-w'] + _cflags, define_macros = _define_macros, include_dirs = _include_dirs, @@ -73,6 +77,31 @@ )) return result +def find_dependencies(module): + if CYTHON_INSTALLED: + from Cython.Compiler.Version import version + if tuple(version.split('.')) <= (0,9,6,12): + return [] + + package_dir = os.path.join(get_base_dir(), PACKAGE_PATH) + files = os.listdir(package_dir) + pxd_files = [ os.path.join(PACKAGE_PATH, filename) for filename in files + if filename.endswith('.pxd') ] + + if 'etree' in module: + pxi_files = [ os.path.join(PACKAGE_PATH, filename) + for filename in files + if filename.endswith('.pxi') + and 'objectpath' not in filename ] + pxd_files = [ filename for filename in pxd_files + if 'etreepublic' not in filename ] + elif 'objectify' in module: + pxi_files = [ os.path.join(PACKAGE_PATH, 'objectpath.pxi') ] + else: + pxi_files = [] + + return pxd_files + pxi_files + def extra_setup_args(): result = {} if CYTHON_INSTALLED: @@ -226,6 +255,9 @@ except ValueError: return False +def get_base_dir(): + return os.path.join(os.getcwd(), os.path.dirname(sys.argv[0])) + # pick up any commandline options OPTION_WITHOUT_OBJECTIFY = has_option('without-objectify') OPTION_WITHOUT_ASSERT = has_option('without-assert') From scoder at codespeak.net Sun Mar 16 11:50:28 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 16 Mar 2008 11:50:28 +0100 (CET) Subject: [Lxml-checkins] r52576 - in lxml/trunk: . 
doc src/lxml/html Message-ID: <20080316105028.AC9E116A13E@codespeak.net> Author: scoder Date: Sun Mar 16 11:50:28 2008 New Revision: 52576 Added: lxml/trunk/src/lxml/html/soupparser.py - copied, changed from r51012, lxml/trunk/src/lxml/html/ElementSoup.py Removed: lxml/trunk/src/lxml/html/ElementSoup.py Modified: lxml/trunk/ (props changed) lxml/trunk/doc/elementsoup.txt Log: r3788 at delle: sbehnel | 2008-03-16 11:49:19 +0100 split of ElementSoup module: soupparser.py with consistent API, ElementSoup.py as legace module Modified: lxml/trunk/doc/elementsoup.txt ============================================================================== --- lxml/trunk/doc/elementsoup.txt (original) +++ lxml/trunk/doc/elementsoup.txt Sun Mar 16 11:50:28 2008 @@ -2,9 +2,6 @@ BeautifulSoup Parser ==================== -:Author: - Stefan Behnel - BeautifulSoup_ is a Python package that parses broken HTML. While libxml2 (and thus lxml) can also parse broken HTML, BeautifulSoup is much more forgiving and has superiour `support for encoding detection`_. @@ -12,24 +9,28 @@ .. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/ .. _`support for encoding detection`: http://www.crummy.com/software/BeautifulSoup/documentation.html#Beautiful%20Soup%20Gives%20You%20Unicode,%20Dammit -lxml can benefit from the parsing capabilities of BeautifulSoup through the -`lxml.html.ElementSoup` module. It provides two main functions: `parse()` to -parse a file using BeautifulSoup, and `convert_tree()` to convert a +lxml can benefit from the parsing capabilities of BeautifulSoup +through the ``lxml.html.soupparser`` module. It provides three main +functions: ``fromstring()`` and ``parse()`` to parse a string or file +using BeautifulSoup, and `convert_tree()` to convert an existing BeautifulSoup tree into a list of top-level Elements. +The functions ``fromstring()`` and ``parse()`` behave as known from +ElementTree. The first returns a root Element, the latter returns an +ElementTree. 
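The return-type split described above (``fromstring()`` gives a root Element, ``parse()`` gives an ElementTree) follows ElementTree's own convention. The sketch below uses the standard library's ``xml.etree.ElementTree`` as a stand-in, so it shows only the two return types and not BeautifulSoup's tag-soup handling. ``legacy_parse()`` is a hypothetical helper modelled on the legacy ``ElementSoup.parse()`` wrapper, which calls the new ``parse()`` and returns ``getroot()``.

```python
import io
import xml.etree.ElementTree as ET

DOC = "<html><body>Hi all</body></html>"

# fromstring(): string in, root Element out.
root = ET.fromstring(DOC)

# parse(): file name or file object in, ElementTree out.
# The root Element is one getroot() call away.
tree = ET.parse(io.StringIO(DOC))
tree_root = tree.getroot()


def legacy_parse(source):
    """Hypothetical wrapper in the style of the legacy ElementSoup.parse():
    it accepts the same input as parse() but returns the root Element."""
    return ET.parse(source).getroot()


legacy_root = legacy_parse(io.StringIO(DOC))
print(root.tag, type(tree).__name__, legacy_root.tag)
```

Keeping the legacy behaviour in a thin wrapper lets the new module expose the ElementTree-style API without breaking code written against the older interface.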
+ Here is a document full of tag soup, similar to, but not quite like, HTML: .. sourcecode:: pycon >>> tag_soup = '<meta><head><title>Hello</head<body onload=crash()>Hi all<p>' -all you need to do is pass it to the `parse()` function: +all you need to do is pass it to the ``fromstring()`` function: .. sourcecode:: pycon - >>> from lxml.html.ElementSoup import parse - >>> from StringIO import StringIO - >>> root = parse(StringIO(tag_soup)) + >>> from lxml.html.soupparser import fromstring + >>> root = fromstring(tag_soup) To see what we have here, you can serialise it: @@ -49,5 +50,10 @@ already, right? BeautifulSoup did its best, and so now it's a tree. To control which Element implementation is used, you can pass a -``makeelement`` factory function to ``parse()``. By default, this is based on -the HTML parser defined in ``lxml.html``. +``makeelement`` factory function to ``parse()`` and ``fromstring()``. +By default, this is based on the HTML parser defined in ``lxml.html``. + +There is also a legacy module called ``ElementSoup``, which mimics the +interface provided by ElementTree's own ElementSoup_ module. + +.. _ElementSoup: http://effbot.org/zone/element-soup.htm Deleted: /lxml/trunk/src/lxml/html/ElementSoup.py ============================================================================== --- /lxml/trunk/src/lxml/html/ElementSoup.py Sun Mar 16 11:50:28 2008 +++ (empty file) @@ -1,94 +0,0 @@ -__doc__ = """External interface to the BeautifulSoup HTML parser. 
-""" - -__all__ = ["parse", "convert_tree"] - -from lxml import etree, html -from BeautifulSoup import \ - BeautifulSoup, Tag, Comment, ProcessingInstruction, NavigableString - - -def parse(file, beautifulsoup=None, makeelement=None): - if beautifulsoup is None: - beautifulsoup = BeautifulSoup - if makeelement is None: - makeelement = html.html_parser.makeelement - if not hasattr(file, 'read'): - file = open(file) - tree = beautifulsoup(file) - root = _convert_tree(tree, makeelement) - # from ET: wrap the document in a html root element, if necessary - if len(root) == 1 and root[0].tag == "html": - return root[0] - root.tag = "html" - return root - -def convert_tree(beautiful_soup_tree, makeelement=None): - if makeelement is None: - makeelement = html.html_parser.makeelement - root = _convert_tree(beautiful_soup_tree, makeelement) - children = root.getchildren() - for child in children: - root.remove(child) - return children - - -# helpers - -def _convert_tree(beautiful_soup_tree, makeelement): - root = makeelement(beautiful_soup_tree.name, - attrib=dict(beautiful_soup_tree.attrs)) - _convert_children(root, beautiful_soup_tree, makeelement) - return root - -def _convert_children(parent, beautiful_soup_tree, makeelement): - SubElement = etree.SubElement - et_child = None - for child in beautiful_soup_tree: - if isinstance(child, Tag): - et_child = SubElement(parent, child.name, attrib=dict( - [(k, unescape(v)) for (k,v) in child.attrs])) - _convert_children(et_child, child, makeelement) - elif type(child) is NavigableString: - _append_text(parent, et_child, unescape(unicode(child))) - else: - if isinstance(child, Comment): - parent.append(etree.Comment(child.string)) - elif isinstance(child, ProcessingInstruction): - parent.append(etree.ProcessingInstruction( - *child.string.split(' ', 1))) - else: # CData - _append_text(parent, et_child, unescape(unicode(child))) - -def _append_text(parent, element, text): - if element is None: - parent.text = (parent.text or '') + 
text - else: - element.tail = (element.tail or '') + text - - -# copied from ET's ElementSoup - -import htmlentitydefs, re - -handle_entities = re.compile("&(\w+);").sub - -try: - name2codepoint = htmlentitydefs.name2codepoint -except AttributeError: - # Emulate name2codepoint for Python 2.2 and earlier - name2codepoint = {} - for name, entity in htmlentitydefs.entitydefs.items(): - if len(entity) == 1: - name2codepoint[name] = ord(entity) - else: - name2codepoint[name] = int(entity[2:-1]) - -def unescape(string): - # work around oddities in BeautifulSoup's entity handling - def unescape_entity(m): - try: - return unichr(name2codepoint[m.group(1)]) - except KeyError: - return m.group(0) # use as is - return handle_entities(unescape_entity, string) Copied: lxml/trunk/src/lxml/html/soupparser.py (from r51012, lxml/trunk/src/lxml/html/ElementSoup.py) ============================================================================== --- lxml/trunk/src/lxml/html/ElementSoup.py (original) +++ lxml/trunk/src/lxml/html/soupparser.py Sun Mar 16 11:50:28 2008 @@ -1,29 +1,50 @@ __doc__ = """External interface to the BeautifulSoup HTML parser. """ -__all__ = ["parse", "convert_tree"] +__all__ = ["fromstring", "parse", "convert_tree"] from lxml import etree, html from BeautifulSoup import \ BeautifulSoup, Tag, Comment, ProcessingInstruction, NavigableString +def fromstring(data, beautifulsoup=None, makeelement=None): + """Parse a string of HTML data into an Element tree using the + BeautifulSoup parser. + + Returns the root ``<html>`` Element of the tree. + + You can pass a different BeautifulSoup parser through the + `beautifulsoup` keyword, and a diffent Element factory function + through the `makeelement` keyword. By default, the standard + ``BeautifulSoup`` class and the default factory of `lxml.html` are + used. 
+ """ + return _parse(data, beautifulsoup, makeelement) + def parse(file, beautifulsoup=None, makeelement=None): - if beautifulsoup is None: - beautifulsoup = BeautifulSoup - if makeelement is None: - makeelement = html.html_parser.makeelement + """Parse a file into an ElemenTree using the BeautifulSoup parser. + + You can pass a different BeautifulSoup parser through the + `beautifulsoup` keyword, and a diffent Element factory function + through the `makeelement` keyword. By default, the standard + ``BeautifulSoup`` class and the default factory of `lxml.html` are + used. + """ if not hasattr(file, 'read'): file = open(file) - tree = beautifulsoup(file) - root = _convert_tree(tree, makeelement) - # from ET: wrap the document in a html root element, if necessary - if len(root) == 1 and root[0].tag == "html": - return root[0] - root.tag = "html" - return root + root = _parse(file, beautifulsoup, makeelement) + return etree.ElementTree(root) def convert_tree(beautiful_soup_tree, makeelement=None): + """Convert a BeautifulSoup tree to a list of Element trees. + + Returns a list instead of a single root Element to support + HTML-like soup with more than one root element. + + You can pass a different Element factory through the `makeelement` + keyword. 
+ """ if makeelement is None: makeelement = html.html_parser.makeelement root = _convert_tree(beautiful_soup_tree, makeelement) @@ -35,6 +56,19 @@ # helpers +def _parse(source, beautifulsoup, makeelement): + if beautifulsoup is None: + beautifulsoup = BeautifulSoup + if makeelement is None: + makeelement = html.html_parser.makeelement + tree = beautifulsoup(source) + root = _convert_tree(tree, makeelement) + # from ET: wrap the document in a html root element, if necessary + if len(root) == 1 and root[0].tag == "html": + return root[0] + root.tag = "html" + return root + def _convert_tree(beautiful_soup_tree, makeelement): root = makeelement(beautiful_soup_tree.name, attrib=dict(beautiful_soup_tree.attrs)) From scoder at codespeak.net Sun Mar 16 12:04:03 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 16 Mar 2008 12:04:03 +0100 (CET) Subject: [Lxml-checkins] r52578 - in lxml/branch/lxml-2.0: . doc src/lxml/html Message-ID: <20080316110403.0100C16A130@codespeak.net> Author: scoder Date: Sun Mar 16 12:04:03 2008 New Revision: 52578 Added: lxml/branch/lxml-2.0/src/lxml/html/soupparser.py - copied unchanged from r52576, lxml/trunk/src/lxml/html/soupparser.py Removed: lxml/branch/lxml-2.0/src/lxml/html/ElementSoup.py Modified: lxml/branch/lxml-2.0/CHANGES.txt lxml/branch/lxml-2.0/doc/elementsoup.txt Log: merge -c 52576 Modified: lxml/branch/lxml-2.0/CHANGES.txt ============================================================================== --- lxml/branch/lxml-2.0/CHANGES.txt (original) +++ lxml/branch/lxml-2.0/CHANGES.txt Sun Mar 16 12:04:03 2008 @@ -17,6 +17,10 @@ Other changes ------------- +* ``lxml.html.ElementSoup`` was replaced by a new module + ``lxml.html.soupparser`` with a more consistent API. The old module + remains for compatibility with ElementTree's own ElementSoup module. 
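The private helpers that move into soupparser.py do two small jobs while a BeautifulSoup tree is converted. ``_append_text()`` puts text into the parent's ``.text`` when no element child has been seen yet, and into the previous child's ``.tail`` otherwise. ``_parse()`` makes sure the result is rooted at an ``<html>`` element. Here is a minimal sketch of both rules, assuming the standard library's ``xml.etree.ElementTree`` in place of lxml. The names ``append_text`` and ``ensure_html_root`` are hypothetical, and the placeholder tag ``root`` stands in for BeautifulSoup's document node.

```python
import xml.etree.ElementTree as ET


def append_text(parent, last_child, text):
    # Before the first child element, text belongs to parent.text;
    # after a child, it belongs to that child's tail.
    if last_child is None:
        parent.text = (parent.text or '') + text
    else:
        last_child.tail = (last_child.tail or '') + text


def ensure_html_root(root):
    # Reuse a lone <html> child as the root; otherwise rename the
    # container itself so the tree is always rooted at <html>.
    if len(root) == 1 and root[0].tag == "html":
        return root[0]
    root.tag = "html"
    return root


# Simulate converting the soup "Hi <b>all</b>!" into a placeholder container.
container = ET.Element("root")
append_text(container, None, "Hi ")
bold = ET.SubElement(container, "b")
bold.text = "all"
append_text(container, bold, "!")
html_root = ensure_html_root(container)
print(ET.tostring(html_root, encoding="unicode"))  # <html>Hi <b>all</b>!</html>
```

The public ``convert_tree()`` skips the wrapping step and returns the top-level nodes as a list instead, which is why it can handle soup with more than one root element.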
+ * Setting the XSLT_CONFIG and XML2_CONFIG environment variables at build time will let setup.py pick up the ``xml2-config`` and ``xslt-config`` scripts from the supplied path name. Modified: lxml/branch/lxml-2.0/doc/elementsoup.txt ============================================================================== --- lxml/branch/lxml-2.0/doc/elementsoup.txt (original) +++ lxml/branch/lxml-2.0/doc/elementsoup.txt Sun Mar 16 12:04:03 2008 @@ -2,9 +2,6 @@ BeautifulSoup Parser ==================== -:Author: - Stefan Behnel - BeautifulSoup_ is a Python package that parses broken HTML. While libxml2 (and thus lxml) can also parse broken HTML, BeautifulSoup is much more forgiving and has superiour `support for encoding detection`_. @@ -12,22 +9,32 @@ .. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/ .. _`support for encoding detection`: http://www.crummy.com/software/BeautifulSoup/documentation.html#Beautiful%20Soup%20Gives%20You%20Unicode,%20Dammit -lxml can benefit from the parsing capabilities of BeautifulSoup through the -`lxml.html.ElementSoup` module. It provides two main functions: `parse()` to -parse a file using BeautifulSoup, and `convert_tree()` to convert a +lxml can benefit from the parsing capabilities of BeautifulSoup +through the ``lxml.html.soupparser`` module. It provides three main +functions: ``fromstring()`` and ``parse()`` to parse a string or file +using BeautifulSoup, and `convert_tree()` to convert an existing BeautifulSoup tree into a list of top-level Elements. -Here is a document full of tag soup, similar to, but not quite like, HTML:: +The functions ``fromstring()`` and ``parse()`` behave as known from +ElementTree. The first returns a root Element, the latter returns an +ElementTree. + +Here is a document full of tag soup, similar to, but not quite like, HTML: + +.. 
sourcecode:: pycon >>> tag_soup = '<meta><head><title>Hello</head<body onload=crash()>Hi all<p>' -all you need to do is pass it to the `parse()` function:: +all you need to do is pass it to the ``fromstring()`` function: + +.. sourcecode:: pycon - >>> from lxml.html.ElementSoup import parse - >>> from StringIO import StringIO - >>> root = parse(StringIO(tag_soup)) + >>> from lxml.html.soupparser import fromstring + >>> root = fromstring(tag_soup) -To see what we have here, you can serialise it:: +To see what we have here, you can serialise it: + +.. sourcecode:: pycon >>> from lxml.etree import tostring >>> print tostring(root, pretty_print=True), @@ -43,5 +50,10 @@ already, right? BeautifulSoup did its best, and so now it's a tree. To control which Element implementation is used, you can pass a -``makeelement`` factory function to ``parse()``. By default, this is based on -the HTML parser defined in ``lxml.html``. +``makeelement`` factory function to ``parse()`` and ``fromstring()``. +By default, this is based on the HTML parser defined in ``lxml.html``. + +There is also a legacy module called ``ElementSoup``, which mimics the +interface provided by ElementTree's own ElementSoup_ module. + +.. _ElementSoup: http://effbot.org/zone/element-soup.htm Deleted: /lxml/branch/lxml-2.0/src/lxml/html/ElementSoup.py ============================================================================== --- /lxml/branch/lxml-2.0/src/lxml/html/ElementSoup.py Sun Mar 16 12:04:03 2008 +++ (empty file) @@ -1,94 +0,0 @@ -__doc__ = """External interface to the BeautifulSoup HTML parser. 
-""" - -__all__ = ["parse", "convert_tree"] - -from lxml import etree, html -from BeautifulSoup import \ - BeautifulSoup, Tag, Comment, ProcessingInstruction, NavigableString - - -def parse(file, beautifulsoup=None, makeelement=None): - if beautifulsoup is None: - beautifulsoup = BeautifulSoup - if makeelement is None: - makeelement = html.html_parser.makeelement - if not hasattr(file, 'read'): - file = open(file) - tree = beautifulsoup(file) - root = _convert_tree(tree, makeelement) - # from ET: wrap the document in a html root element, if necessary - if len(root) == 1 and root[0].tag == "html": - return root[0] - root.tag = "html" - return root - -def convert_tree(beautiful_soup_tree, makeelement=None): - if makeelement is None: - makeelement = html.html_parser.makeelement - root = _convert_tree(beautiful_soup_tree, makeelement) - children = root.getchildren() - for child in children: - root.remove(child) - return children - - -# helpers - -def _convert_tree(beautiful_soup_tree, makeelement): - root = makeelement(beautiful_soup_tree.name, - attrib=dict(beautiful_soup_tree.attrs)) - _convert_children(root, beautiful_soup_tree, makeelement) - return root - -def _convert_children(parent, beautiful_soup_tree, makeelement): - SubElement = etree.SubElement - et_child = None - for child in beautiful_soup_tree: - if isinstance(child, Tag): - et_child = SubElement(parent, child.name, attrib=dict( - [(k, unescape(v)) for (k,v) in child.attrs])) - _convert_children(et_child, child, makeelement) - elif type(child) is NavigableString: - _append_text(parent, et_child, unescape(unicode(child))) - else: - if isinstance(child, Comment): - parent.append(etree.Comment(child.string)) - elif isinstance(child, ProcessingInstruction): - parent.append(etree.ProcessingInstruction( - *child.string.split(' ', 1))) - else: # CData - _append_text(parent, et_child, unescape(unicode(child))) - -def _append_text(parent, element, text): - if element is None: - parent.text = (parent.text or '') + 
text - else: - element.tail = (element.tail or '') + text - - -# copied from ET's ElementSoup - -import htmlentitydefs, re - -handle_entities = re.compile("&(\w+);").sub - -try: - name2codepoint = htmlentitydefs.name2codepoint -except AttributeError: - # Emulate name2codepoint for Python 2.2 and earlier - name2codepoint = {} - for name, entity in htmlentitydefs.entitydefs.items(): - if len(entity) == 1: - name2codepoint[name] = ord(entity) - else: - name2codepoint[name] = int(entity[2:-1]) - -def unescape(string): - # work around oddities in BeautifulSoup's entity handling - def unescape_entity(m): - try: - return unichr(name2codepoint[m.group(1)]) - except KeyError: - return m.group(0) # use as is - return handle_entities(unescape_entity, string) From scoder at codespeak.net Sun Mar 16 12:05:13 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 16 Mar 2008 12:05:13 +0100 (CET) Subject: [Lxml-checkins] r52579 - in lxml/trunk: . src/lxml/html Message-ID: <20080316110513.0164416A130@codespeak.net> Author: scoder Date: Sun Mar 16 12:05:13 2008 New Revision: 52579 Added: lxml/trunk/src/lxml/html/ElementSoup.py Modified: lxml/trunk/ (props changed) Log: r3793 at delle: sbehnel | 2008-03-16 12:04:17 +0100 missing module Added: lxml/trunk/src/lxml/html/ElementSoup.py ============================================================================== --- (empty file) +++ lxml/trunk/src/lxml/html/ElementSoup.py Sun Mar 16 12:05:13 2008 @@ -0,0 +1,10 @@ +__doc__ = """Legacy interface to the BeautifulSoup HTML parser. 
+""" + +__all__ = ["parse", "convert_tree"] + +from soupparser import convert_tree, parse as _parse + +def parse(file, beautifulsoup=None, makeelement=None): + root = _parse(file, beautifulsoup=beautifulsoup, makeelement=makeelement) + return root.getroot() From scoder at codespeak.net Sun Mar 16 12:06:07 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 16 Mar 2008 12:06:07 +0100 (CET) Subject: [Lxml-checkins] r52580 - lxml/branch/lxml-2.0/src/lxml/html Message-ID: <20080316110607.21F8116A130@codespeak.net> Author: scoder Date: Sun Mar 16 12:06:06 2008 New Revision: 52580 Added: lxml/branch/lxml-2.0/src/lxml/html/ElementSoup.py - copied unchanged from r52579, lxml/trunk/src/lxml/html/ElementSoup.py Log: merge -c 52579 From scoder at codespeak.net Sun Mar 16 12:08:13 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 16 Mar 2008 12:08:13 +0100 (CET) Subject: [Lxml-checkins] r52581 - lxml/branch/lxml-2.0 Message-ID: <20080316110813.3D22116A130@codespeak.net> Author: scoder Date: Sun Mar 16 12:08:12 2008 New Revision: 52581 Modified: lxml/branch/lxml-2.0/setupinfo.py Log: merge -c 52425 Modified: lxml/branch/lxml-2.0/setupinfo.py ============================================================================== --- lxml/branch/lxml-2.0/setupinfo.py (original) +++ lxml/branch/lxml-2.0/setupinfo.py Sun Mar 16 12:08:12 2008 @@ -30,8 +30,12 @@ modules = EXT_MODULES lib_versions = get_library_versions() - print("Using build configuration of libxml2 %s and libxslt %s" % - lib_versions) + if lib_versions[0]: + print("Using build configuration of libxml2 %s and libxslt %s" % + lib_versions) + else: + print("Using build configuration of libxslt %s" % + lib_versions[1]) _include_dirs = include_dirs(static_include_dirs) _library_dirs = library_dirs(static_library_dirs) @@ -144,7 +148,11 @@ _ERROR_PRINTED = False -def run_command(cmd): +def run_command(cmd, *args): + if not cmd: + return '' + if args: + cmd = ' '.join((cmd,) + args) try: 
import subprocess except ImportError: @@ -165,17 +173,13 @@ return (output or '').strip() def get_library_versions(): - cmd = "%s --version" % find_xml2_config() - xml2_version = run_command(cmd) - cmd = "%s --version" % find_xslt_config() - xslt_version = run_command(cmd) + xml2_version = run_command(find_xml2_config(), "--version") + xslt_version = run_command(find_xslt_config(), "--version") return xml2_version, xslt_version def flags(option): - cmd = "%s --%s" % (find_xml2_config(), option) - xml2_flags = run_command(cmd) - cmd = "%s --%s" % (find_xslt_config(), option) - xslt_flags = run_command(cmd) + xml2_flags = run_command(find_xml2_config(), "--%s" % option) + xslt_flags = run_command(find_xslt_config(), "--%s" % option) flag_list = xml2_flags.split() for flag in xslt_flags.split(): @@ -197,7 +201,8 @@ XML2_CONFIG = arg[len(option):] return XML2_CONFIG else: - XML2_CONFIG = os.getenv('XML2_CONFIG', 'xml2-config') + # default: do nothing, rely only on xslt-config + XML2_CONFIG = os.getenv('XML2_CONFIG', '') return XML2_CONFIG def find_xslt_config(): From scoder at codespeak.net Sun Mar 16 12:11:02 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 16 Mar 2008 12:11:02 +0100 (CET) Subject: [Lxml-checkins] r52582 - in lxml/branch/lxml-2.0: . 
doc Message-ID: <20080316111102.1583E16A130@codespeak.net> Author: scoder Date: Sun Mar 16 12:11:01 2008 New Revision: 52582 Removed: lxml/branch/lxml-2.0/doc/pyrex.txt Modified: lxml/branch/lxml-2.0/MANIFEST.in Log: merge -r 52573:52574 Modified: lxml/branch/lxml-2.0/MANIFEST.in ============================================================================== --- lxml/branch/lxml-2.0/MANIFEST.in (original) +++ lxml/branch/lxml-2.0/MANIFEST.in Sun Mar 16 12:11:01 2008 @@ -12,4 +12,3 @@ recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png recursive-include fake_pyrex *.py include doc/mkhtml.py doc/rest2html.py -exclude doc/pyrex.txt Deleted: /lxml/branch/lxml-2.0/doc/pyrex.txt ============================================================================== --- /lxml/branch/lxml-2.0/doc/pyrex.txt Sun Mar 16 12:11:01 2008 +++ (empty file) @@ -1,25 +0,0 @@ -Notes on Pyrex -============== - -The lxml wrapper around libxml2 and libxslt is written in Pyrex_. However, -there are known issues with the current version of Pyrex (0.9.3.1) and version -4.x of gcc. Most Linux distributions have the necessary patches applied, but -there is still a certain chance yours hasn't. Also, MacOS-X is known to ship -with GCC 4, so users may run into problems when compiling Pyrex generated code -on this system. If the C compiler fails to compile the file src/lxml/etree.c, -you likely have used an unpatched version of Pyrex to build it. - -There are two ways to get around this problem. First of all, if you are using -a release version of lxml, it should come with the generated C file in the -source distribution. There is no need to regenerate it using Pyrex. - -However, if you want to use more recent SVN versions of lxml or want to work -on the code, you will need Pyrex to regenerate the C-code. If your version of -Pyrex is not patched, you may try to apply the patch that ships with lxml and -is also part of the SVN checkouts. It should fix the remaining problems. 
-Apply it to the 0.9.3.1 version of Pyrex, rebuild and install it. If the -problems persist, please report to the lxml mailing list. Try to provide a -clear description of what you did to run into the problems and provide the -compiler output that shows the error. - -.. _Pyrex: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/ From scoder at codespeak.net Sun Mar 16 13:53:25 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 16 Mar 2008 13:53:25 +0100 (CET) Subject: [Lxml-checkins] r52596 - in lxml/trunk: . src/lxml Message-ID: <20080316125325.766C216A149@codespeak.net> Author: scoder Date: Sun Mar 16 13:53:25 2008 New Revision: 52596 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/parser.pxi Log: r3799 at delle: sbehnel | 2008-03-16 13:52:28 +0100 cleanup, raise a more specific type error on an unparsable source Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Sun Mar 16 13:53:25 2008 @@ -1294,10 +1294,22 @@ cdef _Document _parseDocument(source, _BaseParser parser, base_url): cdef _Document doc + if _isString(source): + # parse the file directly from the filesystem + doc = _parseDocumentFromURL(_encodeFilename(source), parser) + # fix base URL if requested + if base_url is not None: + base_url = _encodeFilenameUTF8(base_url) + if doc._c_doc.URL is not NULL: + tree.xmlFree(doc._c_doc.URL) + doc._c_doc.URL = tree.xmlStrdup(_cstr(base_url)) + return doc + if base_url is not None: url = base_url else: url = _getFilenameForFile(source) + if hasattr(source, 'getvalue') and hasattr(source, 'tell'): # StringIO - reading from start? 
if source.tell() == 0: @@ -1309,16 +1321,7 @@ return _parseFilelikeDocument( source, _encodeFilenameUTF8(url), parser) - # Otherwise parse the file directly from the filesystem - filename = _encodeFilename(source) - doc = _parseDocumentFromURL(filename, parser) - # fix base URL if requested - if base_url is not None: - base_url = _encodeFilenameUTF8(base_url) - if doc._c_doc.URL is not NULL: - tree.xmlFree(doc._c_doc.URL) - doc._c_doc.URL = tree.xmlStrdup(_cstr(base_url)) - return doc + raise TypeError("cannot parse from '%s'" % python._fqtypename(source)) cdef _Document _parseDocumentFromURL(url, _BaseParser parser): cdef xmlDoc* c_doc From scoder at codespeak.net Sun Mar 16 13:54:08 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 16 Mar 2008 13:54:08 +0100 (CET) Subject: [Lxml-checkins] r52597 - lxml/branch/lxml-2.0/src/lxml Message-ID: <20080316125408.7843616A149@codespeak.net> Author: scoder Date: Sun Mar 16 13:54:06 2008 New Revision: 52597 Modified: lxml/branch/lxml-2.0/src/lxml/parser.pxi Log: merge -c 52596 Modified: lxml/branch/lxml-2.0/src/lxml/parser.pxi ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/parser.pxi (original) +++ lxml/branch/lxml-2.0/src/lxml/parser.pxi Sun Mar 16 13:54:06 2008 @@ -1302,10 +1302,22 @@ cdef _Document _parseDocument(source, _BaseParser parser, base_url): cdef _Document doc + if _isString(source): + # parse the file directly from the filesystem + doc = _parseDocumentFromURL(_encodeFilename(source), parser) + # fix base URL if requested + if base_url is not None: + base_url = _encodeFilenameUTF8(base_url) + if doc._c_doc.URL is not NULL: + tree.xmlFree(doc._c_doc.URL) + doc._c_doc.URL = tree.xmlStrdup(_cstr(base_url)) + return doc + if base_url is not None: url = base_url else: url = _getFilenameForFile(source) + if hasattr(source, 'getvalue') and hasattr(source, 'tell'): # StringIO - reading from start? 
if source.tell() == 0: @@ -1317,16 +1329,7 @@ return _parseFilelikeDocument( source, _encodeFilenameUTF8(url), parser) - # Otherwise parse the file directly from the filesystem - filename = _encodeFilename(source) - doc = _parseDocumentFromURL(filename, parser) - # fix base URL if requested - if base_url is not None: - base_url = _encodeFilenameUTF8(base_url) - if doc._c_doc.URL is not NULL: - tree.xmlFree(doc._c_doc.URL) - doc._c_doc.URL = tree.xmlStrdup(_cstr(base_url)) - return doc + raise TypeError("cannot parse from '%s'" % python._fqtypename(source)) cdef _Document _parseDocumentFromURL(url, _BaseParser parser): cdef xmlDoc* c_doc From scoder at codespeak.net Mon Mar 17 13:43:48 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 17 Mar 2008 13:43:48 +0100 (CET) Subject: [Lxml-checkins] r52640 - in lxml/trunk: . src/lxml/tests Message-ID: <20080317124348.2DFF4169E71@codespeak.net> Author: scoder Date: Mon Mar 17 13:43:47 2008 New Revision: 52640 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/tests/test_elementtree.py Log: r3802 at delle: sbehnel | 2008-03-17 13:42:51 +0100 error test for parse(None) Modified: lxml/trunk/src/lxml/tests/test_elementtree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_elementtree.py (original) +++ lxml/trunk/src/lxml/tests/test_elementtree.py Mon Mar 17 13:43:47 2008 @@ -2708,6 +2708,10 @@ parse = self.etree.parse self.assertRaises(IOError, parse, fileInTestDir('notthere.xml')) + def test_parse_error_none(self): + parse = self.etree.parse + self.assertRaises(TypeError, parse, None) + def test_parse_error(self): # ET < 1.3 raises ExpatError parse = self.etree.parse From scoder at codespeak.net Mon Mar 17 17:19:20 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 17 Mar 2008 17:19:20 +0100 (CET) Subject: [Lxml-checkins] r52646 - lxml/trunk Message-ID: <20080317161920.CFE67169E19@codespeak.net> Author: scoder 
Date: Mon Mar 17 17:19:19 2008 New Revision: 52646 Modified: lxml/trunk/ (props changed) lxml/trunk/setupinfo.py lxml/trunk/versioninfo.py Log: r3804 at delle: sbehnel | 2008-03-17 17:09:07 +0100 setupinfo.py fix for testing Cython version Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Mon Mar 17 17:19:19 2008 @@ -1,6 +1,8 @@ import sys, os, os.path from distutils.core import Extension +from versioninfo import get_base_dir, split_version + try: from Cython.Distutils import build_ext as build_pyx import Cython.Compiler.Version @@ -78,10 +80,11 @@ return result def find_dependencies(module): - if CYTHON_INSTALLED: - from Cython.Compiler.Version import version - if tuple(version.split('.')) <= (0,9,6,12): - return [] + if not CYTHON_INSTALLED: + return [] + from Cython.Compiler.Version import version + if split_version(version) <= (0,9,6,12): + return [] package_dir = os.path.join(get_base_dir(), PACKAGE_PATH) files = os.listdir(package_dir) @@ -255,9 +258,6 @@ except ValueError: return False -def get_base_dir(): - return os.path.join(os.getcwd(), os.path.dirname(sys.argv[0])) - # pick up any commandline options OPTION_WITHOUT_OBJECTIFY = has_option('without-objectify') OPTION_WITHOUT_ASSERT = has_option('without-assert') Modified: lxml/trunk/versioninfo.py ============================================================================== --- lxml/trunk/versioninfo.py (original) +++ lxml/trunk/versioninfo.py Mon Mar 17 17:19:19 2008 @@ -5,7 +5,7 @@ def version(): global __LXML_VERSION if __LXML_VERSION is None: - __LXML_VERSION = open(os.path.join(get_src_dir(), 'version.txt')).read().strip() + __LXML_VERSION = open(os.path.join(get_base_dir(), 'version.txt')).read().strip() return __LXML_VERSION def branch_version(): @@ -17,7 +17,7 @@ def svn_version(): _version = version() - src_dir = get_src_dir() + src_dir = get_base_dir() revision = 0 
base_url = None @@ -89,7 +89,7 @@ """Extract part of changelog pertaining to version. """ _version = version() - f = open(os.path.join(get_src_dir(), "CHANGES.txt"), 'r') + f = open(os.path.join(get_base_dir(), "CHANGES.txt"), 'r') lines = [] for line in f: if line.startswith('====='): @@ -114,7 +114,7 @@ svn_version += '.0' version_h = open( - os.path.join(get_src_dir(), 'src', 'lxml', 'lxml-version.h'), + os.path.join(get_base_dir(), 'src', 'lxml', 'lxml-version.h'), 'w') version_h.write('''\ #ifndef LXML_VERSION_STRING @@ -123,10 +123,23 @@ ''' % svn_version) version_h.close() -def get_src_dir(): +def get_base_dir(): return os.path.join(os.getcwd(), os.path.dirname(sys.argv[0])) def fix_alphabeta(version, alphabeta): if ('.' + alphabeta) in version: return version return version.replace(alphabeta, '.' + alphabeta) + +def split_version(version): + find_digits = re.compile('([0-9]+)(.*)').match + l = [] + for part in version.split('.'): + try: + l.append( int(part) ) + except ValueError: + match = find_digits(part) + if match: + l.append( int(match.group(1)) ) + l.append( match.group(2) ) + return tuple(l) From scoder at codespeak.net Mon Mar 17 17:19:26 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 17 Mar 2008 17:19:26 +0100 (CET) Subject: [Lxml-checkins] r52647 - in lxml/trunk: . src/lxml Message-ID: <20080317161926.5CDB4169E19@codespeak.net> Author: scoder Date: Mon Mar 17 17:19:25 2008 New Revision: 52647 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/apihelpers.pxi Log: r3805 at delle: sbehnel | 2008-03-17 17:18:23 +0100 replaced _getFilenameForFile() by an equivalent implementation that seems to work better with GTK's threading Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Mon Mar 17 17:19:25 2008 @@ -1107,20 +1107,16 @@ Returns None if not a file object. 
""" # file instances have a name attribute - try: - return source.name - except AttributeError: - pass + filename = getattr3(source, 'name', None) + if filename is not None: + return filename # gzip file instances have a filename attribute - try: - return source.filename - except AttributeError: - pass + filename = getattr3(source, 'filename', None) + if filename is not None: + return filename # urllib2 provides a geturl() method - try: - geturl = source.geturl - except AttributeError: - # can't determine filename - return None - else: + geturl = getattr3(source, 'geturl', None) + if geturl is not None: return geturl() + # can't determine filename + return None From scoder at codespeak.net Wed Mar 19 07:44:45 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 19 Mar 2008 07:44:45 +0100 (CET) Subject: [Lxml-checkins] r52712 - in lxml/trunk: . 
doc src/lxml/html Message-ID: <20080319064445.1C0FB169EA2@codespeak.net> Author: scoder Date: Wed Mar 19 07:44:44 2008 New Revision: 52712 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/doc/elementsoup.txt lxml/trunk/src/lxml/html/soupparser.py Log: r3808 at delle: sbehnel | 2008-03-18 22:17:23 +0100 entity replacement fixes for soupparser/ElementSoup, some cleanup, support for passing keyword arguments on to BS Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Mar 19 07:44:44 2008 @@ -8,6 +8,11 @@ Features added -------------- +* ElementSoup/soupparser.parse() allows passing keyword arguments on + to BeautifulSoup. + +* ``fromstring()`` method in ``lxml.html.soupparser``. + * ``XSLTAccessControl`` instances have a property ``options`` that returns a dict of access configuration options. @@ -24,6 +29,9 @@ Bugs fixed ---------- +* The BeautifulSoup parser did not replace entities, which made them + turn up in text content. + * Attribute assignment of custom PyTypes in objectify could fail to correctly serialise the value to a string. Modified: lxml/trunk/doc/elementsoup.txt ============================================================================== --- lxml/trunk/doc/elementsoup.txt (original) +++ lxml/trunk/doc/elementsoup.txt Wed Mar 19 07:44:44 2008 @@ -53,7 +53,46 @@ ``makeelement`` factory function to ``parse()`` and ``fromstring()``. By default, this is based on the HTML parser defined in ``lxml.html``. -There is also a legacy module called ``ElementSoup``, which mimics the -interface provided by ElementTree's own ElementSoup_ module. +By default, the BeautifulSoup parser also replaces the entities it +finds by their character equivalent. + +.. 
sourcecode:: pycon + + >>> tag_soup = '<body>©€-õƽ<p>' + >>> body = fromstring(tag_soup).find('.//body') + >>> body.text + u'\xa9\u20ac-\xf5\u01bd' + +If you want them back on the way out, you can serialise with the +'html' method, which will always use escaping for safety reasons: + +.. sourcecode:: pycon + + >>> tostring(body, method="html") + '<body>©€-õƽ<p></p></body>' + + >>> tostring(body, method="html", encoding="utf-8") + '<body>©€-õƽ<p></p></body>' + + >>> tostring(body, method="html", encoding=unicode) + u'<body>©€-õƽ<p></p></body>' + +Otherwise, when serialising to XML, only the plain ASCII encoding will +escape non-ASCII characters: + +.. sourcecode:: pycon + + >>> tostring(body) + '<body>©€-õƽ<p/></body>' + + >>> tostring(body, encoding="utf-8") + '<body>\xc2\xa9\xe2\x82\xac-\xc3\xb5\xc6\xbd<p/></body>' + + >>> tostring(body, encoding=unicode) + u'<body>\xa9\u20ac-\xf5\u01bd<p/></body>' + +There is also a legacy module called ``lxml.html.ElementSoup``, which +mimics the interface provided by ElementTree's own ElementSoup_ +module. .. _ElementSoup: http://effbot.org/zone/element-soup.htm Modified: lxml/trunk/src/lxml/html/soupparser.py ============================================================================== --- lxml/trunk/src/lxml/html/soupparser.py (original) +++ lxml/trunk/src/lxml/html/soupparser.py Wed Mar 19 07:44:44 2008 @@ -8,7 +8,7 @@ BeautifulSoup, Tag, Comment, ProcessingInstruction, NavigableString -def fromstring(data, beautifulsoup=None, makeelement=None): +def fromstring(data, beautifulsoup=None, makeelement=None, **bsargs): """Parse a string of HTML data into an Element tree using the BeautifulSoup parser. @@ -20,9 +20,9 @@ ``BeautifulSoup`` class and the default factory of `lxml.html` are used. 
""" - return _parse(data, beautifulsoup, makeelement) + return _parse(data, beautifulsoup, makeelement, **bsargs) -def parse(file, beautifulsoup=None, makeelement=None): +def parse(file, beautifulsoup=None, makeelement=None, **bsargs): """Parse a file into an ElemenTree using the BeautifulSoup parser. You can pass a different BeautifulSoup parser through the @@ -33,7 +33,7 @@ """ if not hasattr(file, 'read'): file = open(file) - root = _parse(file, beautifulsoup, makeelement) + root = _parse(file, beautifulsoup, makeelement, **bsargs) return etree.ElementTree(root) def convert_tree(beautiful_soup_tree, makeelement=None): @@ -56,12 +56,14 @@ # helpers -def _parse(source, beautifulsoup, makeelement): +def _parse(source, beautifulsoup, makeelement, **bsargs): if beautifulsoup is None: beautifulsoup = BeautifulSoup if makeelement is None: makeelement = html.html_parser.makeelement - tree = beautifulsoup(source) + if 'convertEntities' not in bsargs: + bsargs['convertEntities'] = 'html' + tree = beautifulsoup(source, **bsargs) root = _convert_tree(tree, makeelement) # from ET: wrap the document in a html root element, if necessary if len(root) == 1 and root[0].tag == "html": @@ -84,15 +86,15 @@ [(k, unescape(v)) for (k,v) in child.attrs])) _convert_children(et_child, child, makeelement) elif type(child) is NavigableString: - _append_text(parent, et_child, unescape(unicode(child))) + _append_text(parent, et_child, unescape(child)) else: if isinstance(child, Comment): - parent.append(etree.Comment(child.string)) + parent.append(etree.Comment(child)) elif isinstance(child, ProcessingInstruction): parent.append(etree.ProcessingInstruction( - *child.string.split(' ', 1))) + *child.split(' ', 1))) else: # CData - _append_text(parent, et_child, unescape(unicode(child))) + _append_text(parent, et_child, unescape(child)) def _append_text(parent, element, text): if element is None: @@ -103,21 +105,11 @@ # copied from ET's ElementSoup -import htmlentitydefs, re +from htmlentitydefs 
import name2codepoint +import re handle_entities = re.compile("&(\w+);").sub -try: - name2codepoint = htmlentitydefs.name2codepoint -except AttributeError: - # Emulate name2codepoint for Python 2.2 and earlier - name2codepoint = {} - for name, entity in htmlentitydefs.entitydefs.items(): - if len(entity) == 1: - name2codepoint[name] = ord(entity) - else: - name2codepoint[name] = int(entity[2:-1]) - def unescape(string): # work around oddities in BeautifulSoup's entity handling def unescape_entity(m): From scoder at codespeak.net Wed Mar 19 07:44:51 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 19 Mar 2008 07:44:51 +0100 (CET) Subject: [Lxml-checkins] r52713 - lxml/trunk Message-ID: <20080319064451.607DE169EAA@codespeak.net> Author: scoder Date: Wed Mar 19 07:44:51 2008 New Revision: 52713 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3809 at delle: sbehnel | 2008-03-18 22:23:40 +0100 moved last Changelog entries back to 2.0.3 Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Mar 19 07:44:51 2008 @@ -8,11 +8,6 @@ Features added -------------- -* ElementSoup/soupparser.parse() allows passing keyword arguments on - to BeautifulSoup. - -* ``fromstring()`` method in ``lxml.html.soupparser``. - * ``XSLTAccessControl`` instances have a property ``options`` that returns a dict of access configuration options. @@ -29,9 +24,6 @@ Bugs fixed ---------- -* The BeautifulSoup parser did not replace entities, which made them - turn up in text content. - * Attribute assignment of custom PyTypes in objectify could fail to correctly serialise the value to a string. 
From scoder at codespeak.net Wed Mar 19 07:44:58 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 19 Mar 2008 07:44:58 +0100 (CET) Subject: [Lxml-checkins] r52714 - lxml/trunk Message-ID: <20080319064458.972C0169EB5@codespeak.net> Author: scoder Date: Wed Mar 19 07:44:58 2008 New Revision: 52714 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3810 at delle: sbehnel | 2008-03-18 22:26:03 +0100 removed Changelog that are already in 2.0.3 Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Mar 19 07:44:58 2008 @@ -24,24 +24,12 @@ Bugs fixed ---------- -* Attribute assignment of custom PyTypes in objectify could fail to - correctly serialise the value to a string. - * Default encoding for plain text serialisation was different from that of XML serialisation (UTF-8 instead of ASCII). Other changes ------------- -* Setting the XSLT_CONFIG and XML2_CONFIG environment variables at - build time will let setup.py pick up the ``xml2-config`` and - ``xslt-config`` scripts from the supplied path name. - -* Passing ``--with-xml2-config=/path/to/xml2-config`` to setup.py will - override the ``xml2-config`` script that is used to determine the C - compiler options. The same applies for the ``--with-xslt-config`` - option. - * Minor API speed-ups. * The benchmark suite now uses tail text in the trees, which makes the From scoder at codespeak.net Wed Mar 19 07:45:06 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 19 Mar 2008 07:45:06 +0100 (CET) Subject: [Lxml-checkins] r52715 - in lxml/branch/lxml-2.0: . 
doc src/lxml/html Message-ID: <20080319064506.47209169EA2@codespeak.net> Author: scoder Date: Wed Mar 19 07:45:05 2008 New Revision: 52715 Modified: lxml/branch/lxml-2.0/CHANGES.txt lxml/branch/lxml-2.0/doc/elementsoup.txt lxml/branch/lxml-2.0/src/lxml/html/soupparser.py Log: trunk merge 52712: updated BS parser Modified: lxml/branch/lxml-2.0/CHANGES.txt ============================================================================== --- lxml/branch/lxml-2.0/CHANGES.txt (original) +++ lxml/branch/lxml-2.0/CHANGES.txt Wed Mar 19 07:45:05 2008 @@ -8,9 +8,17 @@ Features added -------------- +* soupparser.parse() allows passing keyword arguments on to + BeautifulSoup. + +* ``fromstring()`` method in ``lxml.html.soupparser``. + Bugs fixed ---------- +* The BeautifulSoup parser (soupparser.py) did not replace entities, + which made them turn up in text content. + * Attribute assignment of custom PyTypes in objectify could fail to correctly serialise the value to a string. Modified: lxml/branch/lxml-2.0/doc/elementsoup.txt ============================================================================== --- lxml/branch/lxml-2.0/doc/elementsoup.txt (original) +++ lxml/branch/lxml-2.0/doc/elementsoup.txt Wed Mar 19 07:45:05 2008 @@ -53,7 +53,46 @@ ``makeelement`` factory function to ``parse()`` and ``fromstring()``. By default, this is based on the HTML parser defined in ``lxml.html``. -There is also a legacy module called ``ElementSoup``, which mimics the -interface provided by ElementTree's own ElementSoup_ module. +By default, the BeautifulSoup parser also replaces the entities it +finds by their character equivalent. + +.. sourcecode:: pycon + + >>> tag_soup = '<body>©€-õƽ<p>' + >>> body = fromstring(tag_soup).find('.//body') + >>> body.text + u'\xa9\u20ac-\xf5\u01bd' + +If you want them back on the way out, you can serialise with the +'html' method, which will always use escaping for safety reasons: + +.. 
sourcecode:: pycon + + >>> tostring(body, method="html") + '<body>©€-õƽ<p></p></body>' + + >>> tostring(body, method="html", encoding="utf-8") + '<body>©€-õƽ<p></p></body>' + + >>> tostring(body, method="html", encoding=unicode) + u'<body>©€-õƽ<p></p></body>' + +Otherwise, when serialising to XML, only the plain ASCII encoding will +escape non-ASCII characters: + +.. sourcecode:: pycon + + >>> tostring(body) + '<body>©€-õƽ<p/></body>' + + >>> tostring(body, encoding="utf-8") + '<body>\xc2\xa9\xe2\x82\xac-\xc3\xb5\xc6\xbd<p/></body>' + + >>> tostring(body, encoding=unicode) + u'<body>\xa9\u20ac-\xf5\u01bd<p/></body>' + +There is also a legacy module called ``lxml.html.ElementSoup``, which +mimics the interface provided by ElementTree's own ElementSoup_ +module. .. _ElementSoup: http://effbot.org/zone/element-soup.htm Modified: lxml/branch/lxml-2.0/src/lxml/html/soupparser.py ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/html/soupparser.py (original) +++ lxml/branch/lxml-2.0/src/lxml/html/soupparser.py Wed Mar 19 07:45:05 2008 @@ -8,7 +8,7 @@ BeautifulSoup, Tag, Comment, ProcessingInstruction, NavigableString -def fromstring(data, beautifulsoup=None, makeelement=None): +def fromstring(data, beautifulsoup=None, makeelement=None, **bsargs): """Parse a string of HTML data into an Element tree using the BeautifulSoup parser. @@ -20,9 +20,9 @@ ``BeautifulSoup`` class and the default factory of `lxml.html` are used. """ - return _parse(data, beautifulsoup, makeelement) + return _parse(data, beautifulsoup, makeelement, **bsargs) -def parse(file, beautifulsoup=None, makeelement=None): +def parse(file, beautifulsoup=None, makeelement=None, **bsargs): """Parse a file into an ElemenTree using the BeautifulSoup parser. 
You can pass a different BeautifulSoup parser through the @@ -33,7 +33,7 @@ """ if not hasattr(file, 'read'): file = open(file) - root = _parse(file, beautifulsoup, makeelement) + root = _parse(file, beautifulsoup, makeelement, **bsargs) return etree.ElementTree(root) def convert_tree(beautiful_soup_tree, makeelement=None): @@ -56,12 +56,14 @@ # helpers -def _parse(source, beautifulsoup, makeelement): +def _parse(source, beautifulsoup, makeelement, **bsargs): if beautifulsoup is None: beautifulsoup = BeautifulSoup if makeelement is None: makeelement = html.html_parser.makeelement - tree = beautifulsoup(source) + if 'convertEntities' not in bsargs: + bsargs['convertEntities'] = 'html' + tree = beautifulsoup(source, **bsargs) root = _convert_tree(tree, makeelement) # from ET: wrap the document in a html root element, if necessary if len(root) == 1 and root[0].tag == "html": @@ -84,15 +86,15 @@ [(k, unescape(v)) for (k,v) in child.attrs])) _convert_children(et_child, child, makeelement) elif type(child) is NavigableString: - _append_text(parent, et_child, unescape(unicode(child))) + _append_text(parent, et_child, unescape(child)) else: if isinstance(child, Comment): - parent.append(etree.Comment(child.string)) + parent.append(etree.Comment(child)) elif isinstance(child, ProcessingInstruction): parent.append(etree.ProcessingInstruction( - *child.string.split(' ', 1))) + *child.split(' ', 1))) else: # CData - _append_text(parent, et_child, unescape(unicode(child))) + _append_text(parent, et_child, unescape(child)) def _append_text(parent, element, text): if element is None: @@ -103,21 +105,11 @@ # copied from ET's ElementSoup -import htmlentitydefs, re +from htmlentitydefs import name2codepoint +import re handle_entities = re.compile("&(\w+);").sub -try: - name2codepoint = htmlentitydefs.name2codepoint -except AttributeError: - # Emulate name2codepoint for Python 2.2 and earlier - name2codepoint = {} - for name, entity in htmlentitydefs.entitydefs.items(): - if 
len(entity) == 1: - name2codepoint[name] = ord(entity) - else: - name2codepoint[name] = int(entity[2:-1]) - def unescape(string): # work around oddities in BeautifulSoup's entity handling def unescape_entity(m): From scoder at codespeak.net Wed Mar 19 07:45:12 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 19 Mar 2008 07:45:12 +0100 (CET) Subject: [Lxml-checkins] r52716 - lxml/trunk Message-ID: <20080319064512.5359E169EA3@codespeak.net> Author: scoder Date: Wed Mar 19 07:45:11 2008 New Revision: 52716 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3811 at delle: sbehnel | 2008-03-19 07:42:53 +0100 added changelog from 2.0.3 Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Mar 19 07:45:11 2008 @@ -74,6 +74,43 @@ ``objectify.set_default_parser()`` +2.0.3 (Under development) +========================= + +Features added +-------------- + +* soupparser.parse() allows passing keyword arguments on to + BeautifulSoup. + +* ``fromstring()`` method in ``lxml.html.soupparser``. + +Bugs fixed +---------- + +* The BeautifulSoup parser (soupparser.py) did not replace entities, + which made them turn up in text content. + +* Attribute assignment of custom PyTypes in objectify could fail to + correctly serialise the value to a string. + +Other changes +------------- + +* ``lxml.html.ElementSoup`` was replaced by a new module + ``lxml.html.soupparser`` with a more consistent API. The old module + remains for compatibility with ElementTree's own ElementSoup module. + +* Setting the XSLT_CONFIG and XML2_CONFIG environment variables at + build time will let setup.py pick up the ``xml2-config`` and + ``xslt-config`` scripts from the supplied path name. 
+ +* Passing ``--with-xml2-config=/path/to/xml2-config`` to setup.py will + override the ``xml2-config`` script that is used to determine the C + compiler options. The same applies for the ``--with-xslt-config`` + option. + + 2.0.2 (2008-02-22) ================== From scoder at codespeak.net Fri Mar 21 08:52:04 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 21 Mar 2008 08:52:04 +0100 (CET) Subject: [Lxml-checkins] r52793 - in lxml/trunk: . src/lxml Message-ID: <20080321075204.6EE141684D5@codespeak.net> Author: scoder Date: Fri Mar 21 08:52:02 2008 New Revision: 52793 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/etree_defs.h lxml/trunk/src/lxml/python.pxd Log: r3817 at delle: sbehnel | 2008-03-20 08:32:48 +0100 macro cleanup in .h/.pxd files Modified: lxml/trunk/src/lxml/etree_defs.h ============================================================================== --- lxml/trunk/src/lxml/etree_defs.h (original) +++ lxml/trunk/src/lxml/etree_defs.h Fri Mar 21 08:52:02 2008 @@ -93,11 +93,6 @@ long _ftol2( double dblSource ) { return _ftol( dblSource ); } #endif -/* Redefinition of some Python builtins as C functions */ -#define callable(o) PyCallable_Check(o) -#define _cstr(s) PyString_AS_STRING(s) -#define _fqtypename(o) (((PyTypeObject*)o)->ob_type->tp_name) - #ifdef __GNUC__ /* Test for GCC > 2.95 */ #if __GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)) @@ -109,15 +104,11 @@ #define unlikely_condition(x) (x) #endif /* __GNUC__ */ -static PyObject* __PY_NEW_GLOBAL_EMPTY_TUPLE = NULL; - #define PY_NEW(T) \ (((PyTypeObject*)(T))->tp_new( \ - (PyTypeObject*)(T), \ - (unlikely_condition(__PY_NEW_GLOBAL_EMPTY_TUPLE == NULL) ? 
\ - (__PY_NEW_GLOBAL_EMPTY_TUPLE = PyTuple_New(0)) : \ - (__PY_NEW_GLOBAL_EMPTY_TUPLE)), \ - NULL)) + (PyTypeObject*)(T), __pyx_empty_tuple, NULL)) + +#define _fqtypename(o) (((PyTypeObject*)o)->ob_type->tp_name) #define _isString(obj) (PyString_CheckExact(obj) || \ PyUnicode_CheckExact(obj) || \ Modified: lxml/trunk/src/lxml/python.pxd ============================================================================== --- lxml/trunk/src/lxml/python.pxd (original) +++ lxml/trunk/src/lxml/python.pxd Fri Mar 21 08:52:02 2008 @@ -103,6 +103,10 @@ cdef void PyEval_RestoreThread(PyThreadState* state) cdef PyObject* PyThreadState_GetDict() + # some handy functions + cdef int callable "PyCallable_Check" (object obj) + cdef char* _cstr "PyString_AS_STRING" (object s) + cdef extern from "pythread.h": ctypedef void* PyThread_type_lock cdef PyThread_type_lock PyThread_allocate_lock() @@ -118,6 +122,4 @@ cdef extern from "etree_defs.h": # redefines some functions as macros cdef int _isString(object obj) cdef char* _fqtypename(object t) - cdef int callable(object obj) - cdef char* _cstr(object s) cdef object PY_NEW(object t) From scoder at codespeak.net Sun Mar 23 18:49:38 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 23 Mar 2008 18:49:38 +0100 (CET) Subject: [Lxml-checkins] r52876 - in lxml/trunk: . 
src/lxml Message-ID: <20080323174938.7F57B16840C@codespeak.net> Author: scoder Date: Sun Mar 23 18:49:36 2008 New Revision: 52876 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/etree_defs.h Log: r3819 at delle: sbehnel | 2008-03-23 18:06:47 +0100 macro fix Modified: lxml/trunk/src/lxml/etree_defs.h ============================================================================== --- lxml/trunk/src/lxml/etree_defs.h (original) +++ lxml/trunk/src/lxml/etree_defs.h Sun Mar 23 18:49:36 2008 @@ -108,7 +108,7 @@ (((PyTypeObject*)(T))->tp_new( \ (PyTypeObject*)(T), __pyx_empty_tuple, NULL)) -#define _fqtypename(o) (((PyTypeObject*)o)->ob_type->tp_name) +#define _fqtypename(o) (((PyTypeObject*)(o))->ob_type->tp_name) #define _isString(obj) (PyString_CheckExact(obj) || \ PyUnicode_CheckExact(obj) || \ From scoder at codespeak.net Sun Mar 23 18:49:43 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 23 Mar 2008 18:49:43 +0100 (CET) Subject: [Lxml-checkins] r52877 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20080323174943.3CC42168411@codespeak.net> Author: scoder Date: Sun Mar 23 18:49:42 2008 New Revision: 52877 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/iterparse.pxi lxml/trunk/src/lxml/parser.pxi lxml/trunk/src/lxml/tests/test_xmlschema.py lxml/trunk/src/lxml/xmlschema.pxi Log: r3820 at delle: sbehnel | 2008-03-23 18:48:28 +0100 fix for iterparse crash with XML schema validation Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Sun Mar 23 18:49:42 2008 @@ -88,6 +88,8 @@ Bugs fixed ---------- +* Crash when using ``iterparse()`` with XML Schema validation. + * The BeautifulSoup parser (soupparser.py) did not replace entities, which made them turn up in text content. 
Modified: lxml/trunk/src/lxml/iterparse.pxi ============================================================================== --- lxml/trunk/src/lxml/iterparse.pxi (original) +++ lxml/trunk/src/lxml/iterparse.pxi Sun Mar 23 18:49:42 2008 @@ -382,7 +382,8 @@ error = xmlparser.xmlParseChunk(pctxt, NULL, 0, 1) self._source = None break - if error != 0: + if error != 0 or (context._validator is not None and + not context._validator.isvalid()): self._source = None del context._events[:] _raiseParseError(pctxt, self._filename, context._error_log) Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Sun Mar 23 18:49:42 2008 @@ -387,6 +387,8 @@ cdef python.PyThread_type_lock _lock def __dealloc__(self): + if self._validator is not None: + self._validator.disconnect() if self._lock is not NULL: python.PyThread_free_lock(self._lock) if self._c_ctxt is not NULL: @@ -425,10 +427,10 @@ return 0 cdef int cleanup(self) except -1: - self._resetParserContext() - self.clear() if self._validator is not None: self._validator.disconnect() + self._resetParserContext() + self.clear() self._error_log.disconnect() if config.ENABLE_THREADING and self._lock is not NULL: python.PyThread_release_lock(self._lock) Modified: lxml/trunk/src/lxml/tests/test_xmlschema.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xmlschema.py (original) +++ lxml/trunk/src/lxml/tests/test_xmlschema.py Sun Mar 23 18:49:42 2008 @@ -66,6 +66,41 @@ self.assertRaises(etree.XMLSyntaxError, self.parse, '<a><c></c></a>', parser=parser) + def test_xmlschema_iterparse(self): + schema = self.parse(''' +<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> + <xsd:element name="a" type="AType"/> + <xsd:complexType name="AType"> + <xsd:sequence> + <xsd:element name="b" type="xsd:string" /> + </xsd:sequence> + 
</xsd:complexType> +</xsd:schema> +''') + schema = etree.XMLSchema(schema) + xml = StringIO('<a><b></b></a>') + events = [ (event, el.tag) + for (event, el) in etree.iterparse(xml, schema=schema) ] + + self.assertEquals([('end', 'b'), ('end', 'a')], + events) + + def test_xmlschema_iterparse_fail(self): + schema = self.parse(''' +<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> + <xsd:element name="a" type="AType"/> + <xsd:complexType name="AType"> + <xsd:sequence> + <xsd:element name="b" type="xsd:string" /> + </xsd:sequence> + </xsd:complexType> +</xsd:schema> +''') + schema = etree.XMLSchema(schema) + self.assertRaises( + etree.XMLSyntaxError, + list, etree.iterparse(StringIO('<a><c></c></a>'), schema=schema)) + def test_xmlschema_elementtree_error(self): self.assertRaises(ValueError, etree.XMLSchema, etree.ElementTree()) Modified: lxml/trunk/src/lxml/xmlschema.pxi ============================================================================== --- lxml/trunk/src/lxml/xmlschema.pxi (original) +++ lxml/trunk/src/lxml/xmlschema.pxi Sun Mar 23 18:49:42 2008 @@ -136,8 +136,7 @@ cdef xmlschema.xmlSchemaSAXPlugStruct* _sax_plug def __dealloc__(self): - if self._sax_plug: - self.disconnect() + self.disconnect() if self._valid_ctxt: xmlschema.xmlSchemaFreeValidCtxt(self._valid_ctxt) @@ -154,8 +153,9 @@ self._valid_ctxt, &c_ctxt.sax, &c_ctxt.userData) cdef void disconnect(self): - xmlschema.xmlSchemaSAXUnplug(self._sax_plug) - self._sax_plug = NULL + if self._sax_plug is not NULL: + xmlschema.xmlSchemaSAXUnplug(self._sax_plug) + self._sax_plug = NULL cdef bint isvalid(self): if self._valid_ctxt is NULL: From scoder at codespeak.net Sun Mar 23 18:52:16 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 23 Mar 2008 18:52:16 +0100 (CET) Subject: [Lxml-checkins] r52878 - in lxml/branch/lxml-2.0: . 
src/lxml src/lxml/tests Message-ID: <20080323175216.61C7516840C@codespeak.net> Author: scoder Date: Sun Mar 23 18:52:15 2008 New Revision: 52878 Modified: lxml/branch/lxml-2.0/CHANGES.txt lxml/branch/lxml-2.0/src/lxml/iterparse.pxi lxml/branch/lxml-2.0/src/lxml/parser.pxi lxml/branch/lxml-2.0/src/lxml/tests/test_xmlschema.py lxml/branch/lxml-2.0/src/lxml/xmlschema.pxi Log: iterparse crash fix merged from trunk rev 52877 Modified: lxml/branch/lxml-2.0/CHANGES.txt ============================================================================== --- lxml/branch/lxml-2.0/CHANGES.txt (original) +++ lxml/branch/lxml-2.0/CHANGES.txt Sun Mar 23 18:52:15 2008 @@ -16,6 +16,8 @@ Bugs fixed ---------- +* Crash when using ``iterparse()`` with XML Schema validation. + * The BeautifulSoup parser (soupparser.py) did not replace entities, which made them turn up in text content. Modified: lxml/branch/lxml-2.0/src/lxml/iterparse.pxi ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/iterparse.pxi (original) +++ lxml/branch/lxml-2.0/src/lxml/iterparse.pxi Sun Mar 23 18:52:15 2008 @@ -382,7 +382,8 @@ error = xmlparser.xmlParseChunk(pctxt, NULL, 0, 1) self._source = None break - if error != 0: + if error != 0 or (context._validator is not None and + not context._validator.isvalid()): self._source = None del context._events[:] _raiseParseError(pctxt, self._filename, context._error_log) Modified: lxml/branch/lxml-2.0/src/lxml/parser.pxi ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/parser.pxi (original) +++ lxml/branch/lxml-2.0/src/lxml/parser.pxi Sun Mar 23 18:52:15 2008 @@ -387,6 +387,8 @@ cdef python.PyThread_type_lock _lock def __dealloc__(self): + if self._validator is not None: + self._validator.disconnect() if self._lock is not NULL: python.PyThread_free_lock(self._lock) if self._c_ctxt is not NULL: @@ -425,10 +427,10 @@ return 0 cdef int cleanup(self) 
except -1: - self._resetParserContext() - self.clear() if self._validator is not None: self._validator.disconnect() + self._resetParserContext() + self.clear() self._error_log.disconnect() if config.ENABLE_THREADING and self._lock is not NULL: python.PyThread_release_lock(self._lock) Modified: lxml/branch/lxml-2.0/src/lxml/tests/test_xmlschema.py ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/tests/test_xmlschema.py (original) +++ lxml/branch/lxml-2.0/src/lxml/tests/test_xmlschema.py Sun Mar 23 18:52:15 2008 @@ -66,6 +66,41 @@ self.assertRaises(etree.XMLSyntaxError, self.parse, '<a><c></c></a>', parser=parser) + def test_xmlschema_iterparse(self): + schema = self.parse(''' +<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> + <xsd:element name="a" type="AType"/> + <xsd:complexType name="AType"> + <xsd:sequence> + <xsd:element name="b" type="xsd:string" /> + </xsd:sequence> + </xsd:complexType> +</xsd:schema> +''') + schema = etree.XMLSchema(schema) + xml = StringIO('<a><b></b></a>') + events = [ (event, el.tag) + for (event, el) in etree.iterparse(xml, schema=schema) ] + + self.assertEquals([('end', 'b'), ('end', 'a')], + events) + + def test_xmlschema_iterparse_fail(self): + schema = self.parse(''' +<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> + <xsd:element name="a" type="AType"/> + <xsd:complexType name="AType"> + <xsd:sequence> + <xsd:element name="b" type="xsd:string" /> + </xsd:sequence> + </xsd:complexType> +</xsd:schema> +''') + schema = etree.XMLSchema(schema) + self.assertRaises( + etree.XMLSyntaxError, + list, etree.iterparse(StringIO('<a><c></c></a>'), schema=schema)) + def test_xmlschema_elementtree_error(self): self.assertRaises(ValueError, etree.XMLSchema, etree.ElementTree()) Modified: lxml/branch/lxml-2.0/src/lxml/xmlschema.pxi ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/xmlschema.pxi (original) 
+++ lxml/branch/lxml-2.0/src/lxml/xmlschema.pxi Sun Mar 23 18:52:15 2008 @@ -136,8 +136,7 @@ cdef xmlschema.xmlSchemaSAXPlugStruct* _sax_plug def __dealloc__(self): - if self._sax_plug: - self.disconnect() + self.disconnect() if self._valid_ctxt: xmlschema.xmlSchemaFreeValidCtxt(self._valid_ctxt) @@ -154,8 +153,9 @@ self._valid_ctxt, &c_ctxt.sax, &c_ctxt.userData) cdef void disconnect(self): - xmlschema.xmlSchemaSAXUnplug(self._sax_plug) - self._sax_plug = NULL + if self._sax_plug is not NULL: + xmlschema.xmlSchemaSAXUnplug(self._sax_plug) + self._sax_plug = NULL cdef bint isvalid(self): if self._valid_ctxt is NULL: From scoder at codespeak.net Tue Mar 25 22:01:55 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 25 Mar 2008 22:01:55 +0100 (CET) Subject: [Lxml-checkins] r52942 - in lxml/trunk: .
src/lxml Message-ID: <20080325210155.473F6168511@codespeak.net> Author: scoder Date: Tue Mar 25 22:01:52 2008 New Revision: 52942 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/iterparse.pxi Log: r3823 at delle: sbehnel | 2008-03-24 08:37:13 +0100 iterparse cleanup Modified: lxml/trunk/src/lxml/iterparse.pxi ============================================================================== --- lxml/trunk/src/lxml/iterparse.pxi (original) +++ lxml/trunk/src/lxml/iterparse.pxi Tue Mar 25 22:01:52 2008 @@ -164,7 +164,8 @@ return 0 -cdef void _pushSaxStartEvent(xmlparser.xmlParserCtxt* c_ctxt, xmlNode* c_node): +cdef inline void _pushSaxStartEvent(xmlparser.xmlParserCtxt* c_ctxt, + xmlNode* c_node): cdef _IterparseContext context context = <_IterparseContext>c_ctxt._private try: @@ -175,7 +176,8 @@ c_ctxt.disableSAX = 1 context._store_raised() -cdef void _pushSaxEndEvent(xmlparser.xmlParserCtxt* c_ctxt, xmlNode* c_node): +cdef inline void _pushSaxEndEvent(xmlparser.xmlParserCtxt* c_ctxt, + xmlNode* c_node): cdef _IterparseContext context context = <_IterparseContext>c_ctxt._private try: @@ -186,57 +188,35 @@ c_ctxt.disableSAX = 1 context._store_raised() -cdef xmlparser.startElementNsSAX2Func _getOrigStart(xmlparser.xmlParserCtxt* c_ctxt): - return (<_IterparseContext>c_ctxt._private)._origSaxStart - -cdef xmlparser.startElementSAXFunc _getOrigStartNoNs(xmlparser.xmlParserCtxt* c_ctxt): - return (<_IterparseContext>c_ctxt._private)._origSaxStartNoNs - -cdef xmlparser.endElementNsSAX2Func _getOrigEnd(xmlparser.xmlParserCtxt* c_ctxt): - return (<_IterparseContext>c_ctxt._private)._origSaxEnd - -cdef xmlparser.endElementSAXFunc _getOrigEndNoNs(xmlparser.xmlParserCtxt* c_ctxt): - return (<_IterparseContext>c_ctxt._private)._origSaxEndNoNs - cdef void _iterparseSaxStart(void* ctxt, char* localname, char* prefix, char* URI, int nb_namespaces, char** namespaces, int nb_attributes, int nb_defaulted, char** attributes): - # no Python in here! 
cdef xmlparser.xmlParserCtxt* c_ctxt - cdef xmlparser.startElementNsSAX2Func origStart c_ctxt = <xmlparser.xmlParserCtxt*>ctxt - origStart = _getOrigStart(c_ctxt) - origStart(ctxt, localname, prefix, URI, nb_namespaces, namespaces, - nb_attributes, nb_defaulted, attributes) + (<_IterparseContext>c_ctxt._private)._origSaxStart( + ctxt, localname, prefix, URI, + nb_namespaces, namespaces, + nb_attributes, nb_defaulted, attributes) _pushSaxStartEvent(c_ctxt, c_ctxt.node) cdef void _iterparseSaxEnd(void* ctxt, char* localname, char* prefix, char* URI): - # no Python in here! cdef xmlparser.xmlParserCtxt* c_ctxt - cdef xmlparser.endElementNsSAX2Func origEnd c_ctxt = <xmlparser.xmlParserCtxt*>ctxt _pushSaxEndEvent(c_ctxt, c_ctxt.node) - origEnd = _getOrigEnd(c_ctxt) - origEnd(ctxt, localname, prefix, URI) + (<_IterparseContext>c_ctxt._private)._origSaxEnd(ctxt, localname, prefix, URI) cdef void _iterparseSaxStartNoNs(void* ctxt, char* name, char** attributes): - # no Python in here! cdef xmlparser.xmlParserCtxt* c_ctxt - cdef xmlparser.startElementSAXFunc origStart c_ctxt = <xmlparser.xmlParserCtxt*>ctxt - origStart = _getOrigStartNoNs(c_ctxt) - origStart(ctxt, name, attributes) + (<_IterparseContext>c_ctxt._private)._origSaxStartNoNs(ctxt, name, attributes) _pushSaxStartEvent(c_ctxt, c_ctxt.node) cdef void _iterparseSaxEndNoNs(void* ctxt, char* name): - # no Python in here! 
cdef xmlparser.xmlParserCtxt* c_ctxt - cdef xmlparser.endElementSAXFunc origEnd c_ctxt = <xmlparser.xmlParserCtxt*>ctxt _pushSaxEndEvent(c_ctxt, c_ctxt.node) - origEnd = _getOrigEndNoNs(c_ctxt) - origEnd(ctxt, name) + (<_IterparseContext>c_ctxt._private)._origSaxEndNoNs(ctxt, name) cdef class iterparse(_BaseParser): """iterparse(self, source, events=("end",), tag=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, remove_blank_text=False, remove_comments=False, remove_pis=False, encoding=None, html=False, schema=None) From scoder at codespeak.net Tue Mar 25 22:02:23 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 25 Mar 2008 22:02:23 +0100 (CET) Subject: [Lxml-checkins] r52943 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20080325210223.12039168511@codespeak.net> Author: scoder Date: Tue Mar 25 22:02:21 2008 New Revision: 52943 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/iterparse.pxi lxml/trunk/src/lxml/tests/test_etree.py lxml/trunk/src/lxml/tree.pxd lxml/trunk/src/lxml/xmlparser.pxd Log: r3824 at delle: sbehnel | 2008-03-25 21:58:26 +0100 new event types 'comment' and 'pi' in iterparse() Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Tue Mar 25 22:02:21 2008 @@ -8,6 +8,8 @@ Features added -------------- +* New event types 'comment' and 'pi' in ``iterparse()``. + * ``XSLTAccessControl`` instances have a property ``options`` that returns a dict of access configuration options. 
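The CHANGES entry above adds 'comment' and 'pi' to the event names that ``iterparse()`` accepts. The sketch below shows how a caller might consume such events. It uses Python's standard-library ``xml.etree.ElementTree`` instead of lxml, an assumption made so the snippet runs without lxml installed; the stdlib has accepted the same event names since Python 3.8. It mirrors the ``test_iterparse_comments`` test added in this commit. The helper ``event_names`` is hypothetical, and lxml's behaviour may differ in details such as how processing instructions expose their target.

```python
import io
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

SAMPLE = "<a><!--A--><b><!-- B --><c/></b><!--C--></a>"


def event_names(xml_text, events=("end", "comment")):
    """Hypothetical helper: run iterparse() and return (event, name) pairs.

    Comment nodes are reported by their text, elements by their tag.
    """
    result = []
    for event, node in ET.iterparse(io.StringIO(xml_text), events=events):
        if event == "comment":
            result.append((event, node.text))
        else:
            result.append((event, node.tag))
    return result


if __name__ == "__main__":
    print(event_names(SAMPLE))
    # → [('comment', 'A'), ('comment', ' B '), ('end', 'c'),
    #    ('end', 'b'), ('comment', 'C'), ('end', 'a')]
```

Events arrive in document order, so a comment that precedes ``<c/>`` is reported before that element's 'end' event. Asking for an unknown event name raises ``ValueError``, which is comparable to the check that the new ``_buildIterparseEventFilter()`` performs.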
Modified: lxml/trunk/src/lxml/iterparse.pxi ============================================================================== --- lxml/trunk/src/lxml/iterparse.pxi (original) +++ lxml/trunk/src/lxml/iterparse.pxi Tue Mar 25 22:02:21 2008 @@ -4,22 +4,31 @@ __ITERPARSE_CHUNK_SIZE = 32768 ctypedef enum _IterparseEventFilter: - ITERPARSE_FILTER_START = 1 - ITERPARSE_FILTER_END = 2 - ITERPARSE_FILTER_START_NS = 4 - ITERPARSE_FILTER_END_NS = 8 + ITERPARSE_FILTER_START = 1 + ITERPARSE_FILTER_END = 2 + ITERPARSE_FILTER_START_NS = 4 + ITERPARSE_FILTER_END_NS = 8 + ITERPARSE_FILTER_COMMENT = 16 + ITERPARSE_FILTER_PI = 32 -cdef int _buildIterparseEventFilter(events): +cdef int _buildIterparseEventFilter(events) except -1: cdef int event_filter event_filter = 0 - if 'start' in events: - event_filter = event_filter | ITERPARSE_FILTER_START - if 'end' in events: - event_filter = event_filter | ITERPARSE_FILTER_END - if 'start-ns' in events: - event_filter = event_filter | ITERPARSE_FILTER_START_NS - if 'end-ns' in events: - event_filter = event_filter | ITERPARSE_FILTER_END_NS + for event in events: + if event == 'start': + event_filter |= ITERPARSE_FILTER_START + elif event == 'end': + event_filter |= ITERPARSE_FILTER_END + elif event == 'start-ns': + event_filter |= ITERPARSE_FILTER_START_NS + elif event == 'end-ns': + event_filter |= ITERPARSE_FILTER_END_NS + elif event == 'comment': + event_filter |= ITERPARSE_FILTER_COMMENT + elif event == 'pi': + event_filter |= ITERPARSE_FILTER_PI + else: + raise ValueError("invalid event name '%s'" % event) return event_filter cdef int _countNsDefs(xmlNode* c_node): @@ -53,6 +62,8 @@ cdef xmlparser.endElementNsSAX2Func _origSaxEnd cdef xmlparser.startElementSAXFunc _origSaxStartNoNs cdef xmlparser.endElementSAXFunc _origSaxEndNoNs + cdef xmlparser.commentSAXFunc _origSaxComment + cdef xmlparser.processingInstructionSAXFunc _origSaxPI cdef _Element _root cdef _Document _doc cdef int _event_filter @@ -98,6 +109,14 @@ sax.endElementNs = 
_iterparseSaxEnd sax.endElement = _iterparseSaxEndNoNs + self._origSaxComment = sax.comment + if self._event_filter & ITERPARSE_FILTER_COMMENT: + sax.comment = _iterparseSaxComment + + self._origSaxPI = sax.processingInstruction + if self._event_filter & ITERPARSE_FILTER_PI: + sax.processingInstruction = _iterparseSaxPI + cdef _setEventFilter(self, events, tag): self._event_filter = _buildIterparseEventFilter(events) if tag is None or tag == '*': @@ -162,7 +181,18 @@ for i from 0 <= i < ns_count: python.PyList_Append(self._events, event) return 0 - + + cdef int pushEvent(self, event, xmlNode* c_node) except -1: + cdef _Element root + if self._doc is None: + self._doc = _documentFactory(c_node.doc, None) + root = self._doc.getroot() + if root is not None and root._c_node.type == tree.XML_ELEMENT_NODE: + self._root = root + node = _elementFactory(self._doc, c_node) + python.PyList_Append(self._events, (event, node)) + return 0 + cdef inline void _pushSaxStartEvent(xmlparser.xmlParserCtxt* c_ctxt, xmlNode* c_node): @@ -188,6 +218,18 @@ c_ctxt.disableSAX = 1 context._store_raised() +cdef inline void _pushSaxEvent(xmlparser.xmlParserCtxt* c_ctxt, + event, xmlNode* c_node): + cdef _IterparseContext context + context = <_IterparseContext>c_ctxt._private + try: + context.pushEvent(event, c_node) + except: + if c_ctxt.errNo == xmlerror.XML_ERR_OK: + c_ctxt.errNo = xmlerror.XML_ERR_INTERNAL_ERROR + c_ctxt.disableSAX = 1 + context._store_raised() + cdef void _iterparseSaxStart(void* ctxt, char* localname, char* prefix, char* URI, int nb_namespaces, char** namespaces, int nb_attributes, int nb_defaulted, @@ -218,6 +260,37 @@ _pushSaxEndEvent(c_ctxt, c_ctxt.node) (<_IterparseContext>c_ctxt._private)._origSaxEndNoNs(ctxt, name) +cdef void _iterparseSaxComment(void* ctxt, char* text): + cdef xmlNode* c_node + cdef xmlparser.xmlParserCtxt* c_ctxt + c_ctxt = <xmlparser.xmlParserCtxt*>ctxt + (<_IterparseContext>c_ctxt._private)._origSaxComment(ctxt, text) + c_node = 
_iterparseFindLastNode(c_ctxt) + if c_node is not NULL: + _pushSaxEvent(c_ctxt, "comment", c_node) + +cdef void _iterparseSaxPI(void* ctxt, char* target, char* data): + cdef xmlNode* c_node + cdef xmlparser.xmlParserCtxt* c_ctxt + c_ctxt = <xmlparser.xmlParserCtxt*>ctxt + (<_IterparseContext>c_ctxt._private)._origSaxPI(ctxt, target, data) + c_node = _iterparseFindLastNode(c_ctxt) + if c_node is not NULL: + _pushSaxEvent(c_ctxt, "pi", c_node) + +cdef inline xmlNode* _iterparseFindLastNode(xmlparser.xmlParserCtxt* c_ctxt): + # this mimics what libxml2 creates for comments/PIs + if c_ctxt.inSubset == 1: + return c_ctxt.myDoc.intSubset.last + elif c_ctxt.inSubset == 2: + return c_ctxt.myDoc.extSubset.last + elif c_ctxt.node is NULL: + return c_ctxt.myDoc.last + elif c_ctxt.node.type == tree.XML_ELEMENT_NODE: + return c_ctxt.node.last + else: + return c_ctxt.node.next + cdef class iterparse(_BaseParser): """iterparse(self, source, events=("end",), tag=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, remove_blank_text=False, remove_comments=False, remove_pis=False, encoding=None, html=False, schema=None) Incremental parser. 
@@ -258,6 +331,8 @@ - schema - an XMLSchema to validate against """ cdef object _source + cdef object _events + cdef object _tag cdef readonly object root def __init__(self, source, events=("end",), *, tag=None, attribute_defaults=False, dtd_validation=False, @@ -276,15 +351,11 @@ self._source = source if html: # make sure we're not looking for namespaces - if 'start' in events: - if 'end' in events: - events = ('start', 'end') - else: - events = ('start',) - elif 'end' in events: - events = ('end',) - else: - events = () + events = tuple([ event for event in events + if event != 'start-ns' and event != 'end-ns' ]) + + self._events = events + self._tag = tag parse_options = _XML_DEFAULT_PARSE_OPTIONS if load_dtd: @@ -305,7 +376,6 @@ None, filename, encoding) context = <_IterparseContext>self._getPushParserContext() - context._setEventFilter(events, tag) context.prepare() # parser will not be unlocked - no other methods supported @@ -318,7 +388,10 @@ return context._error_log.copy() cdef _ParserContext _createContext(self, target): - return _IterparseContext() + cdef _IterparseContext context + context = _IterparseContext() + context._setEventFilter(self._events, self._tag) + return context def copy(self): raise TypeError("iterparse parsers cannot be copied") Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Tue Mar 25 22:02:21 2008 @@ -272,7 +272,7 @@ self.assert_([ log for log in logs if 15 == log.column ]) - def test_iterparse_comments(self): + def test_iterparse_tree_comments(self): # ET removes comments iterparse = self.etree.iterparse tostring = self.etree.tostring @@ -285,13 +285,61 @@ '<a><!--A--><b><!-- B --><c/></b><!--C--></a>', tostring(root)) + def test_iterparse_comments(self): + # ET removes comments + iterparse = self.etree.iterparse + tostring = self.etree.tostring + + def 
name(event, el): + if event == 'comment': + return el.text + else: + return el.tag + + f = StringIO('<a><!--A--><b><!-- B --><c/></b><!--C--></a>') + events = list(iterparse(f, events=('end', 'comment'))) + root = events[-1][1] + self.assertEquals(6, len(events)) + self.assertEquals(['A', ' B ', 'c', 'b', 'C', 'a'], + [ name(*item) for item in events ]) + self.assertEquals( + '<a><!--A--><b><!-- B --><c/></b><!--C--></a>', + tostring(root)) + + def test_iterparse_pis(self): + # ET removes pis + iterparse = self.etree.iterparse + tostring = self.etree.tostring + ElementTree = self.etree.ElementTree + + def name(event, el): + if event == 'pi': + return (el.target, el.text) + else: + return el.tag + + f = StringIO('<?pia a?><a><?pib b?><b><?pic c?><c/></b><?pid d?></a><?pie e?>') + events = list(iterparse(f, events=('end', 'pi'))) + root = events[-2][1] + self.assertEquals(8, len(events)) + self.assertEquals([('pia','a'), ('pib','b'), ('pic','c'), 'c', 'b', + ('pid','d'), 'a', ('pie','e')], + [ name(*item) for item in events ]) + self.assertEquals( + '<?pia a?><a><?pib b?><b><?pic c?><c/></b><?pid d?></a><?pie e?>', + tostring(ElementTree(root))) + def test_iterparse_remove_comments(self): iterparse = self.etree.iterparse tostring = self.etree.tostring f = StringIO('<a><!--A--><b><!-- B --><c/></b><!--C--></a>') - events = list(iterparse(f, remove_comments=True)) + events = list(iterparse(f, remove_comments=True, + events=('end', 'comment'))) root = events[-1][1] + self.assertEquals(3, len(events)) + self.assertEquals(['c', 'b', 'a'], + [ el.tag for (event, el) in events ]) self.assertEquals( '<a><b><c/></b></a>', tostring(root)) @@ -356,6 +404,55 @@ self.assertRaises( LookupError, self.etree.XMLParser, encoding="hopefully unknown") + def test_parser_target_comment(self): + events = [] + class Target(object): + def start(self, tag, attrib): + events.append("start-" + tag) + def end(self, tag): + events.append("end-" + tag) + def data(self, data): + 
events.append("data-" + data) + def comment(self, text): + events.append("comment-" + text) + def close(self): + return "DONE" + + parser = self.etree.XMLParser(target=Target()) + + parser.feed('<!--a--><root>A<!--b--><sub/><!--c-->B</root><!--d-->') + done = parser.close() + + self.assertEquals("DONE", done) + self.assertEquals(["comment-a", "start-root", "data-A", "comment-b", + "start-sub", "end-sub", "comment-c", "data-B", + "end-root", "comment-d"], + events) + + def test_parser_target_pi(self): + events = [] + class Target(object): + def start(self, tag, attrib): + events.append("start-" + tag) + def end(self, tag): + events.append("end-" + tag) + def data(self, data): + events.append("data-" + data) + def pi(self, target, data): + events.append("pi-" + target + "-" + data) + def close(self): + return "DONE" + + parser = self.etree.XMLParser(target=Target()) + + parser.feed('<?test a?><root>A<?test b?>B</root><?test c?>') + done = parser.close() + + self.assertEquals("DONE", done) + self.assertEquals(["pi-test-a", "start-root", "data-A", "pi-test-b", + "data-B", "end-root", "pi-test-c"], + events) + def test_iterwalk_tag(self): iterwalk = self.etree.iterwalk root = self.etree.XML('<a><b><d/></b><c/></a>') Modified: lxml/trunk/src/lxml/tree.pxd ============================================================================== --- lxml/trunk/src/lxml/tree.pxd (original) +++ lxml/trunk/src/lxml/tree.pxd Tue Mar 25 22:02:21 2008 @@ -115,6 +115,7 @@ void* attributes void* elements xmlNode* children + xmlNode* last xmlDoc* doc ctypedef struct xmlDoc: Modified: lxml/trunk/src/lxml/xmlparser.pxd ============================================================================== --- lxml/trunk/src/lxml/xmlparser.pxd (original) +++ lxml/trunk/src/lxml/xmlparser.pxd Tue Mar 25 22:02:21 2008 @@ -96,6 +96,7 @@ int spaceMax bint html bint progressive + int inSubset int charset xmlParserInput* input From scoder at codespeak.net Wed Mar 26 00:18:06 2008 From: scoder at codespeak.net 
(scoder at codespeak.net) Date: Wed, 26 Mar 2008 00:18:06 +0100 (CET) Subject: [Lxml-checkins] r52951 - in lxml/trunk: . src/lxml Message-ID: <20080325231806.2FB1916851D@codespeak.net> Author: scoder Date: Wed Mar 26 00:18:04 2008 New Revision: 52951 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/iterparse.pxi lxml/trunk/src/lxml/saxparser.pxi Log: r3828 at delle: sbehnel | 2008-03-25 22:30:28 +0100 cleanup Modified: lxml/trunk/src/lxml/iterparse.pxi ============================================================================== --- lxml/trunk/src/lxml/iterparse.pxi (original) +++ lxml/trunk/src/lxml/iterparse.pxi Wed Mar 26 00:18:04 2008 @@ -325,6 +325,8 @@ - remove_blank_text - discard blank text nodes - remove_comments - discard comments - remove_pis - discard processing instructions + - compact - safe memory for short text content (default: True) + - resolve_entities - replace entities by their text value (default: True) Other keyword arguments: - encoding - override the document encoding @@ -337,7 +339,8 @@ def __init__(self, source, events=("end",), *, tag=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, remove_blank_text=False, - remove_comments=False, remove_pis=False, encoding=None, + compact=True, resolve_entities=True, remove_comments=False, + remove_pis=False, encoding=None, html=False, XMLSchema schema=None): cdef _IterparseContext context cdef char* c_encoding @@ -366,10 +369,14 @@ if attribute_defaults: parse_options = parse_options | xmlparser.XML_PARSE_DTDATTR | \ xmlparser.XML_PARSE_DTDLOAD - if not no_network: - parse_options = parse_options ^ xmlparser.XML_PARSE_NONET if remove_blank_text: parse_options = parse_options | xmlparser.XML_PARSE_NOBLANKS + if not no_network: + parse_options = parse_options ^ xmlparser.XML_PARSE_NONET + if not compact: + parse_options = parse_options ^ xmlparser.XML_PARSE_COMPACT + if not resolve_entities: + parse_options = parse_options ^ xmlparser.XML_PARSE_NOENT 
_BaseParser.__init__(self, parse_options, html, schema, remove_comments, remove_pis, Modified: lxml/trunk/src/lxml/saxparser.pxi ============================================================================== --- lxml/trunk/src/lxml/saxparser.pxi (original) +++ lxml/trunk/src/lxml/saxparser.pxi Wed Mar 26 00:18:04 2008 @@ -141,7 +141,7 @@ c_attributes[3], c_attributes[4] - c_attributes[3], "strict") python.PyDict_SetItem(attrib, name, value) - c_attributes = c_attributes + 5 + c_attributes += 5 if c_nb_namespaces == 0: nsmap = EMPTY_READ_ONLY_DICT else: @@ -153,7 +153,7 @@ prefix = funicode(c_namespaces[0]) python.PyDict_SetItem( nsmap, prefix, funicode(c_namespaces[1])) - c_namespaces = c_namespaces + 2 + c_namespaces += 2 element = context._target._handleSaxStart(tag, attrib, nsmap) if element is not None and c_ctxt.input is not NULL: if c_ctxt.input.line < 65535: From scoder at codespeak.net Wed Mar 26 00:18:14 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 26 Mar 2008 00:18:14 +0100 (CET) Subject: [Lxml-checkins] r52952 - in lxml/trunk: . src/lxml Message-ID: <20080325231814.91D5A168520@codespeak.net> Author: scoder Date: Wed Mar 26 00:18:13 2008 New Revision: 52952 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/parser.pxi Log: r3829 at delle: sbehnel | 2008-03-26 00:12:53 +0100 stop early on feed parser errors Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Wed Mar 26 00:18:13 2008 @@ -503,7 +503,7 @@ not context._validator.isvalid(): well_formed = 0 # actually not 'valid', but anyway ... 
elif recover or (c_ctxt.wellFormed and \ - c_ctxt.lastError.level < xmlerror.XML_ERR_ERROR): + c_ctxt.lastError.level < xmlerror.XML_ERR_ERROR): well_formed = 1 elif not c_ctxt.replaceEntities and not c_ctxt.validate \ and context is not None: @@ -946,7 +946,8 @@ py_buffer_len = py_buffer_len - buffer_len c_data = c_data + buffer_len - if error: + if error or (not pctxt.wellFormed and + not self._parse_options & xmlparser.XML_PARSE_RECOVER): self._feed_parser_running = 0 try: context._handleParseResult(self, NULL, None) From scoder at codespeak.net Wed Mar 26 00:18:25 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 26 Mar 2008 00:18:25 +0100 (CET) Subject: [Lxml-checkins] r52953 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20080325231825.F355C16851D@codespeak.net> Author: scoder Date: Wed Mar 26 00:18:24 2008 New Revision: 52953 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/parsertarget.pxi lxml/trunk/src/lxml/saxparser.pxi lxml/trunk/src/lxml/tests/test_elementtree.py lxml/trunk/src/lxml/xmlparser.pxd Log: r3830 at delle: sbehnel | 2008-03-26 00:14:51 +0100 fix entity handling in target parser Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Mar 26 00:18:24 2008 @@ -90,6 +90,8 @@ Bugs fixed ---------- +* Handle entity replacements correctly in target parser. + * Crash when using ``iterparse()`` with XML Schema validation. 
* The BeautifulSoup parser (soupparser.py) did not replace entities, Modified: lxml/trunk/src/lxml/parsertarget.pxi ============================================================================== --- lxml/trunk/src/lxml/parsertarget.pxi (original) +++ lxml/trunk/src/lxml/parsertarget.pxi Wed Mar 26 00:18:24 2008 @@ -110,6 +110,8 @@ cdef object _handleParseResult(self, _BaseParser parser, xmlDoc* result, filename): + if not self._c_ctxt.wellFormed: + _raiseParseError(self._c_ctxt, filename, self._error_log) self._raise_if_stored() return self._python_target.close() @@ -118,11 +120,13 @@ if result is not NULL and result._private is NULL: # no _Document proxy => orphen tree.xmlFreeDoc(result) - if self._c_ctxt.myDoc is not NULL and \ - self._c_ctxt.myDoc is not result and \ - self._c_ctxt.myDoc._private is NULL: - # no _Document proxy => orphen - tree.xmlFreeDoc(self._c_ctxt.myDoc) + if self._c_ctxt.myDoc is not NULL: + if self._c_ctxt.myDoc is not result and \ + self._c_ctxt.myDoc._private is NULL: + # no _Document proxy => orphen + tree.xmlFreeDoc(self._c_ctxt.myDoc) self._c_ctxt.myDoc = NULL + if not self._c_ctxt.wellFormed: + _raiseParseError(self._c_ctxt, filename, self._error_log) self._raise_if_stored() raise _TargetParserResult(self._python_target.close()) Modified: lxml/trunk/src/lxml/saxparser.pxi ============================================================================== --- lxml/trunk/src/lxml/saxparser.pxi (original) +++ lxml/trunk/src/lxml/saxparser.pxi Wed Mar 26 00:18:24 2008 @@ -11,6 +11,10 @@ cdef class _SaxParserTarget: cdef int _sax_event_filter cdef int _sax_event_propagate + def __cinit__(self): + self._sax_event_filter = 0 + self._sax_event_propagate = 0 + cdef _handleSaxStart(self, tag, attrib, nsmap): return None cdef _handleSaxEnd(self, tag): @@ -77,10 +81,8 @@ if self._target._sax_event_filter & SAX_EVENT_DATA: sax.characters = _handleSaxData - if self._target._sax_event_propagate & SAX_EVENT_DOCTYPE: - self._origSaxDoctype = 
sax.internalSubset - else: - self._origSaxDoctype = sax.internalSubset = NULL + # doctype propagation is always required for entity replacement + self._origSaxDoctype = sax.internalSubset if self._target._sax_event_filter & SAX_EVENT_DOCTYPE: sax.internalSubset = _handleSaxDoctype @@ -98,6 +100,10 @@ if self._target._sax_event_filter & SAX_EVENT_COMMENT: sax.comment = _handleSaxComment + # enforce entity replacement + sax.reference = NULL + c_ctxt.replaceEntities = 1 + cdef void _handleSaxException(self, xmlparser.xmlParserCtxt* c_ctxt): self._store_raised() if c_ctxt.errNo == xmlerror.XML_ERR_OK: Modified: lxml/trunk/src/lxml/tests/test_elementtree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_elementtree.py (original) +++ lxml/trunk/src/lxml/tests/test_elementtree.py Wed Mar 26 00:18:24 2008 @@ -3333,6 +3333,74 @@ "end-sub", "data-B", "end-root"], events) + def test_parser_target_entity(self): + events = [] + class Target(object): + def __init__(self): + self._data = [] + def _flush_data(self): + if self._data: + events.append("data-" + ''.join(self._data)) + del self._data[:] + def start(self, tag, attrib): + self._flush_data() + events.append("start-" + tag) + def end(self, tag): + self._flush_data() + events.append("end-" + tag) + def data(self, data): + self._data.append(data) + def close(self): + self._flush_data() + return "DONE" + + parser = self.etree.XMLParser(target=Target()) + + dtd = ''' + <!DOCTYPE root [ + <!ELEMENT root (sub*)> + <!ELEMENT sub (#PCDATA)> + <!ENTITY ent "an entity"> + ]> + ''' + parser.feed(dtd+'<root><sub/><sub>this is &ent;</sub><sub/></root>') + done = parser.close() + + self.assertEquals("DONE", done) + self.assertEquals(["start-root", "start-sub", "end-sub", "start-sub", + "data-this is an entity", + "end-sub", "start-sub", "end-sub", "end-root"], + events) + + def test_parser_target_entity_unknown(self): + events = [] + class Target(object): + def 
__init__(self): + self._data = [] + def _flush_data(self): + if self._data: + events.append("data-" + ''.join(self._data)) + del self._data[:] + def start(self, tag, attrib): + self._flush_data() + events.append("start-" + tag) + def end(self, tag): + self._flush_data() + events.append("end-" + tag) + def data(self, data): + self._data.append(data) + def close(self): + self._flush_data() + return "DONE" + + parser = self.etree.XMLParser(target=Target()) + + def feed(): + parser.feed('<root><sub/><sub>some &ent;</sub><sub/></root>') + parser.close() + + self.assertRaises(self.etree.ParseError, feed) + def test_treebuilder(self): builder = self.etree.TreeBuilder() el = builder.start("root", {'a':'A', 'b':'B'}) Modified: lxml/trunk/src/lxml/xmlparser.pxd ============================================================================== --- lxml/trunk/src/lxml/xmlparser.pxd (original) +++ lxml/trunk/src/lxml/xmlparser.pxd Wed Mar 26 00:18:24 2008 @@ -40,6 +40,8 @@ ctypedef void (*endDocumentSAXFunc)(void* ctx) + ctypedef void (*referenceSAXFunc)(void * ctx, char* name) + cdef int XML_SAX2_MAGIC cdef extern from "libxml/tree.h": @@ -59,6 +61,7 @@ endElementSAXFunc endElement charactersSAXFunc characters cdataBlockSAXFunc cdataBlock + referenceSAXFunc reference commentSAXFunc comment processingInstructionSAXFunc processingInstruction endDocumentSAXFunc endDocument From scoder at codespeak.net Wed Mar 26 00:21:27 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 26 Mar 2008 00:21:27 +0100 (CET) Subject: [Lxml-checkins] r52954 - in lxml/branch/lxml-2.0: . 
src/lxml src/lxml/tests Message-ID: <20080325232127.180D2168533@codespeak.net> Author: scoder Date: Wed Mar 26 00:21:24 2008 New Revision: 52954 Modified: lxml/branch/lxml-2.0/CHANGES.txt lxml/branch/lxml-2.0/src/lxml/parsertarget.pxi lxml/branch/lxml-2.0/src/lxml/saxparser.pxi lxml/branch/lxml-2.0/src/lxml/tests/test_elementtree.py lxml/branch/lxml-2.0/src/lxml/xmlparser.pxd Log: merge -c 52953: fix entity handling in target parser Modified: lxml/branch/lxml-2.0/CHANGES.txt ============================================================================== --- lxml/branch/lxml-2.0/CHANGES.txt (original) +++ lxml/branch/lxml-2.0/CHANGES.txt Wed Mar 26 00:21:24 2008 @@ -16,6 +16,8 @@ Bugs fixed ---------- +* Handle entity replacements correctly in target parser. + * Crash when using ``iterparse()`` with XML Schema validation. * The BeautifulSoup parser (soupparser.py) did not replace entities, Modified: lxml/branch/lxml-2.0/src/lxml/parsertarget.pxi ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/parsertarget.pxi (original) +++ lxml/branch/lxml-2.0/src/lxml/parsertarget.pxi Wed Mar 26 00:21:24 2008 @@ -110,6 +110,8 @@ cdef object _handleParseResult(self, _BaseParser parser, xmlDoc* result, filename): + if not self._c_ctxt.wellFormed: + _raiseParseError(self._c_ctxt, filename, self._error_log) self._raise_if_stored() return self._python_target.close() @@ -118,11 +120,13 @@ if result is not NULL and result._private is NULL: # no _Document proxy => orphen tree.xmlFreeDoc(result) - if self._c_ctxt.myDoc is not NULL and \ - self._c_ctxt.myDoc is not result and \ - self._c_ctxt.myDoc._private is NULL: - # no _Document proxy => orphen - tree.xmlFreeDoc(self._c_ctxt.myDoc) + if self._c_ctxt.myDoc is not NULL: + if self._c_ctxt.myDoc is not result and \ + self._c_ctxt.myDoc._private is NULL: + # no _Document proxy => orphen + tree.xmlFreeDoc(self._c_ctxt.myDoc) self._c_ctxt.myDoc = NULL + if not 
self._c_ctxt.wellFormed: + _raiseParseError(self._c_ctxt, filename, self._error_log) self._raise_if_stored() raise _TargetParserResult(self._python_target.close()) Modified: lxml/branch/lxml-2.0/src/lxml/saxparser.pxi ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/saxparser.pxi (original) +++ lxml/branch/lxml-2.0/src/lxml/saxparser.pxi Wed Mar 26 00:21:24 2008 @@ -11,6 +11,10 @@ cdef class _SaxParserTarget: cdef int _sax_event_filter cdef int _sax_event_propagate + def __cinit__(self): + self._sax_event_filter = 0 + self._sax_event_propagate = 0 + cdef _handleSaxStart(self, tag, attrib, nsmap): return None cdef _handleSaxEnd(self, tag): @@ -77,10 +81,8 @@ if self._target._sax_event_filter & SAX_EVENT_DATA: sax.characters = _handleSaxData - if self._target._sax_event_propagate & SAX_EVENT_DOCTYPE: - self._origSaxDoctype = sax.internalSubset - else: - self._origSaxDoctype = sax.internalSubset = NULL + # doctype propagation is always required for entity replacement + self._origSaxDoctype = sax.internalSubset if self._target._sax_event_filter & SAX_EVENT_DOCTYPE: sax.internalSubset = _handleSaxDoctype @@ -98,6 +100,10 @@ if self._target._sax_event_filter & SAX_EVENT_COMMENT: sax.comment = _handleSaxComment + # enforce entity replacement + sax.reference = NULL + c_ctxt.replaceEntities = 1 + cdef void _handleSaxException(self, xmlparser.xmlParserCtxt* c_ctxt): self._store_raised() if c_ctxt.errNo == xmlerror.XML_ERR_OK: Modified: lxml/branch/lxml-2.0/src/lxml/tests/test_elementtree.py ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/tests/test_elementtree.py (original) +++ lxml/branch/lxml-2.0/src/lxml/tests/test_elementtree.py Wed Mar 26 00:21:24 2008 @@ -3174,6 +3174,74 @@ "end-sub", "data-B", "end-root"], events) + def test_parser_target_entity(self): + events = [] + class Target(object): + def __init__(self): + self._data = [] + def 
_flush_data(self): + if self._data: + events.append("data-" + ''.join(self._data)) + del self._data[:] + def start(self, tag, attrib): + self._flush_data() + events.append("start-" + tag) + def end(self, tag): + self._flush_data() + events.append("end-" + tag) + def data(self, data): + self._data.append(data) + def close(self): + self._flush_data() + return "DONE" + + parser = self.etree.XMLParser(target=Target()) + + dtd = ''' + <!DOCTYPE root [ + <!ELEMENT root (sub*)> + <!ELEMENT sub (#PCDATA)> + <!ENTITY ent "an entity"> + ]> + ''' + parser.feed(dtd+'<root><sub/><sub>this is &ent;</sub><sub/></root>') + done = parser.close() + + self.assertEquals("DONE", done) + self.assertEquals(["start-root", "start-sub", "end-sub", "start-sub", + "data-this is an entity", + "end-sub", "start-sub", "end-sub", "end-root"], + events) + + def test_parser_target_entity_unknown(self): + events = [] + class Target(object): + def __init__(self): + self._data = [] + def _flush_data(self): + if self._data: + events.append("data-" + ''.join(self._data)) + del self._data[:] + def start(self, tag, attrib): + self._flush_data() + events.append("start-" + tag) + def end(self, tag): + self._flush_data() + events.append("end-" + tag) + def data(self, data): + self._data.append(data) + def close(self): + self._flush_data() + return "DONE" + + parser = self.etree.XMLParser(target=Target()) + + def feed(): + parser.feed('<root><sub/><sub>some &ent;</sub><sub/></root>') + parser.close() + + self.assertRaises(self.etree.ParseError, feed) + def test_treebuilder(self): builder = self.etree.TreeBuilder() el = builder.start("root", {'a':'A', 'b':'B'}) Modified: lxml/branch/lxml-2.0/src/lxml/xmlparser.pxd ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/xmlparser.pxd (original) +++ lxml/branch/lxml-2.0/src/lxml/xmlparser.pxd Wed Mar 26 00:21:24 2008 @@ -40,6 +40,8 @@ ctypedef void (*endDocumentSAXFunc)(void* ctx) + ctypedef void 
(*referenceSAXFunc)(void * ctx, char* name) + cdef int XML_SAX2_MAGIC cdef extern from "libxml/tree.h": @@ -59,6 +61,7 @@ endElementSAXFunc endElement charactersSAXFunc characters cdataBlockSAXFunc cdataBlock + referenceSAXFunc reference commentSAXFunc comment processingInstructionSAXFunc processingInstruction endDocumentSAXFunc endDocument From ianb at codespeak.net Wed Mar 26 17:54:49 2008 From: ianb at codespeak.net (ianb at codespeak.net) Date: Wed, 26 Mar 2008 17:54:49 +0100 (CET) Subject: [Lxml-checkins] r52960 - in lxml/trunk: . src/lxml/html src/lxml/html/tests Message-ID: <20080326165449.7DFFD169EC4@codespeak.net> Author: ianb Date: Wed Mar 26 17:54:46 2008 New Revision: 52960 Modified: lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/html/diff.py lxml/trunk/src/lxml/html/tests/test_diff.txt Log: Fix empty tags (e.g., <br>) in diffs. Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Mar 26 17:54:46 2008 @@ -29,6 +29,9 @@ * Default encoding for plain text serialisation was different from that of XML serialisation (UTF-8 instead of ASCII). +* ``lxml.html.diff`` didn't treat empty tags properly (e.g., + ``<br>``). + Other changes ------------- Modified: lxml/trunk/src/lxml/html/diff.py ============================================================================== --- lxml/trunk/src/lxml/html/diff.py (original) +++ lxml/trunk/src/lxml/html/diff.py Wed Mar 26 17:54:46 2008 @@ -139,6 +139,8 @@ ############################################################ def htmldiff(old_html, new_html): + ## FIXME: this should take parsed documents too, and use their body + ## or other content. """ Do a diff of the old and new document. The documents are HTML *fragments* (str/UTF8 or unicode), they are not complete documents (i.e., no <html> tag). 
@@ -310,8 +312,6 @@ endtag = chunk[1] == '/' name = chunk.split()[0].strip('<>/') if name in empty_tags: - assert not endtag, ( - "Empty tag %r should have no end tag" % chunk) balanced.append(chunk) continue if endtag: @@ -669,7 +669,7 @@ yield ('img', el.attrib['src'], start_tag(el)) else: yield start_tag(el) - if el.tag in empty_tags and not el.text and not len(el): + if el.tag in empty_tags and not el.text and not len(el) and not el.tail: return start_words = split_words(el.text) for word in start_words: Modified: lxml/trunk/src/lxml/html/tests/test_diff.txt ============================================================================== --- lxml/trunk/src/lxml/html/tests/test_diff.txt (original) +++ lxml/trunk/src/lxml/html/tests/test_diff.txt Wed Mar 26 17:54:46 2008 @@ -66,6 +66,11 @@ >>> pdiff('<a href="http://yahoo.com">search</a>', '<a href="http://yahoo.com">search</a>') <a href="http://yahoo.com">search</a> +A test of empty elements: + + >>> pdiff('some <br> text', 'some <br> test') + some <ins><br> test</ins> <del><br> text</del> + The sixteen combinations:: First "insert start" (del start/middle/end/none): @@ -177,8 +182,8 @@ >>> panno('<p>Hi <img src="/foo"> You</p>', ... '<p>Hi You</p>', ... '<p>Hi You <img src="/bar"></p>') - <p><span version="0">Hi</span> <span version="1">You</span> <span - version="2"><img src="/bar"></span></p> + <p><span version="0">Hi You</span> <span version="2"><img + src="/bar"></span></p> >>> panno('<p><a href="/foo">Hey</a></p>', ... '<p><a href="/bar">Hey</a></p>') <p><a href="/bar"><span version="0">Hey</span></a></p> From scoder at codespeak.net Wed Mar 26 21:34:56 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 26 Mar 2008 21:34:56 +0100 (CET) Subject: [Lxml-checkins] r52961 - in lxml/trunk: . 
src/lxml
Message-ID: <20080326203456.50586169ED4@codespeak.net>

Author: scoder
Date: Wed Mar 26 21:34:54 2008
New Revision: 52961

Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/iterparse.pxi
Log:
r3834 at delle: sbehnel | 2008-03-26 18:06:27 +0100
fix root node finding with comments/pis in iterparse()

Modified: lxml/trunk/src/lxml/iterparse.pxi
==============================================================================
--- lxml/trunk/src/lxml/iterparse.pxi (original)
+++ lxml/trunk/src/lxml/iterparse.pxi Wed Mar 26 21:34:54 2008
@@ -145,8 +145,9 @@
 ns_count = _countNsDefs(c_node)
 if self._event_filter & ITERPARSE_FILTER_END_NS:
 python.PyList_Append(self._ns_stack, ns_count)
- if self._doc is None:
- self._doc = _documentFactory(c_node.doc, None)
+ if self._root is None:
+ if self._doc is None:
+ self._doc = _documentFactory(c_node.doc, None)
 self._root = self._doc.getroot()
 if self._tag_tuple is None or \
 _tagMatches(c_node, self._tag_href, self._tag_name):
@@ -168,8 +169,9 @@
 ITERPARSE_FILTER_END_NS):
 node = self._pop_node()
 else:
- if self._doc is None:
- self._doc = _documentFactory(c_node.doc, None)
+ if self._root is None:
+ if self._doc is None:
+ self._doc = _documentFactory(c_node.doc, None)
 self._root = self._doc.getroot()
 node = _elementFactory(self._doc, c_node)
 python.PyList_Append(self._events, ("end", node))

From scoder at codespeak.net Wed Mar 26 21:35:03 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Wed, 26 Mar 2008 21:35:03 +0100 (CET)
Subject: [Lxml-checkins] r52962 - in lxml/trunk: .
doc Message-ID: <20080326203503.66907169EDB@codespeak.net> Author: scoder Date: Wed Mar 26 21:35:02 2008 New Revision: 52962 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/parsing.txt Log: r3835 at delle: sbehnel | 2008-03-26 18:07:10 +0100 iterparse doc updates and fixes Modified: lxml/trunk/doc/parsing.txt ============================================================================== --- lxml/trunk/doc/parsing.txt (original) +++ lxml/trunk/doc/parsing.txt Wed Mar 26 21:35:02 2008 @@ -333,10 +333,12 @@ iterparse and iterwalk ====================== -As known from ElementTree, the ``iterparse()`` utility function returns an -iterator that generates parser events for an XML file (or file-like object), -while building the tree. The values are tuples ``(event-type, object)``. The -event types are 'start', 'end', 'start-ns' and 'end-ns'. +As known from ElementTree, the ``iterparse()`` utility function +returns an iterator that generates parser events for an XML file (or +file-like object), while building the tree. The values are tuples +``(event-type, object)``. The event types supported by ElementTree +and lxml.etree are the strings 'start', 'end', 'start-ns' and +'end-ns'. The 'start' and 'end' events represent opening and closing elements and are accompanied by the respective element. By default, only 'end' events are @@ -384,6 +386,43 @@ end {testns}empty-element end root +The 'start-ns' and 'end-ns' events notify about namespace +declarations. They do not come with Elements. Instead, the value of +the 'start-ns' event is a tuple ``(prefix, namespaceURI)`` that +designates the beginning of a prefix-namespace mapping. The +corresponding ``end-ns`` event does not have a value (None). It is +common practice to use a list as namespace stack and pop the last +entry on the 'end-ns' event. + +.. 
sourcecode:: pycon + + >>> print xml, + <root> + <element key='value'>text</element> + <element>text</element>tail + <empty-element xmlns="testns" /> + </root> + + >>> events = ("start", "end", "start-ns", "end-ns") + >>> context = etree.iterparse(StringIO(xml), events=events) + >>> for action, elem in context: + ... if action in ('start', 'end'): + ... print action, elem.tag + ... elif action == 'start-ns': + ... print action, elem + ... else: + ... print action + start root + start element + end element + start element + end element + start-ns ('', 'testns') + start {testns}empty-element + end {testns}empty-element + end-ns + end root + Selective tag events -------------------- @@ -409,6 +448,51 @@ end {testns}empty-element +Comments and PIs +---------------- + +As an extension over ElementTree, the ``iterparse()`` function in +lxml.etree also supports the event types 'comment' and 'pi' for the +respective XML structures. + +.. sourcecode:: pycon + + >>> commented_xml = '''\ + ... <?some pi ?> + ... <!-- a comment --> + ... <root> + ... <element key='value'>text</element> + ... <!-- another comment --> + ... <element>text</element>tail + ... <empty-element xmlns="testns" /> + ... </root> + ... ''' + + >>> events = ("start", "end", "comment", "pi") + >>> context = etree.iterparse(StringIO(commented_xml), events=events) + >>> for action, elem in context: + ... if action in ('start', 'end'): + ... print action, elem.tag + ... elif action == 'pi': + ... print action, "-%s=%s-" % (elem.target, elem.text) + ... else: # 'comment' + ... print action, "-%s-" % elem.text + pi -some=pi - + comment - a comment - + start root + start element + end element + comment - another comment - + start element + end element + start {testns}empty-element + end {testns}empty-element + end root + + >>> print context.root.tag + root + + Modifying the tree ------------------ @@ -454,21 +538,6 @@ traverse all elements and do the tag selection by hand in the event handler code. 
-The 'start-ns' and 'end-ns' events notify about namespace declarations and -generate tuples ``(prefix, URI)``: - -.. sourcecode:: pycon - - >>> events = ("start-ns", "end-ns") - >>> context = etree.iterparse(StringIO(xml), events=events) - >>> for action, obj in context: - ... print action, obj - start-ns ('', 'testns') - end-ns None - -It is common practice to use a list as namespace stack and pop the last entry -on the 'end-ns' event. - iterwalk -------- From scoder at codespeak.net Wed Mar 26 21:35:07 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 26 Mar 2008 21:35:07 +0100 (CET) Subject: [Lxml-checkins] r52963 - in lxml/trunk: . doc Message-ID: <20080326203507.E17F0169EE1@codespeak.net> Author: scoder Date: Wed Mar 26 21:35:07 2008 New Revision: 52963 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/parsing.txt lxml/trunk/setupinfo.py Log: r3836 at delle: sbehnel | 2008-03-26 18:25:37 +0100 parser doc updates Modified: lxml/trunk/doc/parsing.txt ============================================================================== --- lxml/trunk/doc/parsing.txt (original) +++ lxml/trunk/doc/parsing.txt Wed Mar 26 21:35:07 2008 @@ -10,15 +10,18 @@ .. 1 Parsers 1.1 Parser options - 1.2 Parsing HTML - 1.3 Doctype information - 2 The feed parser interface - 3 iterparse and iterwalk - 3.1 Selective tag events - 3.2 Modifying the tree - 3.3 iterwalk - 4 Python unicode strings - 4.1 Serialising to Unicode strings + 1.2 Error log + 1.3 Parsing HTML + 1.4 Doctype information + 2 The target parser interface + 3 The feed parser interface + 4 iterparse and iterwalk + 4.1 Selective tag events + 4.2 Comments and PIs + 4.3 Modifying the tree + 4.4 iterwalk + 5 Python unicode strings + 5.1 Serialising to Unicode strings The usual setup procedure: @@ -238,15 +241,19 @@ ... print "end", tag ... def data(self, data): ... print "data", repr(data) + ... def comment(self, text): + ... print "comment", text ... def close(self): ... print "close" ... 
return "closed!" >>> parser = etree.XMLParser(target = EchoTarget()) - >>> result = etree.XML("<element>some text</element>", parser) + >>> result = etree.XML("<element>some<!--comment-->text</element>", parser) start element {} - data u'some text' + data u'some' + comment comment + data u'text' end element close @@ -254,9 +261,19 @@ closed! Note that the parser does *not* build a tree in this case. The result -of the parser run is what the target object returns from its +of the parser run is whatever the target object returns from its ``close()`` method. If you want to return an XML tree here, you have -to create it programmatically in the target object. +to create it programmatically in the target object. An example for a +parser target that builds a tree is the ``TreeBuilder``. + + >>> parser = etree.XMLParser(target = etree.TreeBuilder()) + + >>> result = etree.XML("<element>some<!--comment-->text</element>", parser) + + >>> print result.tag + element + >>> print result[0].text + comment The feed parser interface Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Wed Mar 26 21:35:07 2008 @@ -83,8 +83,8 @@ if not CYTHON_INSTALLED: return [] from Cython.Compiler.Version import version - if split_version(version) <= (0,9,6,12): - return [] +# if split_version(version) <= (0,9,6,12): +# return [] package_dir = os.path.join(get_base_dir(), PACKAGE_PATH) files = os.listdir(package_dir) From scoder at codespeak.net Wed Mar 26 21:35:11 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 26 Mar 2008 21:35:11 +0100 (CET) Subject: [Lxml-checkins] r52964 - lxml/trunk Message-ID: <20080326203511.93ED6169EE2@codespeak.net> Author: scoder Date: Wed Mar 26 21:35:11 2008 New Revision: 52964 Modified: lxml/trunk/ (props changed) lxml/trunk/setupinfo.py Log: r3837 at delle: sbehnel | 2008-03-26 18:30:41 +0100 accidental commit 
Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Wed Mar 26 21:35:11 2008 @@ -83,8 +83,8 @@ if not CYTHON_INSTALLED: return [] from Cython.Compiler.Version import version -# if split_version(version) <= (0,9,6,12): -# return [] + if split_version(version) <= (0,9,6,12): + return [] package_dir = os.path.join(get_base_dir(), PACKAGE_PATH) files = os.listdir(package_dir) From scoder at codespeak.net Wed Mar 26 21:35:15 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 26 Mar 2008 21:35:15 +0100 (CET) Subject: [Lxml-checkins] r52965 - in lxml/trunk: . doc Message-ID: <20080326203515.72DB0169EE1@codespeak.net> Author: scoder Date: Wed Mar 26 21:35:15 2008 New Revision: 52965 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/parsing.txt Log: r3838 at delle: sbehnel | 2008-03-26 19:00:34 +0100 doc fixes Modified: lxml/trunk/doc/parsing.txt ============================================================================== --- lxml/trunk/doc/parsing.txt (original) +++ lxml/trunk/doc/parsing.txt Wed Mar 26 21:35:15 2008 @@ -279,15 +279,19 @@ The feed parser interface ========================= -Since lxml 2.0, the parsers have a feed parser interface that is compatible to -the `ElementTree parsers`_. You can use it to feed data into the parser in a -controlled step-by-step way. Note that you can only use one interface at a -time with each parser: the ``parse()`` or ``XML()`` functions, or the feed -parser interface. +Since lxml 2.0, the parsers have a feed parser interface that is +compatible to the `ElementTree parsers`_. You can use it to feed data +into the parser in a controlled step-by-step way. + +In lxml.etree, you can use both interfaces to a parser at the same +time: the ``parse()`` or ``XML()`` functions, and the feed parser +interface. 
Both are independent and will not conflict (except if used
+in conjunction with a parser target object as described above).

 .. _`ElementTree parsers`: http://effbot.org/elementtree/elementtree-xmlparser.htm

-To start parsing with a feed parser, just call its ``feed()`` method:
+To start parsing with a feed parser, just call its ``feed()`` method
+to feed it some data.

 .. sourcecode:: pycon

@@ -309,9 +313,10 @@
 >>> print root[0].tag
 a

-If you do not call ``close()``, the parser will stay locked and subsequent
-usages will block till the end of times. So make sure you also close it in
-the exception case.
+If you do not call ``close()``, the parser will stay locked and
+subsequent feeds will keep appending data, usually resulting in a non
+well-formed document and an unexpected parser error. So make sure you
+always close the parser after use, also in the exception case.

 Another way of achieving the same step-by-step parsing is by writing your
 own file-like object that returns a chunk of data on each ``read()`` call. Where
@@ -341,7 +346,7 @@
 >>> print result
 closed!

-Again, this prevents the automatic creating of an XML tree and leaves
+Again, this prevents the automatic creation of an XML tree and leaves
 all the event handling to the target object. The ``close()`` method
 of the parser forwards the return value of the target's ``close()``
 method.

From scoder at codespeak.net Wed Mar 26 21:35:19 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Wed, 26 Mar 2008 21:35:19 +0100 (CET)
Subject: [Lxml-checkins] r52966 - in lxml/trunk: .
doc
Message-ID: <20080326203519.DA773169ED4@codespeak.net>

Author: scoder
Date: Wed Mar 26 21:35:19 2008
New Revision: 52966

Modified:
lxml/trunk/ (props changed)
lxml/trunk/doc/parsing.txt
Log:
r3839 at delle: sbehnel | 2008-03-26 19:05:56 +0100
doc fixes

Modified: lxml/trunk/doc/parsing.txt
==============================================================================
--- lxml/trunk/doc/parsing.txt (original)
+++ lxml/trunk/doc/parsing.txt Wed Mar 26 21:35:19 2008
@@ -362,9 +362,9 @@
 and lxml.etree are the strings 'start', 'end', 'start-ns' and
 'end-ns'.

-The 'start' and 'end' events represent opening and closing elements and are
-accompanied by the respective element. By default, only 'end' events are
-generated:
+The 'start' and 'end' events represent opening and closing elements.
+They are accompanied by the respective Element instance. By default,
+only 'end' events are generated:

 .. sourcecode:: pycon


From scoder at codespeak.net Wed Mar 26 21:56:28 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Wed, 26 Mar 2008 21:56:28 +0100 (CET)
Subject: [Lxml-checkins] r52968 - in lxml/branch/lxml-2.0: . src/lxml/html src/lxml/html/tests
Message-ID: <20080326205628.656F6169EE7@codespeak.net>

Author: scoder
Date: Wed Mar 26 21:56:25 2008
New Revision: 52968

Modified:
lxml/branch/lxml-2.0/CHANGES.txt
lxml/branch/lxml-2.0/src/lxml/html/diff.py
lxml/branch/lxml-2.0/src/lxml/html/tests/test_diff.txt
Log:
trunk merge -c 52960: empty tag fix for lxml.html.diff

Modified: lxml/branch/lxml-2.0/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-2.0/CHANGES.txt (original)
+++ lxml/branch/lxml-2.0/CHANGES.txt Wed Mar 26 21:56:25 2008
@@ -16,6 +16,9 @@
 Bugs fixed
 ----------

+* ``lxml.html.diff`` didn't treat empty tags properly (e.g.,
+ ``<br>``).
+
 * Handle entity replacements correctly in target parser.

 * Crash when using ``iterparse()`` with XML Schema validation.
Modified: lxml/branch/lxml-2.0/src/lxml/html/diff.py ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/html/diff.py (original) +++ lxml/branch/lxml-2.0/src/lxml/html/diff.py Wed Mar 26 21:56:25 2008 @@ -139,6 +139,8 @@ ############################################################ def htmldiff(old_html, new_html): + ## FIXME: this should take parsed documents too, and use their body + ## or other content. """ Do a diff of the old and new document. The documents are HTML *fragments* (str/UTF8 or unicode), they are not complete documents (i.e., no <html> tag). @@ -310,8 +312,6 @@ endtag = chunk[1] == '/' name = chunk.split()[0].strip('<>/') if name in empty_tags: - assert not endtag, ( - "Empty tag %r should have no end tag" % chunk) balanced.append(chunk) continue if endtag: @@ -669,7 +669,7 @@ yield ('img', el.attrib['src'], start_tag(el)) else: yield start_tag(el) - if el.tag in empty_tags and not el.text and not len(el): + if el.tag in empty_tags and not el.text and not len(el) and not el.tail: return start_words = split_words(el.text) for word in start_words: Modified: lxml/branch/lxml-2.0/src/lxml/html/tests/test_diff.txt ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/html/tests/test_diff.txt (original) +++ lxml/branch/lxml-2.0/src/lxml/html/tests/test_diff.txt Wed Mar 26 21:56:25 2008 @@ -66,6 +66,11 @@ >>> pdiff('<a href="http://yahoo.com">search</a>', '<a href="http://yahoo.com">search</a>') <a href="http://yahoo.com">search</a> +A test of empty elements: + + >>> pdiff('some <br> text', 'some <br> test') + some <ins><br> test</ins> <del><br> text</del> + The sixteen combinations:: First "insert start" (del start/middle/end/none): @@ -177,8 +182,8 @@ >>> panno('<p>Hi <img src="/foo"> You</p>', ... '<p>Hi You</p>', ... 
'<p>Hi You <img src="/bar"></p>') - <p><span version="0">Hi</span> <span version="1">You</span> <span - version="2"><img src="/bar"></span></p> + <p><span version="0">Hi You</span> <span version="2"><img + src="/bar"></span></p> >>> panno('<p><a href="/foo">Hey</a></p>', ... '<p><a href="/bar">Hey</a></p>') <p><a href="/bar"><span version="0">Hey</span></a></p> From scoder at codespeak.net Wed Mar 26 22:09:23 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 26 Mar 2008 22:09:23 +0100 (CET) Subject: [Lxml-checkins] r52969 - in lxml/branch/lxml-2.0: . doc Message-ID: <20080326210923.30D68169EDD@codespeak.net> Author: scoder Date: Wed Mar 26 22:09:21 2008 New Revision: 52969 Modified: lxml/branch/lxml-2.0/CHANGES.txt lxml/branch/lxml-2.0/doc/main.txt lxml/branch/lxml-2.0/doc/mkhtml.py Log: prepare release Modified: lxml/branch/lxml-2.0/CHANGES.txt ============================================================================== --- lxml/branch/lxml-2.0/CHANGES.txt (original) +++ lxml/branch/lxml-2.0/CHANGES.txt Wed Mar 26 22:09:21 2008 @@ -2,8 +2,8 @@ lxml changelog ============== -2.0.3 (Under development) -========================= +2.0.3 (2008-03-26) +================== Features added -------------- Modified: lxml/branch/lxml-2.0/doc/main.txt ============================================================================== --- lxml/branch/lxml-2.0/doc/main.txt (original) +++ lxml/branch/lxml-2.0/doc/main.txt Wed Mar 26 22:09:21 2008 @@ -145,8 +145,8 @@ .. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ .. _`this key`: pubkey.asc -The latest version is `lxml 2.0.2`_, released 2008-02-22 -(`changes for 2.0.2`_). `Older versions`_ are listed below. +The latest version is `lxml 2.0.3`_, released 2008-02-22 +(`changes for 2.0.3`_). `Older versions`_ are listed below. .. 
_`Older versions`: #old-versions @@ -206,6 +206,8 @@ Old Versions ------------ +* `lxml 2.0.2`_, released 2008-02-22 (`changes for 2.0.2`_) + * `lxml 2.0.1`_, released 2008-02-13 (`changes for 2.0.1`_) * `lxml 2.0`_, released 2008-02-01 (`changes for 2.0`_) @@ -260,6 +262,7 @@ * `lxml 0.5`_, released 2005-04-08 +.. _`lxml 2.0.3`: lxml-2.0.3.tgz .. _`lxml 2.0.2`: lxml-2.0.2.tgz .. _`lxml 2.0.1`: lxml-2.0.1.tgz .. _`lxml 2.0`: lxml-2.0.tgz @@ -288,6 +291,7 @@ .. _`lxml 0.5.1`: lxml-0.5.1.tgz .. _`lxml 0.5`: lxml-0.5.tgz +.. _`changes for 2.0.3`: changes-2.0.3.html .. _`changes for 2.0.2`: changes-2.0.2.html .. _`changes for 2.0.1`: changes-2.0.1.html .. _`changes for 2.0`: changes-2.0.html Modified: lxml/branch/lxml-2.0/doc/mkhtml.py ============================================================================== --- lxml/branch/lxml-2.0/doc/mkhtml.py (original) +++ lxml/branch/lxml-2.0/doc/mkhtml.py Wed Mar 26 22:09:21 2008 @@ -3,8 +3,8 @@ import os, shutil, re, sys, copy, time SITE_STRUCTURE = [ - ('lxml', ('main.txt', 'intro.txt', 'lxml2.txt', 'FAQ.txt', - 'compatibility.txt', 'performance.txt')), + ('lxml', ('main.txt', 'intro.txt', '../INSTALL.txt', 'lxml2.txt', + 'performance.txt', 'compatibility.txt', 'FAQ.txt')), ('Developing with lxml', ('tutorial.txt', '@API reference', 'api.txt', 'parsing.txt', 'validation.txt', 'xpathxslt.txt', @@ -12,7 +12,8 @@ 'cssselect.txt', 'elementsoup.txt')), ('Extending lxml', ('resolvers.txt', 'extensions.txt', 'element_classes.txt', 'sax.txt', 'capi.txt')), - ('Developing lxml', ('build.txt', 'lxml-source-howto.txt')), + ('Developing lxml', ('build.txt', 'lxml-source-howto.txt', + '@Release Changelog')), ] RST2HTML_OPTIONS = " ".join([ @@ -26,6 +27,11 @@ "API reference" : "api/index.html" } +BASENAME_MAP = { + 'main' : 'index', + 'INSTALL' : 'installation', +} + htmlnsmap = {"h" : "http://www.w3.org/1999/xhtml"} find_title = XPath("/h:html/h:head/h:title/text()", namespaces=htmlnsmap) @@ -51,7 +57,7 @@ if page_title: page_title = 
page_title[0] else: - page_title = replace_invalid(' ', basename.capitalize()) + page_title = replace_invalid('', basename.capitalize()) build_menu_entry(page_title, basename+".html", section_head, headings=find_headings(tree)) @@ -78,7 +84,7 @@ tag = el.tag if tag[0] != '{': el.tag = "{http://www.w3.org/1999/xhtml}" + tag - current_menu = find_menu(menu_root, name=name) + current_menu = find_menu(menu_root, name=replace_invalid(' ', name)) if current_menu: for submenu in current_menu: submenu.set("class", submenu.get("class", ""). @@ -102,6 +108,10 @@ shutil.copy(pubkey, dirname) + href_map = HREF_MAP.copy() + changelog_basename = 'changes-%s' % release + href_map['Release Changelog'] = changelog_basename + '.html' + trees = {} menu = Element("div", {"class":"sidemenu"}) # build HTML pages and parse them back @@ -111,13 +121,12 @@ if filename.startswith('@'): # special menu entry page_title = filename[1:] - url = HREF_MAP[page_title] + url = href_map[page_title] build_menu_entry(page_title, url, section_head) else: path = os.path.join(doc_dir, filename) - basename = os.path.splitext(filename)[0] - if basename == 'main': - basename = 'index' + basename = os.path.splitext(os.path.basename(filename))[0] + basename = BASENAME_MAP.get(basename, basename) outname = basename + '.html' outpath = os.path.join(dirname, outname) @@ -128,20 +137,16 @@ build_menu(tree, basename, section_head) + # also convert CHANGES.txt + rest2html(script, + os.path.join(lxml_path, 'CHANGES.txt'), + os.path.join(dirname, 'changes-%s.html' % release), + '') + # integrate menu for tree, basename, outpath in trees.itervalues(): new_tree = merge_menu(tree, menu, basename) new_tree.write(outpath) - # also convert INSTALL.txt and CHANGES.txt - rest2html(script, - os.path.join(lxml_path, 'INSTALL.txt'), - os.path.join(dirname, 'installation.html'), - stylesheet_url) - rest2html(script, - os.path.join(lxml_path, 'CHANGES.txt'), - os.path.join(dirname, 'changes-%s.html' % release), - stylesheet_url) - 
if __name__ == '__main__': publish(sys.argv[1], sys.argv[2], sys.argv[3]) From scoder at codespeak.net Wed Mar 26 22:12:50 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 26 Mar 2008 22:12:50 +0100 (CET) Subject: [Lxml-checkins] r52970 - lxml/branch/lxml-2.0/doc Message-ID: <20080326211250.17873169EDB@codespeak.net> Author: scoder Date: Wed Mar 26 22:12:50 2008 New Revision: 52970 Modified: lxml/branch/lxml-2.0/doc/elementsoup.txt Log: doc fixes Modified: lxml/branch/lxml-2.0/doc/elementsoup.txt ============================================================================== --- lxml/branch/lxml-2.0/doc/elementsoup.txt (original) +++ lxml/branch/lxml-2.0/doc/elementsoup.txt Wed Mar 26 22:12:50 2008 @@ -19,22 +19,16 @@ ElementTree. The first returns a root Element, the latter returns an ElementTree. -Here is a document full of tag soup, similar to, but not quite like, HTML: - -.. sourcecode:: pycon +Here is a document full of tag soup, similar to, but not quite like, HTML:: >>> tag_soup = '<meta><head><title>Hello</head<body onload=crash()>Hi all<p>' -all you need to do is pass it to the ``fromstring()`` function: - -.. sourcecode:: pycon +all you need to do is pass it to the ``fromstring()`` function:: >>> from lxml.html.soupparser import fromstring >>> root = fromstring(tag_soup) -To see what we have here, you can serialise it: - -.. sourcecode:: pycon +To see what we have here, you can serialise it:: >>> from lxml.etree import tostring >>> print tostring(root, pretty_print=True), @@ -54,9 +48,7 @@ By default, this is based on the HTML parser defined in ``lxml.html``. By default, the BeautifulSoup parser also replaces the entities it -finds by their character equivalent. - -.. 
sourcecode:: pycon +finds by their character equivalent:: >>> tag_soup = '<body>©€-õƽ<p>' >>> body = fromstring(tag_soup).find('.//body') @@ -64,9 +56,7 @@ u'\xa9\u20ac-\xf5\u01bd' If you want them back on the way out, you can serialise with the -'html' method, which will always use escaping for safety reasons: - -.. sourcecode:: pycon +'html' method, which will always use escaping for safety reasons:: >>> tostring(body, method="html") '<body>©€-õƽ<p></p></body>' @@ -78,9 +68,7 @@ u'<body>©€-õƽ<p></p></body>' Otherwise, when serialising to XML, only the plain ASCII encoding will -escape non-ASCII characters: - -.. sourcecode:: pycon +escape non-ASCII characters:: >>> tostring(body) '<body>©€-õƽ<p/></body>' From scoder at codespeak.net Thu Mar 27 09:47:44 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 09:47:44 +0100 (CET) Subject: [Lxml-checkins] r52982 - lxml/branch/lxml-2.0/doc Message-ID: <20080327084744.33A3E169ED5@codespeak.net> Author: scoder Date: Thu Mar 27 09:47:40 2008 New Revision: 52982 Modified: lxml/branch/lxml-2.0/doc/main.txt Log: release fix Modified: lxml/branch/lxml-2.0/doc/main.txt ============================================================================== --- lxml/branch/lxml-2.0/doc/main.txt (original) +++ lxml/branch/lxml-2.0/doc/main.txt Thu Mar 27 09:47:40 2008 @@ -145,7 +145,7 @@ .. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ .. _`this key`: pubkey.asc -The latest version is `lxml 2.0.3`_, released 2008-02-22 +The latest version is `lxml 2.0.3`_, released 2008-03-26 (`changes for 2.0.3`_). `Older versions`_ are listed below. .. 
_`Older versions`: #old-versions From scoder at codespeak.net Thu Mar 27 17:01:40 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:01:40 +0100 (CET) Subject: [Lxml-checkins] r52989 - lxml/branch/lxml-2.0 Message-ID: <20080327160140.7CD11169F6E@codespeak.net> Author: scoder Date: Thu Mar 27 17:01:38 2008 New Revision: 52989 Modified: lxml/branch/lxml-2.0/MANIFEST.in Log: include Makefile Modified: lxml/branch/lxml-2.0/MANIFEST.in ============================================================================== --- lxml/branch/lxml-2.0/MANIFEST.in (original) +++ lxml/branch/lxml-2.0/MANIFEST.in Thu Mar 27 17:01:38 2008 @@ -2,7 +2,7 @@ include setup.py ez_setup.py setupinfo.py versioninfo.py include test.py selftest.py selftest2.py include update-error-constants.py -include MANIFEST.in version.txt +include MANIFEST.in Makefile version.txt include CHANGES.txt CREDITS.txt INSTALL.txt LICENSES.txt README.txt TODO.txt recursive-include src *.pyx *.pxd *.pxi *.py recursive-include src/lxml lxml.etree.c lxml.objectify.c lxml.pyclasslookup.c From scoder at codespeak.net Thu Mar 27 17:12:00 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:12:00 +0100 (CET) Subject: [Lxml-checkins] r52990 - lxml/branch/lxml-2.0 Message-ID: <20080327161200.6BDFE16847F@codespeak.net> Author: scoder Date: Thu Mar 27 17:12:00 2008 New Revision: 52990 Modified: lxml/branch/lxml-2.0/MANIFEST.in Log: include lxml.html test data in dist Modified: lxml/branch/lxml-2.0/MANIFEST.in
============================================================================== --- lxml/branch/lxml-2.0/MANIFEST.in (original) +++ lxml/branch/lxml-2.0/MANIFEST.in Thu Mar 27 17:12:00 2008 @@ -8,6 +8,7 @@ recursive-include src/lxml lxml.etree.c lxml.objectify.c lxml.pyclasslookup.c recursive-include src/lxml lxml.etree.h lxml.etree_api.h etree_defs.h recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd +recursive-include src/lxml/html/tests *.data *.txt recursive-include benchmark *.py recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png recursive-include fake_pyrex *.py From scoder at codespeak.net Thu Mar 27 17:15:42 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:15:42 +0100 (CET) Subject: [Lxml-checkins] r52991 - lxml/branch/lxml-2.0 Message-ID: <20080327161542.E7628169F8A@codespeak.net> Author: scoder Date: Thu Mar 27 17:15:42 2008 New Revision: 52991 Modified: lxml/branch/lxml-2.0/MANIFEST.in Log: include .txt test files in dist Modified: lxml/branch/lxml-2.0/MANIFEST.in ============================================================================== --- lxml/branch/lxml-2.0/MANIFEST.in (original) +++ lxml/branch/lxml-2.0/MANIFEST.in Thu Mar 27 17:15:42 2008 @@ -7,7 +7,7 @@ recursive-include src *.pyx *.pxd *.pxi *.py recursive-include src/lxml lxml.etree.c lxml.objectify.c lxml.pyclasslookup.c recursive-include src/lxml lxml.etree.h lxml.etree_api.h etree_defs.h -recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd +recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd *.txt recursive-include src/lxml/html/tests *.data *.txt recursive-include benchmark *.py recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png From scoder at codespeak.net Thu Mar 27 17:21:10 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:21:10 +0100 (CET) Subject: [Lxml-checkins] r52992 - lxml/tag/lxml-2.0.2 Message-ID: 
<20080327162110.73668169F8E@codespeak.net> Author: scoder Date: Thu Mar 27 17:21:10 2008 New Revision: 52992 Added: lxml/tag/lxml-2.0.2/ - copied from r51802, lxml/branch/lxml-2.0/ Log: tag for 2.0.2 From scoder at codespeak.net Thu Mar 27 17:22:32 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:22:32 +0100 (CET) Subject: [Lxml-checkins] r52993 - lxml/tag/lxml-2.0.3 Message-ID: <20080327162232.653DC169F8E@codespeak.net> Author: scoder Date: Thu Mar 27 17:22:30 2008 New Revision: 52993 Added: lxml/tag/lxml-2.0.3/ - copied from r52982, lxml/branch/lxml-2.0/ Log: tag for 2.0.3 From scoder at codespeak.net Thu Mar 27 17:33:48 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:33:48 +0100 (CET) Subject: [Lxml-checkins] r52994 - lxml/branch/lxml-2.0 Message-ID: <20080327163348.D8A32169F7A@codespeak.net> Author: scoder Date: Thu Mar 27 17:33:48 2008 New Revision: 52994 Modified: lxml/branch/lxml-2.0/MANIFEST.in Log: another MANIFEST fix Modified: lxml/branch/lxml-2.0/MANIFEST.in ============================================================================== --- lxml/branch/lxml-2.0/MANIFEST.in (original) +++ lxml/branch/lxml-2.0/MANIFEST.in Thu Mar 27 17:33:48 2008 @@ -7,7 +7,7 @@ recursive-include src *.pyx *.pxd *.pxi *.py recursive-include src/lxml lxml.etree.c lxml.objectify.c lxml.pyclasslookup.c recursive-include src/lxml lxml.etree.h lxml.etree_api.h etree_defs.h -recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd *.txt +recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd *.xsd *.html *.txt recursive-include src/lxml/html/tests *.data *.txt recursive-include benchmark *.py recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png From scoder at codespeak.net Thu Mar 27 17:41:40 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:41:40 +0100 (CET) Subject: [Lxml-checkins] r52995 - lxml/trunk Message-ID: 
<20080327164140.03459169F9D@codespeak.net> Author: scoder Date: Thu Mar 27 17:41:40 2008 New Revision: 52995 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3849 at delle: sbehnel | 2008-03-26 22:14:35 +0100 changelog fix Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Thu Mar 27 17:41:40 2008 @@ -29,9 +29,6 @@ * Default encoding for plain text serialisation was different from that of XML serialisation (UTF-8 instead of ASCII). -* ``lxml.html.diff`` didn't treat empty tags properly (e.g., - ``<br>``). - Other changes ------------- @@ -79,8 +76,8 @@ ``objectify.set_default_parser()`` -2.0.3 (Under development) -========================= +2.0.3 (2008-03-26) +================== Features added -------------- @@ -93,6 +90,9 @@ Bugs fixed ---------- +* ``lxml.html.diff`` didn't treat empty tags properly (e.g., + ``<br>``). + * Handle entity replacements correctly in target parser. * Crash when using ``iterparse()`` with XML Schema validation. From scoder at codespeak.net Thu Mar 27 17:41:49 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:41:49 +0100 (CET) Subject: [Lxml-checkins] r52996 - in lxml/trunk: . doc Message-ID: <20080327164149.9E491169F9E@codespeak.net> Author: scoder Date: Thu Mar 27 17:41:49 2008 New Revision: 52996 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/main.txt Log: r3850 at delle: sbehnel | 2008-03-27 09:47:33 +0100 2.0.3 release fix Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Thu Mar 27 17:41:49 2008 @@ -142,8 +142,8 @@ source release. If you can't wait, consider trying a less recent release version first. -The latest version is `lxml 2.0.2`_, released 2008-02-22 -(`changes for 2.0.2`_). `Older versions`_ are listed below. 
+The latest version is `lxml 2.0.3`_, released 2008-03-26 +(`changes for 2.0.3`_). `Older versions`_ are listed below. Please take a look at the `installation instructions`_! @@ -211,6 +211,10 @@ Old Versions ------------ +* `lxml 2.0.3`_, released 2008-03-26 (`changes for 2.0.3`_) + +* `lxml 2.0.2`_, released 2008-02-22 (`changes for 2.0.2`_) + * `lxml 2.0.1`_, released 2008-02-13 (`changes for 2.0.1`_) * `lxml 2.0`_, released 2008-02-01 (`changes for 2.0`_) @@ -265,6 +269,7 @@ * `lxml 0.5`_, released 2005-04-08 +.. _`lxml 2.0.3`: lxml-2.0.3.tgz .. _`lxml 2.0.2`: lxml-2.0.2.tgz .. _`lxml 2.0.1`: lxml-2.0.1.tgz .. _`lxml 2.0`: lxml-2.0.tgz @@ -293,6 +298,7 @@ .. _`lxml 0.5.1`: lxml-0.5.1.tgz .. _`lxml 0.5`: lxml-0.5.tgz +.. _`changes for 2.0.3`: changes-2.0.3.html .. _`changes for 2.0.2`: changes-2.0.2.html .. _`changes for 2.0.1`: changes-2.0.1.html .. _`changes for 2.0`: changes-2.0.html From scoder at codespeak.net Thu Mar 27 17:41:53 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:41:53 +0100 (CET) Subject: [Lxml-checkins] r52997 - in lxml/trunk: . src/lxml Message-ID: <20080327164153.14607169FA0@codespeak.net> Author: scoder Date: Thu Mar 27 17:41:52 2008 New Revision: 52997 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/proxy.pxi Log: r3851 at delle: sbehnel | 2008-03-27 13:13:31 +0100 inline small functions Modified: lxml/trunk/src/lxml/proxy.pxi ============================================================================== --- lxml/trunk/src/lxml/proxy.pxi (original) +++ lxml/trunk/src/lxml/proxy.pxi Thu Mar 27 17:41:52 2008 @@ -4,7 +4,7 @@ # structure of the respective node to avoid multiple instantiation of # the Python class -cdef _Element getProxy(xmlNode* c_node): +cdef inline _Element getProxy(xmlNode* c_node): """Get a proxy for a given node. 
""" #print "getProxy for:", <int>c_node @@ -13,10 +13,10 @@ else: return None -cdef int hasProxy(xmlNode* c_node): +cdef inline int hasProxy(xmlNode* c_node): return c_node._private is not NULL -cdef int _registerProxy(_Element proxy) except -1: +cdef inline int _registerProxy(_Element proxy) except -1: """Register a proxy and type for the node it's proxying for. """ cdef xmlNode* c_node @@ -31,7 +31,7 @@ proxy._gc_doc = <python.PyObject*>proxy._doc python.Py_INCREF(proxy._doc) -cdef int _unregisterProxy(_Element proxy) except -1: +cdef inline int _unregisterProxy(_Element proxy) except -1: """Unregister a proxy for the node it's proxying for. """ cdef xmlNode* c_node @@ -40,7 +40,7 @@ c_node._private = NULL return 0 -cdef void _releaseProxy(_Element proxy): +cdef inline void _releaseProxy(_Element proxy): """An additional DECREF for the document. """ python.Py_XDECREF(proxy._gc_doc) From scoder at codespeak.net Thu Mar 27 17:41:58 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:41:58 +0100 (CET) Subject: [Lxml-checkins] r52998 - in lxml/trunk: . 
src/lxml Message-ID: <20080327164158.62C1D169F9D@codespeak.net> Author: scoder Date: Thu Mar 27 17:41:57 2008 New Revision: 52998 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/classlookup.pxi lxml/trunk/src/lxml/nsclasses.pxi lxml/trunk/src/lxml/public-api.pxi Log: r3852 at delle: sbehnel | 2008-03-27 13:42:49 +0100 inline small functions Modified: lxml/trunk/src/lxml/classlookup.pxi ============================================================================== --- lxml/trunk/src/lxml/classlookup.pxi (original) +++ lxml/trunk/src/lxml/classlookup.pxi Thu Mar 27 17:41:57 2008 @@ -107,8 +107,9 @@ """ self._setFallback(lookup) - cdef object _callFallback(self, _Document doc, xmlNode* c_node): - return self._fallback_function(self.fallback, doc, c_node) +cdef inline object _callLookupFallback(FallbackElementClassLookup lookup, + _Document doc, xmlNode* c_node): + return lookup._fallback_function(lookup.fallback, doc, c_node) ################################################################################ @@ -235,7 +236,7 @@ dict_result = python.PyDict_GetItem(lookup._class_mapping, value) if dict_result is not NULL: return <object>dict_result - return lookup._callFallback(doc, c_node) + return _callLookupFallback(lookup, doc, c_node) ################################################################################ @@ -253,7 +254,7 @@ if doc._parser._class_lookup is not None: return doc._parser._class_lookup._lookup_function( doc._parser._class_lookup, doc, c_node) - return (<FallbackElementClassLookup>state)._callFallback(doc, c_node) + return _callLookupFallback(<FallbackElementClassLookup>state, doc, c_node) ################################################################################ @@ -312,7 +313,7 @@ cls = lookup.lookup(element_type, doc, ns, name) if cls is not None: return cls - return lookup._callFallback(doc, c_node) + return _callLookupFallback(lookup, doc, c_node) ################################################################################ 
@@ -383,7 +384,7 @@ if cls is not None: return cls - return lookup._callFallback(doc, c_node) + return _callLookupFallback(lookup, doc, c_node) ################################################################################ # Global setup Modified: lxml/trunk/src/lxml/nsclasses.pxi ============================================================================== --- lxml/trunk/src/lxml/nsclasses.pxi (original) +++ lxml/trunk/src/lxml/nsclasses.pxi Thu Mar 27 17:41:57 2008 @@ -131,7 +131,7 @@ lookup = <ElementNamespaceClassLookup>state if c_node.type != tree.XML_ELEMENT_NODE: - return lookup._callFallback(doc, c_node) + return _callLookupFallback(lookup, doc, c_node) c_namespace_utf = _getNs(c_node) if c_namespace_utf is not NULL: @@ -155,7 +155,7 @@ if dict_result is not NULL: return <object>dict_result - return lookup._callFallback(doc, c_node) + return _callLookupFallback(lookup, doc, c_node) ################################################################################ Modified: lxml/trunk/src/lxml/public-api.pxi ============================================================================== --- lxml/trunk/src/lxml/public-api.pxi (original) +++ lxml/trunk/src/lxml/public-api.pxi Thu Mar 27 17:41:57 2008 @@ -41,7 +41,7 @@ cdef public api object callLookupFallback(FallbackElementClassLookup lookup, _Document doc, xmlNode* c_node): - return lookup._callFallback(doc, c_node) + return _callLookupFallback(lookup, doc, c_node) cdef public api int tagMatches(xmlNode* c_node, char* c_href, char* c_name): if c_node is NULL: From scoder at codespeak.net Thu Mar 27 17:42:02 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:42:02 +0100 (CET) Subject: [Lxml-checkins] r52999 - in lxml/trunk: . 
src/lxml Message-ID: <20080327164202.6945E169F9F@codespeak.net> Author: scoder Date: Thu Mar 27 17:42:02 2008 New Revision: 52999 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/apihelpers.pxi Log: r3853 at delle: sbehnel | 2008-03-27 13:45:57 +0100 inline small functions Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Thu Mar 27 17:42:02 2008 @@ -347,10 +347,10 @@ xml_string = xml_string[i:] return xml_string -cdef int _hasText(xmlNode* c_node): +cdef inline int _hasText(xmlNode* c_node): return c_node is not NULL and _textNodeOrSkip(c_node.children) is not NULL -cdef int _hasTail(xmlNode* c_node): +cdef inline int _hasTail(xmlNode* c_node): return c_node is not NULL and _textNodeOrSkip(c_node.next) is not NULL cdef _collectText(xmlNode* c_node): @@ -442,7 +442,7 @@ cdef bint _hasChild(xmlNode* c_node): return c_node is not NULL and _findChildForwards(c_node, 0) is not NULL -cdef Py_ssize_t _countElements(xmlNode* c_node): +cdef inline Py_ssize_t _countElements(xmlNode* c_node): "Counts the elements within the following siblings and the node itself." cdef Py_ssize_t count count = 0 @@ -506,13 +506,13 @@ c_node = _nextElement(c_node) return result -cdef xmlNode* _findChild(xmlNode* c_node, Py_ssize_t index): +cdef inline xmlNode* _findChild(xmlNode* c_node, Py_ssize_t index): if index < 0: return _findChildBackwards(c_node, -index - 1) else: return _findChildForwards(c_node, index) -cdef xmlNode* _findChildForwards(xmlNode* c_node, Py_ssize_t index): +cdef inline xmlNode* _findChildForwards(xmlNode* c_node, Py_ssize_t index): """Return child element of c_node with index, or return NULL if not found. 
""" cdef xmlNode* c_child @@ -527,7 +527,7 @@ c_child = c_child.next return NULL -cdef xmlNode* _findChildBackwards(xmlNode* c_node, Py_ssize_t index): +cdef inline xmlNode* _findChildBackwards(xmlNode* c_node, Py_ssize_t index): """Return child element of c_node with index, or return NULL if not found. Search from the end. """ @@ -543,7 +543,7 @@ c_child = c_child.prev return NULL -cdef xmlNode* _textNodeOrSkip(xmlNode* c_node): +cdef inline xmlNode* _textNodeOrSkip(xmlNode* c_node): """Return the node if it's a text node. Skip over ignorable nodes in a series of text nodes. Return NULL if a non-ignorable node is found. @@ -560,7 +560,7 @@ return NULL return NULL -cdef xmlNode* _nextElement(xmlNode* c_node): +cdef inline xmlNode* _nextElement(xmlNode* c_node): """Given a node, find the next sibling that is an element. """ if c_node is NULL: @@ -572,7 +572,7 @@ c_node = c_node.next return NULL -cdef xmlNode* _previousElement(xmlNode* c_node): +cdef inline xmlNode* _previousElement(xmlNode* c_node): """Given a node, find the next sibling that is an element. """ if c_node is NULL: @@ -584,7 +584,7 @@ c_node = c_node.prev return NULL -cdef xmlNode* _parentElement(xmlNode* c_node): +cdef inline xmlNode* _parentElement(xmlNode* c_node): "Given a node, find the parent element." if c_node is NULL or not _isElement(c_node): return NULL @@ -593,7 +593,7 @@ return NULL return c_node -cdef bint _tagMatches(xmlNode* c_node, char* c_href, char* c_name): +cdef inline bint _tagMatches(xmlNode* c_node, char* c_href, char* c_name): """Tests if the node matches namespace URI and tag name. A node matches if it matches both c_href and c_name. @@ -892,7 +892,7 @@ # parent element has moved; change them too.. 
moveNodeToDocument(element._doc, c_node) -cdef int isutf8(char* s): +cdef inline int isutf8(char* s): cdef char c c = s[0] while c != c'\0': From scoder at codespeak.net Thu Mar 27 17:42:06 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:42:06 +0100 (CET) Subject: [Lxml-checkins] r53000 - in lxml/trunk: . benchmark Message-ID: <20080327164206.6C69F169F9E@codespeak.net> Author: scoder Date: Thu Mar 27 17:42:05 2008 New Revision: 53000 Modified: lxml/trunk/ (props changed) lxml/trunk/benchmark/benchbase.py Log: r3854 at delle: sbehnel | 2008-03-27 13:46:25 +0100 fix callgrind instrumentation setup in benchmarks Modified: lxml/trunk/benchmark/benchbase.py ============================================================================== --- lxml/trunk/benchmark/benchbase.py (original) +++ lxml/trunk/benchmark/benchbase.py Thu Mar 27 17:42:05 2008 @@ -500,6 +500,7 @@ if callgrind_zero: cmd = open("callgrind.cmd", 'w') + cmd.write('+Instrumentation\n') cmd.write('Zero\n') cmd.close() From scoder at codespeak.net Thu Mar 27 17:42:10 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:42:10 +0100 (CET) Subject: [Lxml-checkins] r53001 - in lxml/trunk: . 
src/lxml Message-ID: <20080327164210.7C352169FA0@codespeak.net> Author: scoder Date: Thu Mar 27 17:42:09 2008 New Revision: 53001 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/apihelpers.pxi Log: r3855 at delle: sbehnel | 2008-03-27 13:47:59 +0100 inline small function Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Thu Mar 27 17:42:09 2008 @@ -439,7 +439,7 @@ element._c_node, _cstr(ns), NULL) return '%s:%s' % (c_ns.prefix, tag) -cdef bint _hasChild(xmlNode* c_node): +cdef inline bint _hasChild(xmlNode* c_node): return c_node is not NULL and _findChildForwards(c_node, 0) is not NULL cdef inline Py_ssize_t _countElements(xmlNode* c_node): From scoder at codespeak.net Thu Mar 27 17:42:14 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:42:14 +0100 (CET) Subject: [Lxml-checkins] r53002 - in lxml/trunk: . doc Message-ID: <20080327164214.09CFF169FA1@codespeak.net> Author: scoder Date: Thu Mar 27 17:42:13 2008 New Revision: 53002 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/doc/main.txt Log: r3856 at delle: sbehnel | 2008-03-27 16:44:42 +0100 prepare release Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Thu Mar 27 17:42:13 2008 @@ -2,8 +2,8 @@ lxml changelog ============== -2.1alpha1 (Under development) -============================= +2.1alpha1 (2008-03-27) +====================== Features added -------------- Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Thu Mar 27 17:42:13 2008 @@ -142,8 +142,8 @@ source release. 
If you can't wait, consider trying a less recent release version first. -The latest version is `lxml 2.0.3`_, released 2008-03-26 -(`changes for 2.0.3`_). `Older versions`_ are listed below. +The latest version is `lxml 2.1alpha1`_, released 2008-03-27 +(`changes for 2.1alpha1`_). `Older versions`_ are listed below. Please take a look at the `installation instructions`_! From scoder at codespeak.net Thu Mar 27 17:42:18 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:42:18 +0100 (CET) Subject: [Lxml-checkins] r53003 - in lxml/trunk: . doc Message-ID: <20080327164218.8C07C169FA3@codespeak.net> Author: scoder Date: Thu Mar 27 17:42:17 2008 New Revision: 53003 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/xpathxslt.txt Log: r3857 at delle: sbehnel | 2008-03-27 16:51:28 +0100 doc fix Modified: lxml/trunk/doc/xpathxslt.txt ============================================================================== --- lxml/trunk/doc/xpathxslt.txt (original) +++ lxml/trunk/doc/xpathxslt.txt Thu Mar 27 17:42:17 2008 @@ -18,9 +18,10 @@ 2 XSLT 2.1 XSLT result objects 2.2 Stylesheet parameters - 2.3 The ``xslt()`` tree method - 2.4 Dealing with stylesheet diversity - 2.5 Profiling + 2.3 Extension elements + 2.4 The ``xslt()`` tree method + 2.5 Dealing with stylesheet complexity + 2.6 Profiling The usual setup procedure: @@ -569,7 +570,7 @@ ... </xsl:template> ... </xsl:stylesheet>''') -To register the extension, add its name and namespace to the extension +To register the extension, add its namespace and name to the extension mapping of the XSLT object: .. sourcecode:: pycon @@ -591,10 +592,10 @@ '<?xml version="1.0"?>\n<foo>I did it!<child>XYZ</child></foo>\n' XSLT extensions are a very powerful feature that allows you to -interact directly with the XSLT processor. You have full access to -the input document and the stylesheet, and you can even call back into -the XSLT processor to process templates. 
Here is an example that -passes an Element into the ``.apply_templates()`` method of the +interact directly with the XSLT processor. You have full read-only +access to the input document and the stylesheet, and you can even call +back into the XSLT processor to process templates. Here is an example +that passes an Element into the ``.apply_templates()`` method of the ``XSLTExtension`` instance: .. sourcecode:: pycon From scoder at codespeak.net Thu Mar 27 17:42:21 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:42:21 +0100 (CET) Subject: [Lxml-checkins] r53004 - in lxml/trunk: . doc Message-ID: <20080327164221.09470169FA3@codespeak.net> Author: scoder Date: Thu Mar 27 17:42:21 2008 New Revision: 53004 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/main.txt Log: r3858 at delle: sbehnel | 2008-03-27 16:57:33 +0100 prepare release Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Thu Mar 27 17:42:21 2008 @@ -269,6 +269,7 @@ * `lxml 0.5`_, released 2005-04-08 +.. _`lxml 2.1alpha1`: lxml-2.1alpha1.tgz .. _`lxml 2.0.3`: lxml-2.0.3.tgz .. _`lxml 2.0.2`: lxml-2.0.2.tgz .. _`lxml 2.0.1`: lxml-2.0.1.tgz @@ -298,6 +299,7 @@ .. _`lxml 0.5.1`: lxml-0.5.1.tgz .. _`lxml 0.5`: lxml-0.5.tgz +.. _`changes for 2.1alpha1`: changes-2.1alpha1.html .. _`changes for 2.0.3`: changes-2.0.3.html .. _`changes for 2.0.2`: changes-2.0.2.html .. 
_`changes for 2.0.1`: changes-2.0.1.html From scoder at codespeak.net Thu Mar 27 17:42:25 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:42:25 +0100 (CET) Subject: [Lxml-checkins] r53005 - lxml/trunk Message-ID: <20080327164225.5A4CE169FA4@codespeak.net> Author: scoder Date: Thu Mar 27 17:42:25 2008 New Revision: 53005 Modified: lxml/trunk/ (props changed) lxml/trunk/MANIFEST.in Log: r3859 at delle: sbehnel | 2008-03-27 17:00:56 +0100 include Makefile Modified: lxml/trunk/MANIFEST.in ============================================================================== --- lxml/trunk/MANIFEST.in (original) +++ lxml/trunk/MANIFEST.in Thu Mar 27 17:42:25 2008 @@ -2,7 +2,7 @@ include setup.py ez_setup.py setupinfo.py versioninfo.py include test.py selftest.py selftest2.py include update-error-constants.py -include MANIFEST.in version.txt +include MANIFEST.in Makefile version.txt include CHANGES.txt CREDITS.txt INSTALL.txt LICENSES.txt README.txt TODO.txt recursive-include src *.pyx *.pxd *.pxi *.py recursive-include src/lxml lxml.etree.c lxml.objectify.c lxml.pyclasslookup.c From scoder at codespeak.net Thu Mar 27 17:42:28 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:42:28 +0100 (CET) Subject: [Lxml-checkins] r53006 - lxml/trunk Message-ID: <20080327164228.B9BE1169FA6@codespeak.net> Author: scoder Date: Thu Mar 27 17:42:28 2008 New Revision: 53006 Modified: lxml/trunk/ (props changed) lxml/trunk/MANIFEST.in Log: r3860 at delle: sbehnel | 2008-03-27 17:08:57 +0100 include lxml.html test data in dist Modified: lxml/trunk/MANIFEST.in ============================================================================== --- lxml/trunk/MANIFEST.in (original) +++ lxml/trunk/MANIFEST.in Thu Mar 27 17:42:28 2008 @@ -8,6 +8,7 @@ recursive-include src/lxml lxml.etree.c lxml.objectify.c lxml.pyclasslookup.c recursive-include src/lxml lxml.etree.h lxml.etree_api.h etree_defs.h recursive-include src/lxml/tests *.rng 
*.xslt *.xml *.dtd +recursive-include src/lxml/html/tests *.data *.txt recursive-include benchmark *.py recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png recursive-include fake_pyrex *.py From scoder at codespeak.net Thu Mar 27 17:42:32 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:42:32 +0100 (CET) Subject: [Lxml-checkins] r53007 - lxml/trunk Message-ID: <20080327164232.1C69A169FA6@codespeak.net> Author: scoder Date: Thu Mar 27 17:42:31 2008 New Revision: 53007 Modified: lxml/trunk/ (props changed) lxml/trunk/MANIFEST.in Log: r3861 at delle: sbehnel | 2008-03-27 17:13:11 +0100 include .txt test files in dist Modified: lxml/trunk/MANIFEST.in ============================================================================== --- lxml/trunk/MANIFEST.in (original) +++ lxml/trunk/MANIFEST.in Thu Mar 27 17:42:31 2008 @@ -7,7 +7,7 @@ recursive-include src *.pyx *.pxd *.pxi *.py recursive-include src/lxml lxml.etree.c lxml.objectify.c lxml.pyclasslookup.c recursive-include src/lxml lxml.etree.h lxml.etree_api.h etree_defs.h -recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd +recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd *.txt recursive-include src/lxml/html/tests *.data *.txt recursive-include benchmark *.py recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png From scoder at codespeak.net Thu Mar 27 17:42:35 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:42:35 +0100 (CET) Subject: [Lxml-checkins] r53008 - in lxml/trunk: . 
src/lxml/tests Message-ID: <20080327164235.A8E46169F9E@codespeak.net> Author: scoder Date: Thu Mar 27 17:42:35 2008 New Revision: 53008 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/tests/test_objectify.py Log: r3862 at delle: sbehnel | 2008-03-27 17:13:40 +0100 Py2.3 fix Modified: lxml/trunk/src/lxml/tests/test_objectify.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_objectify.py (original) +++ lxml/trunk/src/lxml/tests/test_objectify.py Thu Mar 27 17:42:35 2008 @@ -1869,9 +1869,9 @@ return date.strftime("%Y%m%d%H%M%S") class DatetimeElement(objectify.ObjectifiedDataElement): - @property def pyval(self): return parse_date(self.text) + pyval = property(pyval) datetime_type = objectify.PyType( "datetime", parse_date, DatetimeElement, stringify_date) From scoder at codespeak.net Thu Mar 27 17:42:38 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:42:38 +0100 (CET) Subject: [Lxml-checkins] r53009 - lxml/trunk Message-ID: <20080327164238.DCAFD169FA1@codespeak.net> Author: scoder Date: Thu Mar 27 17:42:38 2008 New Revision: 53009 Modified: lxml/trunk/ (props changed) lxml/trunk/MANIFEST.in Log: r3863 at delle: sbehnel | 2008-03-27 17:25:07 +0100 more missing files Modified: lxml/trunk/MANIFEST.in ============================================================================== --- lxml/trunk/MANIFEST.in (original) +++ lxml/trunk/MANIFEST.in Thu Mar 27 17:42:38 2008 @@ -7,7 +7,7 @@ recursive-include src *.pyx *.pxd *.pxi *.py recursive-include src/lxml lxml.etree.c lxml.objectify.c lxml.pyclasslookup.c recursive-include src/lxml lxml.etree.h lxml.etree_api.h etree_defs.h -recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd *.txt +recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd *.xsd *.html *.txt recursive-include src/lxml/html/tests *.data *.txt recursive-include benchmark *.py recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc 
tagpython.png From scoder at codespeak.net Thu Mar 27 17:43:14 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:43:14 +0100 (CET) Subject: [Lxml-checkins] r53010 - lxml/trunk Message-ID: <20080327164314.E78D3169F9E@codespeak.net> Author: scoder Date: Thu Mar 27 17:43:14 2008 New Revision: 53010 Modified: lxml/trunk/ (props changed) lxml/trunk/MANIFEST.in Log: r3864 at delle: sbehnel | 2008-03-27 17:29:01 +0100 another MANIFEST fix Modified: lxml/trunk/MANIFEST.in ============================================================================== --- lxml/trunk/MANIFEST.in (original) +++ lxml/trunk/MANIFEST.in Thu Mar 27 17:43:14 2008 @@ -5,7 +5,7 @@ include MANIFEST.in Makefile version.txt include CHANGES.txt CREDITS.txt INSTALL.txt LICENSES.txt README.txt TODO.txt recursive-include src *.pyx *.pxd *.pxi *.py -recursive-include src/lxml lxml.etree.c lxml.objectify.c lxml.pyclasslookup.c +recursive-include src/lxml lxml.etree.c lxml.objectify.c recursive-include src/lxml lxml.etree.h lxml.etree_api.h etree_defs.h recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd *.xsd *.html *.txt recursive-include src/lxml/html/tests *.data *.txt From scoder at codespeak.net Thu Mar 27 17:43:18 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:43:18 +0100 (CET) Subject: [Lxml-checkins] r53011 - lxml/trunk Message-ID: <20080327164318.7A73A169F9F@codespeak.net> Author: scoder Date: Thu Mar 27 17:43:17 2008 New Revision: 53011 Modified: lxml/trunk/ (props changed) lxml/trunk/MANIFEST.in Log: r3865 at delle: sbehnel | 2008-03-27 17:30:03 +0100 another MANIFEST fix Modified: lxml/trunk/MANIFEST.in ============================================================================== --- lxml/trunk/MANIFEST.in (original) +++ lxml/trunk/MANIFEST.in Thu Mar 27 17:43:17 2008 @@ -9,6 +9,7 @@ recursive-include src/lxml lxml.etree.h lxml.etree_api.h etree_defs.h recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd *.xsd 
*.html *.txt recursive-include src/lxml/html/tests *.data *.txt +recursive-include samples *.xml recursive-include benchmark *.py recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png recursive-include fake_pyrex *.py From scoder at codespeak.net Thu Mar 27 17:43:21 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 17:43:21 +0100 (CET) Subject: [Lxml-checkins] r53012 - lxml/trunk Message-ID: <20080327164321.E0A2A169FA0@codespeak.net> Author: scoder Date: Thu Mar 27 17:43:21 2008 New Revision: 53012 Modified: lxml/trunk/ (props changed) lxml/trunk/setupinfo.py Log: r3866 at delle: sbehnel | 2008-03-27 17:40:13 +0100 switch off dependency tracking for now Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Thu Mar 27 17:43:21 2008 @@ -83,7 +83,8 @@ if not CYTHON_INSTALLED: return [] from Cython.Compiler.Version import version - if split_version(version) <= (0,9,6,12): + # currently, no official Cython release supports this ... 
+ if True or split_version(version) <= (0,9,6,12): return [] package_dir = os.path.join(get_base_dir(), PACKAGE_PATH) From scoder at codespeak.net Thu Mar 27 20:26:28 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 27 Mar 2008 20:26:28 +0100 (CET) Subject: [Lxml-checkins] r53016 - lxml/tag/lxml-2.1alpha1 Message-ID: <20080327192628.D76C4169F87@codespeak.net> Author: scoder Date: Thu Mar 27 20:26:27 2008 New Revision: 53016 Added: lxml/tag/lxml-2.1alpha1/ - copied from r53012, lxml/trunk/ Log: tag for 2.1alpha1 From scoder at codespeak.net Fri Mar 28 20:20:08 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 28 Mar 2008 20:20:08 +0100 (CET) Subject: [Lxml-checkins] r53061 - lxml/trunk Message-ID: <20080328192008.A0FFE169FCC@codespeak.net> Author: scoder Date: Fri Mar 28 20:20:06 2008 New Revision: 53061 Modified: lxml/trunk/ (props changed) lxml/trunk/TODO.txt Log: r3896 at delle: sbehnel | 2008-03-28 09:18:03 +0100 todo Modified: lxml/trunk/TODO.txt ============================================================================== --- lxml/trunk/TODO.txt (original) +++ lxml/trunk/TODO.txt Fri Mar 28 20:20:06 2008 @@ -23,6 +23,9 @@ * follow PEP 8 in API naming (avoidCamelCase in_favour_of_underscores) +* use per-call or per-thread error logs in XSLT/XPath/etc. to keep the + messages separate, especially in exceptions + QName ----- From scoder at codespeak.net Fri Mar 28 20:20:15 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 28 Mar 2008 20:20:15 +0100 (CET) Subject: [Lxml-checkins] r53062 - in lxml/trunk: . 
src/lxml src/lxml/tests Message-ID: <20080328192015.C8CF4169FE6@codespeak.net> Author: scoder Date: Fri Mar 28 20:20:15 2008 New Revision: 53062 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/apihelpers.pxi lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/tests/test_etree.py Log: r3897 at delle: sbehnel | 2008-03-28 16:57:01 +0100 reject non well-formed namespace prefixes Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Mar 28 20:20:15 2008 @@ -2,6 +2,21 @@ lxml changelog ============== +Under development +================= + +Features added +-------------- + +Bugs fixed +---------- + +* lxml.etree accepted non well-formed namespace prefix names. + +Other changes +------------- + + 2.1alpha1 (2008-03-27) ====================== Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Fri Mar 28 20:20:15 2008 @@ -1088,6 +1088,12 @@ python.PyUnicode_FromEncodedObject(name_utf, 'UTF-8', 'strict')) return 0 +cdef int _prefixValidOrRaise(tag_utf) except -1: + if not _pyXmlNameIsValid(tag_utf): + raise ValueError("Invalid namespace prefix %r" % \ + python.PyUnicode_FromEncodedObject(tag_utf, 'UTF-8', 'strict')) + return 0 + cdef object _namespacedName(xmlNode* c_node): return _namespacedNameFromNsName(_getNs(c_node), c_node.name) Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Fri Mar 28 20:20:15 2008 @@ -409,8 +409,9 @@ for prefix, href in nsmap.items(): href_utf = _utf8(href) c_href = _cstr(href_utf) - if prefix is not None and prefix: + if prefix is not None: prefix_utf = _utf8(prefix) 
+ _prefixValidOrRaise(prefix_utf) c_prefix = _cstr(prefix_utf) else: c_prefix = NULL Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Fri Mar 28 20:20:15 2008 @@ -162,6 +162,15 @@ self.assertEquals("p:a", a.text) + def test_nsmap_prefix_invalid(self): + etree = self.etree + self.assertRaises(ValueError, + etree.Element, "root", nsmap={'' : 'testns'}) + self.assertRaises(ValueError, + etree.Element, "root", nsmap={'"' : 'testns'}) + self.assertRaises(ValueError, + etree.Element, "root", nsmap={'&' : 'testns'}) + def test_attribute_set(self): Element = self.etree.Element root = Element("root") From scoder at codespeak.net Fri Mar 28 20:20:19 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 28 Mar 2008 20:20:19 +0100 (CET) Subject: [Lxml-checkins] r53063 - lxml/trunk Message-ID: <20080328192019.BD00E169FEA@codespeak.net> Author: scoder Date: Fri Mar 28 20:20:19 2008 New Revision: 53063 Modified: lxml/trunk/ (props changed) lxml/trunk/setup.py Log: r3898 at delle: sbehnel | 2008-03-28 16:57:37 +0100 cleanup Modified: lxml/trunk/setup.py ============================================================================== --- lxml/trunk/setup.py (original) +++ lxml/trunk/setup.py Fri Mar 28 20:20:19 2008 @@ -102,6 +102,6 @@ package_dir = {'': 'src'}, packages = ['lxml', 'lxml.html'], ext_modules = setupinfo.ext_modules( - STATIC_INCLUDE_DIRS, STATIC_LIBRARY_DIRS, STATIC_CFLAGS), + STATIC_INCLUDE_DIRS, STATIC_LIBRARY_DIRS, STATIC_CFLAGS), **extra_options ) From scoder at codespeak.net Fri Mar 28 20:20:23 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 28 Mar 2008 20:20:23 +0100 (CET) Subject: [Lxml-checkins] r53064 - in lxml/trunk: . 
src/lxml Message-ID: <20080328192023.A8A39169FEC@codespeak.net> Author: scoder Date: Fri Mar 28 20:20:23 2008 New Revision: 53064 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx Log: r3899 at delle: sbehnel | 2008-03-28 16:57:41 +0100 cleanup Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Fri Mar 28 20:20:23 2008 @@ -107,8 +107,9 @@ def __init__(self, message, error_log=None): _initError(self, message) if error_log is None: - error_log = __copyGlobalErrorLog() - self.error_log = error_log.copy() + self.error_log = __copyGlobalErrorLog() + else: + self.error_log = error_log.copy() cdef object _LxmlError _LxmlError = LxmlError From scoder at codespeak.net Fri Mar 28 20:20:29 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 28 Mar 2008 20:20:29 +0100 (CET) Subject: [Lxml-checkins] r53065 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20080328192029.3EC5F169FCC@codespeak.net> Author: scoder Date: Fri Mar 28 20:20:28 2008 New Revision: 53065 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/apihelpers.pxi lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/tests/test_etree.py Log: r3900 at delle: sbehnel | 2008-03-28 17:42:35 +0100 cleanup node namespace initialisation, re-allow empty '' namespace prefix Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Fri Mar 28 20:20:28 2008 @@ -98,28 +98,28 @@ If 'c_doc' is also NULL, a new xmlDoc will be created. 
""" cdef xmlNode* c_node + if doc is not None: + c_doc = doc._c_doc ns_utf, name_utf = _getNsTag(tag) if parser is not None and parser._for_html: _htmlTagValidOrRaise(name_utf) else: _tagValidOrRaise(name_utf) - if doc is not None: - c_doc = doc._c_doc - elif c_doc is NULL: + if c_doc is NULL: c_doc = _newDoc() c_node = _createElement(c_doc, name_utf) if c_node is NULL: return python.PyErr_NoMemory() try: + if doc is None: + tree.xmlDocSetRootElement(c_doc, c_node) + doc = _documentFactory(c_doc, parser) if text is not None: _setNodeText(c_node, text) if tail is not None: _setTailText(c_node, tail) - if doc is None: - tree.xmlDocSetRootElement(c_doc, c_node) - doc = _documentFactory(c_doc, parser) # add namespaces to node if necessary - doc._setNodeNamespaces(c_node, ns_utf, nsmap) + _initNodeNamespaces(c_node, doc, ns_utf, nsmap) _initNodeAttributes(c_node, doc, attrib, extra_attrs) return _elementFactory(doc, c_node) except: @@ -162,10 +162,46 @@ _setTailText(c_node, tail) # add namespaces to node if necessary - parent._doc._setNodeNamespaces(c_node, ns_utf, nsmap) + _initNodeNamespaces(c_node, parent._doc, ns_utf, nsmap) _initNodeAttributes(c_node, parent._doc, attrib, extra_attrs) return _elementFactory(parent._doc, c_node) +cdef int _initNodeNamespaces(xmlNode* c_node, _Document doc, + object node_ns_utf, object nsmap) except -1: + """Lookup current namespace prefixes, then set namespace structure for + node and register new ns-prefix mappings. + + This only works for a newly created node! 
+ """ + cdef xmlNs* c_ns + cdef char* c_prefix + cdef char* c_href + if not nsmap: + if node_ns_utf is not None: + doc._setNodeNs(c_node, _cstr(node_ns_utf)) + return 0 + + for prefix, href in nsmap.items(): + href_utf = _utf8(href) + c_href = _cstr(href_utf) + if prefix is not None and prefix: + prefix_utf = _utf8(prefix) + _prefixValidOrRaise(prefix_utf) + c_prefix = _cstr(prefix_utf) + else: + c_prefix = NULL + # add namespace with prefix if ns is not already known + c_ns = tree.xmlSearchNsByHref(doc._c_doc, c_node, c_href) + if c_ns is NULL: + c_ns = tree.xmlNewNs(c_node, c_href, c_prefix) + if href_utf == node_ns_utf: + tree.xmlSetNs(c_node, c_ns) + node_ns_utf = None + + if node_ns_utf is not None: + doc._setNodeNs(c_node, _cstr(node_ns_utf)) + return 0 + cdef _initNodeAttributes(xmlNode* c_node, _Document doc, attrib, extra): """Initialise the attributes of an element node. """ Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Fri Mar 28 20:20:28 2008 @@ -390,44 +390,6 @@ c_ns = self._findOrBuildNodeNs(c_node, href, NULL) tree.xmlSetNs(c_node, c_ns) - cdef int _setNodeNamespaces(self, xmlNode* c_node, - object node_ns_utf, object nsmap) except -1: - """Lookup current namespace prefixes, then set namespace structure for - node and register new ns-prefix mappings. - - This only works for a newly created node! 
- """ - cdef xmlNs* c_ns - cdef xmlDoc* c_doc - cdef char* c_prefix - cdef char* c_href - if not nsmap: - if node_ns_utf is not None: - self._setNodeNs(c_node, _cstr(node_ns_utf)) - return 0 - - c_doc = self._c_doc - for prefix, href in nsmap.items(): - href_utf = _utf8(href) - c_href = _cstr(href_utf) - if prefix is not None: - prefix_utf = _utf8(prefix) - _prefixValidOrRaise(prefix_utf) - c_prefix = _cstr(prefix_utf) - else: - c_prefix = NULL - # add namespace with prefix if ns is not already known - c_ns = tree.xmlSearchNsByHref(c_doc, c_node, c_href) - if c_ns is NULL: - c_ns = tree.xmlNewNs(c_node, c_href, c_prefix) - if href_utf == node_ns_utf: - tree.xmlSetNs(c_node, c_ns) - node_ns_utf = None - - if node_ns_utf is not None: - self._setNodeNs(c_node, _cstr(node_ns_utf)) - return 0 - cdef __initPrefixCache(): cdef int i return tuple([ python.PyString_FromFormat("ns%d", i) Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Fri Mar 28 20:20:28 2008 @@ -165,11 +165,11 @@ def test_nsmap_prefix_invalid(self): etree = self.etree self.assertRaises(ValueError, - etree.Element, "root", nsmap={'' : 'testns'}) - self.assertRaises(ValueError, etree.Element, "root", nsmap={'"' : 'testns'}) self.assertRaises(ValueError, etree.Element, "root", nsmap={'&' : 'testns'}) + self.assertRaises(ValueError, + etree.Element, "root", nsmap={'a:b' : 'testns'}) def test_attribute_set(self): Element = self.etree.Element From scoder at codespeak.net Fri Mar 28 20:20:36 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 28 Mar 2008 20:20:36 +0100 (CET) Subject: [Lxml-checkins] r53066 - in lxml/trunk: . 
doc src/lxml Message-ID: <20080328192036.80E0C169FE6@codespeak.net> Author: scoder Date: Fri Mar 28 20:20:35 2008 New Revision: 53066 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/doc/elementsoup.txt lxml/trunk/src/lxml/apihelpers.pxi lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/parser.pxi lxml/trunk/src/lxml/tree.pxd lxml/trunk/src/lxml/xslt.pxi Log: r3901 at delle: sbehnel | 2008-03-28 18:11:29 +0100 create HTML documents instead of XML docs from html_parser.makeelement(), directly impacts lxml.html Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Mar 28 20:20:35 2008 @@ -16,6 +16,11 @@ Other changes ------------- +* New Elements created through the ``makeelement()`` method of an HTML + parser or through lxml.html now end up in a new HTML document + instead of a generic XML document. This mostly impacts the default + serialisation. + 2.1alpha1 (2008-03-27) ====================== Modified: lxml/trunk/doc/elementsoup.txt ============================================================================== --- lxml/trunk/doc/elementsoup.txt (original) +++ lxml/trunk/doc/elementsoup.txt Fri Mar 28 20:20:35 2008 @@ -3,11 +3,12 @@ ==================== BeautifulSoup_ is a Python package that parses broken HTML. While libxml2 -(and thus lxml) can also parse broken HTML, BeautifulSoup is much more +(and thus lxml) can also parse broken HTML, BeautifulSoup is somewhat more forgiving and has superiour `support for encoding detection`_. .. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/ .. _`support for encoding detection`: http://www.crummy.com/software/BeautifulSoup/documentation.html#Beautiful%20Soup%20Gives%20You%20Unicode,%20Dammit +.. _ElementSoup: http://effbot.org/zone/element-soup.htm lxml can benefit from the parsing capabilities of BeautifulSoup through the ``lxml.html.soupparser`` module. 
It provides three main @@ -19,6 +20,10 @@ ElementTree. The first returns a root Element, the latter returns an ElementTree. +There is also a legacy module called ``lxml.html.ElementSoup``, which +mimics the interface provided by ElementTree's own ElementSoup_ +module. + Here is a document full of tag soup, similar to, but not quite like, HTML: .. sourcecode:: pycon @@ -63,36 +68,29 @@ >>> body.text u'\xa9\u20ac-\xf5\u01bd' -If you want them back on the way out, you can serialise with the -'html' method, which will always use escaping for safety reasons: +If you want them back on the way out, you can just serialise with the +default encoding, which is 'US-ASCII'. The 'html' method .. sourcecode:: pycon - >>> tostring(body, method="html") - '<body>©€-õƽ<p></p></body>' - - >>> tostring(body, method="html", encoding="utf-8") - '<body>©€-õƽ<p></p></body>' + >>> tostring(body) + '<body>©€-õƽ<p/></body>' - >>> tostring(body, method="html", encoding=unicode) - u'<body>©€-õƽ<p></p></body>' + >>> tostring(body, method="html") + '<body>©€-õƽ<p></p></body>' -Otherwise, when serialising to XML, only the plain ASCII encoding will -escape non-ASCII characters: +Any other encoding will output the respective byte sequences. .. sourcecode:: pycon - >>> tostring(body) - '<body>©€-õƽ<p/></body>' - >>> tostring(body, encoding="utf-8") '<body>\xc2\xa9\xe2\x82\xac-\xc3\xb5\xc6\xbd<p/></body>' + >>> tostring(body, method="html", encoding="utf-8") + '<body>\xc2\xa9\xe2\x82\xac-\xc3\xb5\xc6\xbd<p></p></body>' + >>> tostring(body, encoding=unicode) u'<body>\xa9\u20ac-\xf5\u01bd<p/></body>' -There is also a legacy module called ``lxml.html.ElementSoup``, which -mimics the interface provided by ElementTree's own ElementSoup_ -module. - -.. 
_ElementSoup: http://effbot.org/zone/element-soup.htm + >>> tostring(body, method="html", encoding=unicode) + u'<body>\xa9\u20ac-\xf5\u01bd<p></p></body>' Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Fri Mar 28 20:20:35 2008 @@ -103,12 +103,16 @@ ns_utf, name_utf = _getNsTag(tag) if parser is not None and parser._for_html: _htmlTagValidOrRaise(name_utf) + if c_doc is NULL: + c_doc = _newHTMLDoc() else: _tagValidOrRaise(name_utf) - if c_doc is NULL: - c_doc = _newDoc() + if c_doc is NULL: + c_doc = _newXMLDoc() c_node = _createElement(c_doc, name_utf) if c_node is NULL: + if doc is None and c_doc is not NULL: + tree.xmlFreeDoc(c_doc) return python.PyErr_NoMemory() try: if doc is None: Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Fri Mar 28 20:20:35 2008 @@ -2236,7 +2236,7 @@ text = '' else: text = _utf8(text) - c_doc = _newDoc() + c_doc = _newXMLDoc() doc = _documentFactory(c_doc, None) c_node = _createComment(c_doc, _cstr(text)) tree.xmlAddChild(<xmlNode*>c_doc, c_node) @@ -2256,7 +2256,7 @@ text = '' else: text = _utf8(text) - c_doc = _newDoc() + c_doc = _newXMLDoc() doc = _documentFactory(c_doc, None) c_node = _createPI(c_doc, _cstr(target), _cstr(text)) tree.xmlAddChild(<xmlNode*>c_doc, c_node) @@ -2284,7 +2284,7 @@ raise ValueError("Invalid character reference: '%s'" % name) elif not _xmlNameIsValid(c_name): raise ValueError("Invalid entity reference: '%s'" % name) - c_doc = _newDoc() + c_doc = _newXMLDoc() doc = _documentFactory(c_doc, None) c_node = _createEntity(c_doc, c_name) tree.xmlAddChild(<xmlNode*>c_doc, c_node) @@ -2319,7 +2319,7 @@ except _TargetParserResult, result_container: return result_container.result else: - c_doc 
= _newDoc() + c_doc = _newXMLDoc() doc = _documentFactory(c_doc, parser) return _elementTreeFactory(doc, element) Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Fri Mar 28 20:20:35 2008 @@ -1246,9 +1246,17 @@ parser = __GLOBAL_PARSER_CONTEXT.getDefaultParser() return (<_BaseParser>parser)._parseDocFromFilelike(source, filename) -cdef xmlDoc* _newDoc() except NULL: +cdef xmlDoc* _newXMLDoc() except NULL: cdef xmlDoc* result - result = tree.xmlNewDoc("1.0") + result = tree.xmlNewDoc(NULL) + if result is NULL: + python.PyErr_NoMemory() + __GLOBAL_PARSER_CONTEXT.initDocDict(result) + return result + +cdef xmlDoc* _newHTMLDoc() except NULL: + cdef xmlDoc* result + result = tree.htmlNewDoc(NULL, NULL) if result is NULL: python.PyErr_NoMemory() __GLOBAL_PARSER_CONTEXT.initDocDict(result) Modified: lxml/trunk/src/lxml/tree.pxd ============================================================================== --- lxml/trunk/src/lxml/tree.pxd (original) +++ lxml/trunk/src/lxml/tree.pxd Fri Mar 28 20:20:35 2008 @@ -245,6 +245,7 @@ cdef void htmlNodeDumpFormatOutput(xmlOutputBuffer* buf, xmlDoc* doc, xmlNode* cur, char* encoding, int format) nogil + cdef xmlDoc* htmlNewDoc(char* uri, char* externalID) nogil cdef extern from "libxml/valid.h": cdef xmlAttr* xmlGetID(xmlDoc* doc, char* ID) nogil Modified: lxml/trunk/src/lxml/xslt.pxi ============================================================================== --- lxml/trunk/src/lxml/xslt.pxi (original) +++ lxml/trunk/src/lxml/xslt.pxi Fri Mar 28 20:20:35 2008 @@ -105,7 +105,7 @@ c_doc = _parseDocFromFilelike( doc_ref._file, doc_ref._filename, context._parser) elif doc_ref._type == PARSER_DATA_EMPTY: - c_doc = _newDoc() + c_doc = _newXMLDoc() if c_doc is not NULL and c_doc.URL is NULL: c_doc.URL = tree.xmlStrdup(c_uri) return c_doc From scoder at codespeak.net Sat Mar 29 11:33:09 
2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 29 Mar 2008 11:33:09 +0100 (CET) Subject: [Lxml-checkins] r53080 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20080329103309.0C3D0169F94@codespeak.net> Author: scoder Date: Sat Mar 29 11:33:08 2008 New Revision: 53080 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/tests/test_etree.py Log: r3908 at delle: sbehnel | 2008-03-29 11:31:42 +0100 allow QName objects in ElementTree.find*() Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Sat Mar 29 11:33:08 2008 @@ -1657,7 +1657,7 @@ """ self._assertHasRoot() root = self.getroot() - if path[:1] == "/": + if _isString(path) and path[:1] == "/": path = "." + path return root.find(path) @@ -1669,7 +1669,7 @@ """ self._assertHasRoot() root = self.getroot() - if path[:1] == "/": + if _isString(path) and path[:1] == "/": path = "." + path return root.findtext(path, default) @@ -1681,7 +1681,7 @@ """ self._assertHasRoot() root = self.getroot() - if path[:1] == "/": + if _isString(path) and path[:1] == "/": path = "." + path return root.findall(path) @@ -1693,7 +1693,7 @@ """ self._assertHasRoot() root = self.getroot() - if path[:1] == "/": + if _isString(path) and path[:1] == "/": path = "." 
+ path return root.iterfind(path) Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Sat Mar 29 11:33:08 2008 @@ -1479,6 +1479,29 @@ [a, b, c], list(a.getiterator('*'))) + def test_elementtree_find_qname(self): + XML = self.etree.XML + ElementTree = self.etree.ElementTree + QName = self.etree.QName + tree = ElementTree(XML('<a><b><c/></b><b/><c><b/></c></a>')) + self.assertEquals(tree.find(QName("c")), tree.getroot()[2]) + + def test_elementtree_findall_qname(self): + XML = self.etree.XML + ElementTree = self.etree.ElementTree + QName = self.etree.QName + tree = ElementTree(XML('<a><b><c/></b><b/><c><b/></c></a>')) + self.assertEquals(len(list(tree.findall(QName("c")))), 1) + + def test_elementtree_findall_ns_qname(self): + XML = self.etree.XML + ElementTree = self.etree.ElementTree + QName = self.etree.QName + tree = ElementTree(XML( + '<a xmlns:x="X" xmlns:y="Y"><x:b><c/></x:b><b/><c><x:b/><b/></c><b/></a>')) + self.assertEquals(len(list(tree.findall(QName("b")))), 2) + self.assertEquals(len(list(tree.findall(QName("X", "b")))), 1) + def test_findall_ns(self): XML = self.etree.XML root = XML('<a xmlns:x="X" xmlns:y="Y"><x:b><c/></x:b><b/><c><x:b/><b/></c><b/></a>') From scoder at codespeak.net Sat Mar 29 11:33:13 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 29 Mar 2008 11:33:13 +0100 (CET) Subject: [Lxml-checkins] r53081 - lxml/trunk Message-ID: <20080329103313.37408169FA6@codespeak.net> Author: scoder Date: Sat Mar 29 11:33:12 2008 New Revision: 53081 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3909 at delle: sbehnel | 2008-03-29 11:31:53 +0100 changelog Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Sat Mar 29 
11:33:12 2008 @@ -11,6 +11,8 @@ Bugs fixed ---------- +* ``ElementTree.find*()`` didn't accept QName objects. + * lxml.etree accepted non well-formed namespace prefix names. Other changes @@ -18,8 +20,9 @@ * New Elements created through the ``makeelement()`` method of an HTML parser or through lxml.html now end up in a new HTML document - instead of a generic XML document. This mostly impacts the default - serialisation. + (doctype HTML 4.01 Transitional) instead of a generic XML document. + This mostly impacts the serialisation and the availability of a DTD + context. 2.1alpha1 (2008-03-27) From scoder at codespeak.net Sat Mar 29 11:40:14 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 29 Mar 2008 11:40:14 +0100 (CET) Subject: [Lxml-checkins] r53083 - in lxml/branch/lxml-2.0: . src/lxml src/lxml/tests Message-ID: <20080329104014.25817169F94@codespeak.net> Author: scoder Date: Sat Mar 29 11:40:13 2008 New Revision: 53083 Modified: lxml/branch/lxml-2.0/CHANGES.txt lxml/branch/lxml-2.0/src/lxml/lxml.etree.pyx lxml/branch/lxml-2.0/src/lxml/tests/test_etree.py Log: merge of trunk rev 53080: fix for passing QName objects to ET.find*() Modified: lxml/branch/lxml-2.0/CHANGES.txt ============================================================================== --- lxml/branch/lxml-2.0/CHANGES.txt (original) +++ lxml/branch/lxml-2.0/CHANGES.txt Sat Mar 29 11:40:13 2008 @@ -2,6 +2,21 @@ lxml changelog ============== +Under development +================= + +Features added +-------------- + +Bugs fixed +---------- + +* ``ElementTree.find*()`` didn't accept QName objects. 
+ +Other changes +------------- + + 2.0.3 (2008-03-26) ================== Modified: lxml/branch/lxml-2.0/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/lxml.etree.pyx (original) +++ lxml/branch/lxml-2.0/src/lxml/lxml.etree.pyx Sat Mar 29 11:40:13 2008 @@ -1643,7 +1643,7 @@ """ self._assertHasRoot() root = self.getroot() - if path[:1] == "/": + if _isString(path) and path[:1] == "/": path = "." + path return root.find(path) @@ -1655,7 +1655,7 @@ """ self._assertHasRoot() root = self.getroot() - if path[:1] == "/": + if _isString(path) and path[:1] == "/": path = "." + path return root.findtext(path, default) @@ -1667,7 +1667,7 @@ """ self._assertHasRoot() root = self.getroot() - if path[:1] == "/": + if _isString(path) and path[:1] == "/": path = "." + path return root.findall(path) @@ -1679,7 +1679,7 @@ """ self._assertHasRoot() root = self.getroot() - if path[:1] == "/": + if _isString(path) and path[:1] == "/": path = "." 
+ path return root.iterfind(path) Modified: lxml/branch/lxml-2.0/src/lxml/tests/test_etree.py ============================================================================== --- lxml/branch/lxml-2.0/src/lxml/tests/test_etree.py (original) +++ lxml/branch/lxml-2.0/src/lxml/tests/test_etree.py Sat Mar 29 11:40:13 2008 @@ -1498,6 +1498,29 @@ self.assertEquals(["CTEXT"], text) + def test_elementtree_find_qname(self): + XML = self.etree.XML + ElementTree = self.etree.ElementTree + QName = self.etree.QName + tree = ElementTree(XML('<a><b><c/></b><b/><c><b/></c></a>')) + self.assertEquals(tree.find(QName("c")), tree.getroot()[2]) + + def test_elementtree_findall_qname(self): + XML = self.etree.XML + ElementTree = self.etree.ElementTree + QName = self.etree.QName + tree = ElementTree(XML('<a><b><c/></b><b/><c><b/></c></a>')) + self.assertEquals(len(list(tree.findall(QName("c")))), 1) + + def test_elementtree_findall_ns_qname(self): + XML = self.etree.XML + ElementTree = self.etree.ElementTree + QName = self.etree.QName + tree = ElementTree(XML( + '<a xmlns:x="X" xmlns:y="Y"><x:b><c/></x:b><b/><c><x:b/><b/></c><b/></a>')) + self.assertEquals(len(list(tree.findall(QName("b")))), 2) + self.assertEquals(len(list(tree.findall(QName("X", "b")))), 1) + def test_findall_ns(self): XML = self.etree.XML root = XML('<a xmlns:x="X" xmlns:y="Y"><x:b><c/></x:b><b/><c><x:b/><b/></c><b/></a>')
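The r53080 change merged above guards the leading-slash rewrite in `ElementTree.find*()` with `_isString(path)`. The likely reason is that a `QName` is not a string and cannot be sliced with `path[:1]`, so the old code could fail before the lookup even started. The pure-Python sketch below reproduces that logic without lxml so the difference can be run in isolation. `FakeQName`, `normalise_path_unguarded` and `normalise_path` are hypothetical names, not lxml API, and `isinstance(path, str)` is a Python 3 simplification of lxml's internal `_isString()` helper.

```python
class FakeQName(object):
    """Hypothetical stand-in for lxml's QName: holds a Clark-notation
    tag ("{ns}local") but is not a str and defines no __getitem__,
    so slicing it raises TypeError."""

    def __init__(self, text_or_uri, tag=None):
        if tag is None:
            self.text = text_or_uri
        else:
            self.text = "{%s}%s" % (text_or_uri, tag)


def normalise_path_unguarded(path):
    # Mirrors the pre-r53080 code: assumes 'path' is always a string.
    if path[:1] == "/":
        path = "." + path
    return path


def normalise_path(path):
    # Mirrors the post-r53080 code: only absolute *string* paths are
    # made relative to the root; anything else is passed through as-is.
    if isinstance(path, str) and path[:1] == "/":
        path = "." + path
    return path


if __name__ == "__main__":
    print(normalise_path("/b/c"))      # absolute path becomes "./b/c"
    q = FakeQName("X", "b")
    print(normalise_path(q) is q)      # QName-like object is untouched
    try:
        normalise_path_unguarded(q)
    except TypeError as exc:
        print("unguarded version fails:", exc)
```

The type check is placed before the slice so that the `and` short-circuits; non-string path objects never reach `path[:1]` and are handed on unchanged to the root element's `find()`/`findall()`.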