From scoder at codespeak.net Wed Aug 1 09:22:29 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 1 Aug 2007 09:22:29 +0200 (CEST) Subject: [Lxml-checkins] r45445 - lxml/trunk/doc Message-ID: <20070801072229.C7B85807F@code0.codespeak.net> Author: scoder Date: Wed Aug 1 09:22:28 2007 New Revision: 45445 Modified: lxml/trunk/doc/main.txt Log: cheeseshop -> pypi Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Wed Aug 1 09:22:28 2007 @@ -129,10 +129,10 @@ -------- The best way to download binary versions is to visit `lxml at the Python -cheeseshop`_. It has the source, eggs and installers for various platforms. +Package Index`_. It has the source, eggs and installers for various platforms. The source distribution is signed with `this key`_. -.. _`lxml at the Python cheeseshop`: http://cheeseshop.python.org/pypi/lxml/ +.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ .. _`this key`: pubkey.asc The latest version is `lxml 1.3.2`_, released 2007-07-03 (`changes for 1.3.2`_). From scoder at codespeak.net Wed Aug 1 09:23:21 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 1 Aug 2007 09:23:21 +0200 (CEST) Subject: [Lxml-checkins] r45446 - lxml/branch/lxml-1.3/doc Message-ID: <20070801072321.B27F580DA@code0.codespeak.net> Author: scoder Date: Wed Aug 1 09:23:20 2007 New Revision: 45446 Modified: lxml/branch/lxml-1.3/doc/main.txt Log: cheeseshop -> pypi Modified: lxml/branch/lxml-1.3/doc/main.txt ============================================================================== --- lxml/branch/lxml-1.3/doc/main.txt (original) +++ lxml/branch/lxml-1.3/doc/main.txt Wed Aug 1 09:23:20 2007 @@ -124,10 +124,10 @@ -------- The best way to download binary versions is to visit `lxml at the Python -cheeseshop`_. It has the source, eggs and installers for various platforms. +Package Index`_. 
It has the source, eggs and installers for various platforms. The source distribution is signed with `this key`_. -.. _`lxml at the Python cheeseshop`: http://cheeseshop.python.org/pypi/lxml/ +.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ .. _`this key`: pubkey.asc The latest version is `lxml 1.3.3`_, released 2007-07-26 (`changes for 1.3.3`_). From scoder at codespeak.net Fri Aug 3 00:10:29 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 3 Aug 2007 00:10:29 +0200 (CEST) Subject: [Lxml-checkins] r45470 - lxml/trunk/doc Message-ID: <20070802221029.60BCD8103@code0.codespeak.net> Author: scoder Date: Fri Aug 3 00:10:28 2007 New Revision: 45470 Modified: lxml/trunk/doc/build.txt Log: Cython instead of Pyrex Modified: lxml/trunk/doc/build.txt ============================================================================== --- lxml/trunk/doc/build.txt (original) +++ lxml/trunk/doc/build.txt Fri Aug 3 00:10:28 2007 @@ -11,7 +11,7 @@ .. contents:: .. - 1 Pyrex + 1 Cython 2 Subversion 3 Setuptools 4 Running the tests and reporting errors @@ -20,53 +20,22 @@ 7 Building Debian packages from SVN sources -Pyrex ------ +Cython +------ -The lxml.etree and lxml.objectify modules are written in Pyrex_. Since we -distribute the Pyrex-generated .c files with lxml releases, however, you do -not need Pyrex to build lxml from the normal release sources. +The lxml.etree and lxml.objectify modules are written in Cython_. Since we +distribute the Cython-generated .c files with lxml releases, however, you do +not need Cython to build lxml from the normal release sources. -.. _Pyrex: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/ +.. _Cython: http://www.cython.org If you are interested in building lxml from a Subversion checkout or want to -be an lxml developer, you do need a working Pyrex installation. +be an lxml developer, you do need a working Cython installation. 
You can use +EasyInstall_ to install it:: -* lxml 1.1 and later + easy_install Cython - Newer versions of lxml depend on features and bug fixes that are not yet - available in an official Pyrex release. This includes support for the - external C-API of lxml.etree, for Python 2.5 and for 64 bit architectures. - - To build lxml 1.1 and later from non-release or modified sources, you must - therefore use an updated Pyrex version from here: - - http://codespeak.net/svn/lxml/pyrex/ - - A subversion checkout of lxml will automatically retrieve the latest Pyrex - as external project source (``svn:externals``). Look for the ``Pyrex`` - directory in the source tree. - - Since version 1.1.2, the lxml source distribution also includes this Pyrex - version. It will be used if the ``Pyrex`` directory is available in the - lxml root directory. If you install from SVN or delete this directory from - the unpacked distribution directory, the normally installed Pyrex version - will be used. - -* lxml 1.0 and earlier - - The 1.0 series build with a standard installation of Pyrex 0.9.4.1. Note - that Pyrex up to and including version 0.9.4 has known problems when - compiling lxml with gcc 4.x or Python 2.4. Do not use it. If you want to - build lxml from non-release sources, please install Pyrex version 0.9.4.1 or - later. - - Pyrex now supports EasyInstall_, so you can install it by running the - following command as super-user:: - - easy_install Pyrex - - .. _EasyInstall: http://peak.telecommunity.com/DevCenter/EasyInstall +.. _EasyInstall: http://peak.telecommunity.com/DevCenter/EasyInstall Subversion @@ -167,7 +136,8 @@ This is the procedure to make an lxml egg for your platform: * Download the lxml-x.y.tar.gz release. This contains the pregenerated C so - that you don't run into any Pyrex issues. Unpack it and cd into it. + that you can be sure you build exactly from the release sources. Unpack + them and cd into the resulting directory. 
* python setup.py build From scoder at codespeak.net Sat Aug 11 19:13:17 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 11 Aug 2007 19:13:17 +0200 (CEST) Subject: [Lxml-checkins] r45603 - lxml/trunk Message-ID: <20070811171317.6DE2A81B6@code0.codespeak.net> Author: scoder Date: Sat Aug 11 19:13:15 2007 New Revision: 45603 Modified: lxml/trunk/setupinfo.py Log: cleanup Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Sat Aug 11 19:13:15 2007 @@ -22,7 +22,6 @@ ("pyclasslookup", "lxml.pyclasslookup") ] - def env_var(name): value = os.getenv(name, '') return value.split(os.pathsep) From scoder at codespeak.net Mon Aug 13 14:53:19 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 14:53:19 +0200 (CEST) Subject: [Lxml-checkins] r45622 - in lxml/trunk/src/lxml: . 
tests Message-ID: <20070813125319.ACAD18185@code0.codespeak.net> Author: scoder Date: Mon Aug 13 14:53:17 2007 New Revision: 45622 Modified: lxml/trunk/src/lxml/etree.pyx lxml/trunk/src/lxml/serializer.pxi lxml/trunk/src/lxml/tests/test_etree.py lxml/trunk/src/lxml/tree.pxd Log: let DTDs that get parsed in also go out if serialising an ElementTree Modified: lxml/trunk/src/lxml/etree.pyx ============================================================================== --- lxml/trunk/src/lxml/etree.pyx (original) +++ lxml/trunk/src/lxml/etree.pyx Mon Aug 13 14:53:17 2007 @@ -267,22 +267,22 @@ return _elementFactory(self, c_node) cdef getdoctype(self): - cdef tree.xmlDtd* dtd + cdef tree.xmlDtd* c_dtd cdef xmlNode* c_root_node public_id = None sys_url = None - dtd = self._c_doc.intSubset - if dtd is not NULL: - if dtd.ExternalID is not NULL: - public_id = funicode(dtd.ExternalID) - if dtd.SystemID is not NULL: - sys_url = funicode(dtd.SystemID) - dtd = self._c_doc.extSubset - if dtd is not NULL: - if not public_id and dtd.ExternalID is not NULL: - public_id = funicode(dtd.ExternalID) - if not sys_url and dtd.SystemID is not NULL: - sys_url = funicode(dtd.SystemID) + c_dtd = self._c_doc.intSubset + if c_dtd is not NULL: + if c_dtd.ExternalID is not NULL: + public_id = funicode(c_dtd.ExternalID) + if c_dtd.SystemID is not NULL: + sys_url = funicode(c_dtd.SystemID) + c_dtd = self._c_doc.extSubset + if c_dtd is not NULL: + if not public_id and c_dtd.ExternalID is not NULL: + public_id = funicode(c_dtd.ExternalID) + if not sys_url and c_dtd.SystemID is not NULL: + sys_url = funicode(c_dtd.SystemID) c_root_node = tree.xmlDocGetRootElement(self._c_doc) if c_root_node is NULL: root_name = None @@ -1329,7 +1329,7 @@ c_write_declaration = encoding not in \ ('US-ASCII', 'ASCII', 'UTF8', 'UTF-8') _tofilelike(file, self._context_node, encoding, - c_write_declaration, bool(pretty_print)) + c_write_declaration, 1, bool(pretty_print)) def getpath(self, _Element element not None): 
"""Returns a structural, absolute XPath expression to find that element. @@ -2061,10 +2061,10 @@ if isinstance(element_or_tree, _Element): return _tostring(<_Element>element_or_tree, - encoding, write_declaration, c_pretty_print) + encoding, write_declaration, 0, c_pretty_print) elif isinstance(element_or_tree, _ElementTree): return _tostring((<_ElementTree>element_or_tree)._context_node, - encoding, write_declaration, c_pretty_print) + encoding, write_declaration, 1, c_pretty_print) else: raise TypeError, "Type '%s' cannot be serialized." % type(element_or_tree) @@ -2081,10 +2081,10 @@ cdef int c_pretty_print c_pretty_print = bool(pretty_print) if isinstance(element_or_tree, _Element): - return _tounicode(<_Element>element_or_tree, c_pretty_print) + return _tounicode(<_Element>element_or_tree, 0, c_pretty_print) elif isinstance(element_or_tree, _ElementTree): return _tounicode((<_ElementTree>element_or_tree)._context_node, - c_pretty_print) + 1, c_pretty_print) else: raise TypeError, "Type '%s' cannot be serialized." % type(element_or_tree) Modified: lxml/trunk/src/lxml/serializer.pxi ============================================================================== --- lxml/trunk/src/lxml/serializer.pxi (original) +++ lxml/trunk/src/lxml/serializer.pxi Mon Aug 13 14:53:17 2007 @@ -1,7 +1,7 @@ # XML serialization and output functions cdef _tostring(_Element element, encoding, - int write_xml_declaration, int pretty_print): + int write_xml_declaration, int write_doctype, int pretty_print): "Serialize an element to an encoded string representation of its XML tree." 
cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -29,7 +29,8 @@ try: state = python.PyEval_SaveThread() _writeNodeToBuffer(c_buffer, element._c_node, c_enc, - write_xml_declaration, pretty_print) + write_xml_declaration, write_doctype, + pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) if c_buffer.conv is not NULL: @@ -43,7 +44,7 @@ tree.xmlOutputBufferClose(c_buffer) return result -cdef _tounicode(_Element element, int pretty_print): +cdef _tounicode(_Element element, int write_doctype, int pretty_print): "Serialize an element to the Python unicode representation of its XML tree." cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -55,7 +56,8 @@ raise LxmlError, "Failed to create output buffer" try: state = python.PyEval_SaveThread() - _writeNodeToBuffer(c_buffer, element._c_node, NULL, 0, pretty_print) + _writeNodeToBuffer(c_buffer, element._c_node, NULL, 0, + write_doctype, pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) if c_buffer.conv is not NULL: @@ -72,12 +74,15 @@ cdef void _writeNodeToBuffer(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node, char* encoding, - int write_xml_declaration, int pretty_print): + int write_xml_declaration, int write_doctype, + int pretty_print): cdef xmlDoc* c_doc c_doc = c_node.doc if write_xml_declaration: _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding) + if write_doctype: + _writeDtdToBuffer(c_buffer, c_doc, c_node.name, encoding) _writePrevSiblings(c_buffer, c_node, encoding, pretty_print) tree.xmlNodeDumpOutput(c_buffer, c_doc, c_node, 0, pretty_print, encoding) _writeTail(c_buffer, c_node, encoding, pretty_print) @@ -93,6 +98,41 @@ tree.xmlOutputBufferWriteString(c_buffer, encoding) tree.xmlOutputBufferWriteString(c_buffer, "'?>\n") +cdef void _writeDtdToBuffer(tree.xmlOutputBuffer* c_buffer, + xmlDoc* c_doc, char* c_root_name, char* encoding): + cdef tree.xmlDtd* c_dtd + cdef xmlNode* c_node + 
c_dtd = c_doc.intSubset + if c_dtd == NULL or c_dtd.name == NULL: + return + if c_dtd.ExternalID == NULL and c_dtd.SystemID == NULL: + return + if cstd.strcmp(c_root_name, c_dtd.name) != 0: + return + tree.xmlOutputBufferWrite(c_buffer, 10, "\n') + return + tree.xmlOutputBufferWrite(c_buffer, 4, '" [\n') + if c_dtd.notations != NULL: + tree.xmlDumpNotationTable(c_buffer.buffer, + c_dtd.notations) + c_node = c_dtd.children + while c_node is not NULL: + tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_node, 0, 0, encoding) + c_node = c_node.next + tree.xmlOutputBufferWrite(c_buffer, 3, "]>\n") + cdef void _writeTail(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node, char* encoding, int pretty_print): "Write the element tail." @@ -179,7 +219,8 @@ return (<_FilelikeWriter>ctxt).close() cdef _tofilelike(f, _Element element, encoding, - int write_xml_declaration, int pretty_print): + int write_xml_declaration, int write_doctype, + int pretty_print): cdef python.PyThreadState* state cdef _FilelikeWriter writer cdef tree.xmlOutputBuffer* c_buffer @@ -209,7 +250,7 @@ raise TypeError, "File or filename expected, got '%s'" % type(f) _writeNodeToBuffer(c_buffer, element._c_node, c_enc, - write_xml_declaration, pretty_print) + write_xml_declaration, write_doctype, pretty_print) tree.xmlOutputBufferClose(c_buffer) tree.xmlCharEncCloseFunc(enchandler) if writer is None: Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Mon Aug 13 14:53:17 2007 @@ -1638,6 +1638,20 @@ self.assertEquals(docinfo.system_url, None) self.assertEquals(docinfo.root_name, 'html') self.assertEquals(docinfo.doctype, '') + + def test_dtd_io(self): + # check that DTDs that go in also go back out + xml = '''\ + + + + ]> + test-test\ + ''' + root = self.etree.parse(StringIO(xml)) + self.assertEqual(self.etree.tostring(root).replace(" ", ""), 
+ xml.replace(" ", "")) def test_byte_zero(self): Element = self.etree.Element Modified: lxml/trunk/src/lxml/tree.pxd ============================================================================== --- lxml/trunk/src/lxml/tree.pxd (original) +++ lxml/trunk/src/lxml/tree.pxd Mon Aug 13 14:53:17 2007 @@ -58,7 +58,8 @@ ctypedef struct xmlDoc ctypedef struct xmlAttr - + ctypedef struct xmlNotationTable + ctypedef enum xmlElementType: XML_ELEMENT_NODE= 1 XML_ATTRIBUTE_NODE= 2 @@ -103,8 +104,16 @@ unsigned short line ctypedef struct xmlDtd: + char* name char* ExternalID char* SystemID + void* notations + void* entities + void* pentities + void* attributes + void* elements + xmlNode* children + xmlDoc* doc ctypedef struct xmlDoc: xmlElementType type @@ -152,7 +161,7 @@ xmlDoc* doc ctypedef struct xmlBuffer - + ctypedef struct xmlOutputBuffer: xmlBuffer* buffer xmlBuffer* conv @@ -226,9 +235,12 @@ cdef extern from "libxml/valid.h": cdef xmlAttr* xmlGetID(xmlDoc* doc, char* ID) + cdef void xmlDumpNotationTable(xmlBuffer* buffer, xmlNotationTable* table) cdef extern from "libxml/xmlIO.h": + cdef void xmlBufferWriteQuotedString(xmlOutputBuffer* out, char* str) cdef int xmlOutputBufferWriteString(xmlOutputBuffer* out, char* str) + cdef int xmlOutputBufferWrite(xmlOutputBuffer* out, int len, char* str) cdef int xmlOutputBufferFlush(xmlOutputBuffer* out) cdef int xmlOutputBufferClose(xmlOutputBuffer* out) From scoder at codespeak.net Mon Aug 13 15:11:28 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 15:11:28 +0200 (CEST) Subject: [Lxml-checkins] r45623 - in lxml/branch/lxml-1.3/src/lxml: . 
tests Message-ID: <20070813131128.07A278185@code0.codespeak.net> Author: scoder Date: Mon Aug 13 15:11:28 2007 New Revision: 45623 Modified: lxml/branch/lxml-1.3/src/lxml/etree.pyx lxml/branch/lxml-1.3/src/lxml/serializer.pxi lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py lxml/branch/lxml-1.3/src/lxml/tree.pxd Log: trunk merge: let DTDs that get parsed in also go out if serialising an ElementTree Modified: lxml/branch/lxml-1.3/src/lxml/etree.pyx ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/etree.pyx (original) +++ lxml/branch/lxml-1.3/src/lxml/etree.pyx Mon Aug 13 15:11:28 2007 @@ -254,22 +254,22 @@ return _elementFactory(self, c_node) cdef getdoctype(self): - cdef tree.xmlDtd* dtd + cdef tree.xmlDtd* c_dtd cdef xmlNode* c_root_node public_id = None sys_url = None - dtd = self._c_doc.intSubset - if dtd is not NULL: - if dtd.ExternalID is not NULL: - public_id = funicode(dtd.ExternalID) - if dtd.SystemID is not NULL: - sys_url = funicode(dtd.SystemID) - dtd = self._c_doc.extSubset - if dtd is not NULL: - if not public_id and dtd.ExternalID is not NULL: - public_id = funicode(dtd.ExternalID) - if not sys_url and dtd.SystemID is not NULL: - sys_url = funicode(dtd.SystemID) + c_dtd = self._c_doc.intSubset + if c_dtd is not NULL: + if c_dtd.ExternalID is not NULL: + public_id = funicode(c_dtd.ExternalID) + if c_dtd.SystemID is not NULL: + sys_url = funicode(c_dtd.SystemID) + c_dtd = self._c_doc.extSubset + if c_dtd is not NULL: + if not public_id and c_dtd.ExternalID is not NULL: + public_id = funicode(c_dtd.ExternalID) + if not sys_url and c_dtd.SystemID is not NULL: + sys_url = funicode(c_dtd.SystemID) c_root_node = tree.xmlDocGetRootElement(self._c_doc) if c_root_node is NULL: root_name = None @@ -1278,7 +1278,7 @@ c_write_declaration = encoding not in \ ('US-ASCII', 'ASCII', 'UTF8', 'UTF-8') _tofilelike(file, self._context_node, encoding, - c_write_declaration, bool(pretty_print)) + 
c_write_declaration, 1, bool(pretty_print)) def getpath(self, _Element element not None): """Returns a structural, absolute XPath expression to find that element. @@ -1967,10 +1967,10 @@ if isinstance(element_or_tree, _Element): return _tostring(<_Element>element_or_tree, - encoding, write_declaration, c_pretty_print) + encoding, write_declaration, 0, c_pretty_print) elif isinstance(element_or_tree, _ElementTree): return _tostring((<_ElementTree>element_or_tree)._context_node, - encoding, write_declaration, c_pretty_print) + encoding, write_declaration, 1, c_pretty_print) else: raise TypeError, "Type '%s' cannot be serialized." % type(element_or_tree) @@ -1987,10 +1987,10 @@ cdef int c_pretty_print c_pretty_print = bool(pretty_print) if isinstance(element_or_tree, _Element): - return _tounicode(<_Element>element_or_tree, c_pretty_print) + return _tounicode(<_Element>element_or_tree, 0, c_pretty_print) elif isinstance(element_or_tree, _ElementTree): return _tounicode((<_ElementTree>element_or_tree)._context_node, - c_pretty_print) + 1, c_pretty_print) else: raise TypeError, "Type '%s' cannot be serialized." % type(element_or_tree) Modified: lxml/branch/lxml-1.3/src/lxml/serializer.pxi ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/serializer.pxi (original) +++ lxml/branch/lxml-1.3/src/lxml/serializer.pxi Mon Aug 13 15:11:28 2007 @@ -1,7 +1,7 @@ # XML serialization and output functions cdef _tostring(_Element element, encoding, - int write_xml_declaration, int pretty_print): + int write_xml_declaration, int write_doctype, int pretty_print): "Serialize an element to an encoded string representation of its XML tree." 
cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -29,7 +29,8 @@ try: state = python.PyEval_SaveThread() _writeNodeToBuffer(c_buffer, element._c_node, c_enc, - write_xml_declaration, pretty_print) + write_xml_declaration, write_doctype, + pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) if c_buffer.conv is not NULL: @@ -43,7 +44,7 @@ tree.xmlOutputBufferClose(c_buffer) return result -cdef _tounicode(_Element element, int pretty_print): +cdef _tounicode(_Element element, int write_doctype, int pretty_print): "Serialize an element to the Python unicode representation of its XML tree." cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -55,7 +56,8 @@ raise LxmlError, "Failed to create output buffer" try: state = python.PyEval_SaveThread() - _writeNodeToBuffer(c_buffer, element._c_node, NULL, 0, pretty_print) + _writeNodeToBuffer(c_buffer, element._c_node, NULL, 0, + write_doctype, pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) if c_buffer.conv is not NULL: @@ -72,12 +74,15 @@ cdef void _writeNodeToBuffer(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node, char* encoding, - int write_xml_declaration, int pretty_print): + int write_xml_declaration, int write_doctype, + int pretty_print): cdef xmlDoc* c_doc c_doc = c_node.doc if write_xml_declaration: _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding) + if write_doctype: + _writeDtdToBuffer(c_buffer, c_doc, c_node.name, encoding) _writePrevSiblings(c_buffer, c_node, encoding, pretty_print) tree.xmlNodeDumpOutput(c_buffer, c_doc, c_node, 0, pretty_print, encoding) _writeTail(c_buffer, c_node, encoding, pretty_print) @@ -93,6 +98,41 @@ tree.xmlOutputBufferWriteString(c_buffer, encoding) tree.xmlOutputBufferWriteString(c_buffer, "'?>\n") +cdef void _writeDtdToBuffer(tree.xmlOutputBuffer* c_buffer, + xmlDoc* c_doc, char* c_root_name, char* encoding): + cdef tree.xmlDtd* c_dtd + cdef xmlNode* c_node + 
c_dtd = c_doc.intSubset + if c_dtd == NULL or c_dtd.name == NULL: + return + if c_dtd.ExternalID == NULL and c_dtd.SystemID == NULL: + return + if cstd.strcmp(c_root_name, c_dtd.name) != 0: + return + tree.xmlOutputBufferWrite(c_buffer, 10, "\n') + return + tree.xmlOutputBufferWrite(c_buffer, 4, '" [\n') + if c_dtd.notations != NULL: + tree.xmlDumpNotationTable(c_buffer.buffer, + c_dtd.notations) + c_node = c_dtd.children + while c_node is not NULL: + tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_node, 0, 0, encoding) + c_node = c_node.next + tree.xmlOutputBufferWrite(c_buffer, 3, "]>\n") + cdef void _writeTail(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node, char* encoding, int pretty_print): "Write the element tail." @@ -179,7 +219,8 @@ return (<_FilelikeWriter>ctxt).close() cdef _tofilelike(f, _Element element, encoding, - int write_xml_declaration, int pretty_print): + int write_xml_declaration, int write_doctype, + int pretty_print): cdef python.PyThreadState* state cdef _FilelikeWriter writer cdef tree.xmlOutputBuffer* c_buffer @@ -209,7 +250,7 @@ raise TypeError, "File or filename expected, got '%s'" % type(f) _writeNodeToBuffer(c_buffer, element._c_node, c_enc, - write_xml_declaration, pretty_print) + write_xml_declaration, write_doctype, pretty_print) tree.xmlOutputBufferClose(c_buffer) tree.xmlCharEncCloseFunc(enchandler) if writer is None: Modified: lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py (original) +++ lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py Mon Aug 13 15:11:28 2007 @@ -1502,6 +1502,20 @@ self.assertEquals(docinfo.system_url, None) self.assertEquals(docinfo.root_name, 'html') self.assertEquals(docinfo.doctype, '') + + def test_dtd_io(self): + # check that DTDs that go in also go back out + xml = '''\ + + + + ]> + test-test\ + ''' + root = self.etree.parse(StringIO(xml)) + 
self.assertEqual(self.etree.tostring(root).replace(" ", ""), + xml.replace(" ", "")) def test_byte_zero(self): Element = self.etree.Element Modified: lxml/branch/lxml-1.3/src/lxml/tree.pxd ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/tree.pxd (original) +++ lxml/branch/lxml-1.3/src/lxml/tree.pxd Mon Aug 13 15:11:28 2007 @@ -58,7 +58,8 @@ ctypedef struct xmlDoc ctypedef struct xmlAttr - + ctypedef struct xmlNotationTable + ctypedef enum xmlElementType: XML_ELEMENT_NODE= 1 XML_ATTRIBUTE_NODE= 2 @@ -103,8 +104,16 @@ unsigned short line ctypedef struct xmlDtd: + char* name char* ExternalID char* SystemID + void* notations + void* entities + void* pentities + void* attributes + void* elements + xmlNode* children + xmlDoc* doc ctypedef struct xmlDoc: xmlElementType type @@ -152,7 +161,7 @@ xmlDoc* doc ctypedef struct xmlBuffer - + ctypedef struct xmlOutputBuffer: xmlBuffer* buffer xmlBuffer* conv @@ -223,9 +232,12 @@ cdef extern from "libxml/valid.h": cdef xmlAttr* xmlGetID(xmlDoc* doc, char* ID) + cdef void xmlDumpNotationTable(xmlBuffer* buffer, xmlNotationTable* table) cdef extern from "libxml/xmlIO.h": + cdef void xmlBufferWriteQuotedString(xmlOutputBuffer* out, char* str) cdef int xmlOutputBufferWriteString(xmlOutputBuffer* out, char* str) + cdef int xmlOutputBufferWrite(xmlOutputBuffer* out, int len, char* str) cdef int xmlOutputBufferFlush(xmlOutputBuffer* out) cdef int xmlOutputBufferClose(xmlOutputBuffer* out) From scoder at codespeak.net Mon Aug 13 15:12:54 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 15:12:54 +0200 (CEST) Subject: [Lxml-checkins] r45624 - lxml/trunk/src/lxml Message-ID: <20070813131254.A84BD8189@code0.codespeak.net> Author: scoder Date: Mon Aug 13 15:12:54 2007 New Revision: 45624 Modified: lxml/trunk/src/lxml/serializer.pxi Log: also write comment and PI siblings of the root node only when serialising an ElementTree Modified: 
lxml/trunk/src/lxml/serializer.pxi ============================================================================== --- lxml/trunk/src/lxml/serializer.pxi (original) +++ lxml/trunk/src/lxml/serializer.pxi Mon Aug 13 15:12:54 2007 @@ -1,7 +1,8 @@ # XML serialization and output functions cdef _tostring(_Element element, encoding, - int write_xml_declaration, int write_doctype, int pretty_print): + int write_xml_declaration, int write_complete_document, + int pretty_print): "Serialize an element to an encoded string representation of its XML tree." cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -29,7 +30,7 @@ try: state = python.PyEval_SaveThread() _writeNodeToBuffer(c_buffer, element._c_node, c_enc, - write_xml_declaration, write_doctype, + write_xml_declaration, write_complete_document, pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) @@ -44,7 +45,7 @@ tree.xmlOutputBufferClose(c_buffer) return result -cdef _tounicode(_Element element, int write_doctype, int pretty_print): +cdef _tounicode(_Element element, int write_complete_document, int pretty_print): "Serialize an element to the Python unicode representation of its XML tree." 
cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -57,7 +58,7 @@ try: state = python.PyEval_SaveThread() _writeNodeToBuffer(c_buffer, element._c_node, NULL, 0, - write_doctype, pretty_print) + write_complete_document, pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) if c_buffer.conv is not NULL: @@ -74,19 +75,21 @@ cdef void _writeNodeToBuffer(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node, char* encoding, - int write_xml_declaration, int write_doctype, + int write_xml_declaration, + int write_complete_document, int pretty_print): cdef xmlDoc* c_doc c_doc = c_node.doc - if write_xml_declaration: + if write_complete_document: _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding) - if write_doctype: + if write_complete_document: _writeDtdToBuffer(c_buffer, c_doc, c_node.name, encoding) - _writePrevSiblings(c_buffer, c_node, encoding, pretty_print) + _writePrevSiblings(c_buffer, c_node, encoding, pretty_print) tree.xmlNodeDumpOutput(c_buffer, c_doc, c_node, 0, pretty_print, encoding) _writeTail(c_buffer, c_node, encoding, pretty_print) - _writeNextSiblings(c_buffer, c_node, encoding, pretty_print) + if write_complete_document: + _writeNextSiblings(c_buffer, c_node, encoding, pretty_print) cdef void _writeDeclarationToBuffer(tree.xmlOutputBuffer* c_buffer, char* version, char* encoding): From scoder at codespeak.net Mon Aug 13 16:15:07 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 16:15:07 +0200 (CEST) Subject: [Lxml-checkins] r45626 - lxml/branch/lxml-1.3/src/lxml Message-ID: <20070813141507.988008131@code0.codespeak.net> Author: scoder Date: Mon Aug 13 16:15:06 2007 New Revision: 45626 Modified: lxml/branch/lxml-1.3/src/lxml/serializer.pxi Log: also write comment and PI siblings of the root node only when serialising an ElementTree Modified: lxml/branch/lxml-1.3/src/lxml/serializer.pxi ============================================================================== 
--- lxml/branch/lxml-1.3/src/lxml/serializer.pxi (original) +++ lxml/branch/lxml-1.3/src/lxml/serializer.pxi Mon Aug 13 16:15:06 2007 @@ -1,7 +1,8 @@ # XML serialization and output functions cdef _tostring(_Element element, encoding, - int write_xml_declaration, int write_doctype, int pretty_print): + int write_xml_declaration, int write_complete_document, + int pretty_print): "Serialize an element to an encoded string representation of its XML tree." cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -29,7 +30,7 @@ try: state = python.PyEval_SaveThread() _writeNodeToBuffer(c_buffer, element._c_node, c_enc, - write_xml_declaration, write_doctype, + write_xml_declaration, write_complete_document, pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) @@ -44,7 +45,7 @@ tree.xmlOutputBufferClose(c_buffer) return result -cdef _tounicode(_Element element, int write_doctype, int pretty_print): +cdef _tounicode(_Element element, int write_complete_document, int pretty_print): "Serialize an element to the Python unicode representation of its XML tree." 
cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -57,7 +58,7 @@ try: state = python.PyEval_SaveThread() _writeNodeToBuffer(c_buffer, element._c_node, NULL, 0, - write_doctype, pretty_print) + write_complete_document, pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) if c_buffer.conv is not NULL: @@ -74,19 +75,21 @@ cdef void _writeNodeToBuffer(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node, char* encoding, - int write_xml_declaration, int write_doctype, + int write_xml_declaration, + int write_complete_document, int pretty_print): cdef xmlDoc* c_doc c_doc = c_node.doc - if write_xml_declaration: + if write_complete_document: _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding) - if write_doctype: + if write_complete_document: _writeDtdToBuffer(c_buffer, c_doc, c_node.name, encoding) - _writePrevSiblings(c_buffer, c_node, encoding, pretty_print) + _writePrevSiblings(c_buffer, c_node, encoding, pretty_print) tree.xmlNodeDumpOutput(c_buffer, c_doc, c_node, 0, pretty_print, encoding) _writeTail(c_buffer, c_node, encoding, pretty_print) - _writeNextSiblings(c_buffer, c_node, encoding, pretty_print) + if write_complete_document: + _writeNextSiblings(c_buffer, c_node, encoding, pretty_print) cdef void _writeDeclarationToBuffer(tree.xmlOutputBuffer* c_buffer, char* version, char* encoding): From scoder at codespeak.net Mon Aug 13 16:15:20 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 16:15:20 +0200 (CEST) Subject: [Lxml-checkins] r45627 - lxml/trunk Message-ID: <20070813141520.C563F8131@code0.codespeak.net> Author: scoder Date: Mon Aug 13 16:15:20 2007 New Revision: 45627 Modified: lxml/trunk/CHANGES.txt Log: changelog update Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Mon Aug 13 16:15:20 2007 @@ -8,6 +8,10 @@ Features added -------------- 
+* Serialising an ElementTree now includes any internal DTD subsets that are + part of the document, as well as comments and PIs that are siblings of the + root node. + * Namespace class setup is now local to the ``ElementNamespaceClassLookup`` instance and no longer global. @@ -53,6 +57,9 @@ Other changes ------------- +* Serialising an Element no longer includes includes its comment and PI + siblings (only ElementTree serialisation includes them). + * ``el.getiterator()`` renamed to ``el.iter()``, following ElementTree 1.3 - original name is still available as alias From scoder at codespeak.net Mon Aug 13 16:17:35 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 16:17:35 +0200 (CEST) Subject: [Lxml-checkins] r45628 - lxml/trunk Message-ID: <20070813141735.5D0688131@code0.codespeak.net> Author: scoder Date: Mon Aug 13 16:17:33 2007 New Revision: 45628 Modified: lxml/trunk/CHANGES.txt Log: typo Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Mon Aug 13 16:17:33 2007 @@ -57,8 +57,8 @@ Other changes ------------- -* Serialising an Element no longer includes includes its comment and PI - siblings (only ElementTree serialisation includes them). +* Serialising an Element no longer includes its comment and PI siblings (only + ElementTree serialisation includes them). 
* ``el.getiterator()`` renamed to ``el.iter()``, following ElementTree 1.3 - original name is still available as alias From scoder at codespeak.net Mon Aug 13 16:18:48 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 16:18:48 +0200 (CEST) Subject: [Lxml-checkins] r45629 - lxml/branch/lxml-1.3 Message-ID: <20070813141848.9A1EE8131@code0.codespeak.net> Author: scoder Date: Mon Aug 13 16:18:47 2007 New Revision: 45629 Modified: lxml/branch/lxml-1.3/CHANGES.txt Log: changelog update Modified: lxml/branch/lxml-1.3/CHANGES.txt ============================================================================== --- lxml/branch/lxml-1.3/CHANGES.txt (original) +++ lxml/branch/lxml-1.3/CHANGES.txt Mon Aug 13 16:18:47 2007 @@ -8,11 +8,21 @@ Features added -------------- +* Serialising an ElementTree now includes any internal DTD subsets that are + part of the document, as well as comments and PIs that are siblings of the + root node. + Bugs fixed ---------- * Parsing with the ``no_network`` option could fail +Other changes +------------- + +* Serialising an Element no longer includes its comment and PI siblings (only + ElementTree serialisation includes them). 
+ 1.3.3 (2007-07-26) ================== From scoder at codespeak.net Mon Aug 13 17:12:29 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 17:12:29 +0200 (CEST) Subject: [Lxml-checkins] r45630 - lxml/trunk Message-ID: <20070813151229.68B318163@code0.codespeak.net> Author: scoder Date: Mon Aug 13 17:12:28 2007 New Revision: 45630 Modified: lxml/trunk/CHANGES.txt Log: Changelog cleanup Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Mon Aug 13 17:12:28 2007 @@ -8,10 +8,6 @@ Features added -------------- -* Serialising an ElementTree now includes any internal DTD subsets that are - part of the document, as well as comments and PIs that are siblings of the - root node. - * Namespace class setup is now local to the ``ElementNamespaceClassLookup`` instance and no longer global. @@ -41,8 +37,6 @@ Bugs fixed ---------- -* Parsing with the ``no_network`` option could fail - * lxml.etree did not check tag/attribute names * The XML parser did not report undefined entities as error @@ -57,8 +51,7 @@ Other changes ------------- -* Serialising an Element no longer includes its comment and PI siblings (only - ElementTree serialisation includes them). +* objectify.PyType for None is now called "NoneType" * ``el.getiterator()`` renamed to ``el.iter()``, following ElementTree 1.3 - original name is still available as alias @@ -71,6 +64,28 @@ * Network access in parsers disabled by default +1.3.4 (???) +================== + +* Serialising an ElementTree now includes any internal DTD subsets that are + part of the document, as well as comments and PIs that are siblings of the + root node. 
+ +Features added +-------------- + +Bugs fixed +---------- + +* Parsing with the ``no_network`` option could fail + +Other changes +------------- + +* Serialising an Element no longer includes its comment and PI siblings (only + ElementTree serialisation includes them). + + 1.3.3 (2007-07-26) ================== From scoder at codespeak.net Tue Aug 14 11:03:43 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 14 Aug 2007 11:03:43 +0200 (CEST) Subject: [Lxml-checkins] r45644 - lxml/trunk/doc Message-ID: <20070814090343.94F598163@code0.codespeak.net> Author: scoder Date: Tue Aug 14 11:03:43 2007 New Revision: 45644 Modified: lxml/trunk/doc/FAQ.txt Log: FAQ entry on trailing .tail's on serialisation Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Tue Aug 14 11:03:43 2007 @@ -141,6 +141,30 @@ .. _threading: #threading +What about that trailing text on serialised Elements? +----------------------------------------------------- + +The ElementTree tree model defines an Element as a container with a tag name, +contained text, child Elements and a tail text. This means that whenever you +serialise an Element, you will get all parts of that Element:: + + >>> from lxml import etree + >>> root = etree.XML("texttail") + >>> print etree.tostring(root[0]) + texttail + +This is a huge simplification for the tree model as it avoids text nodes to +appear in the list of children and makes access to them quick and simple. So +this is a benefit in most applications and simplifies many, many XML tree +algorithms. + +However, in document-like XML (and especially HTML), the above result can be +unexpected to new users and can sometimes require a bit more overhead. A good +way to deal with this is to use helper functions that copy the Element without +its tail. 
The ``lxml.html`` package also deals with this in a couple of +places, as most HTML algorithms benefit from a tail-free behaviour. + + Installation ============ From scoder at codespeak.net Tue Aug 14 11:04:05 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 14 Aug 2007 11:04:05 +0200 (CEST) Subject: [Lxml-checkins] r45645 - lxml/branch/lxml-1.3/doc Message-ID: <20070814090405.BC23F814F@code0.codespeak.net> Author: scoder Date: Tue Aug 14 11:04:05 2007 New Revision: 45645 Modified: lxml/branch/lxml-1.3/doc/FAQ.txt Log: FAQ entry on trailing .tail's on serialisation Modified: lxml/branch/lxml-1.3/doc/FAQ.txt ============================================================================== --- lxml/branch/lxml-1.3/doc/FAQ.txt (original) +++ lxml/branch/lxml-1.3/doc/FAQ.txt Tue Aug 14 11:04:05 2007 @@ -142,6 +142,30 @@ .. _threading: #threading +What about that trailing text on serialised Elements? +----------------------------------------------------- + +The ElementTree tree model defines an Element as a container with a tag name, +contained text, child Elements and a tail text. This means that whenever you +serialise an Element, you will get all parts of that Element:: + + >>> from lxml import etree + >>> root = etree.XML("texttail") + >>> print etree.tostring(root[0]) + texttail + +This is a huge simplification for the tree model as it avoids text nodes to +appear in the list of children and makes access to them quick and simple. So +this is a benefit in most applications and simplifies many, many XML tree +algorithms. + +However, in document-like XML (and especially HTML), the above result can be +unexpected to new users and can sometimes require a bit more overhead. A good +way to deal with this is to use helper functions that copy the Element without +its tail. The ``lxml.html`` package also deals with this in a couple of +places, as most HTML algorithms benefit from a tail-free behaviour. 
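Editorial illustration of the FAQ entry above: a minimal sketch of the "copy the Element without its tail" helper that the text recommends. It uses the standard library's ``xml.etree.ElementTree`` rather than lxml, on the assumption that the two share the tree model described here; the tag names ``root`` and ``tag`` and the helper name ``tostring_without_tail`` are hypothetical and not taken from the commit.

```python
import copy
import xml.etree.ElementTree as ET


def tostring_without_tail(element):
    """Serialise a copy of `element` with its tail text removed.

    Hypothetical helper name; the original element is left untouched.
    """
    clone = copy.deepcopy(element)  # copies tag, attributes, text, children and tail
    clone.tail = None               # drop the trailing text on the copy only
    return ET.tostring(clone, encoding="unicode")


# Hypothetical document: one child with text, followed by tail text.
root = ET.fromstring("<root><tag>text</tag>tail</root>")

with_tail = ET.tostring(root[0], encoding="unicode")  # tail is part of the Element
without_tail = tostring_without_tail(root[0])         # tail stripped on the copy
```

Copying first keeps the tail in the source tree, so the surrounding document still serialises unchanged. Later lxml releases also accept a ``with_tail`` argument to ``etree.tostring()``, which may make such a helper unnecessary there.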
+ + Installation ============ From scoder at codespeak.net Thu Aug 16 08:35:34 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 16 Aug 2007 08:35:34 +0200 (CEST) Subject: [Lxml-checkins] r45692 - lxml/branch/lxml-1.3/doc Message-ID: <20070816063534.1BEA680FC@code0.codespeak.net> Author: scoder Date: Thu Aug 16 08:35:32 2007 New Revision: 45692 Added: lxml/branch/lxml-1.3/doc/tutorial.txt - copied, changed from r45630, lxml/trunk/doc/tutorial.txt Log: updated Tutorial from trunk Copied: lxml/branch/lxml-1.3/doc/tutorial.txt (from r45630, lxml/trunk/doc/tutorial.txt) ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/branch/lxml-1.3/doc/tutorial.txt Thu Aug 16 08:35:32 2007 @@ -484,8 +484,8 @@ -One such example is the module ``lxml.html.builder``, which provides a -vocabulary for HTML. +One such example is the module ``lxml.html.builder`` in lxml 2.0, which +provides a vocabulary for HTML. 
ElementPath From scoder at codespeak.net Thu Aug 16 08:46:04 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 16 Aug 2007 08:46:04 +0200 (CEST) Subject: [Lxml-checkins] r45693 - lxml/trunk/src/lxml Message-ID: <20070816064604.7B53E8108@code0.codespeak.net> Author: scoder Date: Thu Aug 16 08:46:03 2007 New Revision: 45693 Modified: lxml/trunk/src/lxml/serializer.pxi Log: fix for DTD serialisation Modified: lxml/trunk/src/lxml/serializer.pxi ============================================================================== --- lxml/trunk/src/lxml/serializer.pxi (original) +++ lxml/trunk/src/lxml/serializer.pxi Thu Aug 16 08:46:03 2007 @@ -80,7 +80,7 @@ int pretty_print): cdef xmlDoc* c_doc c_doc = c_node.doc - if write_complete_document: + if write_xml_declaration: _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding) if write_complete_document: From scoder at codespeak.net Thu Aug 16 08:47:28 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 16 Aug 2007 08:47:28 +0200 (CEST) Subject: [Lxml-checkins] r45694 - lxml/branch/lxml-1.3/src/lxml Message-ID: <20070816064728.9D1B1810D@code0.codespeak.net> Author: scoder Date: Thu Aug 16 08:47:28 2007 New Revision: 45694 Modified: lxml/branch/lxml-1.3/src/lxml/serializer.pxi Log: fix for DTD serialisation Modified: lxml/branch/lxml-1.3/src/lxml/serializer.pxi ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/serializer.pxi (original) +++ lxml/branch/lxml-1.3/src/lxml/serializer.pxi Thu Aug 16 08:47:28 2007 @@ -80,7 +80,7 @@ int pretty_print): cdef xmlDoc* c_doc c_doc = c_node.doc - if write_complete_document: + if write_xml_declaration: _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding) if write_complete_document: From scoder at codespeak.net Thu Aug 16 09:05:57 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 16 Aug 2007 09:05:57 +0200 (CEST) Subject: [Lxml-checkins] r45695 - 
lxml/trunk/doc Message-ID: <20070816070557.1FF5D8105@code0.codespeak.net> Author: scoder Date: Thu Aug 16 09:05:56 2007 New Revision: 45695 Modified: lxml/trunk/doc/tutorial.txt Log: extended section on ElementTree serialisation Modified: lxml/trunk/doc/tutorial.txt ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/trunk/doc/tutorial.txt Thu Aug 16 09:05:56 2007 @@ -332,7 +332,52 @@ The ElementTree class ===================== -An ``ElementTree`` is mainly a wrapper around a tree with a root node. +An ``ElementTree`` is mainly a document wrapper around a tree with a root +node. It provides a couple of methods for parsing, serialisation and general +document handling. One of the bigger differences is that it serialises as a +complete document, as opposed to a single Element. This includes top-level +processing instructions and comments, as well as a DOCTYPE and other DTD +content in the document:: + + >>> from StringIO import StringIO + >>> tree = etree.parse(StringIO('''\ + ... <?xml version="1.0"?> + ... <!DOCTYPE root SYSTEM "test" [ <!ENTITY tasty "eggs"> ]> + ... <root> + ... <a>&tasty;</a> + ... </root> + ... ''')) + + >>> print tree.docinfo.doctype + <!DOCTYPE root SYSTEM "test"> + + >>> # lxml 1.3.4 and later + >>> print etree.tostring(tree) + <!DOCTYPE root SYSTEM "test" [ + <!ENTITY tasty "eggs"> + ]> + <root> + <a>eggs</a> + </root> + + >>> # lxml 1.3.4 and later + >>> print etree.tostring(etree.ElementTree(tree.getroot())) + <!DOCTYPE root SYSTEM "test" [ + <!ENTITY tasty "eggs"> + ]> + <root> + <a>eggs</a> + </root> + + >>> # ElementTree and lxml <= 1.3.3 + >>> print etree.tostring(tree.getroot()) + <root> + <a>eggs</a> + </root> + +Note that this has changed in lxml 1.3.4 to match the behaviour of the +upcoming lxml 2.0. Before, both would serialise without DTD content, which +made lxml loose DTD information in an input-output cycle. 
Parsing files and XML literals From scoder at codespeak.net Thu Aug 16 09:06:26 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 16 Aug 2007 09:06:26 +0200 (CEST) Subject: [Lxml-checkins] r45696 - lxml/branch/lxml-1.3/doc Message-ID: <20070816070626.D2CEA8105@code0.codespeak.net> Author: scoder Date: Thu Aug 16 09:06:26 2007 New Revision: 45696 Modified: lxml/branch/lxml-1.3/doc/tutorial.txt Log: extended section on ElementTree serialisation Modified: lxml/branch/lxml-1.3/doc/tutorial.txt ============================================================================== --- lxml/branch/lxml-1.3/doc/tutorial.txt (original) +++ lxml/branch/lxml-1.3/doc/tutorial.txt Thu Aug 16 09:06:26 2007 @@ -332,7 +332,52 @@ The ElementTree class ===================== -An ``ElementTree`` is mainly a wrapper around a tree with a root node. +An ``ElementTree`` is mainly a document wrapper around a tree with a root +node. It provides a couple of methods for parsing, serialisation and general +document handling. One of the bigger differences is that it serialises as a +complete document, as opposed to a single Element. This includes top-level +processing instructions and comments, as well as a DOCTYPE and other DTD +content in the document:: + + >>> from StringIO import StringIO + >>> tree = etree.parse(StringIO('''\ + ... <?xml version="1.0"?> + ... <!DOCTYPE root SYSTEM "test" [ <!ENTITY tasty "eggs"> ]> + ... <root> + ... <a>&tasty;</a> + ... </root> + ... 
''')) + + >>> print tree.docinfo.doctype + <!DOCTYPE root SYSTEM "test"> + + >>> # lxml 1.3.4 and later + >>> print etree.tostring(tree) + <!DOCTYPE root SYSTEM "test" [ + <!ENTITY tasty "eggs"> + ]> + <root> + <a>eggs</a> + </root> + + >>> # lxml 1.3.4 and later + >>> print etree.tostring(etree.ElementTree(tree.getroot())) + <!DOCTYPE root SYSTEM "test" [ + <!ENTITY tasty "eggs"> + ]> + <root> + <a>eggs</a> + </root> + + >>> # ElementTree and lxml <= 1.3.3 + >>> print etree.tostring(tree.getroot()) + <root> + <a>eggs</a> + </root> + +Note that this has changed in lxml 1.3.4 to match the behaviour of the +upcoming lxml 2.0. Before, both would serialise without DTD content, which +made lxml loose DTD information in an input-output cycle. 
Parsing files and XML literals From scoder at codespeak.net Thu Aug 16 22:38:39 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 16 Aug 2007 22:38:39 +0200 (CEST) Subject: [Lxml-checkins] r45755 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20070816203839.8A6388175@code0.codespeak.net> Author: scoder Date: Thu Aug 16 22:38:39 2007 New Revision: 45755 Modified: lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/dtd.pxi lxml/trunk/src/lxml/etree.pyx lxml/trunk/src/lxml/tests/test_dtd.py lxml/trunk/src/lxml/tree.pxd Log: support for retrieving the DTD defined internally in a document for validation Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Thu Aug 16 22:38:39 2007 @@ -8,6 +8,10 @@ Features added -------------- +* The ``docinfo`` on ElementTree objects has new properties ``internalDTD`` + and ``externalDTD`` that return a DTD object for the internal or external + subset of the document respectively. + * Namespace class setup is now local to the ``ElementNamespaceClassLookup`` instance and no longer global. Modified: lxml/trunk/src/lxml/dtd.pxi ============================================================================== --- lxml/trunk/src/lxml/dtd.pxi (original) +++ lxml/trunk/src/lxml/dtd.pxi Thu Aug 16 22:38:39 2007 @@ -99,3 +99,19 @@ if c_dtd is NULL: raise DTDParseError, "error parsing DTD" return c_dtd + +cdef extern from "etree_defs.h": + # macro call to 't->tp_new()' for fast instantiation + cdef DTD NEW_DTD "PY_NEW" (object t) + +cdef DTD _dtdFactory(tree.xmlDtd* c_dtd): + # do not run through DTD.__init__()! 
+ cdef DTD dtd + if c_dtd is NULL: + return None + dtd = NEW_DTD(DTD) + dtd._c_dtd = tree.xmlCopyDtd(c_dtd) + if dtd._c_dtd is NULL: + python.PyErr_NoMemory() + _Validator.__init__(dtd) + return dtd Modified: lxml/trunk/src/lxml/etree.pyx ============================================================================== --- lxml/trunk/src/lxml/etree.pyx (original) +++ lxml/trunk/src/lxml/etree.pyx Thu Aug 16 22:38:39 2007 @@ -397,37 +397,76 @@ cdef class DocInfo: "Document information provided by parser and DTD." - cdef readonly object root_name - cdef readonly object public_id - cdef readonly object system_url - cdef readonly object xml_version - cdef readonly object encoding - cdef readonly object URL + cdef _Document _doc def __init__(self, tree): "Create a DocInfo object for an ElementTree object or root Element." - cdef _Document doc - doc = _documentOrRaise(tree) - self.root_name, self.public_id, self.system_url = doc.getdoctype() - if not self.root_name and (self.public_id or self.system_url): + self._doc = _documentOrRaise(tree) + root_name, public_id, system_url = self._doc.getdoctype() + if not root_name and (public_id or system_url): raise ValueError, "Could not find root node" - self.xml_version, self.encoding = doc.getxmlinfo() - self.URL = doc.getURL() + + property root_name: + "Returns the name of the root node as defined by the DOCTYPE." + def __get__(self): + root_name, public_id, system_url = self._doc.getdoctype() + return root_name + + property public_id: + "Returns the public ID of the DOCTYPE." + def __get__(self): + root_name, public_id, system_url = self._doc.getdoctype() + return public_id + + property system_url: + "Returns the system ID of the DOCTYPE." + def __get__(self): + root_name, public_id, system_url = self._doc.getdoctype() + return system_url + + property xml_version: + "Returns the XML version as declared by the document." 
+ def __get__(self): + xml_version, encoding = self._doc.getxmlinfo() + return xml_version + + property encoding: + "Returns the encoding name as declared by the document." + def __get__(self): + xml_version, encoding = self._doc.getxmlinfo() + return encoding + + property URL: + "Returns the source URL of the document (or None if unknown)." + def __get__(self): + return self._doc.getURL() property doctype: + "Returns a DOCTYPE declaration string for the document." def __get__(self): - if self.public_id: - if self.system_url: + root_name, public_id, system_url = self._doc.getdoctype() + if public_id: + if system_url: return '<!DOCTYPE %s PUBLIC "%s" "%s">' % ( - self.root_name, self.public_id, self.system_url) + root_name, public_id, system_url) else: return '<!DOCTYPE %s PUBLIC "%s">' % ( - self.root_name, self.public_id) - elif self.system_url: + root_name, public_id) + elif system_url: return '<!DOCTYPE %s SYSTEM "%s">' % ( - self.root_name, self.system_url) + root_name, system_url) else: return "" + property internalDTD: + "Returns a DTD validator based on the internal subset of the document." + def __get__(self): + return _dtdFactory(self._doc._c_doc.intSubset) + + property externalDTD: + "Returns a DTD validator based on the external subset of the document." + def __get__(self): + return _dtdFactory(self._doc._c_doc.extSubset) + cdef public class _Element [ type LxmlElementType, object LxmlElement ]: """Element class. References a document object and a libxml node. 
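Editorial illustration of the ``internalDTD`` property added in the diff above: a minimal sketch of validating a document against its own internal DTD subset. It assumes a current lxml installation and Python 3; the sample DTD, the element names ``a`` and ``b``, and the helper name ``validate_against_internal_dtd`` are hypothetical and not taken from the commit's test file.

```python
from io import BytesIO

from lxml import etree

# Hypothetical documents: the internal subset requires exactly one <a/> inside <b>.
VALID_XML = b"""<!DOCTYPE b [
<!ELEMENT b (a)>
<!ELEMENT a EMPTY>
]>
<b><a/></b>"""

INVALID_XML = b"""<!DOCTYPE b [
<!ELEMENT b (a)>
<!ELEMENT a EMPTY>
]>
<b/>"""


def validate_against_internal_dtd(xml_bytes):
    """Parse without validation, then validate with the document's own DTD."""
    tree = etree.parse(BytesIO(xml_bytes))
    dtd = tree.docinfo.internalDTD  # None if the document has no internal subset
    if dtd is None:
        raise ValueError("document has no internal DTD subset")
    return dtd.validate(tree.getroot())


is_valid = validate_against_internal_dtd(VALID_XML)
is_invalid = validate_against_internal_dtd(INVALID_XML)
```

The default parser does not validate on its own, so the second document parses fine and only fails once the DTD object is asked to check it.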
Modified: lxml/trunk/src/lxml/tests/test_dtd.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_dtd.py (original) +++ lxml/trunk/src/lxml/tests/test_dtd.py Thu Aug 16 22:38:39 2007 @@ -36,6 +36,31 @@ dtd = etree.DTD(StringIO("")) dtd.assertValid(root) + def test_dtd_internal(self): + root = etree.XML(''' + + + ]> + + ''') + dtd = etree.ElementTree(root).docinfo.internalDTD + self.assert_(dtd) + dtd.assertValid(root) + + def test_dtd_internal_invalid(self): + root = etree.XML(''' + + + + ]> + + ''') + dtd = etree.ElementTree(root).docinfo.internalDTD + self.assert_(dtd) + self.assertFalse(dtd.validate(root)) + def test_dtd_broken(self): self.assertRaises(etree.DTDParseError, etree.DTD, StringIO("")) Modified: lxml/trunk/src/lxml/tree.pxd ============================================================================== --- lxml/trunk/src/lxml/tree.pxd (original) +++ lxml/trunk/src/lxml/tree.pxd Thu Aug 16 22:38:39 2007 @@ -219,6 +219,7 @@ int format, char* encoding) cdef void xmlNodeSetName(xmlNode* cur, char* name) cdef void xmlNodeSetContent(xmlNode* cur, char* content) + cdef xmlDtd* xmlCopyDtd(xmlDtd* dtd) cdef xmlDoc* xmlCopyDoc(xmlDoc* doc, int recursive) cdef xmlNode* xmlCopyNode(xmlNode* node, int extended) cdef xmlNode* xmlDocCopyNode(xmlNode* node, xmlDoc* doc, int extended) From scoder at codespeak.net Thu Aug 16 22:41:17 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 16 Aug 2007 22:41:17 +0200 (CEST) Subject: [Lxml-checkins] r45756 - in lxml/branch/lxml-1.3: . 
src/lxml src/lxml/tests Message-ID: <20070816204117.21EA9815D@code0.codespeak.net> Author: scoder Date: Thu Aug 16 22:41:16 2007 New Revision: 45756 Modified: lxml/branch/lxml-1.3/CHANGES.txt lxml/branch/lxml-1.3/src/lxml/dtd.pxi lxml/branch/lxml-1.3/src/lxml/etree.pyx lxml/branch/lxml-1.3/src/lxml/tests/test_dtd.py lxml/branch/lxml-1.3/src/lxml/tree.pxd Log: trunk merge: support for retrieving the DTD defined internally in a document for validation Modified: lxml/branch/lxml-1.3/CHANGES.txt ============================================================================== --- lxml/branch/lxml-1.3/CHANGES.txt (original) +++ lxml/branch/lxml-1.3/CHANGES.txt Thu Aug 16 22:41:16 2007 @@ -8,6 +8,10 @@ Features added -------------- +* The ``docinfo`` on ElementTree objects has new properties ``internalDTD`` + and ``externalDTD`` that return a DTD object for the internal or external + subset of the document respectively. + * Serialising an ElementTree now includes any internal DTD subsets that are part of the document, as well as comments and PIs that are siblings of the root node. Modified: lxml/branch/lxml-1.3/src/lxml/dtd.pxi ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/dtd.pxi (original) +++ lxml/branch/lxml-1.3/src/lxml/dtd.pxi Thu Aug 16 22:41:16 2007 @@ -96,3 +96,19 @@ if c_dtd is NULL: raise DTDParseError, "error parsing DTD" return c_dtd + +cdef extern from "etree_defs.h": + # macro call to 't->tp_new()' for fast instantiation + cdef DTD NEW_DTD "PY_NEW" (object t) + +cdef DTD _dtdFactory(tree.xmlDtd* c_dtd): + # do not run through DTD.__init__()! 
+ cdef DTD dtd + if c_dtd is NULL: + return None + dtd = NEW_DTD(DTD) + dtd._c_dtd = tree.xmlCopyDtd(c_dtd) + if dtd._c_dtd is NULL: + python.PyErr_NoMemory() + _Validator.__init__(dtd) + return dtd Modified: lxml/branch/lxml-1.3/src/lxml/etree.pyx ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/etree.pyx (original) +++ lxml/branch/lxml-1.3/src/lxml/etree.pyx Thu Aug 16 22:41:16 2007 @@ -384,37 +384,76 @@ cdef class DocInfo: "Document information provided by parser and DTD." - cdef readonly object root_name - cdef readonly object public_id - cdef readonly object system_url - cdef readonly object xml_version - cdef readonly object encoding - cdef readonly object URL + cdef _Document _doc def __init__(self, tree): "Create a DocInfo object for an ElementTree object or root Element." - cdef _Document doc - doc = _documentOrRaise(tree) - self.root_name, self.public_id, self.system_url = doc.getdoctype() - if not self.root_name and (self.public_id or self.system_url): + self._doc = _documentOrRaise(tree) + root_name, public_id, system_url = self._doc.getdoctype() + if not root_name and (public_id or system_url): raise ValueError, "Could not find root node" - self.xml_version, self.encoding = doc.getxmlinfo() - self.URL = doc.getURL() + + property root_name: + "Returns the name of the root node as defined by the DOCTYPE." + def __get__(self): + root_name, public_id, system_url = self._doc.getdoctype() + return root_name + + property public_id: + "Returns the public ID of the DOCTYPE." + def __get__(self): + root_name, public_id, system_url = self._doc.getdoctype() + return public_id + + property system_url: + "Returns the system ID of the DOCTYPE." + def __get__(self): + root_name, public_id, system_url = self._doc.getdoctype() + return system_url + + property xml_version: + "Returns the XML version as declared by the document." 
+ def __get__(self): + xml_version, encoding = self._doc.getxmlinfo() + return xml_version + + property encoding: + "Returns the encoding name as declared by the document." + def __get__(self): + xml_version, encoding = self._doc.getxmlinfo() + return encoding + + property URL: + "Returns the source URL of the document (or None if unknown)." + def __get__(self): + return self._doc.getURL() property doctype: + "Returns a DOCTYPE declaration string for the document." def __get__(self): - if self.public_id: - if self.system_url: + root_name, public_id, system_url = self._doc.getdoctype() + if public_id: + if system_url: return '<!DOCTYPE %s PUBLIC "%s" "%s">' % ( - self.root_name, self.public_id, self.system_url) + root_name, public_id, system_url) else: return '<!DOCTYPE %s PUBLIC "%s">' % ( - self.root_name, self.public_id) - elif self.system_url: + root_name, public_id) + elif system_url: return '<!DOCTYPE %s SYSTEM "%s">' % ( - self.root_name, self.system_url) + root_name, system_url) else: return "" + property internalDTD: + "Returns a DTD validator based on the internal subset of the document." + def __get__(self): + return _dtdFactory(self._doc._c_doc.intSubset) + + property externalDTD: + "Returns a DTD validator based on the external subset of the document." + def __get__(self): + return _dtdFactory(self._doc._c_doc.extSubset) + cdef public class _Element [ type LxmlElementType, object LxmlElement ]: """Element class. References a document object and a libxml node. 
Modified: lxml/branch/lxml-1.3/src/lxml/tests/test_dtd.py ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/tests/test_dtd.py (original) +++ lxml/branch/lxml-1.3/src/lxml/tests/test_dtd.py Thu Aug 16 22:41:16 2007 @@ -36,6 +36,31 @@ dtd = etree.DTD(StringIO("")) dtd.assertValid(root) + def test_dtd_internal(self): + root = etree.XML(''' + + + ]> + + ''') + dtd = etree.ElementTree(root).docinfo.internalDTD + self.assert_(dtd) + dtd.assertValid(root) + + def test_dtd_internal_invalid(self): + root = etree.XML(''' + + + + ]> + + ''') + dtd = etree.ElementTree(root).docinfo.internalDTD + self.assert_(dtd) + self.assertFalse(dtd.validate(root)) + def test_dtd_broken(self): self.assertRaises(etree.DTDParseError, etree.DTD, StringIO("")) Modified: lxml/branch/lxml-1.3/src/lxml/tree.pxd ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/tree.pxd (original) +++ lxml/branch/lxml-1.3/src/lxml/tree.pxd Thu Aug 16 22:41:16 2007 @@ -218,6 +218,7 @@ int format, char* encoding) cdef void xmlNodeSetName(xmlNode* cur, char* name) cdef void xmlNodeSetContent(xmlNode* cur, char* content) + cdef xmlDtd* xmlCopyDtd(xmlDtd* dtd) cdef xmlDoc* xmlCopyDoc(xmlDoc* doc, int recursive) cdef xmlNode* xmlCopyNode(xmlNode* node, int extended) cdef xmlNode* xmlDocCopyNode(xmlNode* node, xmlDoc* doc, int extended) From scoder at codespeak.net Sat Aug 18 11:37:57 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 11:37:57 +0200 (CEST) Subject: [Lxml-checkins] r45834 - lxml/branch/html/src/lxml/html Message-ID: <20070818093757.AD4AE81C7@code0.codespeak.net> Author: scoder Date: Sat Aug 18 11:37:55 2007 New Revision: 45834 Modified: lxml/branch/html/src/lxml/html/__init__.py Log: raise KeyError if no default is passed to a failed get_element_by_id() Modified: lxml/branch/html/src/lxml/html/__init__.py 
============================================================================== --- lxml/branch/html/src/lxml/html/__init__.py (original) +++ lxml/branch/html/src/lxml/html/__init__.py Sat Aug 18 11:37:55 2007 @@ -152,23 +152,26 @@ """ return _class_xpath(self, class_name=class_name) - def get_element_by_id(self, id, default=None): + def get_element_by_id(self, id, *default): """ - Get the first element in a document with the given id. If - none are found, return default (None). + Get the first element in a document with the given id. If none is + found, return the default argument if provided or raise KeyError + otherwise. Note that there can be more than one element with the same id, and this isn't uncommon in HTML documents found in the wild. Browsers return only the first match, and this function does the same. """ - # FIXME: should this raise an exception when something isn't found? try: # FIXME: should this check for multiple matches? # browsers just return the first one return _id_xpath(self, id=id)[0] except IndexError: - return default + if default: + return default[0] + else: + raise KeyError, id def text_content(self): """ From scoder at codespeak.net Sat Aug 18 11:46:14 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 11:46:14 +0200 (CEST) Subject: [Lxml-checkins] r45835 - lxml/trunk/src/lxml/html Message-ID: <20070818094614.1F98C81D0@code0.codespeak.net> Author: scoder Date: Sat Aug 18 11:46:12 2007 New Revision: 45835 Added: lxml/trunk/src/lxml/html/ - copied from r45834, lxml/branch/html/src/lxml/html/ Log: copied lxml.html from html branch From scoder at codespeak.net Sat Aug 18 11:48:17 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 11:48:17 +0200 (CEST) Subject: [Lxml-checkins] r45836 - lxml/trunk/src/lxml Message-ID: <20070818094817.D2F3381E7@code0.codespeak.net> Author: scoder Date: Sat Aug 18 11:48:17 2007 New Revision: 45836 Added: lxml/trunk/src/lxml/cssselect.py - copied 
unchanged from r45835, lxml/branch/html/src/lxml/cssselect.py Log: copied lxml.cssselect from html branch From scoder at codespeak.net Sat Aug 18 11:48:41 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 11:48:41 +0200 (CEST) Subject: [Lxml-checkins] r45837 - lxml/trunk/src/lxml Message-ID: <20070818094841.26BDB81E8@code0.codespeak.net> Author: scoder Date: Sat Aug 18 11:48:40 2007 New Revision: 45837 Added: lxml/trunk/src/lxml/doctestcompare.py - copied unchanged from r45836, lxml/branch/html/src/lxml/doctestcompare.py Log: copied lxml.doctestcompare from html branch From scoder at codespeak.net Sat Aug 18 11:52:39 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 11:52:39 +0200 (CEST) Subject: [Lxml-checkins] r45838 - lxml/trunk/doc Message-ID: <20070818095239.252F881E8@code0.codespeak.net> Author: scoder Date: Sat Aug 18 11:52:38 2007 New Revision: 45838 Added: lxml/trunk/doc/cssselect.txt - copied unchanged from r45837, lxml/branch/html/doc/cssselect.txt Log: copied docs from html branch From scoder at codespeak.net Sat Aug 18 11:52:53 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 11:52:53 +0200 (CEST) Subject: [Lxml-checkins] r45839 - lxml/trunk/doc Message-ID: <20070818095253.221D881E8@code0.codespeak.net> Author: scoder Date: Sat Aug 18 11:52:52 2007 New Revision: 45839 Added: lxml/trunk/doc/elementsoup.txt - copied unchanged from r45838, lxml/branch/html/doc/elementsoup.txt Log: copied docs from html branch From scoder at codespeak.net Sat Aug 18 11:54:14 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 11:54:14 +0200 (CEST) Subject: [Lxml-checkins] r45840 - lxml/trunk/doc Message-ID: <20070818095414.A6F2381A6@code0.codespeak.net> Author: scoder Date: Sat Aug 18 11:54:14 2007 New Revision: 45840 Added: lxml/trunk/doc/lxmlhtml.txt - copied unchanged from r45839, lxml/branch/html/doc/lxmlhtml.txt Log: copied docs from 
html branch From scoder at codespeak.net Sat Aug 18 12:18:30 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 12:18:30 +0200 (CEST) Subject: [Lxml-checkins] r45841 - lxml/trunk/src/lxml Message-ID: <20070818101830.365EE81F2@code0.codespeak.net> Author: scoder Date: Sat Aug 18 12:18:29 2007 New Revision: 45841 Added: lxml/trunk/src/lxml/usedoctest.py - copied unchanged from r45840, lxml/branch/html/src/lxml/usedoctest.py Log: copied lxml.usedoctest from html branch From scoder at codespeak.net Sat Aug 18 12:28:59 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 12:28:59 +0200 (CEST) Subject: [Lxml-checkins] r45842 - in lxml/trunk: . doc Message-ID: <20070818102859.2689B81EC@code0.codespeak.net> Author: scoder Date: Sat Aug 18 12:28:57 2007 New Revision: 45842 Modified: lxml/trunk/CHANGES.txt lxml/trunk/doc/mkhtml.py lxml/trunk/setup.py Log: integrated lxml.html Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Sat Aug 18 12:28:57 2007 @@ -8,14 +8,22 @@ Features added -------------- -* The ``docinfo`` on ElementTree objects has new properties ``internalDTD`` - and ``externalDTD`` that return a DTD object for the internal or external - subset of the document respectively. +* HTML tag soup parser based on BeautifulSoup in ``lxml.html.ElementSoup`` + +* New module ``lxml.doctestcompare`` by Ian Bicking for writing simplified + doctests based on XML/HTML output. Use by importing ``lxml.usedoctest`` or + ``lxml.html.usedoctest`` from within a doctest. + +* New module ``lxml.cssselect`` by Ian Bicking for selecting Elements with CSS + selectors. + +* New package ``lxml.html`` written by Ian Bicking for sophisticated HTML + handling. * Namespace class setup is now local to the ``ElementNamespaceClassLookup`` instance and no longer global. 
-* Schematron validation +* Schematron validation (incomplete in libxml2) * Extended type support for ``objectify.E`` based on registered PyTypes. Supports an additional argument to ``PyType()`` that takes a conversion @@ -71,6 +79,10 @@ 1.3.4 (???) ================== +* The ``docinfo`` on ElementTree objects has new properties ``internalDTD`` + and ``externalDTD`` that return a DTD object for the internal or external + subset of the document respectively. + * Serialising an ElementTree now includes any internal DTD subsets that are part of the document, as well as comments and PIs that are siblings of the root node. Modified: lxml/trunk/doc/mkhtml.py ============================================================================== --- lxml/trunk/doc/mkhtml.py (original) +++ lxml/trunk/doc/mkhtml.py Sat Aug 18 12:28:57 2007 @@ -6,7 +6,8 @@ 'performance.txt', 'build.txt')), ('Developing with lxml', ('tutorial.txt', 'api.txt', 'parsing.txt', 'validation.txt', 'xpathxslt.txt', - 'objectify.txt')), + 'objectify.txt', 'lxmlhtml.txt', + 'cssselect.txt', 'elementsoup.txt')), ('Extending lxml', ('resolvers.txt', 'extensions.txt', 'element_classes.txt', 'sax.txt', 'capi.txt')), ] Modified: lxml/trunk/setup.py ============================================================================== --- lxml/trunk/setup.py (original) +++ lxml/trunk/setup.py Sat Aug 18 12:28:57 2007 @@ -85,7 +85,7 @@ ], package_dir = {'': 'src'}, - packages = ['lxml'], + packages = ['lxml', 'lxml.html'], zip_safe = False, ext_modules = setupinfo.ext_modules( STATIC_INCLUDE_DIRS, STATIC_LIBRARY_DIRS, STATIC_CFLAGS), From scoder at codespeak.net Sat Aug 18 12:31:51 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 12:31:51 +0200 (CEST) Subject: [Lxml-checkins] r45843 - lxml/trunk Message-ID: <20070818103151.BF9228202@code0.codespeak.net> Author: scoder Date: Sat Aug 18 12:31:51 2007 New Revision: 45843 Modified: lxml/trunk/CHANGES.txt Log: cleanup Modified: 
lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Sat Aug 18 12:31:51 2007 @@ -17,8 +17,8 @@ * New module ``lxml.cssselect`` by Ian Bicking for selecting Elements with CSS selectors. -* New package ``lxml.html`` written by Ian Bicking for sophisticated HTML - handling. +* New package ``lxml.html`` written by Ian Bicking for advanced HTML + treatment. * Namespace class setup is now local to the ``ElementNamespaceClassLookup`` instance and no longer global. From scoder at codespeak.net Sat Aug 18 12:47:47 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 12:47:47 +0200 (CEST) Subject: [Lxml-checkins] r45845 - lxml/trunk/doc Message-ID: <20070818104747.02AE881E5@code0.codespeak.net> Author: scoder Date: Sat Aug 18 12:47:46 2007 New Revision: 45845 Modified: lxml/trunk/doc/lxmlhtml.txt Log: doc cleanup Modified: lxml/trunk/doc/lxmlhtml.txt ============================================================================== --- lxml/trunk/doc/lxmlhtml.txt (original) +++ lxml/trunk/doc/lxmlhtml.txt Sat Aug 18 12:47:46 2007 @@ -351,9 +351,11 @@ In addition to cleaning up malicious HTML, ``lxml.html.clean`` contains functions to do other things to your HTML. This includes -autolinking: +autolinking:: - ``autolink(doc, ...)`` and ``autolink_html(html, ...)`` + autolink(doc, ...) + + autolink_html(html, ...) This finds anything that looks like a link (e.g., ``http://example.com``) in the *text* of an HTML document, and @@ -378,9 +380,11 @@ wordwrap -------- -You can also wrap long words in your html: +You can also wrap long words in your html:: + + word_break(doc, max_width=40, ...) - ``word_break(doc, max_width=40, ...)`` and ``word_break_html(html, ...)`` + word_break_html(html, ...) This finds any long words in the text of the document and inserts ``&#8203;`` in the document (which is the Unicode zero-width space). 
@@ -416,7 +420,7 @@ >>> doc = HTML(content) >>> doc.make_links_absolute(url) -Then we create some objects to put the information in: +Then we create some objects to put the information in:: >>> class Card(object): ... def __init__(self, **kw): @@ -426,7 +430,7 @@ ... def __init__(self, phone, types=()): ... self.phone, self.types = phone, types -And some generally handy functions for microformats: +And some generally handy functions for microformats:: >>> def get_text(el, class_name): ... els = el.find_class(class_name) @@ -442,7 +446,7 @@ ... # Ideally this would parse street, etc. ... return el.find_class('adr') -Then the parsing: +Then the parsing:: >>> for el in doc.find_class('hcard'): ... card = Card() From scoder at codespeak.net Mon Aug 20 11:14:58 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 20 Aug 2007 11:14:58 +0200 (CEST) Subject: [Lxml-checkins] r45875 - in lxml/branch/lxml-1.3: . src/lxml src/lxml/tests Message-ID: <20070820091458.51C45819C@code0.codespeak.net> Author: scoder Date: Mon Aug 20 11:14:56 2007 New Revision: 45875 Modified: lxml/branch/lxml-1.3/CHANGES.txt lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py Log: raise Warning instead of Error for ':' tag names Modified: lxml/branch/lxml-1.3/CHANGES.txt ============================================================================== --- lxml/branch/lxml-1.3/CHANGES.txt (original) +++ lxml/branch/lxml-1.3/CHANGES.txt Mon Aug 20 11:14:56 2007 @@ -24,6 +24,15 @@ Other changes ------------- +* lxml now raises a TagNameWarning about tag names containing ':' instead of + an Error as 1.3.3 did. The reason is that a number of projects currently + misuse the previous lack of tag name validation to generate namespace + prefixes without declaring namespaces. Apart from the danger of generating + broken XML this way, it also breaks most of the namespace-aware tools in + XML, including XPath, XSLT and validation. 
lxml 1.3.x will continue to + support this bug with a Warning, while lxml 2.0 will be strict about + well-formed tag names (not only regarding ':'). + * Serialising an Element no longer includes its comment and PI siblings (only ElementTree serialisation includes them). Modified: lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi (original) +++ lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi Mon Aug 20 11:14:56 2007 @@ -691,6 +691,17 @@ else: raise TypeError, "Argument must be string or unicode." +cdef object warnings +import warnings +class TagNameWarning(SyntaxWarning): + pass + +cdef int warnAboutTagName() except -1: + warnings.warn("Tag names must not contain ':', " + "lxml 2.0 will enforce well-formed tag names " + "as required by the XML specification.", + TagNameWarning) + cdef _getNsTag(tag): """Given a tag, find namespace URI and tag name. Return None for NS uri if no namespace URI available. 
@@ -709,7 +720,7 @@ if c_ns_end is NULL: raise ValueError, "Invalid tag name" if cstd.strchr(c_ns_end, c':') is not NULL: - raise ValueError, "Invalid tag name" + warnAboutTagName() nslen = c_ns_end - c_tag taglen = python.PyString_GET_SIZE(tag) - nslen - 2 if taglen == 0: @@ -720,7 +731,7 @@ elif python.PyString_GET_SIZE(tag) == 0: raise ValueError, "Empty tag name" elif cstd.strchr(c_tag, c':') is not NULL: - raise ValueError, "Invalid tag name" + warnAboutTagName() return ns, tag cdef object _namespacedName(xmlNode* c_node): Modified: lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py (original) +++ lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py Mon Aug 20 11:14:56 2007 @@ -8,7 +8,7 @@ """ -import unittest, copy, sys +import unittest, copy, sys, warnings from common_imports import etree, StringIO, HelperTestCase, fileInTestDir from common_imports import SillyFileLike, canonicalize, doctest @@ -32,6 +32,8 @@ seq.sort() return seq +warnings.simplefilter("error", etree.TagNameWarning) + class ETreeOnlyTestCase(HelperTestCase): """Tests only for etree, not ElementTree""" etree = etree @@ -68,11 +70,14 @@ def test_element_name_colon(self): Element = self.etree.Element - self.assertRaises(ValueError, Element, 'p:name') - self.assertRaises(ValueError, Element, '{test}p:name') + self.assertRaises(self.etree.TagNameWarning, + Element, 'p:name') + self.assertRaises(self.etree.TagNameWarning, + Element, '{test}p:name') el = Element('name') - self.assertRaises(ValueError, setattr, el, 'tag', 'p:name') + self.assertRaises(self.etree.TagNameWarning, + setattr, el, 'tag', 'p:name') def test_attribute_set(self): # ElementTree accepts arbitrary attribute values From scoder at codespeak.net Mon Aug 20 12:15:34 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 20 Aug 2007 12:15:34 +0200 (CEST) Subject: [Lxml-checkins] r45876 - 
in lxml/trunk/src/lxml: . tests Message-ID: <20070820101534.C5F348173@code0.codespeak.net> Author: scoder Date: Mon Aug 20 12:15:32 2007 New Revision: 45876 Modified: lxml/trunk/src/lxml/cstd.pxd lxml/trunk/src/lxml/etree_defs.h lxml/trunk/src/lxml/objectify.pyx lxml/trunk/src/lxml/python.pxd lxml/trunk/src/lxml/tests/test_objectify.py Log: objectify updates by Holger, support passing ObjectifiedElement objects into DateElement() Modified: lxml/trunk/src/lxml/cstd.pxd ============================================================================== --- lxml/trunk/src/lxml/cstd.pxd (original) +++ lxml/trunk/src/lxml/cstd.pxd Mon Aug 20 12:15:32 2007 @@ -9,6 +9,7 @@ cdef int strlen(char* s) cdef char* strstr(char* haystack, char* needle) cdef char* strchr(char* haystack, int needle) + cdef char* strrchr(char* haystack, int needle) cdef int strcmp(char* s1, char* s2) cdef int strncmp(char* s1, char* s2, size_t len) cdef void* memcpy(void* dest, void* src, size_t len) Modified: lxml/trunk/src/lxml/etree_defs.h ============================================================================== --- lxml/trunk/src/lxml/etree_defs.h (original) +++ lxml/trunk/src/lxml/etree_defs.h Mon Aug 20 12:15:32 2007 @@ -99,6 +99,7 @@ #define repr(o) PyObject_Repr(o) #define iter(o) PyObject_GetIter(o) #define _cstr(s) PyString_AS_STRING(s) +#define _fqtypename(o) (((PyTypeObject*)o)->ob_type->tp_name) static PyObject* __PY_NEW_GLOBAL_EMPTY_TUPLE = NULL; Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Mon Aug 20 12:15:32 2007 @@ -70,6 +70,16 @@ cdef object _ElementMaker from builder import ElementMaker as _ElementMaker +cdef object _typename(object t): + cdef char* c_name + cdef char* s + c_name = python._fqtypename(t) + s = cstd.strrchr(c_name, c'.') + if s == NULL: + return c_name + else: + return (s+1) + # namespace/name for 
"pytype" hint attribute cdef object PYTYPE_NAMESPACE cdef char* _PYTYPE_NAMESPACE @@ -232,7 +242,7 @@ if tag == 'text' or tag == 'pyval': # read-only ! raise TypeError, "attribute '%s' of '%s' objects is not writable"% \ - (tag, type(self).__name__) + (tag, _typename(self)) elif tag == 'tail': cetree.setTailText(self._c_node, value) return @@ -916,6 +926,15 @@ def __lower_bool(b): return _lower_bool(b) +cdef _get_pytypename(obj): + if python.PyUnicode_Check(obj): + return "str" + else: + return _typename(obj) + +def __get_pytypename(obj): + return _get_pytypename(obj) + cdef _registerPyTypes(): pytype = PyType('int', int, IntElement) pytype.xmlSchemaTypes = ("int", "short", "byte", "unsignedShort", @@ -1020,7 +1039,6 @@ """Type map for the ElementMaker. """ cdef object _typemap - cdef object _typemap_get def __init__(self, initial=None): if initial is None: @@ -1132,7 +1150,7 @@ else: value = repr(value) result = "%s%s = %s [%s]\n" % (indentstr, element.tag, - value, type(element).__name__) + value, _typename(element)) xsi_ns = "{%s}" % XML_SCHEMA_INSTANCE_NS pytype_ns = "{%s}" % PYTYPE_NAMESPACE for name, value in cetree.iterattributes(element, 3): @@ -2019,6 +2037,13 @@ attrib = dict(attrib) attrib.update(_attributes) _attributes = attrib + if isinstance(_value, ObjectifiedElement): + if _pytype is None: + if _xsi is None and not _attributes and nsmap is _DEFAULT_NSMAP: + # special case: no change! 
+ return _value.__copy__() + elif PYTYPE_ATTRIBUTE not in _attributes: + _pytype = _get_pytypename(_value) if isinstance(_value, ObjectifiedDataElement): # reuse existing nsmap unless redefined in nsmap parameter temp = _value.nsmap @@ -2070,9 +2095,9 @@ if dict_result is not NULL: _pytype = (dict_result).name - if _value is None: + if _value is None and _pytype != "str": + _pytype = _pytype or "NoneType" strval = None - _pytype = "NoneType" elif python._isString(_value): strval = _value elif python.PyBool_Check(_value): @@ -2102,7 +2127,7 @@ type_check(strval) if _pytype is not None: - if _pytype == "NoneType": + if _pytype == "NoneType" or _pytype == "none": strval = None python.PyDict_SetItem(_attributes, XML_SCHEMA_INSTANCE_NIL_ATTR, "true") else: Modified: lxml/trunk/src/lxml/python.pxd ============================================================================== --- lxml/trunk/src/lxml/python.pxd (original) +++ lxml/trunk/src/lxml/python.pxd Mon Aug 20 12:15:32 2007 @@ -107,6 +107,7 @@ cdef int _isString(object obj) cdef int isinstance(object instance, object classes) cdef int issubclass(object derived, object superclasses) + cdef char* _fqtypename(object t) cdef int hasattr(object obj, object attr) cdef object getattr(object obj, object attr) cdef int callable(object obj) Modified: lxml/trunk/src/lxml/tests/test_objectify.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_objectify.py (original) +++ lxml/trunk/src/lxml/tests/test_objectify.py Mon Aug 20 12:15:32 2007 @@ -658,6 +658,18 @@ self.assertEquals(value.text, None) self.assertEquals(value.pyval, None) + def test_data_element_pytype_none_compat(self): + # pre-2.0 lxml called NoneElement "none" + pyval = 1 + pytype = "none" + objclass = objectify.NoneElement + value = objectify.DataElement(pyval, _pytype=pytype) + self.assert_(isinstance(value, objclass), + "DataElement(%s, _pytype='%s') returns %s, expected %s" + % (pyval, pytype, 
type(value), objclass)) + self.assertEquals(value.text, None) + self.assertEquals(value.pyval, None) + def test_schema_types(self): XML = self.XML root = XML('''\ From scoder at codespeak.net Tue Aug 21 17:20:46 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 21 Aug 2007 17:20:46 +0200 (CEST) Subject: [Lxml-checkins] r45900 - lxml/trunk/src/lxml/html Message-ID: <20070821152046.99B898156@code0.codespeak.net> Author: scoder Date: Tue Aug 21 17:20:46 2007 New Revision: 45900 Modified: lxml/trunk/src/lxml/html/__init__.py Log: some cleanup in iterlinks() Modified: lxml/trunk/src/lxml/html/__init__.py ============================================================================== --- lxml/trunk/src/lxml/html/__init__.py (original) +++ lxml/trunk/src/lxml/html/__init__.py Tue Aug 21 17:20:46 2007 @@ -242,16 +242,17 @@ """ link_attrs = defs.link_attrs for el in self.getiterator(): + attribs = el.attrib for attrib in link_attrs: - if attrib in el.attrib: - yield (el, attrib, el.attrib[attrib], 0) + if attrib in attribs: + yield (el, attrib, attribs[attrib], 0) if el.tag == 'style' and el.text: for match in _css_url_re.finditer(el.text): yield (el, None, match.group(1), match.start(1)) for match in _css_import_re.finditer(el.text): yield (el, None, match.group(1), match.start(1)) - if 'style' in el.attrib: - for match in _css_url_re.finditer(el.attrib['style']): + if 'style' in attribs: + for match in _css_url_re.finditer(attribs['style']): yield (el, 'style', match.group(1), match.start(1)) def rewrite_links(self, link_repl_func, resolve_base_href=True, From scoder at codespeak.net Wed Aug 22 22:22:56 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 22 Aug 2007 22:22:56 +0200 (CEST) Subject: [Lxml-checkins] r45918 - lxml/trunk/src/lxml Message-ID: <20070822202256.116548172@code0.codespeak.net> Author: scoder Date: Wed Aug 22 22:22:55 2007 New Revision: 45918 Modified: lxml/trunk/src/lxml/objectify.pyx Log: cleanup Modified: 
lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Wed Aug 22 22:22:55 2007 @@ -814,7 +814,7 @@ """ cdef readonly object name cdef readonly object type_check - cdef object _stringify + cdef object _add_text cdef object _type cdef object _schema_types def __init__(self, name, type_check, type_class, stringify=None): @@ -831,9 +831,9 @@ self._type = type_class self.type_check = type_check if stringify is None: - self._stringify = _StringValueSetter(__builtin__.str) + self._add_text = _StringValueSetter(__builtin__.str) else: - self._stringify = _StringValueSetter(stringify) + self._add_text = _StringValueSetter(stringify) self._schema_types = [] def __repr__(self): @@ -1081,7 +1081,7 @@ result = python.PyDict_GetItem(_PYTYPE_DICT, name) if result is NULL: return None - return (result)._stringify + return (result)._add_text return result def __contains__(self, type): @@ -1798,110 +1798,6 @@ tree.xmlSetNsProp(c_node, c_ns, "nil", "true") tree.END_FOR_EACH_ELEMENT_FROM(c_node) -def __xsiannotate(element_or_tree, ignore_old=True): - """Recursively annotates the elements of an XML tree with 'xsi:type' - attributes. - - If the 'ignore_old' keyword argument is True (the default), current - 'xsi:type' attributes will be ignored and replaced. Otherwise, they will be - checked and only replaced if they no longer fit the current text value. - - Note that tha mapping from Python types to XSI types is usually ambiguous. - Currently, only the first XSI type name in the corresponding PyType - definition will be used for annotation. Thus, you should consider naming - the widest type first here if you define additional types. 
- """ - cdef _Element element - cdef _Document doc - cdef int ignore - cdef int istree - cdef tree.xmlNode* c_node - cdef tree.xmlNs* c_ns - cdef python.PyObject* dict_result - cdef PyType pytype - element = cetree.rootNodeOrRaise(element_or_tree) - doc = element._doc - ignore = bool(ignore_old) - - StrType = _PYTYPE_DICT.get('str') - c_node = element._c_node - tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_node, c_node, 1) - if c_node.type == tree.XML_ELEMENT_NODE: - typename = None - pytype = None - value = None - istree = 0 - if not ignore: - # check that old value is valid - typename = cetree.attributeValueFromNsName( - c_node, _XML_SCHEMA_INSTANCE_NS, "type") - if typename is not None: - dict_result = python.PyDict_GetItem(_SCHEMA_TYPE_DICT, typename) - if dict_result is NULL and ':' in typename: - prefix, typename = typename.split(':', 1) - dict_result = python.PyDict_GetItem(_SCHEMA_TYPE_DICT, typename) - if dict_result is not NULL: - pytype = dict_result - if pytype is not StrType: - # StrType does not have a typecheck but is the default anyway, - # so just accept it if given as type information - pytype = _check_type(c_node, pytype) - if pytype is None: - typename = None - - if typename is None: - if pytype is None: - # check for pytype hint - value = cetree.attributeValueFromNsName( - c_node, _PYTYPE_NAMESPACE, _PYTYPE_ATTRIBUTE_NAME) - - if value is not None: - if value == TREE_PYTYPE: - istree = 1 - else: - dict_result = python.PyDict_GetItem(_PYTYPE_DICT, value) - if dict_result is not NULL: - pytype = dict_result - if pytype is not StrType: - pytype = _check_type(c_node, pytype) - - if not istree and pytype is None: - # try to guess type - if cetree.findChildForwards(c_node, 0) is NULL: - # element has no children => data class - pytype = _guessPyType(textOf(c_node), StrType) - else: - istree = 1 - - if typename is None and not istree and pytype is not None: - if python.PyList_GET_SIZE(pytype._schema_types) > 0: - # pytype->xsi:type is a 1:n mapping so simply 
take the first - typename = pytype._schema_types[0] - - if typename is None or istree: - # delete attribute if it exists - cetree.delAttributeFromNsName(c_node, _XML_SCHEMA_INSTANCE_NS, "type") - else: - # update or create attribute - c_ns = cetree.findOrBuildNodeNsPrefix( - doc, c_node, _XML_SCHEMA_NS, 'xsd') - if c_ns is not NULL: - if ':' in typename: - prefix, name = typename.split(':', 1) - if c_ns.prefix is NULL or c_ns.prefix[0] == c'\0': - typename = name - elif cstd.strcmp(_cstr(prefix), c_ns.prefix) != 0: - prefix = c_ns.prefix - typename = prefix + ':' + name - elif c_ns.prefix is not NULL or c_ns.prefix[0] != c'\0': - prefix = c_ns.prefix - typename = prefix + ':' + typename - c_ns = cetree.findOrBuildNodeNsPrefix( - doc, c_node, _XML_SCHEMA_INSTANCE_NS, 'xsi') - tree.xmlSetNsProp(c_node, c_ns, "type", _cstr(typename)) - tree.END_FOR_EACH_ELEMENT_FROM(c_node) - - def deannotate(element_or_tree, pytype=True, xsi=True): """Recursively de-annotate the elements of an XML tree by removing 'pytype' and/or 'type' attributes. @@ -2042,19 +1938,17 @@ if _xsi is None and not _attributes and nsmap is _DEFAULT_NSMAP: # special case: no change! 
return _value.__copy__() - elif PYTYPE_ATTRIBUTE not in _attributes: - _pytype = _get_pytypename(_value) if isinstance(_value, ObjectifiedDataElement): # reuse existing nsmap unless redefined in nsmap parameter temp = _value.nsmap if temp is not None and temp: - temp = dict(_value.nsmap) + temp = dict(temp) temp.update(nsmap) nsmap = temp # reuse existing attributes unless redefined in attrib/_attributes temp = _value.attrib if temp is not None and temp: - temp = dict(_value.attrib) + temp = dict(temp) temp.update(_attributes) _attributes = temp # reuse existing xsi:type or py:pytype attributes, unless provided as From scoder at codespeak.net Wed Aug 22 22:24:51 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 22 Aug 2007 22:24:51 +0200 (CEST) Subject: [Lxml-checkins] r45919 - lxml/trunk/src/lxml Message-ID: <20070822202451.2C721816C@code0.codespeak.net> Author: scoder Date: Wed Aug 22 22:24:49 2007 New Revision: 45919 Modified: lxml/trunk/src/lxml/objectify.pyx Log: new ElementMaker implementation specifically for objectify Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Wed Aug 22 22:24:49 2007 @@ -1030,6 +1030,78 @@ ################################################################################ # adapted ElementMaker supports registered PyTypes +cdef class ElementMaker: + cdef object _makeelement + cdef object _namespace + cdef object _nsmap + def __init__(self, namespace=None, nsmap=None, makeelement=None): + self._nsmap = nsmap + if namespace is None: + self._namespace = None + else: + self._namespace = "{%s}" % namespace + if makeelement is not None: + assert callable(makeelement) + self._makeelement = makeelement + else: + self._makeelement = None + + def __getattr__(self, tag): + if tag[0] != "{" and self._namespace is not None: + tag = self._namespace + tag + return 
_ObjectifyElementMakerCaller( + self._makeelement, tag, self._nsmap) + +cdef class _ObjectifyElementMakerCaller: + cdef object _tag + cdef object _nsmap + cdef object _element_factory + def __init__(self, element_factory, tag, nsmap): + self._element_factory = element_factory + self._tag = tag + self._nsmap = nsmap + + def __call__(self, *children, **attrib): + cdef _ObjectifyElementMakerCaller elementMaker + cdef python.PyObject* pytype + cdef _Element element + if self._element_factory is None: + element = cetree.makeElement( + self._tag, None, objectify_parser, + None, None, attrib, self._nsmap) + else: + element = self._element_factory(self._tag, attrib, self._nsmap) + + for child in children: + if child is None: + if len(children) == 1: + cetree.setAttributeValue( + element, XML_SCHEMA_INSTANCE_NIL_ATTR, "true") + elif python._isString(child): + _add_text(element, child) + elif isinstance(child, _Element): + cetree.appendChild(element, child) + elif isinstance(child, _ObjectifyElementMakerCaller): + elementMaker = <_ObjectifyElementMakerCaller>child + if elementMaker._element_factory is None: + child = cetree.makeElement( + elementMaker._tag, element._doc, objectify_parser, + None, None, None, None) + else: + child = elementMaker._element_factory( + (<_ObjectifyElementMakerCaller>child)._tag) + cetree.appendChild(element, child) + else: + pytype = python.PyDict_GetItem( + _PYTYPE_DICT, _typename(child)) + if pytype is not NULL: + (pytype)._add_text(element, child) + else: + child = str(child) + _add_text(element, child) + + return element + class ElementMaker(_ElementMaker): def __init__(self, typemap=None): typemap = _ObjectifyTypemap(typemap) From scoder at codespeak.net Wed Aug 22 22:25:50 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 22 Aug 2007 22:25:50 +0200 (CEST) Subject: [Lxml-checkins] r45920 - lxml/trunk/src/lxml Message-ID: <20070822202550.D556A816C@code0.codespeak.net> Author: scoder Date: Wed Aug 22 22:25:50 2007 New 
Revision: 45920 Modified: lxml/trunk/src/lxml/objectify.pyx Log: removed old ElementMaker implementation Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Wed Aug 22 22:25:50 2007 @@ -1102,82 +1102,6 @@ return element -class ElementMaker(_ElementMaker): - def __init__(self, typemap=None): - typemap = _ObjectifyTypemap(typemap) - _ElementMaker.__init__(self, typemap, objectify_parser.makeelement) - -cdef class _ObjectifyTypemap: - """Type map for the ElementMaker. - """ - cdef object _typemap - - def __init__(self, initial=None): - if initial is None: - self._typemap = {} - else: - self._typemap = dict(initial) - - self._typemap[__builtin__.str] = __add_text - self._typemap[__builtin__.str.__name__] = __add_text - - self._typemap[__builtin__.unicode] = __add_text - self._typemap[__builtin__.unicode.__name__] = __add_text - - self._typemap[__builtin__.int] = __add_stringifiable - self._typemap[__builtin__.int.__name__] = __add_stringifiable - - self._typemap[__builtin__.long] = __add_stringifiable - self._typemap[__builtin__.long.__name__] = __add_stringifiable - - self._typemap[__builtin__.float] = __add_stringifiable - self._typemap[__builtin__.float.__name__] = __add_stringifiable - - self._typemap[__builtin__.bool] = __add_bool - self._typemap[__builtin__.bool.__name__] = __add_bool - - NoneType = type(None) - self._typemap[NoneType] = __add_none - self._typemap[NoneType.__name__] = __add_none - - def copy(self): - return self - - def get(self, type): - cdef python.PyObject* result - result = python.PyDict_GetItem(self._typemap, type) - if result is NULL: - name = type.__name__ - result = python.PyDict_GetItem(self._typemap, name) - if result is NULL: - result = python.PyDict_GetItem(_PYTYPE_DICT, name) - if result is NULL: - return None - return (result)._add_text - return result - - def 
__contains__(self, type): - return type in self._typemap or type.__name__ in self._typemap - - def __getitem__(self, key): - return self._typemap[key] - - def __setitem__(self, key, value): - self._typemap[key] = value - self._typemap[key.__name__] = value - -def __add_stringifiable(_Element elem not None, number): - _add_text(elem, str(number)) - -def __add_bool(_Element elem not None, bool_val): - _add_text(elem, _lower_bool(bool_val)) - -def __add_text(_Element elem not None, text): - _add_text(elem, text) - -def __add_none(_Element elem not None, none_val): - pass - cdef _add_text(_Element elem, text): cdef tree.xmlNode* c_child c_child = cetree.findChildBackwards(elem._c_node, 0) From scoder at codespeak.net Wed Aug 22 22:44:57 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 22 Aug 2007 22:44:57 +0200 (CEST) Subject: [Lxml-checkins] r45921 - lxml/trunk/src/lxml Message-ID: <20070822204457.91AEB81FF@code0.codespeak.net> Author: scoder Date: Wed Aug 22 22:44:55 2007 New Revision: 45921 Modified: lxml/trunk/src/lxml/objectify.pyx Log: cleanup Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Wed Aug 22 22:44:55 2007 @@ -1062,7 +1062,6 @@ self._nsmap = nsmap def __call__(self, *children, **attrib): - cdef _ObjectifyElementMakerCaller elementMaker cdef python.PyObject* pytype cdef _Element element if self._element_factory is None: @@ -1074,7 +1073,7 @@ for child in children: if child is None: - if len(children) == 1: + if python.PyTuple_GET_SIZE(children) == 1: cetree.setAttributeValue( element, XML_SCHEMA_INSTANCE_NIL_ATTR, "true") elif python._isString(child): @@ -1082,14 +1081,14 @@ elif isinstance(child, _Element): cetree.appendChild(element, child) elif isinstance(child, _ObjectifyElementMakerCaller): - elementMaker = <_ObjectifyElementMakerCaller>child - if 
elementMaker._element_factory is None: + if (<_ObjectifyElementMakerCaller>child)._element_factory is None: child = cetree.makeElement( elementMaker._tag, element._doc, objectify_parser, None, None, None, None) else: - child = elementMaker._element_factory( - (<_ObjectifyElementMakerCaller>child)._tag) + child = (<_ObjectifyElementMakerCaller>child). + _element_factory(( + <_ObjectifyElementMakerCaller>child)._tag) cetree.appendChild(element, child) else: pytype = python.PyDict_GetItem( From scoder at codespeak.net Wed Aug 22 22:51:53 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 22 Aug 2007 22:51:53 +0200 (CEST) Subject: [Lxml-checkins] r45922 - lxml/trunk/src/lxml Message-ID: <20070822205153.97CC7812F@code0.codespeak.net> Author: scoder Date: Wed Aug 22 22:51:53 2007 New Revision: 45922 Modified: lxml/trunk/src/lxml/objectify.pyx Log: more cleanup, small fix for last commit Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Wed Aug 22 22:51:53 2007 @@ -1062,8 +1062,10 @@ self._nsmap = nsmap def __call__(self, *children, **attrib): + cdef _ObjectifyElementMakerCaller elementMaker cdef python.PyObject* pytype cdef _Element element + cdef _Element childElement if self._element_factory is None: element = cetree.makeElement( self._tag, None, objectify_parser, @@ -1079,17 +1081,17 @@ elif python._isString(child): _add_text(element, child) elif isinstance(child, _Element): - cetree.appendChild(element, child) + cetree.appendChild(element, <_Element>child) elif isinstance(child, _ObjectifyElementMakerCaller): - if (<_ObjectifyElementMakerCaller>child)._element_factory is None: - child = cetree.makeElement( + elementMaker = <_ObjectifyElementMakerCaller>child + if elementMaker._element_factory is None: + childElement = cetree.makeElement( elementMaker._tag, element._doc, objectify_parser, 
None, None, None, None) else: - child = (<_ObjectifyElementMakerCaller>child). - _element_factory(( - <_ObjectifyElementMakerCaller>child)._tag) - cetree.appendChild(element, child) + childElement = elementMaker._element_factory( + elementMaker._tag) + cetree.appendChild(element, childElement) else: pytype = python.PyDict_GetItem( _PYTYPE_DICT, _typename(child)) From scoder at codespeak.net Fri Aug 24 08:34:59 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 24 Aug 2007 08:34:59 +0200 (CEST) Subject: [Lxml-checkins] r45940 - lxml/trunk/src/lxml Message-ID: <20070824063459.8A0B481B6@code0.codespeak.net> Author: scoder Date: Fri Aug 24 08:34:57 2007 New Revision: 45940 Modified: lxml/trunk/src/lxml/proxy.pxi lxml/trunk/src/lxml/python.pxd Log: avoid incref/decref around decrefing Modified: lxml/trunk/src/lxml/proxy.pxi ============================================================================== --- lxml/trunk/src/lxml/proxy.pxi (original) +++ lxml/trunk/src/lxml/proxy.pxi Fri Aug 24 08:34:57 2007 @@ -38,7 +38,7 @@ c_node = proxy._c_node assert c_node._private is proxy, "Tried to unregister unknown proxy" c_node._private = NULL - python.Py_DECREF(proxy._gc_doc) + python._Py_DECREF(proxy._gc_doc) ################################################################################ # temporarily make a node the root node of its document Modified: lxml/trunk/src/lxml/python.pxd ============================================================================== --- lxml/trunk/src/lxml/python.pxd (original) +++ lxml/trunk/src/lxml/python.pxd Fri Aug 24 08:34:57 2007 @@ -10,6 +10,7 @@ cdef void Py_INCREF(object o) cdef void Py_DECREF(object o) + cdef void _Py_DECREF "Py_DECREF" (PyObject* o) cdef FILE* PyFile_AsFile(object p) cdef int PyFile_Check(object p) From scoder at codespeak.net Fri Aug 24 08:35:17 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 24 Aug 2007 08:35:17 +0200 (CEST) Subject: [Lxml-checkins] r45941 - 
lxml/trunk/src/lxml Message-ID: <20070824063517.81CA481B6@code0.codespeak.net> Author: scoder Date: Fri Aug 24 08:35:17 2007 New Revision: 45941 Modified: lxml/trunk/src/lxml/etree.pyx Log: comment Modified: lxml/trunk/src/lxml/etree.pyx ============================================================================== --- lxml/trunk/src/lxml/etree.pyx (original) +++ lxml/trunk/src/lxml/etree.pyx Fri Aug 24 08:35:17 2007 @@ -728,7 +728,7 @@ property attrib: """Element attribute dictionary. Where possible, use get(), set(), - keys() and items() to access element attributes. + keys(), values() and items() to access element attributes. """ def __get__(self): if self._attrib is None: From scoder at codespeak.net Fri Aug 24 10:11:48 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 24 Aug 2007 10:11:48 +0200 (CEST) Subject: [Lxml-checkins] r45942 - lxml/trunk/src/lxml Message-ID: <20070824081148.8A16181BF@code0.codespeak.net> Author: scoder Date: Fri Aug 24 10:11:47 2007 New Revision: 45942 Modified: lxml/trunk/src/lxml/objectify.pyx lxml/trunk/src/lxml/pyclasslookup.pyx Log: docstring cleanup Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Fri Aug 24 10:11:47 2007 @@ -806,7 +806,7 @@ string value. It may be None in which case it is not considered for type guessing. - Example: + Example:: PyType('int', int, MyIntClass).register() Note that the order in which types are registered matters. The first Modified: lxml/trunk/src/lxml/pyclasslookup.pyx ============================================================================== --- lxml/trunk/src/lxml/pyclasslookup.pyx (original) +++ lxml/trunk/src/lxml/pyclasslookup.pyx Fri Aug 24 10:11:47 2007 @@ -246,15 +246,15 @@ cdef class PythonElementClassLookup(FallbackElementClassLookup): """Element class lookup based on a subclass method. 
- To use it, inherit from this class and override the method + To use it, inherit from this class and override the lookup method to + lookup the element class for a node:: lookup(self, document, node_proxy) - to lookup the element class for a node. The first argument is the opaque - document instance that contains the Element. The second arguments is a - lightweight Element proxy implementation that is only valid during the - lookup. Do not try to keep a reference to it. Once the lookup is done, the - proxy will be invalid. + The first argument is the opaque document instance that contains the + Element. The second arguments is a lightweight Element proxy + implementation that is only valid during the lookup. Do not try to keep a + reference to it. Once the lookup is done, the proxy will be invalid. If you return None from this method, the fallback will be called. """ From scoder at codespeak.net Fri Aug 24 10:12:26 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 24 Aug 2007 10:12:26 +0200 (CEST) Subject: [Lxml-checkins] r45943 - in lxml/trunk: . 
doc src/lxml src/lxml/tests Message-ID: <20070824081226.A82EE81BF@code0.codespeak.net> Author: scoder Date: Fri Aug 24 10:12:26 2007 New Revision: 45943 Modified: lxml/trunk/CHANGES.txt lxml/trunk/doc/extensions.txt lxml/trunk/src/lxml/extensions.pxi lxml/trunk/src/lxml/tests/test_xpathevaluator.py Log: provide the context node and a propagated evaluation context dict to XPath functions Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Aug 24 10:12:26 2007 @@ -8,6 +8,11 @@ Features added -------------- +* XPath extension functions can now access the current context node + (``context.context_node``) and use a context dictionary + (``context.eval_context``) from the context provided in their first + parameter + * HTML tag soup parser based on BeautifulSoup in ``lxml.html.ElementSoup`` * New module ``lxml.doctestcompare`` by Ian Bicking for writing simplified Modified: lxml/trunk/doc/extensions.txt ============================================================================== --- lxml/trunk/doc/extensions.txt (original) +++ lxml/trunk/doc/extensions.txt Fri Aug 24 10:12:26 2007 @@ -7,10 +7,9 @@ Here is how such a function looks like. As the first argument, it always -receives a dummy object. It is currently None, but do not rely on this as it -may become meaningful in later versions of lxml. The other arguments are -provided by the respective call in the XPath expression, one in the following -examples. Any number of arguments is allowed:: +receives a context object (see below). The other arguments are provided by +the respective call in the XPath expression, one in the following examples. +Any number of arguments is allowed:: >>> def hello(dummy, a): ... return "Hello %s" % a @@ -100,6 +99,40 @@ would rather complicate things than be of any help. +The XPath context +----------------- + +Functions get a context object as first parameter. 
In lxml 1.x, this value +was None, but since lxml 2.0 it provides two properties: ``eval_context`` and +``context_node``. The context node is the Element where the current function +is called:: + + >>> def print_tag(context, nodes): + ... print context.context_node.tag, [ n.tag for n in nodes ] + + >>> ns = etree.FunctionNamespace('http://mydomain.org/printtag') + >>> ns.prefix = "pt" + >>> ns["print_tag"] = print_tag + + >>> ignore = root.xpath("//*[pt:print_tag(.//*)]") + a ['b'] + b [] + +The ``eval_context`` is a dictionary that is local to the evaluation. It +allows functions to keep state:: + + >>> def print_context(context): + ... context.eval_context[context.context_node.tag] = "done" + ... entries = context.eval_context.items() + ... entries.sort() + ... print entries + >>> ns["print_context"] = print_context + + >>> ignore = root.xpath("//*[pt:print_context()]") + [('a', 'done')] + [('a', 'done'), ('b', 'done')] + + Evaluators and XSLT ------------------- @@ -238,9 +271,12 @@ What to return from a function ------------------------------ +.. _`XPath return values`: xpathxslt.html#xpath-return-values + Extension functions can return any data type for which there is an XPath -equivalent. This includes numbers, boolean values, elements and lists of -elements. Note that integers will also be returned as floats:: +equivalent (see the documentation on `XPath return values`). This includes +numbers, boolean values, elements and lists of elements. Note that integers +will also be returned as floats:: >>> def returnsFloat(_): ... 
return 1.7 Modified: lxml/trunk/src/lxml/extensions.pxi ============================================================================== --- lxml/trunk/src/lxml/extensions.pxi (original) +++ lxml/trunk/src/lxml/extensions.pxi Fri Aug 24 10:12:26 2007 @@ -36,6 +36,7 @@ cdef object _global_namespaces cdef object _utf_refs cdef object _function_cache + cdef object _eval_context_dict # for exception handling and temporary reference keeping: cdef _TempStore _temp_refs cdef _ExceptionContext _exc @@ -45,6 +46,7 @@ self._utf_refs = {} self._global_namespaces = [] self._function_cache = {} + self._eval_context_dict = None if extensions is not None: # convert extensions to UTF-8 @@ -123,6 +125,7 @@ #xpath.xmlXPathRegisteredNsCleanup(self._xpathCtxt) #self.unregisterGlobalNamespaces() python.PyDict_Clear(self._utf_refs) + self._eval_context_dict = None self._doc = None cdef _release_context(self): @@ -268,6 +271,31 @@ return dict_result return None + # Python access to the XPath context for extension functions + + property context_node: + def __get__(self): + cdef xmlNode* c_node + if self._xpathCtxt is NULL: + raise XPathError, \ + "XPath context is only usable during the evaluation" + c_node = self._xpathCtxt.node + if c_node is NULL: + raise XPathError, "no context node" + if c_node.doc != self._xpathCtxt.doc: + raise XPathError, \ + "document-external context nodes are not supported" + if self._doc is None: + raise XPathError, \ + "document context is missing" + return _elementFactory(self._doc, c_node) + + property eval_context: + def __get__(self): + if self._eval_context_dict is None: + self._eval_context_dict = {} + return self._eval_context_dict + # Python reference keeping during XPath function evaluation cdef _release_temp_refs(self): @@ -538,7 +566,7 @@ python.PyList_Append(args, o) python.PyList_Reverse(args) - res = function(None, *args) + res = function(context, *args) # wrap result for XPath consumption obj = _wrapXPathObject(res) # prevent Python from 
deallocating elements handed to libxml2 Modified: lxml/trunk/src/lxml/tests/test_xpathevaluator.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xpathevaluator.py (original) +++ lxml/trunk/src/lxml/tests/test_xpathevaluator.py Fri Aug 24 10:12:26 2007 @@ -265,6 +265,72 @@ self.assertEquals('Dag', r[1].text) self.assertEquals('Honk', r[2].text) + def test_xpath_context_node(self): + tree = self.parse('') + + check_call = [] + def check_context(ctxt, nodes): + self.assertEquals(len(nodes), 1) + check_call.append(nodes[0].tag) + self.assertEquals(ctxt.context_node, nodes[0]) + return True + + find = etree.XPath("//*[p:foo(.)]", + namespaces={'p' : 'ns'}, + extensions=[{('ns', 'foo') : check_context}]) + find(tree) + + check_call.sort() + self.assertEquals(check_call, ["a", "b", "c", "root"]) + + def test_xpath_eval_context_propagation(self): + tree = self.parse('') + + check_call = {} + def check_context(ctxt, nodes): + self.assertEquals(len(nodes), 1) + tag = nodes[0].tag + # empty during the "b" call, a "b" during the "c" call + check_call[tag] = ctxt.eval_context.get("b") + ctxt.eval_context[tag] = tag + return True + + find = etree.XPath("//b[p:foo(.)]/c[p:foo(.)]", + namespaces={'p' : 'ns'}, + extensions=[{('ns', 'foo') : check_context}]) + result = find(tree) + + self.assertEquals(result, [tree.getroot()[1][0]]) + self.assertEquals(check_call, {'b':None, 'c':'b'}) + + def test_xpath_eval_context_clear(self): + tree = self.parse('') + + check_call = {} + def check_context(ctxt): + check_call["done"] = True + # context must be empty for each new evaluation + self.assertEquals(len(ctxt.eval_context), 0) + ctxt.eval_context["test"] = True + return True + + find = etree.XPath("//b[p:foo()]", + namespaces={'p' : 'ns'}, + extensions=[{('ns', 'foo') : check_context}]) + result = find(tree) + + self.assertEquals(result, [tree.getroot()[1]]) + self.assertEquals(check_call["done"], True) + + 
check_call.clear() + find = etree.XPath("//b[p:foo()]", + namespaces={'p' : 'ns'}, + extensions=[{('ns', 'foo') : check_context}]) + result = find(tree) + + self.assertEquals(result, [tree.getroot()[1]]) + self.assertEquals(check_call["done"], True) + def test_xpath_variables(self): x = self.parse('') e = etree.XPathEvaluator(x) From scoder at codespeak.net Fri Aug 24 10:33:06 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 24 Aug 2007 10:33:06 +0200 (CEST) Subject: [Lxml-checkins] r45944 - lxml/trunk/src/lxml Message-ID: <20070824083306.9EBDF81D5@code0.codespeak.net> Author: scoder Date: Fri Aug 24 10:33:05 2007 New Revision: 45944 Modified: lxml/trunk/src/lxml/etree.pyx lxml/trunk/src/lxml/proxy.pxi Log: another deallocation bug: order matters ... Modified: lxml/trunk/src/lxml/etree.pyx ============================================================================== --- lxml/trunk/src/lxml/etree.pyx (original) +++ lxml/trunk/src/lxml/etree.pyx Fri Aug 24 10:33:05 2007 @@ -489,8 +489,9 @@ #print "trying to free node:", self._c_node #displayNode(self._c_node, 0) if self._c_node is not NULL: - unregisterProxy(self) + _unregisterProxy(self) attemptDeallocation(self._c_node) + _releaseProxy(self) # MANIPULATORS @@ -1157,7 +1158,7 @@ result = element_class() result._doc = doc result._c_node = c_node - registerProxy(result) + _registerProxy(result) if config.ENABLE_THREADING: python.PyThread_release_lock(ELEMENT_CREATION_LOCK) Modified: lxml/trunk/src/lxml/proxy.pxi ============================================================================== --- lxml/trunk/src/lxml/proxy.pxi (original) +++ lxml/trunk/src/lxml/proxy.pxi Fri Aug 24 10:33:05 2007 @@ -16,7 +16,7 @@ cdef int hasProxy(xmlNode* c_node): return c_node._private is not NULL -cdef registerProxy(_Element proxy): +cdef _registerProxy(_Element proxy): """Register a proxy and type for the node it's proxying for. 
""" cdef xmlNode* c_node @@ -31,14 +31,19 @@ proxy._gc_doc = proxy._doc python.Py_INCREF(proxy._doc) -cdef unregisterProxy(_Element proxy): +cdef _unregisterProxy(_Element proxy): """Unregister a proxy for the node it's proxying for. """ cdef xmlNode* c_node c_node = proxy._c_node assert c_node._private is proxy, "Tried to unregister unknown proxy" c_node._private = NULL - python._Py_DECREF(proxy._gc_doc) + +cdef _releaseProxy(_Element proxy): + """An additional DECREF for the document. + """ + if proxy._gc_doc is not NULL: + python._Py_DECREF(proxy._gc_doc) ################################################################################ # temporarily make a node the root node of its document From scoder at codespeak.net Fri Aug 24 23:02:31 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 24 Aug 2007 23:02:31 +0200 (CEST) Subject: [Lxml-checkins] r45969 - lxml/trunk/src/lxml Message-ID: <20070824210231.613D7818A@code0.codespeak.net> Author: scoder Date: Fri Aug 24 23:02:30 2007 New Revision: 45969 Modified: lxml/trunk/src/lxml/proxy.pxi lxml/trunk/src/lxml/python.pxd Log: some cleanup Modified: lxml/trunk/src/lxml/proxy.pxi ============================================================================== --- lxml/trunk/src/lxml/proxy.pxi (original) +++ lxml/trunk/src/lxml/proxy.pxi Fri Aug 24 23:02:30 2007 @@ -16,14 +16,14 @@ cdef int hasProxy(xmlNode* c_node): return c_node._private is not NULL -cdef _registerProxy(_Element proxy): +cdef int _registerProxy(_Element proxy) except -1: """Register a proxy and type for the node it's proxying for. """ cdef xmlNode* c_node # cannot register for NULL c_node = proxy._c_node if c_node is NULL: - return + return 0 #print "registering for:", proxy._c_node assert c_node._private is NULL, "double registering proxy!" 
c_node._private = proxy @@ -31,7 +31,7 @@ proxy._gc_doc = proxy._doc python.Py_INCREF(proxy._doc) -cdef _unregisterProxy(_Element proxy): +cdef int _unregisterProxy(_Element proxy) except -1: """Unregister a proxy for the node it's proxying for. """ cdef xmlNode* c_node @@ -42,8 +42,8 @@ cdef _releaseProxy(_Element proxy): """An additional DECREF for the document. """ - if proxy._gc_doc is not NULL: - python._Py_DECREF(proxy._gc_doc) + python.Py_XDECREF(proxy._gc_doc) + proxy._gc_doc = NULL ################################################################################ # temporarily make a node the root node of its document Modified: lxml/trunk/src/lxml/python.pxd ============================================================================== --- lxml/trunk/src/lxml/python.pxd (original) +++ lxml/trunk/src/lxml/python.pxd Fri Aug 24 23:02:30 2007 @@ -10,7 +10,7 @@ cdef void Py_INCREF(object o) cdef void Py_DECREF(object o) - cdef void _Py_DECREF "Py_DECREF" (PyObject* o) + cdef void Py_XDECREF(PyObject* o) cdef FILE* PyFile_AsFile(object p) cdef int PyFile_Check(object p) From scoder at codespeak.net Fri Aug 24 23:13:44 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 24 Aug 2007 23:13:44 +0200 (CEST) Subject: [Lxml-checkins] r45970 - lxml/trunk/src/lxml Message-ID: <20070824211344.A76E0818A@code0.codespeak.net> Author: scoder Date: Fri Aug 24 23:13:44 2007 New Revision: 45970 Modified: lxml/trunk/src/lxml/etree.pyx Log: a little more threading robustness in Element factory Modified: lxml/trunk/src/lxml/etree.pyx ============================================================================== --- lxml/trunk/src/lxml/etree.pyx (original) +++ lxml/trunk/src/lxml/etree.pyx Fri Aug 24 23:13:44 2007 @@ -1156,6 +1156,10 @@ result = NEW_ELEMENT(_Element) else: result = element_class() + if hasProxy(c_node): + # prevent re-entry race condition - we just called into Python + result._c_node = NULL + return getProxy(c_node) result._doc = doc 
result._c_node = c_node _registerProxy(result) From scoder at codespeak.net Mon Aug 27 20:05:35 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 27 Aug 2007 20:05:35 +0200 (CEST) Subject: [Lxml-checkins] r46056 - in lxml/trunk: doc src/lxml Message-ID: <20070827180535.EC23D8135@code0.codespeak.net> Author: scoder Date: Mon Aug 27 20:05:35 2007 New Revision: 46056 Modified: lxml/trunk/doc/objectify.txt lxml/trunk/src/lxml/objectify.pyx Log: always py-annotate when setting objectify values from Python types (not sure about bool strings yet) Modified: lxml/trunk/doc/objectify.txt ============================================================================== --- lxml/trunk/doc/objectify.txt (original) +++ lxml/trunk/doc/objectify.txt Mon Aug 27 20:05:35 2007 @@ -643,25 +643,33 @@ >>> print objectify.dump(root) root = None [ObjectifiedElement] a = 'nice string!' [StringElement] + * py:pytype = 'str' >>> root.a = True >>> print objectify.dump(root) root = None [ObjectifiedElement] a = True [BoolElement] + * py:pytype = 'bool' >>> root.a = [1, 2, 3] >>> print objectify.dump(root) root = None [ObjectifiedElement] a = 1 [IntElement] + * py:pytype = 'int' a = 2 [IntElement] + * py:pytype = 'int' a = 3 [IntElement] + * py:pytype = 'int' >>> root.a = (1, 2, 3) >>> print objectify.dump(root) root = None [ObjectifiedElement] a = 1 [IntElement] + * py:pytype = 'int' a = 2 [IntElement] + * py:pytype = 'int' a = 3 [IntElement] + * py:pytype = 'int' Recursive string representation of elements @@ -887,34 +895,6 @@ * py:pytype = 'str' * myattr = 'someval' - >>> root.x = objectify.DataElement(5, _xsi="integer") - >>> print objectify.dump(root) - root = None [ObjectifiedElement] - x = 5L [LongElement] - * py:pytype = 'long' - * xsi:type = 'xsd:integer' - -There is a side effect of the type lookup. 
If you assign a string value using -attribute assignment and that string value turns out to be valid for any of -the type checks, you will end up with the resolved type instead of a -StringElement:: - - >>> root = objectify.Element("root") - >>> root.s = "5" - >>> print objectify.dump(root) - root = None [ObjectifiedElement] - s = 5 [IntElement] - -You can use the ``DataElement()`` factory to avoid this behaviour and thus -provide the type of a data element by hand:: - - >>> root = objectify.Element("root") - >>> root.s = objectify.DataElement(5, _pytype="str") - >>> print objectify.dump(root) - root = None [ObjectifiedElement] - s = '5' [StringElement] - * py:pytype = 'str' - Likewise, the data type can be provided as an XML Schema type using the _xsi argument of ``DataElement()``:: Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Mon Aug 27 20:05:35 2007 @@ -507,11 +507,15 @@ else: cetree.delAttributeFromNsName( element._c_node, _XML_SCHEMA_INSTANCE_NS, "nil") - if not python._isString(value): + if python._isString(value): + pytype_name = "str" + else: + pytype_name = _typename(value) if isinstance(value, bool): value = _lower_bool(value) else: value = str(value) + cetree.setAttributeValue(element, PYTYPE_ATTRIBUTE, pytype_name) cetree.setNodeText(element._c_node, value) ################################################################################ From scoder at codespeak.net Mon Aug 27 20:14:04 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 27 Aug 2007 20:14:04 +0200 (CEST) Subject: [Lxml-checkins] r46057 - lxml/trunk/doc Message-ID: <20070827181404.E25E78139@code0.codespeak.net> Author: scoder Date: Mon Aug 27 20:14:04 2007 New Revision: 46057 Modified: lxml/trunk/doc/objectify.txt Log: doc cleanup Modified: lxml/trunk/doc/objectify.txt 
============================================================================== --- lxml/trunk/doc/objectify.txt (original) +++ lxml/trunk/doc/objectify.txt Mon Aug 27 20:14:04 2007 @@ -895,16 +895,12 @@ * py:pytype = 'str' * myattr = 'someval' -Likewise, the data type can be provided as an XML Schema type using the _xsi -argument of ``DataElement()``:: - - >>> root = objectify.Element("root") - >>> root.s = objectify.DataElement(5, _xsi="string") + >>> root.x = objectify.DataElement(5, _xsi="integer") >>> print objectify.dump(root) root = None [ObjectifiedElement] - s = '5' [StringElement] - * py:pytype = 'str' - * xsi:type = 'xsd:string' + x = 5L [LongElement] + * py:pytype = 'long' + * xsi:type = 'xsd:integer' XML Schema types reside in the XML schema namespace thus ``DataElement()`` tries to correctly prefix the xsi:type attribute value for you:: From scoder at codespeak.net Tue Aug 28 08:33:15 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 28 Aug 2007 08:33:15 +0200 (CEST) Subject: [Lxml-checkins] r46066 - lxml/trunk/src/lxml/tests Message-ID: <20070828063315.AFE4B80A0@code0.codespeak.net> Author: scoder Date: Tue Aug 28 08:33:14 2007 New Revision: 46066 Modified: lxml/trunk/src/lxml/tests/test_objectify.py Log: changed objectify bool test case to reflect type persistance change Modified: lxml/trunk/src/lxml/tests/test_objectify.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_objectify.py (original) +++ lxml/trunk/src/lxml/tests/test_objectify.py Tue Aug 28 08:33:14 2007 @@ -539,13 +539,13 @@ Element = self.Element SubElement = self.etree.SubElement root = Element("{objectified}root") - root.bool = 'true' - self.assert_(isinstance(root.bool, objectify.BoolElement)) + root.bool = True self.assertEquals(root.bool, True) - - root.bool = 'false' self.assert_(isinstance(root.bool, objectify.BoolElement)) + + root.bool = False self.assertEquals(root.bool, False) + 
self.assert_(isinstance(root.bool, objectify.BoolElement)) def test_data_element_bool(self): value = objectify.DataElement(True) From scoder at codespeak.net Tue Aug 28 09:05:33 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 28 Aug 2007 09:05:33 +0200 (CEST) Subject: [Lxml-checkins] r46067 - lxml/trunk/src/lxml Message-ID: <20070828070533.4C4958146@code0.codespeak.net> Author: scoder Date: Tue Aug 28 09:05:32 2007 New Revision: 46067 Modified: lxml/trunk/src/lxml/objectify.pyx Log: only store pytype attributes for registered types Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Tue Aug 28 09:05:32 2007 @@ -499,6 +499,7 @@ _setElementValue(new_element, value) cdef _setElementValue(_Element element, value): + cdef python.PyObject* dict_result if value is None: cetree.setAttributeValue( element, XML_SCHEMA_INSTANCE_NIL_ATTR, "true") @@ -515,7 +516,9 @@ value = _lower_bool(value) else: value = str(value) - cetree.setAttributeValue(element, PYTYPE_ATTRIBUTE, pytype_name) + dict_result = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) + if dict_result is not NULL: + cetree.setAttributeValue(element, PYTYPE_ATTRIBUTE, pytype_name) cetree.setNodeText(element._c_node, value) ################################################################################ From scoder at codespeak.net Tue Aug 28 09:07:57 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 28 Aug 2007 09:07:57 +0200 (CEST) Subject: [Lxml-checkins] r46068 - lxml/trunk/src/lxml Message-ID: <20070828070757.D994E8140@code0.codespeak.net> Author: scoder Date: Tue Aug 28 09:07:57 2007 New Revision: 46068 Modified: lxml/trunk/src/lxml/objectify.pyx Log: discard pytype attributes of unknown types when setting a new value Modified: lxml/trunk/src/lxml/objectify.pyx 
============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Tue Aug 28 09:07:57 2007 @@ -519,6 +519,8 @@ dict_result = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) if dict_result is not NULL: cetree.setAttributeValue(element, PYTYPE_ATTRIBUTE, pytype_name) + else: + cetree.delAttribute(element, PYTYPE_ATTRIBUTE) cetree.setNodeText(element._c_node, value) ################################################################################ From scoder at codespeak.net Wed Aug 29 14:27:36 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 14:27:36 +0200 (CEST) Subject: [Lxml-checkins] r46155 - in lxml/trunk: . doc Message-ID: <20070829122736.C469D815A@code0.codespeak.net> Author: scoder Date: Wed Aug 29 14:27:34 2007 New Revision: 46155 Modified: lxml/trunk/CHANGES.txt lxml/trunk/doc/cssselect.txt lxml/trunk/doc/lxmlhtml.txt Log: doc updates Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Aug 29 14:27:34 2007 @@ -84,6 +84,9 @@ 1.3.4 (???) ================== +Features added +-------------- + * The ``docinfo`` on ElementTree objects has new properties ``internalDTD`` and ``externalDTD`` that return a DTD object for the internal or external subset of the document respectively. @@ -92,9 +95,6 @@ part of the document, as well as comments and PIs that are siblings of the root node. 
-Features added --------------- - Bugs fixed ---------- Modified: lxml/trunk/doc/cssselect.txt ============================================================================== --- lxml/trunk/doc/cssselect.txt (original) +++ lxml/trunk/doc/cssselect.txt Wed Aug 29 14:27:34 2007 @@ -5,11 +5,12 @@ lxml supports a number of interesting languages for tree traversal and element selection. The most important is obviously XPath_, but there is also ObjectPath_ in the `lxml.objectify`_ module. The newest child of this family -is CSS selection, which is implemented in the new ``lxml.cssselect`` module. +is `CSS selection`_, which is implemented in the new ``lxml.cssselect`` module. .. _XPath: xpathxslt.html#xpath .. _ObjectPath: objectify.html#objectpath .. _`lxml.objectify`: objectify.html +.. _`CSS selection`: http://www.w3.org/TR/CSS21/selector.html .. contents:: .. Modified: lxml/trunk/doc/lxmlhtml.txt ============================================================================== --- lxml/trunk/doc/lxmlhtml.txt (original) +++ lxml/trunk/doc/lxmlhtml.txt Wed Aug 29 14:27:34 2007 @@ -88,7 +88,7 @@ readable diff in the output when a test fails. 
The HTML comparison is most easily used by importing the ``usedoctest`` module in a doctest:: - >>> from lxml.html import usedoctest + >>> import lxml.html.usedoctest Now, if you have a HTML document and want to compare it to an expected result document in a doctest, you can do the following:: From scoder at codespeak.net Wed Aug 29 14:28:29 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 14:28:29 +0200 (CEST) Subject: [Lxml-checkins] r46156 - lxml/trunk/doc Message-ID: <20070829122829.DAD97815A@code0.codespeak.net> Author: scoder Date: Wed Aug 29 14:28:29 2007 New Revision: 46156 Added: lxml/trunk/doc/lxml2.txt Modified: lxml/trunk/doc/mkhtml.py Log: new doc file: what's new in lxml 2.0 Added: lxml/trunk/doc/lxml2.txt ============================================================================== --- (empty file) +++ lxml/trunk/doc/lxml2.txt Wed Aug 29 14:28:29 2007 @@ -0,0 +1,150 @@ +======================= +What's new in lxml 2.0? +======================= + +.. contents:: +.. + 1 Changes in etree and objectify + 1.1 Incompatible changes + 1.2 Enhancements + 1.3 Other changes + 2 New modules + 2.1 lxml.html + 2.2 lxml.cssselect + 2.3 lxml.doctestcompare + + +During the development of the lxml 1.x series, a couple of quirks were +discovered in the design that made the API less obvious and its future +extensions harder than necessary. lxml 2.0 is a soft evolution of lxml 1.x +towards a simpler, more consistent and more powerful API - with some major +extensions. Wherever possible, lxml 1.3 comes close to the semantics of lxml +2.0, so that migrating should be easier for code that currently runs with 1.3. + + +Changes in etree and objectify +============================== + +A graduation towards a more consistent API cannot go without a certain amount +of incompatible changes. The following is a list of those differences that +applications need to take into account when migrating from lxml 1.x to lxml +2.0. 
+ +Incompatible changes +-------------------- + +* lxml 0.9 introduced a feature called `namespace implementation`_. The + global ``Namespace`` factory was added to register custom element classes + and have lxml.etree look them up automatically. However, the later + development of further class lookup mechanisms made it appear less and less + adequate to register this mapping at a global level, so lxml 1.1 first + removed the namespace based lookup from the default setup and lxml 2.0 + finally removes the global namespace registry completely. As all other + lookup mechanisms, the namespace lookup is now local to a parser, including + the registry itself. Applications that use a module-level parser can easily + map its ``get_namespace()`` method to a global ``Namespace`` function to + mimic the old behaviour. + + .. _`namespace implementation`: element_classes.html#implementing-namespaces + +* XPath now raises exceptions specific to the part of the execution that + failed: ``XPathSyntaxError`` for parser errors and ``XPathEvalError`` for + errors that occurred during the evaluation. Note that the distinction only + works for the ``XPath()`` class. The other two evaluators only have a + single evaluation call that includes the parsing step, and will therefore + only raise an ``XPathEvalError``. Applications can catch both exceptions + through the common base class ``XPathError`` (which also exists in earlier + lxml versions). + +* Network access in parsers is now disabled by default, i.e. the + ``no_network`` option defaults to True. Due to a somewhat 'interesting' + implementation in libxml2, this does not affect the first document (i.e. the + URL that is parsed), but only subsequent documents, such as a DTD when + parsing with validation. This means that you will have to check the URL you + pass, instead of relying on lxml to prevent *any* access to external + resources. As this can be helpful in some use cases, lxml does not work + around it. 
+ +* The type annotations in lxml.objectify (the ``pytype`` attribute) now use + ``NoneType`` for the None value as this is the correct Python type name. + Previously, lxml 1.x used a lower case ``none``. + +* Another change in objectify regards the way it deals with ambiguous types. + Previously, setting a value like the string ``"3"`` through normal attribute + access would let it come back as an integer when reading the object + attribute. lxml 2.0 prevents this by always setting the ``pytype`` + attribute to the type the user passed in, so ``"3"`` will come back as a + string, while the number ``3`` will come back as a number. To remove the + type annotation on serialisation, you can use the ``deannotate()`` function. + +* The C-API function ``findOrBuildNodeNs()`` was replaced by the more generic + ``findOrBuildNodeNsPrefix()`` + + +Enhancements +------------ + +Most of the enhancements of lxml 2.0 were made under the hood. Most people +won't even notice them, but they make the maintenance of lxml easier and thus +facilitate further enhancements and an improved integration between lxml's +features. + +* lxml.objectify now has its own implementation of the ``E factory``. It uses + the built-in type lookup mechanism of lxml.objectify, thus removing the need + for an additional type registry mechanism (as previously available through + the ``typemap`` parameter). + +* XML entities are supported through the ``Entity()`` factory, an Entity + element class and a parser option ``resolve_entities`` that allows to keep + entities in the element tree when set to False. Also, the parser will now + report undefined entities as errors if it needs to resolve them (which is + still the default, as in lxml 1.x). + +* A major part of the XPath code was rewritten and can now benefit from a + bigger overlap with the XSLT code. The main benefits are improved thread + safety in the XPath evaluators and Python RegExp support in standard XPath.
+ + +New modules +=========== + +The most visible changes in lxml 2.0 regard the new modules that were added. + + +lxml.usedoctest +--------------- + +A very useful module for doctests based on XML or HTML is +``lxml.doctestcompare``. It provides a relaxed comparison mechanism for XML +and HTML in doctests. Using it is as simple as:: + + >>> import lxml.usedoctest + +for XML comparisons and:: + + >>> import lxml.html.usedoctest + +for HTML comparisons. + + +lxml.html +--------- + +The largest new package that was added to lxml 2.0 is `lxml.html`_. It +contains various tools and modules for HTML handling. The major features +include support for cleaning up HTML (removing unwanted content), a readable +HTML diff and various tools for working with links. + +.. _`lxml.html`: lxmlhtml.html + + +lxml.cssselect +-------------- + +The Cascading Stylesheet Language (CSS_) has a very short and generic path +language for pointing at elements in XML/HTML trees (`CSS selectors`_). The module +lxml.cssselect_ provides an implementation based on XPath. + +.. _lxml.cssselect: cssselect.html +.. _CSS: http://www.w3.org/Style/CSS/ +.. 
_`CSS selectors`: http://www.w3.org/TR/CSS21/selector.html Modified: lxml/trunk/doc/mkhtml.py ============================================================================== --- lxml/trunk/doc/mkhtml.py (original) +++ lxml/trunk/doc/mkhtml.py Wed Aug 29 14:28:29 2007 @@ -2,8 +2,8 @@ import os, shutil, re, sys, copy, time SITE_STRUCTURE = [ - ('lxml', ('main.txt', 'intro.txt', 'FAQ.txt', 'compatibility.txt', - 'performance.txt', 'build.txt')), + ('lxml', ('main.txt', 'intro.txt', 'lxml2.txt', 'FAQ.txt', + 'compatibility.txt', 'performance.txt', 'build.txt')), ('Developing with lxml', ('tutorial.txt', 'api.txt', 'parsing.txt', 'validation.txt', 'xpathxslt.txt', 'objectify.txt', 'lxmlhtml.txt', From scoder at codespeak.net Wed Aug 29 14:28:59 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 14:28:59 +0200 (CEST) Subject: [Lxml-checkins] r46157 - lxml/trunk/src/lxml Message-ID: <20070829122859.B15BE815A@code0.codespeak.net> Author: scoder Date: Wed Aug 29 14:28:58 2007 New Revision: 46157 Modified: lxml/trunk/src/lxml/objectify.pyx Log: cleanup Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Wed Aug 29 14:28:58 2007 @@ -67,9 +67,6 @@ cdef object islice from itertools import islice -cdef object _ElementMaker -from builder import ElementMaker as _ElementMaker - cdef object _typename(object t): cdef char* c_name cdef char* s From scoder at codespeak.net Wed Aug 29 14:29:42 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 14:29:42 +0200 (CEST) Subject: [Lxml-checkins] r46158 - in lxml/trunk: . 
src/lxml Message-ID: <20070829122942.2F0CE8162@code0.codespeak.net> Author: scoder Date: Wed Aug 29 14:29:41 2007 New Revision: 46158 Modified: lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/builder.py Log: let ElementMaker accept 'namespace' and 'nsmap' keywords as in objectify Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Aug 29 14:29:41 2007 @@ -87,6 +87,10 @@ Features added -------------- +* The ``ElementMaker`` in ``lxml.builder`` now accepts the keyword arguments + ``namespace`` and ``nsmap`` to set a namespace and nsmap for the Elements it + creates. + * The ``docinfo`` on ElementTree objects has new properties ``internalDTD`` and ``externalDTD`` that return a DTD object for the internal or external subset of the document respectively. Modified: lxml/trunk/src/lxml/builder.py ============================================================================== --- lxml/trunk/src/lxml/builder.py (original) +++ lxml/trunk/src/lxml/builder.py Wed Aug 29 14:29:41 2007 @@ -121,7 +121,18 @@ """ - def __init__(self, typemap=None, makeelement=None): + def __init__(self, typemap=None, + namespace=None, nsmap=None, makeelement=None): + if namespace is not None: + self._namespace = '{' + namespace + '}' + else: + self._namespace = None + + if nsmap: + self._nsmap = dict(nsmap) + else: + self._nsmap = None + if makeelement is not None: assert callable(makeelement) self._makeelement = makeelement @@ -160,7 +171,9 @@ def __call__(self, tag, *children, **attrib): get = self._typemap.get - elem = self._makeelement(tag) + if self._namespace is not None and tag[0] != '{': + tag = self._namespace + tag + elem = self._makeelement(tag, nsmap=self._nsmap) if attrib: get(dict)(elem, attrib) From scoder at codespeak.net Wed Aug 29 14:30:15 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 14:30:15 +0200 (CEST) Subject: 
[Lxml-checkins] r46159 - in lxml/branch/lxml-1.3: . src/lxml Message-ID: <20070829123015.BBF578084@code0.codespeak.net> Author: scoder Date: Wed Aug 29 14:30:14 2007 New Revision: 46159 Modified: lxml/branch/lxml-1.3/CHANGES.txt lxml/branch/lxml-1.3/src/lxml/builder.py Log: let ElementMaker accept 'namespace' and 'nsmap' keywords as in objectify Modified: lxml/branch/lxml-1.3/CHANGES.txt ============================================================================== --- lxml/branch/lxml-1.3/CHANGES.txt (original) +++ lxml/branch/lxml-1.3/CHANGES.txt Wed Aug 29 14:30:14 2007 @@ -8,6 +8,10 @@ Features added -------------- +* The ``ElementMaker`` in ``lxml.builder`` now accepts the keyword arguments + ``namespace`` and ``nsmap`` to set a namespace and nsmap for the Elements it + creates. + * The ``docinfo`` on ElementTree objects has new properties ``internalDTD`` and ``externalDTD`` that return a DTD object for the internal or external subset of the document respectively. Modified: lxml/branch/lxml-1.3/src/lxml/builder.py ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/builder.py (original) +++ lxml/branch/lxml-1.3/src/lxml/builder.py Wed Aug 29 14:30:14 2007 @@ -121,7 +121,18 @@ """ - def __init__(self, typemap=None, makeelement=None): + def __init__(self, typemap=None, + namespace=None, nsmap=None, makeelement=None): + if namespace is not None: + self._namespace = '{' + namespace + '}' + else: + self._namespace = None + + if nsmap: + self._nsmap = dict(nsmap) + else: + self._nsmap = None + if makeelement is not None: assert callable(makeelement) self._makeelement = makeelement @@ -160,7 +171,9 @@ def __call__(self, tag, *children, **attrib): get = self._typemap.get - elem = self._makeelement(tag) + if self._namespace is not None and tag[0] != '{': + tag = self._namespace + tag + elem = self._makeelement(tag, nsmap=self._nsmap) if attrib: get(dict)(elem, attrib) From scoder at codespeak.net Wed Aug 29 
18:17:01 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 18:17:01 +0200 (CEST) Subject: [Lxml-checkins] r46165 - lxml/trunk/doc Message-ID: <20070829161701.206A48121@code0.codespeak.net> Author: scoder Date: Wed Aug 29 18:16:59 2007 New Revision: 46165 Modified: lxml/trunk/doc/tutorial.txt Log: show how to use namespaces in ElementMaker Modified: lxml/trunk/doc/tutorial.txt ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/trunk/doc/tutorial.txt Wed Aug 29 18:16:59 2007 @@ -497,6 +497,11 @@ The Element creation based on attribute access makes it easy to build up a simple vocabulary for an XML language:: + >>> from lxml.builder import ElementMaker + + >>> E = ElementMaker(namespace="http://my.de/fault/namespace", + ... nsmap={'p' : "http://my.de/fault/namespace"}) + >>> DOC = E.doc >>> TITLE = E.title >>> SECTION = E.section @@ -516,18 +521,18 @@ ... ) >>> print etree.tostring(my_doc, pretty_print=True) - - The dog and the hog -
- The dog - Once upon a time, ... - And then ... -
-
- The hog - Sooner or later ... -
-
+ + The dog and the hog + + The dog + Once upon a time, ... + And then ... + + + The hog + Sooner or later ... + + One such example is the module ``lxml.html.builder``, which provides a vocabulary for HTML. From scoder at codespeak.net Wed Aug 29 19:59:26 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 19:59:26 +0200 (CEST) Subject: [Lxml-checkins] r46168 - lxml/trunk Message-ID: <20070829175926.892C9816E@code0.codespeak.net> Author: scoder Date: Wed Aug 29 19:59:24 2007 New Revision: 46168 Modified: lxml/trunk/Makefile Log: remove generated docs on 'make clean' Modified: lxml/trunk/Makefile ============================================================================== --- lxml/trunk/Makefile (original) +++ lxml/trunk/Makefile Wed Aug 29 19:59:24 2007 @@ -52,7 +52,7 @@ clean: find . \( -name '*.o' -o -name '*.c' -o -name '*.so' -o -name '*.py[cod]' -o -name '*.dll' \) -exec rm -f {} \; - rm -rf build + rm -rf build doc/html/api realclean: clean rm -f TAGS From scoder at codespeak.net Wed Aug 29 20:06:56 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 20:06:56 +0200 (CEST) Subject: [Lxml-checkins] r46169 - lxml/trunk/doc Message-ID: <20070829180656.012B0817C@code0.codespeak.net> Author: scoder Date: Wed Aug 29 20:06:55 2007 New Revision: 46169 Modified: lxml/trunk/doc/main.txt Log: link to API docs Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Wed Aug 29 20:06:55 2007 @@ -37,8 +37,8 @@ .. _FAQ: FAQ.html -This page describes the current in-development version of lxml that will -eventually become lxml 2.0. 
+**This page describes the current in-development version of lxml that will +become lxml 2.0.** Documentation @@ -58,6 +58,8 @@ * `lxml.etree specific API`_ documentation + * the `generated API documentation`_ + * parsing_ and validating_ XML * `XPath and XSLT`_ support @@ -105,6 +107,7 @@ .. _cElementTree: http://effbot.org/zone/celementtree.htm .. _`lxml.etree Tutorial`: tutorial.html +.. _`generated API documentation`: api/index.html .. _`benchmark results`: performance.html .. _`compatibility`: compatibility.html .. _`lxml.etree specific API`: api.html From scoder at codespeak.net Wed Aug 29 20:47:09 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 20:47:09 +0200 (CEST) Subject: [Lxml-checkins] r46171 - lxml/trunk Message-ID: <20070829184709.4E3A58121@code0.codespeak.net> Author: scoder Date: Wed Aug 29 20:47:07 2007 New Revision: 46171 Modified: lxml/trunk/INSTALL.txt Log: doc updates Modified: lxml/trunk/INSTALL.txt ============================================================================== --- lxml/trunk/INSTALL.txt (original) +++ lxml/trunk/INSTALL.txt Wed Aug 29 20:47:07 2007 @@ -42,9 +42,12 @@ If you want to build lxml from SVN you should read `how to build lxml from source`_ (or the file ``build.txt`` in the ``doc`` directory of the source -tree). Both the subversion sources and the source distribution ship with an -adapted version of Pyrex, so you do not need Pyrex installed. +tree). Building from Subversion sources or from modified distribution sources +requires Cython_ to translate the lxml sources into C code. The source +distribution ships with pre-generated C source files, so you do not need +Cython installed to build from release sources. +.. _Cython: http://www.cython.org .. 
_`how to build lxml from source`: build.html If you have read these instructions and still cannot manage to install lxml, From scoder at codespeak.net Wed Aug 29 22:25:55 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 22:25:55 +0200 (CEST) Subject: [Lxml-checkins] r46172 - in lxml/branch/lxml-1.3: . doc Message-ID: <20070829202555.5317C8183@code0.codespeak.net> Author: scoder Date: Wed Aug 29 22:25:53 2007 New Revision: 46172 Modified: lxml/branch/lxml-1.3/doc/main.txt lxml/branch/lxml-1.3/version.txt Log: prepare release of 1.3.4 Modified: lxml/branch/lxml-1.3/doc/main.txt ============================================================================== --- lxml/branch/lxml-1.3/doc/main.txt (original) +++ lxml/branch/lxml-1.3/doc/main.txt Wed Aug 29 22:25:53 2007 @@ -130,7 +130,7 @@ .. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ .. _`this key`: pubkey.asc -The latest version is `lxml 1.3.3`_, released 2007-07-26 (`changes for 1.3.3`_). +The latest version is `lxml 1.3.4`_, released 2007-08-29 (`changes for 1.3.4`_). `Older versions`_ are listed below. .. _`Older versions`: #old-versions @@ -191,6 +191,8 @@ Old Versions ------------ +* `lxml 1.3.3`_, released 2007-07-26 (`changes for 1.3.3`_) + * `lxml 1.3.2`_, released 2007-07-03 (`changes for 1.3.2`_) * lxml 1.3.1, released 2007-07-02 (`changes for 1.3.1`_) @@ -233,6 +235,7 @@ * `lxml 0.5`_, released 2005-04-08 +.. _`lxml 1.3.4`: lxml-1.3.4.tgz .. _`lxml 1.3.3`: lxml-1.3.3.tgz .. _`lxml 1.3.2`: lxml-1.3.2.tgz .. _`lxml 1.3`: lxml-1.3.tgz @@ -255,6 +258,7 @@ .. _`lxml 0.5.1`: lxml-0.5.1.tgz .. _`lxml 0.5`: lxml-0.5.tgz +.. _`changes for 1.3.4`: changes-1.3.4.html .. _`changes for 1.3.3`: changes-1.3.3.html .. _`changes for 1.3.2`: changes-1.3.2.html .. 
_`changes for 1.3.1`: changes-1.3.1.html Modified: lxml/branch/lxml-1.3/version.txt ============================================================================== --- lxml/branch/lxml-1.3/version.txt (original) +++ lxml/branch/lxml-1.3/version.txt Wed Aug 29 22:25:53 2007 @@ -1 +1 @@ -1.3.3 +1.3.4 From scoder at codespeak.net Wed Aug 29 22:27:38 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 22:27:38 +0200 (CEST) Subject: [Lxml-checkins] r46173 - lxml/branch/lxml-1.3 Message-ID: <20070829202738.1549A8183@code0.codespeak.net> Author: scoder Date: Wed Aug 29 22:27:37 2007 New Revision: 46173 Modified: lxml/branch/lxml-1.3/Makefile Log: remove API docs on 'make clean' Modified: lxml/branch/lxml-1.3/Makefile ============================================================================== --- lxml/branch/lxml-1.3/Makefile (original) +++ lxml/branch/lxml-1.3/Makefile Wed Aug 29 22:27:37 2007 @@ -51,7 +51,7 @@ clean: find . \( -name '*.o' -o -name '*.c' -o -name '*.so' -o -name '*.py[cod]' -o -name '*.dll' \) -exec rm -f {} \; - rm -rf build + rm -rf build doc/html/api realclean: clean rm -f TAGS From scoder at codespeak.net Wed Aug 29 22:28:25 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 22:28:25 +0200 (CEST) Subject: [Lxml-checkins] r46174 - lxml/branch/lxml-1.3 Message-ID: <20070829202825.57E41818E@code0.codespeak.net> Author: scoder Date: Wed Aug 29 22:28:24 2007 New Revision: 46174 Modified: lxml/branch/lxml-1.3/CHANGES.txt Log: prepare release of 1.3.4 Modified: lxml/branch/lxml-1.3/CHANGES.txt ============================================================================== --- lxml/branch/lxml-1.3/CHANGES.txt (original) +++ lxml/branch/lxml-1.3/CHANGES.txt Wed Aug 29 22:28:24 2007 @@ -2,8 +2,8 @@ lxml changelog ============== -Under development -================= +1.3.4 (2007-07-29) +================== Features added -------------- From scoder at codespeak.net Wed Aug 29 22:30:29 2007 From: 
scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 29 Aug 2007 22:30:29 +0200 (CEST) Subject: [Lxml-checkins] r46175 - lxml/branch/lxml-1.3/src/lxml Message-ID: <20070829203029.5C2D0818E@code0.codespeak.net> Author: scoder Date: Wed Aug 29 22:30:28 2007 New Revision: 46175 Modified: lxml/branch/lxml-1.3/src/lxml/objectify.pyx Log: small fix after last merge Modified: lxml/branch/lxml-1.3/src/lxml/objectify.pyx ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/objectify.pyx (original) +++ lxml/branch/lxml-1.3/src/lxml/objectify.pyx Wed Aug 29 22:30:28 2007 @@ -1651,7 +1651,7 @@ typemap[__builtin__.float] = __add_text typemap[__builtin__.bool] = __add_text - _ElementMaker.__init__(self, typemap, objectify_parser.makeelement) + _ElementMaker.__init__(self, typemap, makeelement=objectify_parser.makeelement) def __add_text(_Element elem not None, text): cdef tree.xmlNode* c_child From scoder at codespeak.net Thu Aug 30 11:17:04 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 30 Aug 2007 11:17:04 +0200 (CEST) Subject: [Lxml-checkins] r46180 - lxml/branch/lxml-1.3 Message-ID: <20070830091704.9055881BF@code0.codespeak.net> Author: scoder Date: Thu Aug 30 11:17:02 2007 New Revision: 46180 Modified: lxml/branch/lxml-1.3/Makefile Log: delete API docs only when regenerating them Modified: lxml/branch/lxml-1.3/Makefile ============================================================================== --- lxml/branch/lxml-1.3/Makefile (original) +++ lxml/branch/lxml-1.3/Makefile Thu Aug 30 11:17:02 2007 @@ -36,6 +36,7 @@ html: inplace mkdir -p doc/html PYTHONPATH=src $(PYTHON) doc/mkhtml.py doc/html . `cat version.txt` + rm -fr doc/html/api [ -x "`which epydoc`" ] \ && (cd src && PYTHONPATH=. epydoc -o ../doc/html/api --name lxml --url http://codespeak.net/lxml/ lxml/) \ || (echo "not generating epydoc API documentation") @@ -51,7 +52,7 @@ clean: find . 
\( -name '*.o' -o -name '*.c' -o -name '*.so' -o -name '*.py[cod]' -o -name '*.dll' \) -exec rm -f {} \; - rm -rf build doc/html/api + rm -rf build realclean: clean rm -f TAGS From scoder at codespeak.net Thu Aug 30 11:17:43 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 30 Aug 2007 11:17:43 +0200 (CEST) Subject: [Lxml-checkins] r46181 - lxml/trunk Message-ID: <20070830091743.5C53081CA@code0.codespeak.net> Author: scoder Date: Thu Aug 30 11:17:42 2007 New Revision: 46181 Modified: lxml/trunk/Makefile Log: delete API docs only when regenerating them Modified: lxml/trunk/Makefile ============================================================================== --- lxml/trunk/Makefile (original) +++ lxml/trunk/Makefile Thu Aug 30 11:17:42 2007 @@ -36,6 +36,7 @@ html: inplace mkdir -p doc/html PYTHONPATH=src $(PYTHON) doc/mkhtml.py doc/html . `cat version.txt` + rm -fr doc/html/api @[ -x "`which epydoc`" ] \ && (cd src && echo "Generating API docs ..." && \ PYTHONPATH=. epydoc -o ../doc/html/api --name lxml --url http://codespeak.net/lxml/ lxml/) \ @@ -52,7 +53,7 @@ clean: find . \( -name '*.o' -o -name '*.c' -o -name '*.so' -o -name '*.py[cod]' -o -name '*.dll' \) -exec rm -f {} \; - rm -rf build doc/html/api + rm -rf build realclean: clean rm -f TAGS From scoder at codespeak.net Thu Aug 30 11:33:35 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 30 Aug 2007 11:33:35 +0200 (CEST) Subject: [Lxml-checkins] r46182 - in lxml/branch/lxml-1.3: . 
doc Message-ID: <20070830093335.85AC881CC@code0.codespeak.net> Author: scoder Date: Thu Aug 30 11:33:34 2007 New Revision: 46182 Modified: lxml/branch/lxml-1.3/CHANGES.txt lxml/branch/lxml-1.3/doc/main.txt Log: release date Modified: lxml/branch/lxml-1.3/CHANGES.txt ============================================================================== --- lxml/branch/lxml-1.3/CHANGES.txt (original) +++ lxml/branch/lxml-1.3/CHANGES.txt Thu Aug 30 11:33:34 2007 @@ -2,7 +2,7 @@ lxml changelog ============== -1.3.4 (2007-07-29) +1.3.4 (2007-08-30) ================== Features added Modified: lxml/branch/lxml-1.3/doc/main.txt ============================================================================== --- lxml/branch/lxml-1.3/doc/main.txt (original) +++ lxml/branch/lxml-1.3/doc/main.txt Thu Aug 30 11:33:34 2007 @@ -130,7 +130,7 @@ .. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ .. _`this key`: pubkey.asc -The latest version is `lxml 1.3.4`_, released 2007-08-29 (`changes for 1.3.4`_). +The latest version is `lxml 1.3.4`_, released 2007-08-30 (`changes for 1.3.4`_). `Older versions`_ are listed below. .. _`Older versions`: #old-versions From scoder at codespeak.net Thu Aug 30 14:29:15 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 30 Aug 2007 14:29:15 +0200 (CEST) Subject: [Lxml-checkins] r46190 - lxml/trunk Message-ID: <20070830122915.C972981BC@code0.codespeak.net> Author: scoder Date: Thu Aug 30 14:29:14 2007 New Revision: 46190 Modified: lxml/trunk/CHANGES.txt Log: release updates Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Thu Aug 30 14:29:14 2007 @@ -81,7 +81,7 @@ * Network access in parsers disabled by default -1.3.4 (???) 
+1.3.4 (2007-08-30) ================== Features added @@ -107,6 +107,15 @@ Other changes ------------- +* lxml now raises a TagNameWarning about tag names containing ':' instead of + an Error as 1.3.3 did. The reason is that a number of projects currently + misuse the previous lack of tag name validation to generate namespace + prefixes without declaring namespaces. Apart from the danger of generating + broken XML this way, it also breaks most of the namespace-aware tools in + XML, including XPath, XSLT and validation. lxml 1.3.x will continue to + support this bug with a Warning, while lxml 2.0 will be strict about + well-formed tag names (not only regarding ':'). + * Serialising an Element no longer includes its comment and PI siblings (only ElementTree serialisation includes them). From scoder at codespeak.net Thu Aug 30 14:31:56 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 30 Aug 2007 14:31:56 +0200 (CEST) Subject: [Lxml-checkins] r46191 - lxml/trunk/doc Message-ID: <20070830123156.DB3A981BC@code0.codespeak.net> Author: scoder Date: Thu Aug 30 14:31:55 2007 New Revision: 46191 Modified: lxml/trunk/doc/build.txt Log: require Cython 0.9.6.5 Modified: lxml/trunk/doc/build.txt ============================================================================== --- lxml/trunk/doc/build.txt (original) +++ lxml/trunk/doc/build.txt Thu Aug 30 14:31:55 2007 @@ -33,10 +33,13 @@ be an lxml developer, you do need a working Cython installation. You can use EasyInstall_ to install it:: - easy_install Cython + easy_install Cython==0.9.6.5 .. _EasyInstall: http://peak.telecommunity.com/DevCenter/EasyInstall +lxml currently requires Cython 0.9.6.5, but it should work with later +versions. 
+ Subversion ---------- From scoder at codespeak.net Thu Aug 30 14:41:57 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 30 Aug 2007 14:41:57 +0200 (CEST) Subject: [Lxml-checkins] r46194 - lxml/trunk/src/lxml/tests Message-ID: <20070830124157.F289681C4@code0.codespeak.net> Author: scoder Date: Thu Aug 30 14:41:53 2007 New Revision: 46194 Modified: lxml/trunk/src/lxml/tests/test_objectify.py Log: objectify E factory tests Modified: lxml/trunk/src/lxml/tests/test_objectify.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_objectify.py (original) +++ lxml/trunk/src/lxml/tests/test_objectify.py Thu Aug 30 14:41:53 2007 @@ -670,6 +670,18 @@ self.assertEquals(value.text, None) self.assertEquals(value.pyval, None) + def test_data_element_pytype_none_compat(self): + # pre-2.0 lxml called NoneElement "none" + pyval = 1 + pytype = "none" + objclass = objectify.NoneElement + value = objectify.DataElement(pyval, _pytype=pytype) + self.assert_(isinstance(value, objclass), + "DataElement(%s, _pytype='%s') returns %s, expected %s" + % (pyval, pytype, type(value), objclass)) + self.assertEquals(value.text, None) + self.assertEquals(value.pyval, None) + def test_schema_types(self): XML = self.XML root = XML('''\ @@ -1658,6 +1670,67 @@ etree.tostring(new_root), etree.tostring(root)) + # E-Factory tests, need to use sub-elements as root element is always + # type-looked-up as ObjectifiedElement (no annotations) + def test_efactory_int(self): + E = objectify.E + root = E.root(E.val(23)) + self.assert_(isinstance(root.val, objectify.IntElement)) + + def test_efactory_long(self): + E = objectify.E + root = E.root(E.val(23L)) + self.assert_(isinstance(root.val, objectify.IntElement)) + + def test_efactory_float(self): + E = objectify.E + root = E.root(E.val(233.23)) + self.assert_(isinstance(root.val, objectify.FloatElement)) + + def test_efactory_str(self): + E = objectify.E + root = 
E.root(E.val("what?")) + self.assert_(isinstance(root.val, objectify.StringElement)) + + def test_efactory_unicode(self): + E = objectify.E + root = E.root(E.val(unicode("bl??dy h?ll", encoding="ISO-8859-1"))) + self.assert_(isinstance(root.val, objectify.StringElement)) + + def test_efactory_bool(self): + E = objectify.E + root = E.root(E.val(True)) + self.assert_(isinstance(root.val, objectify.BoolElement)) + + def test_efactory_none(self): + E = objectify.E + root = E.root(E.val(None)) + self.assert_(isinstance(root.val, objectify.NoneElement)) + + def test_efactory_value_concatenation(self): + E = objectify.E + root = E.root(E.val(1, "foo", 2.0, "bar ", True, None)) + self.assert_(isinstance(root.val, objectify.StringElement)) + + def test_efactory_attrib(self): + E = objectify.E + root = E.root(foo="bar") + self.assertEquals(root.get("foo"), "bar") + + def test_efactory_nested(self): + E = objectify.E + DataElement = objectify.DataElement + root = E.root("text", E.sub(E.subsub()), "tail", DataElement(1), + DataElement(2.0)) + self.assert_(isinstance(root, objectify.ObjectifiedElement)) + self.assertEquals(root.text, "text") + self.assert_(isinstance(root.sub, objectify.ObjectifiedElement)) + self.assertEquals(root.sub.tail, "tail") + self.assert_(isinstance(root.sub.subsub, objectify.StringElement)) + self.assertEquals(len(root.value), 2) + self.assert_(isinstance(root.value[0], objectify.IntElement)) + self.assert_(isinstance(root.value[1], objectify.FloatElement)) + def test_suite(): suite = unittest.TestSuite() suite.addTests([unittest.makeSuite(ObjectifyTestCase)]) From scoder at codespeak.net Thu Aug 30 22:02:19 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 30 Aug 2007 22:02:19 +0200 (CEST) Subject: [Lxml-checkins] r46201 - lxml/trunk/doc Message-ID: <20070830200219.2870381DA@code0.codespeak.net> Author: scoder Date: Thu Aug 30 22:02:17 2007 New Revision: 46201 Modified: lxml/trunk/doc/FAQ.txt Log: section on threading problems 
in certain environments Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Thu Aug 30 22:02:17 2007 @@ -20,6 +20,7 @@ 1.3 What standards does lxml implement? 1.4 What is the difference between lxml.etree and lxml.objectify? 1.5 How can I make my application run faster? + 1.6 What about that trailing text on serialised Elements? 2 Installation 2.1 Which version of libxml2 and libxslt should I use or require? 2.2 Where are the Windows binaries? @@ -34,6 +35,7 @@ 5.1 Can I use threads to concurrently access the lxml API? 5.2 Does my program run faster if I use threads? 5.3 Would my single-threaded program run faster if I turned off threading? + 5.4 My program crashes when run with mod_python/Pyro/Zope/Plone/... 6 Parsing and Serialisation 6.1 Why doesn't the ``pretty_print`` option reformat my XML output? 6.2 Why can't lxml parse my XML from unicode strings? @@ -318,7 +320,7 @@ really within lxml (or libxml2 or libxslt): a) If your application (or e.g. your web container) uses threads, please see - the FAQ section on threading to check if you touch on one of the + the FAQ section on threading_ to check if you touch on one of the potential pitfalls. b) If you are on Mac-OS X, make sure lxml uses the correct libraries. If you @@ -387,7 +389,7 @@ serialize the access to each of them, so it is better to copy() parsers or to use the default parser. Note that access to the XML() and HTML() functions is always serialized. If you need to parse concurrently from strings, use -``parse()`` with ``StringIO``. +``parse()`` with ``StringIO`` or pass a separate parser to these functions. Due to the way libxslt handles threading, concurrent access to stylesheets is currently only possible if it was parsed in the main thread. Parsing and @@ -430,6 +432,38 @@ lxml from source. +My program crashes when run with mod_python/Pyro/Zope/Plone/... 
+--------------------------------------------------------------- + +These environments can use threads in a way that may not make it obvious what +happens, and thus make it hard to ensure lxml's threading support is used in a +reliable way. If you encounter crashes in these systems, but your code runs +perfectly when started by hand, try one of the following:: + +* compile lxml without threading support by passing the + ``--without-threading`` option. While this might be a bit slower in certain + scenarios on well equiped systems, it might also keep your application from + crashing, which should be worth more to you than peek performance. lxml is + fast anyway, so concurrency may not even be worth it. + +* avoid doing fancy XSLT stuff like foreign document access or passing in + Elements trough XSLT variables. This might work, but if you try, you're + really pushing it. + +* try copying trees at suspicious places and working with those instead of a + tree shared between threads. A good candidate might be the result of an + XSLT or the stylesheet itself. + +* try keeping thread-local copies of XSLT stylesheets, i.e. one per thread, + instead of sharing one. + +* you can try to serialise suspicious parts of your code with explicit locks, + thus disabling the concurrency of the runtime system. + +* report back on the mailing list to see if there are other ways to work + around your specific problems. 
+
+
 Parsing and Serialisation
 =========================

From scoder at codespeak.net Thu Aug 30 22:02:58 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 30 Aug 2007 22:02:58 +0200 (CEST)
Subject: [Lxml-checkins] r46202 - lxml/trunk/src/lxml
Message-ID: <20070830200258.86A2381E7@code0.codespeak.net>

Author: scoder
Date: Thu Aug 30 22:02:56 2007
New Revision: 46202

Modified:
   lxml/trunk/src/lxml/objectify.pyx
Log:
faster E factory instantiation

Modified: lxml/trunk/src/lxml/objectify.pyx
==============================================================================
--- lxml/trunk/src/lxml/objectify.pyx (original)
+++ lxml/trunk/src/lxml/objectify.pyx Thu Aug 30 22:02:56 2007
@@ -1036,6 +1036,12 @@
 ################################################################################
 # adapted ElementMaker supports registered PyTypes

+cdef class _ObjectifyElementMakerCaller # forward declaration
+
+cdef extern from "etree_defs.h":
+    # macro call to 't->tp_new()' for fast instantiation
+    cdef _ObjectifyElementMakerCaller NEW_ELEMENT_MAKER "PY_NEW" (object t)
+
 cdef class ElementMaker:
     cdef object _makeelement
     cdef object _namespace
@@ -1053,19 +1059,19 @@
             self._makeelement = None

     def __getattr__(self, tag):
-        if tag[0] != "{" and self._namespace is not None:
+        cdef _ObjectifyElementMakerCaller element_maker
+        if self._namespace is not None and tag[0] != "{":
             tag = self._namespace + tag
-        return _ObjectifyElementMakerCaller(
-            self._makeelement, tag, self._nsmap)
+        element_maker = NEW_ELEMENT_MAKER(_ObjectifyElementMakerCaller)
+        element_maker._tag = tag
+        element_maker._nsmap = self._nsmap
+        element_maker._element_factory = self._makeelement
+        return element_maker

 cdef class _ObjectifyElementMakerCaller:
     cdef object _tag
     cdef object _nsmap
     cdef object _element_factory
-    def __init__(self, element_factory, tag, nsmap):
-        self._element_factory = element_factory
-        self._tag = tag
-        self._nsmap = nsmap

     def __call__(self, *children, **attrib):
         cdef _ObjectifyElementMakerCaller elementMaker

From scoder at codespeak.net Fri Aug 31 09:41:00 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 31 Aug 2007 09:41:00 +0200 (CEST)
Subject: [Lxml-checkins] r46206 - lxml/trunk/doc
Message-ID: <20070831074100.0FB0E819B@code0.codespeak.net>

Author: scoder
Date: Fri Aug 31 09:40:59 2007
New Revision: 46206

Modified:
   lxml/trunk/doc/FAQ.txt
Log:
thread crash FAQ update

Modified: lxml/trunk/doc/FAQ.txt
==============================================================================
--- lxml/trunk/doc/FAQ.txt (original)
+++ lxml/trunk/doc/FAQ.txt Fri Aug 31 09:40:59 2007
@@ -327,8 +327,11 @@
   have updated the old system libraries (e.g. through fink), this is best
   achieved by building lxml statically to prevent the different library
   versions from interfering. If you choose to use a dynamically linked
-  version, make sure the ``DYLD_LIBRARY_PATH`` environment variable
-  contains the directory where you installed the libraries.
+  version, make sure the ``DYLD_LIBRARY_PATH`` environment variable contains
+  the directory where you installed the libraries. To make sure the correct
+  libraries are used, print the module level version numbers that
+  ``lxml.etree`` provides from *within* your application rather than relying
+  on what your operating system tells you.

   In any case, try to reproduce the problem with the latest versions of
   libxml2 and libxslt. From time to time, bugs and race conditions are found
@@ -435,20 +438,36 @@
 My program crashes when run with mod_python/Pyro/Zope/Plone/...
 ---------------------------------------------------------------

-These environments can use threads in a way that may not make it obvious what
-happens, and thus make it hard to ensure lxml's threading support is used in a
-reliable way. If you encounter crashes in these systems, but your code runs
-perfectly when started by hand, try one of the following::
-
-* compile lxml without threading support by passing the
-  ``--without-threading`` option. While this might be a bit slower in certain
-  scenarios on well equiped systems, it might also keep your application from
-  crashing, which should be worth more to you than peek performance. lxml is
-  fast anyway, so concurrency may not even be worth it.
+These environments can use threads in a way that may not make it obvious when
+threads are created and what happens in which thread. This makes it hard to
+ensure lxml's threading support is used in a reliable way. Sadly, if problems
+arise, they are as diverse as the applications, so it is difficult to provide
+any generally applicable solution. Also, these environments are so complex
+that problems become hard to debug and even harder to reproduce in a
+predictable way. If you encounter crashes in one these systems, but your code
+runs perfectly when started by hand, the following gives you a few hints for
+possible approaches to solve your specific problem::
+
+* make sure you use recent versions of libxml2, libxslt and lxml. The libxml2
+  developers keep fixing bugs in each release, and lxml also tries to become
+  more robust against possible pitfalls. So newer versions might already fix
+  your problem in a reliable way.
+
+* make sure the library versions you installed are really used. Do not rely
+  on what your operating system tells you! Print the version constants in
+  ``lxml.etree`` from within your runtime environment to make sure it is the
+  case. This is especially a problem under MacOS-X when newer library
+  versions were installed in addition to the outdated system libraries.
+
+* compile lxml without threading support by running ``setup.py`` with the
+  ``--without-threading`` option. While this might be slower in certain
+  scenarios on multi-processor systems, it *might* also keep your application
+  from crashing, which should be worth more to you than peek performance.
+  Remember that lxml is fast anyway, so concurrency may not even be worth it.

 * avoid doing fancy XSLT stuff like foreign document access or passing in
-  Elements trough XSLT variables. This might work, but if you try, you're
-  really pushing it.
+  subtrees trough XSLT variables. This might or might not work, depending on
+  your specific usage.

 * try copying trees at suspicious places and working with those instead of a
   tree shared between threads. A good candidate might be the result of an
@@ -457,11 +476,12 @@
 * try keeping thread-local copies of XSLT stylesheets, i.e. one per thread,
   instead of sharing one.

-* you can try to serialise suspicious parts of your code with explicit locks,
-  thus disabling the concurrency of the runtime system.
+* you can try to serialise suspicious parts of your code with explicit thread
+  locks, thus disabling the concurrency of the runtime system.

 * report back on the mailing list to see if there are other ways to work
-  around your specific problems.
+  around your specific problems. Do not forget to report the version numbers
+  of lxml, libxml2 and libxslt you are using.

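[Editorial sketch, not part of the commit] Two of the hints in the FAQ text above, keeping one stylesheet per thread and serialising a suspicious section with an explicit lock, could look roughly like the following. The names ``PerThread``, ``build_stylesheet`` and ``collect_instances`` are hypothetical helpers invented for this illustration; a plain ``object()`` stands in for the stylesheet so the sketch runs without lxml, and in a real application the factory would presumably build something like ``etree.XSLT(style_doc)``.

```python
import threading


class PerThread:
    """Keep one lazily built object per thread (hypothetical helper)."""

    def __init__(self, factory):
        self._factory = factory          # called at most once in each thread
        self._local = threading.local()  # per-thread attribute storage

    def get(self):
        try:
            return self._local.value
        except AttributeError:
            # First access from this thread: build a private instance.
            self._local.value = self._factory()
            return self._local.value


def build_stylesheet():
    # Stand-in for building an XSLT stylesheet (assumption: with lxml this
    # would be something like etree.XSLT(style_doc)).
    return object()


stylesheets = PerThread(build_stylesheet)


def collect_instances(n_threads=4):
    """Return the instance that each worker thread received."""
    seen = []
    guard = threading.Lock()  # explicit lock serialising the shared section

    def worker():
        mine = stylesheets.get()
        with guard:
            # Only one thread at a time may touch the shared list.
            seen.append(mine)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Every thread that calls ``stylesheets.get()`` receives its own instance, so no stylesheet object is ever shared between threads. The lock only protects the shared result list here; wrapping a whole transformation in it would trade concurrency for safety, as the FAQ entry notes.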
Parsing and Serialisation From scoder at codespeak.net Fri Aug 31 09:43:29 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 31 Aug 2007 09:43:29 +0200 (CEST) Subject: [Lxml-checkins] r46207 - lxml/trunk/src/lxml Message-ID: <20070831074329.B9684819B@code0.codespeak.net> Author: scoder Date: Fri Aug 31 09:43:28 2007 New Revision: 46207 Modified: lxml/trunk/src/lxml/apihelpers.pxi lxml/trunk/src/lxml/etree.pyx lxml/trunk/src/lxml/etreepublic.pxd lxml/trunk/src/lxml/objectify.pyx lxml/trunk/src/lxml/public-api.pxi Log: new _makeSubElement() C-function to make the SubElement() factory available at the C level and as makeSubElement() in the public C-API Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Fri Aug 31 09:43:28 2007 @@ -105,6 +105,8 @@ elif c_doc is NULL: c_doc = _newDoc() c_node = _createElement(c_doc, name_utf) + if c_node is NULL: + return python.PyErr_NoMemory() try: if text is not None: _setNodeText(c_node, text) @@ -129,6 +131,47 @@ tree.xmlFreeDoc(c_doc) raise +cdef _Element _makeSubElement(_Element parent, tag, text, tail, + attrib, nsmap, extra_attrs): + """Create a new child element and initialize text content, namespaces and + attributes. + + This helper function will reuse as much of the existing document as + possible: + + If 'parser' is None, the parser will be inherited from 'doc' or the + default parser will be used. + + If 'doc' is None, 'c_doc' is used to create a new _Document and the new + element is made its root node. + + If 'c_doc' is also NULL, a new xmlDoc will be created. 
+ """ + cdef _Document doc + cdef xmlNode* c_node + cdef xmlDoc* c_doc + if parent is None or parent._doc is None: + return None + ns_utf, name_utf = _getNsTag(tag) + _tagValidOrRaise(name_utf) + doc = parent._doc + c_doc = doc._c_doc + + c_node = _createElement(c_doc, name_utf) + if c_node is NULL: + return python.PyErr_NoMemory() + tree.xmlAddChild(parent._c_node, c_node) + + if text is not None: + _setNodeText(c_node, text) + if tail is not None: + _setTailText(c_node, tail) + + # add namespaces to node if necessary + doc._setNodeNamespaces(c_node, ns_utf, nsmap) + _initNodeAttributes(c_node, doc, attrib, extra_attrs) + return _elementFactory(doc, c_node) + cdef _initNodeAttributes(xmlNode* c_node, _Document doc, attrib, extra): """Initialise the attributes of an element node. """ Modified: lxml/trunk/src/lxml/etree.pyx ============================================================================== --- lxml/trunk/src/lxml/etree.pyx (original) +++ lxml/trunk/src/lxml/etree.pyx Fri Aug 31 09:43:28 2007 @@ -1980,17 +1980,7 @@ """Subelement factory. This function creates an element instance, and appends it to an existing element. """ - cdef xmlNode* c_node - cdef _Document doc - ns_utf, name_utf = _getNsTag(_tag) - _tagValidOrRaise(name_utf) - doc = _parent._doc - c_node = _createElement(doc._c_doc, name_utf) - tree.xmlAddChild(_parent._c_node, c_node) - # add namespaces to node if necessary - doc._setNodeNamespaces(c_node, ns_utf, nsmap) - _initNodeAttributes(c_node, doc, attrib, _extra) - return _elementFactory(doc, c_node) + return _makeSubElement(_parent, _tag, None, None, attrib, nsmap, _extra) def ElementTree(_Element element=None, file=None, _BaseParser parser=None): """ElementTree wrapper class. 
Modified: lxml/trunk/src/lxml/etreepublic.pxd ============================================================================== --- lxml/trunk/src/lxml/etreepublic.pxd (original) +++ lxml/trunk/src/lxml/etreepublic.pxd Fri Aug 31 09:43:28 2007 @@ -63,6 +63,11 @@ cdef _Element makeElement(tag, _Document doc, parser, text, tail, attrib, nsmap) + # create a new SubElement for an existing parent + # builds Python object after setting text, tail, namespaces and attributes + cdef _Element makeSubElement(_Element parent, tag, text, tail, + attrib, nsmap) + # deep copy a node to include it in the Document cdef _Element deepcopyNodeToDocument(_Document doc, tree.xmlNode* c_root) Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Fri Aug 31 09:43:28 2007 @@ -1097,13 +1097,12 @@ elif isinstance(child, _ObjectifyElementMakerCaller): elementMaker = <_ObjectifyElementMakerCaller>child if elementMaker._element_factory is None: - childElement = cetree.makeElement( - elementMaker._tag, element._doc, objectify_parser, - None, None, None, None) + cetree.makeSubElement(element, elementMaker._tag, + None, None, None, None) else: childElement = elementMaker._element_factory( elementMaker._tag) - cetree.appendChild(element, childElement) + cetree.appendChild(element, childElement) else: pytype = python.PyDict_GetItem( _PYTYPE_DICT, _typename(child)) Modified: lxml/trunk/src/lxml/public-api.pxi ============================================================================== --- lxml/trunk/src/lxml/public-api.pxi (original) +++ lxml/trunk/src/lxml/public-api.pxi Fri Aug 31 09:43:28 2007 @@ -25,6 +25,10 @@ text, tail, attrib, nsmap): return _makeElement(tag, NULL, doc, parser, text, tail, attrib, nsmap, None) +cdef public _Element makeSubElement(_Element parent, tag, text, tail, + attrib, nsmap): + return _makeSubElement(parent, tag, text, 
tail, attrib, nsmap, None) + cdef public void setElementClassLookupFunction( _element_class_lookup_function function, state): _setElementClassLookupFunction(function, state) From scoder at codespeak.net Fri Aug 31 09:45:04 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 31 Aug 2007 09:45:04 +0200 (CEST) Subject: [Lxml-checkins] r46208 - lxml/trunk Message-ID: <20070831074504.1E315819B@code0.codespeak.net> Author: scoder Date: Fri Aug 31 09:45:03 2007 New Revision: 46208 Modified: lxml/trunk/CHANGES.txt Log: changelog update Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Aug 31 09:45:03 2007 @@ -8,6 +8,9 @@ Features added -------------- +* New ``makeSubElement()`` C-API function that allows creating a new + subelement straight with text, tail and attributes. + * XPath extension functions can now access the current context node (``context.context_node``) and use a context dictionary (``context.eval_context``) from the context provided in their first From scoder at codespeak.net Fri Aug 31 09:45:53 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 31 Aug 2007 09:45:53 +0200 (CEST) Subject: [Lxml-checkins] r46209 - lxml/trunk Message-ID: <20070831074553.A0348819B@code0.codespeak.net> Author: scoder Date: Fri Aug 31 09:45:51 2007 New Revision: 46209 Modified: lxml/trunk/setup.py Log: make clear we depend on Cython 0.9.6.5 (no hard build dependency) Modified: lxml/trunk/setup.py ============================================================================== --- lxml/trunk/setup.py (original) +++ lxml/trunk/setup.py Fri Aug 31 09:45:51 2007 @@ -7,7 +7,7 @@ except pkg_resources.VersionConflict, e: from ez_setup import use_setuptools use_setuptools(version="0.6c5") - #pkg_resources.require("Cython>=0.9.6.4") + #pkg_resources.require("Cython==0.9.6.5") from setuptools import setup except 
ImportError: # no setuptools installed From scoder at codespeak.net Fri Aug 31 09:54:06 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 31 Aug 2007 09:54:06 +0200 (CEST) Subject: [Lxml-checkins] r46210 - in lxml/trunk: . doc Message-ID: <20070831075406.99DA28192@code0.codespeak.net> Author: scoder Date: Fri Aug 31 09:54:05 2007 New Revision: 46210 Modified: lxml/trunk/CHANGES.txt lxml/trunk/doc/main.txt lxml/trunk/version.txt Log: prepare release of 2.0alpha1 Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Aug 31 09:54:05 2007 @@ -2,8 +2,8 @@ lxml changelog ============== -Under development -================= +2.0alpha1 (2007-08-31) +====================== Features added -------------- Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Fri Aug 31 09:54:05 2007 @@ -138,8 +138,8 @@ .. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ .. _`this key`: pubkey.asc -The latest version is `lxml 1.3.2`_, released 2007-07-03 (`changes for 1.3.2`_). -`Older versions`_ are listed below. +The latest version is `lxml 2.0alpha1`_, released 2007-08-31 +(`changes for 2.0alpha1`_). `Older versions`_ are listed below. .. _`Older versions`: #old-versions @@ -199,6 +199,12 @@ Old Versions ------------ +* `lxml 1.3.4`_, released 2007-08-30 (`changes for 1.3.4`_) + +* `lxml 1.3.3`_, released 2007-07-26 (`changes for 1.3.3`_) + +* `lxml 1.3.2`_, released 2007-07-03 (`changes for 1.3.2`_) + * lxml 1.3.1, released 2007-07-02 (`changes for 1.3.1`_) * `lxml 1.3`_, released 2007-06-24 (`changes for 1.3`_) @@ -239,6 +245,9 @@ * `lxml 0.5`_, released 2005-04-08 +.. _`lxml 2.0alpha1`: lxml-2.0alpha1.tgz +.. _`lxml 1.3.4`: lxml-1.3.4.tgz +.. _`lxml 1.3.3`: lxml-1.3.3.tgz .. 
_`lxml 1.3.2`: lxml-1.3.2.tgz .. _`lxml 1.3`: lxml-1.3.tgz .. _`lxml 1.2.1`: lxml-1.2.1.tgz @@ -260,6 +269,9 @@ .. _`lxml 0.5.1`: lxml-0.5.1.tgz .. _`lxml 0.5`: lxml-0.5.tgz +.. _`changes for 2.0alpha1`: changes-2.0alpha1.html +.. _`changes for 1.3.4`: changes-1.3.4.html +.. _`changes for 1.3.3`: changes-1.3.3.html .. _`changes for 1.3.2`: changes-1.3.2.html .. _`changes for 1.3.1`: changes-1.3.1.html .. _`changes for 1.3`: changes-1.3.html Modified: lxml/trunk/version.txt ============================================================================== --- lxml/trunk/version.txt (original) +++ lxml/trunk/version.txt Fri Aug 31 09:54:05 2007 @@ -1 +1 @@ -2.0dev +2.0alpha1 From scoder at codespeak.net Fri Aug 31 11:20:53 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 31 Aug 2007 11:20:53 +0200 (CEST) Subject: [Lxml-checkins] r46212 - lxml/trunk Message-ID: <20070831092053.6DB8381C4@code0.codespeak.net> Author: scoder Date: Fri Aug 31 11:20:51 2007 New Revision: 46212 Modified: lxml/trunk/ (props changed) lxml/trunk/MANIFEST.in Log: removed Pyrex from svn:externals and source distribution Modified: lxml/trunk/MANIFEST.in ============================================================================== --- lxml/trunk/MANIFEST.in (original) +++ lxml/trunk/MANIFEST.in Fri Aug 31 11:20:51 2007 @@ -8,9 +8,6 @@ recursive-include src/lxml etree.c objectify.c pyclasslookup.c etree.h etree_defs.h recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd recursive-include benchmark *.py -recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc -include Pyrex/__init__.py -recursive-include Pyrex/Compiler *.py -recursive-include Pyrex/Distutils *.py +recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png include doc/mkhtml.py doc/rest2html.py exclude doc/pyrex.txt src/lxml/etree.pxi From scoder at codespeak.net Fri Aug 31 19:01:54 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 31 Aug 2007 19:01:54 +0200 
(CEST) Subject: [Lxml-checkins] r46219 - in lxml/trunk: . fake_pyrex fake_pyrex/Pyrex fake_pyrex/Pyrex/Distutils Message-ID: <20070831170154.01B3A820A@code0.codespeak.net> Author: scoder Date: Fri Aug 31 19:01:53 2007 New Revision: 46219 Added: lxml/trunk/fake_pyrex/ lxml/trunk/fake_pyrex/Pyrex/ lxml/trunk/fake_pyrex/Pyrex/Distutils/ lxml/trunk/fake_pyrex/Pyrex/Distutils/__init__.py lxml/trunk/fake_pyrex/Pyrex/Distutils/build_ext.py lxml/trunk/fake_pyrex/Pyrex/__init__.py Modified: lxml/trunk/MANIFEST.in lxml/trunk/setup.py lxml/trunk/setupinfo.py Log: switch to Cython completely, currently requires fake Pyrex to satisfy setuptools Modified: lxml/trunk/MANIFEST.in ============================================================================== --- lxml/trunk/MANIFEST.in (original) +++ lxml/trunk/MANIFEST.in Fri Aug 31 19:01:53 2007 @@ -9,5 +9,6 @@ recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd recursive-include benchmark *.py recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png +recursive-include fake_pyrex *.py include doc/mkhtml.py doc/rest2html.py exclude doc/pyrex.txt src/lxml/etree.pxi Added: lxml/trunk/fake_pyrex/Pyrex/Distutils/__init__.py ============================================================================== --- (empty file) +++ lxml/trunk/fake_pyrex/Pyrex/Distutils/__init__.py Fri Aug 31 19:01:53 2007 @@ -0,0 +1 @@ +# work around broken setuptools monkey patching Added: lxml/trunk/fake_pyrex/Pyrex/Distutils/build_ext.py ============================================================================== --- (empty file) +++ lxml/trunk/fake_pyrex/Pyrex/Distutils/build_ext.py Fri Aug 31 19:01:53 2007 @@ -0,0 +1 @@ +build_ext = "yes, it's there!" 
Added: lxml/trunk/fake_pyrex/Pyrex/__init__.py ============================================================================== --- (empty file) +++ lxml/trunk/fake_pyrex/Pyrex/__init__.py Fri Aug 31 19:01:53 2007 @@ -0,0 +1 @@ +# work around broken setuptools monkey patching Modified: lxml/trunk/setup.py ============================================================================== --- lxml/trunk/setup.py (original) +++ lxml/trunk/setup.py Fri Aug 31 19:01:53 2007 @@ -1,25 +1,28 @@ import sys, os +extra_options = {} + +try: + import Cython + # may need to work around setuptools bug by providing a fake Pyrex + sys.path.insert(0, os.path.join(os.path.dirname(__file__), "fake_pyrex")) +except ImportError: + pass + try: import pkg_resources try: pkg_resources.require("setuptools>=0.6c5") - except pkg_resources.VersionConflict, e: + except pkg_resources.VersionConflict: from ez_setup import use_setuptools use_setuptools(version="0.6c5") #pkg_resources.require("Cython==0.9.6.5") from setuptools import setup + extra_options["zip_safe"] = False except ImportError: # no setuptools installed from distutils.core import setup -try: - import Cython -except ImportError: - # need to insert this to python path so we're sure we can import versioninfo, - # setupinfo and Cython/Pyrex (!) 
even if we start setup.py from another location - # (such as a buildout) - sys.path.insert(0, os.path.dirname(__file__)) import versioninfo import setupinfo @@ -31,11 +34,15 @@ STATIC_LIBRARY_DIRS = [] STATIC_CFLAGS = [] + # create lxml-version.h file svn_version = versioninfo.svn_version() versioninfo.create_version_h(svn_version) print "Building lxml version", svn_version + +extra_options.update(setupinfo.extra_setup_args()) + setup( name = "lxml", version = versioninfo.version(), @@ -86,8 +93,7 @@ package_dir = {'': 'src'}, packages = ['lxml', 'lxml.html'], - zip_safe = False, ext_modules = setupinfo.ext_modules( STATIC_INCLUDE_DIRS, STATIC_LIBRARY_DIRS, STATIC_CFLAGS), - **setupinfo.extra_setup_args() + **extra_options ) Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Fri Aug 31 19:01:53 2007 @@ -1,20 +1,12 @@ import sys, os -try: - from setuptools.extension import Extension -except ImportError: - from distutils.extension import Extension +from distutils.core import Extension try: from Cython.Distutils import build_ext as build_pyx print "Building with Cython." - PYREX_INSTALLED = True + CYTHON_INSTALLED = True except ImportError: - try: - from Pyrex.Distutils import build_ext as build_pyx - print "Trying to build with Pyrex." 
- PYREX_INSTALLED = True - except ImportError: - PYREX_INSTALLED = False + CYTHON_INSTALLED = False EXT_MODULES = [ ("etree", "lxml.etree"), @@ -27,10 +19,10 @@ return value.split(os.pathsep) def ext_modules(static_include_dirs, static_library_dirs, static_cflags): - if PYREX_INSTALLED: + if CYTHON_INSTALLED: source_extension = ".pyx" else: - print ("NOTE: Trying to build without Pyrex, pre-generated " + print ("NOTE: Trying to build without Cython, pre-generated " "'src/lxml/etree.c' needs to be available.") source_extension = ".c" @@ -67,7 +59,7 @@ def extra_setup_args(): result = {} - if PYREX_INSTALLED: + if CYTHON_INSTALLED: result['cmdclass'] = {'build_ext': build_pyx} return result From scoder at codespeak.net Fri Aug 31 19:06:25 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 31 Aug 2007 19:06:25 +0200 (CEST) Subject: [Lxml-checkins] r46220 - lxml/trunk/src/lxml Message-ID: <20070831170625.43D048209@code0.codespeak.net> Author: scoder Date: Fri Aug 31 19:06:24 2007 New Revision: 46220 Modified: lxml/trunk/src/lxml/apihelpers.pxi Log: cleanup Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Fri Aug 31 19:06:24 2007 @@ -159,7 +159,7 @@ c_node = _createElement(c_doc, name_utf) if c_node is NULL: - return python.PyErr_NoMemory() + python.PyErr_NoMemory() tree.xmlAddChild(parent._c_node, c_node) if text is not None: From scoder at codespeak.net Fri Aug 31 19:07:27 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 31 Aug 2007 19:07:27 +0200 (CEST) Subject: [Lxml-checkins] r46221 - in lxml/trunk: . 
src/lxml src/lxml/tests Message-ID: <20070831170727.E17998209@code0.codespeak.net> Author: scoder Date: Fri Aug 31 19:07:26 2007 New Revision: 46221 Modified: lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/extensions.pxi lxml/trunk/src/lxml/tests/test_xslt.py lxml/trunk/src/lxml/xslt.pxi Log: preliminary implementation of deep copy support for XSLT Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Aug 31 19:07:26 2007 @@ -8,6 +8,8 @@ Features added -------------- +* XSLT objects now support deep copying + * New ``makeSubElement()`` C-API function that allows creating a new subelement straight with text, tail and attributes. Modified: lxml/trunk/src/lxml/extensions.pxi ============================================================================== --- lxml/trunk/src/lxml/extensions.pxi (original) +++ lxml/trunk/src/lxml/extensions.pxi Fri Aug 31 19:07:26 2007 @@ -92,7 +92,7 @@ _regexp = _ExsltRegExp() _regexp._register_in_context(self) - cdef _copy(self): + cdef _BaseContext _copy(self): cdef _BaseContext context if self._namespaces is not None: namespaces = self._namespaces[:] Modified: lxml/trunk/src/lxml/tests/test_xslt.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xslt.py (original) +++ lxml/trunk/src/lxml/tests/test_xslt.py Fri Aug 31 19:07:26 2007 @@ -4,7 +4,7 @@ Test cases related to XSLT processing """ -import unittest +import unittest, copy from common_imports import etree, StringIO, HelperTestCase, fileInTestDir from common_imports import doctest @@ -49,6 +49,41 @@ self.assertRaises( etree.XSLTParseError, etree.XSLT, style) + + def test_xslt_copy(self): + tree = self.parse('
BC') + style = self.parse('''\ + + + + + +''') + + transform = etree.XSLT(style) + res = transform(tree) + self.assertEquals('''\ + +B +''', + str(res)) + + transform_copy = copy.deepcopy(transform) + res = transform_copy(tree) + self.assertEquals('''\ + +B +''', + str(res)) + + transform = etree.XSLT(style) + res = transform(tree) + self.assertEquals('''\ + +B +''', + str(res)) def test_xslt_utf8(self): tree = self.parse(u'\uF8D2\uF8D2') Modified: lxml/trunk/src/lxml/xslt.pxi ============================================================================== --- lxml/trunk/src/lxml/xslt.pxi (original) +++ lxml/trunk/src/lxml/xslt.pxi Fri Aug 31 19:07:26 2007 @@ -55,6 +55,12 @@ self._parser = parser self._c_style_doc = NULL + cdef _XSLTResolverContext _copy(self): + cdef _XSLTResolverContext context + context = _XSLTResolverContext(self._parser) + context._c_style_doc = _copyDoc(self._c_style_doc, 1) + return context + cdef xmlDoc* _xslt_resolve_stylesheet(char* c_uri, void* context): cdef xmlDoc* c_doc c_doc = (<_XSLTResolverContext>context)._c_style_doc @@ -337,6 +343,26 @@ """ return str(result_tree) + def __deepcopy__(self, memo): + return self.__copy__() + + def __copy__(self): + cdef XSLT new_xslt + cdef xmlDoc* c_doc + new_xslt = NEW_XSLT(XSLT) + new_xslt._access_control = self._access_control + new_xslt._error_log = _ErrorLog() + new_xslt._context = self._context._copy() + new_xslt._xslt_resolver_context = self._xslt_resolver_context._copy() + + c_doc = _copyDoc(self._c_style.doc, 1) + new_xslt._c_style = xslt.xsltParseStylesheetDoc(c_doc) + if new_xslt._c_style is NULL: + tree.xmlFreeDoc(c_doc) + python.PyErr_NoMemory() + + return new_xslt + def __call__(self, _input, profile_run=False, **_kw): cdef _XSLTContext context cdef _Document input_doc @@ -463,6 +489,10 @@ return c_result +cdef extern from "etree_defs.h": + # macro call to 't->tp_new()' for instantiation without calling __init__() + cdef XSLT NEW_XSLT "PY_NEW" (object t) + cdef class 
_XSLTResultTree(_ElementTree): cdef XSLT _xslt cdef _Document _profile From scoder at codespeak.net Fri Aug 31 19:09:02 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 31 Aug 2007 19:09:02 +0200 (CEST) Subject: [Lxml-checkins] r46222 - lxml/trunk/src/lxml Message-ID: <20070831170902.58D65820D@code0.codespeak.net> Author: scoder Date: Fri Aug 31 19:09:01 2007 New Revision: 46222 Modified: lxml/trunk/src/lxml/objectify.pyx Log: fix: wrong function call for deleting attribute Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Fri Aug 31 19:09:01 2007 @@ -517,7 +517,8 @@ if dict_result is not NULL: cetree.setAttributeValue(element, PYTYPE_ATTRIBUTE, pytype_name) else: - cetree.delAttribute(element, PYTYPE_ATTRIBUTE) + cetree.delAttributeFromNsName(element._c_node, PYTYPE_NAMESPACE, + PYTYPE_ATTRIBUTE_NAME) cetree.setNodeText(element._c_node, value) ################################################################################ From scoder at codespeak.net Fri Aug 31 19:53:04 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 31 Aug 2007 19:53:04 +0200 (CEST) Subject: [Lxml-checkins] r46223 - lxml/trunk/src/lxml Message-ID: <20070831175304.EE4AD81FA@code0.codespeak.net> Author: scoder Date: Fri Aug 31 19:53:03 2007 New Revision: 46223 Modified: lxml/trunk/src/lxml/objectify.pyx Log: apply pytype annotation in objectify.E factory Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Fri Aug 31 19:53:03 2007 @@ -1048,6 +1048,8 @@ cdef object _namespace cdef object _nsmap def __init__(self, namespace=None, nsmap=None, makeelement=None): + if nsmap is None: + nsmap = _DEFAULT_NSMAP self._nsmap = nsmap if namespace is 
None: self._namespace = None @@ -1079,6 +1081,8 @@ cdef python.PyObject* pytype cdef _Element element cdef _Element childElement + cdef int has_children + cdef int has_string_value if self._element_factory is None: element = cetree.makeElement( self._tag, None, objectify_parser, @@ -1086,6 +1090,8 @@ else: element = self._element_factory(self._tag, attrib, self._nsmap) + has_children = 0 + has_string_value = 0 for child in children: if child is None: if python.PyTuple_GET_SIZE(children) == 1: @@ -1093,26 +1099,40 @@ element, XML_SCHEMA_INSTANCE_NIL_ATTR, "true") elif python._isString(child): _add_text(element, child) + has_string_value = 1 elif isinstance(child, _Element): cetree.appendChild(element, <_Element>child) + has_children = 1 elif isinstance(child, _ObjectifyElementMakerCaller): elementMaker = <_ObjectifyElementMakerCaller>child if elementMaker._element_factory is None: cetree.makeSubElement(element, elementMaker._tag, - None, None, None, None) + None, None, None) else: childElement = elementMaker._element_factory( elementMaker._tag) cetree.appendChild(element, childElement) + has_children = 1 else: - pytype = python.PyDict_GetItem( - _PYTYPE_DICT, _typename(child)) + if pytype_name is not None: + # concatenation makes the result a string + has_string_value = 1 + pytype_name = _typename(child) + pytype = python.PyDict_GetItem(_PYTYPE_DICT, pytype_name) if pytype is not NULL: (pytype)._add_text(element, child) else: + has_string_value = 1 child = str(child) _add_text(element, child) + if not has_children: + if has_string_value: + cetree.setAttributeValue(element, PYTYPE_ATTRIBUTE, "str") + elif pytype_name is not None: + cetree.setAttributeValue(element, PYTYPE_ATTRIBUTE, + pytype_name) + return element cdef _add_text(_Element elem, text): From scoder at codespeak.net Fri Aug 31 19:56:08 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 31 Aug 2007 19:56:08 +0200 (CEST) Subject: [Lxml-checkins] r46224 - lxml/trunk/src/lxml Message-ID: 
<20070831175608.B19ED81FA@code0.codespeak.net>

Author: scoder
Date: Fri Aug 31 19:56:06 2007
New Revision: 46224

Modified:
   lxml/trunk/src/lxml/objectify.pyx
Log:
fix

Modified: lxml/trunk/src/lxml/objectify.pyx
==============================================================================
--- lxml/trunk/src/lxml/objectify.pyx (original)
+++ lxml/trunk/src/lxml/objectify.pyx Fri Aug 31 19:56:06 2007
@@ -1107,7 +1107,7 @@
                 elementMaker = <_ObjectifyElementMakerCaller>child
                 if elementMaker._element_factory is None:
                     cetree.makeSubElement(element, elementMaker._tag,
-                                          None, None, None)
+                                          None, None, None, None)
                 else:
                     childElement = elementMaker._element_factory(
                         elementMaker._tag)

From scoder at codespeak.net Fri Aug 31 21:32:28 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 31 Aug 2007 21:32:28 +0200 (CEST)
Subject: [Lxml-checkins] r46227 - in lxml/trunk/src/lxml: . tests
Message-ID: <20070831193228.D156981DB@code0.codespeak.net>

Author: scoder
Date: Fri Aug 31 21:32:27 2007
New Revision: 46227

Modified:
   lxml/trunk/src/lxml/objectify.pyx
   lxml/trunk/src/lxml/tests/test_objectify.py
Log:
fix namespace setup of objectify.E factory

Modified: lxml/trunk/src/lxml/objectify.pyx
==============================================================================
--- lxml/trunk/src/lxml/objectify.pyx (original)
+++ lxml/trunk/src/lxml/objectify.pyx Fri Aug 31 21:32:27 2007
@@ -1084,9 +1084,7 @@
         cdef int has_children
         cdef int has_string_value
         if self._element_factory is None:
-            element = cetree.makeElement(
-                self._tag, None, objectify_parser,
-                None, None, attrib, self._nsmap)
+            element = _makeElement(self._tag, None, attrib, self._nsmap)
         else:
             element = self._element_factory(self._tag, attrib, self._nsmap)
 
@@ -1917,13 +1915,13 @@
         parser = objectify_parser
     return _parse(f, parser)
 
-E = ElementMaker()
-
 cdef object _DEFAULT_NSMAP
 _DEFAULT_NSMAP = { "py" : PYTYPE_NAMESPACE,
                    "xsi" : XML_SCHEMA_INSTANCE_NS,
                    "xsd" : XML_SCHEMA_NS}
 
+E = ElementMaker()
+
 def Element(_tag, attrib=None, nsmap=None, _pytype=None, **_attributes):
     """Objectify specific version of the lxml.etree Element() factory that
     always creates a structural (tree) element.

Modified: lxml/trunk/src/lxml/tests/test_objectify.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_objectify.py (original)
+++ lxml/trunk/src/lxml/tests/test_objectify.py Fri Aug 31 21:32:27 2007
@@ -1680,7 +1680,7 @@
     def test_efactory_long(self):
         E = objectify.E
         root = E.root(E.val(23L))
-        self.assert_(isinstance(root.val, objectify.IntElement))
+        self.assert_(isinstance(root.val, objectify.LongElement))
 
     def test_efactory_float(self):
         E = objectify.E

From scoder at codespeak.net Fri Aug 31 21:49:11 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 31 Aug 2007 21:49:11 +0200 (CEST)
Subject: [Lxml-checkins] r46228 - lxml/trunk/src/lxml
Message-ID: <20070831194911.90990820D@code0.codespeak.net>

Author: scoder
Date: Fri Aug 31 21:49:06 2007
New Revision: 46228

Modified:
   lxml/trunk/src/lxml/objectify.pyx
Log:
annotate with the original type in objectify.DataElement if no type name was passed explicitly

Modified: lxml/trunk/src/lxml/objectify.pyx
==============================================================================
--- lxml/trunk/src/lxml/objectify.pyx (original)
+++ lxml/trunk/src/lxml/objectify.pyx Fri Aug 31 21:49:06 2007
@@ -2030,28 +2030,20 @@
         strval = str(_value)
 
     if _pytype is None:
-        for type_check, pytype in _TYPE_CHECKS:
-            try:
-                type_check(strval)
-                _pytype = (pytype).name
-                break
-            except IGNORABLE_ERRORS:
-                pass
-        if _pytype is None:
-            _pytype = "str"
-    else:
-        # check if type information from arguments is valid
-        dict_result = python.PyDict_GetItem(_PYTYPE_DICT, _pytype)
-        if dict_result is not NULL:
-            type_check = (dict_result).type_check
-            if type_check is not None:
-                type_check(strval)
-
+        _pytype = _typename(_value)
+    if _pytype is not None:
         if _pytype == "NoneType" or _pytype == "none":
             strval = None
             python.PyDict_SetItem(_attributes, XML_SCHEMA_INSTANCE_NIL_ATTR, "true")
         else:
-            python.PyDict_SetItem(_attributes, PYTYPE_ATTRIBUTE, _pytype)
+            # check if type information from arguments is valid
+            dict_result = python.PyDict_GetItem(_PYTYPE_DICT, _pytype)
+            if dict_result is not NULL:
+                type_check = (dict_result).type_check
+                if type_check is not None:
+                    type_check(strval)
+
+                python.PyDict_SetItem(_attributes, PYTYPE_ATTRIBUTE, _pytype)
 
     return _makeElement("value", strval, _attributes, nsmap)
 

From scoder at codespeak.net Fri Aug 31 21:49:30 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 31 Aug 2007 21:49:30 +0200 (CEST)
Subject: [Lxml-checkins] r46229 - lxml/trunk/src/lxml/tests
Message-ID: <20070831194930.01C87820E@code0.codespeak.net>

Author: scoder
Date: Fri Aug 31 21:49:29 2007
New Revision: 46229

Modified:
   lxml/trunk/src/lxml/tests/test_objectify.py
Log:
some more test cases for objectify

Modified: lxml/trunk/src/lxml/tests/test_objectify.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_objectify.py (original)
+++ lxml/trunk/src/lxml/tests/test_objectify.py Fri Aug 31 21:49:29 2007
@@ -560,36 +560,60 @@
         Element = self.Element
         SubElement = self.etree.SubElement
         root = Element("{objectified}root")
-        root.none = "test"
-        self.assert_(isinstance(root.none, objectify.StringElement))
+        root.s = "test"
+        self.assert_(isinstance(root.s, objectify.StringElement))
+
+    def test_type_str_intliteral(self):
+        Element = self.Element
+        SubElement = self.etree.SubElement
+        root = Element("{objectified}root")
+        root.s = "3"
+        self.assert_(isinstance(root.s, objectify.StringElement))
+
+    def test_type_str_floatliteral(self):
+        Element = self.Element
+        SubElement = self.etree.SubElement
+        root = Element("{objectified}root")
+        root.s = "3.72"
+        self.assert_(isinstance(root.s, objectify.StringElement))
 
     def test_type_str_mul(self):
         Element = self.Element
         SubElement = self.etree.SubElement
         root = Element("{objectified}root")
-        root.none = "test"
+        root.s = "test"
 
-        self.assertEquals("test" * 5, root.none * 5)
-        self.assertEquals(5 * "test", 5 * root.none)
+        self.assertEquals("test" * 5, root.s * 5)
+        self.assertEquals(5 * "test", 5 * root.s)
 
-        self.assertRaises(TypeError, operator.mul, root.none, "honk")
-        self.assertRaises(TypeError, operator.mul, "honk", root.none)
+        self.assertRaises(TypeError, operator.mul, root.s, "honk")
+        self.assertRaises(TypeError, operator.mul, "honk", root.s)
 
     def test_type_str_add(self):
         Element = self.Element
         SubElement = self.etree.SubElement
         root = Element("{objectified}root")
-        root.none = "test"
+        root.s = "test"
 
         s = "toast"
-        self.assertEquals("test" + s, root.none + s)
-        self.assertEquals(s + "test", s + root.none)
+        self.assertEquals("test" + s, root.s + s)
+        self.assertEquals(s + "test", s + root.s)
 
     def test_data_element_str(self):
         value = objectify.DataElement("test")
         self.assert_(isinstance(value, objectify.StringElement))
         self.assertEquals(value, "test")
 
+    def test_data_element_str_intliteral(self):
+        value = objectify.DataElement("3")
+        self.assert_(isinstance(value, objectify.StringElement))
+        self.assertEquals(value, "3")
+
+    def test_data_element_str_floatliteral(self):
+        value = objectify.DataElement("3.20")
+        self.assert_(isinstance(value, objectify.StringElement))
+        self.assertEquals(value, "3.20")
+
     def test_type_int(self):
         Element = self.Element
         SubElement = self.etree.SubElement
@@ -669,19 +693,25 @@
                      % (pyval, pytype, type(value), objclass))
         self.assertEquals(value.text, None)
         self.assertEquals(value.pyval, None)
-
-    def test_data_element_pytype_none_compat(self):
-        # pre-2.0 lxml called NoneElement "none"
-        pyval = 1
-        pytype = "none"
-        objclass = objectify.NoneElement
-        value = objectify.DataElement(pyval, _pytype=pytype)
-        self.assert_(isinstance(value, objclass),
-                     "DataElement(%s, _pytype='%s') returns %s, expected %s"
-                     % (pyval, pytype, type(value), objclass))
-        self.assertEquals(value.text, None)
-        self.assertEquals(value.pyval, None)
-
+
+    def test_type_unregistered(self):
+        Element = self.Element
+        SubElement = self.etree.SubElement
+        class MyFloat(float):
+            pass
+        root = Element("{objectified}root")
+        root.myfloat = MyFloat(5.5)
+        self.assert_(isinstance(root.myfloat, objectify.FloatElement))
+        self.assertEquals(root.myfloat.get(objectify.PYTYPE_ATTRIBUTE), None)
+
+    def test_data_element_unregistered(self):
+        class MyFloat(float):
+            pass
+        value = objectify.DataElement(MyFloat(5.5))
+        self.assert_(isinstance(value, objectify.FloatElement))
+        self.assertEquals(value, 5.5)
+        self.assertEquals(value.get(objectify.PYTYPE_ATTRIBUTE), None)
+
     def test_schema_types(self):
         XML = self.XML
         root = XML('''\