From scoder at codespeak.net Wed Aug 1 09:22:29 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 1 Aug 2007 09:22:29 +0200 (CEST) Subject: [Lxml-checkins] r45445 - lxml/trunk/doc Message-ID: <20070801072229.C7B85807F@code0.codespeak.net> Author: scoder Date: Wed Aug 1 09:22:28 2007 New Revision: 45445 Modified: lxml/trunk/doc/main.txt Log: cheeseshop -> pypi Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Wed Aug 1 09:22:28 2007 @@ -129,10 +129,10 @@ -------- The best way to download binary versions is to visit `lxml at the Python -cheeseshop`_. It has the source, eggs and installers for various platforms. +Package Index`_. It has the source, eggs and installers for various platforms. The source distribution is signed with `this key`_. -.. _`lxml at the Python cheeseshop`: http://cheeseshop.python.org/pypi/lxml/ +.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ .. _`this key`: pubkey.asc The latest version is `lxml 1.3.2`_, released 2007-07-03 (`changes for 1.3.2`_). From scoder at codespeak.net Wed Aug 1 09:23:21 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 1 Aug 2007 09:23:21 +0200 (CEST) Subject: [Lxml-checkins] r45446 - lxml/branch/lxml-1.3/doc Message-ID: <20070801072321.B27F580DA@code0.codespeak.net> Author: scoder Date: Wed Aug 1 09:23:20 2007 New Revision: 45446 Modified: lxml/branch/lxml-1.3/doc/main.txt Log: cheeseshop -> pypi Modified: lxml/branch/lxml-1.3/doc/main.txt ============================================================================== --- lxml/branch/lxml-1.3/doc/main.txt (original) +++ lxml/branch/lxml-1.3/doc/main.txt Wed Aug 1 09:23:20 2007 @@ -124,10 +124,10 @@ -------- The best way to download binary versions is to visit `lxml at the Python -cheeseshop`_. It has the source, eggs and installers for various platforms. +Package Index`_. 
It has the source, eggs and installers for various platforms. The source distribution is signed with `this key`_. -.. _`lxml at the Python cheeseshop`: http://cheeseshop.python.org/pypi/lxml/ +.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ .. _`this key`: pubkey.asc The latest version is `lxml 1.3.3`_, released 2007-07-26 (`changes for 1.3.3`_). From scoder at codespeak.net Fri Aug 3 00:10:29 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 3 Aug 2007 00:10:29 +0200 (CEST) Subject: [Lxml-checkins] r45470 - lxml/trunk/doc Message-ID: <20070802221029.60BCD8103@code0.codespeak.net> Author: scoder Date: Fri Aug 3 00:10:28 2007 New Revision: 45470 Modified: lxml/trunk/doc/build.txt Log: Cython instead of Pyrex Modified: lxml/trunk/doc/build.txt ============================================================================== --- lxml/trunk/doc/build.txt (original) +++ lxml/trunk/doc/build.txt Fri Aug 3 00:10:28 2007 @@ -11,7 +11,7 @@ .. contents:: .. - 1 Pyrex + 1 Cython 2 Subversion 3 Setuptools 4 Running the tests and reporting errors @@ -20,53 +20,22 @@ 7 Building Debian packages from SVN sources -Pyrex ------ +Cython +------ -The lxml.etree and lxml.objectify modules are written in Pyrex_. Since we -distribute the Pyrex-generated .c files with lxml releases, however, you do -not need Pyrex to build lxml from the normal release sources. +The lxml.etree and lxml.objectify modules are written in Cython_. Since we +distribute the Cython-generated .c files with lxml releases, however, you do +not need Cython to build lxml from the normal release sources. -.. _Pyrex: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/ +.. _Cython: http://www.cython.org If you are interested in building lxml from a Subversion checkout or want to -be an lxml developer, you do need a working Pyrex installation. +be an lxml developer, you do need a working Cython installation. 
You can use +EasyInstall_ to install it:: -* lxml 1.1 and later + easy_install Cython - Newer versions of lxml depend on features and bug fixes that are not yet - available in an official Pyrex release. This includes support for the - external C-API of lxml.etree, for Python 2.5 and for 64 bit architectures. - - To build lxml 1.1 and later from non-release or modified sources, you must - therefore use an updated Pyrex version from here: - - http://codespeak.net/svn/lxml/pyrex/ - - A subversion checkout of lxml will automatically retrieve the latest Pyrex - as external project source (``svn:externals``). Look for the ``Pyrex`` - directory in the source tree. - - Since version 1.1.2, the lxml source distribution also includes this Pyrex - version. It will be used if the ``Pyrex`` directory is available in the - lxml root directory. If you install from SVN or delete this directory from - the unpacked distribution directory, the normally installed Pyrex version - will be used. - -* lxml 1.0 and earlier - - The 1.0 series build with a standard installation of Pyrex 0.9.4.1. Note - that Pyrex up to and including version 0.9.4 has known problems when - compiling lxml with gcc 4.x or Python 2.4. Do not use it. If you want to - build lxml from non-release sources, please install Pyrex version 0.9.4.1 or - later. - - Pyrex now supports EasyInstall_, so you can install it by running the - following command as super-user:: - - easy_install Pyrex - - .. _EasyInstall: http://peak.telecommunity.com/DevCenter/EasyInstall +.. _EasyInstall: http://peak.telecommunity.com/DevCenter/EasyInstall Subversion @@ -167,7 +136,8 @@ This is the procedure to make an lxml egg for your platform: * Download the lxml-x.y.tar.gz release. This contains the pregenerated C so - that you don't run into any Pyrex issues. Unpack it and cd into it. + that you can be sure you build exactly from the release sources. Unpack + them and cd into the resulting directory. 
* python setup.py build From scoder at codespeak.net Sat Aug 11 19:13:17 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 11 Aug 2007 19:13:17 +0200 (CEST) Subject: [Lxml-checkins] r45603 - lxml/trunk Message-ID: <20070811171317.6DE2A81B6@code0.codespeak.net> Author: scoder Date: Sat Aug 11 19:13:15 2007 New Revision: 45603 Modified: lxml/trunk/setupinfo.py Log: cleanup Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Sat Aug 11 19:13:15 2007 @@ -22,7 +22,6 @@ ("pyclasslookup", "lxml.pyclasslookup") ] - def env_var(name): value = os.getenv(name, '') return value.split(os.pathsep) From scoder at codespeak.net Mon Aug 13 14:53:19 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 14:53:19 +0200 (CEST) Subject: [Lxml-checkins] r45622 - in lxml/trunk/src/lxml: . 
tests Message-ID: <20070813125319.ACAD18185@code0.codespeak.net> Author: scoder Date: Mon Aug 13 14:53:17 2007 New Revision: 45622 Modified: lxml/trunk/src/lxml/etree.pyx lxml/trunk/src/lxml/serializer.pxi lxml/trunk/src/lxml/tests/test_etree.py lxml/trunk/src/lxml/tree.pxd Log: let DTDs that get parsed in also go out if serialising an ElementTree Modified: lxml/trunk/src/lxml/etree.pyx ============================================================================== --- lxml/trunk/src/lxml/etree.pyx (original) +++ lxml/trunk/src/lxml/etree.pyx Mon Aug 13 14:53:17 2007 @@ -267,22 +267,22 @@ return _elementFactory(self, c_node) cdef getdoctype(self): - cdef tree.xmlDtd* dtd + cdef tree.xmlDtd* c_dtd cdef xmlNode* c_root_node public_id = None sys_url = None - dtd = self._c_doc.intSubset - if dtd is not NULL: - if dtd.ExternalID is not NULL: - public_id = funicode(dtd.ExternalID) - if dtd.SystemID is not NULL: - sys_url = funicode(dtd.SystemID) - dtd = self._c_doc.extSubset - if dtd is not NULL: - if not public_id and dtd.ExternalID is not NULL: - public_id = funicode(dtd.ExternalID) - if not sys_url and dtd.SystemID is not NULL: - sys_url = funicode(dtd.SystemID) + c_dtd = self._c_doc.intSubset + if c_dtd is not NULL: + if c_dtd.ExternalID is not NULL: + public_id = funicode(c_dtd.ExternalID) + if c_dtd.SystemID is not NULL: + sys_url = funicode(c_dtd.SystemID) + c_dtd = self._c_doc.extSubset + if c_dtd is not NULL: + if not public_id and c_dtd.ExternalID is not NULL: + public_id = funicode(c_dtd.ExternalID) + if not sys_url and c_dtd.SystemID is not NULL: + sys_url = funicode(c_dtd.SystemID) c_root_node = tree.xmlDocGetRootElement(self._c_doc) if c_root_node is NULL: root_name = None @@ -1329,7 +1329,7 @@ c_write_declaration = encoding not in \ ('US-ASCII', 'ASCII', 'UTF8', 'UTF-8') _tofilelike(file, self._context_node, encoding, - c_write_declaration, bool(pretty_print)) + c_write_declaration, 1, bool(pretty_print)) def getpath(self, _Element element not None): 
"""Returns a structural, absolute XPath expression to find that element. @@ -2061,10 +2061,10 @@ if isinstance(element_or_tree, _Element): return _tostring(<_Element>element_or_tree, - encoding, write_declaration, c_pretty_print) + encoding, write_declaration, 0, c_pretty_print) elif isinstance(element_or_tree, _ElementTree): return _tostring((<_ElementTree>element_or_tree)._context_node, - encoding, write_declaration, c_pretty_print) + encoding, write_declaration, 1, c_pretty_print) else: raise TypeError, "Type '%s' cannot be serialized." % type(element_or_tree) @@ -2081,10 +2081,10 @@ cdef int c_pretty_print c_pretty_print = bool(pretty_print) if isinstance(element_or_tree, _Element): - return _tounicode(<_Element>element_or_tree, c_pretty_print) + return _tounicode(<_Element>element_or_tree, 0, c_pretty_print) elif isinstance(element_or_tree, _ElementTree): return _tounicode((<_ElementTree>element_or_tree)._context_node, - c_pretty_print) + 1, c_pretty_print) else: raise TypeError, "Type '%s' cannot be serialized." % type(element_or_tree) Modified: lxml/trunk/src/lxml/serializer.pxi ============================================================================== --- lxml/trunk/src/lxml/serializer.pxi (original) +++ lxml/trunk/src/lxml/serializer.pxi Mon Aug 13 14:53:17 2007 @@ -1,7 +1,7 @@ # XML serialization and output functions cdef _tostring(_Element element, encoding, - int write_xml_declaration, int pretty_print): + int write_xml_declaration, int write_doctype, int pretty_print): "Serialize an element to an encoded string representation of its XML tree." 
cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -29,7 +29,8 @@ try: state = python.PyEval_SaveThread() _writeNodeToBuffer(c_buffer, element._c_node, c_enc, - write_xml_declaration, pretty_print) + write_xml_declaration, write_doctype, + pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) if c_buffer.conv is not NULL: @@ -43,7 +44,7 @@ tree.xmlOutputBufferClose(c_buffer) return result -cdef _tounicode(_Element element, int pretty_print): +cdef _tounicode(_Element element, int write_doctype, int pretty_print): "Serialize an element to the Python unicode representation of its XML tree." cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -55,7 +56,8 @@ raise LxmlError, "Failed to create output buffer" try: state = python.PyEval_SaveThread() - _writeNodeToBuffer(c_buffer, element._c_node, NULL, 0, pretty_print) + _writeNodeToBuffer(c_buffer, element._c_node, NULL, 0, + write_doctype, pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) if c_buffer.conv is not NULL: @@ -72,12 +74,15 @@ cdef void _writeNodeToBuffer(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node, char* encoding, - int write_xml_declaration, int pretty_print): + int write_xml_declaration, int write_doctype, + int pretty_print): cdef xmlDoc* c_doc c_doc = c_node.doc if write_xml_declaration: _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding) + if write_doctype: + _writeDtdToBuffer(c_buffer, c_doc, c_node.name, encoding) _writePrevSiblings(c_buffer, c_node, encoding, pretty_print) tree.xmlNodeDumpOutput(c_buffer, c_doc, c_node, 0, pretty_print, encoding) _writeTail(c_buffer, c_node, encoding, pretty_print) @@ -93,6 +98,41 @@ tree.xmlOutputBufferWriteString(c_buffer, encoding) tree.xmlOutputBufferWriteString(c_buffer, "'?>\n") +cdef void _writeDtdToBuffer(tree.xmlOutputBuffer* c_buffer, + xmlDoc* c_doc, char* c_root_name, char* encoding): + cdef tree.xmlDtd* c_dtd + cdef xmlNode* c_node + 
c_dtd = c_doc.intSubset + if c_dtd == NULL or c_dtd.name == NULL: + return + if c_dtd.ExternalID == NULL and c_dtd.SystemID == NULL: + return + if cstd.strcmp(c_root_name, c_dtd.name) != 0: + return + tree.xmlOutputBufferWrite(c_buffer, 10, "\n') + return + tree.xmlOutputBufferWrite(c_buffer, 4, '" [\n') + if c_dtd.notations != NULL: + tree.xmlDumpNotationTable(c_buffer.buffer, + c_dtd.notations) + c_node = c_dtd.children + while c_node is not NULL: + tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_node, 0, 0, encoding) + c_node = c_node.next + tree.xmlOutputBufferWrite(c_buffer, 3, "]>\n") + cdef void _writeTail(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node, char* encoding, int pretty_print): "Write the element tail." @@ -179,7 +219,8 @@ return (<_FilelikeWriter>ctxt).close() cdef _tofilelike(f, _Element element, encoding, - int write_xml_declaration, int pretty_print): + int write_xml_declaration, int write_doctype, + int pretty_print): cdef python.PyThreadState* state cdef _FilelikeWriter writer cdef tree.xmlOutputBuffer* c_buffer @@ -209,7 +250,7 @@ raise TypeError, "File or filename expected, got '%s'" % type(f) _writeNodeToBuffer(c_buffer, element._c_node, c_enc, - write_xml_declaration, pretty_print) + write_xml_declaration, write_doctype, pretty_print) tree.xmlOutputBufferClose(c_buffer) tree.xmlCharEncCloseFunc(enchandler) if writer is None: Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Mon Aug 13 14:53:17 2007 @@ -1638,6 +1638,20 @@ self.assertEquals(docinfo.system_url, None) self.assertEquals(docinfo.root_name, 'html') self.assertEquals(docinfo.doctype, '') + + def test_dtd_io(self): + # check that DTDs that go in also go back out + xml = '''\ + + + + ]> + test-test\ + ''' + root = self.etree.parse(StringIO(xml)) + self.assertEqual(self.etree.tostring(root).replace(" ", ""), 
+ xml.replace(" ", "")) def test_byte_zero(self): Element = self.etree.Element Modified: lxml/trunk/src/lxml/tree.pxd ============================================================================== --- lxml/trunk/src/lxml/tree.pxd (original) +++ lxml/trunk/src/lxml/tree.pxd Mon Aug 13 14:53:17 2007 @@ -58,7 +58,8 @@ ctypedef struct xmlDoc ctypedef struct xmlAttr - + ctypedef struct xmlNotationTable + ctypedef enum xmlElementType: XML_ELEMENT_NODE= 1 XML_ATTRIBUTE_NODE= 2 @@ -103,8 +104,16 @@ unsigned short line ctypedef struct xmlDtd: + char* name char* ExternalID char* SystemID + void* notations + void* entities + void* pentities + void* attributes + void* elements + xmlNode* children + xmlDoc* doc ctypedef struct xmlDoc: xmlElementType type @@ -152,7 +161,7 @@ xmlDoc* doc ctypedef struct xmlBuffer - + ctypedef struct xmlOutputBuffer: xmlBuffer* buffer xmlBuffer* conv @@ -226,9 +235,12 @@ cdef extern from "libxml/valid.h": cdef xmlAttr* xmlGetID(xmlDoc* doc, char* ID) + cdef void xmlDumpNotationTable(xmlBuffer* buffer, xmlNotationTable* table) cdef extern from "libxml/xmlIO.h": + cdef void xmlBufferWriteQuotedString(xmlOutputBuffer* out, char* str) cdef int xmlOutputBufferWriteString(xmlOutputBuffer* out, char* str) + cdef int xmlOutputBufferWrite(xmlOutputBuffer* out, int len, char* str) cdef int xmlOutputBufferFlush(xmlOutputBuffer* out) cdef int xmlOutputBufferClose(xmlOutputBuffer* out) From scoder at codespeak.net Mon Aug 13 15:11:28 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 15:11:28 +0200 (CEST) Subject: [Lxml-checkins] r45623 - in lxml/branch/lxml-1.3/src/lxml: . 
tests Message-ID: <20070813131128.07A278185@code0.codespeak.net> Author: scoder Date: Mon Aug 13 15:11:28 2007 New Revision: 45623 Modified: lxml/branch/lxml-1.3/src/lxml/etree.pyx lxml/branch/lxml-1.3/src/lxml/serializer.pxi lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py lxml/branch/lxml-1.3/src/lxml/tree.pxd Log: trunk merge: let DTDs that get parsed in also go out if serialising an ElementTree Modified: lxml/branch/lxml-1.3/src/lxml/etree.pyx ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/etree.pyx (original) +++ lxml/branch/lxml-1.3/src/lxml/etree.pyx Mon Aug 13 15:11:28 2007 @@ -254,22 +254,22 @@ return _elementFactory(self, c_node) cdef getdoctype(self): - cdef tree.xmlDtd* dtd + cdef tree.xmlDtd* c_dtd cdef xmlNode* c_root_node public_id = None sys_url = None - dtd = self._c_doc.intSubset - if dtd is not NULL: - if dtd.ExternalID is not NULL: - public_id = funicode(dtd.ExternalID) - if dtd.SystemID is not NULL: - sys_url = funicode(dtd.SystemID) - dtd = self._c_doc.extSubset - if dtd is not NULL: - if not public_id and dtd.ExternalID is not NULL: - public_id = funicode(dtd.ExternalID) - if not sys_url and dtd.SystemID is not NULL: - sys_url = funicode(dtd.SystemID) + c_dtd = self._c_doc.intSubset + if c_dtd is not NULL: + if c_dtd.ExternalID is not NULL: + public_id = funicode(c_dtd.ExternalID) + if c_dtd.SystemID is not NULL: + sys_url = funicode(c_dtd.SystemID) + c_dtd = self._c_doc.extSubset + if c_dtd is not NULL: + if not public_id and c_dtd.ExternalID is not NULL: + public_id = funicode(c_dtd.ExternalID) + if not sys_url and c_dtd.SystemID is not NULL: + sys_url = funicode(c_dtd.SystemID) c_root_node = tree.xmlDocGetRootElement(self._c_doc) if c_root_node is NULL: root_name = None @@ -1278,7 +1278,7 @@ c_write_declaration = encoding not in \ ('US-ASCII', 'ASCII', 'UTF8', 'UTF-8') _tofilelike(file, self._context_node, encoding, - c_write_declaration, bool(pretty_print)) + 
c_write_declaration, 1, bool(pretty_print)) def getpath(self, _Element element not None): """Returns a structural, absolute XPath expression to find that element. @@ -1967,10 +1967,10 @@ if isinstance(element_or_tree, _Element): return _tostring(<_Element>element_or_tree, - encoding, write_declaration, c_pretty_print) + encoding, write_declaration, 0, c_pretty_print) elif isinstance(element_or_tree, _ElementTree): return _tostring((<_ElementTree>element_or_tree)._context_node, - encoding, write_declaration, c_pretty_print) + encoding, write_declaration, 1, c_pretty_print) else: raise TypeError, "Type '%s' cannot be serialized." % type(element_or_tree) @@ -1987,10 +1987,10 @@ cdef int c_pretty_print c_pretty_print = bool(pretty_print) if isinstance(element_or_tree, _Element): - return _tounicode(<_Element>element_or_tree, c_pretty_print) + return _tounicode(<_Element>element_or_tree, 0, c_pretty_print) elif isinstance(element_or_tree, _ElementTree): return _tounicode((<_ElementTree>element_or_tree)._context_node, - c_pretty_print) + 1, c_pretty_print) else: raise TypeError, "Type '%s' cannot be serialized." % type(element_or_tree) Modified: lxml/branch/lxml-1.3/src/lxml/serializer.pxi ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/serializer.pxi (original) +++ lxml/branch/lxml-1.3/src/lxml/serializer.pxi Mon Aug 13 15:11:28 2007 @@ -1,7 +1,7 @@ # XML serialization and output functions cdef _tostring(_Element element, encoding, - int write_xml_declaration, int pretty_print): + int write_xml_declaration, int write_doctype, int pretty_print): "Serialize an element to an encoded string representation of its XML tree." 
cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -29,7 +29,8 @@ try: state = python.PyEval_SaveThread() _writeNodeToBuffer(c_buffer, element._c_node, c_enc, - write_xml_declaration, pretty_print) + write_xml_declaration, write_doctype, + pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) if c_buffer.conv is not NULL: @@ -43,7 +44,7 @@ tree.xmlOutputBufferClose(c_buffer) return result -cdef _tounicode(_Element element, int pretty_print): +cdef _tounicode(_Element element, int write_doctype, int pretty_print): "Serialize an element to the Python unicode representation of its XML tree." cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -55,7 +56,8 @@ raise LxmlError, "Failed to create output buffer" try: state = python.PyEval_SaveThread() - _writeNodeToBuffer(c_buffer, element._c_node, NULL, 0, pretty_print) + _writeNodeToBuffer(c_buffer, element._c_node, NULL, 0, + write_doctype, pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) if c_buffer.conv is not NULL: @@ -72,12 +74,15 @@ cdef void _writeNodeToBuffer(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node, char* encoding, - int write_xml_declaration, int pretty_print): + int write_xml_declaration, int write_doctype, + int pretty_print): cdef xmlDoc* c_doc c_doc = c_node.doc if write_xml_declaration: _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding) + if write_doctype: + _writeDtdToBuffer(c_buffer, c_doc, c_node.name, encoding) _writePrevSiblings(c_buffer, c_node, encoding, pretty_print) tree.xmlNodeDumpOutput(c_buffer, c_doc, c_node, 0, pretty_print, encoding) _writeTail(c_buffer, c_node, encoding, pretty_print) @@ -93,6 +98,41 @@ tree.xmlOutputBufferWriteString(c_buffer, encoding) tree.xmlOutputBufferWriteString(c_buffer, "'?>\n") +cdef void _writeDtdToBuffer(tree.xmlOutputBuffer* c_buffer, + xmlDoc* c_doc, char* c_root_name, char* encoding): + cdef tree.xmlDtd* c_dtd + cdef xmlNode* c_node + 
c_dtd = c_doc.intSubset + if c_dtd == NULL or c_dtd.name == NULL: + return + if c_dtd.ExternalID == NULL and c_dtd.SystemID == NULL: + return + if cstd.strcmp(c_root_name, c_dtd.name) != 0: + return + tree.xmlOutputBufferWrite(c_buffer, 10, "\n') + return + tree.xmlOutputBufferWrite(c_buffer, 4, '" [\n') + if c_dtd.notations != NULL: + tree.xmlDumpNotationTable(c_buffer.buffer, + c_dtd.notations) + c_node = c_dtd.children + while c_node is not NULL: + tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_node, 0, 0, encoding) + c_node = c_node.next + tree.xmlOutputBufferWrite(c_buffer, 3, "]>\n") + cdef void _writeTail(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node, char* encoding, int pretty_print): "Write the element tail." @@ -179,7 +219,8 @@ return (<_FilelikeWriter>ctxt).close() cdef _tofilelike(f, _Element element, encoding, - int write_xml_declaration, int pretty_print): + int write_xml_declaration, int write_doctype, + int pretty_print): cdef python.PyThreadState* state cdef _FilelikeWriter writer cdef tree.xmlOutputBuffer* c_buffer @@ -209,7 +250,7 @@ raise TypeError, "File or filename expected, got '%s'" % type(f) _writeNodeToBuffer(c_buffer, element._c_node, c_enc, - write_xml_declaration, pretty_print) + write_xml_declaration, write_doctype, pretty_print) tree.xmlOutputBufferClose(c_buffer) tree.xmlCharEncCloseFunc(enchandler) if writer is None: Modified: lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py (original) +++ lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py Mon Aug 13 15:11:28 2007 @@ -1502,6 +1502,20 @@ self.assertEquals(docinfo.system_url, None) self.assertEquals(docinfo.root_name, 'html') self.assertEquals(docinfo.doctype, '') + + def test_dtd_io(self): + # check that DTDs that go in also go back out + xml = '''\ + + + + ]> + test-test\ + ''' + root = self.etree.parse(StringIO(xml)) + 
self.assertEqual(self.etree.tostring(root).replace(" ", ""), + xml.replace(" ", "")) def test_byte_zero(self): Element = self.etree.Element Modified: lxml/branch/lxml-1.3/src/lxml/tree.pxd ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/tree.pxd (original) +++ lxml/branch/lxml-1.3/src/lxml/tree.pxd Mon Aug 13 15:11:28 2007 @@ -58,7 +58,8 @@ ctypedef struct xmlDoc ctypedef struct xmlAttr - + ctypedef struct xmlNotationTable + ctypedef enum xmlElementType: XML_ELEMENT_NODE= 1 XML_ATTRIBUTE_NODE= 2 @@ -103,8 +104,16 @@ unsigned short line ctypedef struct xmlDtd: + char* name char* ExternalID char* SystemID + void* notations + void* entities + void* pentities + void* attributes + void* elements + xmlNode* children + xmlDoc* doc ctypedef struct xmlDoc: xmlElementType type @@ -152,7 +161,7 @@ xmlDoc* doc ctypedef struct xmlBuffer - + ctypedef struct xmlOutputBuffer: xmlBuffer* buffer xmlBuffer* conv @@ -223,9 +232,12 @@ cdef extern from "libxml/valid.h": cdef xmlAttr* xmlGetID(xmlDoc* doc, char* ID) + cdef void xmlDumpNotationTable(xmlBuffer* buffer, xmlNotationTable* table) cdef extern from "libxml/xmlIO.h": + cdef void xmlBufferWriteQuotedString(xmlOutputBuffer* out, char* str) cdef int xmlOutputBufferWriteString(xmlOutputBuffer* out, char* str) + cdef int xmlOutputBufferWrite(xmlOutputBuffer* out, int len, char* str) cdef int xmlOutputBufferFlush(xmlOutputBuffer* out) cdef int xmlOutputBufferClose(xmlOutputBuffer* out) From scoder at codespeak.net Mon Aug 13 15:12:54 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 15:12:54 +0200 (CEST) Subject: [Lxml-checkins] r45624 - lxml/trunk/src/lxml Message-ID: <20070813131254.A84BD8189@code0.codespeak.net> Author: scoder Date: Mon Aug 13 15:12:54 2007 New Revision: 45624 Modified: lxml/trunk/src/lxml/serializer.pxi Log: also write comment and PI siblings of the root node only when serialising an ElementTree Modified: 
lxml/trunk/src/lxml/serializer.pxi ============================================================================== --- lxml/trunk/src/lxml/serializer.pxi (original) +++ lxml/trunk/src/lxml/serializer.pxi Mon Aug 13 15:12:54 2007 @@ -1,7 +1,8 @@ # XML serialization and output functions cdef _tostring(_Element element, encoding, - int write_xml_declaration, int write_doctype, int pretty_print): + int write_xml_declaration, int write_complete_document, + int pretty_print): "Serialize an element to an encoded string representation of its XML tree." cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -29,7 +30,7 @@ try: state = python.PyEval_SaveThread() _writeNodeToBuffer(c_buffer, element._c_node, c_enc, - write_xml_declaration, write_doctype, + write_xml_declaration, write_complete_document, pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) @@ -44,7 +45,7 @@ tree.xmlOutputBufferClose(c_buffer) return result -cdef _tounicode(_Element element, int write_doctype, int pretty_print): +cdef _tounicode(_Element element, int write_complete_document, int pretty_print): "Serialize an element to the Python unicode representation of its XML tree." 
cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -57,7 +58,7 @@ try: state = python.PyEval_SaveThread() _writeNodeToBuffer(c_buffer, element._c_node, NULL, 0, - write_doctype, pretty_print) + write_complete_document, pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) if c_buffer.conv is not NULL: @@ -74,19 +75,21 @@ cdef void _writeNodeToBuffer(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node, char* encoding, - int write_xml_declaration, int write_doctype, + int write_xml_declaration, + int write_complete_document, int pretty_print): cdef xmlDoc* c_doc c_doc = c_node.doc - if write_xml_declaration: + if write_complete_document: _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding) - if write_doctype: + if write_complete_document: _writeDtdToBuffer(c_buffer, c_doc, c_node.name, encoding) - _writePrevSiblings(c_buffer, c_node, encoding, pretty_print) + _writePrevSiblings(c_buffer, c_node, encoding, pretty_print) tree.xmlNodeDumpOutput(c_buffer, c_doc, c_node, 0, pretty_print, encoding) _writeTail(c_buffer, c_node, encoding, pretty_print) - _writeNextSiblings(c_buffer, c_node, encoding, pretty_print) + if write_complete_document: + _writeNextSiblings(c_buffer, c_node, encoding, pretty_print) cdef void _writeDeclarationToBuffer(tree.xmlOutputBuffer* c_buffer, char* version, char* encoding): From scoder at codespeak.net Mon Aug 13 16:15:07 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 16:15:07 +0200 (CEST) Subject: [Lxml-checkins] r45626 - lxml/branch/lxml-1.3/src/lxml Message-ID: <20070813141507.988008131@code0.codespeak.net> Author: scoder Date: Mon Aug 13 16:15:06 2007 New Revision: 45626 Modified: lxml/branch/lxml-1.3/src/lxml/serializer.pxi Log: also write comment and PI siblings of the root node only when serialising an ElementTree Modified: lxml/branch/lxml-1.3/src/lxml/serializer.pxi ============================================================================== 
--- lxml/branch/lxml-1.3/src/lxml/serializer.pxi (original) +++ lxml/branch/lxml-1.3/src/lxml/serializer.pxi Mon Aug 13 16:15:06 2007 @@ -1,7 +1,8 @@ # XML serialization and output functions cdef _tostring(_Element element, encoding, - int write_xml_declaration, int write_doctype, int pretty_print): + int write_xml_declaration, int write_complete_document, + int pretty_print): "Serialize an element to an encoded string representation of its XML tree." cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -29,7 +30,7 @@ try: state = python.PyEval_SaveThread() _writeNodeToBuffer(c_buffer, element._c_node, c_enc, - write_xml_declaration, write_doctype, + write_xml_declaration, write_complete_document, pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) @@ -44,7 +45,7 @@ tree.xmlOutputBufferClose(c_buffer) return result -cdef _tounicode(_Element element, int write_doctype, int pretty_print): +cdef _tounicode(_Element element, int write_complete_document, int pretty_print): "Serialize an element to the Python unicode representation of its XML tree." 
cdef python.PyThreadState* state cdef tree.xmlOutputBuffer* c_buffer @@ -57,7 +58,7 @@ try: state = python.PyEval_SaveThread() _writeNodeToBuffer(c_buffer, element._c_node, NULL, 0, - write_doctype, pretty_print) + write_complete_document, pretty_print) tree.xmlOutputBufferFlush(c_buffer) python.PyEval_RestoreThread(state) if c_buffer.conv is not NULL: @@ -74,19 +75,21 @@ cdef void _writeNodeToBuffer(tree.xmlOutputBuffer* c_buffer, xmlNode* c_node, char* encoding, - int write_xml_declaration, int write_doctype, + int write_xml_declaration, + int write_complete_document, int pretty_print): cdef xmlDoc* c_doc c_doc = c_node.doc - if write_xml_declaration: + if write_complete_document: _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding) - if write_doctype: + if write_complete_document: _writeDtdToBuffer(c_buffer, c_doc, c_node.name, encoding) - _writePrevSiblings(c_buffer, c_node, encoding, pretty_print) + _writePrevSiblings(c_buffer, c_node, encoding, pretty_print) tree.xmlNodeDumpOutput(c_buffer, c_doc, c_node, 0, pretty_print, encoding) _writeTail(c_buffer, c_node, encoding, pretty_print) - _writeNextSiblings(c_buffer, c_node, encoding, pretty_print) + if write_complete_document: + _writeNextSiblings(c_buffer, c_node, encoding, pretty_print) cdef void _writeDeclarationToBuffer(tree.xmlOutputBuffer* c_buffer, char* version, char* encoding): From scoder at codespeak.net Mon Aug 13 16:15:20 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 13 Aug 2007 16:15:20 +0200 (CEST) Subject: [Lxml-checkins] r45627 - lxml/trunk Message-ID: <20070813141520.C563F8131@code0.codespeak.net> Author: scoder Date: Mon Aug 13 16:15:20 2007 New Revision: 45627 Modified: lxml/trunk/CHANGES.txt Log: changelog update Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Mon Aug 13 16:15:20 2007 @@ -8,6 +8,10 @@ Features added -------------- 
+* Serialising an ElementTree now includes any internal DTD subsets that are
+  part of the document, as well as comments and PIs that are siblings of the
+  root node.
+
 * Namespace class setup is now local to the ``ElementNamespaceClassLookup``
   instance and no longer global.

@@ -53,6 +57,9 @@
 Other changes
 -------------

+* Serialising an Element no longer includes includes its comment and PI
+  siblings (only ElementTree serialisation includes them).
+
 * ``el.getiterator()`` renamed to ``el.iter()``, following ElementTree 1.3
   - original name is still available as alias

From scoder at codespeak.net Mon Aug 13 16:17:35 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 13 Aug 2007 16:17:35 +0200 (CEST)
Subject: [Lxml-checkins] r45628 - lxml/trunk
Message-ID: <20070813141735.5D0688131@code0.codespeak.net>

Author: scoder
Date: Mon Aug 13 16:17:33 2007
New Revision: 45628

Modified:
   lxml/trunk/CHANGES.txt
Log:
typo

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Mon Aug 13 16:17:33 2007
@@ -57,8 +57,8 @@
 Other changes
 -------------

-* Serialising an Element no longer includes includes its comment and PI
-  siblings (only ElementTree serialisation includes them).
+* Serialising an Element no longer includes its comment and PI siblings (only
+  ElementTree serialisation includes them).
 * ``el.getiterator()`` renamed to ``el.iter()``, following ElementTree 1.3
   - original name is still available as alias

From scoder at codespeak.net Mon Aug 13 16:18:48 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 13 Aug 2007 16:18:48 +0200 (CEST)
Subject: [Lxml-checkins] r45629 - lxml/branch/lxml-1.3
Message-ID: <20070813141848.9A1EE8131@code0.codespeak.net>

Author: scoder
Date: Mon Aug 13 16:18:47 2007
New Revision: 45629

Modified:
   lxml/branch/lxml-1.3/CHANGES.txt
Log:
changelog update

Modified: lxml/branch/lxml-1.3/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-1.3/CHANGES.txt (original)
+++ lxml/branch/lxml-1.3/CHANGES.txt Mon Aug 13 16:18:47 2007
@@ -8,11 +8,21 @@
 Features added
 --------------

+* Serialising an ElementTree now includes any internal DTD subsets that are
+  part of the document, as well as comments and PIs that are siblings of the
+  root node.
+
 Bugs fixed
 ----------

 * Parsing with the ``no_network`` option could fail

+Other changes
+-------------
+
+* Serialising an Element no longer includes its comment and PI siblings (only
+  ElementTree serialisation includes them).
+

 1.3.3 (2007-07-26)
 ==================

From scoder at codespeak.net Mon Aug 13 17:12:29 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 13 Aug 2007 17:12:29 +0200 (CEST)
Subject: [Lxml-checkins] r45630 - lxml/trunk
Message-ID: <20070813151229.68B318163@code0.codespeak.net>

Author: scoder
Date: Mon Aug 13 17:12:28 2007
New Revision: 45630

Modified:
   lxml/trunk/CHANGES.txt
Log:
Changelog cleanup

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Mon Aug 13 17:12:28 2007
@@ -8,10 +8,6 @@
 Features added
 --------------

-* Serialising an ElementTree now includes any internal DTD subsets that are
-  part of the document, as well as comments and PIs that are siblings of the
-  root node.
-
 * Namespace class setup is now local to the ``ElementNamespaceClassLookup``
   instance and no longer global.

@@ -41,8 +37,6 @@
 Bugs fixed
 ----------

-* Parsing with the ``no_network`` option could fail
-
 * lxml.etree did not check tag/attribute names

 * The XML parser did not report undefined entities as error
@@ -57,8 +51,7 @@
 Other changes
 -------------

-* Serialising an Element no longer includes its comment and PI siblings (only
-  ElementTree serialisation includes them).
+* objectify.PyType for None is now called "NoneType"

 * ``el.getiterator()`` renamed to ``el.iter()``, following ElementTree 1.3
   - original name is still available as alias
@@ -71,6 +64,28 @@
 * Network access in parsers disabled by default

+1.3.4 (???)
+==================
+
+* Serialising an ElementTree now includes any internal DTD subsets that are
+  part of the document, as well as comments and PIs that are siblings of the
+  root node.
+
+Features added
+--------------
+
+Bugs fixed
+----------
+
+* Parsing with the ``no_network`` option could fail
+
+Other changes
+-------------
+
+* Serialising an Element no longer includes its comment and PI siblings (only
+  ElementTree serialisation includes them).
+
+
 1.3.3 (2007-07-26)
 ==================

From scoder at codespeak.net Tue Aug 14 11:03:43 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 14 Aug 2007 11:03:43 +0200 (CEST)
Subject: [Lxml-checkins] r45644 - lxml/trunk/doc
Message-ID: <20070814090343.94F598163@code0.codespeak.net>

Author: scoder
Date: Tue Aug 14 11:03:43 2007
New Revision: 45644

Modified:
   lxml/trunk/doc/FAQ.txt
Log:
FAQ entry on trailing .tail's on serialisation

Modified: lxml/trunk/doc/FAQ.txt
==============================================================================
--- lxml/trunk/doc/FAQ.txt (original)
+++ lxml/trunk/doc/FAQ.txt Tue Aug 14 11:03:43 2007
@@ -141,6 +141,30 @@
 .. _threading: #threading

+What about that trailing text on serialised Elements?
+-----------------------------------------------------
+
+The ElementTree tree model defines an Element as a container with a tag name,
+contained text, child Elements and a tail text. This means that whenever you
+serialise an Element, you will get all parts of that Element::
+
+    >>> from lxml import etree
+    >>> root = etree.XML("<root><tag>text<child/></tag>tail</root>")
+    >>> print etree.tostring(root[0])
+    <tag>text<child/></tag>tail
+
+This is a huge simplification for the tree model as it avoids text nodes to
+appear in the list of children and makes access to them quick and simple. So
+this is a benefit in most applications and simplifies many, many XML tree
+algorithms.
+
+However, in document-like XML (and especially HTML), the above result can be
+unexpected to new users and can sometimes require a bit more overhead. A good
+way to deal with this is to use helper functions that copy the Element without
+its tail.
The ``lxml.html`` package also deals with this in a couple of
+places, as most HTML algorithms benefit from a tail-free behaviour.
+
+
 Installation
 ============

From scoder at codespeak.net Tue Aug 14 11:04:05 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 14 Aug 2007 11:04:05 +0200 (CEST)
Subject: [Lxml-checkins] r45645 - lxml/branch/lxml-1.3/doc
Message-ID: <20070814090405.BC23F814F@code0.codespeak.net>

Author: scoder
Date: Tue Aug 14 11:04:05 2007
New Revision: 45645

Modified:
   lxml/branch/lxml-1.3/doc/FAQ.txt
Log:
FAQ entry on trailing .tail's on serialisation

Modified: lxml/branch/lxml-1.3/doc/FAQ.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/FAQ.txt (original)
+++ lxml/branch/lxml-1.3/doc/FAQ.txt Tue Aug 14 11:04:05 2007
@@ -142,6 +142,30 @@
 .. _threading: #threading

+What about that trailing text on serialised Elements?
+-----------------------------------------------------
+
+The ElementTree tree model defines an Element as a container with a tag name,
+contained text, child Elements and a tail text. This means that whenever you
+serialise an Element, you will get all parts of that Element::
+
+    >>> from lxml import etree
+    >>> root = etree.XML("<root><tag>text<child/></tag>tail</root>")
+    >>> print etree.tostring(root[0])
+    <tag>text<child/></tag>tail
+
+This is a huge simplification for the tree model as it avoids text nodes to
+appear in the list of children and makes access to them quick and simple. So
+this is a benefit in most applications and simplifies many, many XML tree
+algorithms.
+
+However, in document-like XML (and especially HTML), the above result can be
+unexpected to new users and can sometimes require a bit more overhead. A good
+way to deal with this is to use helper functions that copy the Element without
+its tail. The ``lxml.html`` package also deals with this in a couple of
+places, as most HTML algorithms benefit from a tail-free behaviour.
+
+
 Installation
 ============

From scoder at codespeak.net Thu Aug 16 08:35:34 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 16 Aug 2007 08:35:34 +0200 (CEST)
Subject: [Lxml-checkins] r45692 - lxml/branch/lxml-1.3/doc
Message-ID: <20070816063534.1BEA680FC@code0.codespeak.net>

Author: scoder
Date: Thu Aug 16 08:35:32 2007
New Revision: 45692

Added:
   lxml/branch/lxml-1.3/doc/tutorial.txt
      - copied, changed from r45630, lxml/trunk/doc/tutorial.txt
Log:
updated Tutorial from trunk

Copied: lxml/branch/lxml-1.3/doc/tutorial.txt (from r45630, lxml/trunk/doc/tutorial.txt)
==============================================================================
--- lxml/trunk/doc/tutorial.txt (original)
+++ lxml/branch/lxml-1.3/doc/tutorial.txt Thu Aug 16 08:35:32 2007
@@ -484,8 +484,8 @@
-One such example is the module ``lxml.html.builder``, which provides a
-vocabulary for HTML.
+One such example is the module ``lxml.html.builder`` in lxml 2.0, which
+provides a vocabulary for HTML.
 ElementPath

From scoder at codespeak.net Thu Aug 16 08:46:04 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 16 Aug 2007 08:46:04 +0200 (CEST)
Subject: [Lxml-checkins] r45693 - lxml/trunk/src/lxml
Message-ID: <20070816064604.7B53E8108@code0.codespeak.net>

Author: scoder
Date: Thu Aug 16 08:46:03 2007
New Revision: 45693

Modified:
   lxml/trunk/src/lxml/serializer.pxi
Log:
fix for DTD serialisation

Modified: lxml/trunk/src/lxml/serializer.pxi
==============================================================================
--- lxml/trunk/src/lxml/serializer.pxi (original)
+++ lxml/trunk/src/lxml/serializer.pxi Thu Aug 16 08:46:03 2007
@@ -80,7 +80,7 @@
                              int pretty_print):
     cdef xmlDoc* c_doc
     c_doc = c_node.doc
-    if write_complete_document:
+    if write_xml_declaration:
         _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding)

     if write_complete_document:

From scoder at codespeak.net Thu Aug 16 08:47:28 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 16 Aug 2007 08:47:28 +0200 (CEST)
Subject: [Lxml-checkins] r45694 - lxml/branch/lxml-1.3/src/lxml
Message-ID: <20070816064728.9D1B1810D@code0.codespeak.net>

Author: scoder
Date: Thu Aug 16 08:47:28 2007
New Revision: 45694

Modified:
   lxml/branch/lxml-1.3/src/lxml/serializer.pxi
Log:
fix for DTD serialisation

Modified: lxml/branch/lxml-1.3/src/lxml/serializer.pxi
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/serializer.pxi (original)
+++ lxml/branch/lxml-1.3/src/lxml/serializer.pxi Thu Aug 16 08:47:28 2007
@@ -80,7 +80,7 @@
                              int pretty_print):
     cdef xmlDoc* c_doc
     c_doc = c_node.doc
-    if write_complete_document:
+    if write_xml_declaration:
         _writeDeclarationToBuffer(c_buffer, c_doc.version, encoding)

     if write_complete_document:

From scoder at codespeak.net Thu Aug 16 09:05:57 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 16 Aug 2007 09:05:57 +0200 (CEST)
Subject: [Lxml-checkins] r45695 -
lxml/trunk/doc
Message-ID: <20070816070557.1FF5D8105@code0.codespeak.net>

Author: scoder
Date: Thu Aug 16 09:05:56 2007
New Revision: 45695

Modified:
   lxml/trunk/doc/tutorial.txt
Log:
extended section on ElementTree serialisation

Modified: lxml/trunk/doc/tutorial.txt
==============================================================================
--- lxml/trunk/doc/tutorial.txt (original)
+++ lxml/trunk/doc/tutorial.txt Thu Aug 16 09:05:56 2007
@@ -332,7 +332,52 @@
 The ElementTree class
 =====================

-An ``ElementTree`` is mainly a wrapper around a tree with a root node.
+An ``ElementTree`` is mainly a document wrapper around a tree with a root
+node. It provides a couple of methods for parsing, serialisation and general
+document handling. One of the bigger differences is that it serialises as a
+complete document, as opposed to a single Element. This includes top-level
+processing instructions and comments, as well as a DOCTYPE and other DTD
+content in the document::
+
+    >>> from StringIO import StringIO
+    >>> tree = etree.parse(StringIO('''\
+    ... <?xml version="1.0"?>
+    ... <!DOCTYPE root SYSTEM "test" [ <!ENTITY tasty "eggs"> ]>
+    ... <root>
+    ...   <a>&tasty;</a>
+    ... </root>
+    ... '''))
+
+    >>> print tree.docinfo.doctype
+    <!DOCTYPE root SYSTEM "test">
+
+    >>> # lxml 1.3.4 and later
+    >>> print etree.tostring(tree)
+    <!DOCTYPE root SYSTEM "test" [
+    <!ENTITY tasty "eggs">
+    ]>
+    <root>
+      <a>eggs</a>
+    </root>
+
+    >>> # lxml 1.3.4 and later
+    >>> print etree.tostring(etree.ElementTree(tree.getroot()))
+    <!DOCTYPE root SYSTEM "test" [
+    <!ENTITY tasty "eggs">
+    ]>
+    <root>
+      <a>eggs</a>
+    </root>
+
+    >>> # ElementTree and lxml <= 1.3.3
+    >>> print etree.tostring(tree.getroot())
+    <root>
+      <a>eggs</a>
+    </root>
+
+Note that this has changed in lxml 1.3.4 to match the behaviour of the
+upcoming lxml 2.0. Before, both would serialise without DTD content, which
+made lxml loose DTD information in an input-output cycle.
 Parsing files and XML literals

From scoder at codespeak.net Thu Aug 16 09:06:26 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 16 Aug 2007 09:06:26 +0200 (CEST)
Subject: [Lxml-checkins] r45696 - lxml/branch/lxml-1.3/doc
Message-ID: <20070816070626.D2CEA8105@code0.codespeak.net>

Author: scoder
Date: Thu Aug 16 09:06:26 2007
New Revision: 45696

Modified:
   lxml/branch/lxml-1.3/doc/tutorial.txt
Log:
extended section on ElementTree serialisation

Modified: lxml/branch/lxml-1.3/doc/tutorial.txt
==============================================================================
--- lxml/branch/lxml-1.3/doc/tutorial.txt (original)
+++ lxml/branch/lxml-1.3/doc/tutorial.txt Thu Aug 16 09:06:26 2007
@@ -332,7 +332,52 @@
 The ElementTree class
 =====================

-An ``ElementTree`` is mainly a wrapper around a tree with a root node.
+An ``ElementTree`` is mainly a document wrapper around a tree with a root
+node. It provides a couple of methods for parsing, serialisation and general
+document handling. One of the bigger differences is that it serialises as a
+complete document, as opposed to a single Element. This includes top-level
+processing instructions and comments, as well as a DOCTYPE and other DTD
+content in the document::
+
+    >>> from StringIO import StringIO
+    >>> tree = etree.parse(StringIO('''\
+    ... <?xml version="1.0"?>
+    ... <!DOCTYPE root SYSTEM "test" [ <!ENTITY tasty "eggs"> ]>
+    ... <root>
+    ...   <a>&tasty;</a>
+    ... </root>
+    ... '''))
+
+    >>> print tree.docinfo.doctype
+    <!DOCTYPE root SYSTEM "test">
+
+    >>> # lxml 1.3.4 and later
+    >>> print etree.tostring(tree)
+    <!DOCTYPE root SYSTEM "test" [
+    <!ENTITY tasty "eggs">
+    ]>
+    <root>
+      <a>eggs</a>
+    </root>
+
+    >>> # lxml 1.3.4 and later
+    >>> print etree.tostring(etree.ElementTree(tree.getroot()))
+    <!DOCTYPE root SYSTEM "test" [
+    <!ENTITY tasty "eggs">
+    ]>
+    <root>
+      <a>eggs</a>
+    </root>
+
+    >>> # ElementTree and lxml <= 1.3.3
+    >>> print etree.tostring(tree.getroot())
+    <root>
+      <a>eggs</a>
+    </root>
+
+Note that this has changed in lxml 1.3.4 to match the behaviour of the
+upcoming lxml 2.0. Before, both would serialise without DTD content, which
+made lxml loose DTD information in an input-output cycle.
 Parsing files and XML literals

From scoder at codespeak.net Thu Aug 16 22:38:39 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 16 Aug 2007 22:38:39 +0200 (CEST)
Subject: [Lxml-checkins] r45755 - in lxml/trunk: . src/lxml src/lxml/tests
Message-ID: <20070816203839.8A6388175@code0.codespeak.net>

Author: scoder
Date: Thu Aug 16 22:38:39 2007
New Revision: 45755

Modified:
   lxml/trunk/CHANGES.txt
   lxml/trunk/src/lxml/dtd.pxi
   lxml/trunk/src/lxml/etree.pyx
   lxml/trunk/src/lxml/tests/test_dtd.py
   lxml/trunk/src/lxml/tree.pxd
Log:
support for retrieving the DTD defined internally in a document for validation

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Thu Aug 16 22:38:39 2007
@@ -8,6 +8,10 @@
 Features added
 --------------

+* The ``docinfo`` on ElementTree objects has new properties ``internalDTD``
+  and ``externalDTD`` that return a DTD object for the internal or external
+  subset of the document respectively.
+
 * Namespace class setup is now local to the ``ElementNamespaceClassLookup``
   instance and no longer global.

Modified: lxml/trunk/src/lxml/dtd.pxi
==============================================================================
--- lxml/trunk/src/lxml/dtd.pxi (original)
+++ lxml/trunk/src/lxml/dtd.pxi Thu Aug 16 22:38:39 2007
@@ -99,3 +99,19 @@
     if c_dtd is NULL:
         raise DTDParseError, "error parsing DTD"
     return c_dtd
+
+cdef extern from "etree_defs.h":
+    # macro call to 't->tp_new()' for fast instantiation
+    cdef DTD NEW_DTD "PY_NEW" (object t)
+
+cdef DTD _dtdFactory(tree.xmlDtd* c_dtd):
+    # do not run through DTD.__init__()!
+    cdef DTD dtd
+    if c_dtd is NULL:
+        return None
+    dtd = NEW_DTD(DTD)
+    dtd._c_dtd = tree.xmlCopyDtd(c_dtd)
+    if dtd._c_dtd is NULL:
+        python.PyErr_NoMemory()
+    _Validator.__init__(dtd)
+    return dtd

Modified: lxml/trunk/src/lxml/etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/etree.pyx (original)
+++ lxml/trunk/src/lxml/etree.pyx Thu Aug 16 22:38:39 2007
@@ -397,37 +397,76 @@
 cdef class DocInfo:
     "Document information provided by parser and DTD."
-    cdef readonly object root_name
-    cdef readonly object public_id
-    cdef readonly object system_url
-    cdef readonly object xml_version
-    cdef readonly object encoding
-    cdef readonly object URL
+    cdef _Document _doc
     def __init__(self, tree):
         "Create a DocInfo object for an ElementTree object or root Element."
-        cdef _Document doc
-        doc = _documentOrRaise(tree)
-        self.root_name, self.public_id, self.system_url = doc.getdoctype()
-        if not self.root_name and (self.public_id or self.system_url):
+        self._doc = _documentOrRaise(tree)
+        root_name, public_id, system_url = self._doc.getdoctype()
+        if not root_name and (public_id or system_url):
             raise ValueError, "Could not find root node"
-        self.xml_version, self.encoding = doc.getxmlinfo()
-        self.URL = doc.getURL()
+
+    property root_name:
+        "Returns the name of the root node as defined by the DOCTYPE."
+        def __get__(self):
+            root_name, public_id, system_url = self._doc.getdoctype()
+            return root_name
+
+    property public_id:
+        "Returns the public ID of the DOCTYPE."
+        def __get__(self):
+            root_name, public_id, system_url = self._doc.getdoctype()
+            return public_id
+
+    property system_url:
+        "Returns the system ID of the DOCTYPE."
+        def __get__(self):
+            root_name, public_id, system_url = self._doc.getdoctype()
+            return system_url
+
+    property xml_version:
+        "Returns the XML version as declared by the document."
+        def __get__(self):
+            xml_version, encoding = self._doc.getxmlinfo()
+            return xml_version
+
+    property encoding:
+        "Returns the encoding name as declared by the document."
+        def __get__(self):
+            xml_version, encoding = self._doc.getxmlinfo()
+            return encoding
+
+    property URL:
+        "Returns the source URL of the document (or None if unknown)."
+        def __get__(self):
+            return self._doc.getURL()

     property doctype:
+        "Returns a DOCTYPE declaration string for the document."
         def __get__(self):
-            if self.public_id:
-                if self.system_url:
+            root_name, public_id, system_url = self._doc.getdoctype()
+            if public_id:
+                if system_url:
                     return '<!DOCTYPE %s PUBLIC "%s" "%s">' % (
-                        self.root_name, self.public_id, self.system_url)
+                        root_name, public_id, system_url)
                 else:
                     return '<!DOCTYPE %s PUBLIC "%s">' % (
-                        self.root_name, self.public_id)
-            elif self.system_url:
+                        root_name, public_id)
+            elif system_url:
                 return '<!DOCTYPE %s SYSTEM "%s">' % (
-                    self.root_name, self.system_url)
+                    root_name, system_url)
             else:
                 return ""

+    property internalDTD:
+        "Returns a DTD validator based on the internal subset of the document."
+        def __get__(self):
+            return _dtdFactory(self._doc._c_doc.intSubset)
+
+    property externalDTD:
+        "Returns a DTD validator based on the external subset of the document."
+        def __get__(self):
+            return _dtdFactory(self._doc._c_doc.extSubset)
+

 cdef public class _Element [ type LxmlElementType, object LxmlElement ]:
     """Element class. References a document object and a libxml node.
Modified: lxml/trunk/src/lxml/tests/test_dtd.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_dtd.py (original)
+++ lxml/trunk/src/lxml/tests/test_dtd.py Thu Aug 16 22:38:39 2007
@@ -36,6 +36,31 @@
         dtd = etree.DTD(StringIO(""))
         dtd.assertValid(root)

+    def test_dtd_internal(self):
+        root = etree.XML('''
+
+
+        ]>
+
+        ''')
+        dtd = etree.ElementTree(root).docinfo.internalDTD
+        self.assert_(dtd)
+        dtd.assertValid(root)
+
+    def test_dtd_internal_invalid(self):
+        root = etree.XML('''
+
+
+
+        ]>
+
+        ''')
+        dtd = etree.ElementTree(root).docinfo.internalDTD
+        self.assert_(dtd)
+        self.assertFalse(dtd.validate(root))
+
     def test_dtd_broken(self):
         self.assertRaises(etree.DTDParseError, etree.DTD,
                           StringIO(""))

Modified: lxml/trunk/src/lxml/tree.pxd
==============================================================================
--- lxml/trunk/src/lxml/tree.pxd (original)
+++ lxml/trunk/src/lxml/tree.pxd Thu Aug 16 22:38:39 2007
@@ -219,6 +219,7 @@
                                int format, char* encoding)
     cdef void xmlNodeSetName(xmlNode* cur, char* name)
     cdef void xmlNodeSetContent(xmlNode* cur, char* content)
+    cdef xmlDtd* xmlCopyDtd(xmlDtd* dtd)
     cdef xmlDoc* xmlCopyDoc(xmlDoc* doc, int recursive)
     cdef xmlNode* xmlCopyNode(xmlNode* node, int extended)
     cdef xmlNode* xmlDocCopyNode(xmlNode* node, xmlDoc* doc, int extended)

From scoder at codespeak.net Thu Aug 16 22:41:17 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 16 Aug 2007 22:41:17 +0200 (CEST)
Subject: [Lxml-checkins] r45756 - in lxml/branch/lxml-1.3: .
src/lxml src/lxml/tests
Message-ID: <20070816204117.21EA9815D@code0.codespeak.net>

Author: scoder
Date: Thu Aug 16 22:41:16 2007
New Revision: 45756

Modified:
   lxml/branch/lxml-1.3/CHANGES.txt
   lxml/branch/lxml-1.3/src/lxml/dtd.pxi
   lxml/branch/lxml-1.3/src/lxml/etree.pyx
   lxml/branch/lxml-1.3/src/lxml/tests/test_dtd.py
   lxml/branch/lxml-1.3/src/lxml/tree.pxd
Log:
trunk merge: support for retrieving the DTD defined internally in a document for validation

Modified: lxml/branch/lxml-1.3/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-1.3/CHANGES.txt (original)
+++ lxml/branch/lxml-1.3/CHANGES.txt Thu Aug 16 22:41:16 2007
@@ -8,6 +8,10 @@
 Features added
 --------------

+* The ``docinfo`` on ElementTree objects has new properties ``internalDTD``
+  and ``externalDTD`` that return a DTD object for the internal or external
+  subset of the document respectively.
+
 * Serialising an ElementTree now includes any internal DTD subsets that are
   part of the document, as well as comments and PIs that are siblings of the
   root node.

Modified: lxml/branch/lxml-1.3/src/lxml/dtd.pxi
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/dtd.pxi (original)
+++ lxml/branch/lxml-1.3/src/lxml/dtd.pxi Thu Aug 16 22:41:16 2007
@@ -96,3 +96,19 @@
     if c_dtd is NULL:
         raise DTDParseError, "error parsing DTD"
     return c_dtd
+
+cdef extern from "etree_defs.h":
+    # macro call to 't->tp_new()' for fast instantiation
+    cdef DTD NEW_DTD "PY_NEW" (object t)
+
+cdef DTD _dtdFactory(tree.xmlDtd* c_dtd):
+    # do not run through DTD.__init__()!
+    cdef DTD dtd
+    if c_dtd is NULL:
+        return None
+    dtd = NEW_DTD(DTD)
+    dtd._c_dtd = tree.xmlCopyDtd(c_dtd)
+    if dtd._c_dtd is NULL:
+        python.PyErr_NoMemory()
+    _Validator.__init__(dtd)
+    return dtd

Modified: lxml/branch/lxml-1.3/src/lxml/etree.pyx
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/etree.pyx (original)
+++ lxml/branch/lxml-1.3/src/lxml/etree.pyx Thu Aug 16 22:41:16 2007
@@ -384,37 +384,76 @@
 cdef class DocInfo:
     "Document information provided by parser and DTD."
-    cdef readonly object root_name
-    cdef readonly object public_id
-    cdef readonly object system_url
-    cdef readonly object xml_version
-    cdef readonly object encoding
-    cdef readonly object URL
+    cdef _Document _doc
     def __init__(self, tree):
         "Create a DocInfo object for an ElementTree object or root Element."
-        cdef _Document doc
-        doc = _documentOrRaise(tree)
-        self.root_name, self.public_id, self.system_url = doc.getdoctype()
-        if not self.root_name and (self.public_id or self.system_url):
+        self._doc = _documentOrRaise(tree)
+        root_name, public_id, system_url = self._doc.getdoctype()
+        if not root_name and (public_id or system_url):
             raise ValueError, "Could not find root node"
-        self.xml_version, self.encoding = doc.getxmlinfo()
-        self.URL = doc.getURL()
+
+    property root_name:
+        "Returns the name of the root node as defined by the DOCTYPE."
+        def __get__(self):
+            root_name, public_id, system_url = self._doc.getdoctype()
+            return root_name
+
+    property public_id:
+        "Returns the public ID of the DOCTYPE."
+        def __get__(self):
+            root_name, public_id, system_url = self._doc.getdoctype()
+            return public_id
+
+    property system_url:
+        "Returns the system ID of the DOCTYPE."
+        def __get__(self):
+            root_name, public_id, system_url = self._doc.getdoctype()
+            return system_url
+
+    property xml_version:
+        "Returns the XML version as declared by the document."
+        def __get__(self):
+            xml_version, encoding = self._doc.getxmlinfo()
+            return xml_version
+
+    property encoding:
+        "Returns the encoding name as declared by the document."
+        def __get__(self):
+            xml_version, encoding = self._doc.getxmlinfo()
+            return encoding
+
+    property URL:
+        "Returns the source URL of the document (or None if unknown)."
+        def __get__(self):
+            return self._doc.getURL()

     property doctype:
+        "Returns a DOCTYPE declaration string for the document."
         def __get__(self):
-            if self.public_id:
-                if self.system_url:
+            root_name, public_id, system_url = self._doc.getdoctype()
+            if public_id:
+                if system_url:
                     return '<!DOCTYPE %s PUBLIC "%s" "%s">' % (
-                        self.root_name, self.public_id, self.system_url)
+                        root_name, public_id, system_url)
                 else:
                     return '<!DOCTYPE %s PUBLIC "%s">' % (
-                        self.root_name, self.public_id)
-            elif self.system_url:
+                        root_name, public_id)
+            elif system_url:
                 return '<!DOCTYPE %s SYSTEM "%s">' % (
-                    self.root_name, self.system_url)
+                    root_name, system_url)
             else:
                 return ""

+    property internalDTD:
+        "Returns a DTD validator based on the internal subset of the document."
+        def __get__(self):
+            return _dtdFactory(self._doc._c_doc.intSubset)
+
+    property externalDTD:
+        "Returns a DTD validator based on the external subset of the document."
+        def __get__(self):
+            return _dtdFactory(self._doc._c_doc.extSubset)
+

 cdef public class _Element [ type LxmlElementType, object LxmlElement ]:
     """Element class. References a document object and a libxml node.
Modified: lxml/branch/lxml-1.3/src/lxml/tests/test_dtd.py
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/tests/test_dtd.py (original)
+++ lxml/branch/lxml-1.3/src/lxml/tests/test_dtd.py Thu Aug 16 22:41:16 2007
@@ -36,6 +36,31 @@
         dtd = etree.DTD(StringIO(""))
         dtd.assertValid(root)

+    def test_dtd_internal(self):
+        root = etree.XML('''
+
+
+        ]>
+
+        ''')
+        dtd = etree.ElementTree(root).docinfo.internalDTD
+        self.assert_(dtd)
+        dtd.assertValid(root)
+
+    def test_dtd_internal_invalid(self):
+        root = etree.XML('''
+
+
+
+        ]>
+
+        ''')
+        dtd = etree.ElementTree(root).docinfo.internalDTD
+        self.assert_(dtd)
+        self.assertFalse(dtd.validate(root))
+
     def test_dtd_broken(self):
         self.assertRaises(etree.DTDParseError, etree.DTD,
                           StringIO(""))

Modified: lxml/branch/lxml-1.3/src/lxml/tree.pxd
==============================================================================
--- lxml/branch/lxml-1.3/src/lxml/tree.pxd (original)
+++ lxml/branch/lxml-1.3/src/lxml/tree.pxd Thu Aug 16 22:41:16 2007
@@ -218,6 +218,7 @@
                                int format, char* encoding)
     cdef void xmlNodeSetName(xmlNode* cur, char* name)
     cdef void xmlNodeSetContent(xmlNode* cur, char* content)
+    cdef xmlDtd* xmlCopyDtd(xmlDtd* dtd)
     cdef xmlDoc* xmlCopyDoc(xmlDoc* doc, int recursive)
     cdef xmlNode* xmlCopyNode(xmlNode* node, int extended)
     cdef xmlNode* xmlDocCopyNode(xmlNode* node, xmlDoc* doc, int extended)

From scoder at codespeak.net Sat Aug 18 11:37:57 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 18 Aug 2007 11:37:57 +0200 (CEST)
Subject: [Lxml-checkins] r45834 - lxml/branch/html/src/lxml/html
Message-ID: <20070818093757.AD4AE81C7@code0.codespeak.net>

Author: scoder
Date: Sat Aug 18 11:37:55 2007
New Revision: 45834

Modified:
   lxml/branch/html/src/lxml/html/__init__.py
Log:
raise KeyError if no default is passed to a failed get_element_by_id()

Modified: lxml/branch/html/src/lxml/html/__init__.py
==============================================================================
--- lxml/branch/html/src/lxml/html/__init__.py (original)
+++ lxml/branch/html/src/lxml/html/__init__.py Sat Aug 18 11:37:55 2007
@@ -152,23 +152,26 @@
         """
         return _class_xpath(self, class_name=class_name)

-    def get_element_by_id(self, id, default=None):
+    def get_element_by_id(self, id, *default):
         """
-        Get the first element in a document with the given id. If
-        none are found, return default (None).
+        Get the first element in a document with the given id. If none is
+        found, return the default argument if provided or raise KeyError
+        otherwise.

         Note that there can be more than one element with the same id,
         and this isn't uncommon in HTML documents found in the wild.
         Browsers return only the first match, and this function does
         the same.
         """
-        # FIXME: should this raise an exception when something isn't found?
         try:
             # FIXME: should this check for multiple matches?
             # browsers just return the first one
             return _id_xpath(self, id=id)[0]
         except IndexError:
-            return default
+            if default:
+                return default[0]
+            else:
+                raise KeyError, id

     def text_content(self):
         """

From scoder at codespeak.net Sat Aug 18 11:46:14 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 18 Aug 2007 11:46:14 +0200 (CEST)
Subject: [Lxml-checkins] r45835 - lxml/trunk/src/lxml/html
Message-ID: <20070818094614.1F98C81D0@code0.codespeak.net>

Author: scoder
Date: Sat Aug 18 11:46:12 2007
New Revision: 45835

Added:
   lxml/trunk/src/lxml/html/
      - copied from r45834, lxml/branch/html/src/lxml/html/
Log:
copied lxml.html from html branch

From scoder at codespeak.net Sat Aug 18 11:48:17 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 18 Aug 2007 11:48:17 +0200 (CEST)
Subject: [Lxml-checkins] r45836 - lxml/trunk/src/lxml
Message-ID: <20070818094817.D2F3381E7@code0.codespeak.net>

Author: scoder
Date: Sat Aug 18 11:48:17 2007
New Revision: 45836

Added:
   lxml/trunk/src/lxml/cssselect.py
      - copied
unchanged from r45835, lxml/branch/html/src/lxml/cssselect.py
Log:
copied lxml.cssselect from html branch

From scoder at codespeak.net Sat Aug 18 11:48:41 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 18 Aug 2007 11:48:41 +0200 (CEST)
Subject: [Lxml-checkins] r45837 - lxml/trunk/src/lxml
Message-ID: <20070818094841.26BDB81E8@code0.codespeak.net>

Author: scoder
Date: Sat Aug 18 11:48:40 2007
New Revision: 45837

Added:
   lxml/trunk/src/lxml/doctestcompare.py
      - copied unchanged from r45836, lxml/branch/html/src/lxml/doctestcompare.py
Log:
copied lxml.doctestcompare from html branch

From scoder at codespeak.net Sat Aug 18 11:52:39 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 18 Aug 2007 11:52:39 +0200 (CEST)
Subject: [Lxml-checkins] r45838 - lxml/trunk/doc
Message-ID: <20070818095239.252F881E8@code0.codespeak.net>

Author: scoder
Date: Sat Aug 18 11:52:38 2007
New Revision: 45838

Added:
   lxml/trunk/doc/cssselect.txt
      - copied unchanged from r45837, lxml/branch/html/doc/cssselect.txt
Log:
copied docs from html branch

From scoder at codespeak.net Sat Aug 18 11:52:53 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 18 Aug 2007 11:52:53 +0200 (CEST)
Subject: [Lxml-checkins] r45839 - lxml/trunk/doc
Message-ID: <20070818095253.221D881E8@code0.codespeak.net>

Author: scoder
Date: Sat Aug 18 11:52:52 2007
New Revision: 45839

Added:
   lxml/trunk/doc/elementsoup.txt
      - copied unchanged from r45838, lxml/branch/html/doc/elementsoup.txt
Log:
copied docs from html branch

From scoder at codespeak.net Sat Aug 18 11:54:14 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 18 Aug 2007 11:54:14 +0200 (CEST)
Subject: [Lxml-checkins] r45840 - lxml/trunk/doc
Message-ID: <20070818095414.A6F2381A6@code0.codespeak.net>

Author: scoder
Date: Sat Aug 18 11:54:14 2007
New Revision: 45840

Added:
   lxml/trunk/doc/lxmlhtml.txt
      - copied unchanged from r45839, lxml/branch/html/doc/lxmlhtml.txt
Log:
copied docs from
html branch From scoder at codespeak.net Sat Aug 18 12:18:30 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 12:18:30 +0200 (CEST) Subject: [Lxml-checkins] r45841 - lxml/trunk/src/lxml Message-ID: <20070818101830.365EE81F2@code0.codespeak.net> Author: scoder Date: Sat Aug 18 12:18:29 2007 New Revision: 45841 Added: lxml/trunk/src/lxml/usedoctest.py - copied unchanged from r45840, lxml/branch/html/src/lxml/usedoctest.py Log: copied lxml.usedoctest from html branch From scoder at codespeak.net Sat Aug 18 12:28:59 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 12:28:59 +0200 (CEST) Subject: [Lxml-checkins] r45842 - in lxml/trunk: . doc Message-ID: <20070818102859.2689B81EC@code0.codespeak.net> Author: scoder Date: Sat Aug 18 12:28:57 2007 New Revision: 45842 Modified: lxml/trunk/CHANGES.txt lxml/trunk/doc/mkhtml.py lxml/trunk/setup.py Log: integrated lxml.html Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Sat Aug 18 12:28:57 2007 @@ -8,14 +8,22 @@ Features added -------------- -* The ``docinfo`` on ElementTree objects has new properties ``internalDTD`` - and ``externalDTD`` that return a DTD object for the internal or external - subset of the document respectively. +* HTML tag soup parser based on BeautifulSoup in ``lxml.html.ElementSoup`` + +* New module ``lxml.doctestcompare`` by Ian Bicking for writing simplified + doctests based on XML/HTML output. Use by importing ``lxml.usedoctest`` or + ``lxml.html.usedoctest`` from within a doctest. + +* New module ``lxml.cssselect`` by Ian Bicking for selecting Elements with CSS + selectors. + +* New package ``lxml.html`` written by Ian Bicking for sophisticated HTML + handling. * Namespace class setup is now local to the ``ElementNamespaceClassLookup`` instance and no longer global. 
-* Schematron validation +* Schematron validation (incomplete in libxml2) * Extended type support for ``objectify.E`` based on registered PyTypes. Supports an additional argument to ``PyType()`` that takes a conversion @@ -71,6 +79,10 @@ 1.3.4 (???) ================== +* The ``docinfo`` on ElementTree objects has new properties ``internalDTD`` + and ``externalDTD`` that return a DTD object for the internal or external + subset of the document respectively. + * Serialising an ElementTree now includes any internal DTD subsets that are part of the document, as well as comments and PIs that are siblings of the root node. Modified: lxml/trunk/doc/mkhtml.py ============================================================================== --- lxml/trunk/doc/mkhtml.py (original) +++ lxml/trunk/doc/mkhtml.py Sat Aug 18 12:28:57 2007 @@ -6,7 +6,8 @@ 'performance.txt', 'build.txt')), ('Developing with lxml', ('tutorial.txt', 'api.txt', 'parsing.txt', 'validation.txt', 'xpathxslt.txt', - 'objectify.txt')), + 'objectify.txt', 'lxmlhtml.txt', + 'cssselect.txt', 'elementsoup.txt')), ('Extending lxml', ('resolvers.txt', 'extensions.txt', 'element_classes.txt', 'sax.txt', 'capi.txt')), ] Modified: lxml/trunk/setup.py ============================================================================== --- lxml/trunk/setup.py (original) +++ lxml/trunk/setup.py Sat Aug 18 12:28:57 2007 @@ -85,7 +85,7 @@ ], package_dir = {'': 'src'}, - packages = ['lxml'], + packages = ['lxml', 'lxml.html'], zip_safe = False, ext_modules = setupinfo.ext_modules( STATIC_INCLUDE_DIRS, STATIC_LIBRARY_DIRS, STATIC_CFLAGS), From scoder at codespeak.net Sat Aug 18 12:31:51 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 12:31:51 +0200 (CEST) Subject: [Lxml-checkins] r45843 - lxml/trunk Message-ID: <20070818103151.BF9228202@code0.codespeak.net> Author: scoder Date: Sat Aug 18 12:31:51 2007 New Revision: 45843 Modified: lxml/trunk/CHANGES.txt Log: cleanup Modified: 
lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Sat Aug 18 12:31:51 2007 @@ -17,8 +17,8 @@ * New module ``lxml.cssselect`` by Ian Bicking for selecting Elements with CSS selectors. -* New package ``lxml.html`` written by Ian Bicking for sophisticated HTML - handling. +* New package ``lxml.html`` written by Ian Bicking for advanced HTML + treatment. * Namespace class setup is now local to the ``ElementNamespaceClassLookup`` instance and no longer global. From scoder at codespeak.net Sat Aug 18 12:47:47 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 18 Aug 2007 12:47:47 +0200 (CEST) Subject: [Lxml-checkins] r45845 - lxml/trunk/doc Message-ID: <20070818104747.02AE881E5@code0.codespeak.net> Author: scoder Date: Sat Aug 18 12:47:46 2007 New Revision: 45845 Modified: lxml/trunk/doc/lxmlhtml.txt Log: doc cleanup Modified: lxml/trunk/doc/lxmlhtml.txt ============================================================================== --- lxml/trunk/doc/lxmlhtml.txt (original) +++ lxml/trunk/doc/lxmlhtml.txt Sat Aug 18 12:47:46 2007 @@ -351,9 +351,11 @@ In addition to cleaning up malicious HTML, ``lxml.html.clean`` contains functions to do other things to your HTML. This includes -autolinking: +autolinking:: - ``autolink(doc, ...)`` and ``autolink_html(html, ...)`` + autolink(doc, ...) + + autolink_html(html, ...) This finds anything that looks like a link (e.g., ``http://example.com``) in the *text* of an HTML document, and @@ -378,9 +380,11 @@ wordwrap -------- -You can also wrap long words in your html: +You can also wrap long words in your html:: + + word_break(doc, max_width=40, ...) - ``word_break(doc, max_width=40, ...)`` and ``word_break_html(html, ...)`` + word_break_html(html, ...) This finds any long words in the text of the document and inserts ``&#8203;`` in the document (which is the Unicode zero-width space).
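The word-wrapping behaviour described in the hunk above (long words get a zero-width space inserted so that they can wrap) can be illustrated with the standard library alone. This is only a rough sketch on plain strings, not lxml's implementation: the helper name ``break_long_words`` and its ``break_char`` parameter are made up for illustration, and the real ``word_break()`` works on the text of a parsed HTML document rather than on a bare string.

```python
import re

ZERO_WIDTH_SPACE = "\u200b"


def break_long_words(text, max_width=40, break_char=ZERO_WIDTH_SPACE):
    # Hypothetical helper, NOT lxml's word_break(): every run of
    # non-whitespace characters longer than max_width is cut into chunks
    # of at most max_width characters, joined by break_char.
    def _break(match):
        word = match.group(0)
        chunks = [word[i:i + max_width] for i in range(0, len(word), max_width)]
        return break_char.join(chunks)

    # Only runs of max_width + 1 or more characters need breaking.
    pattern = re.compile(r"\S{%d,}" % (max_width + 1))
    return pattern.sub(_break, text)
```

Words at or below ``max_width`` are left untouched, so ordinary prose passes through unchanged; only unusually long tokens such as URLs or hashes pick up break characters.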
@@ -416,7 +420,7 @@ >>> doc = HTML(content) >>> doc.make_links_absolute(url) -Then we create some objects to put the information in: +Then we create some objects to put the information in:: >>> class Card(object): ... def __init__(self, **kw): @@ -426,7 +430,7 @@ ... def __init__(self, phone, types=()): ... self.phone, self.types = phone, types -And some generally handy functions for microformats: +And some generally handy functions for microformats:: >>> def get_text(el, class_name): ... els = el.find_class(class_name) @@ -442,7 +446,7 @@ ... # Ideally this would parse street, etc. ... return el.find_class('adr') -Then the parsing: +Then the parsing:: >>> for el in doc.find_class('hcard'): ... card = Card() From scoder at codespeak.net Mon Aug 20 11:14:58 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 20 Aug 2007 11:14:58 +0200 (CEST) Subject: [Lxml-checkins] r45875 - in lxml/branch/lxml-1.3: . src/lxml src/lxml/tests Message-ID: <20070820091458.51C45819C@code0.codespeak.net> Author: scoder Date: Mon Aug 20 11:14:56 2007 New Revision: 45875 Modified: lxml/branch/lxml-1.3/CHANGES.txt lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py Log: raise Warning instead of Error for ':' tag names Modified: lxml/branch/lxml-1.3/CHANGES.txt ============================================================================== --- lxml/branch/lxml-1.3/CHANGES.txt (original) +++ lxml/branch/lxml-1.3/CHANGES.txt Mon Aug 20 11:14:56 2007 @@ -24,6 +24,15 @@ Other changes ------------- +* lxml now raises a TagNameWarning about tag names containing ':' instead of + an Error as 1.3.3 did. The reason is that a number of projects currently + misuse the previous lack of tag name validation to generate namespace + prefixes without declaring namespaces. Apart from the danger of generating + broken XML this way, it also breaks most of the namespace-aware tools in + XML, including XPath, XSLT and validation. 
lxml 1.3.x will continue to + support this bug with a Warning, while lxml 2.0 will be strict about + well-formed tag names (not only regarding ':'). + * Serialising an Element no longer includes its comment and PI siblings (only ElementTree serialisation includes them). Modified: lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi (original) +++ lxml/branch/lxml-1.3/src/lxml/apihelpers.pxi Mon Aug 20 11:14:56 2007 @@ -691,6 +691,17 @@ else: raise TypeError, "Argument must be string or unicode." +cdef object warnings +import warnings +class TagNameWarning(SyntaxWarning): + pass + +cdef int warnAboutTagName() except -1: + warnings.warn("Tag names must not contain ':', " + "lxml 2.0 will enforce well-formed tag names " + "as required by the XML specification.", + TagNameWarning) + cdef _getNsTag(tag): """Given a tag, find namespace URI and tag name. Return None for NS uri if no namespace URI available. 
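The change above downgrades the ':' check from an error to a warning, and the accompanying test module turns that warning back into an exception with ``warnings.simplefilter("error", ...)``. A minimal standard-library sketch of that mechanism follows. It assumes nothing about lxml itself: the ``TagNameWarning`` class here is a local stand-in for the one added in the commit, and ``check_tag_name`` and ``is_rejected`` are hypothetical helpers written only for this illustration.

```python
import warnings


class TagNameWarning(SyntaxWarning):
    """Local stand-in for the warning class added in this commit."""


def check_tag_name(tag):
    # Hypothetical helper: warn (rather than raise) when a tag name
    # contains ':', then carry on and return the tag unchanged.
    if ":" in tag:
        warnings.warn("Tag names must not contain ':'",
                      TagNameWarning, stacklevel=2)
    return tag


def is_rejected(tag):
    # Escalate TagNameWarning to an exception inside a temporary filter
    # context, similar to what the test module does globally with
    # warnings.simplefilter("error", etree.TagNameWarning).
    with warnings.catch_warnings():
        warnings.simplefilter("error", TagNameWarning)
        try:
            check_tag_name(tag)
        except TagNameWarning:
            return True
    return False
```

With the default filters the call merely emits a warning and returns, which is what lets existing callers keep working; code that wants the strict behaviour can opt in through the filter, as the tests do.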
@@ -709,7 +720,7 @@ if c_ns_end is NULL: raise ValueError, "Invalid tag name" if cstd.strchr(c_ns_end, c':') is not NULL: - raise ValueError, "Invalid tag name" + warnAboutTagName() nslen = c_ns_end - c_tag taglen = python.PyString_GET_SIZE(tag) - nslen - 2 if taglen == 0: @@ -720,7 +731,7 @@ elif python.PyString_GET_SIZE(tag) == 0: raise ValueError, "Empty tag name" elif cstd.strchr(c_tag, c':') is not NULL: - raise ValueError, "Invalid tag name" + warnAboutTagName() return ns, tag cdef object _namespacedName(xmlNode* c_node): Modified: lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py (original) +++ lxml/branch/lxml-1.3/src/lxml/tests/test_etree.py Mon Aug 20 11:14:56 2007 @@ -8,7 +8,7 @@ """ -import unittest, copy, sys +import unittest, copy, sys, warnings from common_imports import etree, StringIO, HelperTestCase, fileInTestDir from common_imports import SillyFileLike, canonicalize, doctest @@ -32,6 +32,8 @@ seq.sort() return seq +warnings.simplefilter("error", etree.TagNameWarning) + class ETreeOnlyTestCase(HelperTestCase): """Tests only for etree, not ElementTree""" etree = etree @@ -68,11 +70,14 @@ def test_element_name_colon(self): Element = self.etree.Element - self.assertRaises(ValueError, Element, 'p:name') - self.assertRaises(ValueError, Element, '{test}p:name') + self.assertRaises(self.etree.TagNameWarning, + Element, 'p:name') + self.assertRaises(self.etree.TagNameWarning, + Element, '{test}p:name') el = Element('name') - self.assertRaises(ValueError, setattr, el, 'tag', 'p:name') + self.assertRaises(self.etree.TagNameWarning, + setattr, el, 'tag', 'p:name') def test_attribute_set(self): # ElementTree accepts arbitrary attribute values From scoder at codespeak.net Mon Aug 20 12:15:34 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 20 Aug 2007 12:15:34 +0200 (CEST) Subject: [Lxml-checkins] r45876 - 
in lxml/trunk/src/lxml: . tests Message-ID: <20070820101534.C5F348173@code0.codespeak.net> Author: scoder Date: Mon Aug 20 12:15:32 2007 New Revision: 45876 Modified: lxml/trunk/src/lxml/cstd.pxd lxml/trunk/src/lxml/etree_defs.h lxml/trunk/src/lxml/objectify.pyx lxml/trunk/src/lxml/python.pxd lxml/trunk/src/lxml/tests/test_objectify.py Log: objectify updates by Holger, support passing ObjectifiedElement objects into DateElement() Modified: lxml/trunk/src/lxml/cstd.pxd ============================================================================== --- lxml/trunk/src/lxml/cstd.pxd (original) +++ lxml/trunk/src/lxml/cstd.pxd Mon Aug 20 12:15:32 2007 @@ -9,6 +9,7 @@ cdef int strlen(char* s) cdef char* strstr(char* haystack, char* needle) cdef char* strchr(char* haystack, int needle) + cdef char* strrchr(char* haystack, int needle) cdef int strcmp(char* s1, char* s2) cdef int strncmp(char* s1, char* s2, size_t len) cdef void* memcpy(void* dest, void* src, size_t len) Modified: lxml/trunk/src/lxml/etree_defs.h ============================================================================== --- lxml/trunk/src/lxml/etree_defs.h (original) +++ lxml/trunk/src/lxml/etree_defs.h Mon Aug 20 12:15:32 2007 @@ -99,6 +99,7 @@ #define repr(o) PyObject_Repr(o) #define iter(o) PyObject_GetIter(o) #define _cstr(s) PyString_AS_STRING(s) +#define _fqtypename(o) (((PyTypeObject*)o)->ob_type->tp_name) static PyObject* __PY_NEW_GLOBAL_EMPTY_TUPLE = NULL; Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Mon Aug 20 12:15:32 2007 @@ -70,6 +70,16 @@ cdef object _ElementMaker from builder import ElementMaker as _ElementMaker +cdef object _typename(object t): + cdef char* c_name + cdef char* s + c_name = python._fqtypename(t) + s = cstd.strrchr(c_name, c'.') + if s == NULL: + return c_name + else: + return (s+1) + # namespace/name for 
"pytype" hint attribute cdef object PYTYPE_NAMESPACE cdef char* _PYTYPE_NAMESPACE @@ -232,7 +242,7 @@ if tag == 'text' or tag == 'pyval': # read-only ! raise TypeError, "attribute '%s' of '%s' objects is not writable"% \ - (tag, type(self).__name__) + (tag, _typename(self)) elif tag == 'tail': cetree.setTailText(self._c_node, value) return @@ -916,6 +926,15 @@ def __lower_bool(b): return _lower_bool(b) +cdef _get_pytypename(obj): + if python.PyUnicode_Check(obj): + return "str" + else: + return _typename(obj) + +def __get_pytypename(obj): + return _get_pytypename(obj) + cdef _registerPyTypes(): pytype = PyType('int', int, IntElement) pytype.xmlSchemaTypes = ("int", "short", "byte", "unsignedShort", @@ -1020,7 +1039,6 @@ """Type map for the ElementMaker. """ cdef object _typemap - cdef object _typemap_get def __init__(self, initial=None): if initial is None: @@ -1132,7 +1150,7 @@ else: value = repr(value) result = "%s%s = %s [%s]\n" % (indentstr, element.tag, - value, type(element).__name__) + value, _typename(element)) xsi_ns = "{%s}" % XML_SCHEMA_INSTANCE_NS pytype_ns = "{%s}" % PYTYPE_NAMESPACE for name, value in cetree.iterattributes(element, 3): @@ -2019,6 +2037,13 @@ attrib = dict(attrib) attrib.update(_attributes) _attributes = attrib + if isinstance(_value, ObjectifiedElement): + if _pytype is None: + if _xsi is None and not _attributes and nsmap is _DEFAULT_NSMAP: + # special case: no change! 
+ return _value.__copy__() + elif PYTYPE_ATTRIBUTE not in _attributes: + _pytype = _get_pytypename(_value) if isinstance(_value, ObjectifiedDataElement): # reuse existing nsmap unless redefined in nsmap parameter temp = _value.nsmap @@ -2070,9 +2095,9 @@ if dict_result is not NULL: _pytype = (dict_result).name - if _value is None: + if _value is None and _pytype != "str": + _pytype = _pytype or "NoneType" strval = None - _pytype = "NoneType" elif python._isString(_value): strval = _value elif python.PyBool_Check(_value): @@ -2102,7 +2127,7 @@ type_check(strval) if _pytype is not None: - if _pytype == "NoneType": + if _pytype == "NoneType" or _pytype == "none": strval = None python.PyDict_SetItem(_attributes, XML_SCHEMA_INSTANCE_NIL_ATTR, "true") else: Modified: lxml/trunk/src/lxml/python.pxd ============================================================================== --- lxml/trunk/src/lxml/python.pxd (original) +++ lxml/trunk/src/lxml/python.pxd Mon Aug 20 12:15:32 2007 @@ -107,6 +107,7 @@ cdef int _isString(object obj) cdef int isinstance(object instance, object classes) cdef int issubclass(object derived, object superclasses) + cdef char* _fqtypename(object t) cdef int hasattr(object obj, object attr) cdef object getattr(object obj, object attr) cdef int callable(object obj) Modified: lxml/trunk/src/lxml/tests/test_objectify.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_objectify.py (original) +++ lxml/trunk/src/lxml/tests/test_objectify.py Mon Aug 20 12:15:32 2007 @@ -658,6 +658,18 @@ self.assertEquals(value.text, None) self.assertEquals(value.pyval, None) + def test_data_element_pytype_none_compat(self): + # pre-2.0 lxml called NoneElement "none" + pyval = 1 + pytype = "none" + objclass = objectify.NoneElement + value = objectify.DataElement(pyval, _pytype=pytype) + self.assert_(isinstance(value, objclass), + "DataElement(%s, _pytype='%s') returns %s, expected %s" + % (pyval, pytype, 
type(value), objclass)) + self.assertEquals(value.text, None) + self.assertEquals(value.pyval, None) + def test_schema_types(self): XML = self.XML root = XML('''\ From scoder at codespeak.net Tue Aug 21 17:20:46 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 21 Aug 2007 17:20:46 +0200 (CEST) Subject: [Lxml-checkins] r45900 - lxml/trunk/src/lxml/html Message-ID: <20070821152046.99B898156@code0.codespeak.net> Author: scoder Date: Tue Aug 21 17:20:46 2007 New Revision: 45900 Modified: lxml/trunk/src/lxml/html/__init__.py Log: some cleanup in iterlinks() Modified: lxml/trunk/src/lxml/html/__init__.py ============================================================================== --- lxml/trunk/src/lxml/html/__init__.py (original) +++ lxml/trunk/src/lxml/html/__init__.py Tue Aug 21 17:20:46 2007 @@ -242,16 +242,17 @@ """ link_attrs = defs.link_attrs for el in self.getiterator(): + attribs = el.attrib for attrib in link_attrs: - if attrib in el.attrib: - yield (el, attrib, el.attrib[attrib], 0) + if attrib in attribs: + yield (el, attrib, attribs[attrib], 0) if el.tag == 'style' and el.text: for match in _css_url_re.finditer(el.text): yield (el, None, match.group(1), match.start(1)) for match in _css_import_re.finditer(el.text): yield (el, None, match.group(1), match.start(1)) - if 'style' in el.attrib: - for match in _css_url_re.finditer(el.attrib['style']): + if 'style' in attribs: + for match in _css_url_re.finditer(attribs['style']): yield (el, 'style', match.group(1), match.start(1)) def rewrite_links(self, link_repl_func, resolve_base_href=True, From scoder at codespeak.net Wed Aug 22 22:22:56 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 22 Aug 2007 22:22:56 +0200 (CEST) Subject: [Lxml-checkins] r45918 - lxml/trunk/src/lxml Message-ID: <20070822202256.116548172@code0.codespeak.net> Author: scoder Date: Wed Aug 22 22:22:55 2007 New Revision: 45918 Modified: lxml/trunk/src/lxml/objectify.pyx Log: cleanup Modified: 
lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Wed Aug 22 22:22:55 2007 @@ -814,7 +814,7 @@ """ cdef readonly object name cdef readonly object type_check - cdef object _stringify + cdef object _add_text cdef object _type cdef object _schema_types def __init__(self, name, type_check, type_class, stringify=None): @@ -831,9 +831,9 @@ self._type = type_class self.type_check = type_check if stringify is None: - self._stringify = _StringValueSetter(__builtin__.str) + self._add_text = _StringValueSetter(__builtin__.str) else: - self._stringify = _StringValueSetter(stringify) + self._add_text = _StringValueSetter(stringify) self._schema_types = [] def __repr__(self): @@ -1081,7 +1081,7 @@ result = python.PyDict_GetItem(_PYTYPE_DICT, name) if result is NULL: return None - return (result)._stringify + return (result)._add_text return result def __contains__(self, type): @@ -1798,110 +1798,6 @@ tree.xmlSetNsProp(c_node, c_ns, "nil", "true") tree.END_FOR_EACH_ELEMENT_FROM(c_node) -def __xsiannotate(element_or_tree, ignore_old=True): - """Recursively annotates the elements of an XML tree with 'xsi:type' - attributes. - - If the 'ignore_old' keyword argument is True (the default), current - 'xsi:type' attributes will be ignored and replaced. Otherwise, they will be - checked and only replaced if they no longer fit the current text value. - - Note that tha mapping from Python types to XSI types is usually ambiguous. - Currently, only the first XSI type name in the corresponding PyType - definition will be used for annotation. Thus, you should consider naming - the widest type first here if you define additional types. 
- """ - cdef _Element element - cdef _Document doc - cdef int ignore - cdef int istree - cdef tree.xmlNode* c_node - cdef tree.xmlNs* c_ns - cdef python.PyObject* dict_result - cdef PyType pytype - element = cetree.rootNodeOrRaise(element_or_tree) - doc = element._doc - ignore = bool(ignore_old) - - StrType = _PYTYPE_DICT.get('str') - c_node = element._c_node - tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_node, c_node, 1) - if c_node.type == tree.XML_ELEMENT_NODE: - typename = None - pytype = None - value = None - istree = 0 - if not ignore: - # check that old value is valid - typename = cetree.attributeValueFromNsName( - c_node, _XML_SCHEMA_INSTANCE_NS, "type") - if typename is not None: - dict_result = python.PyDict_GetItem(_SCHEMA_TYPE_DICT, typename) - if dict_result is NULL and ':' in typename: - prefix, typename = typename.split(':', 1) - dict_result = python.PyDict_GetItem(_SCHEMA_TYPE_DICT, typename) - if dict_result is not NULL: - pytype = dict_result - if pytype is not StrType: - # StrType does not have a typecheck but is the default anyway, - # so just accept it if given as type information - pytype = _check_type(c_node, pytype) - if pytype is None: - typename = None - - if typename is None: - if pytype is None: - # check for pytype hint - value = cetree.attributeValueFromNsName( - c_node, _PYTYPE_NAMESPACE, _PYTYPE_ATTRIBUTE_NAME) - - if value is not None: - if value == TREE_PYTYPE: - istree = 1 - else: - dict_result = python.PyDict_GetItem(_PYTYPE_DICT, value) - if dict_result is not NULL: - pytype = dict_result - if pytype is not StrType: - pytype = _check_type(c_node, pytype) - - if not istree and pytype is None: - # try to guess type - if cetree.findChildForwards(c_node, 0) is NULL: - # element has no children => data class - pytype = _guessPyType(textOf(c_node), StrType) - else: - istree = 1 - - if typename is None and not istree and pytype is not None: - if python.PyList_GET_SIZE(pytype._schema_types) > 0: - # pytype->xsi:type is a 1:n mapping so simply 
take the first - typename = pytype._schema_types[0] - - if typename is None or istree: - # delete attribute if it exists - cetree.delAttributeFromNsName(c_node, _XML_SCHEMA_INSTANCE_NS, "type") - else: - # update or create attribute - c_ns = cetree.findOrBuildNodeNsPrefix( - doc, c_node, _XML_SCHEMA_NS, 'xsd') - if c_ns is not NULL: - if ':' in typename: - prefix, name = typename.split(':', 1) - if c_ns.prefix is NULL or c_ns.prefix[0] == c'\0': - typename = name - elif cstd.strcmp(_cstr(prefix), c_ns.prefix) != 0: - prefix = c_ns.prefix - typename = prefix + ':' + name - elif c_ns.prefix is not NULL or c_ns.prefix[0] != c'\0': - prefix = c_ns.prefix - typename = prefix + ':' + typename - c_ns = cetree.findOrBuildNodeNsPrefix( - doc, c_node, _XML_SCHEMA_INSTANCE_NS, 'xsi') - tree.xmlSetNsProp(c_node, c_ns, "type", _cstr(typename)) - tree.END_FOR_EACH_ELEMENT_FROM(c_node) - - def deannotate(element_or_tree, pytype=True, xsi=True): """Recursively de-annotate the elements of an XML tree by removing 'pytype' and/or 'type' attributes. @@ -2042,19 +1938,17 @@ if _xsi is None and not _attributes and nsmap is _DEFAULT_NSMAP: # special case: no change! 
return _value.__copy__() - elif PYTYPE_ATTRIBUTE not in _attributes: - _pytype = _get_pytypename(_value) if isinstance(_value, ObjectifiedDataElement): # reuse existing nsmap unless redefined in nsmap parameter temp = _value.nsmap if temp is not None and temp: - temp = dict(_value.nsmap) + temp = dict(temp) temp.update(nsmap) nsmap = temp # reuse existing attributes unless redefined in attrib/_attributes temp = _value.attrib if temp is not None and temp: - temp = dict(_value.attrib) + temp = dict(temp) temp.update(_attributes) _attributes = temp # reuse existing xsi:type or py:pytype attributes, unless provided as From scoder at codespeak.net Wed Aug 22 22:24:51 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 22 Aug 2007 22:24:51 +0200 (CEST) Subject: [Lxml-checkins] r45919 - lxml/trunk/src/lxml Message-ID: <20070822202451.2C721816C@code0.codespeak.net> Author: scoder Date: Wed Aug 22 22:24:49 2007 New Revision: 45919 Modified: lxml/trunk/src/lxml/objectify.pyx Log: new ElementMaker implementation specifically for objectify Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Wed Aug 22 22:24:49 2007 @@ -1030,6 +1030,78 @@ ################################################################################ # adapted ElementMaker supports registered PyTypes +cdef class ElementMaker: + cdef object _makeelement + cdef object _namespace + cdef object _nsmap + def __init__(self, namespace=None, nsmap=None, makeelement=None): + self._nsmap = nsmap + if namespace is None: + self._namespace = None + else: + self._namespace = "{%s}" % namespace + if makeelement is not None: + assert callable(makeelement) + self._makeelement = makeelement + else: + self._makeelement = None + + def __getattr__(self, tag): + if tag[0] != "{" and self._namespace is not None: + tag = self._namespace + tag + return 
_ObjectifyElementMakerCaller( + self._makeelement, tag, self._nsmap) + +cdef class _ObjectifyElementMakerCaller: + cdef object _tag + cdef object _nsmap + cdef object _element_factory + def __init__(self, element_factory, tag, nsmap): + self._element_factory = element_factory + self._tag = tag + self._nsmap = nsmap + + def __call__(self, *children, **attrib): + cdef _ObjectifyElementMakerCaller elementMaker + cdef python.PyObject* pytype + cdef _Element element + if self._element_factory is None: + element = cetree.makeElement( + self._tag, None, objectify_parser, + None, None, attrib, self._nsmap) + else: + element = self._element_factory(self._tag, attrib, self._nsmap) + + for child in children: + if child is None: + if len(children) == 1: + cetree.setAttributeValue( + element, XML_SCHEMA_INSTANCE_NIL_ATTR, "true") + elif python._isString(child): + _add_text(element, child) + elif isinstance(child, _Element): + cetree.appendChild(element, child) + elif isinstance(child, _ObjectifyElementMakerCaller): + elementMaker = <_ObjectifyElementMakerCaller>child + if elementMaker._element_factory is None: + child = cetree.makeElement( + elementMaker._tag, element._doc, objectify_parser, + None, None, None, None) + else: + child = elementMaker._element_factory( + (<_ObjectifyElementMakerCaller>child)._tag) + cetree.appendChild(element, child) + else: + pytype = python.PyDict_GetItem( + _PYTYPE_DICT, _typename(child)) + if pytype is not NULL: + (pytype)._add_text(element, child) + else: + child = str(child) + _add_text(element, child) + + return element + class ElementMaker(_ElementMaker): def __init__(self, typemap=None): typemap = _ObjectifyTypemap(typemap) From scoder at codespeak.net Wed Aug 22 22:25:50 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 22 Aug 2007 22:25:50 +0200 (CEST) Subject: [Lxml-checkins] r45920 - lxml/trunk/src/lxml Message-ID: <20070822202550.D556A816C@code0.codespeak.net> Author: scoder Date: Wed Aug 22 22:25:50 2007 New 
Revision: 45920 Modified: lxml/trunk/src/lxml/objectify.pyx Log: removed old ElementMaker implementation Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Wed Aug 22 22:25:50 2007 @@ -1102,82 +1102,6 @@ return element -class ElementMaker(_ElementMaker): - def __init__(self, typemap=None): - typemap = _ObjectifyTypemap(typemap) - _ElementMaker.__init__(self, typemap, objectify_parser.makeelement) - -cdef class _ObjectifyTypemap: - """Type map for the ElementMaker. - """ - cdef object _typemap - - def __init__(self, initial=None): - if initial is None: - self._typemap = {} - else: - self._typemap = dict(initial) - - self._typemap[__builtin__.str] = __add_text - self._typemap[__builtin__.str.__name__] = __add_text - - self._typemap[__builtin__.unicode] = __add_text - self._typemap[__builtin__.unicode.__name__] = __add_text - - self._typemap[__builtin__.int] = __add_stringifiable - self._typemap[__builtin__.int.__name__] = __add_stringifiable - - self._typemap[__builtin__.long] = __add_stringifiable - self._typemap[__builtin__.long.__name__] = __add_stringifiable - - self._typemap[__builtin__.float] = __add_stringifiable - self._typemap[__builtin__.float.__name__] = __add_stringifiable - - self._typemap[__builtin__.bool] = __add_bool - self._typemap[__builtin__.bool.__name__] = __add_bool - - NoneType = type(None) - self._typemap[NoneType] = __add_none - self._typemap[NoneType.__name__] = __add_none - - def copy(self): - return self - - def get(self, type): - cdef python.PyObject* result - result = python.PyDict_GetItem(self._typemap, type) - if result is NULL: - name = type.__name__ - result = python.PyDict_GetItem(self._typemap, name) - if result is NULL: - result = python.PyDict_GetItem(_PYTYPE_DICT, name) - if result is NULL: - return None - return (result)._add_text - return result - - def 
__contains__(self, type): - return type in self._typemap or type.__name__ in self._typemap - - def __getitem__(self, key): - return self._typemap[key] - - def __setitem__(self, key, value): - self._typemap[key] = value - self._typemap[key.__name__] = value - -def __add_stringifiable(_Element elem not None, number): - _add_text(elem, str(number)) - -def __add_bool(_Element elem not None, bool_val): - _add_text(elem, _lower_bool(bool_val)) - -def __add_text(_Element elem not None, text): - _add_text(elem, text) - -def __add_none(_Element elem not None, none_val): - pass - cdef _add_text(_Element elem, text): cdef tree.xmlNode* c_child c_child = cetree.findChildBackwards(elem._c_node, 0) From scoder at codespeak.net Wed Aug 22 22:44:57 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 22 Aug 2007 22:44:57 +0200 (CEST) Subject: [Lxml-checkins] r45921 - lxml/trunk/src/lxml Message-ID: <20070822204457.91AEB81FF@code0.codespeak.net> Author: scoder Date: Wed Aug 22 22:44:55 2007 New Revision: 45921 Modified: lxml/trunk/src/lxml/objectify.pyx Log: cleanup Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Wed Aug 22 22:44:55 2007 @@ -1062,7 +1062,6 @@ self._nsmap = nsmap def __call__(self, *children, **attrib): - cdef _ObjectifyElementMakerCaller elementMaker cdef python.PyObject* pytype cdef _Element element if self._element_factory is None: @@ -1074,7 +1073,7 @@ for child in children: if child is None: - if len(children) == 1: + if python.PyTuple_GET_SIZE(children) == 1: cetree.setAttributeValue( element, XML_SCHEMA_INSTANCE_NIL_ATTR, "true") elif python._isString(child): @@ -1082,14 +1081,14 @@ elif isinstance(child, _Element): cetree.appendChild(element, child) elif isinstance(child, _ObjectifyElementMakerCaller): - elementMaker = <_ObjectifyElementMakerCaller>child - if 
elementMaker._element_factory is None: + if (<_ObjectifyElementMakerCaller>child)._element_factory is None: child = cetree.makeElement( elementMaker._tag, element._doc, objectify_parser, None, None, None, None) else: - child = elementMaker._element_factory( - (<_ObjectifyElementMakerCaller>child)._tag) + child = (<_ObjectifyElementMakerCaller>child). + _element_factory(( + <_ObjectifyElementMakerCaller>child)._tag) cetree.appendChild(element, child) else: pytype = python.PyDict_GetItem( From scoder at codespeak.net Wed Aug 22 22:51:53 2007 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 22 Aug 2007 22:51:53 +0200 (CEST) Subject: [Lxml-checkins] r45922 - lxml/trunk/src/lxml Message-ID: <20070822205153.97CC7812F@code0.codespeak.net> Author: scoder Date: Wed Aug 22 22:51:53 2007 New Revision: 45922 Modified: lxml/trunk/src/lxml/objectify.pyx Log: more cleanup, small fix for last commit Modified: lxml/trunk/src/lxml/objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/objectify.pyx (original) +++ lxml/trunk/src/lxml/objectify.pyx Wed Aug 22 22:51:53 2007 @@ -1062,8 +1062,10 @@ self._nsmap = nsmap def __call__(self, *children, **attrib): + cdef _ObjectifyElementMakerCaller elementMaker cdef python.PyObject* pytype cdef _Element element + cdef _Element childElement if self._element_factory is None: element = cetree.makeElement( self._tag, None, objectify_parser, @@ -1079,17 +1081,17 @@ elif python._isString(child): _add_text(element, child) elif isinstance(child, _Element): - cetree.appendChild(element, child) + cetree.appendChild(element, <_Element>child) elif isinstance(child, _ObjectifyElementMakerCaller): - if (<_ObjectifyElementMakerCaller>child)._element_factory is None: - child = cetree.makeElement( + elementMaker = <_ObjectifyElementMakerCaller>child + if elementMaker._element_factory is None: + childElement = cetree.makeElement( elementMaker._tag, element._doc, objectify_parser, 
None, None, None, None)
                 else:
-                    child = (<_ObjectifyElementMakerCaller>child).
-                        _element_factory((
-                        <_ObjectifyElementMakerCaller>child)._tag)
-                cetree.appendChild(element, child)
+                    childElement = elementMaker._element_factory(
+                        elementMaker._tag)
+                cetree.appendChild(element, childElement)
             else:
                 pytype = python.PyDict_GetItem(
                     _PYTYPE_DICT, _typename(child))

From scoder at codespeak.net Fri Aug 24 08:34:59 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 24 Aug 2007 08:34:59 +0200 (CEST)
Subject: [Lxml-checkins] r45940 - lxml/trunk/src/lxml
Message-ID: <20070824063459.8A0B481B6@code0.codespeak.net>

Author: scoder
Date: Fri Aug 24 08:34:57 2007
New Revision: 45940

Modified:
   lxml/trunk/src/lxml/proxy.pxi
   lxml/trunk/src/lxml/python.pxd
Log:
avoid incref/decref around decrefing

Modified: lxml/trunk/src/lxml/proxy.pxi
==============================================================================
--- lxml/trunk/src/lxml/proxy.pxi (original)
+++ lxml/trunk/src/lxml/proxy.pxi Fri Aug 24 08:34:57 2007
@@ -38,7 +38,7 @@
     c_node = proxy._c_node
     assert c_node._private is proxy, "Tried to unregister unknown proxy"
     c_node._private = NULL
-    python.Py_DECREF(proxy._gc_doc)
+    python._Py_DECREF(proxy._gc_doc)

 ################################################################################
 # temporarily make a node the root node of its document

Modified: lxml/trunk/src/lxml/python.pxd
==============================================================================
--- lxml/trunk/src/lxml/python.pxd (original)
+++ lxml/trunk/src/lxml/python.pxd Fri Aug 24 08:34:57 2007
@@ -10,6 +10,7 @@

     cdef void Py_INCREF(object o)
     cdef void Py_DECREF(object o)
+    cdef void _Py_DECREF "Py_DECREF" (PyObject* o)

     cdef FILE* PyFile_AsFile(object p)
     cdef int PyFile_Check(object p)

From scoder at codespeak.net Fri Aug 24 08:35:17 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 24 Aug 2007 08:35:17 +0200 (CEST)
Subject: [Lxml-checkins] r45941 -
lxml/trunk/src/lxml
Message-ID: <20070824063517.81CA481B6@code0.codespeak.net>

Author: scoder
Date: Fri Aug 24 08:35:17 2007
New Revision: 45941

Modified:
   lxml/trunk/src/lxml/etree.pyx
Log:
comment

Modified: lxml/trunk/src/lxml/etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/etree.pyx (original)
+++ lxml/trunk/src/lxml/etree.pyx Fri Aug 24 08:35:17 2007
@@ -728,7 +728,7 @@

     property attrib:
         """Element attribute dictionary. Where possible, use get(), set(),
-        keys() and items() to access element attributes.
+        keys(), values() and items() to access element attributes.
         """
         def __get__(self):
             if self._attrib is None:

From scoder at codespeak.net Fri Aug 24 10:11:48 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 24 Aug 2007 10:11:48 +0200 (CEST)
Subject: [Lxml-checkins] r45942 - lxml/trunk/src/lxml
Message-ID: <20070824081148.8A16181BF@code0.codespeak.net>

Author: scoder
Date: Fri Aug 24 10:11:47 2007
New Revision: 45942

Modified:
   lxml/trunk/src/lxml/objectify.pyx
   lxml/trunk/src/lxml/pyclasslookup.pyx
Log:
docstring cleanup

Modified: lxml/trunk/src/lxml/objectify.pyx
==============================================================================
--- lxml/trunk/src/lxml/objectify.pyx (original)
+++ lxml/trunk/src/lxml/objectify.pyx Fri Aug 24 10:11:47 2007
@@ -806,7 +806,7 @@
     string value. It may be None in which case it is not considered for type
     guessing.

-    Example:
+    Example::
         PyType('int', int, MyIntClass).register()

     Note that the order in which types are registered matters. The first

Modified: lxml/trunk/src/lxml/pyclasslookup.pyx
==============================================================================
--- lxml/trunk/src/lxml/pyclasslookup.pyx (original)
+++ lxml/trunk/src/lxml/pyclasslookup.pyx Fri Aug 24 10:11:47 2007
@@ -246,15 +246,15 @@
 cdef class PythonElementClassLookup(FallbackElementClassLookup):
     """Element class lookup based on a subclass method.
-    To use it, inherit from this class and override the method
+    To use it, inherit from this class and override the lookup method to
+    lookup the element class for a node::

         lookup(self, document, node_proxy)

-    to lookup the element class for a node. The first argument is the opaque
-    document instance that contains the Element. The second arguments is a
-    lightweight Element proxy implementation that is only valid during the
-    lookup. Do not try to keep a reference to it. Once the lookup is done, the
-    proxy will be invalid.
+    The first argument is the opaque document instance that contains the
+    Element. The second arguments is a lightweight Element proxy
+    implementation that is only valid during the lookup. Do not try to keep a
+    reference to it. Once the lookup is done, the proxy will be invalid.

     If you return None from this method, the fallback will be called.
     """

From scoder at codespeak.net Fri Aug 24 10:12:26 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 24 Aug 2007 10:12:26 +0200 (CEST)
Subject: [Lxml-checkins] r45943 - in lxml/trunk: .
doc src/lxml src/lxml/tests
Message-ID: <20070824081226.A82EE81BF@code0.codespeak.net>

Author: scoder
Date: Fri Aug 24 10:12:26 2007
New Revision: 45943

Modified:
   lxml/trunk/CHANGES.txt
   lxml/trunk/doc/extensions.txt
   lxml/trunk/src/lxml/extensions.pxi
   lxml/trunk/src/lxml/tests/test_xpathevaluator.py
Log:
provide the context node and a propagated evaluation context dict to XPath functions

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Fri Aug 24 10:12:26 2007
@@ -8,6 +8,11 @@
 Features added
 --------------

+* XPath extension functions can now access the current context node
+  (``context.context_node``) and use a context dictionary
+  (``context.eval_context``) from the context provided in their first
+  parameter
+
 * HTML tag soup parser based on BeautifulSoup in ``lxml.html.ElementSoup``

 * New module ``lxml.doctestcompare`` by Ian Bicking for writing simplified

Modified: lxml/trunk/doc/extensions.txt
==============================================================================
--- lxml/trunk/doc/extensions.txt (original)
+++ lxml/trunk/doc/extensions.txt Fri Aug 24 10:12:26 2007
@@ -7,10 +7,9 @@

 Here is how such a function looks like. As the first argument, it always
-receives a dummy object. It is currently None, but do not rely on this as it
-may become meaningful in later versions of lxml. The other arguments are
-provided by the respective call in the XPath expression, one in the following
-examples. Any number of arguments is allowed::
+receives a context object (see below). The other arguments are provided by
+the respective call in the XPath expression, one in the following examples.
+Any number of arguments is allowed::

   >>> def hello(dummy, a):
   ...     return "Hello %s" % a
@@ -100,6 +99,40 @@
 would rather complicate things than be of any help.


+The XPath context
+-----------------
+
+Functions get a context object as first parameter.
In lxml 1.x, this value
+was None, but since lxml 2.0 it provides two properties: ``eval_context`` and
+``context_node``. The context node is the Element where the current function
+is called::
+
+  >>> def print_tag(context, nodes):
+  ...     print context.context_node.tag, [ n.tag for n in nodes ]
+
+  >>> ns = etree.FunctionNamespace('http://mydomain.org/printtag')
+  >>> ns.prefix = "pt"
+  >>> ns["print_tag"] = print_tag
+
+  >>> ignore = root.xpath("//*[pt:print_tag(.//*)]")
+  a ['b']
+  b []
+
+The ``eval_context`` is a dictionary that is local to the evaluation. It
+allows functions to keep state::
+
+  >>> def print_context(context):
+  ...     context.eval_context[context.context_node.tag] = "done"
+  ...     entries = context.eval_context.items()
+  ...     entries.sort()
+  ...     print entries
+  >>> ns["print_context"] = print_context
+
+  >>> ignore = root.xpath("//*[pt:print_context()]")
+  [('a', 'done')]
+  [('a', 'done'), ('b', 'done')]
+
+
 Evaluators and XSLT
 -------------------

@@ -238,9 +271,12 @@
 What to return from a function
 ------------------------------

+.. _`XPath return values`: xpathxslt.html#xpath-return-values
+
 Extension functions can return any data type for which there is an XPath
-equivalent. This includes numbers, boolean values, elements and lists of
-elements. Note that integers will also be returned as floats::
+equivalent (see the documentation on `XPath return values`). This includes
+numbers, boolean values, elements and lists of elements. Note that integers
+will also be returned as floats::

   >>> def returnsFloat(_):
   ...
return 1.7

Modified: lxml/trunk/src/lxml/extensions.pxi
==============================================================================
--- lxml/trunk/src/lxml/extensions.pxi (original)
+++ lxml/trunk/src/lxml/extensions.pxi Fri Aug 24 10:12:26 2007
@@ -36,6 +36,7 @@
     cdef object _global_namespaces
     cdef object _utf_refs
     cdef object _function_cache
+    cdef object _eval_context_dict
     # for exception handling and temporary reference keeping:
     cdef _TempStore _temp_refs
     cdef _ExceptionContext _exc
@@ -45,6 +46,7 @@
         self._utf_refs = {}
         self._global_namespaces = []
         self._function_cache = {}
+        self._eval_context_dict = None

         if extensions is not None:
             # convert extensions to UTF-8
@@ -123,6 +125,7 @@
         #xpath.xmlXPathRegisteredNsCleanup(self._xpathCtxt)
         #self.unregisterGlobalNamespaces()
         python.PyDict_Clear(self._utf_refs)
+        self._eval_context_dict = None
         self._doc = None

     cdef _release_context(self):
@@ -268,6 +271,31 @@
             return dict_result
         return None

+    # Python access to the XPath context for extension functions
+
+    property context_node:
+        def __get__(self):
+            cdef xmlNode* c_node
+            if self._xpathCtxt is NULL:
+                raise XPathError, \
+                    "XPath context is only usable during the evaluation"
+            c_node = self._xpathCtxt.node
+            if c_node is NULL:
+                raise XPathError, "no context node"
+            if c_node.doc != self._xpathCtxt.doc:
+                raise XPathError, \
+                    "document-external context nodes are not supported"
+            if self._doc is None:
+                raise XPathError, \
+                    "document context is missing"
+            return _elementFactory(self._doc, c_node)
+
+    property eval_context:
+        def __get__(self):
+            if self._eval_context_dict is None:
+                self._eval_context_dict = {}
+            return self._eval_context_dict
+
     # Python reference keeping during XPath function evaluation

     cdef _release_temp_refs(self):
@@ -538,7 +566,7 @@
         python.PyList_Append(args, o)
     python.PyList_Reverse(args)

-    res = function(None, *args)
+    res = function(context, *args)
     # wrap result for XPath consumption
     obj = _wrapXPathObject(res)
     # prevent Python from
deallocating elements handed to libxml2

Modified: lxml/trunk/src/lxml/tests/test_xpathevaluator.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_xpathevaluator.py (original)
+++ lxml/trunk/src/lxml/tests/test_xpathevaluator.py Fri Aug 24 10:12:26 2007
@@ -265,6 +265,72 @@
         self.assertEquals('Dag', r[1].text)
         self.assertEquals('Honk', r[2].text)

+    def test_xpath_context_node(self):
+        tree = self.parse('')
+
+        check_call = []
+        def check_context(ctxt, nodes):
+            self.assertEquals(len(nodes), 1)
+            check_call.append(nodes[0].tag)
+            self.assertEquals(ctxt.context_node, nodes[0])
+            return True
+
+        find = etree.XPath("//*[p:foo(.)]",
+                           namespaces={'p' : 'ns'},
+                           extensions=[{('ns', 'foo') : check_context}])
+        find(tree)
+
+        check_call.sort()
+        self.assertEquals(check_call, ["a", "b", "c", "root"])
+
+    def test_xpath_eval_context_propagation(self):
+        tree = self.parse('')
+
+        check_call = {}
+        def check_context(ctxt, nodes):
+            self.assertEquals(len(nodes), 1)
+            tag = nodes[0].tag
+            # empty during the "b" call, a "b" during the "c" call
+            check_call[tag] = ctxt.eval_context.get("b")
+            ctxt.eval_context[tag] = tag
+            return True
+
+        find = etree.XPath("//b[p:foo(.)]/c[p:foo(.)]",
+                           namespaces={'p' : 'ns'},
+                           extensions=[{('ns', 'foo') : check_context}])
+        result = find(tree)
+
+        self.assertEquals(result, [tree.getroot()[1][0]])
+        self.assertEquals(check_call, {'b':None, 'c':'b'})
+
+    def test_xpath_eval_context_clear(self):
+        tree = self.parse('')
+
+        check_call = {}
+        def check_context(ctxt):
+            check_call["done"] = True
+            # context must be empty for each new evaluation
+            self.assertEquals(len(ctxt.eval_context), 0)
+            ctxt.eval_context["test"] = True
+            return True
+
+        find = etree.XPath("//b[p:foo()]",
+                           namespaces={'p' : 'ns'},
+                           extensions=[{('ns', 'foo') : check_context}])
+        result = find(tree)
+
+        self.assertEquals(result, [tree.getroot()[1]])
+        self.assertEquals(check_call["done"], True)
+
+
check_call.clear()
+        find = etree.XPath("//b[p:foo()]",
+                           namespaces={'p' : 'ns'},
+                           extensions=[{('ns', 'foo') : check_context}])
+        result = find(tree)
+
+        self.assertEquals(result, [tree.getroot()[1]])
+        self.assertEquals(check_call["done"], True)
+
     def test_xpath_variables(self):
         x = self.parse('')
         e = etree.XPathEvaluator(x)

From scoder at codespeak.net Fri Aug 24 10:33:06 2007
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 24 Aug 2007 10:33:06 +0200 (CEST)
Subject: [Lxml-checkins] r45944 - lxml/trunk/src/lxml
Message-ID: <20070824083306.9EBDF81D5@code0.codespeak.net>

Author: scoder
Date: Fri Aug 24 10:33:05 2007
New Revision: 45944

Modified:
   lxml/trunk/src/lxml/etree.pyx
   lxml/trunk/src/lxml/proxy.pxi
Log:
another deallocation bug: order matters ...

Modified: lxml/trunk/src/lxml/etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/etree.pyx (original)
+++ lxml/trunk/src/lxml/etree.pyx Fri Aug 24