From scoder at codespeak.net Tue Jun 2 21:14:35 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 2 Jun 2009 21:14:35 +0200 (CEST)
Subject: [Lxml-checkins] r65545 - lxml/trunk
Message-ID: <20090602191435.BB3AB169EC5@codespeak.net>

Author: scoder
Date: Tue Jun 2 21:14:32 2009
New Revision: 65545

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/buildlibxml.py
Log:
 r5130 at delle: sbehnel | 2009-06-02 21:05:27 +0200
 fix Py3 syntax in lib builder script


Modified: lxml/trunk/buildlibxml.py
==============================================================================
--- lxml/trunk/buildlibxml.py (original)
+++ lxml/trunk/buildlibxml.py Tue Jun 2 21:14:32 2009
@@ -1,13 +1,21 @@
 import os, re, sys
 from distutils import log, sysconfig
+
+
+try:
+    from urlparse import urlsplit, urljoin
+    from urllib import urlretrieve
+except ImportError:
+    from urllib.parse import urlsplit
+    from urllib.request import urlretrieve
+
 ## Routines to download and build libxml2/xslt:
 LIBXML2_LOCATION = 'ftp://xmlsoft.org/libxml2/'
 match_libfile_version = re.compile('^[^-]*-([.0-9-]+)[.].*').match
 def ftp_listdir(url):
-    import ftplib, urlparse, posixpath
-    scheme, netloc, path, qs, fragment = urlparse.urlsplit(url)
+    import ftplib, posixpath
+    scheme, netloc, path, qs, fragment = urlsplit(url)
     assert scheme.lower() == 'ftp'
     server = ftplib.FTP(netloc)
     server.login()
@@ -36,7 +44,6 @@
 def download_library(dest_dir, location, name, version_re, filename, version=None):
-    import urllib, urlparse
     if version is None:
         try:
             fns = ftp_listdir(location)
@@ -66,14 +73,14 @@
             else:
                 raise
     filename = filename % version
-    full_url = urlparse.urljoin(location, filename)
+    full_url = urljoin(location, filename)
     dest_filename = os.path.join(dest_dir, filename)
     if os.path.exists(dest_filename):
         print('Using existing %s downloaded into %s (delete this file if you want to re-download the package)' % (name, dest_filename))
     else:
         print('Downloading %s into %s' % (name, dest_filename))
-        urllib.urlretrieve(full_url, dest_filename)
+        urlretrieve(full_url, dest_filename)
     return dest_filename
 
 ## Backported method of tarfile.TarFile.extractall (doesn't exist in 2.4):
@@ -105,7 +112,7 @@
             # Extract directories with a safe mode.
             directories.append((tarinfo.name, tarinfo))
             tarinfo = copy.copy(tarinfo)
-            tarinfo.mode = 0700
+            tarinfo.mode = 448 # 0700
         self.extract(tarinfo, path)
 
     # Reverse sort directories.
@@ -119,11 +126,11 @@
             self.chown(tarinfo, dirpath)
             self.utime(tarinfo, dirpath)
             self.chmod(tarinfo, dirpath)
-        except tarfile.ExtractError, e:
+        except tarfile.ExtractError:
             if self.errorlevel > 1:
                 raise
             else:
-                self._dbg(1, "tarfile: %s" % e)
+                self._dbg(1, "tarfile: %s" % sys.exc_info()[1])
 
 def unpack_tarball(tar_filename, dest):
     import tarfile

From scoder at codespeak.net Tue Jun 2 21:14:39 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 2 Jun 2009 21:14:39 +0200 (CEST)
Subject: [Lxml-checkins] r65546 - in lxml/trunk: . doc
Message-ID: <20090602191439.34609169EC5@codespeak.net>

Author: scoder
Date: Tue Jun 2 21:14:38 2009
New Revision: 65546

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/CHANGES.txt
   lxml/trunk/doc/main.txt
   lxml/trunk/version.txt
Log:
 r5131 at delle: sbehnel | 2009-06-02 21:11:16 +0200
 prepare release of lxml 2.2.1


Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Tue Jun 2 21:14:38 2009
@@ -2,8 +2,8 @@
 lxml changelog
 ==============
 
-Under development
-=================
+2.2.1 (2009-06-02)
+==================
 
 Features added
 --------------
@@ -17,8 +17,11 @@
 Bugs fixed
 ----------
 
+* The script for statically building libxml2 and libxslt didn't work
+  in Py3.
+
 * ``XMLSchema()`` also passes invalid schema documents on to libxml2
-  for parsing (which could lead to a crash before release 2.6.24)
+  for parsing (which could lead to a crash before release 2.6.24).
 
 Other changes
 -------------

Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt (original)
+++ lxml/trunk/doc/main.txt Tue Jun 2 21:14:38 2009
@@ -147,8 +147,8 @@
 source release. If you can't wait, consider trying a less recent
 release version first.
 
-The latest version is `lxml 2.2`_, released 2009-03-21
-(`changes for 2.2`_). `Older versions`_ are listed below.
+The latest version is `lxml 2.2.1`_, released 2009-06-02
+(`changes for 2.2.1`_). `Older versions`_ are listed below.
 
 Please take a look at the `installation instructions`_!
 
@@ -221,7 +221,7 @@
 `_ and the `current in-development version
 `_.
 
-.. _`PDF documentation`: lxmldoc-2.2.pdf
+.. _`PDF documentation`: lxmldoc-2.2.1.pdf
 
 * `lxml 2.2`_, released 2009-03-21 (`changes for 2.2`_)
 
@@ -321,6 +321,7 @@
 
 * `lxml 0.5`_, released 2005-04-08
 
+.. _`lxml 2.2.1`: lxml-2.2.1.tgz
 .. _`lxml 2.2`: lxml-2.2.tgz
 .. _`lxml 2.2beta4`: lxml-2.2beta4.tgz
 .. _`lxml 2.2beta3`: lxml-2.2beta3.tgz
@@ -370,6 +371,7 @@
 .. _`lxml 0.5.1`: lxml-0.5.1.tgz
 .. _`lxml 0.5`: lxml-0.5.tgz
 
+.. _`changes for 2.2.1`: changes-2.2.1.html
 .. _`changes for 2.2`: changes-2.2.html
 .. _`changes for 2.2beta4`: changes-2.2beta4.html
 .. _`changes for 2.2beta3`: changes-2.2beta3.html

Modified: lxml/trunk/version.txt
==============================================================================
--- lxml/trunk/version.txt (original)
+++ lxml/trunk/version.txt Tue Jun 2 21:14:38 2009
@@ -1 +1 @@
-2.2
+2.2.1
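
The three portability idioms used in the `buildlibxml.py` change of r65545 (import fallback, decimal file mode, `sys.exc_info()`) can be illustrated with a short, self-contained sketch. It is not code from `buildlibxml.py`: `library_url()` and `error_message()` are hypothetical helpers, and the tarball name is only an example. Note that the Python 3 branch in the diff imports only `urlsplit`, although `download_library()` also calls `urljoin()`; the sketch imports both names in both branches.

```python
import sys

try:  # Python 2 module layout
    from urlparse import urlsplit, urljoin
except ImportError:  # Python 3 moved both functions into urllib.parse
    from urllib.parse import urlsplit, urljoin

# '0700' is a syntax error in Python 3 and '0o700' does not parse before
# Python 2.6, so a decimal literal is the portable spelling of this mode.
SAFE_DIR_MODE = 448  # == 0o700: read/write/execute for the owner only


def library_url(location, filename):
    """Join an FTP directory URL and a file name (hypothetical helper)."""
    scheme = urlsplit(location)[0]
    if scheme.lower() != 'ftp':
        raise ValueError('not an FTP URL: %s' % location)
    return urljoin(location, filename)


def error_message(func, *args):
    """Call func and return its ValueError text instead of raising.

    'except E, e' is Python 2 only and 'except E as e' needs Python 2.6+,
    so the exception object is read from sys.exc_info() instead.
    """
    try:
        return func(*args)
    except ValueError:
        return 'error: %s' % sys.exc_info()[1]


print(library_url('ftp://xmlsoft.org/libxml2/', 'libxml2-2.7.3.tar.gz'))
```

The same source then runs unchanged on Python 2.4 through 3.x, which is the constraint the lib builder script has to meet.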

From scoder at codespeak.net Sat Jun 6 10:36:36 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 6 Jun 2009 10:36:36 +0200 (CEST)
Subject: [Lxml-checkins] r65611 - in lxml/trunk: . src/lxml
Message-ID: <20090606083636.3277B1684BE@codespeak.net>

Author: scoder
Date: Sat Jun 6 10:36:32 2009
New Revision: 65611

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/apihelpers.pxi
Log:
 r5134 at delle: sbehnel | 2009-06-06 10:31:38 +0200
 cleanup


Modified: lxml/trunk/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/trunk/src/lxml/apihelpers.pxi (original)
+++ lxml/trunk/src/lxml/apihelpers.pxi Sat Jun 6 10:36:32 2009
@@ -455,7 +455,6 @@
     return 0
 
 cdef int _delAttribute(_Element element, key) except -1:
-    cdef xmlAttr* c_attr
     cdef char* c_href
     ns, tag = _getNsTag(key)
     if ns is None:

From scoder at codespeak.net Sat Jun 6 10:36:44 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 6 Jun 2009 10:36:44 +0200 (CEST)
Subject: [Lxml-checkins] r65612 - in lxml/trunk: . src/lxml src/lxml/tests
Message-ID: <20090606083644.D368C1684BF@codespeak.net>

Author: scoder
Date: Sat Jun 6 10:36:44 2009
New Revision: 65612

Added:
   lxml/trunk/src/lxml/cleanup.pxi
Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/CHANGES.txt
   lxml/trunk/src/lxml/lxml.etree.pyx
   lxml/trunk/src/lxml/tests/test_etree.py
Log:
 r5135 at delle: sbehnel | 2009-06-06 10:33:16 +0200
 new helper functions to strip attributes/elements/subtrees from an XML tree


Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Sat Jun 6 10:36:44 2009
@@ -2,6 +2,23 @@
 lxml changelog
 ==============
 
+Under development
+==================
+
+Features added
+--------------
+
+* New helper functions ``strip_attributes()``, ``strip_elements()``,
+  ``strip_tags()`` in lxml.etree to remove attributes/subtrees/tags
+  from a subtree.
+
+Bugs fixed
+----------
+
+Other changes
+-------------
+
+
 2.2.1 (2009-06-02)
 ==================

Added: lxml/trunk/src/lxml/cleanup.pxi
==============================================================================
--- (empty file)
+++ lxml/trunk/src/lxml/cleanup.pxi Sat Jun 6 10:36:44 2009
@@ -0,0 +1,287 @@
+# functions for tree cleanup and removing elements from subtrees
+
+def cleanup_namespaces(tree_or_element):
+    u"""cleanup_namespaces(tree_or_element)
+
+    Remove all namespace declarations from a subtree that are not used
+    by any of the elements in that tree.
+    """
+    cdef _Element element
+    element = _rootNodeOrRaise(tree_or_element)
+    _removeUnusedNamespaceDeclarations(element._c_node)
+
+def strip_attributes(tree_or_element, *attribute_names):
+    u"""strip_attributes(tree_or_element, *attribute_names)
+
+    Delete all attributes with the provided attribute names from an
+    Element (or ElementTree) and its descendants.
+
+    Example usage::
+
+        strip_attributes(root_element,
+                         'simpleattr',
+                         '{http://some/ns}attrname')
+    """
+    cdef xmlNode* c_node
+    cdef xmlAttr* c_attr
+    cdef _Element element
+    cdef list ns_tags
+    cdef char* c_name
+
+    element = _rootNodeOrRaise(tree_or_element)
+    if not attribute_names: return
+
+    ns_tags = _sortedTagList([ _getNsTag(attr)
+                               for attr in attribute_names ])
+    ns_tags = [ (ns, tag if tag != '*' else None)
+                for ns, tag in ns_tags ]
+
+    c_node = element._c_node
+    tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_node, c_node, 1)
+    if c_node.type == tree.XML_ELEMENT_NODE:
+        if c_node.properties is not NULL:
+            for ns, tag in ns_tags:
+                # must search attributes manually to make sure we only
+                # match on blank tag names if there is no namespace
+                c_name = NULL if tag is None else _cstr(tag)
+                c_attr = c_node.properties
+                while c_attr is not NULL:
+                    if ns is None:
+                        if c_attr.ns is NULL or c_attr.ns.href is NULL:
+                            if c_name is NULL or \
+                                    cstd.strcmp(c_attr.name, c_name) == 0:
+                                tree.xmlRemoveProp(c_attr)
+                                break
+                    elif c_attr.ns is not NULL and c_attr.ns.href is not NULL:
+                        if cstd.strcmp(c_attr.ns.href, _cstr(ns)) == 0:
+                            if c_name is NULL or \
+                                    cstd.strcmp(c_attr.name, c_name) == 0:
+                                tree.xmlRemoveProp(c_attr)
+                                break
+                    c_attr = c_attr.next
+    tree.END_FOR_EACH_ELEMENT_FROM(c_node)
+
+def strip_elements(tree_or_element, *tag_names, bint with_tail=True):
+    u"""strip_elements(tree_or_element, *tag_names, with_tail=True)
+
+    Delete all elements with the provided tag names from a tree or
+    subtree. This will remove the elements and their entire subtree,
+    including all their attributes, text content and descendants. It
+    will also remove the tail text of the element unless you
+    explicitly set the ``with_tail`` option to False.
+
+    Note that this will not delete the element (or ElementTree root
+    element) that you passed even if it matches. It will only treat
+    its descendants. If you want to include the root element, check
+    its tag name directly before even calling this function.
+
+    Example usage::
+
+        strip_elements(some_element,
+            'simpletagname',             # non-namespaced tag
+            '{http://some/ns}tagname',   # namespaced tag
+            '{http://some/other/ns}*'    # any tag from a namespace
+            Comment                      # comments
+            )
+    """
+    cdef xmlNode* c_node
+    cdef xmlNode* c_child
+    cdef xmlNode* c_next
+    cdef char* c_href
+    cdef char* c_name
+    cdef _Element element
+    cdef _Document doc
+    cdef list ns_tags
+    cdef bint strip_comments, strip_pis, strip_entities
+
+    doc = _documentOrRaise(tree_or_element)
+    element = _rootNodeOrRaise(tree_or_element)
+    if not tag_names: return
+
+    ns_tags = _filterSpecialTagNames(
+        tag_names, &strip_comments, &strip_pis, &strip_entities)
+
+    c_node = element._c_node
+    tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_node, c_node, 1)
+    if c_node.type == tree.XML_ELEMENT_NODE:
+        # we run through the children here to prevent any problems
+        # with the tree iteration which would occur if we unlinked the
+        # c_node itself
+        c_child = c_node.children
+        while c_child is not NULL:
+            if c_child.type == tree.XML_ELEMENT_NODE:
+                for ns, tag in ns_tags:
+                    if ns is None:
+                        # _tagMatches() considers NULL a wildcard
+                        # match but we don't
+                        if c_child.ns is not NULL and c_child.ns.href is not NULL:
+                            continue
+                        c_href = NULL
+                    else:
+                        c_href = _cstr(ns)
+                    c_name = NULL if tag is None else _cstr(tag)
+                    if _tagMatches(c_child, c_href, c_name):
+                        c_next = c_child.next
+                        if not with_tail:
+                            tree.xmlUnlinkNode(c_child)
+                        _removeNode(doc, c_child)
+                        c_child = c_next
+                        break
+                else:
+                    c_child = c_child.next
+            elif strip_comments and c_child.type == tree.XML_COMMENT_NODE or \
+                    strip_pis and c_child.type == tree.XML_PI_NODE or \
+                    strip_entities and c_child.type == tree.XML_ENTITY_REF_NODE:
+                c_next = c_child.next
+                if with_tail:
+                    _removeText(c_next)
+                tree.xmlUnlinkNode(c_child)
+                attemptDeallocation(c_child)
+                c_child = c_next
+            else:
+                c_child = c_child.next
+    tree.END_FOR_EACH_ELEMENT_FROM(c_node)
+
+def strip_tags(tree_or_element, *tag_names):
+    u"""strip_tags(tree_or_element, *tag_names)
+
+    Delete all elements with the provided tag names from a tree or
+    subtree. This will remove the elements and their attributes, but
+    *not* their text/tail content or descendants. Instead, it will
+    merge the text content and children of the element into its
+    parent.
+
+    Note that this will not delete the element (or ElementTree root
+    element) that you passed even if it matches. It will only treat
+    its descendants.
+
+    Example usage::
+
+        strip_tags(some_element,
+            'simpletagname',             # non-namespaced tag
+            '{http://some/ns}tagname',   # namespaced tag
+            '{http://some/other/ns}*'    # any tag from a namespace
+            Comment                      # comments (including their text!)
+            )
+    """
+    cdef xmlNode* c_node
+    cdef xmlNode* c_child
+    cdef xmlNode* c_next
+    cdef xmlNode* c_merge_child
+    cdef char* c_href
+    cdef char* c_name
+    cdef _Element element
+    cdef _Document doc
+    cdef list ns_tags
+    cdef bint strip_comments, strip_pis, strip_entities
+
+    doc = _documentOrRaise(tree_or_element)
+    element = _rootNodeOrRaise(tree_or_element)
+    if not tag_names: return
+
+    ns_tags = _filterSpecialTagNames(
+        tag_names, &strip_comments, &strip_pis, &strip_entities)
+
+    c_node = element._c_node
+    tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_node, c_node, 1)
+    if c_node.type == tree.XML_ELEMENT_NODE:
+        # we run through the children here to prevent any problems
+        # with the tree iteration which would occur if we unlinked the
+        # c_node itself
+        c_child = c_node.children
+        while c_child is not NULL:
+            if c_child.type == tree.XML_ELEMENT_NODE:
+                for ns, tag in ns_tags:
+                    if ns is None:
+                        # _tagMatches() considers NULL a wildcard
+                        # match but we don't
+                        if c_child.ns is not NULL and c_child.ns.href is not NULL:
+                            continue
+                        c_href = NULL
+                    else:
+                        c_href = _cstr(ns)
+                    c_name = NULL if tag is None else _cstr(tag)
+                    if _tagMatches(c_child, c_href, c_name):
+                        # replace c_child by its children
+                        if c_child.children is NULL:
+                            c_next = c_child.next
+                            tree.xmlUnlinkNode(c_child)
+                        else:
+                            c_next = c_child.children
+                            # fix parent links of children
+                            c_merge_child = c_child.children
+                            while c_merge_child is not NULL:
+                                c_merge_child.parent = c_node
+                                c_merge_child = c_merge_child.next
+
+                            # fix sibling links to/from child slice
+                            if c_child.prev is NULL:
+                                c_node.children = c_child.children
+                            else:
+                                c_child.prev.next = c_child.children
+                                c_child.children.prev = c_child.prev
+                            if c_child.next is NULL:
+                                c_node.last = c_child.last
+                            else:
+                                c_child.next.prev = c_child.last
+                                c_child.last.next = c_child.next
+
+                            # unlink c_child
+                            c_child.children = c_child.last = NULL
+                            c_child.parent = c_child.next = c_child.prev = NULL
+
+                        if not attemptDeallocation(c_child):
+                            if c_child.ns is not NULL:
+                                # make namespaces absolute
+                                moveNodeToDocument(doc, doc._c_doc, c_child)
+                        c_child = c_next
+                        break
+                else:
+                    c_child = c_child.next
+            elif strip_comments and c_child.type == tree.XML_COMMENT_NODE or \
+                    strip_pis and c_child.type == tree.XML_PI_NODE or \
+                    strip_entities and c_child.type == tree.XML_ENTITY_REF_NODE:
+                c_next = c_child.next
+                tree.xmlUnlinkNode(c_child)
+                attemptDeallocation(c_child)
+                c_child = c_next
+            else:
+                c_child = c_child.next
+    tree.END_FOR_EACH_ELEMENT_FROM(c_node)
+
+
+# helper functions
+
+cdef list _sortedTagList(list l):
+    # This is required since the namespace may be None (which Py3
+    # can't compare to strings). A bit of overhead, but at least
+    # portable ...
+    cdef list decorated_list
+    cdef tuple ns_tag
+    cdef Py_ssize_t i
+    decorated_list = [ (ns_tag[0] or '', ns_tag[1], i, ns_tag)
+                       for i, ns_tag in enumerate(l) ]
+    decorated_list.sort()
+    return [ item[-1] for item in decorated_list ]
+
+cdef list _filterSpecialTagNames(tag_names, bint* comments, bint* pis, bint* entities):
+    cdef list ns_tags
+    comments[0] = 0
+    pis[0] = 0
+    entities[0] = 0
+
+    if Comment in tag_names:
+        comments[0] = 1
+        tag_names = [ tag for tag in tag_names
+                      if tag is not Comment ]
+    if ProcessingInstruction in tag_names:
+        pis[0] = 1
+        tag_names = [ tag for tag in tag_names
+                      if tag is not ProcessingInstruction ]
+    if Entity in tag_names:
+        entities[0] = 1
+        tag_names = [ tag for tag in tag_names
+                      if tag is not Entity ]
+    ns_tags = _sortedTagList([ _getNsTag(tag) for tag in tag_names ])
+    return [ (ns, tag if tag != '*' else None)
+             for ns, tag in ns_tags ]

Modified: lxml/trunk/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.etree.pyx (original)
+++ lxml/trunk/src/lxml/lxml.etree.pyx Sat Jun 6 10:36:44 2009
@@ -2700,16 +2700,6 @@
     except _TargetParserResult, result_container:
         return result_container.result
 
-def cleanup_namespaces(tree_or_element):
-    u"""cleanup_namespaces(tree_or_element)
-
-    Remove all namespace declarations from a subtree that are not used
-    by any of the elements in that tree.
-    """
-    cdef _Element element
-    element = _rootNodeOrRaise(tree_or_element)
-    _removeUnusedNamespaceDeclarations(element._c_node)
-
 ################################################################################
 
 # Include submodules
@@ -2725,6 +2715,7 @@
 include "iterparse.pxi" # incremental XML parsing
 include "xmlid.pxi" # XMLID and IDDict
 include "xinclude.pxi" # XInclude
+include "cleanup.pxi" # Cleanup and recursive element removal functions
 
 ################################################################################

Modified: lxml/trunk/src/lxml/tests/test_etree.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_etree.py (original)
+++ lxml/trunk/src/lxml/tests/test_etree.py Sat Jun 6 10:36:44 2009
@@ -205,6 +205,123 @@
         self.assertRaises(TypeError, root.set, "newattr", 5)
         self.assertRaises(TypeError, root.set, "newattr", None)
 
+    def test_strip_attributes(self):
+        XML = self.etree.XML
+        xml = _bytes('')
+
+        root = XML(xml)
+        self.etree.strip_attributes(root, 'a')
+        self.assertEquals(_bytes(''),
+                          self._writeElement(root))
+
+        root = XML(xml)
+        self.etree.strip_attributes(root, 'b', 'c')
+        self.assertEquals(_bytes(''),
+                          self._writeElement(root))
+
+    def test_strip_attributes_ns(self):
+        XML = self.etree.XML
+        xml = _bytes('')
+
+        root = XML(xml)
+        self.etree.strip_attributes(root, 'a')
+        self.assertEquals(
+            _bytes(''),
+            self._writeElement(root))
+
+        root = XML(xml)
+        self.etree.strip_attributes(root, '{http://test/ns}a', 'c')
+        self.assertEquals(
+            _bytes(''),
+            self._writeElement(root))
+
+        root = XML(xml)
+        self.etree.strip_attributes(root, '{http://test/ns}*')
+        self.assertEquals(
+            _bytes(''),
+            self._writeElement(root))
+
+    def test_strip_elements(self):
+        XML = self.etree.XML
+        xml = _bytes('')
+
+        root = XML(xml)
+        self.etree.strip_elements(root, 'a')
+        self.assertEquals(_bytes(''),
+                          self._writeElement(root))
+
+        root = XML(xml)
+        self.etree.strip_elements(root, 'b', 'c', 'X', 'Y', 'Z')
+        self.assertEquals(_bytes(''),
+                          self._writeElement(root))
+
+        root = XML(xml)
+        self.etree.strip_elements(root, 'c')
+        self.assertEquals(_bytes(''),
+                          self._writeElement(root))
+
+    def test_strip_elements_ns(self):
+        XML = self.etree.XML
+        xml = _bytes('TESTABCBTATXABTCTATXT')
+
+        root = XML(xml)
+        self.etree.strip_elements(root, 'a')
+        self.assertEquals(_bytes('TESTABCBTATXXT'),
+                          self._writeElement(root))
+
+        root = XML(xml)
+        self.etree.strip_elements(root, '{urn:a}b', 'c')
+        self.assertEquals(_bytes('TESTABCBTATXACTATXT'),
+                          self._writeElement(root))
+
+        root = XML(xml)
+        self.etree.strip_elements(root, '{urn:a}*', 'c')
+        self.assertEquals(_bytes('TESTXACTATXT'),
+                          self._writeElement(root))
+
+        root = XML(xml)
+        self.etree.strip_elements(root, '{urn:a}*', 'c', with_tail=False)
+        self.assertEquals(_bytes('TESTATXABTCTATXT'),
+                          self._writeElement(root))
+
+    def test_strip_tags(self):
+        XML = self.etree.XML
+        xml = _bytes('TESTABCTBTATXABTCTATXT')
+
+        root = XML(xml)
+        self.etree.strip_tags(root, 'a')
+        self.assertEquals(_bytes('TESTABCTBTATXABTCTATXT'),
+                          self._writeElement(root))
+
+        root = XML(xml)
+        self.etree.strip_tags(root, 'b', 'c', 'X', 'Y', 'Z')
+        self.assertEquals(_bytes('TESTABCTBTATXABTCTATXT'),
+                          self._writeElement(root))
+
+        root = XML(xml)
+        self.etree.strip_tags(root, 'c')
+        self.assertEquals(_bytes('TESTABCTBTATXABTCTATXT'),
+                          self._writeElement(root))
+
+    def test_strip_tags_ns(self):
+        XML = self.etree.XML
+        xml = _bytes('TESTABCTBTATXABTCTATXT')
+
+        root = XML(xml)
+        self.etree.strip_tags(root, 'a')
+        self.assertEquals(_bytes('TESTABCTBTATXABTCTATXT'),
+                          self._writeElement(root))
+
+        root = XML(xml)
+        self.etree.strip_tags(root, '{urn:a}b', 'c')
+        self.assertEquals(_bytes('TESTABCTBTATXABTCTATXT'),
+                          self._writeElement(root))
+
+        root = XML(xml)
+        self.etree.strip_tags(root, '{urn:a}*', 'c')
+        self.assertEquals(_bytes('TESTABCTBTATXABTCTATXT'),
+                          self._writeElement(root))
+
     def test_pi(self):
         # lxml.etree separates target and text
         Element = self.etree.Element

From scoder at codespeak.net Sat Jun 6 17:07:25 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 6 Jun 2009 17:07:25 +0200 (CEST)
Subject: [Lxml-checkins] r65623 - in lxml/trunk: . src/lxml
Message-ID: <20090606150725.BB18716849E@codespeak.net>

Author: scoder
Date: Sat Jun 6 17:07:23 2009
New Revision: 65623

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/cleanup.pxi
Log:
 r5138 at delle: sbehnel | 2009-06-06 17:04:06 +0200
 make sure we fix up namespaces when replacing a node with namespace declarations by its children


Modified: lxml/trunk/src/lxml/cleanup.pxi
==============================================================================
--- lxml/trunk/src/lxml/cleanup.pxi (original)
+++ lxml/trunk/src/lxml/cleanup.pxi Sat Jun 6 17:07:23 2009
@@ -214,6 +214,15 @@
                                 c_merge_child.parent = c_node
                                 c_merge_child = c_merge_child.next
 
+                            # fix namespace references of children if
+                            # their parent's namespace declarations
+                            # get lost
+                            if c_child.nsDef is not NULL:
+                                c_merge_child = c_child.children
+                                while c_merge_child is not NULL:
+                                    moveNodeToDocument(doc, doc._c_doc, c_merge_child)
+                                    c_merge_child = c_merge_child.next
+
                             # fix sibling links to/from child slice
                             if c_child.prev is NULL:
                                 c_node.children = c_child.children

From scoder at codespeak.net Sat Jun 6 17:38:20 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 6 Jun 2009 17:38:20 +0200 (CEST)
Subject: [Lxml-checkins] r65625 - in lxml/trunk: . src/lxml
Message-ID: <20090606153820.4BDAB16849C@codespeak.net>

Author: scoder
Date: Sat Jun 6 17:38:17 2009
New Revision: 65625

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/apihelpers.pxi
   lxml/trunk/src/lxml/cleanup.pxi
Log:
 r5140 at delle: sbehnel | 2009-06-06 17:35:01 +0200
 refactoring: replacing nodes by their children deserves a separate function


Modified: lxml/trunk/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/trunk/src/lxml/apihelpers.pxi (original)
+++ lxml/trunk/src/lxml/apihelpers.pxi Sat Jun 6 17:38:17 2009
@@ -389,6 +389,45 @@
     cstd.free(c_ns_list)
     return 0
 
+cdef int _replaceNodeByChildren(_Document doc, xmlNode* c_node) except -1:
+    cdef xmlNode* c_parent
+    cdef xmlNode* c_child
+    if c_node.children is NULL:
+        tree.xmlUnlinkNode(c_node)
+        return 0
+
+    c_parent = c_node.parent
+    # fix parent links of children
+    c_child = c_node.children
+    while c_child is not NULL:
+        c_child.parent = c_parent
+        c_child = c_child.next
+
+    # fix namespace references of children if their parent's namespace
+    # declarations get lost
+    if c_node.nsDef is not NULL:
+        c_child = c_node.children
+        while c_child is not NULL:
+            moveNodeToDocument(doc, doc._c_doc, c_child)
+            c_child = c_child.next
+
+    # fix sibling links to/from child slice
+    if c_node.prev is NULL:
+        c_parent.children = c_node.children
+    else:
+        c_node.prev.next = c_node.children
+        c_node.children.prev = c_node.prev
+    if c_node.next is NULL:
+        c_parent.last = c_node.last
+    else:
+        c_node.next.prev = c_node.last
+        c_node.last.next = c_node.next
+
+    # unlink c_node
+    c_node.children = c_node.last = NULL
+    c_node.parent = c_node.next = c_node.prev = NULL
+    return 0
+
 cdef object _attributeValue(xmlNode* c_element, xmlAttr* c_attrib_node):
     cdef char* value
     cdef char* href

Modified: lxml/trunk/src/lxml/cleanup.pxi
==============================================================================
--- lxml/trunk/src/lxml/cleanup.pxi (original)
+++ lxml/trunk/src/lxml/cleanup.pxi Sat Jun 6 17:38:17 2009
@@ -167,7 +167,6 @@
     cdef xmlNode* c_node
     cdef xmlNode* c_child
     cdef xmlNode* c_next
-    cdef xmlNode* c_merge_child
     cdef char* c_href
     cdef char* c_name
     cdef _Element element
@@ -202,43 +201,11 @@
                         c_href = _cstr(ns)
                     c_name = NULL if tag is None else _cstr(tag)
                     if _tagMatches(c_child, c_href, c_name):
-                        # replace c_child by its children
-                        if c_child.children is NULL:
-                            c_next = c_child.next
-                            tree.xmlUnlinkNode(c_child)
-                        else:
+                        if c_child.children is not NULL:
                             c_next = c_child.children
-                            # fix parent links of children
-                            c_merge_child = c_child.children
-                            while c_merge_child is not NULL:
-                                c_merge_child.parent = c_node
-                                c_merge_child = c_merge_child.next
-
-                            # fix namespace references of children if
-                            # their parent's namespace declarations
-                            # get lost
-                            if c_child.nsDef is not NULL:
-                                c_merge_child = c_child.children
-                                while c_merge_child is not NULL:
-                                    moveNodeToDocument(doc, doc._c_doc, c_merge_child)
-                                    c_merge_child = c_merge_child.next
-
-                            # fix sibling links to/from child slice
-                            if c_child.prev is NULL:
-                                c_node.children = c_child.children
-                            else:
-                                c_child.prev.next = c_child.children
-                                c_child.children.prev = c_child.prev
-                            if c_child.next is NULL:
-                                c_node.last = c_child.last
-                            else:
-                                c_child.next.prev = c_child.last
-                                c_child.last.next = c_child.next
-
-                            # unlink c_child
-                            c_child.children = c_child.last = NULL
-                            c_child.parent = c_child.next = c_child.prev = NULL
-
+                        else:
+                            c_next = c_child.next
+                        _replaceNodeByChildren(doc, c_child)
                         if not attemptDeallocation(c_child):
                             if c_child.ns is not NULL:
                                 # make namespaces absolute

From scoder at codespeak.net Sat Jun 6 21:53:09 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 6 Jun 2009 21:53:09 +0200 (CEST)
Subject: [Lxml-checkins] r65628 - in lxml/trunk: . src/lxml/html
Message-ID: <20090606195309.EACDC168482@codespeak.net>

Author: scoder
Date: Sat Jun 6 21:53:07 2009
New Revision: 65628

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/html/clean.py
Log:
 r5142 at delle: sbehnel | 2009-06-06 21:49:49 +0200
 tiny change in lxml.html.clean to use new strip_attributes() function in lxml.etree


Modified: lxml/trunk/src/lxml/html/clean.py
==============================================================================
--- lxml/trunk/src/lxml/html/clean.py (original)
+++ lxml/trunk/src/lxml/html/clean.py Sat Jun 6 21:53:07 2009
@@ -305,8 +305,7 @@
             kill_tags.add(etree.ProcessingInstruction)
         if self.style:
             kill_tags.add('style')
-            for el in _find_styled_elements(doc):
-                del el.attrib['style']
+            etree.strip_attributes(doc, 'style')
         if self.links:
             kill_tags.add('link')
         elif self.style or self.javascript:

From scoder at codespeak.net Sat Jun 6 22:32:01 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 6 Jun 2009 22:32:01 +0200 (CEST)
Subject: [Lxml-checkins] r65630 - in lxml/trunk: . src/lxml
Message-ID: <20090606203201.23C71168453@codespeak.net>

Author: scoder
Date: Sat Jun 6 22:32:00 2009
New Revision: 65630

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/cleanup.pxi
Log:
 r5144 at delle: sbehnel | 2009-06-06 22:28:48 +0200
 avoid unnecessary loop iterations in strip_*() functions


Modified: lxml/trunk/src/lxml/cleanup.pxi
==============================================================================
--- lxml/trunk/src/lxml/cleanup.pxi (original)
+++ lxml/trunk/src/lxml/cleanup.pxi Sat Jun 6 22:32:00 2009
@@ -107,7 +107,7 @@
         # we run through the children here to prevent any problems
         # with the tree iteration which would occur if we unlinked the
         # c_node itself
-        c_child = c_node.children
+        c_child = _findChildForwards(c_node, 0)
         while c_child is not NULL:
             if c_child.type == tree.XML_ELEMENT_NODE:
                 for ns, tag in ns_tags:
@@ -121,25 +121,25 @@
                         c_href = _cstr(ns)
                     c_name = NULL if tag is None else _cstr(tag)
                     if _tagMatches(c_child, c_href, c_name):
-                        c_next = c_child.next
+                        c_next = _nextElement(c_child)
                         if not with_tail:
                             tree.xmlUnlinkNode(c_child)
                         _removeNode(doc, c_child)
                         c_child = c_next
                         break
                 else:
-                    c_child = c_child.next
+                    c_child = _nextElement(c_child)
             elif strip_comments and c_child.type == tree.XML_COMMENT_NODE or \
                     strip_pis and c_child.type == tree.XML_PI_NODE or \
                     strip_entities and c_child.type == tree.XML_ENTITY_REF_NODE:
-                c_next = c_child.next
+                c_next = _nextElement(c_child)
                 if with_tail:
                     _removeText(c_next)
                 tree.xmlUnlinkNode(c_child)
                 attemptDeallocation(c_child)
                 c_child = c_next
             else:
-                c_child = c_child.next
+                c_child = _nextElement(c_child)
     tree.END_FOR_EACH_ELEMENT_FROM(c_node)
 
 def strip_tags(tree_or_element, *tag_names):
@@ -187,7 +187,7 @@
         # we run through the children here to prevent any problems
         # with the tree iteration which would occur if we unlinked the
         # c_node itself
-        c_child = c_node.children
+        c_child = _findChildForwards(c_node, 0)
         while c_child is not NULL:
             if c_child.type == tree.XML_ELEMENT_NODE:
            for ns, tag in ns_tags: @@ -202,9 +202,9 @@ c_name = NULL if tag is None else _cstr(tag) if _tagMatches(c_child, c_href, c_name): if c_child.children is not NULL: - c_next = c_child.children + c_next = _findChildForwards(c_child, 0) else: - c_next = c_child.next + c_next = _nextElement(c_child) _replaceNodeByChildren(doc, c_child) if not attemptDeallocation(c_child): if c_child.ns is not NULL: @@ -217,12 +217,12 @@ elif strip_comments and c_child.type == tree.XML_COMMENT_NODE or \ strip_pis and c_child.type == tree.XML_PI_NODE or \ strip_entities and c_child.type == tree.XML_ENTITY_REF_NODE: - c_next = c_child.next + c_next = _nextElement(c_child) tree.xmlUnlinkNode(c_child) attemptDeallocation(c_child) c_child = c_next else: - c_child = c_child.next + c_child = _nextElement(c_child) tree.END_FOR_EACH_ELEMENT_FROM(c_node)
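The two commits above replace a hand-written attribute-deletion loop with `etree.strip_attributes()` and tighten the child traversal inside the `strip_*()` functions. To illustrate what these helpers do to a tree (not how lxml's C code in `cleanup.pxi` does it), here is a minimal pure-Python sketch on the standard library's `xml.etree.ElementTree`. The names `strip_attributes_sketch` and `strip_tags_sketch` are hypothetical. The subtle part of `strip_tags` is splicing the removed element's text, children and tail back into the parent without losing any text.

```python
import xml.etree.ElementTree as ET


def strip_attributes_sketch(root, *attr_names):
    """Delete the named attributes from every element (hypothetical helper)."""
    for el in root.iter():
        for name in attr_names:
            el.attrib.pop(name, None)
    return root


def strip_tags_sketch(root, *tag_names):
    """Remove matching elements but keep their text, children and tail.

    Hypothetical stdlib analogue of lxml.etree.strip_tags(); it is not the
    C implementation from cleanup.pxi.
    """
    names = set(tag_names)
    # Snapshot all elements first so splicing does not disturb the iteration.
    for parent in list(root.iter()):
        i = 0
        while i < len(parent):
            child = parent[i]
            if child.tag not in names:
                i += 1
                continue
            grandchildren = list(child)
            merged = child.text or ""
            if grandchildren:
                # The stripped element's tail now follows its last child.
                last = grandchildren[-1]
                last.tail = (last.tail or "") + (child.tail or "")
            else:
                merged += child.tail or ""
            # Glue the leading text onto whatever precedes the stripped element.
            if i == 0:
                parent.text = (parent.text or "") + merged
            else:
                prev = parent[i - 1]
                prev.tail = (prev.tail or "") + merged
            parent.remove(child)
            # Empty the detached element so the snapshot cannot revisit its children.
            child.clear()
            for offset, gc in enumerate(grandchildren):
                parent.insert(i + offset, gc)
            # Do not advance i: a spliced-in grandchild may match as well.
    return root


doc = ET.fromstring('<root>A<b style="x">B<c>C</c>D</b>E<b>F</b>G</root>')
strip_attributes_sketch(doc, "style")
strip_tags_sketch(doc, "b")
print(ET.tostring(doc, encoding="unicode"))  # <root>AB<c>C</c>DEFG</root>
```

Not advancing the index after a splice lets nested matches such as `<b><b>x</b></b>` collapse in a single pass.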
From jholg at codespeak.net Tue Jun 9 00:22:50 2009 From: jholg at codespeak.net (jholg at codespeak.net) Date: Tue, 9 Jun 2009 00:22:50 +0200 (CEST) Subject: [Lxml-checkins] r65680 - in lxml/trunk: doc src/lxml src/lxml/tests Message-ID: <20090608222250.6A27116840D@codespeak.net> Author: jholg Date: Tue Jun 9 00:22:49 2009 New Revision: 65680 Modified: lxml/trunk/doc/objectify.txt lxml/trunk/src/lxml/lxml.objectify.pyx lxml/trunk/src/lxml/tests/test_objectify.py Log: new keyword arg xsi_nil (default False) for deannotate(), to remove xsi:nil if needed, deannotate() now uses strip_attributes() implementation. Tests included, removed duplicate test_pytype_deannotate method. Modified: lxml/trunk/doc/objectify.txt ============================================================================== --- lxml/trunk/doc/objectify.txt (original) +++ lxml/trunk/doc/objectify.txt Tue Jun 9 00:22:49 2009 @@ -1039,6 +1039,43 @@ i = 5 [IntElement] s = 5 [IntElement] +You can control which type attributes should be de-annotated with the keyword +arguments 'pytype' (default: True) and 'xsi' (default: True). +``deannotate()`` can also remove 'xsi:nil' attributes by setting 'xsi_nil=True' +(default: False): + +.. sourcecode:: pycon + + >>> root = objectify.fromstring('''\ + ... + ... 5 + ... 5 + ... 5 + ... + ...
''') + >>> objectify.annotate(root) + >>> print(objectify.dump(root)) + root = None [ObjectifiedElement] + d = 5.0 [FloatElement] + * xsi:type = 'xsd:double' + * py:pytype = 'float' + i = 5 [IntElement] + * xsi:type = 'xsd:int' + * py:pytype = 'int' + s = '5' [StringElement] + * xsi:type = 'xsd:string' + * py:pytype = 'str' + n = None [NoneElement] + * xsi:nil = 'true' + * py:pytype = 'NoneType' + >>> objectify.deannotate(root, xsi_nil=True) + >>> print(objectify.dump(root)) + root = None [ObjectifiedElement] + d = 5 [IntElement] + i = 5 [IntElement] + s = 5 [IntElement] + n = u'' [StringElement] The DataElement factory ----------------------- Modified: lxml/trunk/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.objectify.pyx (original) +++ lxml/trunk/src/lxml/lxml.objectify.pyx Tue Jun 9 00:22:49 2009 @@ -1693,42 +1693,31 @@ tree.xmlSetNsProp(c_node, c_ns, "nil", "true") tree.END_FOR_EACH_ELEMENT_FROM(c_node) -def deannotate(element_or_tree, *, pytype=True, xsi=True): - u"""deannotate(element_or_tree, pytype=True, xsi=True) +cdef object _strip_attributes +_strip_attributes = etree.strip_attributes - Recursively de-annotate the elements of an XML tree by removing 'pytype' - and/or 'type' attributes. +def deannotate(element_or_tree, *, pytype=True, xsi=True, xsi_nil=False): + u"""deannotate(element_or_tree, pytype=True, xsi=True, xsi_nil=False) - If the 'pytype' keyword argument is True (the default), 'pytype' attributes - will be removed. If the 'xsi' keyword argument is True (the default), - 'xsi:type' attributes will be removed. + Recursively de-annotate the elements of an XML tree by removing 'py:pytype' + and/or 'xsi:type' attributes and/or 'xsi:nil' attributes. + + If the 'pytype' keyword argument is True (the default), 'py:pytype' + attributes will be removed. If the 'xsi' keyword argument is True (the + default), 'xsi:type' attributes will be removed. 
+ If the 'xsi_nil' keyword argument is True (default: False), 'xsi:nil' + attributes will be removed. """ - cdef _Element element - cdef tree.xmlNode* c_node + cdef list attribute_names = [] - element = cetree.rootNodeOrRaise(element_or_tree) - c_node = element._c_node - if pytype and xsi: - tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_node, c_node, 1) - if c_node.type == tree.XML_ELEMENT_NODE: - cetree.delAttributeFromNsName( - c_node, _PYTYPE_NAMESPACE, _PYTYPE_ATTRIBUTE_NAME) - cetree.delAttributeFromNsName( - c_node, _XML_SCHEMA_INSTANCE_NS, "type") - tree.END_FOR_EACH_ELEMENT_FROM(c_node) - elif pytype: - tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_node, c_node, 1) - if c_node.type == tree.XML_ELEMENT_NODE: - cetree.delAttributeFromNsName( - c_node, _PYTYPE_NAMESPACE, _PYTYPE_ATTRIBUTE_NAME) - tree.END_FOR_EACH_ELEMENT_FROM(c_node) - elif xsi: - tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_node, c_node, 1) - if c_node.type == tree.XML_ELEMENT_NODE: - cetree.delAttributeFromNsName( - c_node, _XML_SCHEMA_INSTANCE_NS, "type") - tree.END_FOR_EACH_ELEMENT_FROM(c_node) + if pytype: + attribute_names.append(PYTYPE_ATTRIBUTE) + if xsi: + attribute_names.append(XML_SCHEMA_INSTANCE_TYPE_ATTR) + if xsi_nil: + attribute_names.append(XML_SCHEMA_INSTANCE_NIL_ATTR) + _strip_attributes(element_or_tree, *attribute_names) ################################################################################ # Module level parser setup Modified: lxml/trunk/src/lxml/tests/test_objectify.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_objectify.py (original) +++ lxml/trunk/src/lxml/tests/test_objectify.py Tue Jun 9 00:22:49 2009 @@ -1664,7 +1664,7 @@ self.assertEquals("true", root.n.get(XML_SCHEMA_NIL_ATTR)) - def test_pytype_deannotate(self): + def test_xsinil_deannotate(self): XML = self.XML root = XML(_bytes('''\ ''')) - objectify.xsiannotate(root) - objectify.deannotate(root, xsi=False) + objectify.annotate( + root, ignore_old=False, 
                ignore_xsi=False, annotate_xsi=True, + empty_pytype='str', empty_type='string') + objectify.deannotate(root, pytype=False, xsi=False, xsi_nil=True) child_types = [ c.get(XML_SCHEMA_INSTANCE_TYPE_ATTR) for c in root.iterchildren() ] - self.assertEquals("xsd:int", child_types[ 0]) + self.assertEquals("xsd:integer", child_types[ 0]) self.assertEquals("xsd:string", child_types[ 1]) self.assertEquals("xsd:double", child_types[ 2]) self.assertEquals("xsd:string", child_types[ 3]) self.assertEquals("xsd:boolean", child_types[ 4]) self.assertEquals(None, child_types[ 5]) - self.assertEquals(None, child_types[ 6]) - self.assertEquals("xsd:int", child_types[ 7]) - self.assertEquals("xsd:int", child_types[ 8]) - self.assertEquals("xsd:int", child_types[ 9]) + self.assertEquals("xsd:string", child_types[ 6]) + self.assertEquals("xsd:double", child_types[ 7]) + self.assertEquals("xsd:float", child_types[ 8]) + self.assertEquals("xsd:string", child_types[ 9]) self.assertEquals("xsd:string", child_types[10]) - self.assertEquals("xsd:double", child_types[11]) + self.assertEquals("xsd:double", child_types[11]) self.assertEquals("xsd:integer", child_types[12]) self.assertEquals(None, child_types[13]) - self.assertEquals("true", root.n.get(XML_SCHEMA_NIL_ATTR)) + self.assertEquals(None, root.n.get(XML_SCHEMA_NIL_ATTR)) - for c in root.getiterator(): - self.assertEquals(None, c.get(objectify.PYTYPE_ATTRIBUTE)) + for c in root.iterchildren(): + self.assertNotEquals(None, c.get(objectify.PYTYPE_ATTRIBUTE)) + # these have no equivalent in xsi:type + if (c.get(objectify.PYTYPE_ATTRIBUTE) not in [TREE_PYTYPE, + "NoneType"]): + self.assertNotEquals( + None, c.get(XML_SCHEMA_INSTANCE_TYPE_ATTR)) def test_xsitype_deannotate(self): XML = self.XML
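The rewritten `deannotate()` no longer walks the tree itself. It collects the qualified names of the attributes to drop, depending on the `pytype`, `xsi` and `xsi_nil` flags, and hands them to `strip_attributes()` in one call. The sketch below reproduces that idea with the standard library. `deannotate_sketch` is a hypothetical helper. The pytype namespace URI is assumed to be the one lxml.objectify used at the time; the XML Schema instance namespace is the standard one.

```python
import xml.etree.ElementTree as ET

# Clark-notation attribute names ({namespace}local).  The pytype namespace
# URI is an assumption; the XML Schema instance namespace is standard.
PYTYPE_ATTR = "{http://codespeak.net/lxml/objectify/pytype}pytype"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
XSI_TYPE_ATTR = "{%s}type" % XSI_NS
XSI_NIL_ATTR = "{%s}nil" % XSI_NS


def deannotate_sketch(root, pytype=True, xsi=True, xsi_nil=False):
    """Hypothetical stdlib analogue of objectify.deannotate()."""
    names = []
    if pytype:
        names.append(PYTYPE_ATTR)
    if xsi:
        names.append(XSI_TYPE_ATTR)
    if xsi_nil:
        names.append(XSI_NIL_ATTR)
    # Same effect as one strip_attributes(root, *names) call: a single pass.
    for el in root.iter():
        for name in names:
            el.attrib.pop(name, None)
    return root


root = ET.Element("root")
n = ET.SubElement(root, "n", {XSI_NIL_ATTR: "true", PYTYPE_ATTR: "NoneType"})
i = ET.SubElement(root, "i", {XSI_TYPE_ATTR: "xsd:int", PYTYPE_ATTR: "int"})
i.text = "5"

# Like the new test: keep py:pytype and xsi:type, drop only xsi:nil.
deannotate_sketch(root, pytype=False, xsi=False, xsi_nil=True)
print(n.get(XSI_NIL_ATTR), n.get(PYTYPE_ATTR), i.get(XSI_TYPE_ATTR))
```

Building the name list first means the tree is traversed once, however many flags are set. The old code needed a separate loop body for each flag combination.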
From scoder at codespeak.net Sun Jun 14 12:36:59 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 14 Jun 2009 12:36:59 +0200 (CEST) Subject: [Lxml-checkins] r65771 - in lxml/trunk: . doc doc/html Message-ID: <20090614103659.1E789168430@codespeak.net> Author: scoder Date: Sun Jun 14 12:36:56 2009 New Revision: 65771 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/html/style.css lxml/trunk/doc/main.txt lxml/trunk/doc/mkhtml.py Log: r5148 at delle: sbehnel | 2009-06-14 12:29:20 +0200 front page update: better page title Modified: lxml/trunk/doc/html/style.css ============================================================================== --- lxml/trunk/doc/html/style.css (original) +++ lxml/trunk/doc/html/style.css Sun Jun 14 12:36:56 2009 @@ -214,10 +214,13 @@ font-style: italic; } -div.eyecatcher { +div.eyecatcher, p.eyecatcher { font-family: Times, "Times New Roman", serif; text-align: center; font-size: 140%; + line-height: 1.2em; + margin-left: 9em; + margin-right: 9em; } div.pagequote { Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Sun Jun 14
12:36:56 2009 @@ -3,7 +3,7 @@ .. meta:: :description: lxml - the most feature-rich and easy-to-use library for working with XML and HTML in the Python language - :keywords: lxml, etree, objectify, Python, XML, HTML + :keywords: Python, XML, HTML, lxml, ElementTree, etree, objectify .. class:: pagequote @@ -12,10 +12,10 @@ .. class:: eyecatcher -| lxml is the most feature-rich -| and easy-to-use library -| for working with XML and HTML -| in the Python language. + lxml is the most feature-rich + and easy-to-use library + for working with XML and HTML + in the Python language. .. 1 Introduction Modified: lxml/trunk/doc/mkhtml.py ============================================================================== --- lxml/trunk/doc/mkhtml.py (original) +++ lxml/trunk/doc/mkhtml.py Sun Jun 14 12:36:56 2009 @@ -4,15 +4,16 @@ import os, shutil, re, sys, copy, time RST2HTML_OPTIONS = " ".join([ - "--no-toc-backlinks", - "--strip-comments", - "--language en", - "--date", + '--no-toc-backlinks', + '--strip-comments', + '--language en', + '--date', ]) htmlnsmap = {"h" : "http://www.w3.org/1999/xhtml"} find_title = XPath("/h:html/h:head/h:title/text()", namespaces=htmlnsmap) +find_title_tag = XPath("/h:html/h:head/h:title", namespaces=htmlnsmap) find_headings = XPath("//h:h1[not(@class)]//text()", namespaces=htmlnsmap) find_menu = XPath("//h:ul[@id=$name]", namespaces=htmlnsmap) find_page_end = XPath("/h:html/h:body/h:div[last()]", namespaces=htmlnsmap) @@ -130,6 +131,9 @@ # integrate menu for tree, basename, outpath in trees.itervalues(): new_tree = merge_menu(tree, menu, basename) + title = find_title_tag(new_tree) + if title and title[0].text == 'lxml': + title[0].text = "lxml - Processing XML and HTML with Python" new_tree.write(outpath) if __name__ == '__main__': From scoder at codespeak.net Sun Jun 14 12:37:01 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 14 Jun 2009 12:37:01 +0200 (CEST) Subject: [Lxml-checkins] r65772 - in lxml/trunk: . 
doc Message-ID: <20090614103701.39A88168430@codespeak.net> Author: scoder Date: Sun Jun 14 12:36:59 2009 New Revision: 65772 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/main.txt Log: r5149 at delle: sbehnel | 2009-06-14 12:31:52 +0200 front page update: keywords Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Sun Jun 14 12:36:59 2009 @@ -3,7 +3,7 @@ .. meta:: :description: lxml - the most feature-rich and easy-to-use library for working with XML and HTML in the Python language - :keywords: Python, XML, HTML, lxml, ElementTree, etree, objectify + :keywords: Python, XML, HTML, lxml, ElementTree, etree, objectify, parsing, validation, XSLT .. class:: pagequote From scoder at codespeak.net Sun Jun 14 12:37:05 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 14 Jun 2009 12:37:05 +0200 (CEST) Subject: [Lxml-checkins] r65773 - in lxml/trunk: . doc Message-ID: <20090614103705.696E1168436@codespeak.net> Author: scoder Date: Sun Jun 14 12:37:04 2009 New Revision: 65773 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/main.txt Log: r5150 at delle: sbehnel | 2009-06-14 12:32:39 +0200 front page update: keywords Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Sun Jun 14 12:37:04 2009 @@ -3,7 +3,7 @@ .. meta:: :description: lxml - the most feature-rich and easy-to-use library for working with XML and HTML in the Python language - :keywords: Python, XML, HTML, lxml, ElementTree, etree, objectify, parsing, validation, XSLT + :keywords: Python, XML, HTML, lxml, simple, ElementTree, etree, objectify, parsing, validation, XPath, XSLT .. 
class:: pagequote From scoder at codespeak.net Tue Jun 16 20:47:23 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 16 Jun 2009 20:47:23 +0200 (CEST) Subject: [Lxml-checkins] r65794 - in lxml/trunk: . doc Message-ID: <20090616184723.AEC1E1684A9@codespeak.net> Author: scoder Date: Tue Jun 16 20:47:21 2009 New Revision: 65794 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt Log: r5154 at delle: sbehnel | 2009-06-14 14:25:07 +0200 doc fix Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Tue Jun 16 20:47:21 2009 @@ -28,7 +28,7 @@ 1.9 How can I map an XML tree into a dict of dicts? 2 Installation 2.1 Which version of libxml2 and libxslt should I use or require? - 2.2 Where are the Windows binaries?
+ 2.2 Where are the binary builds? 2.3 Why do I get errors about missing UCS4 symbols when installing lxml? 3 Contributing 3.1 Why is lxml not written in Python? From scoder at codespeak.net Tue Jun 16 20:47:26 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 16 Jun 2009 20:47:26 +0200 (CEST) Subject: [Lxml-checkins] r65795 - in lxml/trunk: . doc Message-ID: <20090616184726.27BA61684A9@codespeak.net> Author: scoder Date: Tue Jun 16 20:47:24 2009 New Revision: 65795 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt Log: r5155 at delle: sbehnel | 2009-06-16 20:42:29 +0200 FAQ update Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Tue Jun 16 20:47:24 2009 @@ -485,6 +485,12 @@ or an idea how to make it more readable and accessible while you are reading it, please send a comment to the `mailing list`_. +* enhance the web site. We put some work into making the web site + usable, understandable and also easy to find, but there's always + things that can be done better. You may notice that we are not + top-ranked when searching the web for "Python and XML", so maybe you + have an idea how to improve that. + * help with the tutorial. A tutorial is the most important stating point for new users, so it is important for us to provide an easy to understand guide into lxml. As allo documentation, the tutorial is work in progress, so we From scoder at codespeak.net Tue Jun 16 20:47:31 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 16 Jun 2009 20:47:31 +0200 (CEST) Subject: [Lxml-checkins] r65796 - in lxml/trunk: . 
src/lxml src/lxml/tests Message-ID: <20090616184731.C329716854F@codespeak.net> Author: scoder Date: Tue Jun 16 20:47:30 2009 New Revision: 65796 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/parsertarget.pxi lxml/trunk/src/lxml/saxparser.pxi lxml/trunk/src/lxml/tests/test_elementtree.py Log: r5156 at delle: sbehnel | 2009-06-16 20:43:55 +0200 raising an exception from a parser target callback didn't always terminate the parser Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Tue Jun 16 20:47:30 2009 @@ -15,6 +15,9 @@ Bugs fixed ---------- +* Raising an exception from a parser target callback didn't always + terminate the parser. + Other changes ------------- Modified: lxml/trunk/src/lxml/parsertarget.pxi ============================================================================== --- lxml/trunk/src/lxml/parsertarget.pxi (original) +++ lxml/trunk/src/lxml/parsertarget.pxi Tue Jun 16 20:47:30 2009 @@ -108,13 +108,23 @@ context._setTarget(self._python_target) return context + cdef void _cleanupTargetParserContext(self, xmlDoc* result): + if self._c_ctxt.myDoc is not NULL: + if self._c_ctxt.myDoc is not result and \ + self._c_ctxt.myDoc._private is NULL: + # no _Document proxy => orphen + tree.xmlFreeDoc(self._c_ctxt.myDoc) + self._c_ctxt.myDoc = NULL + cdef object _handleParseResult(self, _BaseParser parser, xmlDoc* result, filename): cdef bint recover recover = parser._parse_options & xmlparser.XML_PARSE_RECOVER + if self._has_raised(): + self._cleanupTargetParserContext(result) + self._raise_if_stored() if not self._c_ctxt.wellFormed and not recover: _raiseParseError(self._c_ctxt, filename, self._error_log) - self._raise_if_stored() return self._python_target.close() cdef xmlDoc* _handleParseResultDoc(self, _BaseParser parser, @@ -124,13 +134,8 @@ if result is not NULL and result._private is NULL: # no 
_Document proxy => orphen tree.xmlFreeDoc(result) - if self._c_ctxt.myDoc is not NULL: - if self._c_ctxt.myDoc is not result and \ - self._c_ctxt.myDoc._private is NULL: - # no _Document proxy => orphen - tree.xmlFreeDoc(self._c_ctxt.myDoc) - self._c_ctxt.myDoc = NULL + self._cleanupTargetParserContext(result) + self._raise_if_stored() if not self._c_ctxt.wellFormed and not recover: _raiseParseError(self._c_ctxt, filename, self._error_log) - self._raise_if_stored() raise _TargetParserResult(self._python_target.close()) Modified: lxml/trunk/src/lxml/saxparser.pxi ============================================================================== --- lxml/trunk/src/lxml/saxparser.pxi (original) +++ lxml/trunk/src/lxml/saxparser.pxi Tue Jun 16 20:47:30 2009 @@ -108,10 +108,12 @@ c_ctxt.replaceEntities = 1 cdef void _handleSaxException(self, xmlparser.xmlParserCtxt* c_ctxt): - self._store_raised() if c_ctxt.errNo == xmlerror.XML_ERR_OK: c_ctxt.errNo = xmlerror.XML_ERR_INTERNAL_ERROR + # stop parsing immediately + c_ctxt.wellFormed = 0 c_ctxt.disableSAX = 1 + self._store_raised() cdef void _handleSaxStart(void* ctxt, char* c_localname, char* c_prefix, char* c_namespace, int c_nb_namespaces, @@ -246,7 +248,7 @@ cdef _SaxParserContext context cdef xmlparser.xmlParserCtxt* c_ctxt c_ctxt = ctxt - if c_ctxt._private is NULL: + if c_ctxt._private is NULL or c_ctxt.disableSAX: return context = <_SaxParserContext>c_ctxt._private if context._origSaxData is not NULL: @@ -261,7 +263,7 @@ cdef _SaxParserContext context cdef xmlparser.xmlParserCtxt* c_ctxt c_ctxt = ctxt - if c_ctxt._private is NULL: + if c_ctxt._private is NULL or c_ctxt.disableSAX: return context = <_SaxParserContext>c_ctxt._private if context._origSaxCData is not NULL: @@ -277,7 +279,7 @@ cdef _SaxParserContext context cdef xmlparser.xmlParserCtxt* c_ctxt c_ctxt = ctxt - if c_ctxt._private is NULL: + if c_ctxt._private is NULL or c_ctxt.disableSAX: return context = <_SaxParserContext>c_ctxt._private if 
                context._origSaxDoctype is not NULL: Modified: lxml/trunk/src/lxml/tests/test_elementtree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_elementtree.py (original) +++ lxml/trunk/src/lxml/tests/test_elementtree.py Tue Jun 16 20:47:30 2009 @@ -3445,6 +3445,62 @@ self.assertRaises(self.etree.ParseError, feed) + def test_parser_target_feed_exception(self): + events = [] + class Target(object): + def start(self, tag, attrib): + events.append("start-" + tag) + def end(self, tag): + events.append("end-" + tag) + if tag == 'a': + raise ValueError("dead and gone") + def data(self, data): + events.append("data-" + data) + def close(self): + events.append("close") + return "DONE" + + parser = self.etree.XMLParser(target=Target()) + + try: + parser.feed(_bytes('AcaB')) + done = parser.close() + self.fail("error expected, but parsing succeeded") + except ValueError: + done = 'value error received as expected' + + self.assertEquals(["start-root", "data-A", "start-a", + "data-ca", "end-a"], + events) + + def test_parser_target_fromstring_exception(self): + events = [] + class Target(object): + def start(self, tag, attrib): + events.append("start-" + tag) + def end(self, tag): + events.append("end-" + tag) + if tag == 'a': + raise ValueError("dead and gone") + def data(self, data): + events.append("data-" + data) + def close(self): + events.append("close") + return "DONE" + + parser = self.etree.XMLParser(target=Target()) + + try: + done = self.etree.fromstring(_bytes('AcaB'), + parser=parser) + self.fail("error expected, but parsing succeeded") + except ValueError: + done = 'value error received as expected' + + self.assertEquals(["start-root", "data-A", "start-a", + "data-ca", "end-a"], + events) + def test_treebuilder(self): builder = self.etree.TreeBuilder() el = builder.start("root", {'a':'A', 'b':'B'})
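lxml's parser-target interface follows the one in the standard library's `xml.etree.ElementTree`. That makes it possible to sketch, with the stdlib parser, the behaviour the two new tests pin down. An exception raised from a target callback should abort parsing and propagate out of `feed()`. No later callbacks, including `close()`, should run. This is a stdlib illustration resting on that assumption, not lxml's code, and the `RecordingTarget` class is hypothetical.

```python
import xml.etree.ElementTree as ET


class RecordingTarget:
    """Hypothetical parser target that logs callbacks and fails on </a>."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append("start-" + tag)

    def end(self, tag):
        self.events.append("end-" + tag)
        if tag == "a":
            raise ValueError("dead and gone")

    def data(self, data):
        self.events.append("data-" + data)

    def close(self):
        self.events.append("close")
        return "DONE"


target = RecordingTarget()
parser = ET.XMLParser(target=target)
try:
    parser.feed("<root>A<a>ca</a>B</root>")
    parser.close()
    raised = False
except ValueError:
    raised = True

# Only start/end callbacks are compared exactly; how character data is
# split across data() calls is left to the parser.
structural = [e for e in target.events if not e.startswith("data-")]
print(raised, structural)  # True ['start-root', 'start-a', 'end-a']
```

Filtering out the `data-` events keeps the check independent of how the underlying parser chunks character data. lxml's own tests can compare the full event list because they control that parser.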
From jholg at codespeak.net Wed Jun 17 17:39:10 2009 From: jholg at codespeak.net (jholg at codespeak.net) Date: Wed, 17 Jun 2009 17:39:10 +0200 (CEST) Subject: [Lxml-checkins] r65803 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20090617153910.858B6168569@codespeak.net> Author: jholg Date: Wed Jun 17 17:39:09 2009 New Revision: 65803 Modified: lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/lxml.objectify.pyx lxml/trunk/src/lxml/tests/test_objectify.py Log: Accept only true, false, 1, 0 as boolean values from an XML doc. Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Jun 17 17:39:09 2009 @@ -18,6 +18,10 @@ * Raising an exception from a parser target callback didn't always terminate the parser. +* Only {true, false, 1, 0} are accepted as the lexical representation for + BoolElement ({True, False, T, F, t, f} not any more), restoring lxml <= 2.0 + behaviour.
+ Other changes ------------- Modified: lxml/trunk/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.objectify.pyx (original) +++ lxml/trunk/src/lxml/lxml.objectify.pyx Wed Jun 17 17:39:09 2009 @@ -872,18 +872,15 @@ return value cdef inline int __parseBoolAsInt(text): - cdef char* c_str - if text == u'0': + if text == u'false': return 0 - elif text == u'1': + elif text == u'true': return 1 - text = text.lower() - if text == u'f' or text == u'false': + elif text == u'0': return 0 - elif text == u't' or text == u'true': + elif text == u'1': return 1 - else: - return -1 + return -1 cdef inline object _parseNumber(NumberElement element): return element._parse_value(textOf(element._c_node)) Modified: lxml/trunk/src/lxml/tests/test_objectify.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_objectify.py (original) +++ lxml/trunk/src/lxml/tests/test_objectify.py Wed Jun 17 17:39:09 2009 @@ -2411,6 +2411,44 @@ root.get('{http://www.w3.org/XML/1998/namespace}base'), "https://secret/url") + def test_standard_lookup(self): + XML = self.XML + + xml = _bytes('''\ + + 5 + -5 + 4294967296 + -4294967296 + 1.1 + true + false + Strange things happen, where strings collide + True + False + t + f + + None + + + ''') + root = XML(xml) + + for i in root.i: + self.assert_(isinstance(i, objectify.IntElement)) + for l in root.l: + self.assert_(isinstance(l, objectify.IntElement)) + for f in root.f: + self.assert_(isinstance(f, objectify.FloatElement)) + for b in root.b: + self.assert_(isinstance(b, objectify.BoolElement)) + self.assertEquals(True, root.b[0]) + self.assertEquals(False, root.b[1]) + for s in root.s: + self.assert_(isinstance(s, objectify.StringElement)) + self.assert_(isinstance(root.n, objectify.NoneElement)) + self.assertEquals(None, root.n) def test_suite(): suite = unittest.TestSuite()
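After this commit, `__parseBoolAsInt()` is a plain exact-match lookup. Only the four lexical forms `true`, `false`, `1` and `0` yield a boolean. Anything else returns -1 and is not treated as a boolean, because the old lower-casing step and the `t`/`f` shortcuts are gone. Per the CHANGES entry, `True`, `False`, `T`, `F`, `t` and `f` are therefore no longer read as `BoolElement` values. The Python transcription below is for experimenting with the rule; the function name is hypothetical.

```python
def parse_bool_as_int(text):
    """Return 1 for 'true'/'1', 0 for 'false'/'0', and -1 for anything else.

    Hypothetical Python transcription of __parseBoolAsInt(); the comparison
    is exact and case-sensitive, with no lower-casing step.
    """
    if text == "false" or text == "0":
        return 0
    if text == "true" or text == "1":
        return 1
    return -1


for value in ["true", "false", "1", "0", "True", "t", "F"]:
    print(repr(value), parse_bool_as_int(value))
```

This set of four literals matches the lexical space of the XML Schema `boolean` datatype.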
From scoder at codespeak.net Sat Jun 20 21:57:05 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 20 Jun 2009 21:57:05 +0200 (CEST) Subject: [Lxml-checkins] r65839 - lxml/trunk Message-ID: <20090620195705.244EA1684BF@codespeak.net> Author: scoder Date: Sat Jun 20 21:57:04 2009 New Revision: 65839 Modified: lxml/trunk/ (props changed) lxml/trunk/IDEAS.txt Log: r5162 at delle: sbehnel | 2009-06-20 17:59:17 +0200 idea: embed lzma C code and use it for in-memory compression Modified: lxml/trunk/IDEAS.txt ============================================================================== --- lxml/trunk/IDEAS.txt (original) +++ lxml/trunk/IDEAS.txt Sat Jun 20 21:57:04 2009 @@ -4,7 +4,17 @@ * zlib-based parsing/serialising of compressed in-memory data * requires a libxml2 I/O OutputBuffer with appropriate I/O functions - that handle a zlib buffer + that call into the lzma compression routines + +* lzma-based parsing/serialising of compressed in-memory data + + * requires a libxml2 I/O OutputBuffer with appropriate I/O functions + that call into the lzma compression routines + + * advantage over zlib: probably faster and better compression + + * maybe embed the lzma C sources in the distro + http://www.7-zip.org/sdk.html * generating XML using the ``with`` statement From scoder at codespeak.net Sat Jun 20 21:57:08 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 20 Jun 2009 21:57:08 +0200 (CEST) Subject: [Lxml-checkins] r65840 - in lxml/trunk: .
src/lxml Message-ID: <20090620195708.91CE91684AA@codespeak.net> Author: scoder Date: Sat Jun 20 21:57:07 2009 New Revision: 65840 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/apihelpers.pxi Log: r5163 at delle: sbehnel | 2009-06-20 21:45:07 +0200 code comment Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Sat Jun 20 21:57:07 2009 @@ -1026,6 +1026,10 @@ # R->L, remember right neighbour c_orig_neighbour = _nextElement(c_node) + # We remove the original slice elements one by one. Since we hold + # a Python reference to all elements that we will insert, it is + # safe to let _removeNode() try (and fail) to free them even if + # the element itself or one of its descendents will be reinserted. c = 0 c_next = c_node while c_node is not NULL and c < slicelength: From scoder at codespeak.net Sat Jun 20 21:57:12 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 20 Jun 2009 21:57:12 +0200 (CEST) Subject: [Lxml-checkins] r65841 - in lxml/trunk: . 
    src/lxml
Message-ID: <20090620195712.90F8B169E1F@codespeak.net>

Author: scoder
Date: Sat Jun 20 21:57:11 2009
New Revision: 65841

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/lxml.objectify.pyx
Log:
 r5164 at delle: sbehnel | 2009-06-20 21:46:45 +0200
 provide __all__ in lxml.objectify to prevent overly broad star imports

Modified: lxml/trunk/src/lxml/lxml.objectify.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.objectify.pyx (original)
+++ lxml/trunk/src/lxml/lxml.objectify.pyx Sat Jun 20 21:57:11 2009
@@ -11,6 +11,16 @@
 cimport tree
 cimport cstd
 
+__all__ = [u'BoolElement', u'DataElement', u'E', u'Element', u'ElementMaker',
+           u'FloatElement', u'IntElement', u'LongElement', u'NoneElement',
+           u'NumberElement', u'ObjectPath', u'ObjectifiedDataElement',
+           u'ObjectifiedElement', u'ObjectifyElementClassLookup',
+           u'PYTYPE_ATTRIBUTE', u'PyType', u'StringElement', u'XML',
+           u'annotate', u'deannotate', u'dump', u'enable_recursive_str',
+           u'fromstring', u'getRegisteredTypes', u'makeparser', u'parse',
+           u'pyannotate', u'pytypename', u'set_default_parser',
+           u'set_pytype_attribute_tag', u'xsiannotate']
+
 cdef object etree
 from lxml import etree
 # initialize C-API of lxml.etree

From scoder at codespeak.net Sat Jun 20 21:57:17 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 20 Jun 2009 21:57:17 +0200 (CEST)
Subject: [Lxml-checkins] r65842 - in lxml/trunk: .
    src/lxml
Message-ID: <20090620195717.ED00E1684BF@codespeak.net>

Author: scoder
Date: Sat Jun 20 21:57:17 2009
New Revision: 65842

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/CHANGES.txt
   lxml/trunk/src/lxml/proxy.pxi
Log:
 r5165 at delle: sbehnel | 2009-06-20 21:53:27 +0200
 fix bug #389611: namespace cleanup must define them on the top-most subtree node, not on the node that needs the namespace

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Sat Jun 20 21:57:17 2009
@@ -15,6 +15,11 @@
 Bugs fixed
 ----------
 
+* Namespace cleanup on subtree insertions could result in missing
+  namespace declarations (and potentially crashes) if the element
+  defining a namespace was deleted and the namespace was not used by
+  the top element of the inserted subtree but only in deeper subtrees.
+
 * Raising an exception from a parser target callback didn't always
   terminate the parser.

Modified: lxml/trunk/src/lxml/proxy.pxi
==============================================================================
--- lxml/trunk/src/lxml/proxy.pxi (original)
+++ lxml/trunk/src/lxml/proxy.pxi Sat Jun 20 21:57:17 2009
@@ -348,7 +348,7 @@
                 else:
                     # not in cache => find a replacement from this document
                     c_ns = doc._findOrBuildNodeNs(
-                        c_element, c_node.ns.href, c_node.ns.prefix)
+                        c_start_node, c_node.ns.href, c_node.ns.prefix)
                     _appendToNsCache(&c_ns_cache, c_node.ns, c_ns)
                     c_node.ns = c_ns

From scoder at codespeak.net Sat Jun 20 22:09:42 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 20 Jun 2009 22:09:42 +0200 (CEST)
Subject: [Lxml-checkins] r65843 - in lxml/trunk: .
    src/lxml/tests
Message-ID: <20090620200942.70012168469@codespeak.net>

Author: scoder
Date: Sat Jun 20 22:09:42 2009
New Revision: 65843

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/tests/test_elementtree.py
Log:
 r5170 at delle: sbehnel | 2009-06-20 22:06:15 +0200
 test case for bug #389611

Modified: lxml/trunk/src/lxml/tests/test_elementtree.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_elementtree.py (original)
+++ lxml/trunk/src/lxml/tests/test_elementtree.py Sat Jun 20 22:09:42 2009
@@ -1368,6 +1368,20 @@
         self.assertEquals("B2", b.tail)
         self.assertEquals("C2", c.tail)
 
+    def test_merge_namespaced_subtree_as_slice(self):
+        XML = self.etree.XML
+        root = XML(_bytes(
+            ''))
+        root[:] = root.findall('.//puh') # delete bar from hierarchy
+
+        # previously, this lost a namespace declaration on bump2
+        result = self.etree.tostring(root)
+        foo = self.etree.fromstring(result)
+
+        self.assertEquals('puh', foo[0].tag)
+        self.assertEquals('{http://huhu}bump1', foo[0][0].tag)
+        self.assertEquals('{http://huhu}bump2', foo[0][1].tag)
+
     def test_delitem_tail(self):
         ElementTree = self.etree.ElementTree
         f = BytesIO('B2C2')

From scoder at codespeak.net Sun Jun 21 09:36:45 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 21 Jun 2009 09:36:45 +0200 (CEST)
Subject: [Lxml-checkins] r65844 - in lxml/trunk: .
    doc
Message-ID: <20090621073645.13DCD169E86@codespeak.net>

Author: scoder
Date: Sun Jun 21 09:36:42 2009
New Revision: 65844

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/CHANGES.txt
   lxml/trunk/doc/main.txt
   lxml/trunk/version.txt
Log:
 r5172 at delle: sbehnel | 2009-06-21 09:33:17 +0200
 prepare release of 2.2.2

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Sun Jun 21 09:36:42 2009
@@ -2,7 +2,7 @@
 lxml changelog
 ==============
 
-Under development
+2.2.2 (2009-06-21)
 ==================
 
 Features added

Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt (original)
+++ lxml/trunk/doc/main.txt Sun Jun 21 09:36:42 2009
@@ -147,8 +147,8 @@
 source release. If you can't wait, consider trying a less recent
 release version first.
 
-The latest version is `lxml 2.2.1`_, released 2009-06-02
-(`changes for 2.2.1`_). `Older versions`_ are listed below.
+The latest version is `lxml 2.2.2`_, released 2009-06-21
+(`changes for 2.2.2`_). `Older versions`_ are listed below.
 
 Please take a look at the `installation instructions`_!
 
@@ -221,7 +221,9 @@
 `_ and the `current in-development version
 `_.
 
-.. _`PDF documentation`: lxmldoc-2.2.1.pdf
+.. _`PDF documentation`: lxmldoc-2.2.2.pdf
+
+* `lxml 2.1`_, released 2009-06-02 (`changes for 2.2.1`_)
 
 * `lxml 2.2`_, released 2009-03-21 (`changes for 2.2`_)

Modified: lxml/trunk/version.txt
==============================================================================
--- lxml/trunk/version.txt (original)
+++ lxml/trunk/version.txt Sun Jun 21 09:36:42 2009
@@ -1 +1 @@
-2.2.1
+2.2.2

From scoder at codespeak.net Sun Jun 21 09:37:42 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 21 Jun 2009 09:37:42 +0200 (CEST)
Subject: [Lxml-checkins] r65845 - in lxml/trunk: .
    doc
Message-ID: <20090621073742.AE6211684BD@codespeak.net>

Author: scoder
Date: Sun Jun 21 09:37:42 2009
New Revision: 65845

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/doc/main.txt
Log:
 r5174 at delle: sbehnel | 2009-06-21 09:34:20 +0200
 prepare release of 2.2.2

Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt (original)
+++ lxml/trunk/doc/main.txt Sun Jun 21 09:37:42 2009
@@ -223,7 +223,7 @@
 
 .. _`PDF documentation`: lxmldoc-2.2.2.pdf
 
-* `lxml 2.1`_, released 2009-06-02 (`changes for 2.2.1`_)
+* `lxml 2.2.1`_, released 2009-06-02 (`changes for 2.2.1`_)
 
 * `lxml 2.2`_, released 2009-03-21 (`changes for 2.2`_)
 
@@ -323,6 +323,7 @@
 
 * `lxml 0.5`_, released 2005-04-08
 
+.. _`lxml 2.2.2`: lxml-2.2.2.tgz
 .. _`lxml 2.2.1`: lxml-2.2.1.tgz
 .. _`lxml 2.2`: lxml-2.2.tgz
 .. _`lxml 2.2beta4`: lxml-2.2beta4.tgz
@@ -373,6 +374,7 @@
 .. _`lxml 0.5.1`: lxml-0.5.1.tgz
 .. _`lxml 0.5`: lxml-0.5.tgz
 
+.. _`changes for 2.2.2`: changes-2.2.2.html
 .. _`changes for 2.2.1`: changes-2.2.1.html
 .. _`changes for 2.2`: changes-2.2.html
 .. _`changes for 2.2beta4`: changes-2.2beta4.html

From scoder at codespeak.net Tue Jun 30 12:42:37 2009
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 30 Jun 2009 12:42:37 +0200 (CEST)
Subject: [Lxml-checkins] r66037 - in lxml/trunk: . doc
Message-ID: <20090630104237.481E4169DB7@codespeak.net>

Author: scoder
Date: Tue Jun 30 12:42:35 2009
New Revision: 66037

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/doc/capi.txt
Log:
 r5176 at delle: sbehnel | 2009-06-30 12:39:02 +0200
 doc fix

Modified: lxml/trunk/doc/capi.txt
==============================================================================
--- lxml/trunk/doc/capi.txt (original)
+++ lxml/trunk/doc/capi.txt Tue Jun 30 12:42:35 2009
@@ -53,7 +53,7 @@
             self.set("my_attribute", myval)
 
     etree.set_element_class_lookup(
-         DefaultElementClassLookup(element=NewElementClass))
+         etree.DefaultElementClassLookup(element=NewElementClass))
 
 
 Writing external modules in C