From scoder at codespeak.net Thu Jan 3 18:22:39 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 3 Jan 2008 18:22:39 +0100 (CET) Subject: [Lxml-checkins] r50291 - in lxml/trunk: . doc Message-ID: <20080103172239.A8CD21684D3@codespeak.net> Author: scoder Date: Thu Jan 3 18:22:38 2008 New Revision: 50291 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt Log: r3203 at delle: sbehnel | 2008-01-03 18:22:27 +0100 FAQ update Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Thu Jan 3 18:22:38 2008 @@ -42,6 +42,7 @@ 6.2 Why can't lxml parse my XML from unicode strings? 6.3 What is the difference between str(xslt(doc)) and xslt(doc).write() ? 6.4 Why can't I just delete parents or clear the root node in iterparse()? + 6.5 How do I output null bytes in XML text? 7 XPath and Document Traversal 7.1 What are the ``findall()`` and ``xpath()`` methods on Element(Tree)? 7.2 Why doesn't ``findall()`` support full XPath expressions? @@ -608,6 +609,15 @@ .. _`iterparse section`: api.html#iterparse-and-iterwalk +How do I output null bytes in XML text? +--------------------------------------- + +Don't. What you would produce is not well-formed XML. XML parsers +will refuse to parse a document that contains null bytes. The right +way to embed binary data in XML is using a text encoding such as +uuencode or base64. + + XPath and Document Traversal ============================ From scoder at codespeak.net Fri Jan 4 19:22:01 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 4 Jan 2008 19:22:01 +0100 (CET) Subject: [Lxml-checkins] r50334 - in lxml/trunk: . 
src/lxml src/lxml/tests Message-ID: <20080104182201.E79E816843F@codespeak.net> Author: scoder Date: Fri Jan 4 19:22:01 2008 New Revision: 50334 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/apihelpers.pxi lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/tests/test_etree.py Log: r3205 at delle: sbehnel | 2008-01-04 19:21:48 +0100 check entity/character references in Entity() factory Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Jan 4 19:22:01 2008 @@ -8,6 +8,9 @@ Features added -------------- +* Invalid entity names and character references will now be rejected + by the ``Entity()`` factory. + * ``entity.text`` now returns the textual representation of the entity, e.g. ``&amp;``. Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Fri Jan 4 19:22:01 2008 @@ -1043,22 +1043,41 @@ c_name = c_name + 1 return 1 +cdef bint _characterReferenceIsValid(char* c_name): + cdef bint is_hex + if c_name[0] == c'x': + c_name += 1 + is_hex = 1 + else: + is_hex = 0 + if c_name[0] == c'\0': + return 0 + while c_name[0] != c'\0': + if c_name[0] < c'0' or c_name[0] > c'9': + if not is_hex: + return 0 + if not (c_name[0] >= c'a' and c_name[0] <= c'f'): + if not (c_name[0] >= c'A' and c_name[0] <= c'F'): + return 0 + c_name += 1 + return 1 + cdef int _tagValidOrRaise(tag_utf) except -1: if not _pyXmlNameIsValid(tag_utf): - raise ValueError, "Invalid tag name %r" % \ - python.PyUnicode_FromEncodedObject(tag_utf, 'UTF-8', 'strict') + raise ValueError("Invalid tag name %r" % \ + python.PyUnicode_FromEncodedObject(tag_utf, 'UTF-8', 'strict')) return 0 cdef int _htmlTagValidOrRaise(tag_utf) except -1: if not _pyHtmlNameIsValid(tag_utf): - raise ValueError, "Invalid
HTML tag name %r" % \ - python.PyUnicode_FromEncodedObject(tag_utf, 'UTF-8', 'strict') + raise ValueError("Invalid HTML tag name %r" % \ + python.PyUnicode_FromEncodedObject(tag_utf, 'UTF-8', 'strict')) return 0 cdef int _attributeValidOrRaise(name_utf) except -1: if not _pyXmlNameIsValid(name_utf): - raise ValueError, "Invalid attribute name %r" % \ - python.PyUnicode_FromEncodedObject(name_utf, 'UTF-8', 'strict') + raise ValueError("Invalid attribute name %r" % \ + python.PyUnicode_FromEncodedObject(name_utf, 'UTF-8', 'strict')) return 0 cdef object _namespacedName(xmlNode* c_node): Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Fri Jan 4 19:22:01 2008 @@ -2110,18 +2110,26 @@ PI = ProcessingInstruction def Entity(name): - """Entity factory. This factory function creates a special element that - will be serialized as an XML entity. Note, however, that the entity will - not be automatically declared in the document. A document that uses - entities requires a DTD. + """Entity factory. This factory function creates a special element + that will be serialized as an XML entity reference or character + reference. Note, however, that entities will not be automatically + declared in the document. A document that uses entity references + requires a DTD to define the entities. 
""" cdef _Document doc cdef xmlNode* c_node cdef xmlDoc* c_doc - name = _utf8(name) + cdef char* c_name + name_utf = _utf8(name) + c_name = _cstr(name_utf) + if c_name[0] == c'#': + if not _characterReferenceIsValid(c_name + 1): + raise ValueError("Invalid character reference: '%s'" % name) + elif not _xmlNameIsValid(c_name): + raise ValueError("Invalid entity reference: '%s'" % name) c_doc = _newDoc() doc = _documentFactory(c_doc, None) - c_node = _createEntity(c_doc, _cstr(name)) + c_node = _createEntity(c_doc, c_name) tree.xmlAddChild(c_doc, c_node) return _elementFactory(doc, c_node) Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Fri Jan 4 19:22:01 2008 @@ -605,6 +605,21 @@ self.assertEquals('&test;', tostring(root)) + def test_entity_values(self): + Entity = self.etree.Entity + self.assertEquals(Entity("test").text, '&test;') + self.assertEquals(Entity("#17683").text, '&#17683;') + self.assertEquals(Entity("#x1768").text, '&#x1768;') + self.assertEquals(Entity("#x98AF").text, '&#x98AF;') + + def test_entity_error(self): + Entity = self.etree.Entity + self.assertRaises(ValueError, Entity, 'a b c') + self.assertRaises(ValueError, Entity, 'a,b') + self.assertRaises(AssertionError, Entity, 'a\0b') + self.assertRaises(ValueError, Entity, '#abc') + self.assertRaises(ValueError, Entity, '#xxyz') + # TypeError in etree, AssertionError in ElementTree; def test_setitem_assert(self): Element = self.etree.Element From scoder at codespeak.net Wed Jan 9 19:31:24 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 9 Jan 2008 19:31:24 +0100 (CET) Subject: [Lxml-checkins] r50461 - in lxml/trunk: .
src/lxml Message-ID: <20080109183124.8BBAF168510@codespeak.net> Author: scoder Date: Wed Jan 9 19:31:23 2008 New Revision: 50461 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/xslt.pxi Log: r3207 at delle: sbehnel | 2008-01-05 21:57:09 +0100 factored XSLT.__copy__() into a separate C function Modified: lxml/trunk/src/lxml/xslt.pxi ============================================================================== --- lxml/trunk/src/lxml/xslt.pxi (original) +++ lxml/trunk/src/lxml/xslt.pxi Wed Jan 9 19:31:23 2008 @@ -341,24 +341,7 @@ return self.__copy__() def __copy__(self): - cdef XSLT new_xslt - cdef xmlDoc* c_doc - new_xslt = NEW_XSLT(XSLT) # without calling __init__() - new_xslt._access_control = self._access_control - new_xslt._error_log = _ErrorLog() - new_xslt._context = self._context._copy() - - new_xslt._xslt_resolver_context = self._xslt_resolver_context._copy() - new_xslt._xslt_resolver_context._c_style_doc = _copyDoc( - self._xslt_resolver_context._c_style_doc, 1) - - c_doc = _copyDoc(self._c_style.doc, 1) - new_xslt._c_style = xslt.xsltParseStylesheetDoc(c_doc) - if new_xslt._c_style is NULL: - tree.xmlFreeDoc(c_doc) - python.PyErr_NoMemory() - - return new_xslt + return _copyXSLT(self) def __call__(self, _input, *, profile_run=False, **_kw): cdef _XSLTContext context @@ -491,6 +474,26 @@ # macro call to 't->tp_new()' for instantiation without calling __init__() cdef XSLT NEW_XSLT "PY_NEW" (object t) +cdef XSLT _copyXSLT(XSLT stylesheet): + cdef XSLT new_xslt + cdef xmlDoc* c_doc + new_xslt = NEW_XSLT(XSLT) # without calling __init__() + new_xslt._access_control = stylesheet._access_control + new_xslt._error_log = _ErrorLog() + new_xslt._context = stylesheet._context._copy() + + new_xslt._xslt_resolver_context = stylesheet._xslt_resolver_context._copy() + new_xslt._xslt_resolver_context._c_style_doc = _copyDoc( + stylesheet._xslt_resolver_context._c_style_doc, 1) + + c_doc = _copyDoc(stylesheet._c_style.doc, 1) + new_xslt._c_style = 
xslt.xsltParseStylesheetDoc(c_doc) + if new_xslt._c_style is NULL: + tree.xmlFreeDoc(c_doc) + python.PyErr_NoMemory() + + return new_xslt + cdef class _XSLTResultTree(_ElementTree): cdef XSLT _xslt cdef _Document _profile From scoder at codespeak.net Wed Jan 9 19:31:29 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 9 Jan 2008 19:31:29 +0100 (CET) Subject: [Lxml-checkins] r50462 - in lxml/trunk: . src/lxml Message-ID: <20080109183129.C9FB5168511@codespeak.net> Author: scoder Date: Wed Jan 9 19:31:29 2008 New Revision: 50462 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/parser.pxi lxml/trunk/src/lxml/xslt.pxi Log: r3208 at delle: sbehnel | 2008-01-05 21:59:36 +0100 copy XSLT into current thread instead of raising a 'not usable' exception Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Jan 9 19:31:29 2008 @@ -8,6 +8,9 @@ Features added -------------- +* ``XSLT`` objects are now usable in any thread - at the cost of a + deep copy if they were not created in that thread. + * Invalid entity names and character references will now be rejected by the ``Entity()`` factory. Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Wed Jan 9 19:31:29 2008 @@ -132,8 +132,8 @@ """Check that c_dict is either the local thread dictionary or the global parent dictionary. 
""" - if __GLOBAL_PARSER_CONTEXT._c_dict is c_dict: - return 1 # main thread + #if __GLOBAL_PARSER_CONTEXT._c_dict is c_dict: + # return 1 # main thread if __GLOBAL_PARSER_CONTEXT._getThreadDict(NULL) is c_dict: return 1 # local thread dict return 0 Modified: lxml/trunk/src/lxml/xslt.pxi ============================================================================== --- lxml/trunk/src/lxml/xslt.pxi (original) +++ lxml/trunk/src/lxml/xslt.pxi Wed Jan 9 19:31:29 2008 @@ -356,7 +356,8 @@ cdef xmlDoc* c_doc if not _checkThreadDict(self._c_style.doc.dict): - raise RuntimeError, "stylesheet is not usable in this thread" + _kw['profile_run'] = profile_run + return _copyXSLT(self)(_input, **_kw) input_doc = _documentOrRaise(_input) root_node = _rootNodeOrRaise(_input) From scoder at codespeak.net Wed Jan 9 19:31:33 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 9 Jan 2008 19:31:33 +0100 (CET) Subject: [Lxml-checkins] r50463 - in lxml/trunk: . src/lxml Message-ID: <20080109183133.D1217168512@codespeak.net> Author: scoder Date: Wed Jan 9 19:31:33 2008 New Revision: 50463 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/xslt.pxi Log: r3209 at delle: sbehnel | 2008-01-05 22:39:37 +0100 cleanup Modified: lxml/trunk/src/lxml/xslt.pxi ============================================================================== --- lxml/trunk/src/lxml/xslt.pxi (original) +++ lxml/trunk/src/lxml/xslt.pxi Wed Jan 9 19:31:33 2008 @@ -356,7 +356,8 @@ cdef xmlDoc* c_doc if not _checkThreadDict(self._c_style.doc.dict): - _kw['profile_run'] = profile_run + if profile_run is not False: + _kw['profile_run'] = profile_run return _copyXSLT(self)(_input, **_kw) input_doc = _documentOrRaise(_input) From scoder at codespeak.net Wed Jan 9 19:31:38 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 9 Jan 2008 19:31:38 +0100 (CET) Subject: [Lxml-checkins] r50464 - in lxml/trunk: . 
src/lxml Message-ID: <20080109183138.A69B7168511@codespeak.net> Author: scoder Date: Wed Jan 9 19:31:38 2008 New Revision: 50464 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/extensions.pxi lxml/trunk/src/lxml/python.pxd Log: r3211 at delle: sbehnel | 2008-01-08 21:02:16 +0100 return a string subclass for XPath string results that points to the source Element Modified: lxml/trunk/src/lxml/extensions.pxi ============================================================================== --- lxml/trunk/src/lxml/extensions.pxi (original) +++ lxml/trunk/src/lxml/extensions.pxi Wed Jan 9 19:31:38 2008 @@ -522,12 +522,9 @@ # XSLT: can it leak when merging trees from multiple sources? c_node = tree.xmlDocCopyNode(c_node, doc._c_doc, 1) value = _elementFactory(doc, c_node) - elif c_node.type == tree.XML_TEXT_NODE: - value = funicode(c_node.content) - elif c_node.type == tree.XML_ATTRIBUTE_NODE: - s = tree.xmlNodeGetContent(c_node) - value = funicode(s) - tree.xmlFree(s) + elif c_node.type == tree.XML_TEXT_NODE or \ + c_node.type == tree.XML_ATTRIBUTE_NODE: + value = _newElementStringResult(c_node, doc) elif c_node.type == tree.XML_NAMESPACE_DECL: s = (c_node).href if s is NULL: @@ -561,6 +558,56 @@ xpath.xmlXPathFreeObject(xpathObj) ################################################################################ +# special str/unicode subclasses + +cdef class _ElementStringResult(python.unicode): + cdef _Element parent + cdef readonly object is_tail + cdef readonly object is_text + cdef readonly object is_attribute + + def getparent(self): + return self.parent + +cdef object _newElementStringResult(xmlNode* c_node, _Document doc): + cdef _ElementStringResult element_string + cdef xmlNode* c_element + cdef char* s + cdef bint is_attribute, is_tail + + if c_node.type == tree.XML_ATTRIBUTE_NODE: + is_attribute = 1 + is_tail = 0 + s = tree.xmlNodeGetContent(c_node) + value = python.PyUnicode_DecodeUTF8(s, cstd.strlen(s), NULL) + tree.xmlFree(s) + c_element = NULL + 
else: + #assert c_node.type == tree.XML_TEXT_NODE, "invalid node type" + is_attribute = 0 + # tail text? + value = python.PyUnicode_DecodeUTF8( + c_node.content, cstd.strlen(c_node.content), NULL) + c_element = _previousElement(c_node) + is_tail = c_element is not NULL + + if c_element is NULL: + # non-tail text or attribute text + c_element = c_node.parent + while c_element is not NULL and not _isElement(c_element): + c_element = c_element.parent + + if c_element is NULL: + return value + + element_string = _ElementStringResult(value) + element_string.parent = _elementFactory(doc, c_element) + element_string.is_attribute = is_attribute + element_string.is_tail = is_tail + element_string.is_text = not (is_tail or is_attribute) + return element_string + +################################################################################ # callbacks for XPath/XSLT extension functions cdef void _extension_function_call(_BaseContext context, function, Modified: lxml/trunk/src/lxml/python.pxd ============================================================================== --- lxml/trunk/src/lxml/python.pxd (original) +++ lxml/trunk/src/lxml/python.pxd Wed Jan 9 19:31:38 2008 @@ -16,6 +16,9 @@ cdef object stop cdef object step + ctypedef class __builtin__.unicode [object PyUnicodeObject]: + pass + cdef FILE* PyFile_AsFile(object p) cdef int PyFile_Check(object p) cdef object PyFile_Name(object p) From scoder at codespeak.net Wed Jan 9 19:31:44 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 9 Jan 2008 19:31:44 +0100 (CET) Subject: [Lxml-checkins] r50465 - in lxml/trunk: . 
src/lxml src/lxml/tests Message-ID: <20080109183144.11317168510@codespeak.net> Author: scoder Date: Wed Jan 9 19:31:43 2008 New Revision: 50465 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/extensions.pxi lxml/trunk/src/lxml/proxy.pxi lxml/trunk/src/lxml/tests/test_xpathevaluator.py Log: r3212 at delle: sbehnel | 2008-01-09 19:31:08 +0100 cleanups and bug fixes: while a document is 'fake-rooted', we must take care that we do not propagate the fake-root into Python space but the original element instead Modified: lxml/trunk/src/lxml/extensions.pxi ============================================================================== --- lxml/trunk/src/lxml/extensions.pxi (original) +++ lxml/trunk/src/lxml/extensions.pxi Wed Jan 9 19:31:43 2008 @@ -515,16 +515,16 @@ for i from 0 <= i < xpathObj.nodesetval.nodeNr: c_node = xpathObj.nodesetval.nodeTab[i] if _isElement(c_node): - if c_node.doc != doc._c_doc: + if c_node.doc != doc._c_doc and c_node.doc._private is NULL: # XXX: works, but maybe not always the right thing to do? # XPath: only runs when extensions create or copy trees # -> we store Python refs to these, so that is OK # XSLT: can it leak when merging trees from multiple sources? 
c_node = tree.xmlDocCopyNode(c_node, doc._c_doc, 1) - value = _elementFactory(doc, c_node) + value = _fakeDocElementFactory(doc, c_node) elif c_node.type == tree.XML_TEXT_NODE or \ c_node.type == tree.XML_ATTRIBUTE_NODE: - value = _newElementStringResult(c_node, doc) + value = _newElementStringResult(doc, c_node) elif c_node.type == tree.XML_NAMESPACE_DECL: s = (c_node).href if s is NULL: @@ -569,7 +569,7 @@ def getparent(self): return self.parent -cdef object _newElementStringResult(xmlNode* c_node, _Document doc): +cdef object _newElementStringResult(_Document doc, xmlNode* c_node): cdef _ElementStringResult element_string cdef xmlNode* c_element cdef char* s @@ -579,8 +579,10 @@ is_attribute = 1 is_tail = 0 s = tree.xmlNodeGetContent(c_node) - value = python.PyUnicode_DecodeUTF8(s, cstd.strlen(s), NULL) - tree.xmlFree(s) + try: + value = python.PyUnicode_DecodeUTF8(s, cstd.strlen(s), NULL) + finally: + tree.xmlFree(s) c_element = NULL else: #assert c_node.type == tree.XML_TEXT_NODE, "invalid node type" @@ -601,7 +603,7 @@ return value element_string = _ElementStringResult(value) - element_string.parent = _elementFactory(doc, c_element) + element_string.parent = _fakeDocElementFactory(doc, c_element) element_string.is_attribute = is_attribute element_string.is_tail = is_tail element_string.is_text = not (is_tail or is_attribute) Modified: lxml/trunk/src/lxml/proxy.pxi ============================================================================== --- lxml/trunk/src/lxml/proxy.pxi (original) +++ lxml/trunk/src/lxml/proxy.pxi Wed Jan 9 19:31:43 2008 @@ -66,11 +66,10 @@ c_new_root = tree.xmlDocCopyNode(c_node, c_doc, 2) # non recursive! 
tree.xmlDocSetRootElement(c_doc, c_new_root) _copyParentNamespaces(c_node, c_new_root) - _copyParentNamespaces(c_node, c_root) c_new_root.children = c_node.children c_new_root.last = c_node.last - c_new_root.next = c_new_root.prev = c_new_root.parent = NULL + c_new_root.next = c_new_root.prev = NULL # store original node c_doc._private = c_node @@ -89,19 +88,35 @@ cdef xmlNode* c_child cdef xmlNode* c_parent cdef xmlNode* c_root - if c_doc != c_base_doc: - c_root = tree.xmlDocGetRootElement(c_doc) + if c_doc is c_base_doc: + return + c_root = tree.xmlDocGetRootElement(c_doc) + + # restore parent pointers of children + c_parent = c_doc._private + c_child = c_root.children + while c_child is not NULL: + c_child.parent = c_parent + c_child = c_child.next - # restore parent pointers of children - c_parent = c_doc._private - c_child = c_root.children - while c_child is not NULL: - c_child.parent = c_parent - c_child = c_child.next - - # prevent recursive removal of children - c_root.children = c_root.last = NULL - tree.xmlFreeDoc(c_doc) + # prevent recursive removal of children + c_root.children = c_root.last = NULL + tree.xmlFreeDoc(c_doc) + +cdef _Element _fakeDocElementFactory(_Document doc, xmlNode* c_element): + """Special element factory for cases where we need to create a fake + root document, but still need to instantiate arbitrary nodes from + it. If we instantiate the fake root node, things will turn bad + when it's destroyed. + + Instead, if we are asked to instantiate the fake root node, we + instantiate the original node instead. 
+ """ + if c_element.doc is not doc._c_doc: + if c_element.doc._private is not NULL: + if c_element is c_element.doc.children: + c_element = c_element.doc._private + return _elementFactory(doc, c_element) ################################################################################ # support for freeing tree elements when proxy objects are destroyed Modified: lxml/trunk/src/lxml/tests/test_xpathevaluator.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xpathevaluator.py (original) +++ lxml/trunk/src/lxml/tests/test_xpathevaluator.py Wed Jan 9 19:31:43 2008 @@ -72,6 +72,13 @@ self.assertEquals(['B'], tree.xpath('/a/@b')) + def test_xpath_list_attribute_parent(self): + tree = self.parse('') + results = tree.xpath('/a/@c') + self.assertEquals(1, len(results)) + self.assertEquals('C', results[0]) + self.assertEquals(tree.getroot().tag, results[0].getparent().tag) + def test_xpath_list_comment(self): tree = self.parse('') self.assertEquals([''], @@ -182,6 +189,21 @@ [root[0]], e.evaluate('c')) + def test_xpath_evaluator_tree_absolute(self): + tree = self.parse('') + child_tree = etree.ElementTree(tree.getroot()[0]) + e = etree.XPathEvaluator(child_tree) + self.assertEquals( + [], + e.evaluate('/a')) + root = child_tree.getroot() + self.assertEquals( + [root], + e.evaluate('/b')) + self.assertEquals( + [], + e.evaluate('/c')) + def test_xpath_evaluator_element(self): tree = self.parse('') root = tree.getroot() From scoder at codespeak.net Wed Jan 9 19:41:41 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 9 Jan 2008 19:41:41 +0100 (CET) Subject: [Lxml-checkins] r50466 - in lxml/trunk: . 
src/lxml Message-ID: <20080109184141.5F9F1168513@codespeak.net> Author: scoder Date: Wed Jan 9 19:41:40 2008 New Revision: 50466 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/extensions.pxi Log: r3218 at delle: sbehnel | 2008-01-09 19:41:29 +0100 disabled ElementStringResult patch until it works reliably Modified: lxml/trunk/src/lxml/extensions.pxi ============================================================================== --- lxml/trunk/src/lxml/extensions.pxi (original) +++ lxml/trunk/src/lxml/extensions.pxi Wed Jan 9 19:41:40 2008 @@ -599,8 +599,8 @@ while c_element is not NULL and not _isElement(c_element): c_element = c_element.parent - if c_element is NULL: - return value + #if c_element is NULL: + return value element_string = _ElementStringResult(value) element_string.parent = _fakeDocElementFactory(doc, c_element) From scoder at codespeak.net Fri Jan 11 09:50:51 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 11 Jan 2008 09:50:51 +0100 (CET) Subject: [Lxml-checkins] r50504 - in lxml/trunk: . 
src/lxml Message-ID: <20080111085051.CC3C616850A@codespeak.net> Author: scoder Date: Fri Jan 11 09:50:50 2008 New Revision: 50504 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/etree_defs.h lxml/trunk/src/lxml/lxml.pyclasslookup.pyx lxml/trunk/src/lxml/python.pxd Log: r3220 at delle: sbehnel | 2008-01-10 00:00:38 +0100 removed unused stuff Modified: lxml/trunk/src/lxml/etree_defs.h ============================================================================== --- lxml/trunk/src/lxml/etree_defs.h (original) +++ lxml/trunk/src/lxml/etree_defs.h Fri Jan 11 09:50:50 2008 @@ -94,14 +94,7 @@ #endif /* Redefinition of some Python builtins as C functions */ -#define isinstance(o,c) PyObject_IsInstance(o,c) -#define issubclass(c,csuper) PyObject_IsSubclass(c,csuper) -#define hasattr(o,a) PyObject_HasAttr(o,a) -#define getattr(o,a) PyObject_GetAttr(o,a) #define callable(o) PyCallable_Check(o) -#define str(o) PyObject_Str(o) -#define repr(o) PyObject_Repr(o) -#define iter(o) PyObject_GetIter(o) #define _cstr(s) PyString_AS_STRING(s) #define _fqtypename(o) (((PyTypeObject*)o)->ob_type->tp_name) Modified: lxml/trunk/src/lxml/lxml.pyclasslookup.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.pyclasslookup.pyx (original) +++ lxml/trunk/src/lxml/lxml.pyclasslookup.pyx Fri Jan 11 09:50:50 2008 @@ -1,7 +1,6 @@ from etreepublic cimport _Document, _Element, ElementBase from etreepublic cimport ElementClassLookup, FallbackElementClassLookup from etreepublic cimport elementFactory, import_lxml__etree -from python cimport str, repr, isinstance, issubclass, iter from python cimport _cstr cimport etreepublic as cetree cimport python Modified: lxml/trunk/src/lxml/python.pxd ============================================================================== --- lxml/trunk/src/lxml/python.pxd (original) +++ lxml/trunk/src/lxml/python.pxd Fri Jan 11 09:50:50 2008 @@ -116,14 +116,7 @@ cdef extern from "etree_defs.h": # 
redefines some functions as macros cdef int _isString(object obj) - cdef int isinstance(object instance, object classes) - cdef int issubclass(object derived, object superclasses) cdef char* _fqtypename(object t) - cdef int hasattr(object obj, object attr) - cdef object getattr(object obj, object attr) cdef int callable(object obj) - cdef object str(object obj) - cdef object repr(object obj) - cdef object iter(object obj) cdef char* _cstr(object s) cdef object PY_NEW(object t) From scoder at codespeak.net Fri Jan 11 09:50:55 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 11 Jan 2008 09:50:55 +0100 (CET) Subject: [Lxml-checkins] r50505 - in lxml/trunk: . src/lxml Message-ID: <20080111085055.5D8551684F2@codespeak.net> Author: scoder Date: Fri Jan 11 09:50:54 2008 New Revision: 50505 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/proxy.pxi Log: r3221 at delle: sbehnel | 2008-01-10 00:01:35 +0100 keep an assert just in case Modified: lxml/trunk/src/lxml/proxy.pxi ============================================================================== --- lxml/trunk/src/lxml/proxy.pxi (original) +++ lxml/trunk/src/lxml/proxy.pxi Fri Jan 11 09:50:54 2008 @@ -116,6 +116,7 @@ if c_element.doc._private is not NULL: if c_element is c_element.doc.children: c_element = c_element.doc._private + #assert c_element.type == tree.XML_ELEMENT_NODE return _elementFactory(doc, c_element) ################################################################################ From scoder at codespeak.net Fri Jan 11 09:50:59 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 11 Jan 2008 09:50:59 +0100 (CET) Subject: [Lxml-checkins] r50506 - in lxml/trunk: . 
src/lxml Message-ID: <20080111085059.BF49C16850A@codespeak.net> Author: scoder Date: Fri Jan 11 09:50:59 2008 New Revision: 50506 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/extensions.pxi lxml/trunk/src/lxml/python.pxd Log: r3222 at delle: sbehnel | 2008-01-10 00:09:11 +0100 separate ElementStringResult implementations for str and unicode values, requires Cython > 0.9.6.10b Modified: lxml/trunk/src/lxml/extensions.pxi ============================================================================== --- lxml/trunk/src/lxml/extensions.pxi (original) +++ lxml/trunk/src/lxml/extensions.pxi Fri Jan 11 09:50:59 2008 @@ -560,7 +560,16 @@ ################################################################################ # special str/unicode subclasses -cdef class _ElementStringResult(python.unicode): +cdef class _ElementUnicodeResult(python.unicode): + cdef _Element parent + cdef readonly object is_tail + cdef readonly object is_text + cdef readonly object is_attribute + + def getparent(self): + return self.parent + +cdef class _ElementStringResult(python.str): cdef _Element parent cdef readonly object is_tail cdef readonly object is_text @@ -570,17 +579,22 @@ return self.parent cdef object _newElementStringResult(_Document doc, xmlNode* c_node): - cdef _ElementStringResult element_string + cdef _ElementUnicodeResult element_unicode + cdef _ElementStringResult element_str cdef xmlNode* c_element cdef char* s - cdef bint is_attribute, is_tail + cdef bint is_attribute, is_tail, is_utf8 if c_node.type == tree.XML_ATTRIBUTE_NODE: is_attribute = 1 is_tail = 0 s = tree.xmlNodeGetContent(c_node) + is_utf8 = isutf8(s) try: - value = python.PyUnicode_DecodeUTF8(s, cstd.strlen(s), NULL) + if is_utf8: + value = python.PyUnicode_DecodeUTF8(s, cstd.strlen(s), NULL) + else: + value = s finally: tree.xmlFree(s) c_element = NULL @@ -588,8 +602,12 @@ #assert c_node.type == tree.XML_TEXT_NODE, "invalid node type" is_attribute = 0 # tail text? 
- value = python.PyUnicode_DecodeUTF8( - c_node.content, cstd.strlen(c_node.content), NULL) + is_utf8 = isutf8(c_node.content) + if is_utf8: + value = python.PyUnicode_DecodeUTF8( + c_node.content, cstd.strlen(c_node.content), NULL) + else: + value = c_node.content c_element = _previousElement(c_node) is_tail = c_element is not NULL @@ -599,15 +617,23 @@ while c_element is not NULL and not _isElement(c_element): c_element = c_element.parent - #if c_element is NULL: - return value + if c_element is NULL: + return value - element_string = _ElementStringResult(value) - element_string.parent = _fakeDocElementFactory(doc, c_element) - element_string.is_attribute = is_attribute - element_string.is_tail = is_tail - element_string.is_text = not (is_tail or is_attribute) - return element_string + if is_utf8: + element_unicode = _ElementUnicodeResult(value) + element_unicode.parent = _fakeDocElementFactory(doc, c_element) + element_unicode.is_attribute = is_attribute + element_unicode.is_tail = is_tail + element_unicode.is_text = not (is_tail or is_attribute) + return element_unicode + else: + element_str = _ElementStringResult(value) + element_str.parent = _fakeDocElementFactory(doc, c_element) + element_str.is_attribute = is_attribute + element_str.is_tail = is_tail + element_str.is_text = not (is_tail or is_attribute) + return element_str ################################################################################ # callbacks for XPath/XSLT extension functions Modified: lxml/trunk/src/lxml/python.pxd ============================================================================== --- lxml/trunk/src/lxml/python.pxd (original) +++ lxml/trunk/src/lxml/python.pxd Fri Jan 11 09:50:59 2008 @@ -19,6 +19,9 @@ ctypedef class __builtin__.unicode [object PyUnicodeObject]: pass + ctypedef class __builtin__.str [object PyStringObject]: + pass + cdef FILE* PyFile_AsFile(object p) cdef int PyFile_Check(object p) cdef object PyFile_Name(object p) From scoder at codespeak.net Fri Jan 11 
09:51:03 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 11 Jan 2008 09:51:03 +0100 (CET) Subject: [Lxml-checkins] r50507 - in lxml/trunk: . doc Message-ID: <20080111085103.47DCA168514@codespeak.net> Author: scoder Date: Fri Jan 11 09:51:02 2008 New Revision: 50507 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/main.txt Log: r3223 at delle: sbehnel | 2008-01-10 13:04:21 +0100 prepare release of 2.0beta1 Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Fri Jan 11 09:51:02 2008 @@ -138,8 +138,8 @@ .. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ .. _`this key`: pubkey.asc -The latest version is `lxml 2.0alpha6`_, released 2007-12-19 -(`changes for 2.0alpha6`_). `Older versions`_ are listed below. +The latest version is `lxml 2.0beta1`_, released 2008-01-11 +(`changes for 2.0beta1`_). `Older versions`_ are listed below. .. _`Older versions`: #old-versions @@ -199,6 +199,8 @@ Old Versions ------------ +* `lxml 2.0alpha6`_, released 2007-12-19 (`changes for 2.0alpha6`_) + * `lxml 2.0alpha5`_, released 2007-11-24 (`changes for 2.0alpha5`_) * `lxml 2.0alpha4`_, released 2007-10-07 (`changes for 2.0alpha4`_) @@ -259,6 +261,7 @@ * `lxml 0.5`_, released 2005-04-08 +.. _`lxml 2.0beta1`: lxml-2.0beta1.tgz .. _`lxml 2.0alpha6`: lxml-2.0alpha6.tgz .. _`lxml 2.0alpha5`: lxml-2.0alpha5.tgz .. _`lxml 2.0alpha4`: lxml-2.0alpha4.tgz @@ -290,6 +293,7 @@ .. _`lxml 0.5.1`: lxml-0.5.1.tgz .. _`lxml 0.5`: lxml-0.5.tgz +.. _`changes for 2.0beta1`: changes-2.0beta1.html .. _`changes for 2.0alpha6`: changes-2.0alpha6.html .. _`changes for 2.0alpha5`: changes-2.0alpha5.html .. 
_`changes for 2.0alpha4`: changes-2.0alpha4.html From scoder at codespeak.net Fri Jan 11 09:51:06 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 11 Jan 2008 09:51:06 +0100 (CET) Subject: [Lxml-checkins] r50508 - in lxml/trunk: . doc Message-ID: <20080111085106.B5A61168514@codespeak.net> Author: scoder Date: Fri Jan 11 09:51:06 2008 New Revision: 50508 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/build.txt Log: r3224 at delle: sbehnel | 2008-01-10 13:04:54 +0100 require Cython 0.9.6.11 Modified: lxml/trunk/doc/build.txt ============================================================================== --- lxml/trunk/doc/build.txt (original) +++ lxml/trunk/doc/build.txt Fri Jan 11 09:51:06 2008 @@ -33,11 +33,11 @@ be an lxml developer, you do need a working Cython installation. You can use EasyInstall_ to install it:: - easy_install Cython==0.9.6.10 + easy_install Cython==0.9.6.11 .. _EasyInstall: http://peak.telecommunity.com/DevCenter/EasyInstall -lxml currently requires at least Cython 0.9.6.10, but later versions +lxml currently requires at least Cython 0.9.6.11, but later versions should work. From scoder at codespeak.net Fri Jan 11 09:51:09 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 11 Jan 2008 09:51:09 +0100 (CET) Subject: [Lxml-checkins] r50509 - lxml/trunk Message-ID: <20080111085109.09DC0168513@codespeak.net> Author: scoder Date: Fri Jan 11 09:51:09 2008 New Revision: 50509 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3225 at delle: sbehnel | 2008-01-10 13:05:13 +0100 changelog Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Jan 11 09:51:09 2008 @@ -8,6 +8,10 @@ Features added -------------- +* XPath string results of the ``text()`` function and attribute + selection make their Element container accessible through a + ``getparent()`` method. 
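The changelog entry above says that XPath string results from `text()` and attribute selection now make their containing Element reachable through `getparent()`. The sketch below illustrates that "smart string" idea using only the standard library: a `str` subclass that carries a parent reference plus the `is_text`/`is_tail`/`is_attribute` flags set in the Cython code. The class name `ElementStringResult` and the use of `xml.etree.ElementTree` elements as parents are assumptions made for this example. It is not lxml's implementation; lxml does this in Cython as `_ElementStringResult`.

```python
import xml.etree.ElementTree as ET


class ElementStringResult(str):
    """A str that remembers the element it came from (hypothetical sketch)."""

    def __new__(cls, value, parent, *, is_tail=False, is_attribute=False):
        self = super().__new__(cls, value)
        self._parent = parent
        self.is_tail = is_tail
        self.is_attribute = is_attribute
        # Plain text content is anything that is neither tail text nor an attribute value.
        self.is_text = not (is_tail or is_attribute)
        return self

    def getparent(self):
        return self._parent


# Stdlib ElementTree elements stand in for lxml Elements here.
root = ET.fromstring("<body>TEXT<br/>TAIL</body>")
br = root[0]
texts = [
    ElementStringResult(root.text, root),            # text content of <body>
    ElementStringResult(br.tail, br, is_tail=True),  # tail text following <br/>
]
```

The results still compare equal to plain strings. However, ordinary string operations such as `texts[0].lower()` or concatenation return plain `str` objects, so the parent reference does not survive them.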
+ * ``XSLT`` objects are now usable in any thread - at the cost of a deep copy if they were not created in that thread. @@ -20,6 +24,11 @@ Bugs fixed ---------- +* XPath on ElementTrees could crash when selecting the virtual root + node of the ElementTree. + +* Compilation ``--without-threading`` was buggy in alpha5/6. + Other changes ------------- From scoder at codespeak.net Fri Jan 11 09:51:17 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 11 Jan 2008 09:51:17 +0100 (CET) Subject: [Lxml-checkins] r50510 - in lxml/trunk: . doc src/lxml src/lxml/tests Message-ID: <20080111085117.F3CF016850A@codespeak.net> Author: scoder Date: Fri Jan 11 09:51:17 2008 New Revision: 50510 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/doc/validation.txt lxml/trunk/src/lxml/iterparse.pxi lxml/trunk/src/lxml/parser.pxi lxml/trunk/src/lxml/tests/common_imports.py lxml/trunk/src/lxml/tests/test_dtd.py lxml/trunk/src/lxml/tests/test_xmlschema.py lxml/trunk/src/lxml/xmlparser.pxd lxml/trunk/src/lxml/xmlschema.pxd lxml/trunk/src/lxml/xmlschema.pxi Log: r3226 at delle: sbehnel | 2008-01-10 20:28:46 +0100 on-the-fly XML schema validation in the parser Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Jan 11 09:51:17 2008 @@ -8,6 +8,8 @@ Features added -------------- +* Parse-time XML schema validation (``schema`` parser keyword). + * XPath string results of the ``text()`` function and attribute selection make their Element container accessible through a ``getparent()`` method. Modified: lxml/trunk/doc/validation.txt ============================================================================== --- lxml/trunk/doc/validation.txt (original) +++ lxml/trunk/doc/validation.txt Fri Jan 11 09:51:17 2008 @@ -13,16 +13,17 @@ There is also initial support for Schematron_. 
However, it does not currently support error reporting in the validation phase due to insufficiencies in the -implementation as of libxml2 2.6.29. +implementation as of libxml2 2.6.30. .. _Schematron: http://www.ascc.net/xml/schematron .. contents:: .. - 1 DTD - 2 RelaxNG - 3 XMLSchema - 4 Schematron + 1 Validation at parse time + 2 DTD + 3 RelaxNG + 4 XMLSchema + 5 Schematron The usual setup procedure:: @@ -30,20 +31,59 @@ >>> from StringIO import StringIO +Validation at parse time +------------------------ + +The parser in lxml can do on-the-fly validation of a document against +a DTD or an XML schema. The DTD is retrieved automatically based on +the DOCTYPE of the parsed document. All you have to do is use a +parser that has DTD validation enabled:: + + >>> parser = etree.XMLParser(dtd_validation=True) + +Obviously, a request for validation enables the DTD loading feature. +There are two other options that enable loading the DTD, but that do +not perform any validation. The first is the ``load_dtd`` keyword +option, which simply loads the DTD into the parser and makes it +available to the document as external subset. You can retrieve the +DTD from the parsed document using the ``docinfo`` property of the +result ElementTree object. The internal subset is available as +``internalDTD``, the external subset is provided as ``externalDTD``. + +The third way way to activate DTD loading is with the +``attribute_defaults`` option, which loads the DTD and weaves +attribute default values into the document. Again, no validation is +performed unless explicitly requested. + +XML schema is supported in a similar way, but requires an explicit +schema to be provided:: + + >>> schema_root = etree.XML('''\ + ... + ... + ... + ... 
''') + >>> schema = etree.XMLSchema(schema_root) + + >>> parser = etree.XMLParser(schema = schema) + >>> root = etree.fromstring("5", parser) + +If the validation fails (be it for a DTD or an XML schema), the parser +will raise an exception:: + + >>> root = etree.fromstring("not int", parser) + Traceback (most recent call last): + XMLSyntaxError: Element 'a': 'not int' is not a valid value of the atomic type 'xs:integer'. + + DTD --- -There are two places in lxml where DTDs are supported: parsers and the DTD -class. If you pass a keyword option to a parser that requires DTD loading, -lxml will automatically include the DTD in the parsing process. If you pass -the keyword for DTD validation, lxml (or rather libxml2) will use this DTD -right inside the parser and report failure or success when parsing terminates. - -The parser support for DTDs depends on internal or external subsets of the XML -file. This means that the XML file itself must either contain a DTD or must -reference a DTD to make this work. If you want to validate an XML document -against a DTD that is not referenced by the document itself, you can use the -``DTD`` class. +As described above, the parser support for DTDs depends on internal or +external subsets of the XML file. This means that the XML file itself +must either contain a DTD or must reference a DTD to make this work. +If you want to validate an XML document against a DTD that is not +referenced by the document itself, you can use the ``DTD`` class. 
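The new "Validation at parse time" section describes rejecting an invalid document while it is being parsed, rather than validating a finished tree afterwards. With lxml, this is done by passing `schema=` to `XMLParser`, as the diff shows. The stdlib-only sketch below illustrates only the general idea, using `xml.sax`. A content handler enforces a hypothetical rule (the text of every `a` element must be an integer, echoing the doctest's error message) and raises as soon as that rule is broken. It implements no real XML Schema, and `ValidationError`, `IntegerContentValidator` and `parse_and_validate` are illustrative names, not lxml API.

```python
import xml.sax


class ValidationError(ValueError):
    """Raised as soon as the hypothetical content rule is violated."""


class IntegerContentValidator(xml.sax.ContentHandler):
    """Hypothetical rule: the text content of every <a> element must be an integer."""

    def __init__(self):
        super().__init__()
        self._chunks = None  # character data collected while inside <a>

    def startElement(self, name, attrs):
        if name == "a":
            self._chunks = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so collect them all.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "a":
            text = "".join(self._chunks)
            self._chunks = None
            try:
                int(text)
            except ValueError:
                # Raising inside a callback aborts the parse immediately.
                raise ValidationError(
                    f"Element 'a': {text!r} is not a valid integer") from None


def parse_and_validate(data: bytes) -> None:
    xml.sax.parseString(data, IntegerContentValidator())
```

Here, `parse_and_validate(b"<a>5</a>")` returns quietly, while `parse_and_validate(b"<a>not int</a>")` raises `ValidationError` during parsing.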
To use the ``DTD`` class, you must first pass a filename or file-like object into the constructor to parse a DTD:: Modified: lxml/trunk/src/lxml/iterparse.pxi ============================================================================== --- lxml/trunk/src/lxml/iterparse.pxi (original) +++ lxml/trunk/src/lxml/iterparse.pxi Fri Jan 11 09:51:17 2008 @@ -272,14 +272,15 @@ Other keyword arguments: * encoding - override the document encoding + * schema - an XMLSchema to validate against """ cdef object _source cdef readonly object root - def __init__(self, source, events=("end",), tag=None, + def __init__(self, source, events=("end",), *, tag=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, remove_blank_text=False, remove_comments=False, remove_pis=False, encoding=None, - html=False): + html=False, XMLSchema schema=None): cdef _IterparseContext context cdef char* c_encoding cdef int parse_options @@ -318,7 +319,7 @@ if remove_blank_text: parse_options = parse_options | xmlparser.XML_PARSE_NOBLANKS - _BaseParser.__init__(self, parse_options, html, + _BaseParser.__init__(self, parse_options, html, schema, remove_comments, remove_pis, None, filename, encoding) Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Fri Jan 11 09:51:17 2008 @@ -375,9 +375,13 @@ cdef class _ParserContext(_ResolverContext) cdef class _SaxParserContext(_ParserContext) cdef class _TargetParserContext(_SaxParserContext) +cdef class _ParserSchemaValidationContext +cdef class _Validator +cdef class XMLSchema(_Validator) cdef class _ParserContext(_ResolverContext): cdef _ErrorLog _error_log + cdef _ParserSchemaValidationContext _validator cdef xmlparser.xmlParserCtxt* _c_ctxt cdef python.PyThread_type_lock _lock @@ -390,6 +394,7 @@ cdef _ParserContext _copy(self): cdef _ParserContext context context = self.__class__() 
+ context._validator = self._validator.copy() _initParserContext(context, self._resolvers._copy(), NULL) return context @@ -414,11 +419,15 @@ if result == 0: raise ParserError, "parser locking failed" self._error_log.connect() + if self._validator is not None: + self._validator.connect(self._c_ctxt) return 0 cdef int cleanup(self) except -1: self._resetParserContext() self.clear() + if self._validator is not None: + self._validator.disconnect() self._error_log.disconnect() if config.ENABLE_THREADING and self._lock is not NULL: python.PyThread_release_lock(self._lock) @@ -487,7 +496,10 @@ c_ctxt.myDoc = NULL if result is not NULL: - if recover or (c_ctxt.wellFormed and \ + if context._validator is not None and \ + not context._validator.isvalid(): + well_formed = 0 # actually not 'valid', but anyway ... + elif recover or (c_ctxt.wellFormed and \ c_ctxt.lastError.level < xmlerror.XML_ERR_ERROR): well_formed = 1 elif not c_ctxt.replaceEntities and not c_ctxt.validate \ @@ -535,16 +547,15 @@ cdef bint _for_html cdef bint _remove_comments cdef bint _remove_pis + cdef XMLSchema _schema cdef object _filename cdef object _target cdef object _default_encoding cdef int _default_encoding_int - def __init__(self, int parse_options, bint for_html, - remove_comments, remove_pis, - target, filename, encoding): + def __init__(self, int parse_options, bint for_html, XMLSchema schema, + remove_comments, remove_pis, target, filename, encoding): cdef int c_encoding - cdef xmlparser.xmlParserCtxt* pctxt if not isinstance(self, HTMLParser) and \ not isinstance(self, XMLParser) and \ not isinstance(self, iterparse): @@ -556,6 +567,7 @@ self._for_html = for_html self._remove_comments = remove_comments self._remove_pis = remove_pis + self._schema = schema self._resolvers = _ResolverRegistry() @@ -575,6 +587,9 @@ cdef xmlparser.xmlParserCtxt* pctxt if self._parser_context is None: self._parser_context = self._createContext(self._target) + if self._schema is not None: + 
self._parser_context._validator = \ + self._schema._newSaxValidator() pctxt = self._newParserCtxt() if pctxt is NULL: python.PyErr_NoMemory() @@ -591,6 +606,9 @@ cdef xmlparser.xmlParserCtxt* pctxt if self._push_parser_context is None: self._push_parser_context = self._createContext(self._target) + if self._schema is not None: + self._push_parser_context._validator = \ + self._schema._newSaxValidator() pctxt = self._newPushParserCtxt() if pctxt is NULL: python.PyErr_NoMemory() @@ -1439,6 +1457,7 @@ Other keyword arguments: * encoding - override the document encoding * target - a parser target object that will receive the parse events + * schema - an XMLSchema to validate against Note that you should avoid sharing parsers between threads. While this is not harmful, it is more efficient to use separate parsers. This does not @@ -1448,7 +1467,8 @@ load_dtd=False, no_network=True, ns_clean=False, recover=False, remove_blank_text=False, compact=True, resolve_entities=True, remove_comments=False, - remove_pis=False, target=None, encoding=None): + remove_pis=False, target=None, encoding=None, + XMLSchema schema=None): cdef int parse_options parse_options = _XML_DEFAULT_PARSE_OPTIONS if load_dtd: @@ -1472,7 +1492,7 @@ if not resolve_entities: parse_options = parse_options ^ xmlparser.XML_PARSE_NOENT - _BaseParser.__init__(self, parse_options, 0, + _BaseParser.__init__(self, parse_options, 0, schema, remove_comments, remove_pis, target, None, encoding) @@ -1487,7 +1507,7 @@ load_dtd=False, no_network=True, ns_clean=False, recover=False, remove_blank_text=False, compact=True, resolve_entities=True, remove_comments=True, - remove_pis=True, target=None, encoding=None): + remove_pis=True, target=None, encoding=None, schema=None): XMLParser.__init__(self, attribute_defaults=attribute_defaults, dtd_validation=dtd_validation, @@ -1501,7 +1521,8 @@ remove_comments=remove_comments, remove_pis=remove_pis, target=target, - encoding=encoding) + encoding=encoding, + schema=schema) cdef 
XMLParser __DEFAULT_XML_PARSER @@ -1561,13 +1582,15 @@ Other keyword arguments: * encoding - override the document encoding * target - a parser target object that will receive the parse events + * schema - an XMLSchema to validate against Note that you should avoid sharing parsers between threads for performance reasons. """ - def __init__(self, recover=True, no_network=True, remove_blank_text=False, - compact=True, remove_comments=False, remove_pis=False, - target=None, encoding=None): + def __init__(self, *, recover=True, no_network=True, + remove_blank_text=False, compact=True, remove_comments=False, + remove_pis=False, target=None, encoding=None, + XMLSchema schema=None): cdef int parse_options parse_options = _HTML_DEFAULT_PARSE_OPTIONS if remove_blank_text: @@ -1579,7 +1602,7 @@ if not compact: parse_options = parse_options ^ htmlparser.HTML_PARSE_COMPACT - _BaseParser.__init__(self, parse_options, 1, + _BaseParser.__init__(self, parse_options, 1, schema, remove_comments, remove_pis, target, None, encoding) Modified: lxml/trunk/src/lxml/tests/common_imports.py ============================================================================== --- lxml/trunk/src/lxml/tests/common_imports.py (original) +++ lxml/trunk/src/lxml/tests/common_imports.py Fri Jan 11 09:51:17 2008 @@ -53,9 +53,9 @@ def tearDown(self): gc.collect() - def parse(self, text): + def parse(self, text, parser=None): f = StringIO(text) - return etree.parse(f) + return etree.parse(f, parser=parser) def _rootstring(self, tree): return etree.tostring(tree.getroot()).replace(' ', '').replace('\n', '') Modified: lxml/trunk/src/lxml/tests/test_dtd.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_dtd.py (original) +++ lxml/trunk/src/lxml/tests/test_dtd.py Fri Jan 11 09:51:17 2008 @@ -26,6 +26,13 @@ dtd = etree.DTD(StringIO("")) self.assert_(dtd.validate(root)) + def test_dtd_parse_invalid(self): + fromstring = etree.fromstring + parser = 
etree.XMLParser(dtd_validation=True) + xml = '' % fileInTestDir("test.dtd") + self.assertRaises(etree.XMLSyntaxError, + fromstring, xml, parser=parser) + def test_dtd_invalid(self): root = etree.XML("") dtd = etree.DTD(StringIO("")) Modified: lxml/trunk/src/lxml/tests/test_xmlschema.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xmlschema.py (original) +++ lxml/trunk/src/lxml/tests/test_xmlschema.py Fri Jan 11 09:51:17 2008 @@ -26,6 +26,26 @@ self.assert_(schema.validate(tree_valid)) self.assert_(not schema.validate(tree_invalid)) + def test_xmlschema_parse(self): + schema = self.parse(''' + + + + + + + + +''') + schema = etree.XMLSchema(schema) + parser = etree.XMLParser(schema=schema) + + tree_valid = self.parse('', parser=parser) + self.assertEquals('a', tree_valid.getroot().tag) + + self.assertRaises(etree.XMLSyntaxError, + self.parse, '', parser=parser) + def test_xmlschema_elementtree_error(self): self.assertRaises(ValueError, etree.XMLSchema, etree.ElementTree()) Modified: lxml/trunk/src/lxml/xmlparser.pxd ============================================================================== --- lxml/trunk/src/lxml/xmlparser.pxd (original) +++ lxml/trunk/src/lxml/xmlparser.pxd Fri Jan 11 09:51:17 2008 @@ -91,6 +91,7 @@ xmlError lastError xmlNode* node xmlSAXHandler* sax + void* userData int* spaceTab int spaceMax bint html Modified: lxml/trunk/src/lxml/xmlschema.pxd ============================================================================== --- lxml/trunk/src/lxml/xmlschema.pxd (original) +++ lxml/trunk/src/lxml/xmlschema.pxd Fri Jan 11 09:51:17 2008 @@ -1,10 +1,11 @@ -cimport tree +from xmlparser cimport xmlSAXHandler from tree cimport xmlDoc cdef extern from "libxml/xmlschemas.h": ctypedef struct xmlSchema ctypedef struct xmlSchemaParserCtxt + ctypedef struct xmlSchemaSAXPlugStruct ctypedef struct xmlSchemaValidCtxt cdef xmlSchemaValidCtxt* xmlSchemaNewValidCtxt(xmlSchema* schema) nogil @@ 
-15,3 +16,9 @@ cdef void xmlSchemaFree(xmlSchema* schema) nogil cdef void xmlSchemaFreeParserCtxt(xmlSchemaParserCtxt* ctxt) nogil cdef void xmlSchemaFreeValidCtxt(xmlSchemaValidCtxt* ctxt) nogil + + cdef xmlSchemaSAXPlugStruct* xmlSchemaSAXPlug(xmlSchemaValidCtxt* ctxt, + xmlSAXHandler** sax, + void** data) nogil + cdef int xmlSchemaSAXUnplug(xmlSchemaSAXPlugStruct* sax_plug) + cdef int xmlSchemaIsValid(xmlSchemaValidCtxt* ctxt) Modified: lxml/trunk/src/lxml/xmlschema.pxi ============================================================================== --- lxml/trunk/src/lxml/xmlschema.pxi (original) +++ lxml/trunk/src/lxml/xmlschema.pxi Fri Jan 11 09:51:17 2008 @@ -105,8 +105,53 @@ self._error_log.disconnect() if ret == -1: - raise XMLSchemaValidateError, "Internal error in XML Schema validation." + raise XMLSchemaValidateError( + "Internal error in XML Schema validation.") if ret == 0: return True else: return False + + cdef _ParserSchemaValidationContext _newSaxValidator(self): + cdef _ParserSchemaValidationContext context + context = NEW_SCHEMA_CONTEXT(_ParserSchemaValidationContext) + context._schema = self + context._valid_ctxt = NULL + context._sax_plug = NULL + return context + +cdef class _ParserSchemaValidationContext: + cdef XMLSchema _schema + cdef xmlschema.xmlSchemaValidCtxt* _valid_ctxt + cdef xmlschema.xmlSchemaSAXPlugStruct* _sax_plug + + def __dealloc__(self): + if self._sax_plug: + self.disconnect() + if self._valid_ctxt: + xmlschema.xmlSchemaFreeValidCtxt(self._valid_ctxt) + + cdef _ParserSchemaValidationContext copy(self): + return self._schema._newSaxValidator() + + cdef int connect(self, xmlparser.xmlParserCtxt* c_ctxt) except -1: + if self._valid_ctxt is NULL: + self._valid_ctxt = xmlschema.xmlSchemaNewValidCtxt( + self._schema._c_schema) + if self._valid_ctxt is NULL: + raise XMLSchemaError, "Failed to create validation context" + self._sax_plug = xmlschema.xmlSchemaSAXPlug( + self._valid_ctxt, &c_ctxt.sax, &c_ctxt.userData) + + cdef void 
disconnect(self): + xmlschema.xmlSchemaSAXUnplug(self._sax_plug) + self._sax_plug = NULL + + cdef bint isvalid(self): + if self._valid_ctxt is NULL: + return 1 # valid + return xmlschema.xmlSchemaIsValid(self._valid_ctxt) + +cdef extern from "etree_defs.h": + # macro call to 't->tp_new()' for fast instantiation + cdef _ParserSchemaValidationContext NEW_SCHEMA_CONTEXT "PY_NEW" (object t) From scoder at codespeak.net Fri Jan 11 09:51:21 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 11 Jan 2008 09:51:21 +0100 (CET) Subject: [Lxml-checkins] r50511 - lxml/trunk Message-ID: <20080111085121.0ECF316850A@codespeak.net> Author: scoder Date: Fri Jan 11 09:51:20 2008 New Revision: 50511 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3227 at delle: sbehnel | 2008-01-10 21:14:50 +0100 changelog cleanup and 2.0beta1 release date Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Jan 11 09:51:20 2008 @@ -2,8 +2,8 @@ lxml changelog ============== -Under development -================= +2.0beta1 (2008-01-11) +===================== Features added -------------- @@ -14,14 +14,14 @@ selection make their Element container accessible through a ``getparent()`` method. -* ``XSLT`` objects are now usable in any thread - at the cost of a - deep copy if they were not created in that thread. +* ``XSLT`` objects are usable in any thread - at the cost of a deep + copy if they were not created in that thread. -* Invalid entity names and character references will now be rejected - by the ``Entity()`` factory. +* Invalid entity names and character references will be rejected by + the ``Entity()`` factory. -* ``entity.text`` now returns the textual representation of the - entity, e.g. ``&amp;``. +* ``entity.text`` returns the textual representation of the entity, + e.g. ``&amp;``.
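In r50510, `_ParserSchemaValidationContext.connect()` passes the parser context's `sax` handler and `userData` to libxml2's `xmlSchemaSAXPlug()`. `disconnect()` calls `xmlSchemaSAXUnplug()`, and `isvalid()` is checked after parsing so that an invalid document is reported like a failed parse. As we understand it, the plug interposes the validator's callbacks in front of the parser's original ones. The sketch below imitates that plug/unplug pattern with Python's `xml.sax`. A hypothetical `ValidatingPlug` handler is installed in front of the original handler. It checks each start tag against an allowed set, which is a stand-in rule and not XML Schema. It then forwards the events it handles and reports validity once parsing has finished. This is an analogy, not libxml2's implementation.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class ValidatingPlug(ContentHandler):
    """Hypothetical plug sitting between the SAX parser and the original handler.

    The events handled below are checked against a simple rule (allowed element
    names) and then forwarded unchanged. Failures are recorded, not raised, so
    the caller can ask is_valid() after parsing has finished.
    """

    def __init__(self, wrapped, allowed_tags):
        super().__init__()
        self._wrapped = wrapped
        self._allowed = frozenset(allowed_tags)
        self._valid = True

    def startDocument(self):
        self._wrapped.startDocument()

    def endDocument(self):
        self._wrapped.endDocument()

    def startElement(self, name, attrs):
        if name not in self._allowed:
            self._valid = False  # remember the failure, keep parsing
        self._wrapped.startElement(name, attrs)

    def endElement(self, name):
        self._wrapped.endElement(name)

    def characters(self, content):
        self._wrapped.characters(content)

    def is_valid(self):
        return self._valid


class TagCollector(ContentHandler):
    """Stand-in for the parser's original handler: records element names."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def parse_with_plug(data: bytes, handler, allowed_tags) -> bool:
    reader = xml.sax.make_parser()
    plug = ValidatingPlug(handler, allowed_tags)
    reader.setContentHandler(plug)  # plug in: the validator sees events first
    try:
        reader.parse(io.BytesIO(data))
    finally:
        reader.setContentHandler(handler)  # unplug: restore the original handler
    return plug.is_valid()
```

The original handler still receives every start tag even when validation fails. Only the final `is_valid()` answer decides whether the result counts as valid, which mirrors the commit's choice of checking `isvalid()` after the parse instead of aborting mid-stream.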
Bugs fixed ---------- From scoder at codespeak.net Fri Jan 11 11:55:09 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 11 Jan 2008 11:55:09 +0100 (CET) Subject: [Lxml-checkins] r50512 - in lxml/trunk: . doc src/lxml src/lxml/tests Message-ID: <20080111105509.6BF3D1684D7@codespeak.net> Author: scoder Date: Fri Jan 11 11:55:07 2008 New Revision: 50512 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/doc/tutorial.txt lxml/trunk/src/lxml/extensions.pxi lxml/trunk/src/lxml/python.pxd lxml/trunk/src/lxml/tests/test_xpathevaluator.py Log: r3237 at delle: sbehnel | 2008-01-11 11:54:56 +0100 subtyping PyStringObject does not work in Cython/Pyrex, so XPath string results will just have to be unicode Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Jan 11 11:55:07 2008 @@ -12,7 +12,8 @@ * XPath string results of the ``text()`` function and attribute selection make their Element container accessible through a - ``getparent()`` method. + ``getparent()`` method. As a side-effect, they are now always + unicode objects (even ASCII strings). * ``XSLT`` objects are usable in any thread - at the cost of a deep copy if they were not created in that thread. Modified: lxml/trunk/doc/tutorial.txt ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/trunk/doc/tutorial.txt Fri Jan 11 11:55:07 2008 @@ -281,13 +281,39 @@ >>> print html.xpath("string()") # lxml.etree only! TEXTTAIL >>> print html.xpath("//text()") # lxml.etree only! - ['TEXT', 'TAIL'] + [u'TEXT', u'TAIL'] If you want to use this more often, you can wrap it in a function:: >>> build_text_list = etree.XPath("//text()") # lxml.etree only! 
>>> print build_text_list(html) - ['TEXT', 'TAIL'] + [u'TEXT', u'TAIL'] + +Note that the ``text()`` function in XPath always returns unicode +strings. This is because it is actually a special object that knows +about its origins. You can ask it where it came from through its +``getparent()`` method, just as you would with Elements:: + + >>> texts = build_text_list(html) + >>> print texts[0] + TEXT + >>> parent = texts[0].getparent() + >>> print parent.tag + body + + >>> print texts[1] + TAIL + >>> print texts[1].getparent().tag + br + +You can also find out if it's normal text content or tail text:: + + >>> print texts[0].is_text + True + >>> print texts[1].is_text + False + >>> print texts[1].is_tail + True .. _XPath: xpathxslt.html#xpath Modified: lxml/trunk/src/lxml/extensions.pxi ============================================================================== --- lxml/trunk/src/lxml/extensions.pxi (original) +++ lxml/trunk/src/lxml/extensions.pxi Fri Jan 11 11:55:07 2008 @@ -560,16 +560,7 @@ ################################################################################ # special str/unicode subclasses -cdef class _ElementUnicodeResult(python.unicode): - cdef _Element parent - cdef readonly object is_tail - cdef readonly object is_text - cdef readonly object is_attribute - - def getparent(self): - return self.parent - -cdef class _ElementStringResult(python.str): +cdef class _ElementStringResult(python.unicode): cdef _Element parent cdef readonly object is_tail cdef readonly object is_text @@ -579,22 +570,17 @@ return self.parent cdef object _newElementStringResult(_Document doc, xmlNode* c_node): - cdef _ElementUnicodeResult element_unicode - cdef _ElementStringResult element_str + cdef _ElementStringResult result cdef xmlNode* c_element cdef char* s - cdef bint is_attribute, is_tail, is_utf8 + cdef bint is_attribute, is_tail if c_node.type == tree.XML_ATTRIBUTE_NODE: is_attribute = 1 is_tail = 0 s = tree.xmlNodeGetContent(c_node) - is_utf8 = isutf8(s) try: - 
if is_utf8: - value = python.PyUnicode_DecodeUTF8(s, cstd.strlen(s), NULL) - else: - value = s + value = python.PyUnicode_DecodeUTF8(s, cstd.strlen(s), NULL) finally: tree.xmlFree(s) c_element = NULL @@ -602,12 +588,8 @@ #assert c_node.type == tree.XML_TEXT_NODE, "invalid node type" is_attribute = 0 # tail text? - is_utf8 = isutf8(c_node.content) - if is_utf8: - value = python.PyUnicode_DecodeUTF8( - c_node.content, cstd.strlen(c_node.content), NULL) - else: - value = c_node.content + value = python.PyUnicode_DecodeUTF8( + c_node.content, cstd.strlen(c_node.content), NULL) c_element = _previousElement(c_node) is_tail = c_element is not NULL @@ -620,20 +602,12 @@ if c_element is NULL: return value - if is_utf8: - element_unicode = _ElementUnicodeResult(value) - element_unicode.parent = _fakeDocElementFactory(doc, c_element) - element_unicode.is_attribute = is_attribute - element_unicode.is_tail = is_tail - element_unicode.is_text = not (is_tail or is_attribute) - return element_unicode - else: - element_str = _ElementStringResult(value) - element_str.parent = _fakeDocElementFactory(doc, c_element) - element_str.is_attribute = is_attribute - element_str.is_tail = is_tail - element_str.is_text = not (is_tail or is_attribute) - return element_str + result = _ElementStringResult(value) + result.parent = _fakeDocElementFactory(doc, c_element) + result.is_attribute = is_attribute + result.is_tail = is_tail + result.is_text = not (is_tail or is_attribute) + return result ################################################################################ # callbacks for XPath/XSLT extension functions Modified: lxml/trunk/src/lxml/python.pxd ============================================================================== --- lxml/trunk/src/lxml/python.pxd (original) +++ lxml/trunk/src/lxml/python.pxd Fri Jan 11 11:55:07 2008 @@ -19,9 +19,6 @@ ctypedef class __builtin__.unicode [object PyUnicodeObject]: pass - ctypedef class __builtin__.str [object PyStringObject]: - pass - cdef 
FILE* PyFile_AsFile(object p) cdef int PyFile_Check(object p) cdef object PyFile_Name(object p) Modified: lxml/trunk/src/lxml/tests/test_xpathevaluator.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xpathevaluator.py (original) +++ lxml/trunk/src/lxml/tests/test_xpathevaluator.py Fri Jan 11 11:55:07 2008 @@ -67,16 +67,33 @@ self.assertEquals(['Foo', 'Bar'], tree.xpath('/a/b/text()')) + def test_xpath_list_text_parent(self): + tree = self.parse('FooBarBarFoo') + root = tree.getroot() + self.assertEquals(['FooBar', 'BarFoo'], + tree.xpath('/a/b/text()')) + self.assertEquals([root[0], root[1]], + [r.getparent() for r in tree.xpath('/a/b/text()')]) + + def test_xpath_list_unicode_text_parent(self): + xml = u'FooBar\u0680\u3120BarFoo\u0680\u3120' + tree = self.parse(xml.encode('utf-8')) + root = tree.getroot() + self.assertEquals([u'FooBar\u0680\u3120', u'BarFoo\u0680\u3120'], + tree.xpath('/a/b/text()')) + self.assertEquals([root[0], root[1]], + [r.getparent() for r in tree.xpath('/a/b/text()')]) + def test_xpath_list_attribute(self): tree = self.parse('') self.assertEquals(['B'], tree.xpath('/a/@b')) def test_xpath_list_attribute_parent(self): - tree = self.parse('') + tree = self.parse('') results = tree.xpath('/a/@c') self.assertEquals(1, len(results)) - self.assertEquals('C', results[0]) + self.assertEquals('CqWeRtZuI', results[0]) self.assertEquals(tree.getroot().tag, results[0].getparent().tag) def test_xpath_list_comment(self): From scoder at codespeak.net Fri Jan 11 15:18:57 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 11 Jan 2008 15:18:57 +0100 (CET) Subject: [Lxml-checkins] r50515 - in lxml/trunk: . 
doc Message-ID: <20080111141857.146FE1684C7@codespeak.net> Author: scoder Date: Fri Jan 11 15:18:56 2008 New Revision: 50515 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/api.txt Log: r3239 at delle: sbehnel | 2008-01-11 15:17:27 +0100 fix doctest for libxml2 2.6.31 Modified: lxml/trunk/doc/api.txt ============================================================================== --- lxml/trunk/doc/api.txt (original) +++ lxml/trunk/doc/api.txt Fri Jan 11 15:18:56 2008 @@ -308,10 +308,10 @@ >>> notxml = etree.tostring(unicode_root, encoding="UTF-16LE", ... xml_declaration=False) - >>> root = etree.XML(notxml) + >>> root = etree.XML(notxml) #doctest: +ELLIPSIS Traceback (most recent call last): ... - XMLSyntaxError: StartTag: invalid element name, line 1, column 2 + XMLSyntaxError: ... XInclude and ElementInclude From scoder at codespeak.net Fri Jan 11 15:20:45 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 11 Jan 2008 15:20:45 +0100 (CET) Subject: [Lxml-checkins] r50516 - lxml/trunk Message-ID: <20080111142045.2F9131684C7@codespeak.net> Author: scoder Date: Fri Jan 11 15:20:44 2008 New Revision: 50516 Modified: lxml/trunk/ (props changed) lxml/trunk/update-error-constants.py Log: r3241 at delle: sbehnel | 2008-01-11 15:20:34 +0100 API usage fix Modified: lxml/trunk/update-error-constants.py ============================================================================== --- lxml/trunk/update-error-constants.py (original) +++ lxml/trunk/update-error-constants.py Fri Jan 11 15:20:44 2008 @@ -65,7 +65,8 @@ PARSE_ENUM_NAME = re.compile('\s*enum\s+(\w+)\s*{', re.I).match PARSE_ENUM_VALUE = re.compile('\s*=\s+([0-9]+)\s*(?::\s*(.*))?').match tree = etree.parse(html_file) - xpath = etree.XPathEvaluator(tree, {'html' : 'http://www.w3.org/1999/xhtml'}) + xpath = etree.XPathEvaluator( + tree, namespaces={'html' : 'http://www.w3.org/1999/xhtml'}) enum_dict = {} enums = xpath.evaluate("//html:pre[@class = 'programlisting' and contains(text(), 'Enum') 
and html:a[@name]]") From scoder at codespeak.net Fri Jan 11 16:21:31 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 11 Jan 2008 16:21:31 +0100 (CET) Subject: [Lxml-checkins] r50519 - in lxml/trunk: . src/lxml/html/tests src/lxml/tests Message-ID: <20080111152131.B850F1684C7@codespeak.net> Author: scoder Date: Fri Jan 11 16:21:30 2008 New Revision: 50519 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/html/tests/test_clean.py lxml/trunk/src/lxml/tests/test_etree.py Log: r3243 at delle: sbehnel | 2008-01-11 16:21:15 +0100 test fixes Modified: lxml/trunk/src/lxml/html/tests/test_clean.py ============================================================================== --- lxml/trunk/src/lxml/html/tests/test_clean.py (original) +++ lxml/trunk/src/lxml/html/tests/test_clean.py Fri Jan 11 16:21:30 2008 @@ -5,6 +5,6 @@ def test_suite(): suite = unittest.TestSuite() suite.addTests([doctest.DocFileSuite('test_clean.txt')]) - if LIBXML_VERSION <= (2,6,28): + if LIBXML_VERSION <= (2,6,28) or LIBXML_VERSION >= (2,6,31): suite.addTests([doctest.DocFileSuite('test_clean_embed.txt')]) return suite Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Fri Jan 11 16:21:30 2008 @@ -1353,8 +1353,7 @@ '' % ns_href, self.etree.tostring(two)) - def _test_namespaces_after_serialize(self): - # FIXME: this currently fails - fix serializer.pxi! 
+    def test_namespaces_after_serialize(self):
         parse = self.etree.parse
         tostring = self.etree.tostring
 
@@ -1363,9 +1362,7 @@
             StringIO('' % ns_href))
         baz = one.getroot()[0][0]
 
-        print tostring(baz)
         parsed = parse(StringIO( tostring(baz) )).getroot()
-
         self.assertEquals('{%s}baz' % ns_href, parsed.tag)
 
     def test_element_nsmap(self):

From scoder at codespeak.net Fri Jan 11 16:23:05 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 11 Jan 2008 16:23:05 +0100 (CET)
Subject: [Lxml-checkins] r50520 - lxml/tag/lxml-2.0beta1
Message-ID: <20080111152305.571C31684C7@codespeak.net>

Author: scoder
Date: Fri Jan 11 16:23:04 2008
New Revision: 50520

Added:
   lxml/tag/lxml-2.0beta1/
      - copied from r50518, lxml/trunk/
Log:
 tag for lxml 2.0beta1

From scoder at codespeak.net Fri Jan 11 16:26:31 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 11 Jan 2008 16:26:31 +0100 (CET)
Subject: [Lxml-checkins] r50521 - lxml/tag/lxml-2.0alpha6
Message-ID: <20080111152631.8B8291684C7@codespeak.net>

Author: scoder
Date: Fri Jan 11 16:26:31 2008
New Revision: 50521

Added:
   lxml/tag/lxml-2.0alpha6/
      - copied from r49929, lxml/trunk/
Log:
 tag for lxml 2.0alpha6

From scoder at codespeak.net Fri Jan 11 16:36:31 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 11 Jan 2008 16:36:31 +0100 (CET)
Subject: [Lxml-checkins] r50522 - in lxml/trunk: . doc
Message-ID: <20080111153631.245271684C7@codespeak.net>

Author: scoder
Date: Fri Jan 11 16:36:30 2008
New Revision: 50522

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/doc/main.txt
Log:
 r3245 at delle: sbehnel | 2008-01-11 16:36:16 +0100
 doc update

Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt (original)
+++ lxml/trunk/doc/main.txt Fri Jan 11 16:36:30 2008
@@ -131,9 +131,11 @@
 Download
 --------
 
-The best way to download binary versions is to visit `lxml at the Python
-Package Index`_. It has the source, eggs and installers for various platforms.
-The source distribution is signed with `this key`_.
+The best way to download lxml is to visit `lxml at the Python Package
+Index`_ (PyPI). It has the source that compiles on various platforms.
+The source distribution is signed with `this key`_. Binary builds for
+MS Windows usually become available through PyPI a few days after a
+source release.
 
 .. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/
 .. _`this key`: pubkey.asc

From scoder at codespeak.net Sat Jan 12 11:37:42 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 12 Jan 2008 11:37:42 +0100 (CET)
Subject: [Lxml-checkins] r50527 - in lxml/trunk: . doc
Message-ID: <20080112103742.1DBEB168548@codespeak.net>

Author: scoder
Date: Sat Jan 12 11:37:40 2008
New Revision: 50527

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/doc/main.txt
Log:
 r3249 at delle: sbehnel | 2008-01-11 18:19:02 +0100
 doc update

Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt (original)
+++ lxml/trunk/doc/main.txt Sat Jan 12 11:37:40 2008
@@ -135,7 +135,8 @@
 Index`_ (PyPI). It has the source that compiles on various platforms.
 The source distribution is signed with `this key`_. Binary builds for
 MS Windows usually become available through PyPI a few days after a
-source release.
+source release. If you can't wait, consider trying a less recent
+version first.
 
 .. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/
 .. _`this key`: pubkey.asc

From scoder at codespeak.net Sat Jan 12 11:37:46 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 12 Jan 2008 11:37:46 +0100 (CET)
Subject: [Lxml-checkins] r50528 - in lxml/trunk: . src/lxml/html/tests
Message-ID: <20080112103746.2C389168549@codespeak.net>

Author: scoder
Date: Sat Jan 12 11:37:45 2008
New Revision: 50528

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/html/tests/test_basic.py
Log:
 r3250 at delle: sbehnel | 2008-01-12 11:37:28 +0100
 run doctests from lxmlhtml.txt

Modified: lxml/trunk/src/lxml/html/tests/test_basic.py
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_basic.py (original)
+++ lxml/trunk/src/lxml/html/tests/test_basic.py Sat Jan 12 11:37:45 2008
@@ -4,6 +4,7 @@
 def test_suite():
     suite = unittest.TestSuite()
     suite.addTests([doctest.DocFileSuite('test_basic.txt')])
+    suite.addTests([doctest.DocFileSuite('../../../../doc/lxmlhtml.txt')])
     return suite
 
 if __name__ == '__main__':

From scoder at codespeak.net Sat Jan 12 19:41:33 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 12 Jan 2008 19:41:33 +0100 (CET)
Subject: [Lxml-checkins] r50533 - in lxml/trunk: . src/lxml/html
Message-ID: <20080112184133.67C63168549@codespeak.net>

Author: scoder
Date: Sat Jan 12 19:41:32 2008
New Revision: 50533

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/html/clean.py
Log:
 r3254 at delle: sbehnel | 2008-01-12 19:41:17 +0100
 code cleanup

Modified: lxml/trunk/src/lxml/html/clean.py
==============================================================================
--- lxml/trunk/src/lxml/html/clean.py (original)
+++ lxml/trunk/src/lxml/html/clean.py Sat Jan 12 19:41:32 2008
@@ -44,7 +44,7 @@
 # execution:
 _javascript_scheme_re = re.compile(
     r'\s*(?:javascript|jscript|livescript|vbscript|about|mocha):', re.I)
-_whitespace_re = re.compile(r'\s+')
+_substitute_whitespace = re.compile(r'\s+').sub
 
 # FIXME: should data: be blocked?
 # FIXME: check against: http://msdn2.microsoft.com/en-us/library/ms537512.aspx
@@ -57,15 +57,6 @@
 _find_external_links = etree.XPath(
     "descendant-or-self::a[normalize-space(@href) and substring(normalize-space(@href),1,1) != '#']")
 
-def clean_html(html, **kw):
-    """
-    Like clean(), but takes a text input document, and returns a text
-    document.
-    """
-    doc = fromstring(html)
-    clean(doc, **kw)
-    return tostring(doc)
-
 class Cleaner(object):
     """
     Instances cleans the document of each of the possible offending
@@ -205,7 +196,7 @@
             doc = doc.getroot()
         # Normalize a case that IE treats <image> like <img>, and that
         # can confuse either this step or later steps.
-        for el in doc.getiterator('image'):
+        for el in doc.iter('image'):
             el.tag = 'img'
         if not self.comments:
             # Of course, if we were going to kill comments anyway, we don't
@@ -221,7 +212,7 @@
             kill_tags.add('script')
         if self.safe_attrs_only:
             safe_attrs = set(defs.safe_attrs)
-            for el in doc.getiterator():
+            for el in doc.iter():
                 attrib = el.attrib
                 for aname in attrib.keys():
                     if aname not in safe_attrs:
@@ -229,7 +220,7 @@
         if self.javascript:
             if not self.safe_attrs_only:
                 # safe_attrs handles events attributes itself
-                for el in doc.getiterator():
+                for el in doc.iter():
                     attrib = el.attrib
                     for aname in attrib.keys():
                         if aname.startswith('on'):
@@ -248,7 +239,7 @@
                         del el.attrib['style']
                     elif new != old:
                         el.set('style', new)
-                for el in list(doc.getiterator('style')):
+                for el in list(doc.iter('style')):
                     if el.get('type', '').lower().strip() == 'text/javascript':
                         el.drop_tree()
                         continue
@@ -277,7 +268,7 @@
         elif self.style or self.javascript:
             # We must get rid of included stylesheets if Javascript is not
             # allowed, as you can put Javascript in them
-            for el in list(doc.getiterator('link')):
+            for el in list(doc.iter('link')):
                 if 'stylesheet' in el.get('rel', '').lower():
                     # Note this kills alternate stylesheets as well
                     el.drop_tree()
@@ -289,7 +280,7 @@
             # FIXME: is really embedded?
             # We should get rid of any tags not inside ;
             # These are not really valid anyway.
-            for el in list(doc.getiterator('param')):
+            for el in list(doc.iter('param')):
                 found_parent = False
                 parent = el.getparent()
                 while parent is not None and parent.tag not in ('applet', 'object'):
@@ -312,7 +303,7 @@
         _remove = []
         _kill = []
-        for el in doc.getiterator():
+        for el in doc.iter():
             if el.tag in kill_tags:
                 if self.allow_element(el):
                     continue
@@ -349,7 +340,7 @@
             allow_tags = set(defs.tags)
         if allow_tags:
             bad = []
-            for el in doc.getiterator():
+            for el in doc.iter():
                 if el.tag not in allow_tags:
                     bad.append(el)
             for el in bad:
@@ -408,7 +399,7 @@
     def _kill_elements(self, doc, condition, iterate=None):
         bad = []
-        for el in doc.getiterator(iterate):
+        for el in doc.iter(iterate):
             if condition(el):
                 bad.append(el)
         for el in bad:
@@ -416,13 +407,13 @@
 
     def _remove_javascript_link(self, link):
         # links like "j a v a s c r i p t:" might be interpreted in IE
-        new = _whitespace_re.sub('', link)
+        new = _substitute_whitespace('', link)
         if _javascript_scheme_re.search(new):
             # FIXME: should this be None to delete?
             return ''
         return link
 
-    _decomment_re = re.compile(r'/\*.*?\*/', re.S)
+    _substitute_comments = re.compile(r'/\*.*?\*/', re.S).sub
 
     def _has_sneaky_javascript(self, style):
         """
@@ -435,9 +426,9 @@
         that and remove only the Javascript from the style; this catches
         more sneaky attempts.
         """
-        style = self._decomment_re.sub('', style)
+        style = self._substitute_comments('', style)
         style = style.replace('\\', '')
-        style = _whitespace_re.sub('', style)
+        style = _substitute_whitespace('', style)
         style = style.lower()
         if 'javascript:' in style:
             return True

From scoder at codespeak.net Sat Jan 12 20:03:41 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 12 Jan 2008 20:03:41 +0100 (CET)
Subject: [Lxml-checkins] r50534 - in lxml/trunk: . src/lxml
Message-ID: <20080112190341.83449168544@codespeak.net>

Author: scoder
Date: Sat Jan 12 20:03:41 2008
New Revision: 50534

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/doctestcompare.py
Log:
 r3256 at delle: sbehnel | 2008-01-12 19:47:21 +0100
 do not use recovering HTML parser in doctestcompare

Modified: lxml/trunk/src/lxml/doctestcompare.py
==============================================================================
--- lxml/trunk/src/lxml/doctestcompare.py (original)
+++ lxml/trunk/src/lxml/doctestcompare.py Sat Jan 12 20:03:41 2008
@@ -28,7 +28,6 @@
 """
 
 from lxml import etree
-from lxml.html import document_fromstring
 import re
 import doctest
 import cgi
@@ -51,6 +50,11 @@
 def norm_whitespace(v):
     return _norm_whitespace_re.sub(' ', v)
 
+_html_parser = etree.HTMLParser(recover=False)
+
+def html_fromstring(html):
+    return etree.fromstring(html, _html_parser)
+
 # We use this to distinguish repr()s from elements:
 _repr_re = re.compile(r'^<[^>]+ (at|object) ')
 _norm_whitespace_re = re.compile(r'[ \t\n][ \t\n]+')
@@ -90,12 +94,12 @@
         if NOPARSE_MARKUP & optionflags:
             return None
         if PARSE_HTML & optionflags:
-            parser = document_fromstring
+            parser = html_fromstring
         elif PARSE_XML & optionflags:
             parser = etree.XML
         elif (want.strip().lower().startswith('
Author: scoder
Date: Sat Jan 12 20:03:44 2008
New Revision: 50535

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/doctestcompare.py
Log:
 r3257 at delle: sbehnel | 2008-01-12 20:03:30 +0100
 remove blank text in HTML doctest parsing

Modified: lxml/trunk/src/lxml/doctestcompare.py
==============================================================================
--- lxml/trunk/src/lxml/doctestcompare.py (original)
+++ lxml/trunk/src/lxml/doctestcompare.py Sat Jan 12 20:03:44 2008
@@ -50,7 +50,7 @@
 def norm_whitespace(v):
     return _norm_whitespace_re.sub(' ', v)
 
-_html_parser = etree.HTMLParser(recover=False)
+_html_parser = etree.HTMLParser(recover=False, remove_blank_text=True)
 
 def html_fromstring(html):
     return etree.fromstring(html, _html_parser)

From scoder at codespeak.net Mon Jan 14 19:54:20 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 14 Jan 2008 19:54:20 +0100 (CET)
Subject: [Lxml-checkins] r50612 - in lxml/trunk: . src/lxml/html/tests
Message-ID: <20080114185420.86285168564@codespeak.net>

Author: scoder
Date: Mon Jan 14 19:54:19 2008
New Revision: 50612

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/html/tests/test_forms.txt
Log:
 r3260 at delle: sbehnel | 2008-01-14 07:22:48 +0100
 doctest fixes

Modified: lxml/trunk/src/lxml/html/tests/test_forms.txt
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_forms.txt (original)
+++ lxml/trunk/src/lxml/html/tests/test_forms.txt Mon Jan 14 19:54:19 2008
@@ -39,7 +39,7 @@
 'http://example.org/test'
 >>> f.method
 'GET'
->>> f.inputs
+>>> f.inputs # doctest:+NOPARSE_MARKUP
 >>> hidden = f.inputs['hidden_field']
 >>> hidden.checkable
@@ -68,10 +68,10 @@
 >>> checkbox2.value
 'good'
 >>> group = f.inputs['check_group']
->>> group.value
+>>> group.value # doctest:+NOPARSE_MARKUP
 >>> group.value.add('1')
->>> group.value
+>>> group.value # doctest:+NOPARSE_MARKUP
 >>> print tostring(group[0])
@@ -110,7 +110,7 @@
 >>> select.value_options
 [None, '', '1']
 >>> select = f.inputs['select2']
->>> select.value
+>>> select.value # doctest:+NOPARSE_MARKUP
 >>> select.value.update(['2', '3'])
 >>> select.value.remove('3')
@@ -124,7 +124,7 @@
 >>> print urllib.urlencode(f.form_values())
 hidden_field=new+value&text_field=text_value&single_checkbox=on&single_checkbox2=good&check_group=1&check_group=2&check_group=3&textarea_field=some+text&select1=&select2=1&select2=2&select2=3
 >>> fields = f.fields
->>> fields
+>>> fields # doctest:+NOPARSE_MARKUP
 >>> for name, value in fields.items():
 ...     print '%s: %r' % (name, value)

From scoder at codespeak.net Mon Jan 14 19:54:30 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 14 Jan 2008 19:54:30 +0100 (CET)
Subject: [Lxml-checkins] r50613 - in lxml/trunk: . src/lxml/html/tests
Message-ID: <20080114185430.DF91E168564@codespeak.net>

Author: scoder
Date: Mon Jan 14 19:54:30 2008
New Revision: 50613

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/html/tests/test_basic.py
Log:
 r3261 at delle: sbehnel | 2008-01-14 07:23:28 +0100
 lxmlhtml.txt doesn't work as doctest

Modified: lxml/trunk/src/lxml/html/tests/test_basic.py
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_basic.py (original)
+++ lxml/trunk/src/lxml/html/tests/test_basic.py Mon Jan 14 19:54:30 2008
@@ -4,7 +4,6 @@
 def test_suite():
     suite = unittest.TestSuite()
     suite.addTests([doctest.DocFileSuite('test_basic.txt')])
-    suite.addTests([doctest.DocFileSuite('../../../../doc/lxmlhtml.txt')])
     return suite
 
 if __name__ == '__main__':

From scoder at codespeak.net Fri Jan 18 15:57:16 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 18 Jan 2008 15:57:16 +0100 (CET)
Subject: [Lxml-checkins] r50751 - in lxml/trunk: . doc
Message-ID: <20080118145716.35129169E1E@codespeak.net>

Author: scoder
Date: Fri Jan 18 15:57:15 2008
New Revision: 50751

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/doc/lxml2.txt
Log:
 r3264 at delle: sbehnel | 2008-01-16 10:43:18 +0100
 doc update

Modified: lxml/trunk/doc/lxml2.txt
==============================================================================
--- lxml/trunk/doc/lxml2.txt (original)
+++ lxml/trunk/doc/lxml2.txt Fri Jan 18 15:57:15 2008
@@ -21,6 +21,17 @@
 extensions. Wherever possible, lxml 1.3 comes close to the semantics of lxml 2.0, so that migrating should be easier for code that currently runs with 1.3.
 
+One of the important internal changes was the switch from the Pyrex_
+compiler to Cython_, which provides better optimisation and improved
+support for newer Python language features. This allows the code of
+lxml to become more Python-like again, while the performance improves
+as Cython continues its own development. The code simplification,
+which will continue throughout the 2.x series, will hopefully make it
+even easier for users to contribute.
+
+.. _Cython: http://www.cython.org/
+.. _Pyrex: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
+
 Changes in etree and objectify
 ==============================

From scoder at codespeak.net Fri Jan 18 15:57:20 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 18 Jan 2008 15:57:20 +0100 (CET)
Subject: [Lxml-checkins] r50752 - in lxml/trunk: . src/lxml
Message-ID: <20080118145720.08F80169E1D@codespeak.net>

Author: scoder
Date: Fri Jan 18 15:57:19 2008
New Revision: 50752

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/apihelpers.pxi
   lxml/trunk/src/lxml/lxml.etree.pyx
Log:
 r3265 at delle: sbehnel | 2008-01-18 00:20:37 +0100
 error reporting fixes

Modified: lxml/trunk/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/trunk/src/lxml/apihelpers.pxi (original)
+++ lxml/trunk/src/lxml/apihelpers.pxi Fri Jan 18 15:57:19 2008
@@ -728,7 +728,7 @@
         if seqlength != slicelength:
             raise ValueError(
                 "attempt to assign sequence of size %d "
-                "to extended slice of size %d" % (seqlength, c))
+                "to extended slice of size %d" % (seqlength, slicelength))
 
     if c_node is NULL:
         # no children yet => add all elements straight away

Modified: lxml/trunk/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.etree.pyx (original)
+++ lxml/trunk/src/lxml/lxml.etree.pyx Fri Jan 18 15:57:19 2008
@@ -575,7 +575,7 @@
         """
         cdef xmlNode* c_node
         cdef xmlNode* c_next
-        cdef Py_ssize_t index, step, slicelength
+        cdef Py_ssize_t step, slicelength
         if python.PySlice_Check(x):
             # slice deletion
             if _isFullSlice(x):
@@ -594,7 +594,7 @@
             # item deletion
             c_node = _findChild(self._c_node, x)
             if c_node is NULL:
-                raise IndexError, index
+                raise IndexError("index out of range: %d" % x)
             _removeText(c_node.next)
             _removeNode(self._doc, c_node)

From scoder at codespeak.net Sat Jan 19 14:22:10 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 19 Jan 2008 14:22:10 +0100 (CET)
Subject: [Lxml-checkins] r50778 - lxml/trunk
Message-ID: <20080119132210.6AB2F16851D@codespeak.net>

Author: scoder
Date: Sat Jan 19 14:22:09 2008
New Revision: 50778

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/setupinfo.py
Log:
 r3268 at delle: sbehnel | 2008-01-19 14:21:46 +0100
 do not use close_fds in Popen() as it is not portable

Modified: lxml/trunk/setupinfo.py
==============================================================================
--- lxml/trunk/setupinfo.py (original)
+++ lxml/trunk/setupinfo.py Sat Jan 19 14:22:09 2008
@@ -135,7 +135,7 @@
         _, rf, ef = os.popen3(cmd)
     else:
         # Python 2.4+
-        p = subprocess.Popen(cmd, shell=True, close_fds=True,
+        p = subprocess.Popen(cmd, shell=True,
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE)
         rf, ef = p.stdout, p.stderr
     errors = ef.read()

From scoder at codespeak.net Sat Jan 19 14:36:33 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sat, 19 Jan 2008 14:36:33 +0100 (CET)
Subject: [Lxml-checkins] r50779 - lxml/trunk
Message-ID: <20080119133633.C6D9616850C@codespeak.net>

Author: scoder
Date: Sat Jan 19 14:36:31 2008
New Revision: 50779

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/CHANGES.txt
   lxml/trunk/Makefile
Log:
 r3270 at delle: sbehnel | 2008-01-19 14:25:11 +0100
 do not remove generated .c files in 'make clean', use 'make realclean' instead

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Sat Jan 19 14:36:31 2008
@@ -8,6 +8,22 @@
 Features added
 --------------
 
+Bugs fixed
+----------
+
+Other changes
+-------------
+
+* ``make clean`` no longer removes the .c files (use ``make
+  realclean`` instead)
+
+
+2.0beta1 (2008-01-11)
+=====================
+
+Features added
+--------------
+
 * Parse-time XML schema validation (``schema`` parser keyword).
 
 * XPath string results of the ``text()`` function and attribute

Modified: lxml/trunk/Makefile
==============================================================================
--- lxml/trunk/Makefile (original)
+++ lxml/trunk/Makefile Sat Jan 19 14:36:31 2008
@@ -52,9 +52,10 @@
 ftest: ftest_inplace
 
 clean:
-	find . \( -name '*.o' -o -name '*.c' -o -name '*.so' -o -name '*.py[cod]' -o -name '*.dll' \) -exec rm -f {} \;
+	find . \( -name '*.o' -o -name '*.so' -o -name '*.py[cod]' -o -name '*.dll' \) -exec rm -f {} \;
 	rm -rf build
 
 realclean: clean
+	find . -name '*.c' -exec rm -f {} \;
 	rm -f TAGS
 	$(PYTHON) setup.py clean -a

From scoder at codespeak.net Sun Jan 20 12:56:28 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Sun, 20 Jan 2008 12:56:28 +0100 (CET)
Subject: [Lxml-checkins] r50796 - in lxml/trunk: . doc
Message-ID: <20080120115628.4595D168559@codespeak.net>

Author: scoder
Date: Sun Jan 20 12:56:26 2008
New Revision: 50796

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/doc/FAQ.txt
Log:
 r3272 at delle: sbehnel | 2008-01-20 12:04:35 +0100
 FAQ fix

Modified: lxml/trunk/doc/FAQ.txt
==============================================================================
--- lxml/trunk/doc/FAQ.txt (original)
+++ lxml/trunk/doc/FAQ.txt Sun Jan 20 12:56:26 2008
@@ -42,7 +42,7 @@
 6.2 Why can't lxml parse my XML from unicode strings?
 6.3 What is the difference between str(xslt(doc)) and xslt(doc).write() ?
 6.4 Why can't I just delete parents or clear the root node in iterparse()?
- 6.5 How do I output null bytes in XML text?
+ 6.5 How do I output null characters in XML text?
 7 XPath and Document Traversal
 7.1 What are the ``findall()`` and ``xpath()`` methods on Element(Tree)?
 7.2 Why doesn't ``findall()`` support full XPath expressions?
@@ -609,12 +609,12 @@
 .. _`iterparse section`: api.html#iterparse-and-iterwalk
 
 
-How do I output null bytes in XML text?
+How do I output null characters in XML text?
 ---------------------------------------
 
 Don't. What you would produce is not well-formed XML. XML parsers
-will refuse to parse a document that contains null bytes. The right
-way to embed binary data in XML is using a text encoding such as
+will refuse to parse a document that contains null characters. The
+right way to embed binary data in XML is using a text encoding such as
 uuencode or base64.

From scoder at codespeak.net Mon Jan 21 19:39:20 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 21 Jan 2008 19:39:20 +0100 (CET)
Subject: [Lxml-checkins] r50848 - in lxml/trunk: . src/lxml/html
Message-ID: <20080121183920.2C42E16856B@codespeak.net>

Author: scoder
Date: Mon Jan 21 19:39:18 2008
New Revision: 50848

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/html/diff.py
Log:
 r3274 at delle: sbehnel | 2008-01-21 11:24:48 +0100
 fix Py2.4-isms

Modified: lxml/trunk/src/lxml/html/diff.py
==============================================================================
--- lxml/trunk/src/lxml/html/diff.py (original)
+++ lxml/trunk/src/lxml/html/diff.py Mon Jan 21 19:39:18 2008
@@ -320,7 +320,7 @@
                 name, pos, tag = tag_stack.pop()
                 balanced[pos] = tag
             elif tag_stack:
-                start.extend(tag for name, pos, tag in tag_stack)
+                start.extend([tag for name, pos, tag in tag_stack])
                 tag_stack = []
                 end.append(chunk)
             else:
@@ -702,8 +702,8 @@
     The text representation of the start tag for a tag.
     """
     return '<%s%s>' % (
-        el.tag, ''.join(' %s="%s"' % (name, cgi.escape(value, True))
-                        for name, value in el.attrib.items()))
+        el.tag, ''.join([' %s="%s"' % (name, cgi.escape(value, True))
+                         for name, value in el.attrib.items())])
 def end_tag(el):
     """
     The text representation of an end tag for a tag. Includes

From scoder at codespeak.net Mon Jan 21 19:39:26 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 21 Jan 2008 19:39:26 +0100 (CET)
Subject: [Lxml-checkins] r50849 - in lxml/trunk: . src/lxml
Message-ID: <20080121183926.730F316856D@codespeak.net>

Author: scoder
Date: Mon Jan 21 19:39:25 2008
New Revision: 50849

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/CHANGES.txt
   lxml/trunk/src/lxml/lxml.etree.pyx
   lxml/trunk/src/lxml/python.pxd
Log:
 r3275 at delle: sbehnel | 2008-01-21 11:29:56 +0100
 fix Py2.4-isms

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Mon Jan 21 19:39:25 2008
@@ -11,6 +11,8 @@
 Bugs fixed
 ----------
 
+* Some Python 2.4-isms slipped through in beta1.
+
 Other changes
 -------------

Modified: lxml/trunk/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.etree.pyx (original)
+++ lxml/trunk/src/lxml/lxml.etree.pyx Mon Jan 21 19:39:25 2008
@@ -846,7 +846,7 @@
                     prefix = None
                 else:
                     prefix = funicode(c_ns.prefix)
-                if not python.PyDict_Contains(nsmap, prefix):
+                if not python.PyDict_GetItem(nsmap, prefix):
                     python.PyDict_SetItem(
                         nsmap, prefix, funicode(c_ns.href))
                 c_ns = c_ns.next

Modified: lxml/trunk/src/lxml/python.pxd
==============================================================================
--- lxml/trunk/src/lxml/python.pxd (original)
+++ lxml/trunk/src/lxml/python.pxd Mon Jan 21 19:39:25 2008
@@ -66,7 +66,7 @@
     cdef void PyDict_Clear(object d)
     cdef object PyDict_Copy(object d)
     cdef object PyDictProxy_New(object d)
-    cdef int PyDict_Contains(object d, object key) except -1
+    # cdef int PyDict_Contains(object d, object key) except -1 # Python 2.4+
     cdef Py_ssize_t PyDict_Size(object d)
     cdef object PySequence_List(object o)
     cdef object PySequence_Tuple(object o)

From scoder at codespeak.net Mon Jan 21 19:39:45 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Mon, 21 Jan 2008 19:39:45 +0100 (CET)
Subject: [Lxml-checkins] r50850 - in lxml/trunk: . src/lxml/html src/lxml/html/tests src/lxml/tests
Message-ID: <20080121183945.8B55E16856B@codespeak.net>

Author: scoder
Date: Mon Jan 21 19:39:44 2008
New Revision: 50850

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/html/diff.py
   lxml/trunk/src/lxml/html/tests/test_autolink.py
   lxml/trunk/src/lxml/html/tests/test_basic.py
   lxml/trunk/src/lxml/html/tests/test_clean.py
   lxml/trunk/src/lxml/html/tests/test_diff.py
   lxml/trunk/src/lxml/html/tests/test_elementsoup.py
   lxml/trunk/src/lxml/html/tests/test_feedparser_data.py
   lxml/trunk/src/lxml/html/tests/test_formfill.py
   lxml/trunk/src/lxml/html/tests/test_forms.py
   lxml/trunk/src/lxml/html/tests/test_rewritelinks.py
   lxml/trunk/src/lxml/tests/test_css.py
   lxml/trunk/src/lxml/tests/test_etree.py
   lxml/trunk/src/lxml/tests/test_objectify.py
   lxml/trunk/src/lxml/tests/test_xpathevaluator.py
Log:
 r3276 at delle: sbehnel | 2008-01-21 14:51:38 +0100
 run HTML doctests only under Python 2.4+, fix some 2.4-isms in the tests

Modified: lxml/trunk/src/lxml/html/diff.py
==============================================================================
--- lxml/trunk/src/lxml/html/diff.py (original)
+++ lxml/trunk/src/lxml/html/diff.py Mon Jan 21 19:39:44 2008
@@ -703,7 +703,7 @@
     """
     return '<%s%s>' % (
         el.tag, ''.join([' %s="%s"' % (name, cgi.escape(value, True))
-                         for name, value in el.attrib.items())])
+                         for name, value in el.attrib.items()]))
 def end_tag(el):
     """
     The text representation of an end tag for a tag. Includes

Modified: lxml/trunk/src/lxml/html/tests/test_autolink.py
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_autolink.py (original)
+++ lxml/trunk/src/lxml/html/tests/test_autolink.py Mon Jan 21 19:39:44 2008
@@ -1,9 +1,10 @@
-import unittest
+import unittest, sys
 from lxml.tests.common_imports import doctest
 
 def test_suite():
     suite = unittest.TestSuite()
-    suite.addTests([doctest.DocFileSuite('test_autolink.txt')])
+    if sys.version_info >= (2,4):
+        suite.addTests([doctest.DocFileSuite('test_autolink.txt')])
     return suite
 
 if __name__ == '__main__':

Modified: lxml/trunk/src/lxml/html/tests/test_basic.py
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_basic.py (original)
+++ lxml/trunk/src/lxml/html/tests/test_basic.py Mon Jan 21 19:39:44 2008
@@ -1,9 +1,10 @@
-import unittest
+import unittest, sys
 from lxml.tests.common_imports import doctest
 
 def test_suite():
     suite = unittest.TestSuite()
-    suite.addTests([doctest.DocFileSuite('test_basic.txt')])
+    if sys.version_info >= (2,4):
+        suite.addTests([doctest.DocFileSuite('test_basic.txt')])
     return suite
 
 if __name__ == '__main__':

Modified: lxml/trunk/src/lxml/html/tests/test_clean.py
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_clean.py (original)
+++ lxml/trunk/src/lxml/html/tests/test_clean.py Mon Jan 21 19:39:44 2008
@@ -1,10 +1,11 @@
-import unittest
+import unittest, sys
 from lxml.tests.common_imports import doctest
 from lxml.etree import LIBXML_VERSION
 
 def test_suite():
     suite = unittest.TestSuite()
-    suite.addTests([doctest.DocFileSuite('test_clean.txt')])
-    if LIBXML_VERSION <= (2,6,28) or LIBXML_VERSION >= (2,6,31):
-        suite.addTests([doctest.DocFileSuite('test_clean_embed.txt')])
+    if sys.version_info >= (2,4):
+        suite.addTests([doctest.DocFileSuite('test_clean.txt')])
+        if LIBXML_VERSION <= (2,6,28) or LIBXML_VERSION >= (2,6,31):
+            suite.addTests([doctest.DocFileSuite('test_clean_embed.txt')])
     return suite

Modified: lxml/trunk/src/lxml/html/tests/test_diff.py
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_diff.py (original)
+++ lxml/trunk/src/lxml/html/tests/test_diff.py Mon Jan 21 19:39:44 2008
@@ -1,12 +1,13 @@
-import unittest
+import unittest, sys
 from lxml.tests.common_imports import doctest
 from lxml.html import diff
 
 def test_suite():
     suite = unittest.TestSuite()
-    suite.addTests([doctest.DocFileSuite('test_diff.txt'),
-                    doctest.DocTestSuite(diff)])
+    if sys.version_info >= (2,4):
+        suite.addTests([doctest.DocFileSuite('test_diff.txt'),
+                        doctest.DocTestSuite(diff)])
     return suite
 
 if __name__ == '__main__':

Modified: lxml/trunk/src/lxml/html/tests/test_elementsoup.py
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_elementsoup.py (original)
+++ lxml/trunk/src/lxml/html/tests/test_elementsoup.py Mon Jan 21 19:39:44 2008
@@ -1,4 +1,4 @@
-import unittest
+import unittest, sys
 from lxml.tests.common_imports import doctest
 
 try:
@@ -10,8 +10,9 @@
 
 def test_suite():
     suite = unittest.TestSuite()
-    if BS_INSTALLED:
-        suite.addTests([doctest.DocFileSuite('../../../../doc/elementsoup.txt')])
+    if sys.version_info >= (2,4):
+        if BS_INSTALLED:
+            suite.addTests([doctest.DocFileSuite('../../../../doc/elementsoup.txt')])
     return suite
 
 if __name__ == '__main__':

Modified: lxml/trunk/src/lxml/html/tests/test_feedparser_data.py
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_feedparser_data.py (original)
+++ lxml/trunk/src/lxml/html/tests/test_feedparser_data.py Mon Jan 21 19:39:44 2008
@@ -1,9 +1,11 @@
+import sys
 import os
 import re
 import rfc822
 import unittest
 from lxml.tests.common_imports import doctest
-from lxml.doctestcompare import LHTMLOutputChecker
+if sys.version_info >= (2,4):
+    from lxml.doctestcompare import LHTMLOutputChecker
 
 from lxml.html.clean import clean, Cleaner
@@ -75,15 +77,16 @@
 
 def test_suite():
     suite = unittest.TestSuite()
-    for dir in feed_dirs:
-        for fn in os.listdir(dir):
-            fn = os.path.join(dir, fn)
-            if fn.endswith('.data'):
-                case = FeedTestCase(fn)
-                suite.addTests([case])
-                # This is my lazy way of stopping on first error:
-                try:
-                    case.runTest()
-                except:
-                    break
+    if sys.version_info >= (2,4):
+        for dir in feed_dirs:
+            for fn in os.listdir(dir):
+                fn = os.path.join(dir, fn)
+                if fn.endswith('.data'):
+                    case = FeedTestCase(fn)
+                    suite.addTests([case])
+                    # This is my lazy way of stopping on first error:
+                    try:
+                        case.runTest()
+                    except:
+                        break
     return suite

Modified: lxml/trunk/src/lxml/html/tests/test_formfill.py
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_formfill.py (original)
+++ lxml/trunk/src/lxml/html/tests/test_formfill.py Mon Jan 21 19:39:44 2008
@@ -1,7 +1,8 @@
-import unittest
+import unittest, sys
 from lxml.tests.common_imports import doctest
 
 def test_suite():
     suite = unittest.TestSuite()
-    suite.addTests([doctest.DocFileSuite('test_formfill.txt')])
+    if sys.version_info >= (2,4):
+        suite.addTests([doctest.DocFileSuite('test_formfill.txt')])
     return suite

Modified: lxml/trunk/src/lxml/html/tests/test_forms.py
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_forms.py (original)
+++ lxml/trunk/src/lxml/html/tests/test_forms.py Mon Jan 21 19:39:44 2008
@@ -1,9 +1,10 @@
-import unittest
+import unittest, sys
 from lxml.tests.common_imports import doctest
 
 def test_suite():
     suite = unittest.TestSuite()
-    suite.addTests([doctest.DocFileSuite('test_forms.txt')])
+    if sys.version_info >= (2,4):
+        suite.addTests([doctest.DocFileSuite('test_forms.txt')])
     return suite
 
 if __name__ == '__main__':

Modified: lxml/trunk/src/lxml/html/tests/test_rewritelinks.py
==============================================================================
--- lxml/trunk/src/lxml/html/tests/test_rewritelinks.py (original)
+++ lxml/trunk/src/lxml/html/tests/test_rewritelinks.py Mon Jan 21 19:39:44 2008
@@ -1,9 +1,10 @@
-import unittest
+import unittest, sys
 from lxml.tests.common_imports import doctest
 
 def test_suite():
     suite = unittest.TestSuite()
-    suite.addTests([doctest.DocFileSuite('test_rewritelinks.txt')])
+    if sys.version_info >= (2,4):
+        suite.addTests([doctest.DocFileSuite('test_rewritelinks.txt')])
     return suite
 
 if __name__ == '__main__':

Modified: lxml/trunk/src/lxml/tests/test_css.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_css.py (original)
+++ lxml/trunk/src/lxml/tests/test_css.py Mon Jan 21 19:39:44 2008
@@ -1,4 +1,4 @@
-import unittest
+import unittest, sys
 from lxml.tests.common_imports import doctest
 from lxml import html
 from lxml import cssselect
@@ -61,10 +61,10 @@
         self.index = index
         unittest.TestCase.__init__(self)
 
-    @classmethod
     def all(cls):
         for i in range(len(cls.selectors)):
             yield cls(i)
+    all = classmethod(all)
 
     def runTest(self):
         f = open(doc_fn, 'rb')

Modified: lxml/trunk/src/lxml/tests/test_etree.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_etree.py (original)
+++ lxml/trunk/src/lxml/tests/test_etree.py Mon Jan 21 19:39:44 2008
@@ -1736,9 +1736,8 @@
     def test_sourceline_iterparse_end(self):
         iterparse = self.etree.iterparse
-        lines = list(
-            el.sourceline for (event, el) in
-            iterparse(fileInTestDir('include/test_xinclude.xml')))
+        lines = [ el.sourceline for (event, el) in
+                  iterparse(fileInTestDir('include/test_xinclude.xml')) ]
 
         self.assertEquals(
             [2, 3, 1],
@@ -1746,10 +1745,9 @@
     def test_sourceline_iterparse_start(self):
         iterparse = self.etree.iterparse
-        lines = list(
-            el.sourceline for (event, el) in
iterparse(fileInTestDir('include/test_xinclude.xml'), - events=("start",))) + lines = [ el.sourceline for (event, el) in + iterparse(fileInTestDir('include/test_xinclude.xml'), + events=("start",)) ] self.assertEquals( [1, 2, 3], Modified: lxml/trunk/src/lxml/tests/test_objectify.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_objectify.py (original) +++ lxml/trunk/src/lxml/tests/test_objectify.py Mon Jan 21 19:39:44 2008 @@ -37,8 +37,8 @@ # None: xsi:nil="true" } -xsitype2objclass = dict(( (v, k) for k in objectclass2xsitype - for v in objectclass2xsitype[k] )) +xsitype2objclass = dict([ (v, k) for k in objectclass2xsitype + for v in objectclass2xsitype[k] ]) objectclass2pytype = { # objectify built-in @@ -50,7 +50,8 @@ # None: xsi:nil="true" } -pytype2objclass = dict(( (objectclass2pytype[k], k) for k in objectclass2pytype)) +pytype2objclass = dict([ (objectclass2pytype[k], k) + for k in objectclass2pytype]) xml_str = '''\ Modified: lxml/trunk/src/lxml/tests/test_xpathevaluator.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xpathevaluator.py (original) +++ lxml/trunk/src/lxml/tests/test_xpathevaluator.py Mon Jan 21 19:39:44 2008 @@ -4,10 +4,10 @@ Test cases related to XPath evaluation and the XPath class """ -import unittest, doctest +import unittest from StringIO import StringIO -from common_imports import etree, HelperTestCase +from common_imports import etree, HelperTestCase, doctest class ETreeXPathTestCase(HelperTestCase): """XPath tests etree""" From scoder at codespeak.net Mon Jan 21 19:40:00 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 21 Jan 2008 19:40:00 +0100 (CET) Subject: [Lxml-checkins] r50851 - in lxml/trunk: . 
src/lxml/tests Message-ID: <20080121184000.BA6D416856B@codespeak.net> Author: scoder Date: Mon Jan 21 19:39:59 2008 New Revision: 50851 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/tests/test_css.py lxml/trunk/src/lxml/tests/test_objectify.py Log: r3277 at delle: sbehnel | 2008-01-21 16:40:29 +0100 switch off some more doctests under Python 2.3 Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Mon Jan 21 19:39:59 2008 @@ -11,7 +11,8 @@ Bugs fixed ---------- -* Some Python 2.4-isms slipped through in beta1. +* Some Python 2.4-isms prevented lxml from building/running under + Python 2.3. Other changes ------------- @@ -19,6 +20,8 @@ * ``make clean`` no longer removes the .c files (use ``make realclean`` instead) +* The test suite now skips most doctests under Python 2.3. + 2.0beta1 (2008-01-11) ===================== Modified: lxml/trunk/src/lxml/tests/test_css.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_css.py (original) +++ lxml/trunk/src/lxml/tests/test_css.py Mon Jan 21 19:39:59 2008 @@ -112,7 +112,8 @@ def test_suite(): suite = unittest.TestSuite() - for fn in 'test_css.txt', 'test_css_select.txt': - suite.addTests([doctest.DocFileSuite(fn)]) + if sys.version_info >= (2,4): + suite.addTests([doctest.DocFileSuite('test_css_select.txt')]) + suite.addTests([doctest.DocFileSuite('test_css.txt')]) suite.addTests(list(CSSTestCase.all())) return suite Modified: lxml/trunk/src/lxml/tests/test_objectify.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_objectify.py (original) +++ lxml/trunk/src/lxml/tests/test_objectify.py Mon Jan 21 19:39:59 2008 @@ -5,7 +5,7 @@ """ -import unittest, operator +import unittest, operator, sys from common_imports import etree, StringIO, 
HelperTestCase, fileInTestDir from common_imports import SillyFileLike, canonicalize, doctest @@ -2071,8 +2071,9 @@ def test_suite(): suite = unittest.TestSuite() suite.addTests([unittest.makeSuite(ObjectifyTestCase)]) - suite.addTests( - [doctest.DocFileSuite('../../../doc/objectify.txt')]) + if sys.version_info >= (2,4): + suite.addTests( + [doctest.DocFileSuite('../../../doc/objectify.txt')]) return suite if __name__ == '__main__': From scoder at codespeak.net Wed Jan 23 11:25:01 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 23 Jan 2008 11:25:01 +0100 (CET) Subject: [Lxml-checkins] r50902 - in lxml/trunk: . src/lxml/html/tests Message-ID: <20080123102501.5405E1684D9@codespeak.net> Author: scoder Date: Wed Jan 23 11:24:59 2008 New Revision: 50902 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/html/tests/test_clean.txt Log: r3282 at delle: sbehnel | 2008-01-21 22:23:44 +0100 fix doctests Modified: lxml/trunk/src/lxml/html/tests/test_clean.txt ============================================================================== --- lxml/trunk/src/lxml/html/tests/test_clean.txt (original) +++ lxml/trunk/src/lxml/html/tests/test_clean.txt Wed Jan 23 11:24:59 2008 @@ -3,29 +3,28 @@ >>> from lxml.html import usedoctest >>> doc = ''' -... -... -... -... -... -... -... -... a link -... another link -...

a paragraph

-...
secret EVIL!
-... of EVIL! -... -...
-... Password: -...
-... annoying EVIL! -... spam spam SPAM! -... -... +... +... +... +... +... +... +... +... a link +... another link +...

a paragraph

+...
secret EVIL!
+... of EVIL! +... +...
+... Password: +...
+... spam spam SPAM! +... +... ... ''' >>> print doc @@ -49,9 +48,8 @@
Password:
- annoying EVIL! spam spam SPAM! - + @@ -76,9 +74,8 @@
Password:
- annoying EVIL! spam spam SPAM! - + @@ -94,7 +91,6 @@
secret EVIL!
of EVIL! Password: - annoying EVIL! spam spam SPAM! @@ -112,7 +108,6 @@
secret EVIL!
of EVIL! Password: - annoying EVIL! spam spam SPAM! From scoder at codespeak.net Wed Jan 23 11:25:04 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 23 Jan 2008 11:25:04 +0100 (CET) Subject: [Lxml-checkins] r50903 - lxml/trunk Message-ID: <20080123102504.4F08F1684DA@codespeak.net> Author: scoder Date: Wed Jan 23 11:25:03 2008 New Revision: 50903 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3283 at delle: sbehnel | 2008-01-21 22:24:06 +0100 changelog fix Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Jan 23 11:25:03 2008 @@ -2,8 +2,8 @@ lxml changelog ============== -2.0beta1 (2008-01-11) -===================== +Under development +================= Features added -------------- From scoder at codespeak.net Wed Jan 23 11:25:08 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 23 Jan 2008 11:25:08 +0100 (CET) Subject: [Lxml-checkins] r50904 - in lxml/trunk: . 
src/lxml/html Message-ID: <20080123102508.4600A1684DB@codespeak.net> Author: scoder Date: Wed Jan 23 11:25:07 2008 New Revision: 50904 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/html/clean.py Log: r3284 at delle: sbehnel | 2008-01-22 08:44:43 +0100 missing import Modified: lxml/trunk/src/lxml/html/clean.py ============================================================================== --- lxml/trunk/src/lxml/html/clean.py (original) +++ lxml/trunk/src/lxml/html/clean.py Wed Jan 23 11:25:07 2008 @@ -1,4 +1,5 @@ import re +import copy import urlparse from lxml import etree from lxml.html import defs From scoder at codespeak.net Wed Jan 23 11:25:12 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 23 Jan 2008 11:25:12 +0100 (CET) Subject: [Lxml-checkins] r50905 - lxml/trunk Message-ID: <20080123102512.EDC1C1684DE@codespeak.net> Author: scoder Date: Wed Jan 23 11:25:12 2008 New Revision: 50905 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3285 at delle: sbehnel | 2008-01-22 09:28:55 +0100 changelog cleanup Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Jan 23 11:25:12 2008 @@ -11,17 +11,19 @@ Bugs fixed ---------- +* Missing import in ``lxml.html.clean``. + * Some Python 2.4-isms prevented lxml from building/running under Python 2.3. Other changes ------------- +* The test suite now skips most doctests under Python 2.3. + * ``make clean`` no longer removes the .c files (use ``make realclean`` instead) -* The test suite now skips most doctests under Python 2.3. - 2.0beta1 (2008-01-11) ===================== From scoder at codespeak.net Wed Jan 23 16:11:50 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 23 Jan 2008 16:11:50 +0100 (CET) Subject: [Lxml-checkins] r50924 - in lxml/trunk: . 
src/lxml Message-ID: <20080123151150.EB9DC16847A@codespeak.net> Author: scoder Date: Wed Jan 23 16:11:49 2008 New Revision: 50924 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/dtd.pxi lxml/trunk/src/lxml/relaxng.pxi lxml/trunk/src/lxml/xmlschema.pxi Log: r3290 at delle: sbehnel | 2008-01-23 13:03:09 +0100 keyword-only arguments in validators Modified: lxml/trunk/src/lxml/dtd.pxi ============================================================================== --- lxml/trunk/src/lxml/dtd.pxi (original) +++ lxml/trunk/src/lxml/dtd.pxi Wed Jan 23 16:11:49 2008 @@ -27,7 +27,7 @@ catalog. """ cdef tree.xmlDtd* _c_dtd - def __init__(self, file=None, external_id=None): + def __init__(self, file=None, *, external_id=None): self._c_dtd = NULL if file is not None: if python._isString(file): Modified: lxml/trunk/src/lxml/relaxng.pxi ============================================================================== --- lxml/trunk/src/lxml/relaxng.pxi (original) +++ lxml/trunk/src/lxml/relaxng.pxi Wed Jan 23 16:11:49 2008 @@ -21,10 +21,12 @@ cdef class RelaxNG(_Validator): """Turn a document into a Relax NG validator. - Can also load from filesystem directly given file object or filename. + + Either pass a schema as Element or ElementTree, or pass a file or + filename through the ``file`` keyword argument. """ cdef relaxng.xmlRelaxNG* _c_schema - def __init__(self, etree=None, file=None): + def __init__(self, etree=None, *, file=None): cdef _Document doc cdef _Element root_node cdef xmlNode* c_node Modified: lxml/trunk/src/lxml/xmlschema.pxi ============================================================================== --- lxml/trunk/src/lxml/xmlschema.pxi (original) +++ lxml/trunk/src/lxml/xmlschema.pxi Wed Jan 23 16:11:49 2008 @@ -21,9 +21,12 @@ cdef class XMLSchema(_Validator): """Turn a document into an XML Schema validator. + + Either pass a schema as Element or ElementTree, or pass a file or + filename through the ``file`` keyword argument. 
""" cdef xmlschema.xmlSchema* _c_schema - def __init__(self, etree=None, file=None): + def __init__(self, etree=None, *, file=None): cdef _Document doc cdef _Element root_node cdef xmlDoc* fake_c_doc From scoder at codespeak.net Wed Jan 23 17:10:03 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 23 Jan 2008 17:10:03 +0100 (CET) Subject: [Lxml-checkins] r50930 - in lxml/trunk: . src/lxml Message-ID: <20080123161003.7311B168469@codespeak.net> Author: scoder Date: Wed Jan 23 17:10:02 2008 New Revision: 50930 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/dtd.pxi lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/relaxng.pxi lxml/trunk/src/lxml/xmlschema.pxi Log: r3292 at delle: sbehnel | 2008-01-23 16:39:27 +0100 cleanup in validation code, paste local error log into exceptions during schema parsing Modified: lxml/trunk/src/lxml/dtd.pxi ============================================================================== --- lxml/trunk/src/lxml/dtd.pxi (original) +++ lxml/trunk/src/lxml/dtd.pxi Wed Jan 23 17:10:02 2008 @@ -29,21 +29,27 @@ cdef tree.xmlDtd* _c_dtd def __init__(self, file=None, *, external_id=None): self._c_dtd = NULL + _Validator.__init__(self) if file is not None: if python._isString(file): + self._error_log.connect() self._c_dtd = xmlparser.xmlParseDTD(NULL, _cstr(file)) + self._error_log.disconnect() elif hasattr(file, 'read'): self._c_dtd = _parseDtdFromFilelike(file) else: - raise DTDParseError, "parsing from file objects is not supported" + raise DTDParseError("file must be a filename or file-like object") elif external_id is not None: + self._error_log.connect() self._c_dtd = xmlparser.xmlParseDTD(external_id, NULL) + self._error_log.disconnect() else: - raise DTDParseError, "either filename or external ID required" + raise DTDParseError("either filename or external ID required") if self._c_dtd is NULL: - raise DTDParseError, "error parsing DTD" - _Validator.__init__(self) + raise DTDParseError( + 
self._error_log._buildExceptionMessage("error parsing DTD"), + error_log=self._error_log) def __dealloc__(self): tree.xmlFreeDtd(self._c_dtd) @@ -77,7 +83,7 @@ self._error_log.disconnect() if ret == -1: - raise DTDValidateError, "Internal error in DTD validation" + raise DTDValidateError("Internal error in DTD validation") if ret == 1: return True else: @@ -87,15 +93,19 @@ cdef tree.xmlDtd* _parseDtdFromFilelike(file) except NULL: cdef _ExceptionContext exc_context cdef _FileReaderContext dtd_parser + cdef _ErrorLog error_log cdef tree.xmlDtd* c_dtd exc_context = _ExceptionContext() dtd_parser = _FileReaderContext(file, exc_context, None, None) + error_log = _ErrorLog() + error_log.connect() c_dtd = dtd_parser._readDtd() + error_log.disconnect() exc_context._raise_if_stored() if c_dtd is NULL: - raise DTDParseError, "error parsing DTD" + raise DTDParseError("error parsing DTD", error_log=error_log) return c_dtd cdef extern from "etree_defs.h": Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Wed Jan 23 17:10:02 2008 @@ -93,9 +93,12 @@ """Main exception base class for lxml. All other exceptions inherit from this one. """ - def __init__(self, *args): + def __init__(self, *args, error_log=None): _initError(self, *args) - self.error_log = __copyGlobalErrorLog() + if error_log is None: + self.error_log = __copyGlobalErrorLog() + else: + self.error_log = error_log.copy() cdef object _LxmlError _LxmlError = LxmlError @@ -2370,14 +2373,14 @@ def assertValid(self, etree): "Raises DocumentInvalid if the document does not comply with the schema." 
if not self(etree): - raise DocumentInvalid, self._error_log._buildExceptionMessage( - "Document does not comply with schema") + raise DocumentInvalid(self._error_log._buildExceptionMessage( + "Document does not comply with schema")) def assert_(self, etree): "Raises AssertionError if the document does not comply with the schema." if not self(etree): - raise AssertionError, self._error_log._buildExceptionMessage( - "Document does not comply with schema") + raise AssertionError(self._error_log._buildExceptionMessage( + "Document does not comply with schema")) property error_log: def __get__(self): Modified: lxml/trunk/src/lxml/relaxng.pxi ============================================================================== --- lxml/trunk/src/lxml/relaxng.pxi (original) +++ lxml/trunk/src/lxml/relaxng.pxi Wed Jan 23 17:10:02 2008 @@ -76,8 +76,10 @@ if _LIBXML_VERSION_INT < 20624: relaxng.xmlRelaxNGFreeParserCtxt(parser_ctxt) _destroyFakeDoc(doc._c_doc, fake_c_doc) - raise RelaxNGParseError, self._error_log._buildExceptionMessage( - "Document is not valid Relax NG") + raise RelaxNGParseError( + self._error_log._buildExceptionMessage( + "Document is not valid Relax NG"), + error_log=self._error_log) if fake_c_doc is not NULL: _destroyFakeDoc(doc._c_doc, fake_c_doc) Modified: lxml/trunk/src/lxml/xmlschema.pxi ============================================================================== --- lxml/trunk/src/lxml/xmlschema.pxi (original) +++ lxml/trunk/src/lxml/xmlschema.pxi Wed Jan 23 17:10:02 2008 @@ -46,7 +46,7 @@ c_href = _getNs(c_node) if c_href is NULL or \ cstd.strcmp(c_href, 'http://www.w3.org/2001/XMLSchema') != 0: - raise XMLSchemaParseError, "Document is not XML Schema" + raise XMLSchemaParseError("Document is not XML Schema") fake_c_doc = _fakeRootDoc(doc._c_doc, root_node._c_node) self._error_log.connect() @@ -60,7 +60,7 @@ self._error_log.connect() parser_ctxt = xmlschema.xmlSchemaNewParserCtxt(_cstr(filename)) else: - raise XMLSchemaParseError, "No tree or file 
given" + raise XMLSchemaParseError("No tree or file given") if parser_ctxt is not NULL: self._c_schema = xmlschema.xmlSchemaParse(parser_ctxt) @@ -73,8 +73,10 @@ _destroyFakeDoc(doc._c_doc, fake_c_doc) if self._c_schema is NULL: - raise XMLSchemaParseError, self._error_log._buildExceptionMessage( - "Document is not valid XML Schema") + raise XMLSchemaParseError( + self._error_log._buildExceptionMessage( + "Document is not valid XML Schema"), + error_log=self._error_log) def __dealloc__(self): xmlschema.xmlSchemaFree(self._c_schema) From scoder at codespeak.net Wed Jan 23 17:10:06 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 23 Jan 2008 17:10:06 +0100 (CET) Subject: [Lxml-checkins] r50931 - lxml/trunk Message-ID: <20080123161006.946E516846A@codespeak.net> Author: scoder Date: Wed Jan 23 17:10:06 2008 New Revision: 50931 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3293 at delle: sbehnel | 2008-01-23 16:40:20 +0100 changelog Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Jan 23 17:10:06 2008 @@ -8,6 +8,8 @@ Features added -------------- +* More accurate exception messages in validator creation. + Bugs fixed ---------- @@ -19,6 +21,9 @@ Other changes ------------- +* ``XMLSchema()`` and ``RelaxNG()`` now enforce passing the source + file/filename through the ``file`` keyyword argument. + * The test suite now skips most doctests under Python 2.3. * ``make clean`` no longer removes the .c files (use ``make From scoder at codespeak.net Wed Jan 23 17:10:10 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 23 Jan 2008 17:10:10 +0100 (CET) Subject: [Lxml-checkins] r50932 - in lxml/trunk: . 
doc Message-ID: <20080123161010.1F7A916846C@codespeak.net> Author: scoder Date: Wed Jan 23 17:10:09 2008 New Revision: 50932 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/validation.txt Log: r3294 at delle: sbehnel | 2008-01-23 16:52:42 +0100 example how to load a DTD from a catalog Modified: lxml/trunk/doc/validation.txt ============================================================================== --- lxml/trunk/doc/validation.txt (original) +++ lxml/trunk/doc/validation.txt Wed Jan 23 17:10:09 2008 @@ -106,6 +106,17 @@ >>> print dtd.error_log.filter_from_errors()[0] :1:0:ERROR:VALID:DTD_NOT_EMPTY: Element b was declared EMPTY this one has content +As an alternative to parsing from a file, you can use the +``external_id`` keyword argument to parse from a catalog:: + + >>> docbook_doctype = "-//OASIS//DTD DocBook XML V4.2//EN" + >>> dtd = etree.DTD(external_id = docbook_doctype) # requires catalog support + + >>> root = etree.XML("
") + >>> dtd.assertValid(root) # doctest: +ELLIPSIS + Traceback (most recent call last): + DocumentInvalid: Element article content does not follow the DTD ... + RelaxNG ------- From scoder at codespeak.net Wed Jan 23 17:10:13 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 23 Jan 2008 17:10:13 +0100 (CET) Subject: [Lxml-checkins] r50933 - lxml/trunk Message-ID: <20080123161013.ECF94168472@codespeak.net> Author: scoder Date: Wed Jan 23 17:10:13 2008 New Revision: 50933 Modified: lxml/trunk/ (props changed) lxml/trunk/setupinfo.py Log: r3295 at delle: sbehnel | 2008-01-23 16:53:56 +0100 fix building without objectify Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Wed Jan 23 17:10:13 2008 @@ -24,7 +24,8 @@ source_extension = ".c" if OPTION_WITHOUT_OBJECTIFY: - modules = [ entry for entry in EXT_MODULES if entry[0] != 'objectify' ] + modules = [ entry for entry in EXT_MODULES + if 'objectify' not in entry[0] ] else: modules = EXT_MODULES From scoder at codespeak.net Wed Jan 23 17:10:17 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 23 Jan 2008 17:10:17 +0100 (CET) Subject: [Lxml-checkins] r50934 - in lxml/trunk: . 
doc Message-ID: <20080123161017.5457316847B@codespeak.net> Author: scoder Date: Wed Jan 23 17:10:16 2008 New Revision: 50934 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/validation.txt Log: r3296 at delle: sbehnel | 2008-01-23 17:09:34 +0100 doc cleanup Modified: lxml/trunk/doc/validation.txt ============================================================================== --- lxml/trunk/doc/validation.txt (original) +++ lxml/trunk/doc/validation.txt Wed Jan 23 17:10:16 2008 @@ -109,8 +109,8 @@ As an alternative to parsing from a file, you can use the ``external_id`` keyword argument to parse from a catalog:: - >>> docbook_doctype = "-//OASIS//DTD DocBook XML V4.2//EN" - >>> dtd = etree.DTD(external_id = docbook_doctype) # requires catalog support + >>> docbook = "-//OASIS//DTD DocBook XML V4.2//EN" + >>> dtd = etree.DTD(external_id = docbook) # requires catalog support >>> root = etree.XML("
") >>> dtd.assertValid(root) # doctest: +ELLIPSIS From scoder at codespeak.net Thu Jan 24 15:11:47 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 24 Jan 2008 15:11:47 +0100 (CET) Subject: [Lxml-checkins] r50965 - in lxml/trunk: . doc Message-ID: <20080124141147.1EF9516846C@codespeak.net> Author: scoder Date: Thu Jan 24 15:11:46 2008 New Revision: 50965 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/validation.txt Log: r3302 at delle: sbehnel | 2008-01-23 23:27:18 +0100 doctest fix Modified: lxml/trunk/doc/validation.txt ============================================================================== --- lxml/trunk/doc/validation.txt (original) +++ lxml/trunk/doc/validation.txt Thu Jan 24 15:11:46 2008 @@ -115,7 +115,7 @@ >>> root = etree.XML("
") >>> dtd.assertValid(root) # doctest: +ELLIPSIS Traceback (most recent call last): - DocumentInvalid: Element article content does not follow the DTD ... + DocumentInvalid: Element article content does not follow the DTD, ... RelaxNG From scoder at codespeak.net Thu Jan 24 15:11:58 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 24 Jan 2008 15:11:58 +0100 (CET) Subject: [Lxml-checkins] r50966 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20080124141158.4BC08168466@codespeak.net> Author: scoder Date: Thu Jan 24 15:11:57 2008 New Revision: 50966 Added: lxml/trunk/src/lxml/xinclude.pxi Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/apihelpers.pxi lxml/trunk/src/lxml/classlookup.pxi lxml/trunk/src/lxml/docloader.pxi lxml/trunk/src/lxml/dtd.pxi lxml/trunk/src/lxml/extensions.pxi lxml/trunk/src/lxml/iterparse.pxi lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/lxml.objectify.pyx lxml/trunk/src/lxml/lxml.pyclasslookup.pyx lxml/trunk/src/lxml/nsclasses.pxi lxml/trunk/src/lxml/objectpath.pxi lxml/trunk/src/lxml/parser.pxi lxml/trunk/src/lxml/relaxng.pxi lxml/trunk/src/lxml/schematron.pxi lxml/trunk/src/lxml/serializer.pxi lxml/trunk/src/lxml/tests/test_xpathevaluator.py lxml/trunk/src/lxml/xmlid.pxi lxml/trunk/src/lxml/xmlschema.pxi lxml/trunk/src/lxml/xpath.pxi lxml/trunk/src/lxml/xslt.pxi Log: r3303 at delle: sbehnel | 2008-01-24 15:11:15 +0100 exception cleanup, let them carry local error logs where possible Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Thu Jan 24 15:11:57 2008 @@ -21,6 +21,9 @@ Other changes ------------- +* Exceptions carry only the part of the error log that is related to + the operation that caused the error. + * ``XMLSchema()`` and ``RelaxNG()`` now enforce passing the source file/filename through the ``file`` keyyword argument. 
Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Thu Jan 24 15:11:57 2008 @@ -26,9 +26,9 @@ elif isinstance(input, _Document): doc = <_Document>input else: - raise TypeError, "Invalid input object: %s" % type(input) + raise TypeError("Invalid input object: %s" % type(input)) if doc is None: - raise ValueError, "Input object has no document: %s" % type(input) + raise ValueError("Input object has no document: %s" % type(input)) else: return doc @@ -46,9 +46,9 @@ elif isinstance(input, _Document): node = (<_Document>input).getroot() else: - raise TypeError, "Invalid input object: %s" % type(input) + raise TypeError("Invalid input object: %s" % type(input)) if node is None: - raise ValueError, "Input object has no element: %s" % type(input) + raise ValueError("Input object has no element: %s" % type(input)) else: return node @@ -153,7 +153,7 @@ c_node = _createElement(c_doc, name_utf) if c_node is NULL: - python.PyErr_NoMemory() + return python.PyErr_NoMemory() tree.xmlAddChild(parent._c_node, c_node) if text is not None: @@ -173,7 +173,7 @@ cdef xmlNs* c_ns # 'extra' is not checked here (expected to be a keyword dict) if attrib is not None and not hasattr(attrib, 'items'): - raise TypeError, "Invalid attribute dictionary: %s" % type(attrib) + raise TypeError("Invalid attribute dictionary: %s" % type(attrib)) if extra is not None and extra: if attrib is None: attrib = extra @@ -264,7 +264,7 @@ else: c_href = _cstr(ns) if _delAttributeFromNsName(element._c_node, c_href, _cstr(tag)): - raise KeyError, key + raise KeyError(key) return 0 cdef int _delAttributeFromNsName(xmlNode* c_node, char* c_href, char* c_name): @@ -944,7 +944,7 @@ assert isutf8py(s) != -1, \ "All strings must be XML compatible, either Unicode or ASCII" else: - raise TypeError, "Argument must be string or unicode." 
+ raise TypeError("Argument must be string or unicode.") return s cdef object _encodeFilename(object filename): @@ -956,7 +956,7 @@ return python.PyUnicode_AsEncodedString( filename, _C_FILENAME_ENCODING, NULL) else: - raise TypeError, "Argument must be string or unicode." + raise TypeError("Argument must be string or unicode.") cdef object _encodeFilenameUTF8(object filename): """Recode filename as UTF-8. Tries ASCII, local filesystem encoding and @@ -985,7 +985,7 @@ if python.PyUnicode_Check(filename): return python.PyUnicode_AsUTF8String(filename) else: - raise TypeError, "Argument must be string or unicode." + raise TypeError("Argument must be string or unicode.") cdef _getNsTag(tag): """Given a tag, find namespace URI and tag name. @@ -1003,16 +1003,16 @@ c_tag = c_tag + 1 c_ns_end = cstd.strchr(c_tag, c'}') if c_ns_end is NULL: - raise ValueError, "Invalid tag name" + raise ValueError("Invalid tag name") nslen = c_ns_end - c_tag taglen = python.PyString_GET_SIZE(tag) - nslen - 2 if taglen == 0: - raise ValueError, "Empty tag name" + raise ValueError("Empty tag name") if nslen > 0: ns = python.PyString_FromStringAndSize(c_tag, nslen) tag = python.PyString_FromStringAndSize(c_ns_end+1, taglen) elif python.PyString_GET_SIZE(tag) == 0: - raise ValueError, "Empty tag name" + raise ValueError("Empty tag name") return ns, tag cdef int _pyXmlNameIsValid(name_utf8): Modified: lxml/trunk/src/lxml/classlookup.pxi ============================================================================== --- lxml/trunk/src/lxml/classlookup.pxi (original) +++ lxml/trunk/src/lxml/classlookup.pxi Thu Jan 24 15:11:57 2008 @@ -106,28 +106,28 @@ elif issubclass(element, ElementBase): self.element_class = element else: - raise TypeError, "element class must be subclass of ElementBase" + raise TypeError("element class must be subclass of ElementBase") if comment is None: self.comment_class = _Comment elif issubclass(comment, CommentBase): self.comment_class = comment else: - raise TypeError, 
"comment class must be subclass of CommentBase" + raise TypeError("comment class must be subclass of CommentBase") if entity is None: self.entity_class = _Entity elif issubclass(entity, EntityBase): self.entity_class = entity else: - raise TypeError, "Entity class must be subclass of EntityBase" + raise TypeError("Entity class must be subclass of EntityBase") if pi is None: self.pi_class = None # special case, see below elif issubclass(pi, PIBase): self.pi_class = pi else: - raise TypeError, "PI class must be subclass of PIBase" + raise TypeError("PI class must be subclass of PIBase") cdef object _lookupDefaultElementClass(state, _Document _doc, xmlNode* c_node): "Trivial class lookup function that always returns the default class." Modified: lxml/trunk/src/lxml/docloader.pxi ============================================================================== --- lxml/trunk/src/lxml/docloader.pxi (original) +++ lxml/trunk/src/lxml/docloader.pxi Thu Jan 24 15:11:57 2008 @@ -68,7 +68,7 @@ try: f.read except AttributeError: - raise TypeError, "Argument is not a file-like object" + raise TypeError("Argument is not a file-like object") doc_ref = _InputDocument() doc_ref._type = PARSER_DATA_FILE doc_ref._filename = _getFilenameForFile(f) Modified: lxml/trunk/src/lxml/dtd.pxi ============================================================================== --- lxml/trunk/src/lxml/dtd.pxi (original) +++ lxml/trunk/src/lxml/dtd.pxi Thu Jan 24 15:11:57 2008 @@ -49,7 +49,7 @@ if self._c_dtd is NULL: raise DTDParseError( self._error_log._buildExceptionMessage("error parsing DTD"), - error_log=self._error_log) + self._error_log) def __dealloc__(self): tree.xmlFreeDtd(self._c_dtd) @@ -72,7 +72,8 @@ valid_ctxt = dtdvalid.xmlNewValidCtxt() if valid_ctxt is NULL: self._error_log.disconnect() - raise DTDError, "Failed to create validation context" + raise DTDError("Failed to create validation context", + self._error_log) c_doc = _fakeRootDoc(doc._c_doc, root_node._c_node) with nogil: @@ 
-83,7 +84,8 @@ self._error_log.disconnect() if ret == -1: - raise DTDValidateError("Internal error in DTD validation") + raise DTDValidateError("Internal error in DTD validation", + self._error_log) if ret == 1: return True else: @@ -105,7 +107,7 @@ exc_context._raise_if_stored() if c_dtd is NULL: - raise DTDParseError("error parsing DTD", error_log=error_log) + raise DTDParseError("error parsing DTD", error_log) return c_dtd cdef extern from "etree_defs.h": Modified: lxml/trunk/src/lxml/extensions.pxi ============================================================================== --- lxml/trunk/src/lxml/extensions.pxi (original) +++ lxml/trunk/src/lxml/extensions.pxi Thu Jan 24 15:11:57 2008 @@ -57,8 +57,8 @@ for extension in extensions: for (ns_uri, name), function in extension.items(): if name is None: - raise ValueError, \ - "extensions must have non empty names" + raise ValueError( + "extensions must have non empty names") ns_utf = self._to_utf(ns_uri) name_utf = self._to_utf(name) python.PyDict_SetItem( @@ -72,11 +72,11 @@ ns = [] for prefix, ns_uri in namespaces: if prefix is None or not prefix: - raise TypeError, \ - "empty namespace prefix is not supported in XPath" + raise TypeError( + "empty namespace prefix is not supported in XPath") if ns_uri is None or not ns_uri: - raise TypeError, \ - "setting default namespace is not supported in XPath" + raise TypeError( + "setting default namespace is not supported in XPath") prefix_utf = self._to_utf(prefix) ns_uri_utf = self._to_utf(ns_uri) python.PyList_Append(ns, (prefix_utf, ns_uri_utf)) @@ -139,7 +139,7 @@ cdef addNamespace(self, prefix, ns_uri): if prefix is None: - raise TypeError, "empty prefix is not supported in XPath" + raise TypeError("empty prefix is not supported in XPath") prefix_utf = self._to_utf(prefix) ns_uri_utf = self._to_utf(ns_uri) new_item = (prefix_utf, ns_uri_utf) @@ -161,7 +161,7 @@ cdef registerNamespace(self, prefix, ns_uri): if prefix is None: - raise TypeError, "empty prefix is not 
supported in XPath" + raise TypeError("empty prefix is not supported in XPath") prefix_utf = self._to_utf(prefix) ns_uri_utf = self._to_utf(ns_uri) python.PyList_Append(self._global_namespaces, prefix_utf) @@ -279,17 +279,17 @@ def __get__(self): cdef xmlNode* c_node if self._xpathCtxt is NULL: - raise XPathError, \ - "XPath context is only usable during the evaluation" + raise XPathError( + "XPath context is only usable during the evaluation") c_node = self._xpathCtxt.node if c_node is NULL: - raise XPathError, "no context node" + raise XPathError("no context node") if c_node.doc != self._xpathCtxt.doc: - raise XPathError, \ - "document-external context nodes are not supported" + raise XPathError( + "document-external context nodes are not supported") if self._doc is None: - raise XPathError, \ - "document context is missing" + raise XPathError( + "document context is missing") return _elementFactory(self._doc, c_node) property eval_context: @@ -475,15 +475,15 @@ xpath.xmlXPathNodeSetAdd(resultSet, node._c_node) else: xpath.xmlXPathFreeNodeSet(resultSet) - raise XPathResultError, "This is not a node: %s" % element + raise XPathResultError("This is not a node: %r" % element) else: - raise XPathResultError, "Unknown return type: %s" % type(obj) + raise XPathResultError("Unknown return type: %s" % type(obj)) return xpath.xmlXPathWrapNodeSet(resultSet) cdef object _unwrapXPathObject(xpath.xmlXPathObject* xpathObj, _Document doc): if xpathObj.type == xpath.XPATH_UNDEFINED: - raise XPathResultError, "Undefined xpath result" + raise XPathResultError("Undefined xpath result") elif xpathObj.type == xpath.XPATH_NODESET: return _createNodeSetResult(xpathObj, doc) elif xpathObj.type == xpath.XPATH_BOOLEAN: @@ -503,7 +503,7 @@ elif xpathObj.type == xpath.XPATH_XSLT_TREE: raise NotImplementedError else: - raise XPathResultError, "Unknown xpath result %s" % str(xpathObj.type) + raise XPathResultError("Unknown xpath result %s" % str(xpathObj.type)) cdef object 
_createNodeSetResult(xpath.xmlXPathObject* xpathObj, _Document doc): cdef xmlNode* c_node @@ -543,8 +543,8 @@ c_node.type == tree.XML_XINCLUDE_END: continue else: - raise NotImplementedError, \ - "Not yet implemented result node type: %d" % c_node.type + raise NotImplementedError( + "Not yet implemented result node type: %d" % c_node.type) python.PyList_Append(result, value) return result Modified: lxml/trunk/src/lxml/iterparse.pxi ============================================================================== --- lxml/trunk/src/lxml/iterparse.pxi (original) +++ lxml/trunk/src/lxml/iterparse.pxi Thu Jan 24 15:11:57 2008 @@ -340,7 +340,7 @@ return _IterparseContext() def copy(self): - raise TypeError, "iterparse parsers cannot be copied" + raise TypeError("iterparse parsers cannot be copied") def __iter__(self): return self @@ -366,7 +366,7 @@ data = self._source.read(__ITERPARSE_CHUNK_SIZE) if not python.PyString_Check(data): self._source = None - raise TypeError, "reading file objects must return plain strings" + raise TypeError("reading file objects must return plain strings") elif data: if self._for_html: error = htmlparser.htmlParseChunk( Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Thu Jan 24 15:11:57 2008 @@ -2,7 +2,6 @@ from tree cimport xmlDoc, xmlNode, xmlAttr, xmlNs, _isElement, _getNs from python cimport callable, _cstr, _isString cimport xpath -cimport xinclude cimport c14n cimport cstd @@ -93,18 +92,17 @@ """Main exception base class for lxml. All other exceptions inherit from this one. 
""" - def __init__(self, *args, error_log=None): - _initError(self, *args) + def __init__(self, message, error_log=None): + _initError(self, message) if error_log is None: - self.error_log = __copyGlobalErrorLog() - else: - self.error_log = error_log.copy() + error_log = __copyGlobalErrorLog() + self.error_log = error_log.copy() cdef object _LxmlError _LxmlError = LxmlError -def _superError(obj, *args): - super(_LxmlError, obj).__init__(*args) +def _superError(obj, message): + super(_LxmlError, obj).__init__(message) cdef object _initError if isinstance(_LxmlError, type): @@ -121,11 +119,6 @@ """ pass -class XIncludeError(LxmlError): - """Error during XInclude processing. - """ - pass - class C14NError(LxmlError): """Error during C14N serialisation. """ @@ -447,7 +440,7 @@ self._doc = _documentOrRaise(tree) root_name, public_id, system_url = self._doc.getdoctype() if not root_name and (public_id or system_url): - raise ValueError, "Could not find root node" + raise ValueError("Could not find root node") property root_name: "Returns the name of the root node as defined by the DOCTYPE." @@ -564,7 +557,7 @@ element = value c_node = _findChild(self._c_node, x) if c_node is NULL: - raise IndexError, "list index out of range" + raise IndexError("list index out of range") c_next = element._c_node.next _removeText(c_node.next) tree.xmlReplaceNode(c_node, element._c_node) @@ -719,7 +712,7 @@ cdef xmlNode* c_next c_node = element._c_node if c_node.parent is not self._c_node: - raise ValueError, "Element is not a child of this node." + raise ValueError("Element is not a child of this node.") c_next = element._c_node.next tree.xmlUnlinkNode(c_node) _moveTail(c_next, c_node) @@ -736,7 +729,7 @@ cdef xmlNode* c_new_next c_old_node = old_element._c_node if c_old_node.parent is not self._c_node: - raise ValueError, "Element is not a child of this node." 
+ raise ValueError("Element is not a child of this node.") c_old_next = c_old_node.next c_new_node = new_element._c_node c_new_next = c_new_node.next @@ -893,7 +886,7 @@ # indexing c_node = _findChild(self._c_node, x) if c_node is NULL: - raise IndexError, "list index out of range" + raise IndexError("list index out of range") return _elementFactory(self._doc, c_node) def __len__(self): @@ -935,7 +928,7 @@ cdef xmlNode* c_start_node c_child = x._c_node if c_child.parent is not self._c_node: - raise ValueError, "Element is not a child of this node." + raise ValueError("Element is not a child of this node.") # handle the unbounded search straight away (normal case) if stop is None and (start is None or start == 0): @@ -958,7 +951,7 @@ c_stop = stop if c_stop == 0 or \ c_start >= c_stop and (c_stop > 0 or c_start < 0): - raise ValueError, "list.index(x): x not in slice" + raise ValueError("list.index(x): x not in slice") # for negative slice indices, check slice before searching index if c_start < 0 or c_stop < 0: @@ -976,9 +969,9 @@ if c_start_node == c_child: # found! before slice end? if c_stop < 0 and l <= -c_stop: - raise ValueError, "list.index(x): x not in slice" + raise ValueError("list.index(x): x not in slice") elif c_start < 0: - raise ValueError, "list.index(x): x not in slice" + raise ValueError("list.index(x): x not in slice") # now determine the index backwards from child c_child = c_child.prev @@ -1003,9 +996,9 @@ else: return k if c_start != 0 or c_stop != 0: - raise ValueError, "list.index(x): x not in slice" + raise ValueError("list.index(x): x not in slice") else: - raise ValueError, "list.index(x): x not in list" + raise ValueError("list.index(x): x not in list") def get(self, key, default=None): """Gets an element attribute. @@ -1404,7 +1397,7 @@ """Relocate the ElementTree to a new root node. 
""" if root._c_node.type != tree.XML_ELEMENT_NODE: - raise TypeError, "Only elements can be the root of an ElementTree" + raise TypeError("Only elements can be the root of an ElementTree") self._context_node = root self._doc = None @@ -1476,12 +1469,12 @@ cdef char* c_path doc = self._context_node._doc if element._doc is not doc: - raise ValueError, "Element is not in this tree." + raise ValueError("Element is not in this tree.") c_doc = _fakeRootDoc(doc._c_doc, self._context_node._c_node) c_path = tree.xmlGetNodePath(element._c_node) _destroyFakeDoc(doc._c_doc, c_doc) if c_path is NULL: - raise LxmlError, "Error creating node path." + python.PyErr_NoMemory() path = c_path tree.xmlFree(c_path) return path @@ -1641,23 +1634,7 @@ """ cdef int result self._assertHasRoot() - # We cannot pass the XML_PARSE_NOXINCNODE option as this would free - # the XInclude nodes - there may still be Python references to them! - # Therefore, we allow XInclude nodes to be converted to - # XML_XINCLUDE_START nodes. XML_XINCLUDE_END nodes are added as - # siblings. Tree traversal will simply ignore them as they are not - # typed as elements. The included fragment is added between the two, - # i.e. as a sibling, which does not conflict with traversal. - with nogil: - if self._context_node._doc._parser is not None: - result = xinclude.xmlXIncludeProcessTreeFlags( - self._context_node._c_node, - self._context_node._doc._parser._parse_options) - else: - result = xinclude.xmlXIncludeProcessTree( - self._context_node._c_node) - if result == -1: - raise XIncludeError, "XInclude processing failed" + XInclude()(self._context_node) def write_c14n(self, file): """C14N write of document. Always writes UTF-8. 
@@ -1700,12 +1677,12 @@ def pop(self, key, *default): if python.PyTuple_GET_SIZE(default) > 1: - raise TypeError, "pop expected at most 2 arguments, got %d" % \ - (python.PyTuple_GET_SIZE(default)+1) + raise TypeError("pop expected at most 2 arguments, got %d" % ( + python.PyTuple_GET_SIZE(default)+1)) result = _getAttributeValue(self._element, key, None) if result is None: if python.PyTuple_GET_SIZE(default) == 0: - raise KeyError, key + raise KeyError(key) else: result = python.PyTuple_GET_ITEM(default, 0) python.Py_INCREF(result) @@ -1726,7 +1703,7 @@ def __getitem__(self, key): result = _getAttributeValue(self._element, key, None) if result is None: - raise KeyError, key + raise KeyError(key) else: return result @@ -2286,7 +2263,8 @@ return _tostring((<_ElementTree>element_or_tree)._context_node, encoding, method, write_declaration, 1, pretty_print) else: - raise TypeError, "Type '%s' cannot be serialized." % type(element_or_tree) + raise TypeError("Type '%s' cannot be serialized." % + type(element_or_tree)) def tostringlist(element_or_tree, *args, **kwargs): """Serialize an element to an encoded string representation of its XML @@ -2316,7 +2294,8 @@ return _tounicode((<_ElementTree>element_or_tree)._context_node, method, 1, pretty_print) else: - raise TypeError, "Type '%s' cannot be serialized." % type(element_or_tree) + raise TypeError("Type '%s' cannot be serialized." % + type(element_or_tree)) def parse(source, _BaseParser parser=None): """Return an ElementTree object loaded with source elements. If no parser @@ -2344,6 +2323,7 @@ include "serializer.pxi" # XML output functions include "iterparse.pxi" # incremental XML parsing include "xmlid.pxi" # XMLID and IDDict +include "xinclude.pxi" # XInclude include "extensions.pxi" # XPath/XSLT extension functions include "xpath.pxi" # XPath evaluation include "xslt.pxi" # XSL transformations @@ -2374,7 +2354,8 @@ "Raises DocumentInvalid if the document does not comply with the schema." 
if not self(etree): raise DocumentInvalid(self._error_log._buildExceptionMessage( - "Document does not comply with schema")) + "Document does not comply with schema"), + self._error_log) def assert_(self, etree): "Raises AssertionError if the document does not comply with the schema." Modified: lxml/trunk/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.objectify.pyx (original) +++ lxml/trunk/src/lxml/lxml.objectify.pyx Thu Jan 24 15:11:57 2008 @@ -196,8 +196,8 @@ # properties are looked up /after/ __setattr__, so we must emulate them if tag == 'text' or tag == 'pyval': # read-only ! - raise TypeError, "attribute '%s' of '%s' objects is not writable"% \ - (tag, _typename(self)) + raise TypeError("attribute '%s' of '%s' objects is not writable" % + (tag, _typename(self))) elif tag == 'tail': cetree.setTailText(self._c_node, value) return @@ -256,7 +256,7 @@ if key == 0: return self else: - raise IndexError, key + raise IndexError(key) if key < 0: c_node = c_parent.last else: @@ -264,7 +264,7 @@ c_node = _findFollowingSibling( c_node, tree._getNs(c_self_node), c_self_node.name, key) if c_node is NULL: - raise IndexError, key + raise IndexError(key) return elementFactory(self._doc, c_node) def __setitem__(self, key, value): @@ -299,7 +299,7 @@ c_parent = c_self_node.parent if c_parent is NULL: # the 'root[i] = ...' 
case - raise TypeError, "assignment to root element is invalid" + raise TypeError("assignment to root element is invalid") if python.PySlice_Check(key): # slice assignment @@ -338,7 +338,7 @@ c_node = _findFollowingSibling( c_node, tree._getNs(c_self_node), c_self_node.name, key) if c_node is NULL: - raise IndexError, key + raise IndexError(key) element = elementFactory(self._doc, c_node) _replaceElement(element, value) @@ -351,7 +351,7 @@ &start, &stop, &step, &slicelength) parent = self.getparent() if parent is None: - raise TypeError, "deleting slices of root element not supported" + raise TypeError("deleting slices of root element not supported") if step < 0: del_items = list(self)[start:stop:step] else: @@ -363,7 +363,7 @@ # normal index deletion parent = self.getparent() if parent is None: - raise TypeError, "deleting items not supported by root element" + raise TypeError("deleting items not supported by root element") sibling = self.__getitem__(key) parent.remove(sibling) @@ -466,8 +466,8 @@ cdef object _lookupChildOrRaise(_Element parent, tag): element = _lookupChild(parent, tag) if element is None: - raise AttributeError, "no such child: " + \ - _buildChildTag(parent, tag) + raise AttributeError("no such child: " + + _buildChildTag(parent, tag)) return element cdef object _buildChildTag(_Element parent, tag): @@ -712,7 +712,7 @@ elif isinstance(other, StringElement): return _numericValueOf(self) * textOf((other)._c_node) else: - raise TypeError, "invalid types for * operator" + raise TypeError("invalid types for * operator") def __mod__(self, other): return _strValueOf(self) % other @@ -756,7 +756,7 @@ if c_str[1] == c'\0' or text == "true" or text.lower() == "true": # '1' or 't' or 'true' return 1 - raise ValueError, "Invalid boolean value: '%s'" % text + raise ValueError("Invalid boolean value: '%s'" % text) def __nonzero__(self): if self._boolval(): @@ -836,13 +836,13 @@ cdef object _schema_types def __init__(self, name, type_check, type_class, 
stringify=None): if not python._isString(name): - raise TypeError, "Type name must be a string" + raise TypeError("Type name must be a string") if type_check is not None and not callable(type_check): - raise TypeError, "Type check function must be callable (or None)" + raise TypeError("Type check function must be callable (or None)") if name != TREE_PYTYPE_NAME and \ not issubclass(type_class, ObjectifiedDataElement): - raise TypeError, \ - "Data classes must inherit from ObjectifiedDataElement" + raise TypeError( + "Data classes must inherit from ObjectifiedDataElement") self.name = name self._type = type_class self.type_check = type_check @@ -864,7 +864,7 @@ ignored. Raises ValueError if the dependencies cannot be fulfilled. """ if self.name == TREE_PYTYPE_NAME: - raise ValueError, "Cannot register tree type" + raise ValueError("Cannot register tree type") if self.type_check is not None: for item in _TYPE_CHECKS: if item[0] is self.type_check: @@ -886,7 +886,7 @@ if last_pos == -1: _TYPE_CHECKS.append(entry) elif first_pos > last_pos: - raise ValueError, "inconsistent before/after dependencies" + raise ValueError("inconsistent before/after dependencies") else: _TYPE_CHECKS.insert(last_pos, entry) @@ -1620,7 +1620,7 @@ elif isinstance(new_parser, etree.XMLParser): objectify_parser = new_parser else: - raise TypeError, "parser must inherit from lxml.etree.XMLParser" + raise TypeError("parser must inherit from lxml.etree.XMLParser") cdef _Element _makeElement(tag, text, attrib, nsmap): return cetree.makeElement(tag, None, objectify_parser, text, None, attrib, nsmap) @@ -1734,7 +1734,7 @@ prefix, name = _xsi.split(':', 1) ns = nsmap.get(prefix) if ns != XML_SCHEMA_NS: - raise ValueError, "XSD types require the XSD namespace" + raise ValueError("XSD types require the XSD namespace") elif nsmap is _DEFAULT_NSMAP: name = _xsi _xsi = 'xsd:' + _xsi @@ -1746,7 +1746,7 @@ _xsi = prefix + ':' + _xsi break else: - raise ValueError, "XSD types require the XSD namespace" + 
raise ValueError("XSD types require the XSD namespace") python.PyDict_SetItem(_attributes, XML_SCHEMA_INSTANCE_TYPE_ATTR, _xsi) if _pytype is None: # allow using unregistered or even wrong xsi:type names Modified: lxml/trunk/src/lxml/lxml.pyclasslookup.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.pyclasslookup.pyx (original) +++ lxml/trunk/src/lxml/lxml.pyclasslookup.pyx Thu Jan 24 15:11:57 2008 @@ -87,7 +87,7 @@ cdef tree.xmlNode* c_node c_node = cetree.findChild(self._c_node, index) if c_node is NULL: - raise IndexError, "list index out of range" + raise IndexError("list index out of range") return _newProxy(self._source_proxy, c_node) def __getslice__(self, Py_ssize_t start, Py_ssize_t stop): Modified: lxml/trunk/src/lxml/nsclasses.pxi ============================================================================== --- lxml/trunk/src/lxml/nsclasses.pxi (original) +++ lxml/trunk/src/lxml/nsclasses.pxi Thu Jan 24 15:11:57 2008 @@ -52,14 +52,14 @@ cdef python.PyObject* dict_result dict_result = python.PyDict_GetItem(self._entries, name) if dict_result is NULL: - raise KeyError, "Name not registered." + raise KeyError("Name not registered.") return dict_result cdef object _getForString(self, char* name): cdef python.PyObject* dict_result dict_result = python.PyDict_GetItemString(self._entries, name) if dict_result is NULL: - raise KeyError, "Name not registered." 
+ raise KeyError("Name not registered.") return dict_result def __iter__(self): @@ -78,8 +78,8 @@ "Dictionary-like registry for namespace implementation classes" def __setitem__(self, name, item): if not python.PyType_Check(item) or not issubclass(item, ElementBase): - raise NamespaceRegistryError, \ - "Registered element classes must be subtypes of ElementBase" + raise NamespaceRegistryError( + "Registered element classes must be subtypes of ElementBase") if name is not None: name = _utf8(name) self._entries[name] = item @@ -173,11 +173,11 @@ cdef class _FunctionNamespaceRegistry(_NamespaceRegistry): def __setitem__(self, name, item): if not callable(item): - raise NamespaceRegistryError, \ - "Registered functions must be callable." + raise NamespaceRegistryError( + "Registered functions must be callable.") if not name: - raise ValueError, \ - "extensions must have non empty names" + raise ValueError( + "extensions must have non empty names") self._entries[_utf8(name)] = item def __repr__(self): Modified: lxml/trunk/src/lxml/objectpath.pxi ============================================================================== --- lxml/trunk/src/lxml/objectpath.pxi (original) +++ lxml/trunk/src/lxml/objectpath.pxi Thu Jan 24 15:11:57 2008 @@ -49,7 +49,7 @@ python.Py_INCREF(default) use_default = 1 elif use_default > 1: - raise TypeError, "invalid number of arguments: needs one or two" + raise TypeError("invalid number of arguments: needs one or two") return _findObjectPath(root, self._c_path, self._path_len, default, use_default) @@ -107,15 +107,15 @@ # path '.child' => ignore root python.PyList_Append(new_path, _RELATIVE_PATH_SEGMENT) elif index != 0: - raise ValueError, "index not allowed on root node" + raise ValueError("index not allowed on root node") elif not has_dot: - raise ValueError, "invalid path" + raise ValueError("invalid path") python.PyList_Append(new_path, (ns, name, index)) path_pos = match.end() if python.PyList_GET_SIZE(new_path) == 0 or \ 
python.PyString_GET_SIZE(path) > path_pos: - raise ValueError, "invalid path" + raise ValueError("invalid path") return new_path cdef _parseObjectPathList(path): @@ -140,17 +140,17 @@ else: index_end = cstd.strchr(index_pos + 1, c']') if index_end is NULL: - raise ValueError, "index must be enclosed in []" + raise ValueError("index must be enclosed in []") index = python.PyNumber_Int( python.PyString_FromStringAndSize( index_pos + 1, (index_end - index_pos - 1))) if python.PyList_GET_SIZE(new_path) == 0 and index != 0: - raise ValueError, "index not allowed on root node" + raise ValueError("index not allowed on root node") name = python.PyString_FromStringAndSize( c_name, (index_pos - c_name)) python.PyList_Append(new_path, (ns, name, index)) if python.PyList_GET_SIZE(new_path) == 0: - raise ValueError, "invalid path" + raise ValueError("invalid path") return new_path cdef _ObjectPath* _buildObjectPathSegments(path_list) except NULL: @@ -188,8 +188,9 @@ if c_href is NULL or c_href[0] == c'\0': c_href = tree._getNs(c_node) if not cetree.tagMatches(c_node, c_href, c_name): - raise ValueError, "root element does not match: need %s, got %s" % \ - (cetree.namespacedNameFromNsName(c_href, c_name), root.tag) + raise ValueError( + "root element does not match: need %s, got %s" % + (cetree.namespacedNameFromNsName(c_href, c_name), root.tag)) while c_node is not NULL: c_path_len = c_path_len - 1 @@ -214,7 +215,7 @@ return default_value else: tag = cetree.namespacedNameFromNsName(c_href, c_name) - raise AttributeError, "no such child: " + tag + raise AttributeError("no such child: " + tag) cdef _createObjectPath(_Element root, _ObjectPath* c_path, Py_ssize_t c_path_len, int replace, value): @@ -229,7 +230,7 @@ cdef char* c_name cdef Py_ssize_t c_index if c_path_len == 1: - raise TypeError, "cannot update root node" + raise TypeError("cannot update root node") c_node = root._c_node c_name = c_path[0].name @@ -237,8 +238,9 @@ if c_href is NULL or c_href[0] == c'\0': c_href = 
tree._getNs(c_node) if not cetree.tagMatches(c_node, c_href, c_name): - raise ValueError, "root element does not match: need %s, got %s" % \ - (cetree.namespacedNameFromNsName(c_href, c_name), root.tag) + raise ValueError( + "root element does not match: need %s, got %s" % + (cetree.namespacedNameFromNsName(c_href, c_name), root.tag)) while c_path_len > 1: c_path_len = c_path_len - 1 @@ -257,8 +259,8 @@ if c_child is not NULL: c_node = c_child elif c_index != 0: - raise TypeError, \ - "creating indexed path attributes is not supported" + raise TypeError( + "creating indexed path attributes is not supported") elif c_path_len == 1: _appendValue(cetree.elementFactory(root._doc, c_node), cetree.namespacedNameFromNsName(c_href, c_name), Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Thu Jan 24 15:11:57 2008 @@ -269,7 +269,8 @@ if remaining <= 0: self._bytes = self._filelike.read(c_size) if not python.PyString_Check(self._bytes): - raise TypeError, "reading file objects must return plain strings" + raise TypeError( + "reading file objects must return plain strings") remaining = python.PyString_GET_SIZE(self._bytes) self._bytes_read = 0 if remaining == 0: @@ -417,7 +418,7 @@ result = python.PyThread_acquire_lock( self._lock, python.WAIT_LOCK) if result == 0: - raise ParserError, "parser locking failed" + raise ParserError("parser locking failed") self._error_log.connect() if self._validator is not None: self._validator.connect(self._c_ctxt) @@ -559,7 +560,7 @@ if not isinstance(self, HTMLParser) and \ not isinstance(self, XMLParser) and \ not isinstance(self, iterparse): - raise TypeError, "This class cannot be instantiated" + raise TypeError("This class cannot be instantiated") self._parse_options = parse_options self._filename = filename @@ -579,7 +580,7 @@ c_encoding = tree.xmlParseCharEncoding(_cstr(encoding)) if 
c_encoding == tree.XML_CHAR_ENCODING_ERROR or \ c_encoding == tree.XML_CHAR_ENCODING_NONE: - raise LookupError, "unknown encoding: '%s'" % encoding + raise LookupError("unknown encoding: '%s'" % encoding) self._default_encoding = encoding self._default_encoding_int = c_encoding @@ -758,7 +759,7 @@ cdef xmlparser.xmlParserCtxt* pctxt cdef char* c_encoding if c_len > python.INT_MAX: - raise ParserError, "string is too long to parse it with libxml2" + raise ParserError("string is too long to parse it with libxml2") context = self._getParserContext() context.prepare() @@ -891,13 +892,13 @@ py_buffer_len = python.PyString_GET_SIZE(data) elif python.PyUnicode_Check(data): if _UNICODE_ENCODING is NULL: - raise ParserError, \ - "Unicode parsing is not supported on this platform" + raise ParserError( + "Unicode parsing is not supported on this platform") c_encoding = _UNICODE_ENCODING c_data = python.PyUnicode_AS_DATA(data) py_buffer_len = python.PyUnicode_GET_DATA_SIZE(data) else: - raise TypeError, "Parsing requires string data" + raise TypeError("Parsing requires string data") context = self._getPushParserContext() pctxt = context._c_ctxt @@ -1719,13 +1720,13 @@ cdef xmlDoc* c_doc if python.PyUnicode_Check(text): if _hasEncodingDeclaration(text): - raise ValueError, \ - "Unicode strings with encoding declaration are not supported." 
+ raise ValueError( + "Unicode strings with encoding declaration are not supported.") # pass native unicode only if libxml2 can handle it if _UNICODE_ENCODING is NULL: text = python.PyUnicode_AsUTF8String(text) elif not python.PyString_Check(text): - raise ValueError, "can only parse strings" + raise ValueError("can only parse strings") if python.PyUnicode_Check(url): url = python.PyUnicode_AsUTF8String(url) c_doc = _parseDoc(text, url, parser) Modified: lxml/trunk/src/lxml/relaxng.pxi ============================================================================== --- lxml/trunk/src/lxml/relaxng.pxi (original) +++ lxml/trunk/src/lxml/relaxng.pxi Thu Jan 24 15:11:57 2008 @@ -46,7 +46,7 @@ if c_href is NULL or \ cstd.strcmp(c_href, 'http://relaxng.org/ns/structure/1.0') != 0: - raise RelaxNGParseError, "Document is not Relax NG" + raise RelaxNGParseError("Document is not Relax NG") self._error_log.connect() fake_c_doc = _fakeRootDoc(doc._c_doc, root_node._c_node) parser_ctxt = relaxng.xmlRelaxNGNewDocParserCtxt(fake_c_doc) @@ -59,13 +59,16 @@ self._error_log.connect() parser_ctxt = relaxng.xmlRelaxNGNewParserCtxt(_cstr(filename)) else: - raise RelaxNGParseError, "No tree or file given" + raise RelaxNGParseError("No tree or file given") if parser_ctxt is NULL: self._error_log.disconnect() if fake_c_doc is not NULL: _destroyFakeDoc(doc._c_doc, fake_c_doc) - raise RelaxNGParseError, "Document is not parsable as Relax NG" + raise RelaxNGParseError( + self._error_log._buildExceptionMessage( + "Document is not parsable as Relax NG"), + self._error_log) self._c_schema = relaxng.xmlRelaxNGParse(parser_ctxt) self._error_log.disconnect() @@ -79,7 +82,7 @@ raise RelaxNGParseError( self._error_log._buildExceptionMessage( "Document is not valid Relax NG"), - error_log=self._error_log) + self._error_log) if fake_c_doc is not NULL: _destroyFakeDoc(doc._c_doc, fake_c_doc) @@ -114,7 +117,9 @@ self._error_log.disconnect() if ret == -1: - raise RelaxNGValidateError, "Internal error in 
Relax NG validation" + raise RelaxNGValidateError( + "Internal error in Relax NG validation", + self._error_log) if ret == 0: return True else: Modified: lxml/trunk/src/lxml/schematron.pxi ============================================================================== --- lxml/trunk/src/lxml/schematron.pxi (original) +++ lxml/trunk/src/lxml/schematron.pxi Thu Jan 24 15:11:57 2008 @@ -80,6 +80,7 @@ cdef xmlDoc* c_doc cdef char* c_href cdef schematron.xmlSchematronParserCtxt* parser_ctxt + _Validator.__init__(self) if not config.ENABLE_SCHEMATRON: raise SchematronError( "lxml.etree was compiled without Schematron support.") @@ -88,6 +89,7 @@ doc = _documentOrRaise(etree) root_node = _rootNodeOrRaise(etree) c_doc = _copyDocRoot(doc._c_doc, root_node._c_node) + self._error_log.connect() parser_ctxt = schematron.xmlSchematronNewDocParserCtxt(c_doc) elif file is not None: filename = _getFilenameForFile(file) @@ -95,21 +97,24 @@ # XXX assume a string object filename = file filename = _encodeFilename(filename) + self._error_log.connect() parser_ctxt = schematron.xmlSchematronNewParserCtxt(_cstr(filename)) c_doc = NULL else: raise SchematronParseError("No tree or file given") if parser_ctxt is NULL: + self._error_log.disconnect() python.PyErr_NoMemory() self._c_schema = schematron.xmlSchematronParse(parser_ctxt) + self._error_log.disconnect() schematron.xmlSchematronFreeParserCtxt(parser_ctxt) if self._c_schema is NULL: raise SchematronParseError( - "Document is not a valid Schematron schema") - _Validator.__init__(self) + "Document is not a valid Schematron schema", + self._error_log) def __dealloc__(self): schematron.xmlSchematronFree(self._c_schema) @@ -136,7 +141,7 @@ valid_ctxt = schematron.xmlSchematronNewValidCtxt( self._c_schema, options) if valid_ctxt is NULL: - raise SchematronError("Failed to create validation context") + return python.PyErr_NoMemory() self._error_log.connect() c_doc = _fakeRootDoc(doc._c_doc, root_node._c_node) @@ -149,7 +154,8 @@ if ret == -1: 
raise SchematronValidateError( - "Internal error in Schematron validation") + "Internal error in Schematron validation", + self._error_log) if ret == 0: return True else: Modified: lxml/trunk/src/lxml/serializer.pxi ============================================================================== --- lxml/trunk/src/lxml/serializer.pxi (original) +++ lxml/trunk/src/lxml/serializer.pxi Thu Jan 24 15:11:57 2008 @@ -15,7 +15,7 @@ return OUTPUT_METHOD_HTML if method == "text": return OUTPUT_METHOD_TEXT - raise ValueError, "unknown output method %r" % method + raise ValueError("unknown output method %r" % method) cdef _textToString(xmlNode* c_node, encoding): cdef char* c_text @@ -67,12 +67,12 @@ # encoding during output enchandler = tree.xmlFindCharEncodingHandler(c_enc) if enchandler is NULL: - raise LookupError, python.PyString_FromFormat( - "unknown encoding: '%s'", c_enc) + raise LookupError(python.PyString_FromFormat( + "unknown encoding: '%s'", c_enc)) c_buffer = tree.xmlAllocOutputBuffer(enchandler) if c_buffer is NULL: tree.xmlCharEncCloseFunc(enchandler) - raise LxmlError, "Failed to create output buffer" + return python.PyErr_NoMemory() with nogil: _writeNodeToBuffer(c_buffer, element._c_node, c_enc, c_method, @@ -108,7 +108,7 @@ return python.PyUnicode_FromEncodedObject(text, 'utf-8', 'strict') c_buffer = tree.xmlAllocOutputBuffer(NULL) if c_buffer is NULL: - raise LxmlError, "Failed to create output buffer" + return python.PyErr_NoMemory() with nogil: _writeNodeToBuffer(c_buffer, element._c_node, NULL, c_method, 0, @@ -285,13 +285,13 @@ _writeFilelikeWriter, _closeFilelikeWriter, self, enchandler) if c_buffer is NULL: - raise IOError, "Could not create I/O writer context." 
+ raise IOError("Could not create I/O writer context.") return c_buffer cdef int write(self, char* c_buffer, int size): try: if self._filelike is None: - raise IOError, "File is already closed" + raise IOError("File is already closed") py_buffer = python.PyString_FromStringAndSize(c_buffer, size) self._filelike.write(py_buffer) return size @@ -335,22 +335,22 @@ return enchandler = tree.xmlFindCharEncodingHandler(c_enc) if enchandler is NULL: - raise LookupError, python.PyString_FromFormat( - "unknown encoding: '%s'", c_enc) + raise LookupError(python.PyString_FromFormat( + "unknown encoding: '%s'", c_enc)) if _isString(f): filename8 = _encodeFilename(f) c_buffer = tree.xmlOutputBufferCreateFilename( _cstr(filename8), enchandler, 0) if c_buffer is NULL: - python.PyErr_SetFromErrno(IOError) + return python.PyErr_SetFromErrno(IOError) state = python.PyEval_SaveThread() elif hasattr(f, 'write'): writer = _FilelikeWriter(f) c_buffer = writer._createOutputBuffer(enchandler) else: tree.xmlCharEncCloseFunc(enchandler) - raise TypeError, "File or filename expected, got '%s'" % type(f) + raise TypeError("File or filename expected, got '%s'" % type(f)) _writeNodeToBuffer(c_buffer, element._c_node, c_enc, c_method, write_xml_declaration, write_doctype, pretty_print) @@ -386,7 +386,7 @@ writer.error_log.disconnect() tree.xmlOutputBufferClose(c_buffer) else: - raise TypeError, "File or filename expected, got '%s'" % type(f) + raise TypeError("File or filename expected, got '%s'" % type(f)) finally: _destroyFakeDoc(c_base_doc, c_doc) @@ -399,14 +399,14 @@ errors = writer.error_log if len(errors): message = errors[0].message - raise C14NError, message + raise C14NError(message) # dump node to file (mainly for debug) cdef _dumpToFile(f, xmlNode* c_node, bint pretty_print): cdef tree.xmlOutputBuffer* c_buffer if not python.PyFile_Check(f): - raise ValueError, "Not a file" + raise ValueError("not a file") c_buffer = tree.xmlOutputBufferCreateFile(python.PyFile_AsFile(f), NULL) 
tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_node, 0, pretty_print, NULL) _writeTail(c_buffer, c_node, NULL, 0) Modified: lxml/trunk/src/lxml/tests/test_xpathevaluator.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xpathevaluator.py (original) +++ lxml/trunk/src/lxml/tests/test_xpathevaluator.py Thu Jan 24 15:11:57 2008 @@ -618,7 +618,7 @@ >>> e.evaluate("resultTypesTest()") Traceback (most recent call last): ... - XPathResultError: This is not a node: x + XPathResultError: This is not a node: 'x' >>> try: ... e.evaluate("resultTypesTest2()") ... except etree.XPathResultError: Added: lxml/trunk/src/lxml/xinclude.pxi ============================================================================== --- (empty file) +++ lxml/trunk/src/lxml/xinclude.pxi Thu Jan 24 15:11:57 2008 @@ -0,0 +1,45 @@ +# XInclude processing + +cimport xinclude + +class XIncludeError(LxmlError): + """Error during XInclude processing. + """ + pass + +cdef class XInclude: + """XInclude processor. + + Create an instance and call it on an Element to run XInclude + processing. + """ + cdef _ErrorLog _error_log + def __init__(self): + self._error_log = _ErrorLog() + + property error_log: + def __get__(self): + return self._error_log.copy() + + def __call__(self, _Element node not None): + # We cannot pass the XML_PARSE_NOXINCNODE option as this would free + # the XInclude nodes - there may still be Python references to them! + # Therefore, we allow XInclude nodes to be converted to + # XML_XINCLUDE_START nodes. XML_XINCLUDE_END nodes are added as + # siblings. Tree traversal will simply ignore them as they are not + # typed as elements. The included fragment is added between the two, + # i.e. as a sibling, which does not conflict with traversal. 
+ self._error_log.connect() + with nogil: + if node._doc._parser is not None: + result = xinclude.xmlXIncludeProcessTreeFlags( + node._c_node, node._doc._parser._parse_options) + else: + result = xinclude.xmlXIncludeProcessTree(node._c_node) + self._error_log.disconnect() + + if result == -1: + raise XIncludeError( + self._error_log._buildExceptionMessage( + "XInclude processing failed"), + self._error_log) Modified: lxml/trunk/src/lxml/xmlid.pxi ============================================================================== --- lxml/trunk/src/lxml/xmlid.pxi (original) +++ lxml/trunk/src/lxml/xmlid.pxi Thu Jan 24 15:11:57 2008 @@ -62,7 +62,7 @@ cdef _Document doc doc = _documentOrRaise(etree) if doc._c_doc.ids is NULL: - raise ValueError, "No ID dictionary available." + raise ValueError("No ID dictionary available.") self._doc = doc self._keys = None self._items = None @@ -78,10 +78,10 @@ id_utf = _utf8(id_name) c_id = tree.xmlHashLookup(c_ids, _cstr(id_utf)) if c_id is NULL: - raise KeyError, "Key not found." + raise KeyError("key not found.") c_attr = c_id.attr if c_attr is NULL or c_attr.parent is NULL: - raise KeyError, "ID attribute not found." 
+ raise KeyError("ID attribute not found.") return _elementFactory(self._doc, c_attr.parent) def get(self, id_name): Modified: lxml/trunk/src/lxml/xmlschema.pxi ============================================================================== --- lxml/trunk/src/lxml/xmlschema.pxi (original) +++ lxml/trunk/src/lxml/xmlschema.pxi Thu Jan 24 15:11:57 2008 @@ -76,7 +76,7 @@ raise XMLSchemaParseError( self._error_log._buildExceptionMessage( "Document is not valid XML Schema"), - error_log=self._error_log) + self._error_log) def __dealloc__(self): xmlschema.xmlSchemaFree(self._c_schema) @@ -99,7 +99,7 @@ valid_ctxt = xmlschema.xmlSchemaNewValidCtxt(self._c_schema) if valid_ctxt is NULL: self._error_log.disconnect() - raise XMLSchemaError, "Failed to create validation context" + return python.PyErr_NoMemory() c_doc = _fakeRootDoc(doc._c_doc, root_node._c_node) with nogil: @@ -111,7 +111,8 @@ self._error_log.disconnect() if ret == -1: raise XMLSchemaValidateError( - "Internal error in XML Schema validation.") + "Internal error in XML Schema validation.", + self._error_log) if ret == 0: return True else: @@ -144,7 +145,7 @@ self._valid_ctxt = xmlschema.xmlSchemaNewValidCtxt( self._schema._c_schema) if self._valid_ctxt is NULL: - raise XMLSchemaError, "Failed to create validation context" + return python.PyErr_NoMemory() self._sax_plug = xmlschema.xmlSchemaSAXPlug( self._valid_ctxt, &c_ctxt.sax, &c_ctxt.userData) Modified: lxml/trunk/src/lxml/xpath.pxi ============================================================================== --- lxml/trunk/src/lxml/xpath.pxi (original) +++ lxml/trunk/src/lxml/xpath.pxi Thu Jan 24 15:11:57 2008 @@ -154,7 +154,7 @@ result = python.PyThread_acquire_lock( self._eval_lock, python.WAIT_LOCK) if result == 0: - raise ParserError, "parser locking failed" + raise ParserError("parser locking failed") return 0 cdef void _unlock(self): @@ -167,9 +167,10 @@ if entries: message = entries._buildExceptionMessage(None) if message is not None: - raise 
XPathSyntaxError, message - raise XPathSyntaxError, self._error_log._buildExceptionMessage( - "Error in xpath expression") + raise XPathSyntaxError(message, self._error_log) + raise XPathSyntaxError(self._error_log._buildExceptionMessage( + "Error in xpath expression"), + self._error_log) cdef _raise_eval_error(self): cdef _BaseErrorLog entries @@ -179,9 +180,10 @@ if entries: message = entries._buildExceptionMessage(None) if message is not None: - raise XPathEvalError, message - raise XPathEvalError, self._error_log._buildExceptionMessage( - "Error in xpath expression") + raise XPathEvalError(message, self._error_log) + raise XPathEvalError(self._error_log._buildExceptionMessage( + "Error in xpath expression"), + self._error_log) cdef object _handle_result(self, xpath.xmlXPathObject* xpathObj, _Document doc): if self._context._exc._has_raised(): Modified: lxml/trunk/src/lxml/xslt.pxi ============================================================================== --- lxml/trunk/src/lxml/xslt.pxi (original) +++ lxml/trunk/src/lxml/xslt.pxi Thu Jan 24 15:11:57 2008 @@ -183,7 +183,7 @@ read_network=True, write_network=True): self._prefs = xslt.xsltNewSecurityPrefs() if self._prefs is NULL: - raise XSLTError, "Error preparing access control context" + python.PyErr_NoMemory() self._setAccess(xslt.XSLT_SECPREF_READ_FILE, read_file) self._setAccess(xslt.XSLT_SECPREF_WRITE_FILE, write_file) self._setAccess(xslt.XSLT_SECPREF_CREATE_DIRECTORY, create_dir) @@ -230,8 +230,8 @@ if extensions is not None: for ns, prefix in extensions: if ns is None: - raise XSLTExtensionError, \ - "extensions must not have empty namespaces" + raise XSLTExtensionError( + "extensions must not have empty namespaces") _BaseContext.__init__(self, namespaces, extensions, enable_regexp) cdef register_context(self, xslt.xsltTransformContext* xsltCtxt, @@ -308,11 +308,13 @@ # last error seems to be the most accurate here if self._error_log.last_error is not None and \ self._error_log.last_error.message: - 
raise XSLTParseError(self._error_log.last_error.message) + raise XSLTParseError(self._error_log.last_error.message, + self._error_log) else: raise XSLTParseError( self._error_log._buildExceptionMessage( - "Cannot parse stylesheet")) + "Cannot parse stylesheet"), + self._error_log) c_doc._private = NULL # no longer used! self._c_style = c_style @@ -415,7 +417,7 @@ message = "Error applying stylesheet, line %d" % error.line else: message = "Error applying stylesheet" - raise XSLTApplyError, message + raise XSLTApplyError(message, self._error_log) finally: if resolver_context is not None: resolver_context.clear() @@ -513,7 +515,7 @@ r = xslt.xsltSaveResultToString(s, l, doc._c_doc, self._xslt._c_style) if r == -1: - raise XSLTSaveError, "Error saving XSLT result to string" + python.PyErr_NoMemory() def __str__(self): cdef char* s @@ -618,10 +620,10 @@ cdef char* c_href cdef xmlAttr* c_attr if self._c_node.content is NULL: - raise ValueError, "PI lacks content" + raise ValueError("PI lacks content") hrefs_utf = _FIND_PI_HREF(' ' + self._c_node.content) if len(hrefs_utf) != 1: - raise ValueError, "malformed PI attributes" + raise ValueError("malformed PI attributes") href_utf = hrefs_utf[0] c_href = _cstr(href_utf) @@ -649,19 +651,20 @@ # try XPath search root = _findStylesheetByID(self._doc, funicode(c_href)) if not root: - raise ValueError, "reference to non-existing embedded stylesheet" + raise ValueError("reference to non-existing embedded stylesheet") elif len(root) > 1: - raise ValueError, "ambiguous reference to embedded stylesheet" + raise ValueError("ambiguous reference to embedded stylesheet") result_node = root[0] return _elementTreeFactory(result_node._doc, result_node) def set(self, key, value): if key != "href": - raise AttributeError, "only setting the 'href' attribute is supported on XSLT-PIs" + raise AttributeError( + "only setting the 'href' attribute is supported on XSLT-PIs") if value is None: attrib = "" elif '"' in value or '>' in value: - raise 
ValueError, "Invalid URL, must not contain '\"' or '>'" + raise ValueError("Invalid URL, must not contain '\"' or '>'") else: attrib = ' href="%s"' % value text = ' ' + self.text From scoder at codespeak.net Thu Jan 24 21:56:58 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 24 Jan 2008 21:56:58 +0100 (CET) Subject: [Lxml-checkins] r50995 - in lxml/trunk: . doc Message-ID: <20080124205658.3883216844C@codespeak.net> Author: scoder Date: Thu Jan 24 21:56:56 2008 New Revision: 50995 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt Log: r3306 at delle: sbehnel | 2008-01-24 21:56:35 +0100 FAQ update Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Thu Jan 24 21:56:56 2008 @@ -157,10 +157,30 @@ >>> print etree.tostring(root[0]) texttail -This is a huge simplification for the tree model as it avoids text nodes to -appear in the list of children and makes access to them quick and simple. So -this is a benefit in most applications and simplifies many, many XML tree -algorithms. +Here is an example that shows why the opposite behaviour would be even +more unexpected:: + + >>> root = et.Element("test") + + >>> root.text = "TEXT" + >>> et.tostring(root) + TEXT + + >>> et.tail = "TAIL" + >>> et.tostring(root) + TEXTTAIL + + >>> et.tail = None + >>> et.tostring(root) + TEXT + +Just imagine a Python list where you append an item and it doesn't +show up when you look at the list. + +The ``.tail`` property is a huge simplification for the tree model as +it avoids text nodes to appear in the list of children and makes +access to them quick and simple. So this is a benefit in most +applications and simplifies many, many XML tree algorithms. However, in document-like XML (and especially HTML), the above result can be unexpected to new users and can sometimes require a bit more overhead. 
A good From scoder at codespeak.net Thu Jan 24 21:58:49 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 24 Jan 2008 21:58:49 +0100 (CET) Subject: [Lxml-checkins] r50996 - in lxml/trunk: . doc Message-ID: <20080124205849.433E21683EA@codespeak.net> Author: scoder Date: Thu Jan 24 21:58:48 2008 New Revision: 50996 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt Log: r3308 at delle: sbehnel | 2008-01-24 21:58:32 +0100 FAQ fix Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Thu Jan 24 21:58:48 2008 @@ -630,7 +630,7 @@ How do I output null characters in XML text? ---------------------------------------- +-------------------------------------------- Don't. What you would produce is not well-formed XML. XML parsers will refuse to parse a document that contains null characters. The From scoder at codespeak.net Thu Jan 24 22:14:45 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 24 Jan 2008 22:14:45 +0100 (CET) Subject: [Lxml-checkins] r50999 - in lxml/trunk: . 
doc Message-ID: <20080124211445.8AC2116844D@codespeak.net> Author: scoder Date: Thu Jan 24 22:14:45 2008 New Revision: 50999 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt Log: r3310 at delle: sbehnel | 2008-01-24 22:14:19 +0100 FAQ fix Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Thu Jan 24 22:14:45 2008 @@ -157,21 +157,21 @@ >>> print etree.tostring(root[0]) texttail -Here is an example that shows why the opposite behaviour would be even -more unexpected:: +Here is an example that shows why not serialising the tail would be +even more surprising from an object point of view:: - >>> root = et.Element("test") + >>> root = etree.Element("test") >>> root.text = "TEXT" - >>> et.tostring(root) + >>> etree.tostring(root) TEXT - >>> et.tail = "TAIL" - >>> et.tostring(root) + >>> etree.tail = "TAIL" + >>> etree.tostring(root) TEXTTAIL - >>> et.tail = None - >>> et.tostring(root) + >>> etree.tail = None + >>> etree.tostring(root) TEXT Just imagine a Python list where you append an item and it doesn't From scoder at codespeak.net Thu Jan 24 22:32:09 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 24 Jan 2008 22:32:09 +0100 (CET) Subject: [Lxml-checkins] r51000 - in lxml/trunk: . 
src/lxml src/lxml/tests Message-ID: <20080124213209.A44941683E9@codespeak.net> Author: scoder Date: Thu Jan 24 22:32:09 2008 New Revision: 51000 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/tests/test_elementtree.py Log: r3312 at delle: sbehnel | 2008-01-24 22:31:48 +0100 implementation of 'del el.text' and 'del el.tail', currently disabled due to unclear semantics Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Thu Jan 24 22:32:09 2008 @@ -787,6 +787,10 @@ _resolveQNameText(self, value), 'UTF-8', 'strict') _setNodeText(self._c_node, value) + # using 'del el.text' is the wrong thing to do + #def __del__(self): + # _setNodeText(self._c_node, None) + property tail: """Text after this element's end tag, but before the next sibling element's start tag. This is either a string or the value None, if @@ -798,6 +802,10 @@ def __set__(self, value): _setTailText(self._c_node, value) + # using 'del el.tail' is the wrong thing to do + #def __del__(self): + # _setTailText(self._c_node, None) + # not in ElementTree, read-only property prefix: """Namespace prefix or None. Modified: lxml/trunk/src/lxml/tests/test_elementtree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_elementtree.py (original) +++ lxml/trunk/src/lxml/tests/test_elementtree.py Thu Jan 24 22:32:09 2008 @@ -224,6 +224,33 @@ self.assertXML('tail', a) + def _test_del_tail(self): + # this is discouraged for ET compat, should not be tested... 
+ XML = self.etree.XML + + root = XML('This is mixed content.') + self.assertEquals(1, len(root)) + self.assertEquals('This is ', root.text) + self.assertEquals(None, root.tail) + self.assertEquals('mixed', root[0].text) + self.assertEquals(' content.', root[0].tail) + + del root[0].tail + + self.assertEquals(1, len(root)) + self.assertEquals('This is ', root.text) + self.assertEquals(None, root.tail) + self.assertEquals('mixed', root[0].text) + self.assertEquals(None, root[0].tail) + + root[0].tail = "TAIL" + + self.assertEquals(1, len(root)) + self.assertEquals('This is ', root.text) + self.assertEquals(None, root.tail) + self.assertEquals('mixed', root[0].text) + self.assertEquals('TAIL', root[0].tail) + def test_ElementTree(self): Element = self.etree.Element ElementTree = self.etree.ElementTree From scoder at codespeak.net Fri Jan 25 10:36:01 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 25 Jan 2008 10:36:01 +0100 (CET) Subject: [Lxml-checkins] r51012 - in lxml/trunk: .
src/lxml src/lxml/tests Message-ID: <20080125093601.576D0168461@codespeak.net> Author: scoder Date: Fri Jan 25 10:35:59 2008 New Revision: 51012 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/serializer.pxi lxml/trunk/src/lxml/tests/test_etree.py Log: r3314 at delle: sbehnel | 2008-01-25 07:08:35 +0100 'with_tail' keyword in serialiser functions Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Jan 25 10:35:59 2008 @@ -8,6 +8,8 @@ Features added -------------- +* ``with_tail`` option in serialiser functions. + * More accurate exception messages in validator creation. Bugs fixed Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Fri Jan 25 10:35:59 2008 @@ -1444,7 +1444,7 @@ return None def write(self, file, *, encoding=None, method="xml", - pretty_print=False, xml_declaration=None): + pretty_print=False, xml_declaration=None, with_tail=True): """Write the tree to a file or file-like object. Defaults to ASCII encoding and writing a declaration as needed. @@ -1467,7 +1467,7 @@ write_declaration = encoding not in \ ('US-ASCII', 'ASCII', 'UTF8', 'UTF-8') _tofilelike(file, self._context_node, encoding, method, - write_declaration, 1, pretty_print) + write_declaration, 1, pretty_print, with_tail) def getpath(self, _Element element not None): """Returns a structural, absolute XPath expression to find that element. @@ -2233,14 +2233,14 @@ """ return isinstance(element, _Element) -def dump(_Element elem not None, *, pretty_print=True): +def dump(_Element elem not None, *, pretty_print=True, with_tail=True): """Writes an element tree or element structure to sys.stdout. This function should be used for debugging only. 
""" - _dumpToFile(sys.stdout, elem._c_node, pretty_print) + _dumpToFile(sys.stdout, elem._c_node, pretty_print, with_tail) def tostring(element_or_tree, *, encoding=None, method="xml", - xml_declaration=None, pretty_print=False): + xml_declaration=None, pretty_print=False, with_tail=True): """Serialize an element to an encoded string representation of its XML tree. @@ -2253,6 +2253,10 @@ The keyword argument 'method' selects the output method: 'xml', 'html' or plain 'text'. + + You can prevent the tail text of the element from being serialised + by passing the boolean ``with_tail`` option. This has no impact + on the tail text of children, which will always be serialised. """ cdef bint write_declaration if xml_declaration is None: @@ -2266,10 +2270,11 @@ if isinstance(element_or_tree, _Element): return _tostring(<_Element>element_or_tree, encoding, method, - write_declaration, 0, pretty_print) + write_declaration, 0, pretty_print, with_tail) elif isinstance(element_or_tree, _ElementTree): return _tostring((<_ElementTree>element_or_tree)._context_node, - encoding, method, write_declaration, 1, pretty_print) + encoding, method, write_declaration, 1, pretty_print, + with_tail) else: raise TypeError("Type '%s' cannot be serialized." % type(element_or_tree)) @@ -2283,7 +2288,8 @@ """ return [tostring(element_or_tree, *args, **kwargs)] -def tounicode(element_or_tree, *, method="xml", pretty_print=False): +def tounicode(element_or_tree, *, method="xml", pretty_print=False, + with_tail=True): """Serialize an element to the Python unicode representation of its XML tree. @@ -2295,12 +2301,17 @@ The keyword argument 'method' selects the output method: 'xml', 'html' or plain 'text'. + + You can prevent the tail text of the element from being serialised + by passing the boolean ``with_tail`` option. This has no impact + on the tail text of children, which will always be serialised. 
""" if isinstance(element_or_tree, _Element): - return _tounicode(<_Element>element_or_tree, method, 0, pretty_print) + return _tounicode(<_Element>element_or_tree, method, 0, pretty_print, + with_tail) elif isinstance(element_or_tree, _ElementTree): return _tounicode((<_ElementTree>element_or_tree)._context_node, - method, 1, pretty_print) + method, 1, pretty_print, with_tail) else: raise TypeError("Type '%s' cannot be serialized." % type(element_or_tree)) Modified: lxml/trunk/src/lxml/serializer.pxi ============================================================================== --- lxml/trunk/src/lxml/serializer.pxi (original) +++ lxml/trunk/src/lxml/serializer.pxi Fri Jan 25 10:35:59 2008 @@ -17,7 +17,7 @@ return OUTPUT_METHOD_TEXT raise ValueError("unknown output method %r" % method) -cdef _textToString(xmlNode* c_node, encoding): +cdef _textToString(xmlNode* c_node, encoding, bint with_tail): cdef char* c_text with nogil: c_text = tree.xmlNodeGetContent(c_node) @@ -27,7 +27,7 @@ text = c_text tree.xmlFree(c_text) - if _hasTail(c_node): + if with_tail and _hasTail(c_node): tail = _collectText(c_node.next) if tail: text = text + tail @@ -43,7 +43,7 @@ cdef _tostring(_Element element, encoding, method, bint write_xml_declaration, bint write_complete_document, - bint pretty_print): + bint pretty_print, bint with_tail): """Serialize an element to an encoded string representation of its XML tree. 
""" @@ -62,7 +62,7 @@ c_enc = _cstr(encoding) c_method = _findOutputMethod(method) if c_method == OUTPUT_METHOD_TEXT: - return _textToString(element._c_node, encoding) + return _textToString(element._c_node, encoding, with_tail) # it is necessary to *and* find the encoding handler *and* use # encoding during output enchandler = tree.xmlFindCharEncodingHandler(c_enc) @@ -77,7 +77,7 @@ with nogil: _writeNodeToBuffer(c_buffer, element._c_node, c_enc, c_method, write_xml_declaration, write_complete_document, - pretty_print) + pretty_print, with_tail) tree.xmlOutputBufferFlush(c_buffer) if c_buffer.conv is not NULL: c_result_buffer = c_buffer.conv @@ -92,8 +92,8 @@ tree.xmlOutputBufferClose(c_buffer) return result -cdef _tounicode(_Element element, method, - bint write_complete_document, bint pretty_print): +cdef _tounicode(_Element element, method, bint write_complete_document, + bint pretty_print, bint with_tail): """Serialize an element to the Python unicode representation of its XML tree. """ @@ -104,7 +104,7 @@ return None c_method = _findOutputMethod(method) if c_method == OUTPUT_METHOD_TEXT: - text = _textToString(element._c_node, None) + text = _textToString(element._c_node, None, with_tail) return python.PyUnicode_FromEncodedObject(text, 'utf-8', 'strict') c_buffer = tree.xmlAllocOutputBuffer(NULL) if c_buffer is NULL: @@ -112,7 +112,7 @@ with nogil: _writeNodeToBuffer(c_buffer, element._c_node, NULL, c_method, 0, - write_complete_document, pretty_print) + write_complete_document, pretty_print, with_tail) tree.xmlOutputBufferFlush(c_buffer) if c_buffer.conv is not NULL: c_result_buffer = c_buffer.conv @@ -132,7 +132,7 @@ xmlNode* c_node, char* encoding, int c_method, bint write_xml_declaration, bint write_complete_document, - bint pretty_print) nogil: + bint pretty_print, bint with_tail) nogil: cdef xmlDoc* c_doc cdef xmlNode* c_nsdecl_node c_doc = c_node.doc @@ -169,7 +169,8 @@ tree.xmlFreeNode(c_nsdecl_node) # write tail, trailing comments, etc. 
- _writeTail(c_buffer, c_node, encoding, pretty_print) + if with_tail: + _writeTail(c_buffer, c_node, encoding, pretty_print) if write_complete_document: _writeNextSiblings(c_buffer, c_node, encoding, pretty_print) if pretty_print: @@ -312,7 +313,7 @@ cdef _tofilelike(f, _Element element, encoding, method, bint write_xml_declaration, bint write_doctype, - bint pretty_print): + bint pretty_print, bint with_tail): cdef python.PyThreadState* state cdef _FilelikeWriter writer cdef tree.xmlOutputBuffer* c_buffer @@ -328,10 +329,10 @@ if _isString(f): filename8 = _encodeFilename(f) f = open(filename8, 'wb') - f.write(_textToString(element._c_node, encoding)) + f.write(_textToString(element._c_node, encoding, with_tail)) f.close() else: - f.write(_textToString(element._c_node, encoding)) + f.write(_textToString(element._c_node, encoding, with_tail)) return enchandler = tree.xmlFindCharEncodingHandler(c_enc) if enchandler is NULL: @@ -353,7 +354,8 @@ raise TypeError("File or filename expected, got '%s'" % type(f)) _writeNodeToBuffer(c_buffer, element._c_node, c_enc, c_method, - write_xml_declaration, write_doctype, pretty_print) + write_xml_declaration, write_doctype, + pretty_print, with_tail) tree.xmlOutputBufferClose(c_buffer) tree.xmlCharEncCloseFunc(enchandler) if writer is None: @@ -403,13 +405,14 @@ # dump node to file (mainly for debug) -cdef _dumpToFile(f, xmlNode* c_node, bint pretty_print): +cdef _dumpToFile(f, xmlNode* c_node, bint pretty_print, bint with_tail): cdef tree.xmlOutputBuffer* c_buffer if not python.PyFile_Check(f): raise ValueError("not a file") c_buffer = tree.xmlOutputBufferCreateFile(python.PyFile_AsFile(f), NULL) tree.xmlNodeDumpOutput(c_buffer, c_node.doc, c_node, 0, pretty_print, NULL) - _writeTail(c_buffer, c_node, NULL, 0) + if with_tail: + _writeTail(c_buffer, c_node, NULL, 0) if not pretty_print: # not written yet tree.xmlOutputBufferWriteString(c_buffer, '\n') Modified: lxml/trunk/src/lxml/tests/test_etree.py 
============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Fri Jan 25 10:35:59 2008 @@ -1916,6 +1916,26 @@ result = tostring(a, pretty_print=True) self.assertEquals(result, "\n \n \n\n") + def test_tostring_with_tail(self): + tostring = self.etree.tostring + Element = self.etree.Element + SubElement = self.etree.SubElement + + a = Element('a') + a.tail = "aTAIL" + b = SubElement(a, 'b') + b.tail = "bTAIL" + c = SubElement(a, 'c') + + result = tostring(a) + self.assertEquals(result, "<a><b/>bTAIL<c/></a>aTAIL") + + result = tostring(a, with_tail=False) + self.assertEquals(result, "<a><b/>bTAIL<c/></a>") + + result = tostring(a, with_tail=True) + self.assertEquals(result, "<a><b/>bTAIL<c/></a>aTAIL") + + def test_tostring_method_text_encoding(self): tostring = self.etree.tostring Element = self.etree.Element From scoder at codespeak.net Fri Jan 25 10:36:05 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 25 Jan 2008 10:36:05 +0100 (CET) Subject: [Lxml-checkins] r51013 - in lxml/trunk: . doc Message-ID: <20080125093605.F25A7168469@codespeak.net> Author: scoder Date: Fri Jan 25 10:36:04 2008 New Revision: 51013 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/main.txt Log: r3315 at delle: sbehnel | 2008-01-25 09:54:09 +0100 link fix Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Fri Jan 25 10:36:04 2008 @@ -46,7 +46,7 @@ * ElementTree: - * ElementTree_ API + * `ElementTree API`_ * compatibility_ and differences of lxml.etree @@ -104,6 +104,7 @@ including custom element class support. .. _ElementTree: http://effbot.org/zone/element-index.htm +.. _`ElementTree API`: http://effbot.org/zone/element-index.htm#documentation .. _cElementTree: http://effbot.org/zone/celementtree.htm ..
_`lxml.etree Tutorial`: tutorial.html From scoder at codespeak.net Fri Jan 25 10:36:10 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 25 Jan 2008 10:36:10 +0100 (CET) Subject: [Lxml-checkins] r51014 - in lxml/trunk: . doc Message-ID: <20080125093610.A647316846C@codespeak.net> Author: scoder Date: Fri Jan 25 10:36:09 2008 New Revision: 51014 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/tutorial.txt Log: r3316 at delle: sbehnel | 2008-01-25 09:54:41 +0100 tutorial update: tostring(with_tail=False) and ElementPath Modified: lxml/trunk/doc/tutorial.txt ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/trunk/doc/tutorial.txt Fri Jan 25 10:36:09 2008 @@ -17,7 +17,9 @@ 1.1 Elements are lists 1.2 Elements carry attributes 1.3 Elements contain text - 1.4 Tree iteration + 1.4 Using XPath to find text + 1.5 Tree iteration + 1.6 Serialisation 2 The ElementTree class 3 Parsing from strings and files 3.1 The fromstring() function @@ -29,9 +31,6 @@ 4 Namespaces 5 The E-factory 6 ElementPath - 6.1 findall() - 6.2 find() - 6.3 findtext() A common way to import ``lxml.etree`` is as follows:: @@ -273,10 +272,42 @@ >>> print etree.tostring(html) TEXT
<br/>TAIL -These two properties are enough to represent any text content in an XML -document. If you want to read the text without the intermediate tags, -however, you have to recursively concatenate all ``text`` and ``tail`` -attributes in the correct order. A simpler way to do this is XPath_:: +The two properties ``.text`` and ``.tail`` are enough to represent any +text content in an XML document. This way, the ElementTree API does +not require any `special text nodes`_ in addition to the Element +class, that tend to get in the way fairly often (as you might know +from classic DOM_ APIs). + +However, there are cases where the tail text also gets in the way. +For example, when you serialise an Element from within the tree, you +do not always want its tail text in the result (although you would +still want the tail text of its children). For this purpose, the +``tostring()`` function accepts the keyword argument ``with_tail``:: + + >>> print etree.tostring(br) + <br/>TAIL + >>> print etree.tostring(br, with_tail=False) # lxml.etree only! + <br/>
+ +.. _`special text nodes`: http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-1312295772 +.. _DOM: http://www.w3.org/TR/DOM-Level-3-Core/core.html + +If you want to read *only* the text, i.e. without any intermediate +tags, you have to recursively concatenate all ``text`` and ``tail`` +attributes in the correct order. Again, the ``tostring()`` function +comes to the rescue, this time using the ``method`` keyword:: + + >>> print etree.tostring(html, method="text") + TEXTTAIL + + +Using XPath to find text +------------------------ + +.. _XPath: xpathxslt.html#xpath + +Another way to extract the text content of a tree is XPath_, which +also allows you to extract the separate text chunks into a list:: >>> print html.xpath("string()") # lxml.etree only! TEXTTAIL @@ -315,8 +346,6 @@ >>> print texts[1].is_tail True -.. _XPath: xpathxslt.html#xpath - Tree iteration -------------- @@ -638,7 +667,9 @@ or whenever data comes in slowly or in chunks and you want to do other things while waiting for the next chunk. -You can reuse the parser by calling its ``feed()`` method again:: +After calling the ``close()`` method (or when an exception was raised +by the parser), you can reuse the parser by calling its ``feed()`` +method again:: >>> parser.feed("") >>> root = parser.close() @@ -814,7 +845,7 @@ The Element creation based on attribute access makes it easy to build up a simple vocabulary for an XML language:: - >>> from lxml.builder import ElementMaker + >>> from lxml.builder import ElementMaker # lxml only ! >>> E = ElementMaker(namespace="http://my.de/fault/namespace", ... nsmap={'p' : "http://my.de/fault/namespace"}) @@ -858,11 +889,50 @@ ElementPath =========== -findall() ---------- +The ElementTree library comes with a simple XPath-like path language +called ElementPath_. The main difference is that you can use the +``{namespace}tag`` notation in ElementPath expressions. However, +advanced features like value comparison and functions are not +available. + +.. 
_ElementPath: http://effbot.org/zone/element-xpath.htm +.. _`full XPath implementation`: xpathxslt.html#xpath + +In addition to a `full XPath implementation`_, lxml.etree supports the +ElementPath language in the same way ElementTree does, even using +(almost) the same implementation. The API provides four methods here +that you can find on Elements and ElementTrees: + +* ``iterfind()`` iterates over all Elements that match the path + expression + +* ``findall()`` returns a list of matching Elements + +* ``find()`` efficiently returns only the first match + +* ``findtext()`` returns the ``.text`` content of the first match + +Here are some examples:: + + >>> root = etree.XML("aText") + +Find a child of an Element:: + + >>> print root.find("b") + None + >>> print root.find("a").tag + a -find() ------- +Find an Element anywhere in the tree:: -findtext() ----------- + >>> print root.find(".//b").tag + b + >>> [ b.tag for b in root.iterfind(".//b") ] + ['b', 'b'] + +Find Elements with a certain attribute:: + + >>> print root.findall(".//a[@x]")[0].tag + a + >>> print root.findall(".//a[@y]") + [] From scoder at codespeak.net Fri Jan 25 10:36:17 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 25 Jan 2008 10:36:17 +0100 (CET) Subject: [Lxml-checkins] r51015 - in lxml/trunk: . 
doc src/lxml Message-ID: <20080125093617.517CE168471@codespeak.net> Author: scoder Date: Fri Jan 25 10:36:16 2008 New Revision: 51015 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/doc/tutorial.txt lxml/trunk/doc/xpathxslt.txt lxml/trunk/src/lxml/extensions.pxi lxml/trunk/src/lxml/python.pxd Log: r3317 at delle: sbehnel | 2008-01-25 10:35:30 +0100 XPath string results are always smart objects, but no longer forced into unicode Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Jan 25 10:36:16 2008 @@ -8,6 +8,12 @@ Features added -------------- +* Plain ASCII XPath string results are no longer forced into unicode + objects (as in 2.0beta1). + +* All XPath string results are 'smart' objects that have a + ``getparent()`` method to retrieve their parent Element. + * ``with_tail`` option in serialiser functions. * More accurate exception messages in validator creation. Modified: lxml/trunk/doc/tutorial.txt ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/trunk/doc/tutorial.txt Fri Jan 25 10:36:16 2008 @@ -312,18 +312,18 @@ >>> print html.xpath("string()") # lxml.etree only! TEXTTAIL >>> print html.xpath("//text()") # lxml.etree only! - [u'TEXT', u'TAIL'] + ['TEXT', 'TAIL'] If you want to use this more often, you can wrap it in a function:: >>> build_text_list = etree.XPath("//text()") # lxml.etree only! >>> print build_text_list(html) - [u'TEXT', u'TAIL'] + ['TEXT', 'TAIL'] -Note that the ``text()`` function in XPath always returns unicode -strings. This is because it is actually a special object that knows -about its origins. You can ask it where it came from through its -``getparent()`` method, just as you would with Elements:: +Note that a string result returned by XPath is a special 'smart' +object that knows about its origins. 
You can ask it where it came +from through its ``getparent()`` method, just as you would with +Elements:: >>> texts = build_text_list(html) >>> print texts[0] @@ -346,6 +346,16 @@ >>> print texts[1].is_tail True +While this works for the results of the ``text()`` function, lxml will +not to tell you the origin of a string value that was constructed by +the XPath functions ``string()`` or ``concat()``:: + + >>> stringify = etree.XPath("string()") + >>> print stringify(html) + TEXTTAIL + >>> print stringify(html).getparent() + None + Tree iteration -------------- Modified: lxml/trunk/doc/xpathxslt.txt ============================================================================== --- lxml/trunk/doc/xpathxslt.txt (original) +++ lxml/trunk/doc/xpathxslt.txt Fri Jan 25 10:36:16 2008 @@ -136,13 +136,32 @@ * a float, when the XPath expression has a numeric result (integer or float) -* a (unicode) string, when the XPath expression has a string result. +* a 'smart' string (as described below), when the XPath expression has + a string result. -* a list of items, when the XPath expression has a list as result. The items - may include elements (also comments and processing instructions), strings - and tuples. Text nodes and attributes in the result are returned as strings - (the text node content or attribute value). Namespace declarations are - returned as tuples of strings: ``(prefix, URI)``. +* a list of items, when the XPath expression has a list as result. + The items may include Elements (also comments and processing + instructions), strings and tuples. Text nodes and attributes in the + result are returned as 'smart' string values. Namespace + declarations are returned as tuples of strings: ``(prefix, URI)``. + +XPath string results are 'smart' in that they provide a +``getparent()`` method that knows their origin:: + +* for attribute values, ``result.getparent()`` returns the Element + that carries them. 
An example is ``//foo/@attribute``, where the + parent would be a ``foo`` Element. + +* for the ``text()`` function (as in ``//text()``), it returns the + Element that contains the text or tail that was returned. + +You can distinguish between different text origins with the boolean +properties ``is_text``, ``is_tail`` and ``is_attribute``. + +Note that ``getparent()`` may not always return an Element. For +example, the XPath functions ``string()`` and ``concat()`` will +construct strings that do not have an origin. For them, +``getparent()`` will return None. Generating XPath expressions Modified: lxml/trunk/src/lxml/extensions.pxi ============================================================================== --- lxml/trunk/src/lxml/extensions.pxi (original) +++ lxml/trunk/src/lxml/extensions.pxi Fri Jan 25 10:36:16 2008 @@ -491,7 +491,8 @@ elif xpathObj.type == xpath.XPATH_NUMBER: return xpathObj.floatval elif xpathObj.type == xpath.XPATH_STRING: - return funicode(xpathObj.stringval) + return _elementStringResultFactory( + funicode(xpathObj.stringval), None, 0, 0) elif xpathObj.type == xpath.XPATH_POINT: raise NotImplementedError elif xpathObj.type == xpath.XPATH_RANGE: @@ -524,7 +525,7 @@ value = _fakeDocElementFactory(doc, c_node) elif c_node.type == tree.XML_TEXT_NODE or \ c_node.type == tree.XML_ATTRIBUTE_NODE: - value = _newElementStringResult(doc, c_node) + value = _buildElementStringResult(doc, c_node) elif c_node.type == tree.XML_NAMESPACE_DECL: s = (c_node).href if s is NULL: @@ -560,7 +561,7 @@ ################################################################################ # special str/unicode subclasses -cdef class _ElementStringResult(python.unicode): +cdef class _ElementUnicodeResult(python.unicode): cdef _Element parent cdef readonly object is_tail cdef readonly object is_text @@ -569,27 +570,56 @@ def getparent(self): return self.parent -cdef object _newElementStringResult(_Document doc, xmlNode* c_node): - cdef _ElementStringResult result +class 
_ElementStringResult(str): + # we need to use a Python class here, str cannot be C-subclassed + # in Pyrex/Cython + def getparent(self): + return self._parent + +cdef object _elementStringResultFactory(string_value, _Element parent, + bint is_attribute, bint is_tail): + cdef _ElementUnicodeResult uresult + cdef bint is_text + if parent is None: + is_text = 0 + else: + is_text = not (is_tail or is_attribute) + + if python.PyString_CheckExact(string_value): + result = _ElementStringResult(string_value) + result._parent = parent + result.is_attribute = is_attribute + result.is_tail = is_tail + result.is_text = is_text + return result + else: + uresult = _ElementUnicodeResult(string_value) + uresult.parent = parent + uresult.is_attribute = is_attribute + uresult.is_tail = is_tail + uresult.is_text = is_text + return uresult + +cdef object _buildElementStringResult(_Document doc, xmlNode* c_node): + cdef _Element parent cdef xmlNode* c_element cdef char* s - cdef bint is_attribute, is_tail + cdef bint is_attribute, is_text, is_tail if c_node.type == tree.XML_ATTRIBUTE_NODE: is_attribute = 1 is_tail = 0 s = tree.xmlNodeGetContent(c_node) try: - value = python.PyUnicode_DecodeUTF8(s, cstd.strlen(s), NULL) + value = funicode(s) finally: tree.xmlFree(s) c_element = NULL else: #assert c_node.type == tree.XML_TEXT_NODE, "invalid node type" is_attribute = 0 - # tail text? 
- value = python.PyUnicode_DecodeUTF8( - c_node.content, cstd.strlen(c_node.content), NULL) + # may be tail text or normal text + value = funicode(c_node.content) c_element = _previousElement(c_node) is_tail = c_element is not NULL @@ -599,15 +629,12 @@ while c_element is not NULL and not _isElement(c_element): c_element = c_element.parent - if c_element is NULL: - return value + if c_element is not NULL: + parent = _fakeDocElementFactory(doc, c_element) + + return _elementStringResultFactory( + value, parent, is_attribute, is_tail) - result = _ElementStringResult(value) - result.parent = _fakeDocElementFactory(doc, c_element) - result.is_attribute = is_attribute - result.is_tail = is_tail - result.is_text = not (is_tail or is_attribute) - return result ################################################################################ # callbacks for XPath/XSLT extension functions Modified: lxml/trunk/src/lxml/python.pxd ============================================================================== --- lxml/trunk/src/lxml/python.pxd (original) +++ lxml/trunk/src/lxml/python.pxd Fri Jan 25 10:36:16 2008 @@ -25,6 +25,7 @@ cdef int PyUnicode_Check(object obj) cdef int PyString_Check(object obj) + cdef int PyString_CheckExact(object obj) cdef object PyUnicode_FromEncodedObject(object s, char* encoding, char* errors) From scoder at codespeak.net Fri Jan 25 18:21:56 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 25 Jan 2008 18:21:56 +0100 (CET) Subject: [Lxml-checkins] r51038 - in lxml/trunk: . 
benchmark Message-ID: <20080125172156.618FD1684F5@codespeak.net> Author: scoder Date: Fri Jan 25 18:21:55 2008 New Revision: 51038 Modified: lxml/trunk/ (props changed) lxml/trunk/benchmark/bench_etree.py Log: r3322 at delle: sbehnel | 2008-01-25 18:21:09 +0100 benchmark to compare findall() and xpath() Modified: lxml/trunk/benchmark/bench_etree.py ============================================================================== --- lxml/trunk/benchmark/bench_etree.py (original) +++ lxml/trunk/benchmark/bench_etree.py Fri Jan 25 18:21:55 2008 @@ -312,6 +312,12 @@ root.findall(".//*[%s]/./%s/./*" % (self.SEARCH_TAG, self.SEARCH_TAG)) @onlylib('lxe') + def bench_xpath_path(self, root): + ns, tag = self.SEARCH_TAG[1:].split('}') + root.xpath(".//*[p:%s]/./p:%s/./*" % (tag,tag), + namespaces = {'p':ns}) + + @onlylib('lxe') def bench_iterfind(self, root): list(root.iterfind(".//*")) From scoder at codespeak.net Fri Jan 25 18:22:01 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 25 Jan 2008 18:22:01 +0100 (CET) Subject: [Lxml-checkins] r51039 - in lxml/trunk: . doc Message-ID: <20080125172201.CEE8F1684F6@codespeak.net> Author: scoder Date: Fri Jan 25 18:22:00 2008 New Revision: 51039 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/doc/main.txt lxml/trunk/version.txt Log: r3323 at delle: sbehnel | 2008-01-25 18:21:29 +0100 lxml 2.0beta2 Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Jan 25 18:22:00 2008 @@ -2,14 +2,14 @@ lxml changelog ============== -Under development -================= +2.0beta2 (2008-01-25) +===================== Features added -------------- * Plain ASCII XPath string results are no longer forced into unicode - objects (as in 2.0beta1). + objects but are returned as plain strings as before 2.0beta1. 
* All XPath string results are 'smart' objects that have a ``getparent()`` method to retrieve their parent Element. Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Fri Jan 25 18:22:00 2008 @@ -142,8 +142,8 @@ .. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ .. _`this key`: pubkey.asc -The latest version is `lxml 2.0beta1`_, released 2008-01-11 -(`changes for 2.0beta1`_). `Older versions`_ are listed below. +The latest version is `lxml 2.0beta2`_, released 2008-01-25 +(`changes for 2.0beta2`_). `Older versions`_ are listed below. .. _`Older versions`: #old-versions @@ -203,6 +203,8 @@ Old Versions ------------ +* `lxml 2.0beta1`_, released 2008-01-11 (`changes for 2.0beta1`_) + * `lxml 2.0alpha6`_, released 2007-12-19 (`changes for 2.0alpha6`_) * `lxml 2.0alpha5`_, released 2007-11-24 (`changes for 2.0alpha5`_) @@ -265,6 +267,7 @@ * `lxml 0.5`_, released 2005-04-08 +.. _`lxml 2.0beta2`: lxml-2.0beta2.tgz .. _`lxml 2.0beta1`: lxml-2.0beta1.tgz .. _`lxml 2.0alpha6`: lxml-2.0alpha6.tgz .. _`lxml 2.0alpha5`: lxml-2.0alpha5.tgz @@ -297,6 +300,7 @@ .. _`lxml 0.5.1`: lxml-0.5.1.tgz .. _`lxml 0.5`: lxml-0.5.tgz +.. _`changes for 2.0beta2`: changes-2.0beta2.html .. _`changes for 2.0beta1`: changes-2.0beta1.html .. _`changes for 2.0alpha6`: changes-2.0alpha6.html .. 
_`changes for 2.0alpha5`: changes-2.0alpha5.html Modified: lxml/trunk/version.txt ============================================================================== --- lxml/trunk/version.txt (original) +++ lxml/trunk/version.txt Fri Jan 25 18:22:00 2008 @@ -1 +1 @@ -2.0beta1 +2.0beta2 From scoder at codespeak.net Fri Jan 25 18:24:13 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 25 Jan 2008 18:24:13 +0100 (CET) Subject: [Lxml-checkins] r51041 - lxml/trunk Message-ID: <20080125172413.AE0791684F4@codespeak.net> Author: scoder Date: Fri Jan 25 18:24:13 2008 New Revision: 51041 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3326 at delle: sbehnel | 2008-01-25 18:23:56 +0100 changelog Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Jan 25 18:24:13 2008 @@ -9,7 +9,7 @@ -------------- * Plain ASCII XPath string results are no longer forced into unicode - objects but are returned as plain strings as before 2.0beta1. + objects as in 2.0beta1, but are returned as plain strings as before. * All XPath string results are 'smart' objects that have a ``getparent()`` method to retrieve their parent Element. 
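The ElementPath section added to the tutorial in r51014 describes four methods: ``iterfind()``, ``findall()``, ``find()`` and ``findtext()``. r51038 then benchmarks ``findall()`` against ``xpath()``. The sketch below exercises those four methods. It is an editorial illustration, not part of any commit, and it makes some assumptions:

* It uses Python 3 and the standard library's ``xml.etree.ElementTree``.
* The sample document is hypothetical.

The tutorial text states that lxml.etree supports ElementPath "in the same way ElementTree does", so swapping in ``from lxml import etree`` should behave the same. That has not been verified here against lxml 2.0beta2 specifically.

```python
# Minimal ElementPath sketch using the standard library's ElementTree.
# Assumption: lxml.etree offers the same four methods with the same
# semantics, so "from lxml import etree" should work as a drop-in import.
import xml.etree.ElementTree as etree

# Hypothetical sample document, chosen for illustration only.
root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

# A bare tag name in find() only matches direct children of root.
first_b_child = root.find("b")   # None: <b> is a grandchild, not a child
first_a = root.find("a")         # the <a> child element

# ".//" searches all descendants, at any depth.
any_b = root.find(".//b")                            # first <b> found
all_b_tags = [b.tag for b in root.iterfind(".//b")]  # lazy iteration

# "[@attr]" keeps only elements that carry the given attribute.
with_x = root.findall(".//a[@x]")
with_y = root.findall(".//a[@y]")

# findtext() returns the .text of the first matching element.
a_text = root.findtext("a")

print(first_b_child, first_a.tag, any_b.tag, all_b_tags,
      [e.tag for e in with_x], with_y, a_text)
```

``iterfind()`` yields matches lazily, while ``findall()`` builds a list. ``find()`` stops at the first match, so prefer it when only one element is needed.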
From scoder at codespeak.net Sat Jan 26 13:00:49 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 26 Jan 2008 13:00:49 +0100 (CET) Subject: [Lxml-checkins] r51059 - lxml/branch/lxml-1.3/doc Message-ID: <20080126120049.02CF01684D0@codespeak.net> Author: scoder Date: Sat Jan 26 13:00:49 2008 New Revision: 51059 Modified: lxml/branch/lxml-1.3/doc/element_classes.txt Log: rest fix Modified: lxml/branch/lxml-1.3/doc/element_classes.txt ============================================================================== --- lxml/branch/lxml-1.3/doc/element_classes.txt (original) +++ lxml/branch/lxml-1.3/doc/element_classes.txt Sat Jan 26 13:00:49 2008 @@ -255,7 +255,7 @@ Tree based element class lookup in Python -......................................... +----------------------------------------- Taking more elaborate decisions than allowed by the custom scheme is difficult to achieve in pure Python. It would require access to the tree - before the From scoder at codespeak.net Sat Jan 26 13:01:43 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 26 Jan 2008 13:01:43 +0100 (CET) Subject: [Lxml-checkins] r51060 - lxml/branch/lxml-1.3/src/lxml Message-ID: <20080126120143.625A11684D0@codespeak.net> Author: scoder Date: Sat Jan 26 13:01:41 2008 New Revision: 51060 Modified: lxml/branch/lxml-1.3/src/lxml/proxy.pxi Log: cleanup following lxml 2.0 Modified: lxml/branch/lxml-1.3/src/lxml/proxy.pxi ============================================================================== --- lxml/branch/lxml-1.3/src/lxml/proxy.pxi (original) +++ lxml/branch/lxml-1.3/src/lxml/proxy.pxi Sat Jan 26 13:01:41 2008 @@ -66,11 +66,10 @@ c_new_root = tree.xmlDocCopyNode(c_node, c_doc, 2) # non recursive! 
tree.xmlDocSetRootElement(c_doc, c_new_root) _copyParentNamespaces(c_node, c_new_root) - _copyParentNamespaces(c_node, c_root) c_new_root.children = c_node.children c_new_root.last = c_node.last - c_new_root.next = c_new_root.prev = c_new_root.parent = NULL + c_new_root.next = c_new_root.prev = NULL # store original node c_doc._private = c_node From scoder at codespeak.net Sat Jan 26 13:06:44 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 26 Jan 2008 13:06:44 +0100 (CET) Subject: [Lxml-checkins] r51061 - in lxml/trunk: . doc Message-ID: <20080126120644.078461684DA@codespeak.net> Author: scoder Date: Sat Jan 26 13:06:44 2008 New Revision: 51061 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/xpathxslt.txt Log: r3328 at delle: sbehnel | 2008-01-25 20:59:28 +0100 rest fix Modified: lxml/trunk/doc/xpathxslt.txt ============================================================================== --- lxml/trunk/doc/xpathxslt.txt (original) +++ lxml/trunk/doc/xpathxslt.txt Sat Jan 26 13:06:44 2008 @@ -146,7 +146,7 @@ declarations are returned as tuples of strings: ``(prefix, URI)``. XPath string results are 'smart' in that they provide a -``getparent()`` method that knows their origin:: +``getparent()`` method that knows their origin: * for attribute values, ``result.getparent()`` returns the Element that carries them. An example is ``//foo/@attribute``, where the From scoder at codespeak.net Sat Jan 26 13:06:49 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 26 Jan 2008 13:06:49 +0100 (CET) Subject: [Lxml-checkins] r51062 - in lxml/trunk: . 
doc Message-ID: <20080126120649.0ABBB1684DC@codespeak.net> Author: scoder Date: Sat Jan 26 13:06:48 2008 New Revision: 51062 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt lxml/trunk/doc/lxml2.txt Log: r3329 at delle: sbehnel | 2008-01-26 12:51:27 +0100 doc update Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Sat Jan 26 13:06:48 2008 @@ -1,6 +1,6 @@ -======================================= -lxml - Frequently Asked Questions (FAQ) -======================================= +===================================== +lxml FAQ - Frequently Asked Questions +===================================== .. meta:: :description: Frequently Asked Questions about lxml (FAQ) Modified: lxml/trunk/doc/lxml2.txt ============================================================================== --- lxml/trunk/doc/lxml2.txt (original) +++ lxml/trunk/doc/lxml2.txt Sat Jan 26 13:06:48 2008 @@ -127,7 +127,17 @@ bigger overlap with the XSLT code. The main benefits are improved thread safety in the XPath evaluators and Python RegExp support in standard XPath. +* The string results of an XPath evaluation have become 'smart' string + subclasses. Formerly, there was no easy way to find out where a + string originated from. In lxml 2.0, you can call its + ``getparent()`` method to `find the Element that carries it`_. This + works for attributes (``//@attribute``) and for ``text()`` nodes, + i.e. Element text and tails. Strings that were constructed in the + path expression, e.g. by the ``string()`` function or extension + functions, will return None as their parent. + .. _`E factory`: objectify.html#tree-generation-with-the-e-factory +.. _`find the Element that carries it`: tutorial.html#using-xpath-to-find-text New modules @@ -140,17 +150,16 @@ --------------- A very useful module for doctests based on XML or HTML is -``lxml.doctestcompare``. 
It provides a relaxed comparison mechanism for XML -and HTML in doctests. Using it is as simple as:: +``lxml.doctestcompare``. It provides a relaxed comparison mechanism +for XML and HTML in doctests. Using it for XML comparisons is as +simple as:: >>> import lxml.usedoctest -for XML comparisons and:: +and for HTML comparisons:: >>> import lxml.html.usedoctest -for HTML comparisons. - lxml.html --------- From scoder at codespeak.net Sat Jan 26 13:06:53 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 26 Jan 2008 13:06:53 +0100 (CET) Subject: [Lxml-checkins] r51063 - in lxml/trunk: . src/lxml/html/tests/hackers-org-data Message-ID: <20080126120653.CF5911684DD@codespeak.net> Author: scoder Date: Sat Jan 26 13:06:53 2008 New Revision: 51063 Added: lxml/trunk/src/lxml/html/tests/hackers-org-data/xml-namespace.data.BROKEN - copied unchanged from r49914, lxml/trunk/src/lxml/html/tests/hackers-org-data/xml-namespace.data Removed: lxml/trunk/src/lxml/html/tests/hackers-org-data/xml-namespace.data Modified: lxml/trunk/ (props changed) Log: r3330 at delle: sbehnel | 2008-01-26 13:04:59 +0100 removed broken test Deleted: /lxml/trunk/src/lxml/html/tests/hackers-org-data/xml-namespace.data ============================================================================== --- /lxml/trunk/src/lxml/html/tests/hackers-org-data/xml-namespace.data Sat Jan 26 13:06:53 2008 +++ (empty file) @@ -1,16 +0,0 @@ -Description: XML namespace. The htc file must be located on the same server as your XSS vector - http://ha.ckers.org/xss.html#XSS_XML_namespace -Note: I don't completely understand the vector here. page_structure is what does this. - - - - - XSS - - ----------- - - -
XSS
- - From scoder at codespeak.net Sat Jan 26 17:44:38 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 26 Jan 2008 17:44:38 +0100 (CET) Subject: [Lxml-checkins] r51071 - in lxml/trunk: . src/lxml/html/tests Message-ID: <20080126164438.53AB91684FF@codespeak.net> Author: scoder Date: Sat Jan 26 17:44:34 2008 New Revision: 51071 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/html/tests/test_clean.py Log: r3336 at delle: sbehnel | 2008-01-26 17:44:10 +0100 get rid of test failures for older libxml2 versions Modified: lxml/trunk/src/lxml/html/tests/test_clean.py ============================================================================== --- lxml/trunk/src/lxml/html/tests/test_clean.py (original) +++ lxml/trunk/src/lxml/html/tests/test_clean.py Sat Jan 26 17:44:34 2008 @@ -6,6 +6,6 @@ suite = unittest.TestSuite() if sys.version_info >= (2,4): suite.addTests([doctest.DocFileSuite('test_clean.txt')]) - if LIBXML_VERSION <= (2,6,28) or LIBXML_VERSION >= (2,6,31): + if LIBXML_VERSION >= (2,6,31): suite.addTests([doctest.DocFileSuite('test_clean_embed.txt')]) return suite From scoder at codespeak.net Sat Jan 26 21:10:17 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 26 Jan 2008 21:10:17 +0100 (CET) Subject: [Lxml-checkins] r51073 - in lxml/trunk: . 
doc Message-ID: <20080126201017.6FB191684E1@codespeak.net> Author: scoder Date: Sat Jan 26 21:10:09 2008 New Revision: 51073 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/doc/main.txt Log: r3338 at delle: sbehnel | 2008-01-26 21:09:47 +0100 fixed release date Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Sat Jan 26 21:10:09 2008 @@ -2,7 +2,7 @@ lxml changelog ============== -2.0beta2 (2008-01-25) +2.0beta2 (2008-01-26) ===================== Features added Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Sat Jan 26 21:10:09 2008 @@ -142,7 +142,7 @@ .. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/ .. _`this key`: pubkey.asc -The latest version is `lxml 2.0beta2`_, released 2008-01-25 +The latest version is `lxml 2.0beta2`_, released 2008-01-26 (`changes for 2.0beta2`_). `Older versions`_ are listed below. .. _`Older versions`: #old-versions From scoder at codespeak.net Sun Jan 27 08:59:55 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 27 Jan 2008 08:59:55 +0100 (CET) Subject: [Lxml-checkins] r51075 - lxml/trunk Message-ID: <20080127075955.51997168476@codespeak.net> Author: scoder Date: Sun Jan 27 08:59:53 2008 New Revision: 51075 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r3340 at delle: sbehnel | 2008-01-27 08:59:27 +0100 doc typo Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Sun Jan 27 08:59:53 2008 @@ -33,7 +33,7 @@ the operation that caused the error. * ``XMLSchema()`` and ``RelaxNG()`` now enforce passing the source - file/filename through the ``file`` keyyword argument. 
+ file/filename through the ``file`` keyword argument. * The test suite now skips most doctests under Python 2.3. From scoder at codespeak.net Sun Jan 27 10:49:31 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 27 Jan 2008 10:49:31 +0100 (CET) Subject: [Lxml-checkins] r51076 - in lxml/trunk: . doc Message-ID: <20080127094931.985C9168438@codespeak.net> Author: scoder Date: Sun Jan 27 10:49:26 2008 New Revision: 51076 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt Log: r3342 at delle: sbehnel | 2008-01-27 10:44:53 +0100 FAQ update Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Sun Jan 27 10:49:26 2008 @@ -402,6 +402,11 @@ .. _`mailing list`: http://codespeak.net/mailman/listinfo/lxml-dev +Since as a user of lxml you are likely a programmer, you might find +`this article on bug reports`_ an interesting read. + +.. _`this article on bug reports`: http://www.chiark.greenend.org.uk/~sgtatham/bugs.html + Threading ========= 
From scoder at codespeak.net Mon Jan 28 09:00:07 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 28 Jan 2008 09:00:07 +0100 (CET) Subject: [Lxml-checkins] r51089 - in lxml/trunk: . src/lxml Message-ID: <20080128080007.D7CCF16855B@codespeak.net> Author: scoder Date: Mon Jan 28 09:00:05 2008 New Revision: 51089 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/lxml.objectify.pyx Log: r3344 at delle: sbehnel | 2008-01-28 08:47:14 +0100 makeparser() function in lxml.objectify Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Mon Jan 28 09:00:05 2008 @@ -2,6 +2,22 @@ lxml changelog ============== +Under development +================= + +Features added +-------------- + +* ``makeparser()`` function in ``lxml.objectify`` to create a new + parser with the usual objectify setup. + +Bugs fixed +---------- + +Other changes +------------- + + 2.0beta2 (2008-01-26) ===================== Modified: lxml/trunk/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.objectify.pyx (original) +++ lxml/trunk/src/lxml/lxml.objectify.pyx Mon Jan 28 09:00:05 2008 @@ -1598,12 +1598,13 @@ cdef object __DEFAULT_PARSER __DEFAULT_PARSER = etree.XMLParser(remove_blank_text=True) -__DEFAULT_PARSER.setElementClassLookup( ObjectifyElementClassLookup() ) +__DEFAULT_PARSER.set_element_class_lookup( ObjectifyElementClassLookup() ) cdef object objectify_parser objectify_parser = __DEFAULT_PARSER def setDefaultParser(new_parser = None): + "This function is deprecated, use ``set_default_parser()`` instead."
set_default_parser(new_parser) def set_default_parser(new_parser = None): @@ -1622,6 +1623,16 @@ else: raise TypeError("parser must inherit from lxml.etree.XMLParser") +def makeparser(**kw): + """Create a new XML parser for objectify trees. + + You can pass all keyword arguments that are supported by + ``etree.XMLParser()``. + """ + parser = etree.XMLParser(**kw) + parser.set_element_class_lookup( ObjectifyElementClassLookup() ) + return parser + cdef _Element _makeElement(tag, text, attrib, nsmap): return cetree.makeElement(tag, None, objectify_parser, text, None, attrib, nsmap) From scoder at codespeak.net Mon Jan 28 09:00:11 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 28 Jan 2008 09:00:11 +0100 (CET) Subject: [Lxml-checkins] r51090 - in lxml/trunk: . doc Message-ID: <20080128080011.CEB6716855E@codespeak.net> Author: scoder Date: Mon Jan 28 09:00:10 2008 New Revision: 51090 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/objectify.txt Log: r3345 at delle: sbehnel | 2008-01-28 08:48:04 +0100 cleaned up objectify tutorial, removed confusing details at the beginning Modified: lxml/trunk/doc/objectify.txt ============================================================================== --- lxml/trunk/doc/objectify.txt (original) +++ lxml/trunk/doc/objectify.txt Mon Jan 28 09:00:10 2008 @@ -16,71 +16,19 @@ used. Python data types are extracted from XML content automatically and made available to the normal Python operators. -This API is very different from the ElementTree API. If it is used, it should -not be mixed with other element implementations, to avoid non-obvious -behaviour. - -The `benchmark page`_ has some hints on performance optimisation of code using -lxml.objectify. - -.. _Amara: http://uche.ogbuji.net/tech/4suite/amara/ -.. _gnosis.xml.objectify: http://gnosis.cx/download/ -.. _`benchmark page`: performance.html#lxml-objectify - -.. contents:: -.. 
- 1 Setting up lxml.objectify - 2 The lxml.objectify API - 2.1 Creating objectify trees - 2.2 Element access through object attributes - 2.3 Namespace handling - 3 ObjectPath - 4 Python data types - 5 Recursive tree dump - 5.1 Recursive string representation of elements - 6 How data types are matched - 6.1 Type annotations - 6.2 XML Schema datatype annotation - 6.3 The DataElement factory - 6.4 Defining additional data classes - 7 What is different from lxml.etree? - - -Setting up lxml.objectify -========================= - To set up and use ``objectify``, you need both the ``lxml.etree`` module and ``lxml.objectify``:: >>> from lxml import etree >>> from lxml import objectify -The next step is to create a parser that builds objectify documents. The -objectify API is meant for data-centric XML (as opposed to document XML with -mixed content). Therefore, we configure the parser to let it remove -whitespace-only text from the parsed document if it is not enclosed by an XML -element. Note that this alters the document infoset, so if you consider the -removed spaces as data in your specific use case, you should go with a normal -parser and just set the element class lookup. Most applications, however, -will work fine with the following setup:: - - >>> parser = etree.XMLParser(remove_blank_text=True) - - >>> lookup = objectify.ObjectifyElementClassLookup() - >>> parser.setElementClassLookup(lookup) - -If you want additional support for `namespace specific classes`_, you can -register the objectify lookup as a fallback of the namespace lookup. Note, -however, that you have to take care in this case, that the namespace classes -inherit from ``objectify.ObjectifiedElement``, not only from the normal -``lxml.etree.ElementBase``, so that they support the ``objectify`` API. The -above setup code then becomes:: - - >>> lookup = etree.ElementNamespaceClassLookup( - ... 
objectify.ObjectifyElementClassLookup() ) - >>> parser.setElementClassLookup(lookup) +The objectify API is very different from the ElementTree API. If it +is used, it should not be mixed with other element implementations +(such as trees parsed with ``lxml.etree``), to avoid non-obvious +behaviour. -.. _`namespace specific classes`: element_classes.html#namespace-class-lookup +The `benchmark page`_ has some hints on performance optimisation of code using +lxml.objectify. To make the doctests in this document look a little nicer, we also use this: @@ -89,6 +37,29 @@ Imported from within a doctest, this relieves us from caring about the exact formatting of XML output. +.. _Amara: http://uche.ogbuji.net/tech/4suite/amara/ +.. _gnosis.xml.objectify: http://gnosis.cx/download/ +.. _`benchmark page`: performance.html#lxml-objectify + +.. contents:: +.. + 1 The lxml.objectify API + 1.1 Creating objectify trees + 1.2 Element access through object attributes + 1.3 Tree generation with the E-factory + 1.4 Namespace handling + 2 ObjectPath + 3 Python data types + 3.1 Recursive tree dump + 3.2 Recursive string representation of elements + 4 How data types are matched + 4.1 Type annotations + 4.2 XML Schema datatype annotation + 4.3 The DataElement factory + 4.4 Defining additional data classes + 4.5 Advanced element class lookup + 5 What is different from lxml.etree? + The lxml.objectify API ====================== @@ -100,25 +71,40 @@ Creating objectify trees ------------------------ -To create an ``objectify`` tree, you can either parse a document with the -parser you created:: +As with ``lxml.etree``, you can either create an ``objectify`` tree by +parsing an XML document or by building one from scratch. 
To parse a +document, just use the ``parse()`` or ``fromstring()`` functions of +the module:: >>> from StringIO import StringIO - >>> xml = StringIO('') - >>> tree = etree.parse(xml, parser) + >>> fileobject = StringIO('') + + >>> tree = objectify.parse(fileobject) >>> print isinstance(tree.getroot(), objectify.ObjectifiedElement) True -or you can call the ``makeelement()`` method of the parser to create a new -root element from scratch:: + >>> tree = objectify.fromstring('') + >>> print isinstance(tree.getroot(), objectify.ObjectifiedElement) + True - >>> obj_el = parser.makeelement("test") +To build a new tree in memory, ``objectify`` replicates the standard +factory function ``Element()`` from ``lxml.etree``:: + + >>> obj_el = objectify.Element("new") >>> print isinstance(obj_el, objectify.ObjectifiedElement) True -New subelements will automatically inherit the setup. However, all -independent elements that you create through the normal etree API will not be -associated with the parser and therefore not support the ``objectify`` API:: +After creating such an Element, you can use the `usual API`_ of +lxml.etree to add SubElements to the tree:: + + >>> child = etree.SubElement(obj_el, "newchild", attr="value") + +.. _`usual API`: tutorial.html#the-element-class + +New subelements will automatically inherit the objectify behaviour +from their tree. However, all independent elements that you create +through the ``Element()`` factory of lxml.etree (instead of objectify) +will not support the ``objectify`` API by themselves:: >>> subel = etree.SubElement(obj_el, "sub") >>> print isinstance(subel, objectify.ObjectifiedElement) @@ -128,28 +114,6 @@ >>> print isinstance(independent_el, objectify.ObjectifiedElement) False -The ``makeelement()`` method of the parser has the same signature as the -normal ``Element()`` factory known from lxml.etree and can therefore easily -replace the respective calls. 
- -For convenience, ``objectify`` also replicates the standard factory -``Element()`` and the ``fromstring()`` function from ``lxml.etree`` using a -parser that is local to the ``objectify`` module. So, after setting up the -parser based element lookup above, you can keep using the same API as in -``lxml.etree``, except that you have to import these functions from a -different module:: - - >>> obj_el = objectify.Element("new") - >>> print isinstance(obj_el, objectify.ObjectifiedElement) - True - - >>> obj_el = objectify.fromstring("") - >>> print isinstance(obj_el, objectify.ObjectifiedElement) - True - -You can change this parser with ``objectify.setDefaultParser(parser)``, which -also allows to add the above support for namespace specific element classes. - Element access through object attributes ---------------------------------------- @@ -1024,7 +988,7 @@ The registration of data classes uses the ``PyType`` class:: >>> class ChristmasDate(objectify.ObjectifiedDataElement): - ... def callSanta(self): + ... def call_santa(self): ... print "Ho ho ho!" >>> def checkChristmasDate(date_string): @@ -1056,12 +1020,12 @@ >>> root = objectify.fromstring( ... "24.12.200012.24.2000") - >>> root.a.callSanta() + >>> root.a.call_santa() Ho ho ho! - >>> root.b.callSanta() + >>> root.b.call_santa() Traceback (most recent call last): ... - AttributeError: no such child: callSanta + AttributeError: no such child: call_santa If you need to specify dependencies between the type check functions, you can pass a sequence of type names through the ``before`` and ``after`` keyword @@ -1081,24 +1045,71 @@ ... ''') >>> print root.a 12.24.2000 - >>> root.a.callSanta() + >>> root.a.call_santa() Ho ho ho! To unregister a type, call its ``unregister()`` method:: - >>> root.a.callSanta() + >>> root.a.call_santa() Ho ho ho! >>> xmas_type.unregister() - >>> root.a.callSanta() + >>> root.a.call_santa() Traceback (most recent call last): ... 
- AttributeError: no such child: callSanta + AttributeError: no such child: call_santa Be aware, though, that this does not immediately apply to elements to which there already is a Python reference. Their Python class will only be changed after all references are gone and the Python object is garbage collected. +Advanced element class lookup +----------------------------- + +In some cases, the normal data class setup is not enough. Being based +on ``lxml.etree``, however, ``lxml.objectify`` supports very +fine-grained control over the Element classes used in a tree. All you +have to do is configure a different `class lookup`_ mechanism (or +write one yourself). + +.. _`class lookup`: element-classes.html + +The first step for the setup is to create a new parser that builds +objectify documents. The objectify API is meant for data-centric XML +(as opposed to document XML with mixed content). Therefore, we +configure the parser to let it remove whitespace-only text from the +parsed document if it is not enclosed by an XML element. Note that +this alters the document infoset, so if you consider the removed +spaces as data in your specific use case, you should go with a normal +parser and just set the element class lookup. Most applications, +however, will work fine with the following setup:: + + >>> parser = objectify.makeparser(remove_blank_text=True) + +What this does internally, is:: + + >>> parser = etree.XMLParser(remove_blank_text=True) + + >>> lookup = objectify.ObjectifyElementClassLookup() + >>> parser.set_element_class_lookup(lookup) + +If you want to change the lookup scheme, say, to get additional +support for `namespace specific classes`_, you can register the +objectify lookup as a fallback of the namespace lookup. In this case, +however, you have to take care that the namespace classes inherit from +``objectify.ObjectifiedElement``, not only from the normal +``lxml.etree.ElementBase``, so that they support the ``objectify`` +API. 
The above setup code then becomes:: + + >>> lookup = etree.ElementNamespaceClassLookup( + ... objectify.ObjectifyElementClassLookup() ) + >>> parser.set_element_class_lookup(lookup) + +.. _`namespace specific classes`: element_classes.html#namespace-class-lookup + +See the documentation on `class lookup` schemes for more information. + + What is different from lxml.etree? ================================== From scoder at codespeak.net Mon Jan 28 09:01:33 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 28 Jan 2008 09:01:33 +0100 (CET) Subject: [Lxml-checkins] r51091 - lxml/tag/lxml-2.0beta2 Message-ID: <20080128080133.BC18416855B@codespeak.net> Author: scoder Date: Mon Jan 28 09:01:27 2008 New Revision: 51091 Added: lxml/tag/lxml-2.0beta2/ - copied from r51073, lxml/trunk/ Log: tag for lxml 2.0beta2 From scoder at codespeak.net Mon Jan 28 11:20:40 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 28 Jan 2008 11:20:40 +0100 (CET) Subject: [Lxml-checkins] r51094 - in lxml/trunk: . src/lxml Message-ID: <20080128102040.A421116855B@codespeak.net> Author: scoder Date: Mon Jan 28 11:20:38 2008 New Revision: 51094 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.objectify.pyx Log: r3348 at delle: sbehnel | 2008-01-28 10:39:23 +0100 let objectify.makeparser() create a 'remove_blank_text' XML parser by default Modified: lxml/trunk/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.objectify.pyx (original) +++ lxml/trunk/src/lxml/lxml.objectify.pyx Mon Jan 28 11:20:38 2008 @@ -1627,8 +1627,12 @@ """Create a new XML parser for objectify trees. You can pass all keyword arguments that are supported by - ``etree.XMLParser()``. + ``etree.XMLParser()``. Note that this parser defaults to removing + blank text. You can disable this by passing the + ``remove_blank_text`` boolean keyword option yourself. 
""" + if 'remove_blank_text' not in kw: + kw['remove_blank_text'] = True parser = etree.XMLParser(**kw) parser.set_element_class_lookup( ObjectifyElementClassLookup() ) return parser From scoder at codespeak.net Mon Jan 28 11:20:46 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 28 Jan 2008 11:20:46 +0100 (CET) Subject: [Lxml-checkins] r51095 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20080128102046.69DA016855E@codespeak.net> Author: scoder Date: Mon Jan 28 11:20:45 2008 New Revision: 51095 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/relaxng.pxi lxml/trunk/src/lxml/tests/test_relaxng.py lxml/trunk/src/lxml/tests/test_xmlschema.py lxml/trunk/src/lxml/xmlschema.pxi Log: r3349 at delle: sbehnel | 2008-01-28 11:05:44 +0100 support parsing from StringIO in XMLSchema/RelaxNG Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Mon Jan 28 11:20:45 2008 @@ -8,6 +8,8 @@ Features added -------------- +* ``XMLSchema()`` and ``RelaxNG()`` can parse from StringIO. + * ``makeparser()`` function in ``lxml.objectify`` to create a new parser with the usual objectify setup. 
Modified: lxml/trunk/src/lxml/relaxng.pxi ============================================================================== --- lxml/trunk/src/lxml/relaxng.pxi (original) +++ lxml/trunk/src/lxml/relaxng.pxi Mon Jan 28 11:20:45 2008 @@ -51,13 +51,14 @@ fake_c_doc = _fakeRootDoc(doc._c_doc, root_node._c_node) parser_ctxt = relaxng.xmlRelaxNGNewDocParserCtxt(fake_c_doc) elif file is not None: - filename = _getFilenameForFile(file) - if filename is None: - # XXX assume a string object - filename = file - filename = _encodeFilename(filename) - self._error_log.connect() - parser_ctxt = relaxng.xmlRelaxNGNewParserCtxt(_cstr(filename)) + if _isString(file): + filename = _encodeFilename(file) + self._error_log.connect() + parser_ctxt = relaxng.xmlRelaxNGNewParserCtxt(_cstr(filename)) + else: + doc = _parseDocument(file, None) + self._error_log.connect() + parser_ctxt = relaxng.xmlRelaxNGNewDocParserCtxt(doc._c_doc) else: raise RelaxNGParseError("No tree or file given") Modified: lxml/trunk/src/lxml/tests/test_relaxng.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_relaxng.py (original) +++ lxml/trunk/src/lxml/tests/test_relaxng.py Mon Jan 28 11:20:45 2008 @@ -6,7 +6,7 @@ import unittest -from common_imports import etree, doctest, HelperTestCase, fileInTestDir +from common_imports import etree, doctest, StringIO, HelperTestCase, fileInTestDir class ETreeRelaxNGTestCase(HelperTestCase): def test_relaxng(self): @@ -25,6 +25,22 @@ self.assert_(schema.validate(tree_valid)) self.assert_(not schema.validate(tree_invalid)) + def test_relaxng_stringio(self): + tree_valid = self.parse('') + tree_invalid = self.parse('') + schema_file = StringIO('''\ + + + + + + + +''') + schema = etree.RelaxNG(file=schema_file) + self.assert_(schema.validate(tree_valid)) + self.assert_(not schema.validate(tree_invalid)) + def test_relaxng_elementtree_error(self): self.assertRaises(ValueError, etree.RelaxNG, etree.ElementTree()) 
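The file handling that r51095 introduces in ``relaxng.pxi`` (and, identically, in ``xmlschema.pxi``) comes down to one type check. A string is passed on to libxml2 as a filename or URL, while any other object is parsed into a document first, and that is what makes ``StringIO`` input work. The following is a minimal pure-Python sketch of that dispatch only, not lxml's code: ``schema_source()`` is a hypothetical helper, ``isinstance(file, (str, bytes))`` is a Python 3 approximation of lxml's internal ``_isString()``, and the standard library's ``xml.etree.ElementTree`` stands in for lxml's ``_parseDocument()``.

```python
import io
import xml.etree.ElementTree as ET


def schema_source(file):
    """Hypothetical sketch of the string-vs-file dispatch in r51095.

    Returns a (kind, value) pair instead of creating a libxml2 parser context.
    """
    if isinstance(file, (str, bytes)):
        # A string is handed over unparsed, like the filename that the diff
        # passes to xmlRelaxNGNewParserCtxt().
        return ("filename", file)
    # Anything else is treated as a file-like object and parsed first, like
    # _parseDocument(file, None) followed by xmlRelaxNGNewDocParserCtxt().
    return ("document", ET.parse(file).getroot())


rng = io.StringIO(
    '<element name="a" xmlns="http://relaxng.org/ns/structure/1.0">'
    "<empty/></element>"
)
print(schema_source("schema.rng"))
kind, root = schema_source(rng)
print(kind, root.tag)
```

Testing for a string first mirrors the new code in the diff. The removed code instead asked for a filename and, when none could be determined, simply assumed it had been given a string (the ``# XXX assume a string object`` comment), which is why a ``StringIO`` object could not be used before.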
Modified: lxml/trunk/src/lxml/tests/test_xmlschema.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_xmlschema.py (original) +++ lxml/trunk/src/lxml/tests/test_xmlschema.py Mon Jan 28 11:20:45 2008 @@ -6,7 +6,7 @@ import unittest -from common_imports import etree, doctest, HelperTestCase, fileInTestDir +from common_imports import etree, doctest, StringIO, HelperTestCase, fileInTestDir class ETreeXMLSchemaTestCase(HelperTestCase): def test_xmlschema(self): @@ -46,6 +46,26 @@ self.assertRaises(etree.XMLSyntaxError, self.parse, '', parser=parser) + def test_xmlschema_stringio(self): + schema_file = StringIO(''' + + + + + + + + +''') + schema = etree.XMLSchema(file=schema_file) + parser = etree.XMLParser(schema=schema) + + tree_valid = self.parse('', parser=parser) + self.assertEquals('a', tree_valid.getroot().tag) + + self.assertRaises(etree.XMLSyntaxError, + self.parse, '', parser=parser) + def test_xmlschema_elementtree_error(self): self.assertRaises(ValueError, etree.XMLSchema, etree.ElementTree()) Modified: lxml/trunk/src/lxml/xmlschema.pxi ============================================================================== --- lxml/trunk/src/lxml/xmlschema.pxi (original) +++ lxml/trunk/src/lxml/xmlschema.pxi Mon Jan 28 11:20:45 2008 @@ -52,13 +52,14 @@ self._error_log.connect() parser_ctxt = xmlschema.xmlSchemaNewDocParserCtxt(fake_c_doc) elif file is not None: - filename = _getFilenameForFile(file) - if filename is None: - # XXX assume a string object - filename = file - filename = _encodeFilename(filename) - self._error_log.connect() - parser_ctxt = xmlschema.xmlSchemaNewParserCtxt(_cstr(filename)) + if _isString(file): + filename = _encodeFilename(file) + self._error_log.connect() + parser_ctxt = xmlschema.xmlSchemaNewParserCtxt(_cstr(filename)) + else: + doc = _parseDocument(file, None) + self._error_log.connect() + parser_ctxt = xmlschema.xmlSchemaNewDocParserCtxt(doc._c_doc) else: raise 
XMLSchemaParseError("No tree or file given") From scoder at codespeak.net Mon Jan 28 11:20:49 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 28 Jan 2008 11:20:49 +0100 (CET) Subject: [Lxml-checkins] r51096 - in lxml/trunk: . doc Message-ID: <20080128102049.93042168560@codespeak.net> Author: scoder Date: Mon Jan 28 11:20:49 2008 New Revision: 51096 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/objectify.txt Log: r3350 at delle: sbehnel | 2008-01-28 11:20:10 +0100 objectify doc section on XML schema validation Modified: lxml/trunk/doc/objectify.txt ============================================================================== --- lxml/trunk/doc/objectify.txt (original) +++ lxml/trunk/doc/objectify.txt Mon Jan 28 11:20:49 2008 @@ -48,17 +48,18 @@ 1.2 Element access through object attributes 1.3 Tree generation with the E-factory 1.4 Namespace handling - 2 ObjectPath - 3 Python data types - 3.1 Recursive tree dump - 3.2 Recursive string representation of elements - 4 How data types are matched - 4.1 Type annotations - 4.2 XML Schema datatype annotation - 4.3 The DataElement factory - 4.4 Defining additional data classes - 4.5 Advanced element class lookup - 5 What is different from lxml.etree? + 2 Asserting a Schema + 3 ObjectPath + 4 Python data types + 4.1 Recursive tree dump + 4.2 Recursive string representation of elements + 5 How data types are matched + 5.1 Type annotations + 5.2 XML Schema datatype annotation + 5.3 The DataElement factory + 5.4 Defining additional data classes + 5.5 Advanced element class lookup + 6 What is different from lxml.etree? 
The lxml.objectify API @@ -83,8 +84,8 @@ >>> print isinstance(tree.getroot(), objectify.ObjectifiedElement) True - >>> tree = objectify.fromstring('') - >>> print isinstance(tree.getroot(), objectify.ObjectifiedElement) + >>> root = objectify.fromstring('') + >>> print isinstance(root, objectify.ObjectifiedElement) True To build a new tree in memory, ``objectify`` replicates the standard @@ -358,6 +359,61 @@ TEXT +Asserting a Schema +================== + +When dealing with XML documents from different sources, it can often +be interesting to assure that they follow a common schema. See the +`documentation on validation`_ on this topic. + +In lxml.objectify, this directly translates to enforcing a specific +object tree, i.e. expected object attributes are ensured to be there +and to have the expected type. This can easily be achieved through +XML Schema validation at parse time. + +.. _`documentation on validation`: validation.html + +First of all, we need a parser that knows our schema, so let's say we +parse the schema from a file (or filename or file-like object):: + + >>> from StringIO import StringIO + >>> f = StringIO('''\ + ... + ... + ... + ... + ... + ... + ... + ... + ... ''') + >>> schema = etree.XMLSchema(file=f) + +When creating the validating parser, we must make sure it returns +objectify trees. This is best done with the ``makeparser()`` +function:: + + >>> parser = objectify.makeparser(schema = schema) + +Now we can use it to parse a valid document:: + + >>> xml = "test" + >>> a = objectify.fromstring(xml, parser) + + >>> print a.b + test + +Or an invalid document:: + + >>> xml = "test" + >>> a = objectify.fromstring(xml, parser) + Traceback (most recent call last): + XMLSyntaxError: Element 'c': This element is not expected. + +Note that the same works for parse-time DTD validation, except that it +does not support any data types by design. 
+ + ObjectPath ========== From scoder at codespeak.net Mon Jan 28 11:29:43 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 28 Jan 2008 11:29:43 +0100 (CET) Subject: [Lxml-checkins] r51097 - in lxml/trunk: . doc Message-ID: <20080128102943.1DD0A168565@codespeak.net> Author: scoder Date: Mon Jan 28 11:29:42 2008 New Revision: 51097 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/objectify.txt Log: r3355 at delle: sbehnel | 2008-01-28 11:29:13 +0100 doc cleanup Modified: lxml/trunk/doc/objectify.txt ============================================================================== --- lxml/trunk/doc/objectify.txt (original) +++ lxml/trunk/doc/objectify.txt Mon Jan 28 11:29:42 2008 @@ -363,13 +363,12 @@ ================== When dealing with XML documents from different sources, it can often -be interesting to assure that they follow a common schema. See the -`documentation on validation`_ on this topic. - -In lxml.objectify, this directly translates to enforcing a specific +be interesting to assure that they follow a common schema. In +lxml.objectify, this directly translates to enforcing a specific object tree, i.e. expected object attributes are ensured to be there and to have the expected type. This can easily be achieved through -XML Schema validation at parse time. +XML Schema validation at parse time. Also see the `documentation on +validation`_ on this topic. .. _`documentation on validation`: validation.html
From scoder at codespeak.net Mon Jan 28 15:44:38 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 28 Jan 2008 15:44:38 +0100 (CET) Subject: [Lxml-checkins] r51100 - in lxml/trunk: . doc Message-ID: <20080128144438.CEA45168408@codespeak.net> Author: scoder Date: Mon Jan 28 15:44:37 2008 New Revision: 51100 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/objectify.txt lxml/trunk/doc/validation.txt Log: r3357 at delle: sbehnel | 2008-01-28 15:44:16 +0100 doc cleanup Modified: lxml/trunk/doc/objectify.txt ============================================================================== --- lxml/trunk/doc/objectify.txt (original) +++ lxml/trunk/doc/objectify.txt Mon Jan 28 15:44:37 2008 @@ -362,18 +362,18 @@ Asserting a Schema ================== -When dealing with XML documents from different sources, it can often -be interesting to assure that they follow a common schema. In -lxml.objectify, this directly translates to enforcing a specific -object tree, i.e. expected object attributes are ensured to be there -and to have the expected type. This can easily be achieved through -XML Schema validation at parse time. Also see the `documentation on -validation`_ on this topic. +When dealing with XML documents from different sources, you will often +require them to follow a common schema. In lxml.objectify, this +directly translates to enforcing a specific object tree, i.e. expected +object attributes are ensured to be there and to have the expected +type. This can easily be achieved through XML Schema validation at +parse time. Also see the `documentation on validation`_ on this +topic. ..
_`documentation on validation`: validation.html First of all, we need a parser that knows our schema, so let's say we -parse the schema from a file (or filename or file-like object):: +parse the schema from a file-like object (or file or filename):: >>> from StringIO import StringIO >>> f = StringIO('''\ @@ -388,12 +388,14 @@ ... ''') >>> schema = etree.XMLSchema(file=f) -When creating the validating parser, we must make sure it returns -objectify trees. This is best done with the ``makeparser()`` +When creating the validating parser, we must make sure it `returns +objectify trees`_. This is best done with the ``makeparser()`` function:: >>> parser = objectify.makeparser(schema = schema) +.. _`returns objectify trees`: #advance-element-class-lookup + Now we can use it to parse a valid document:: >>> xml = "test" @@ -409,8 +411,8 @@ Traceback (most recent call last): XMLSyntaxError: Element 'c': This element is not expected. -Note that the same works for parse-time DTD validation, except that it -does not support any data types by design. +Note that the same works for parse-time DTD validation, except that +DTDs do not support any data types by design. ObjectPath Modified: lxml/trunk/doc/validation.txt ============================================================================== --- lxml/trunk/doc/validation.txt (original) +++ lxml/trunk/doc/validation.txt Mon Jan 28 15:44:37 2008 @@ -75,6 +75,10 @@ Traceback (most recent call last): XMLSyntaxError: Element 'a': 'not int' is not a valid value of the atomic type 'xs:integer'. +If you want the parser to succeed regardless of the outcome of the +validation, you should use a non validating parser and run the +validation separately after parsing the document. + DTD --- From scoder at codespeak.net Mon Jan 28 18:47:20 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 28 Jan 2008 18:47:20 +0100 (CET) Subject: [Lxml-checkins] r51104 - in lxml/trunk: . 
doc Message-ID: <20080128174720.9B2E31683E3@codespeak.net> Author: scoder Date: Mon Jan 28 18:47:20 2008 New Revision: 51104 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/performance.txt Log: r3359 at delle: sbehnel | 2008-01-28 18:46:58 +0100 link to 'happy user' message from benchmark page Modified: lxml/trunk/doc/performance.txt ============================================================================== --- lxml/trunk/doc/performance.txt (original) +++ lxml/trunk/doc/performance.txt Mon Jan 28 18:47:20 2008 @@ -10,10 +10,13 @@ :keywords: lxml performance, lxml.etree, lxml.objectify, benchmarks, ElementTree -As an XML library, lxml.etree is very fast. It is also slow. As with all -software, it depends on what you do with it. Rest assured that lxml is fast -enough for most applications, so lxml is probably somewhere between 'fast -enough' and 'the best choice' for yours. +As an XML library, lxml.etree is very fast. It is also slow. As with +all software, it depends on what you do with it. Rest assured that +lxml is fast enough for most applications, so lxml is probably +somewhere between 'fast enough' and 'the best choice' for yours. Read +this `message from a happy user`_ to see what we mean. + +.. _`message from a happy user`: http://permalink.gmane.org/gmane.comp.python.lxml.devel/3244 This text describes where lxml.etree (abbreviated to 'lxe') excels, gives hints on some performance traps and compares the overall performance to the From scoder at codespeak.net Tue Jan 29 08:14:05 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 29 Jan 2008 08:14:05 +0100 (CET) Subject: [Lxml-checkins] r51107 - in lxml/trunk: . 
src/lxml Message-ID: <20080129071405.4B74F16841D@codespeak.net> Author: scoder Date: Tue Jan 29 08:14:03 2008 New Revision: 51107 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx Log: r3361 at delle: sbehnel | 2008-01-29 08:12:56 +0100 docstring update Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Tue Jan 29 08:14:03 2008 @@ -661,9 +661,8 @@ _appendChild(self, element) def clear(self): - """Resets an element. This function removes all subelements, - clears all attributes and sets the text and tail - attributes to None. + """Resets an element. This function removes all subelements, clears + all attributes and sets the text and tail properties to None. """ cdef xmlAttr* c_attr cdef xmlAttr* c_attr_next @@ -1666,6 +1665,10 @@ cdef class _Attrib: + """A proxy for the ``Element.attrib`` property. + + Behaves as a normal Python dict. + """ cdef _Element _element def __init__(self, _Element element not None): self._element = element From scoder at codespeak.net Tue Jan 29 08:14:08 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 29 Jan 2008 08:14:08 +0100 (CET) Subject: [Lxml-checkins] r51108 - in lxml/trunk: . doc Message-ID: <20080129071408.92EC716841E@codespeak.net> Author: scoder Date: Tue Jan 29 08:14:07 2008 New Revision: 51108 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/performance.txt Log: r3362 at delle: sbehnel | 2008-01-29 08:13:37 +0100 doc update Modified: lxml/trunk/doc/performance.txt ============================================================================== --- lxml/trunk/doc/performance.txt (original) +++ lxml/trunk/doc/performance.txt Tue Jan 29 08:14:07 2008 @@ -14,9 +14,9 @@ all software, it depends on what you do with it. 
Rest assured that lxml is fast enough for most applications, so lxml is probably somewhere between 'fast enough' and 'the best choice' for yours. Read -this `message from a happy user`_ to see what we mean. +these `messages from happy users`_ to see what we mean. -.. _`message from a happy user`: http://permalink.gmane.org/gmane.comp.python.lxml.devel/3244 +.. _`messages from happy users`: http://thread.gmane.org/gmane.comp.python.lxml.devel/3244/focus=3244 This text describes where lxml.etree (abbreviated to 'lxe') excels, gives hints on some performance traps and compares the overall performance to the
From scoder at codespeak.net Wed Jan 30 16:49:07 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 30 Jan 2008 16:49:07 +0100 (CET) Subject: [Lxml-checkins] r51134 - in lxml/trunk: . doc Message-ID: <20080130154907.110A416841D@codespeak.net> Author: scoder Date: Wed Jan 30 16:49:06 2008 New Revision: 51134 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt lxml/trunk/doc/performance.txt Log: r3365 at delle: sbehnel | 2008-01-30 16:48:39 +0100 doc references to user experience reports Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Wed Jan 30 16:49:06 2008 @@ -18,9 +18,10 @@ 1.1 Is there a tutorial? 1.2 Where can I find more documentation about lxml? 1.3 What standards does lxml implement? - 1.4 What is the difference between lxml.etree and lxml.objectify?
- 1.5 How can I make my application run faster? - 1.6 What about that trailing text on serialised Elements? + 1.4 Who uses lxml? + 1.5 What is the difference between lxml.etree and lxml.objectify? + 1.6 How can I make my application run faster? + 1.7 What about that trailing text on serialised Elements? 2 Installation 2.1 Which version of libxml2 and libxslt should I use or require? 2.2 Where are the Windows binaries? @@ -103,6 +104,44 @@ libxml2 also supports loading documents through HTTP and FTP. +Who uses lxml? +-------------- + +As an XML library, lxml is often used under the hood of in-house +server applications, such as web servers or applications that +facilitate some kind of document management. Therefore, it is hard to +get an idea of who uses it, and the following list of 'users and +projects we know of' is definitely not a complete list of lxml's +users. + +Also note that the compatibility to the ElementTree library does not +require projects to set a hard dependency on lxml - as long as they do +not need lxml's enhanced feature set. + +* Deliverance_, a content theming tool +* gocept.lxml_, Zope3 interface bindings for lxml +* Inteproxy_, a secure HTTP proxy +* lwebstring_, an XML template engine +* OpenXMLlib_, a library for handling OpenXML document meta data +* Pycoon_, a WSGI web development framework based on XML pipelines +* rfadict_, an RDFa parser wth a simple dictionary-like interface. + +And a couple of generally happy_ users_, and other `sites that link to +lxml`_. + +.. _Deliverance: http://www.openplans.org/projects/deliverance/project-home +.. _gocept.lxml: http://pypi.python.org/pypi/gocept.lxml +.. _Inteproxy: http://lists.wald.intevation.org/pipermail/inteproxy-devel/2007-February/000000.html +.. _lwebstring: http://pypi.python.org/pypi/lwebstring +.. _OpenXMLlib: http://permalink.gmane.org/gmane.comp.python.lxml.devel/3250 +.. _Pycoon: http://pypi.python.org/pypi/pycoon +.. _rfadict: http://pypi.python.org/pypi/rdfadict + +.. 
_happy: http://thread.gmane.org/gmane.comp.python.lxml.devel/3244/focus=3244 +.. _users: http://article.gmane.org/gmane.comp.python.lxml.devel/3246 +.. _`sites that link to lxml`: http://www.google.com/search?as_lq=http%3A%2F%2Fcodespeak.net%2Flxml + + What is the difference between lxml.etree and lxml.objectify? ------------------------------------------------------------- Modified: lxml/trunk/doc/performance.txt ============================================================================== --- lxml/trunk/doc/performance.txt (original) +++ lxml/trunk/doc/performance.txt Wed Jan 30 16:49:06 2008 @@ -14,9 +14,11 @@ all software, it depends on what you do with it. Rest assured that lxml is fast enough for most applications, so lxml is probably somewhere between 'fast enough' and 'the best choice' for yours. Read -these `messages from happy users`_ to see what we mean. +some messages_ from happy_ users_ to see what we mean. -.. _`messages from happy users`: http://thread.gmane.org/gmane.comp.python.lxml.devel/3244/focus=3244 +.. _messages: http://permalink.gmane.org/gmane.comp.python.lxml.devel/3250 +.. _happy: http://article.gmane.org/gmane.comp.python.lxml.devel/3246 +.. _users: http://thread.gmane.org/gmane.comp.python.lxml.devel/3244/focus=3244 This text describes where lxml.etree (abbreviated to 'lxe') excels, gives hints on some performance traps and compares the overall performance to the From scoder at codespeak.net Thu Jan 31 15:31:22 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 31 Jan 2008 15:31:22 +0100 (CET) Subject: [Lxml-checkins] r51154 - in lxml/trunk: . 
src/lxml Message-ID: <20080131143122.BC62C1684C2@codespeak.net> Author: scoder Date: Thu Jan 31 15:31:22 2008 New Revision: 51154 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx Log: r3367 at delle: sbehnel | 2008-01-31 14:57:24 +0100 added default prefix for XSLT namespace Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Thu Jan 31 15:31:22 2008 @@ -74,6 +74,7 @@ cdef object _DEFAULT_NAMESPACE_PREFIXES _DEFAULT_NAMESPACE_PREFIXES = { "http://www.w3.org/1999/xhtml": "html", + "http://www.w3.org/1999/XSL/Transform": "xsl", "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf", "http://schemas.xmlsoap.org/wsdl/": "wsdl", # xml schema From scoder at codespeak.net Thu Jan 31 15:31:27 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 31 Jan 2008 15:31:27 +0100 (CET) Subject: [Lxml-checkins] r51155 - in lxml/trunk: . src/lxml Message-ID: <20080131143127.965A61684C6@codespeak.net> Author: scoder Date: Thu Jan 31 15:31:27 2008 New Revision: 51155 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx Log: r3368 at delle: sbehnel | 2008-01-31 15:00:31 +0100 signature cleanup Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Thu Jan 31 15:31:27 2008 @@ -925,7 +925,7 @@ def __reversed__(self): return ElementChildIterator(self, reversed=True) - def index(self, _Element x not None, start=None, stop=None): + def index(self, _Element child not None, start=None, stop=None): """Find the position of the child within the parent. This method is not part of the original ElementTree API. 
@@ -934,7 +934,7 @@ cdef Py_ssize_t c_start, c_stop cdef xmlNode* c_child cdef xmlNode* c_start_node - c_child = x._c_node + c_child = child._c_node if c_child.parent is not self._c_node: raise ValueError("Element is not a child of this node.") From scoder at codespeak.net Thu Jan 31 15:31:32 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 31 Jan 2008 15:31:32 +0100 (CET) Subject: [Lxml-checkins] r51156 - in lxml/trunk: . doc Message-ID: <20080131143132.4D268168451@codespeak.net> Author: scoder Date: Thu Jan 31 15:31:31 2008 New Revision: 51156 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt Log: r3369 at delle: sbehnel | 2008-01-31 15:30:50 +0100 link to cssutils Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Thu Jan 31 15:31:31 2008 @@ -118,17 +118,19 @@ require projects to set a hard dependency on lxml - as long as they do not need lxml's enhanced feature set. +* cssutils_, a CSS parser and toolkit, can be used with ``lxml.cssselect`` * Deliverance_, a content theming tool * gocept.lxml_, Zope3 interface bindings for lxml * Inteproxy_, a secure HTTP proxy * lwebstring_, an XML template engine * OpenXMLlib_, a library for handling OpenXML document meta data * Pycoon_, a WSGI web development framework based on XML pipelines -* rfadict_, an RDFa parser wth a simple dictionary-like interface. +* rfadict_, an RDFa parser with a simple dictionary-like interface. And a couple of generally happy_ users_, and other `sites that link to lxml`_. +.. _cssutils: http://code.google.com/p/cssutils/source/browse/trunk/examples/style.py?r=917 .. _Deliverance: http://www.openplans.org/projects/deliverance/project-home .. _gocept.lxml: http://pypi.python.org/pypi/gocept.lxml .. _Inteproxy: http://lists.wald.intevation.org/pipermail/inteproxy-devel/2007-February/000000.html