From scoder at codespeak.net Thu Apr 1 22:28:41 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 1 Apr 2010 22:28:41 +0200 (CEST) Subject: [Lxml-checkins] r73254 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20100401202841.B98EF282BD8@codespeak.net> Author: scoder Date: Thu Apr 1 22:28:37 2010 New Revision: 73254 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/proxy.pxi lxml/trunk/src/lxml/serializer.pxi lxml/trunk/src/lxml/tests/test_etree.py Log: r5546 at lenny: sbehnel | 2010-04-01 22:28:22 +0200 support C14N serialisation through 'c14n' serialisation method, also in tostring() Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Thu Apr 1 22:28:37 2010 @@ -8,6 +8,12 @@ Features added -------------- +* Support 'unicode' string name as encoding parameter in + ``tostring()``, following ElementTree 1.3. + +* Support 'c14n' serialisation method in ``ElementTree.write()`` and + ``tostring()``, following ElementTree 1.3. + * During regular XPath evaluation, various ESXLT functions are available within their namespace when using libxslt 1.1.26 or later. 
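The two changelog entries above can be tried from Python roughly as follows. This is only a sketch, assuming an installed lxml release that contains this revision; the sample document and the variable names are made up for illustration and are not part of the commit.

```python
from lxml import etree

# Hypothetical sample document, not taken from the commit.
root = etree.fromstring('<a><!-- note --><b x="1"/></a>')

# method='c14n' returns a byte string in canonical form; empty
# elements are written as explicit start/end tag pairs.
c14n_default = etree.tostring(root, method='c14n')

# with_comments=False requests uncommented C14N output.
c14n_plain = etree.tostring(root, method='c14n', with_comments=False)

# The string name 'unicode' as encoding yields an unencoded text
# string instead of bytes, without an XML declaration.
text = etree.tostring(root, encoding='unicode')

print(c14n_default)
print(c14n_plain)
print(text)
```

According to the diff that follows, combining ``method='c14n'`` with an explicit ``encoding`` or with ``xml_declaration=True`` raises ``ValueError``, so the sketch leaves both at their defaults.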
Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Thu Apr 1 22:28:37 2010 @@ -1708,17 +1708,23 @@ def write(self, file, *, encoding=None, method=u"xml", pretty_print=False, xml_declaration=None, with_tail=True, - standalone=None, docstring=None, compression=0): + standalone=None, docstring=None, compression=0, + exclusive=False, with_comments=True): u"""write(self, file, encoding=None, method="xml", pretty_print=False, xml_declaration=None, with_tail=True, - standalone=None, compression=0) + standalone=None, compression=0, + exclusive=False, with_comments=True) Write the tree to a filename, file or file-like object. Defaults to ASCII encoding and writing a declaration as needed. - The keyword argument 'method' selects the output method: 'xml' or - 'html'. + The keyword argument 'method' selects the output method: + 'xml', 'html', 'text' or 'c14n'. Default is 'xml'. + + The ``exclusive`` and ``with_comments`` arguments are only + used with C14N output, where they request exclusive and + uncommented C14N serialisation respectively. Passing a boolean value to the ``standalone`` option will output an XML declaration with the corresponding @@ -1729,6 +1735,19 @@ cdef bint write_declaration cdef int is_standalone self._assertHasRoot() + if compression is None or compression < 0: + compression = 0 + # C14N serialisation + if method == 'c14n': + if encoding is not None: + raise ValueError("Cannot specify encoding with C14N") + if xml_declaration: + raise ValueError("Cannot enable XML declaration in C14N") + _tofilelikeC14N(file, self._context_node, exclusive, with_comments, + compression) + return + if not with_comments: + raise ValueError("Can only discard comments in C14N serialisation") # suppress decl. 
in default case (purely for ElementTree compatibility) if xml_declaration is not None: write_declaration = xml_declaration @@ -1749,8 +1768,6 @@ else: write_declaration = 1 is_standalone = 0 - if compression is None or compression < 0: - compression = 0 _tofilelike(file, self._context_node, encoding, docstring, method, write_declaration, 1, pretty_print, with_tail, is_standalone, compression) @@ -2659,26 +2676,36 @@ def tostring(element_or_tree, *, encoding=None, method=u"xml", xml_declaration=None, pretty_print=False, with_tail=True, - standalone=None, doctype=None): + standalone=None, doctype=None, + exclusive=False, with_comments=True): u"""tostring(element_or_tree, encoding=None, method="xml", xml_declaration=None, pretty_print=False, with_tail=True, - standalone=None, doctype=None) + standalone=None, doctype=None, + exclusive=False, with_comments=True) Serialize an element to an encoded string representation of its XML tree. - Defaults to ASCII encoding without XML declaration. This behaviour can be - configured with the keyword arguments 'encoding' (string) and - 'xml_declaration' (bool). Note that changing the encoding to a non UTF-8 - compatible encoding will enable a declaration by default. + Defaults to ASCII encoding without XML declaration. This + behaviour can be configured with the keyword arguments 'encoding' + (string) and 'xml_declaration' (bool). Note that changing the + encoding to a non UTF-8 compatible encoding will enable a + declaration by default. You can also serialise to a Unicode string without declaration by - passing the ``unicode`` function as encoding (or ``str`` in Py3). + passing the ``unicode`` function as encoding (or ``str`` in Py3), + or the name 'unicode'. This changes the return value from a byte + string to an unencoded unicode string. The keyword argument 'pretty_print' (bool) enables formatted XML. The keyword argument 'method' selects the output method: 'xml', - 'html' or plain 'text'. 
+ 'html', plain 'text' (text content without tags) or 'c14n'. + Default is 'xml'. + + The ``exclusive`` and ``with_comments`` arguments are only used + with C14N output, where they request exclusive and uncommented + C14N serialisation respectively. Passing a boolean value to the ``standalone`` option will output an XML declaration with the corresponding ``standalone`` flag. @@ -2693,11 +2720,21 @@ """ cdef bint write_declaration cdef int is_standalone - if encoding is _unicode: + # C14N serialisation + if method == 'c14n': + if encoding is not None: + raise ValueError("Cannot specify encoding with C14N") + if xml_declaration: + raise ValueError("Cannot enable XML declaration in C14N") + return _tostringC14N(element_or_tree, exclusive, with_comments) + if not with_comments: + raise ValueError("Can only discard comments in C14N serialisation") + if encoding is _unicode or (encoding is not None and encoding.upper() == 'UNICODE'): if xml_declaration: raise ValueError, \ u"Serialisation to unicode must not request an XML declaration" write_declaration = 0 + encoding = _unicode elif xml_declaration is None: # by default, write an XML declaration only for non-standard encodings write_declaration = encoding is not None and encoding.upper() not in \ Modified: lxml/trunk/src/lxml/proxy.pxi ============================================================================== --- lxml/trunk/src/lxml/proxy.pxi (original) +++ lxml/trunk/src/lxml/proxy.pxi Thu Apr 1 22:28:37 2010 @@ -61,6 +61,10 @@ # temporarily make a node the root node of its document cdef xmlDoc* _fakeRootDoc(xmlDoc* c_base_doc, xmlNode* c_node) except NULL: + return _plainFakeRootDoc(c_base_doc, c_node, 1) + +cdef xmlDoc* _plainFakeRootDoc(xmlDoc* c_base_doc, xmlNode* c_node, + bint with_siblings) except NULL: # build a temporary document that has the given node as root node # note that copy and original must not be modified during its lifetime!! # always call _destroyFakeDoc() after use! 
@@ -68,10 +72,11 @@ cdef xmlNode* c_root cdef xmlNode* c_new_root cdef xmlDoc* c_doc - c_root = tree.xmlDocGetRootElement(c_base_doc) - if c_root is c_node: - # already the root node - return c_base_doc + if with_siblings or (c_node.prev is NULL and c_node.next is NULL): + c_root = tree.xmlDocGetRootElement(c_base_doc) + if c_root is c_node: + # already the root node, no siblings + return c_base_doc c_doc = _copyDoc(c_base_doc, 0) # non recursive! c_new_root = tree.xmlDocCopyNode(c_node, c_doc, 2) # non recursive! Modified: lxml/trunk/src/lxml/serializer.pxi ============================================================================== --- lxml/trunk/src/lxml/serializer.pxi (original) +++ lxml/trunk/src/lxml/serializer.pxi Thu Apr 1 22:28:37 2010 @@ -142,6 +142,37 @@ _raiseSerialisationError(error_result) return result +cdef bytes _tostringC14N(element_or_tree, bint exclusive, bint with_comments): + cdef xmlDoc* c_doc + cdef char* c_buffer = NULL + cdef int byte_count = -1 + cdef bytes result + cdef _Document doc + cdef _Element element + + if isinstance(element_or_tree, _Element): + doc = (<_Element>element_or_tree)._doc + c_doc = _plainFakeRootDoc(doc._c_doc, (<_Element>element_or_tree)._c_node, 0) + else: + doc = _documentOrRaise(element_or_tree) + c_doc = doc._c_doc + + with nogil: + byte_count = c14n.xmlC14NDocDumpMemory( + c_doc, NULL, exclusive, NULL, with_comments, &c_buffer) + + _destroyFakeDoc(doc._c_doc, c_doc) + + if byte_count < 0 or c_buffer is NULL: + if c_buffer is not NULL: + tree.xmlFree(c_buffer) + raise C14NError, u"C14N failed" + try: + result = c_buffer[:byte_count] + finally: + tree.xmlFree(c_buffer) + return result + cdef _raiseSerialisationError(int error_result): if error_result == xmlerror.XML_ERR_NO_MEMORY: return python.PyErr_NoMemory() Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ 
lxml/trunk/src/lxml/tests/test_etree.py Thu Apr 1 22:28:37 2010 @@ -3029,6 +3029,30 @@ self.assertEquals(_bytes(''), s) + def test_c14n_tostring_with_comments(self): + tree = self.parse(_bytes('')) + s = etree.tostring(tree, method='c14n') + self.assertEquals(_bytes('\n\n'), + s) + s = etree.tostring(tree, method='c14n', with_comments=True) + self.assertEquals(_bytes('\n\n'), + s) + s = etree.tostring(tree, method='c14n', with_comments=False) + self.assertEquals(_bytes(''), + s) + + def test_c14n_element_tostring_with_comments(self): + tree = self.parse(_bytes('')) + s = etree.tostring(tree.getroot(), method='c14n') + self.assertEquals(_bytes(''), + s) + s = etree.tostring(tree.getroot(), method='c14n', with_comments=True) + self.assertEquals(_bytes(''), + s) + s = etree.tostring(tree.getroot(), method='c14n', with_comments=False) + self.assertEquals(_bytes(''), + s) + def test_c14n_exclusive(self): tree = self.parse(_bytes( '')) @@ -3048,6 +3072,39 @@ self.assertEquals(_bytes(''), s) + def test_c14n_tostring_exclusive(self): + tree = self.parse(_bytes( + '')) + s = etree.tostring(tree, method='c14n') + self.assertEquals(_bytes(''), + s) + s = etree.tostring(tree, method='c14n', exclusive=False) + self.assertEquals(_bytes(''), + s) + s = etree.tostring(tree, method='c14n', exclusive=True) + self.assertEquals(_bytes(''), + s) + + def test_c14n_element_tostring_exclusive(self): + tree = self.parse(_bytes( + '')) + s = etree.tostring(tree.getroot(), method='c14n') + self.assertEquals(_bytes(''), + s) + s = etree.tostring(tree.getroot(), method='c14n', exclusive=False) + self.assertEquals(_bytes(''), + s) + s = etree.tostring(tree.getroot(), method='c14n', exclusive=True) + self.assertEquals(_bytes(''), + s) + + s = etree.tostring(tree.getroot()[0], method='c14n', exclusive=False) + self.assertEquals(_bytes(''), + s) + s = etree.tostring(tree.getroot()[0], method='c14n', exclusive=True) + self.assertEquals(_bytes(''), + s) + class ETreeWriteTestCase(HelperTestCase): 
def test_write(self): From scoder at codespeak.net Thu Apr 1 22:30:03 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 1 Apr 2010 22:30:03 +0200 (CEST) Subject: [Lxml-checkins] r73255 - lxml/trunk Message-ID: <20100401203003.EB11F282BD8@codespeak.net> Author: scoder Date: Thu Apr 1 22:30:02 2010 New Revision: 73255 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r5548 at lenny: sbehnel | 2010-04-01 22:29:57 +0200 changelog Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Thu Apr 1 22:30:02 2010 @@ -14,6 +14,11 @@ * Support 'c14n' serialisation method in ``ElementTree.write()`` and ``tostring()``, following ElementTree 1.3. +* The ElementPath expression syntax (``el.find*()``) was extended to + match the upcoming ElementTree 1.3 that will ship in the standard + library of Python 3.2/2.7. This includes extended support for + predicates as well as namespace prefixes (as known from XPath). + * During regular XPath evaluation, various ESXLT functions are available within their namespace when using libxslt 1.1.26 or later. From scoder at codespeak.net Thu Apr 1 23:09:26 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 1 Apr 2010 23:09:26 +0200 (CEST) Subject: [Lxml-checkins] r73257 - in lxml/trunk: . 
src/lxml src/lxml/tests Message-ID: <20100401210926.77C3B282BD8@codespeak.net> Author: scoder Date: Thu Apr 1 23:09:24 2010 New Revision: 73257 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/tests/test_elementtree.py Log: r5550 at lenny: sbehnel | 2010-04-01 23:09:19 +0200 allow user code to globally register namespace prefixes Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Thu Apr 1 23:09:24 2010 @@ -8,6 +8,11 @@ Features added -------------- +* New function ``lxml.etree.register_namespace(prefix, uri)`` that + globally registers a namespace prefix for a namespace that newly + created Elements in that namespace will use automatically. Follows + ElementTree 1.3. + * Support 'unicode' string name as encoding parameter in ``tostring()``, following ElementTree 1.3. Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Thu Apr 1 23:09:24 2010 @@ -34,7 +34,7 @@ 'iterparse', 'iterwalk', 'parse', 'parseid', 'set_default_parser', 'set_element_class_lookup', 'strip_attributes', 'strip_elements', 'strip_tags', 'tostring', 'tostringlist', 'tounicode', - 'use_global_python_log' + 'use_global_python_log', 'register_namespace' ] cimport tree, python, config @@ -140,6 +140,25 @@ b"http://codespeak.net/lxml/objectify/pytype" : b"py", } +cdef object _check_internal_prefix = re.compile(b"ns\d+$").match + +def register_namespace(prefix, uri): + u"""Registers a namespace prefix that newly created Elements in that + namespace will use. The registry is global, and any existing + mapping for either the given prefix or the namespace URI will be + removed. 
+ """ + prefix_utf, uri_utf = _utf8(prefix), _utf8(uri) + if _check_internal_prefix(prefix_utf): + raise ValueError("Prefix format reserved for internal use") + _tagValidOrRaise(prefix_utf) + _uriValidOrRaise(uri_utf) + for k, v in _DEFAULT_NAMESPACE_PREFIXES.items(): + if k == uri_utf or v == prefix_utf: + del _DEFAULT_NAMESPACE_PREFIXES[k] + _DEFAULT_NAMESPACE_PREFIXES[uri_utf] = prefix_utf + + # Error superclass for ElementTree compatibility class Error(Exception): pass Modified: lxml/trunk/src/lxml/tests/test_elementtree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_elementtree.py (original) +++ lxml/trunk/src/lxml/tests/test_elementtree.py Thu Apr 1 23:09:24 2010 @@ -2619,6 +2619,26 @@ root2 = fromstring(xml2) self.assertEquals('TEST', root[0].get('{%s}a' % ns_href)) + required_versions_ET['test_itertext'] = (1,3) + def test_register_namespace(self): + # ET 1.3+ + Element = self.etree.Element + tostring = self.etree.tostring + prefix = 'TESTPREFIX' + namespace = 'http://seriously.unknown/namespace/URI' + + el = Element('{%s}test' % namespace) + self.assertEquals(_bytes('' % namespace), + self._writeElement(el)) + + self.etree.register_namespace(prefix, namespace) + el = Element('{%s}test' % namespace) + self.assertEquals(_bytes('<%s:test xmlns:%s="%s">' % ( + prefix, prefix, namespace, prefix)), + self._writeElement(el)) + + self.assertRaises(ValueError, self.etree.register_namespace, 'ns25', namespace) + def test_tostring(self): tostring = self.etree.tostring Element = self.etree.Element From scoder at codespeak.net Thu Apr 1 23:34:15 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 1 Apr 2010 23:34:15 +0200 (CEST) Subject: [Lxml-checkins] r73260 - in lxml/trunk: . 
doc Message-ID: <20100401213415.13CCD282BD8@codespeak.net> Author: scoder Date: Thu Apr 1 23:34:13 2010 New Revision: 73260 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/element_classes.txt Log: r5552 at lenny: sbehnel | 2010-04-01 23:34:05 +0200 doc update: proxy guaranteed to stay alive as long as referenced Modified: lxml/trunk/doc/element_classes.txt ============================================================================== --- lxml/trunk/doc/element_classes.txt (original) +++ lxml/trunk/doc/element_classes.txt Thu Apr 1 23:34:13 2010 @@ -67,18 +67,44 @@ an ``__init___`` or ``__new__`` method. There should not be any internal state either, except for the data stored in the underlying XML tree. Element instances are created and garbage collected at -need, so there is no way to predict when and how often a proxy is -created for them. Even worse, when the ``__init__`` method is called, -the object is not even initialized yet to represent the XML tag, so -there is not much use in providing an ``__init__`` method in +need, so there is normally no way to predict when and how often a +proxy is created for them. Even worse, when the ``__init__`` method +is called, the object is not even initialized yet to represent the XML +tag, so there is not much use in providing an ``__init__`` method in subclasses. -Most use cases will not require any class initialisation, so you can content -yourself with skipping to the next section for now. However, if you really -need to set up your element class on instantiation, there is one possible way -to do so. ElementBase classes have an ``_init()`` method that can be -overridden. It can be used to modify the XML tree, e.g. to construct special -children or verify and update attributes. +Most use cases will not require any class initialisation or proxy +state, so you can content yourself with skipping to the next section +for now. 
However, if you really need to set up your element class on +instantiation, or need a way to persistently store state in the proxy +instances instead of the XML tree, here is a way to do so. + +There is one important guarantee regarding Element proxies. Once a +proxy has been instantiated, it will keep alive as long as there is a +Python reference to it, and any access to the XML element in the tree +will return this very instance. Therefore, if you need to store local +state in a custom Element class (which is generally discouraged), you +can do so by keeping the Elements in a tree alive. If the tree +doesn't change, you can simply do this: + +.. sourcecode:: python + + proxy_cache = list(root.iter()) + +or + +.. sourcecode:: python + + proxy_cache = set(root.iter()) + +or use any other suitable container. Note that you have to keep this +cache manually up to date if the tree changes, which can get tricky in +cases. + +For proxy initialisation, ElementBase classes have an ``_init()`` +method that can be overridden, as oppose to the normal ``__init__()`` +method. It can be used to modify the XML tree, e.g. to construct +special children or verify and update attributes. The semantics of ``_init()`` are as follows: From scoder at codespeak.net Thu Apr 1 23:50:34 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 1 Apr 2010 23:50:34 +0200 (CEST) Subject: [Lxml-checkins] r73261 - in lxml/trunk: . 
src/lxml Message-ID: <20100401215034.77215282BD8@codespeak.net> Author: scoder Date: Thu Apr 1 23:50:32 2010 New Revision: 73261 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/parser.pxi Log: r5554 at lenny: sbehnel | 2010-04-01 23:50:28 +0200 export XMLParser as XMLTreeBuilder, as ET 1.2 calls it Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Thu Apr 1 23:50:32 2010 @@ -74,6 +74,9 @@ Bugs fixed ---------- +* Export ElementTree compatible XML parser class as + ``XMLTreeBuilder``, as it is called in ET 1.2. + * ObjectifiedDataElements in lxml.objectify were not hashable. * Crash in XPath evaluation when reading smart strings from a document Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Thu Apr 1 23:50:32 2010 @@ -25,16 +25,16 @@ 'SubElement', 'TreeBuilder', 'XInclude', 'XIncludeError', 'XML', 'XMLDTDID', 'XMLID', 'XMLParser', 'XMLSchema', 'XMLSchemaError', 'XMLSchemaParseError', 'XMLSchemaValidateError', 'XMLSyntaxError', - 'XPath', 'XPathDocumentEvaluator', 'XPathError', 'XPathEvalError', - 'XPathEvaluator', 'XPathFunctionError', 'XPathResultError', + 'XMLTreeBuilder', 'XPath', 'XPathDocumentEvaluator', 'XPathError', + 'XPathEvalError', 'XPathEvaluator', 'XPathFunctionError', 'XPathResultError', 'XPathSyntaxError', 'XSLT', 'XSLTAccessControl', 'XSLTApplyError', 'XSLTError', 'XSLTExtension', 'XSLTExtensionError', 'XSLTParseError', 'XSLTSaveError', 'cleanup_namespaces', 'clear_error_log', 'dump', 'fromstring', 'fromstringlist', 'get_default_parser', 'iselement', - 'iterparse', 'iterwalk', 'parse', 'parseid', 'set_default_parser', - 'set_element_class_lookup', 'strip_attributes', 'strip_elements', - 
'strip_tags', 'tostring', 'tostringlist', 'tounicode', - 'use_global_python_log', 'register_namespace' + 'iterparse', 'iterwalk', 'parse', 'parseid', 'register_namespace', + 'set_default_parser', 'set_element_class_lookup', 'strip_attributes', + 'strip_elements', 'strip_tags', 'tostring', 'tostringlist', 'tounicode', + 'use_global_python_log' ] cimport tree, python, config Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Thu Apr 1 23:50:32 2010 @@ -1306,6 +1306,9 @@ encoding=encoding, schema=schema) +# ET 1.2 compatible name +XMLTreeBuilder = ETCompatXMLParser + cdef XMLParser __DEFAULT_XML_PARSER __DEFAULT_XML_PARSER = XMLParser() From scoder at codespeak.net Fri Apr 2 07:43:44 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 2 Apr 2010 07:43:44 +0200 (CEST) Subject: [Lxml-checkins] r73283 - lxml/trunk Message-ID: <20100402054344.8E375282B9C@codespeak.net> Author: scoder Date: Fri Apr 2 07:43:43 2010 New Revision: 73283 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r5556 at lenny: sbehnel | 2010-04-02 07:43:38 +0200 changelog Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Apr 2 07:43:43 2010 @@ -77,7 +77,9 @@ * Export ElementTree compatible XML parser class as ``XMLTreeBuilder``, as it is called in ET 1.2. 
-* ObjectifiedDataElements in lxml.objectify were not hashable. +* ObjectifiedDataElements in lxml.objectify were not hashable. They + now use the hash value of the underlying Python value (string, + number, etc.) to which they compare equal. * Crash in XPath evaluation when reading smart strings from a document other than the original context document. From scoder at codespeak.net Tue Apr 6 22:14:46 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 6 Apr 2010 22:14:46 +0200 (CEST) Subject: [Lxml-checkins] r73467 - in lxml/trunk: . src/lxml Message-ID: <20100406201446.0D329282B9C@codespeak.net> Author: scoder Date: Tue Apr 6 22:14:45 2010 New Revision: 73467 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/parser.pxi Log: r5558 at lenny: sbehnel | 2010-04-02 18:01:37 +0200 docstring fix Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Tue Apr 6 22:14:45 2010 @@ -1271,7 +1271,7 @@ u"""ETCompatXMLParser(self, encoding=None, attribute_defaults=False, \ dtd_validation=False, load_dtd=False, no_network=True, \ ns_clean=False, recover=False, schema=None, \ - remove_blank_text=False, resolve_entities=True, \ + huge_tree=False, remove_blank_text=False, resolve_entities=True, \ remove_comments=True, remove_pis=True, strip_cdata=True, \ target=None, compact=True) From scoder at codespeak.net Tue Apr 6 22:15:14 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 6 Apr 2010 22:15:14 +0200 (CEST) Subject: [Lxml-checkins] r73468 - in lxml/trunk: . 
src/lxml Message-ID: <20100406201514.5A160282B9C@codespeak.net> Author: scoder Date: Tue Apr 6 22:15:12 2010 New Revision: 73468 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/classlookup.pxi Log: r5559 at lenny: sbehnel | 2010-04-06 22:14:22 +0200 prevent crash when instantiating CommentBase and friends Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Tue Apr 6 22:15:12 2010 @@ -74,6 +74,8 @@ Bugs fixed ---------- +* Prevent crash when instantiating ``CommentBase`` and friends. + * Export ElementTree compatible XML parser class as ``XMLTreeBuilder``, as it is called in ET 1.2. Modified: lxml/trunk/src/lxml/classlookup.pxi ============================================================================== --- lxml/trunk/src/lxml/classlookup.pxi (original) +++ lxml/trunk/src/lxml/classlookup.pxi Tue Apr 6 22:15:12 2010 @@ -101,8 +101,7 @@ cdef class CommentBase(_Comment): u"""All custom Comment classes must inherit from this one. - Note that you cannot (and must not) instantiate this class or its - subclasses. + To create an XML Comment instance, use the ``Comment()`` factory. Subclasses *must not* override __init__ or __new__ as it is absolutely undefined when these objects will be created or @@ -111,12 +110,24 @@ creation, you can implement an ``_init(self)`` method that will be called after object creation. """ + def __init__(self, text): + # copied from Comment() factory + cdef _Document doc + cdef xmlDoc* c_doc + if text is None: + text = b'' + else: + text = _utf8(text) + c_doc = _newXMLDoc() + doc = _documentFactory(c_doc, None) + self._c_node = _createComment(c_doc, _cstr(text)) + tree.xmlAddChild(c_doc, self._c_node) cdef class PIBase(_ProcessingInstruction): u"""All custom Processing Instruction classes must inherit from this one. 
- Note that you cannot (and must not) instantiate this class or its - subclasses. + To create an XML ProcessingInstruction instance, use the ``PI()`` + factory. Subclasses *must not* override __init__ or __new__ as it is absolutely undefined when these objects will be created or @@ -125,12 +136,24 @@ creation, you can implement an ``_init(self)`` method that will be called after object creation. """ + def __init__(self, target, text=None): + # copied from PI() factory + cdef _Document doc + cdef xmlDoc* c_doc + target = _utf8(target) + if text is None: + text = b'' + else: + text = _utf8(text) + c_doc = _newXMLDoc() + doc = _documentFactory(c_doc, None) + self._c_node = _createPI(c_doc, _cstr(target), _cstr(text)) + tree.xmlAddChild(c_doc, self._c_node) cdef class EntityBase(_Entity): u"""All custom Entity classes must inherit from this one. - Note that you cannot (and must not) instantiate this class or its - subclasses. + To create an XML Entity instance, use the ``Entity()`` factory. Subclasses *must not* override __init__ or __new__ as it is absolutely undefined when these objects will be created or @@ -139,7 +162,21 @@ creation, you can implement an ``_init(self)`` method that will be called after object creation. 
""" - + def __init__(self, name): + cdef _Document doc + cdef xmlDoc* c_doc + cdef char* c_name + name_utf = _utf8(name) + c_name = _cstr(name_utf) + if c_name[0] == c'#': + if not _characterReferenceIsValid(c_name + 1): + raise ValueError, u"Invalid character reference: '%s'" % name + elif not _xmlNameIsValid(c_name): + raise ValueError, u"Invalid entity reference: '%s'" % name + c_doc = _newXMLDoc() + doc = _documentFactory(c_doc, None) + self._c_node = _createEntity(c_doc, c_name) + tree.xmlAddChild(c_doc, self._c_node) ################################################################################ # Element class lookup From scoder at codespeak.net Fri Apr 9 20:29:00 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 9 Apr 2010 20:29:00 +0200 (CEST) Subject: [Lxml-checkins] r73601 - in lxml/trunk: . src/lxml Message-ID: <20100409182900.54968282BAD@codespeak.net> Author: scoder Date: Fri Apr 9 20:28:57 2010 New Revision: 73601 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/iterparse.pxi Log: r5562 at lenny: sbehnel | 2010-04-09 20:28:51 +0200 docstring cleanup Modified: lxml/trunk/src/lxml/iterparse.pxi ============================================================================== --- lxml/trunk/src/lxml/iterparse.pxi (original) +++ lxml/trunk/src/lxml/iterparse.pxi Fri Apr 9 20:28:57 2010 @@ -311,48 +311,48 @@ cdef class iterparse(_BaseParser): u"""iterparse(self, source, events=("end",), tag=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, remove_blank_text=False, remove_comments=False, remove_pis=False, encoding=None, html=False, huge_tree=False, schema=None) -Incremental parser. + Incremental parser. -Parses XML into a tree and generates tuples (event, element) in a -SAX-like fashion. ``event`` is any of 'start', 'end', 'start-ns', -'end-ns'. - -For 'start' and 'end', ``element`` is the Element that the parser just -found opening or closing. 
For 'start-ns', it is a tuple (prefix, URI) of -a new namespace declaration. For 'end-ns', it is simply None. Note that -all start and end events are guaranteed to be properly nested. - -The keyword argument ``events`` specifies a sequence of event type names -that should be generated. By default, only 'end' events will be -generated. - -The additional ``tag`` argument restricts the 'start' and 'end' events to -those elements that match the given tag. By default, events are generated -for all elements. Note that the 'start-ns' and 'end-ns' events are not -impacted by this restriction. - -The other keyword arguments in the constructor are mainly based on the -libxml2 parser configuration. A DTD will also be loaded if validation or -attribute default values are requested. - -Available boolean keyword arguments: - - attribute_defaults: read default attributes from DTD - - dtd_validation: validate (if DTD is available) - - load_dtd: use DTD for parsing - - no_network: prevent network access for related files - - remove_blank_text: discard blank text nodes - - remove_comments: discard comments - - remove_pis: discard processing instructions - - strip_cdata: replace CDATA sections by normal text content (default: True) - - compact: safe memory for short text content (default: True) - - resolve_entities: replace entities by their text value (default: True) - - huge_tree: disable security restrictions and support very deep trees - and very long text content (only affects libxml2 2.7+) - -Other keyword arguments: - - encoding: override the document encoding - - schema: an XMLSchema to validate against -""" # stupid, stupid MSVC has a 2048 bytes limit for strings!!! + Parses XML into a tree and generates tuples (event, element) in a + SAX-like fashion. ``event`` is any of 'start', 'end', 'start-ns', + 'end-ns'. + + For 'start' and 'end', ``element`` is the Element that the parser just + found opening or closing. 
For 'start-ns', it is a tuple (prefix, URI) of + a new namespace declaration. For 'end-ns', it is simply None. Note that + all start and end events are guaranteed to be properly nested. + + The keyword argument ``events`` specifies a sequence of event type names + that should be generated. By default, only 'end' events will be + generated. + + The additional ``tag`` argument restricts the 'start' and 'end' events to + those elements that match the given tag. By default, events are generated + for all elements. Note that the 'start-ns' and 'end-ns' events are not + impacted by this restriction. + + The other keyword arguments in the constructor are mainly based on the + libxml2 parser configuration. A DTD will also be loaded if validation or + attribute default values are requested. + + Available boolean keyword arguments: + - attribute_defaults: read default attributes from DTD + - dtd_validation: validate (if DTD is available) + - load_dtd: use DTD for parsing + - no_network: prevent network access for related files + - remove_blank_text: discard blank text nodes + - remove_comments: discard comments + - remove_pis: discard processing instructions + - strip_cdata: replace CDATA sections by normal text content (default: True) + - compact: safe memory for short text content (default: True) + - resolve_entities: replace entities by their text value (default: True) + - huge_tree: disable security restrictions and support very deep trees + and very long text content (only affects libxml2 2.7+) + + Other keyword arguments: + - encoding: override the document encoding + - schema: an XMLSchema to validate against + """ cdef object _tag cdef object _events cdef readonly object root From scoder at codespeak.net Sun Apr 11 19:34:59 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 11 Apr 2010 19:34:59 +0200 (CEST) Subject: [Lxml-checkins] r73648 - in lxml/trunk: . 
src/lxml/tests Message-ID: <20100411173459.DB527282B90@codespeak.net> Author: scoder Date: Sun Apr 11 19:34:58 2010 New Revision: 73648 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/tests/test_css.py Log: r5564 at lenny: sbehnel | 2010-04-11 19:31:29 +0200 execute doctests in cssselect.py Modified: lxml/trunk/src/lxml/tests/test_css.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_css.py (original) +++ lxml/trunk/src/lxml/tests/test_css.py Sun Apr 11 19:34:58 2010 @@ -129,5 +129,6 @@ if sys.version_info >= (2,4): suite.addTests([make_doctest('test_css_select.txt')]) suite.addTests([make_doctest('test_css.txt')]) + suite.addTests(doctest.DocTestSuite(cssselect)) suite.addTests(list(CSSTestCase.all())) return suite From scoder at codespeak.net Sun Apr 11 19:35:03 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 11 Apr 2010 19:35:03 +0200 (CEST) Subject: [Lxml-checkins] r73649 - in lxml/trunk: . src/lxml Message-ID: <20100411173503.3AB10282B90@codespeak.net> Author: scoder Date: Sun Apr 11 19:35:01 2010 New Revision: 73649 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/cssselect.py Log: r5565 at lenny: sbehnel | 2010-04-11 19:34:52 +0200 ticket #560381: allow passing prefix-to-namespace mapping into CSSSelector() Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Sun Apr 11 19:35:01 2010 @@ -8,6 +8,9 @@ Features added -------------- +* Keyword argument ``namespaces`` in ``lxml.cssselect.CSSSelector()`` + to pass a prefix-to-namespace mapping for the selector. + * New function ``lxml.etree.register_namespace(prefix, uri)`` that globally registers a namespace prefix for a namespace that newly created Elements in that namespace will use automatically. 
Follows Modified: lxml/trunk/src/lxml/cssselect.py ============================================================================== --- lxml/trunk/src/lxml/cssselect.py (original) +++ lxml/trunk/src/lxml/cssselect.py Sun Apr 11 19:35:01 2010 @@ -32,10 +32,24 @@ >>> root = etree.XML("TEXT") >>> [ el.tag for el in select(root) ] ['child'] + + To use CSS namespaces, you need to pass a prefix-to-namespace + mapping as ``namespaces`` keyword argument:: + + >>> rdfns = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#' + >>> select_ns = cssselect.CSSSelector('root > rdf|Description', + ... namespaces={'rdf': rdfns}) + + >>> rdf = etree.XML(( + ... '' + ... 'blah' + ... '') % rdfns) + >>> [(el.tag, el.text) for el in select_ns(rdf)] + [('{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description', 'blah')] """ - def __init__(self, css): + def __init__(self, css, namespaces=None): path = css_to_xpath(css) - etree.XPath.__init__(self, path) + etree.XPath.__init__(self, path, namespaces=namespaces) self.css = css def __repr__(self):
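The prefix-to-namespace mapping that r73649 adds to ``CSSSelector()`` can be tried out without lxml installed. The following is only a minimal sketch, assuming Python 3 and the standard-library ``xml.etree.ElementTree`` in place of ``lxml.cssselect``; it therefore uses ElementTree's path syntax (``rdf:Description``) rather than the CSS selector ``root > rdf|Description`` from the commit. The sample document and the names ``RDF_NS``, ``doc`` and ``result`` are made up for illustration and are not taken from the lxml sources.

```python
import xml.etree.ElementTree as ET

# Namespace URI used in the commit's doctest.
RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'

# Hypothetical sample document: a root element with one namespaced child.
doc = ET.fromstring(
    '<root xmlns:rdf="%s">'
    '<rdf:Description>blah</rdf:Description>'
    '</root>' % RDF_NS)

# The 'rdf' prefix in the path is resolved through the mapping passed
# as ``namespaces``, not through the prefixes declared in the document.
matches = doc.findall('rdf:Description', namespaces={'rdf': RDF_NS})

# Matched elements carry their tag in Clark notation: {uri}localname.
result = [(el.tag, el.text) for el in matches]
print(result)
```

As with the ``namespaces`` keyword in the commit, the prefix is a local alias: any prefix works as long as it maps to the same namespace URI as the one used in the document.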
From scoder at codespeak.net Tue Apr 27 21:41:29 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 27 Apr 2010 21:41:29 +0200 (CEST) Subject: [Lxml-checkins] r74122 - in lxml/trunk: . src/lxml/html Message-ID: <20100427194129.1CB1C282B9D@codespeak.net> Author: scoder Date: Tue Apr 27 21:41:28 2010 New Revision: 74122 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/html/__init__.py Log: r5568 at lenny: sbehnel | 2010-04-13 20:33:51 +0200 lxml.html.tostring(): only drop tag when serialising as HTML Modified: lxml/trunk/src/lxml/html/__init__.py ============================================================================== --- lxml/trunk/src/lxml/html/__init__.py (original) +++ lxml/trunk/src/lxml/html/__init__.py Tue Apr 27 21:41:28 2010 @@ -1491,7 +1491,7 @@ """ html = etree.tostring(doc, method=method, pretty_print=pretty_print, encoding=encoding) - if not include_meta_content_type: + if method == 'html' and not include_meta_content_type: if isinstance(html, str): html = __str_replace_meta_content_type('', html) else: From scoder at codespeak.net Tue Apr 27 21:41:40 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 27 Apr 2010 21:41:40 +0200 (CEST) Subject: [Lxml-checkins] r74123 - in lxml/trunk: . 
src/lxml Message-ID: <20100427194140.81D41282B9D@codespeak.net> Author: scoder Date: Tue Apr 27 21:41:38 2010 New Revision: 74123 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/apihelpers.pxi lxml/trunk/src/lxml/docloader.pxi lxml/trunk/src/lxml/dtd.pxi lxml/trunk/src/lxml/extensions.pxi lxml/trunk/src/lxml/iterparse.pxi lxml/trunk/src/lxml/lxml.etree.pyx lxml/trunk/src/lxml/parser.pxi lxml/trunk/src/lxml/readonlytree.pxi lxml/trunk/src/lxml/relaxng.pxi lxml/trunk/src/lxml/saxparser.pxi lxml/trunk/src/lxml/schematron.pxi lxml/trunk/src/lxml/serializer.pxi lxml/trunk/src/lxml/xinclude.pxi lxml/trunk/src/lxml/xmlerror.pxi lxml/trunk/src/lxml/xmlschema.pxi lxml/trunk/src/lxml/xpath.pxi lxml/trunk/src/lxml/xslt.pxi lxml/trunk/src/lxml/xsltext.pxi Log: r5569 at lenny: sbehnel | 2010-04-27 21:41:20 +0200 API hardening against uninitialised proxies and missing __init__ calls Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Tue Apr 27 21:41:38 2010 @@ -77,6 +77,9 @@ Bugs fixed ---------- +* API is hardened against invalid proxy instances to prevent crashes + due to incorrectly instantiated Element instances. + * Prevent crash when instantiating ``CommentBase`` and friends. 
* Export ElementTree compatible XML parser class as Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Tue Apr 27 21:41:38 2010 @@ -11,6 +11,12 @@ displayNode(c_child, indent + 1) c_child = c_child.next +cdef inline int _assertValidNode(_Element element) except -1: + assert element._c_node is not NULL, u"invalid Element proxy at %s" % id(element) + +cdef inline int _assertValidDoc(_Document doc) except -1: + assert doc._c_doc is not NULL, u"invalid Document proxy at %s" % id(doc) + cdef _Document _documentOrRaise(object input): u"""Call this to get the document of a _Document, _ElementTree or _Element object, or to raise an exception if it can't be determined. @@ -33,8 +39,8 @@ if doc is None: raise ValueError, u"Input object has no document: %s" % \ python._fqtypename(input) - else: - return doc + _assertValidDoc(doc) + return doc cdef _Element _rootNodeOrRaise(object input): u"""Call this to get the root node of a _Document, _ElementTree or @@ -55,36 +61,41 @@ if node is None: raise ValueError, u"Input object has no element: %s" % \ python._fqtypename(input) - else: - return node + _assertValidNode(node) + return node cdef _Document _documentOf(object input): # call this to get the document of a # _Document, _ElementTree or _Element object # may return None! 
cdef _Element element + cdef _Document doc = None if isinstance(input, _ElementTree): element = (<_ElementTree>input)._context_node if element is not None: - return element._doc + doc = element._doc elif isinstance(input, _Element): - return (<_Element>input)._doc + doc = (<_Element>input)._doc elif isinstance(input, _Document): - return <_Document>input - return None + doc = <_Document>input + if doc is not None: + _assertValidDoc(doc) + return doc cdef _Element _rootNodeOf(object input): # call this to get the root node of a # _Document, _ElementTree or _Element object # may return None! + cdef _Element element = None if isinstance(input, _ElementTree): - return (<_ElementTree>input)._context_node + element = (<_ElementTree>input)._context_node elif isinstance(input, _Element): - return <_Element>input + element = <_Element>input elif isinstance(input, _Document): - return (<_Document>input).getroot() - else: - return None + element = (<_Document>input).getroot() + if element is not None: + _assertValidNode(element) + return element cdef _Element _makeElement(tag, xmlDoc* c_doc, _Document doc, _BaseParser parser, text, tail, attrib, nsmap, @@ -183,6 +194,7 @@ cdef xmlDoc* c_doc if parent is None or parent._doc is None: return None + _assertValidNode(parent) ns_utf, name_utf = _getNsTag(tag) c_doc = parent._doc._c_doc @@ -1181,6 +1193,7 @@ if c_node is not NULL: for element in elements: assert element is not None, u"Node must not be None" + _assertValidNode(element) # move element and tail over c_source_doc = element._c_node.doc c_next = element._c_node.next @@ -1205,10 +1218,12 @@ if left_to_right: for element in elements: assert element is not None, u"Node must not be None" + _assertValidNode(element) _appendChild(parent, element) else: for element in elements: assert element is not None, u"Node must not be None" + _assertValidNode(element) _prependChild(parent, element) return 0 Modified: lxml/trunk/src/lxml/docloader.pxi 
============================================================================== --- lxml/trunk/src/lxml/docloader.pxi (original) +++ lxml/trunk/src/lxml/docloader.pxi Tue Apr 27 21:41:38 2010 @@ -1,6 +1,7 @@ # Custom resolver API ctypedef enum _InputDocumentDataType: + PARSER_DATA_INVALID PARSER_DATA_EMPTY PARSER_DATA_STRING PARSER_DATA_FILENAME @@ -12,6 +13,10 @@ cdef object _filename cdef object _file + def __cinit__(self): + self._type = PARSER_DATA_INVALID + + cdef class Resolver: u"This is the base class of all resolvers." def resolve(self, system_url, public_id, context): @@ -101,7 +106,7 @@ cdef class _ResolverRegistry: cdef object _resolvers cdef Resolver _default_resolver - def __init__(self, Resolver default_resolver=None): + def __cinit__(self, Resolver default_resolver=None): self._resolvers = set() self._default_resolver = default_resolver Modified: lxml/trunk/src/lxml/dtd.pxi ============================================================================== --- lxml/trunk/src/lxml/dtd.pxi (original) +++ lxml/trunk/src/lxml/dtd.pxi Tue Apr 27 21:41:38 2010 @@ -28,8 +28,10 @@ catalog. 
""" cdef tree.xmlDtd* _c_dtd - def __init__(self, file=None, *, external_id=None): + def __cinit__(self): self._c_dtd = NULL + + def __init__(self, file=None, *, external_id=None): _Validator.__init__(self) if file is not None: if _isString(file): @@ -69,6 +71,7 @@ cdef dtdvalid.xmlValidCtxt* valid_ctxt cdef int ret + assert self._c_dtd is not NULL, "DTD not initialised" doc = _documentOrRaise(etree) root_node = _rootNodeOrRaise(etree) Modified: lxml/trunk/src/lxml/extensions.pxi ============================================================================== --- lxml/trunk/src/lxml/extensions.pxi (original) +++ lxml/trunk/src/lxml/extensions.pxi Tue Apr 27 21:41:38 2010 @@ -42,6 +42,8 @@ cdef _TempStore _temp_refs cdef set _temp_documents cdef _ExceptionContext _exc + def __cinit__(self): + self._xpathCtxt = NULL def __init__(self, namespaces, extensions, enable_regexp, build_smart_strings): @@ -340,7 +342,7 @@ """ cdef _Document doc for doc in self._temp_documents: - if doc._c_doc is c_node.doc: + if doc is not None and doc._c_doc is c_node.doc: return doc return None @@ -374,7 +376,7 @@ cdef class _ExsltRegExp: cdef dict _compile_map - def __init__(self): + def __cinit__(self): self._compile_map = {} cdef _make_string(self, value): Modified: lxml/trunk/src/lxml/iterparse.pxi ============================================================================== --- lxml/trunk/src/lxml/iterparse.pxi (original) +++ lxml/trunk/src/lxml/iterparse.pxi Tue Apr 27 21:41:38 2010 @@ -79,7 +79,7 @@ cdef char* _tag_href cdef char* _tag_name - def __init__(self): + def __cinit__(self): self._ns_stack = [] self._pop_ns = self._ns_stack.pop self._node_stack = [] @@ -581,7 +581,7 @@ cdef _Element node cdef _Element next_node cdef int ns_count - if python.PyList_GET_SIZE(self._events): + if self._events: return self._pop_event(0) ns_count = 0 # find next node @@ -597,7 +597,7 @@ next_node = None while next_node is None: # back off through parents - self._index = self._index - 1 + 
self._index -= 1 node = self._end_node() if self._index < 0: break @@ -609,8 +609,8 @@ elif self._event_filter & ITERPARSE_FILTER_END_NS: ns_count = _countNsDefs(next_node._c_node) self._node_stack.append( (next_node, ns_count) ) - self._index = self._index + 1 - if python.PyList_GET_SIZE(self._events): + self._index += 1 + if self._events: return self._pop_event(0) raise StopIteration Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Tue Apr 27 21:41:38 2010 @@ -171,7 +171,7 @@ def __init__(self, message, error_log=None): if python.PY_VERSION_HEX >= 0x02050000: # Python >= 2.5 uses new style class exceptions - super(_LxmlError, self).__init__(message) + super(_Error, self).__init__(message) else: error_super_init(self, message) if error_log is None: @@ -179,7 +179,7 @@ else: self.error_log = error_log.copy() -cdef object _LxmlError = LxmlError +cdef object _Error = Error cdef object error_super_init = Error.__init__ @@ -326,7 +326,7 @@ cdef bint hasdoctype(self): # DOCTYPE gets parsed into internal subset (xmlDTD*) - return self._c_doc.intSubset is not NULL + return self._c_doc is not NULL and self._c_doc.intSubset is not NULL cdef getdoctype(self): # get doctype info: root tag, public/system ID (or None if not known) @@ -355,8 +355,7 @@ cdef getxmlinfo(self): # return XML version and encoding (or None if not known) - cdef xmlDoc* c_doc - c_doc = self._c_doc + cdef xmlDoc* c_doc = self._c_doc if c_doc.version is NULL: version = None else: @@ -377,8 +376,8 @@ cdef buildNewPrefix(self): # get a new unique prefix ("nsX") for this document - if self._ns_counter < python.PyTuple_GET_SIZE(_PREFIX_CACHE): - ns = python.PyTuple_GET_ITEM(_PREFIX_CACHE, self._ns_counter) + if self._ns_counter < len(_PREFIX_CACHE): + ns = _PREFIX_CACHE[self._ns_counter] python.Py_INCREF(ns) else: ns = 
python.PyBytes_FromFormat("ns%d", self._ns_counter) @@ -444,12 +443,12 @@ c_ns = self._findOrBuildNodeNs(c_node, href, NULL, 0) tree.xmlSetNs(c_node, c_ns) -cdef __initPrefixCache(): +cdef tuple __initPrefixCache(): cdef int i return tuple([ python.PyBytes_FromFormat("ns%d", i) for i in range(30) ]) -cdef object _PREFIX_CACHE +cdef tuple _PREFIX_CACHE _PREFIX_CACHE = __initPrefixCache() cdef extern from "etree_defs.h": @@ -607,6 +606,7 @@ cdef _Element element cdef bint left_to_right cdef Py_ssize_t slicelength, step + _assertValidNode(self) if value is None: raise ValueError, u"cannot assign None" if python.PySlice_Check(x): @@ -622,6 +622,7 @@ else: # otherwise: normal item assignment element = value + _assertValidNode(element) c_node = _findChild(self._c_node, x) if c_node is NULL: raise IndexError, u"list index out of range" @@ -642,6 +643,7 @@ cdef xmlNode* c_node cdef xmlNode* c_next cdef Py_ssize_t step, slicelength + _assertValidNode(self) if python.PySlice_Check(x): # slice deletion if _isFullSlice(x): @@ -673,6 +675,7 @@ cdef xmlDoc* c_doc cdef xmlNode* c_node cdef _Document new_doc + _assertValidNode(self) c_doc = _copyDocRoot(self._doc._c_doc, self._c_node) # recursive new_doc = _documentFactory(c_doc, self._doc._parser) root = new_doc.getroot() @@ -691,6 +694,7 @@ Sets an element attribute. """ + _assertValidNode(self) _setAttributeValue(self, key, value) def append(self, _Element element not None): @@ -698,9 +702,11 @@ Adds a subelement to the end of this element. """ + _assertValidNode(self) + _assertValidNode(element) _appendChild(self, element) - def addnext(self, _Element element): + def addnext(self, _Element element not None): u"""addnext(self, element) Adds the element as a following sibling directly after this @@ -710,6 +716,8 @@ the root node of a document. Note that tail text is automatically discarded when adding at the root level. 
""" + _assertValidNode(self) + _assertValidNode(element) if self._c_node.parent != NULL and not _isElement(self._c_node.parent): if element._c_node.type != tree.XML_PI_NODE: if element._c_node.type != tree.XML_COMMENT_NODE: @@ -717,7 +725,7 @@ element.tail = None _appendSibling(self, element) - def addprevious(self, _Element element): + def addprevious(self, _Element element not None): u"""addprevious(self, element) Adds the element as a preceding sibling directly before this @@ -727,6 +735,8 @@ before the root node of a document. Note that tail text is automatically discarded when adding at the root level. """ + _assertValidNode(self) + _assertValidNode(element) if self._c_node.parent != NULL and not _isElement(self._c_node.parent): if element._c_node.type != tree.XML_PI_NODE: if element._c_node.type != tree.XML_COMMENT_NODE: @@ -739,7 +749,10 @@ Extends the current children by the elements in the iterable. """ + _assertValidNode(self) for element in elements: + assert element is not None, u"Node must not be None" + _assertValidNode(element) _appendChild(self, element) def clear(self): @@ -752,6 +765,7 @@ cdef xmlAttr* c_attr_next cdef xmlNode* c_node cdef xmlNode* c_node_next + _assertValidNode(self) c_node = self._c_node # remove self.text and self.tail _removeText(c_node.children) @@ -780,6 +794,8 @@ cdef xmlNode* c_node cdef xmlNode* c_next cdef xmlDoc* c_source_doc + _assertValidNode(self) + _assertValidNode(element) c_node = _findChild(self._c_node, index) if c_node is NULL: _appendChild(self, element) @@ -799,6 +815,8 @@ """ cdef xmlNode* c_node cdef xmlNode* c_next + _assertValidNode(self) + _assertValidNode(element) c_node = element._c_node if c_node.parent is not self._c_node: raise ValueError, u"Element is not a child of this node." 
@@ -819,6 +837,9 @@ cdef xmlNode* c_new_node cdef xmlNode* c_new_next cdef xmlDoc* c_source_doc + _assertValidNode(self) + _assertValidNode(old_element) + _assertValidNode(new_element) c_old_node = old_element._c_node if c_old_node.parent is not self._c_node: raise ValueError, u"Element is not a child of this node." @@ -840,11 +861,13 @@ def __get__(self): if self._tag is not None: return self._tag + _assertValidNode(self) self._tag = _namespacedName(self._c_node) return self._tag def __set__(self, value): cdef _BaseParser parser + _assertValidNode(self) ns, name = _getNsTag(value) parser = self._doc._parser if parser is not None and parser._for_html: @@ -863,6 +886,7 @@ keys(), values() and items() to access element attributes. """ def __get__(self): + _assertValidNode(self) return _Attrib(self) ## cdef python.PyObject* ref ## if self._attrib is not None: @@ -878,9 +902,11 @@ the value None, if there was no text. """ def __get__(self): + _assertValidNode(self) return _collectText(self._c_node.children) def __set__(self, value): + _assertValidNode(self) if isinstance(value, QName): value = python.PyUnicode_FromEncodedObject( _resolveQNameText(self, value), 'UTF-8', 'strict') @@ -896,9 +922,11 @@ there was no text. 
""" def __get__(self): + _assertValidNode(self) return _collectText(self._c_node.next) def __set__(self, value): + _assertValidNode(self) _setTailText(self._c_node, value) # using 'del el.tail' is the wrong thing to do @@ -921,6 +949,7 @@ """ def __get__(self): cdef long line + _assertValidNode(self) line = tree.xmlGetLineNo(self._c_node) if line > 0: return line @@ -928,6 +957,7 @@ return None def __set__(self, line): + _assertValidNode(self) if line < 0: self._c_node.line = 0 else: @@ -945,6 +975,7 @@ cdef xmlNode* c_node cdef xmlNs* c_ns cdef dict nsmap = {} + _assertValidNode(self) c_node = self._c_node while c_node is not NULL and c_node.type == tree.XML_ELEMENT_NODE: c_ns = c_node.nsDef @@ -973,6 +1004,7 @@ """ def __get__(self): cdef char* c_base + _assertValidNode(self) c_base = tree.xmlNodeGetBase(self._doc._c_doc, self._c_node) if c_base is NULL: if self._doc._c_doc.URL is NULL: @@ -981,8 +1013,10 @@ base = _decodeFilename(c_base) tree.xmlFree(c_base) return base + def __set__(self, url): cdef char* c_base + _assertValidNode(self) if url is None: c_base = NULL else: @@ -1004,6 +1038,7 @@ cdef Py_ssize_t c, i cdef _node_to_node_function next_element cdef list result + _assertValidNode(self) if python.PySlice_Check(x): # slicing if _isFullSlice(x): @@ -1036,6 +1071,7 @@ Returns the number of subelements. 
""" + _assertValidNode(self) return _countElements(self._c_node.children) def __nonzero__(self): @@ -1047,11 +1083,13 @@ FutureWarning ) # emulate old behaviour + _assertValidNode(self) return _hasChild(self._c_node) def __contains__(self, element): u"__contains__(self, element)" cdef xmlNode* c_node + _assertValidNode(self) if not isinstance(element, _Element): return 0 c_node = (<_Element>element)._c_node @@ -1076,6 +1114,8 @@ cdef Py_ssize_t c_start, c_stop cdef xmlNode* c_child cdef xmlNode* c_start_node + _assertValidNode(self) + _assertValidNode(child) c_child = child._c_node if c_child.parent is not self._c_node: raise ValueError, u"Element is not a child of this node." @@ -1155,6 +1195,7 @@ Gets an element attribute. """ + _assertValidNode(self) return _getAttributeValue(self, key, default) def keys(self): @@ -1163,6 +1204,7 @@ Gets a list of attribute names. The names are returned in an arbitrary order (just like for an ordinary Python dictionary). """ + _assertValidNode(self) return _collectAttributes(self._c_node, 1) def values(self): @@ -1171,6 +1213,7 @@ Gets element attribute values as a sequence of strings. The attributes are returned in an arbitrary order. """ + _assertValidNode(self) return _collectAttributes(self._c_node, 2) def items(self): @@ -1179,6 +1222,7 @@ Gets element attributes, as a sequence. The attributes are returned in an arbitrary order. """ + _assertValidNode(self) return _collectAttributes(self._c_node, 3) def getchildren(self): @@ -1191,6 +1235,7 @@ ElementTree 1.3 and lxml 2.0. New code should use ``list(element)`` or simply iterate over elements. """ + _assertValidNode(self) return _collectChildren(self) def getparent(self): @@ -1199,6 +1244,7 @@ Returns the parent of this element or None for the root element. """ cdef xmlNode* c_node + #_assertValidNode(self) # not needed c_node = _parentElement(self._c_node) if c_node is NULL: return None @@ -1210,6 +1256,7 @@ Returns the following sibling of this element or None. 
""" cdef xmlNode* c_node + #_assertValidNode(self) # not needed c_node = _nextElement(self._c_node) if c_node is NULL: return None @@ -1221,6 +1268,7 @@ Returns the preceding sibling of this element or None. """ cdef xmlNode* c_node + #_assertValidNode(self) # not needed c_node = _previousElement(self._c_node) if c_node is NULL: return None @@ -1282,6 +1330,7 @@ This is the same as following element.getparent() up the tree until it returns None (for the root element) and then build an ElementTree for the last parent that was returned.""" + _assertValidDoc(self._doc) return _elementTreeFactory(self._doc, None) def getiterator(self, tag=None): @@ -1339,6 +1388,7 @@ Creates a new element associated with the same document. """ + _assertValidDoc(self._doc) return _makeElement(_tag, NULL, self._doc, None, None, None, attrib, nsmap, _extra) @@ -1408,7 +1458,7 @@ cdef extern from "etree_defs.h": # macro call to 't->tp_new()' for fast instantiation - cdef _Element NEW_ELEMENT "PY_NEW" (object t) + cdef object NEW_ELEMENT "PY_NEW" (object t) cdef _Element _elementFactory(_Document doc, xmlNode* c_node): cdef _Element result @@ -1461,6 +1511,7 @@ property text: def __get__(self): + _assertValidNode(self) if self._c_node.content is NULL: return '' else: @@ -1469,6 +1520,7 @@ def __set__(self, value): cdef tree.xmlDict* c_dict cdef char* c_text + _assertValidNode(self) if value is None: c_text = NULL else: @@ -1520,9 +1572,11 @@ property target: # not in ElementTree def __get__(self): + _assertValidNode(self) return funicode(self._c_node.name) def __set__(self, value): + _assertValidNode(self) value = _utf8(value) c_text = _cstr(value) tree.xmlNodeSetName(self._c_node, c_text) @@ -1542,9 +1596,11 @@ property name: # not in ElementTree def __get__(self): + _assertValidNode(self) return funicode(self._c_node.name) def __set__(self, value): + _assertValidNode(self) value_utf = _utf8(value) assert u'&' not in value and u';' not in value, \ u"Invalid entity name '%s'" % value @@ 
-1554,6 +1610,7 @@ # FIXME: should this be None or '&[VALUE];' or the resolved # entity value ? def __get__(self): + _assertValidNode(self) return u'&%s;' % funicode(self._c_node.name) def __repr__(self): @@ -1659,6 +1716,7 @@ Relocate the ElementTree to a new root node. """ + _assertValidNode(root) if root._c_node.type != tree.XML_ELEMENT_NODE: raise TypeError, u"Only elements can be the root of an ElementTree" self._context_node = root @@ -1693,6 +1751,7 @@ python.PyErr_NoMemory() return _elementTreeFactory(None, root) elif self._doc is not None: + _assertValidDoc(self._doc) c_doc = tree.xmlCopyDoc(self._doc._c_doc, 1) if c_doc is NULL: python.PyErr_NoMemory() @@ -1754,6 +1813,7 @@ cdef bint write_declaration cdef int is_standalone self._assertHasRoot() + _assertValidNode(self._context_node) if compression is None or compression < 0: compression = 0 # C14N serialisation @@ -1797,12 +1857,24 @@ Returns a structural, absolute XPath expression to find that element. """ cdef _Document doc + cdef _Element root cdef xmlDoc* c_doc cdef char* c_path - doc = self._context_node._doc + _assertValidNode(element) + if self._context_node is not None: + root = self._context_node + doc = root._doc + elif self._doc is not None: + doc = self._doc + root = doc.getroot() + else: + raise ValueError, u"Element is not in this tree." + _assertValidDoc(doc) + _assertValidNode(root) if element._doc is not doc: raise ValueError, u"Element is not in this tree." - c_doc = _fakeRootDoc(doc._c_doc, self._context_node._c_node) + + c_doc = _fakeRootDoc(doc._c_doc, root._c_node) c_path = tree.xmlGetNodePath(element._c_node) _destroyFakeDoc(doc._c_doc, c_doc) if c_path is NULL: @@ -2020,7 +2092,6 @@ Note that XInclude does not support custom resolvers in Python space due to restrictions of libxml2 <= 2.6.29. """ - cdef int result self._assertHasRoot() XInclude()(self._context_node) @@ -2034,6 +2105,7 @@ The ``compression`` option enables GZip compression level 1-9. 
""" self._assertHasRoot() + _assertValidNode(self._context_node) if compression is None or compression < 0: compression = 0 _tofilelikeC14N(file, self._context_node, exclusive, with_comments, @@ -2049,7 +2121,10 @@ if context_node is None and doc is not None: context_node = doc.getroot() if context_node is None: + _assertValidDoc(doc) result._doc = doc + else: + _assertValidNode(context_node) result._context_node = context_node return result @@ -2059,6 +2134,7 @@ """ cdef _Element _element def __cinit__(self, _Element element not None): + _assertValidNode(element) self._element = element # MANIPULATORS @@ -2285,8 +2361,9 @@ u"""ElementChildIterator(self, node, tag=None, reversed=False) Iterates over the children of an element. """ - def __init__(self, _Element node not None, tag=None, *, reversed=False): + def __cinit__(self, _Element node not None, tag=None, *, reversed=False): cdef xmlNode* c_node + _assertValidNode(node) self._initTagMatch(tag) if reversed: c_node = _findChildBackwards(node._c_node, 0) @@ -2310,7 +2387,8 @@ You can pass the boolean keyword ``preceding`` to specify the direction. """ - def __init__(self, _Element node not None, tag=None, *, preceding=False): + def __cinit__(self, _Element node not None, tag=None, *, preceding=False): + _assertValidNode(node) self._initTagMatch(tag) if preceding: self._next_element = _previousElement @@ -2322,7 +2400,8 @@ u"""AncestorsIterator(self, node, tag=None) Iterates over the ancestors of an element (from parent to parent). 
""" - def __init__(self, _Element node not None, tag=None): + def __cinit__(self, _Element node not None, tag=None): + _assertValidNode(node) self._initTagMatch(tag) self._next_element = _parentElement self._storeNext(node) @@ -2351,7 +2430,8 @@ # keep next node to return and the (s)top node cdef _Element _next_node cdef _Element _top_node - def __init__(self, _Element node not None, tag=None, *, inclusive=True): + def __cinit__(self, _Element node not None, tag=None, *, inclusive=True): + _assertValidNode(node) self._top_node = node self._next_node = node self._initTagMatch(tag) @@ -2417,7 +2497,8 @@ """ cdef object _nextEvent cdef _Element _start_element - def __init__(self, _Element element not None, tag=None, *, with_tail=True): + def __cinit__(self, _Element element not None, tag=None, *, with_tail=True): + _assertValidNode(element) if with_tail: events = (u"start", u"end") else: @@ -2683,7 +2764,7 @@ Checks if an object appears to be a valid element object. """ - return isinstance(element, _Element) + return isinstance(element, _Element) and (<_Element>element)._c_node is not NULL def dump(_Element elem not None, *, pretty_print=True, with_tail=True): u"""dump(elem, pretty_print=True, with_tail=True) @@ -2691,6 +2772,7 @@ Writes an element tree or element structure to sys.stdout. This function should be used for debugging only. """ + _assertValidNode(elem) _dumpToFile(sys.stdout, elem._c_node, pretty_print, with_tail) def tostring(element_or_tree, *, encoding=None, method=u"xml", @@ -2900,8 +2982,7 @@ cdef class _Validator: u"Base class for XML validators." cdef _ErrorLog _error_log - def __init__(self): - u"__init__(self)" + def __cinit__(self): self._error_log = _ErrorLog() def validate(self, etree): @@ -2943,6 +3024,7 @@ property error_log: u"The log of validation errors and warnings." 
def __get__(self): + assert self._error_log is not None, "XPath evaluator not initialised" return self._error_log.copy() include "dtd.pxi" # DTD Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Tue Apr 27 21:41:38 2010 @@ -20,18 +20,18 @@ # Python >= 2.5 uses new style class exceptions super(_ParseError, self).__init__(message) else: - _XMLSyntaxError.__init__(self, message) + _LxmlSyntaxError.__init__(self, message) self.position = (line, column) self.code = code +cdef object _LxmlSyntaxError = LxmlSyntaxError +cdef object _ParseError = ParseError + class XMLSyntaxError(ParseError): u"""Syntax error while parsing an XML document. """ pass -cdef object _XMLSyntaxError = XMLSyntaxError -cdef object _ParseError = ParseError - class ParserError(LxmlError): u"""Internal lxml parser error. """ @@ -51,7 +51,8 @@ cdef _BaseParser _default_parser cdef list _implied_parser_contexts - def __init__(self): + def __cinit__(self): + self._c_dict = NULL self._implied_parser_contexts = [] def __dealloc__(self): @@ -65,7 +66,7 @@ cdef python.PyObject* result thread_dict = python.PyThreadState_GetDict() if thread_dict is not NULL: - (thread_dict)[u"_ParserDictionaryContext"] = self + (thread_dict)[u"_ParserDictionaryContext"] = self cdef _ParserDictionaryContext _findThreadParserContext(self): u"Find (or create) the _ParserDictionaryContext object for the current thread" @@ -75,7 +76,7 @@ thread_dict = python.PyThreadState_GetDict() if thread_dict is NULL: return self - d = thread_dict + d = thread_dict result = python.PyDict_GetItem(d, u"_ParserDictionaryContext") if result is not NULL: return result @@ -264,7 +265,7 @@ cdef _ExceptionContext _exc_context cdef Py_ssize_t _bytes_read cdef char* _c_url - def __init__(self, filelike, exc_context, url, encoding): + def __cinit__(self, filelike, exc_context, url, encoding): 
self._exc_context = exc_context self._filelike = filelike self._encoding = encoding @@ -472,6 +473,13 @@ cdef _ParserSchemaValidationContext _validator cdef xmlparser.xmlParserCtxt* _c_ctxt cdef python.PyThread_type_lock _lock + def __cinit__(self): + self._c_ctxt = NULL + if not config.ENABLE_THREADING: + self._lock = NULL + else: + self._lock = python.PyThread_allocate_lock() + self._error_log = _ErrorLog() def __dealloc__(self): if self._validator is not None: @@ -543,13 +551,8 @@ _ResolverRegistry resolvers, xmlparser.xmlParserCtxt* c_ctxt): _initResolverContext(context, resolvers) - if not config.ENABLE_THREADING: - context._lock = NULL - else: - context._lock = python.PyThread_allocate_lock() if c_ctxt is not NULL: context._initParserContext(c_ctxt) - context._error_log = _ErrorLog() cdef int _raiseParseError(xmlparser.xmlParserCtxt* ctxt, filename, _ErrorLog error_log) except 0: @@ -839,6 +842,8 @@ parser._resolvers = self._resolvers parser.target = self.target parser._class_lookup = self._class_lookup + parser._default_encoding = self._default_encoding + parser._schema = self._schema return parser def copy(self): Modified: lxml/trunk/src/lxml/readonlytree.pxi ============================================================================== --- lxml/trunk/src/lxml/readonlytree.pxi (original) +++ lxml/trunk/src/lxml/readonlytree.pxi Tue Apr 27 21:41:38 2010 @@ -6,6 +6,9 @@ cdef xmlNode* _c_node cdef _ReadOnlyProxy _source_proxy cdef list _dependent_proxies + def __cinit__(self): + self._c_node = NULL + self._free_after_use = 0 cdef int _assertNode(self) except -1: u"""This is our way of saying: this proxy is invalid! 
@@ -329,7 +332,6 @@ cdef inline _initReadOnlyProxy(_ReadOnlyProxy el, _ReadOnlyProxy source_proxy): - el._free_after_use = 0 if source_proxy is None: el._source_proxy = el el._dependent_proxies = [el] Modified: lxml/trunk/src/lxml/relaxng.pxi ============================================================================== --- lxml/trunk/src/lxml/relaxng.pxi (original) +++ lxml/trunk/src/lxml/relaxng.pxi Tue Apr 27 21:41:38 2010 @@ -27,6 +27,9 @@ filename through the ``file`` keyword argument. """ cdef relaxng.xmlRelaxNG* _c_schema + def __cinit__(self): + self._c_schema = NULL + def __init__(self, etree=None, *, file=None): cdef _Document doc cdef _Element root_node @@ -35,7 +38,6 @@ cdef char* c_href cdef relaxng.xmlRelaxNGParserCtxt* parser_ctxt _Validator.__init__(self) - self._c_schema = NULL fake_c_doc = NULL if etree is not None: doc = _documentOrRaise(etree) @@ -103,6 +105,7 @@ cdef relaxng.xmlRelaxNGValidCtxt* valid_ctxt cdef int ret + assert self._c_schema is not NULL, "RelaxNG instance not initialised" doc = _documentOrRaise(etree) root_node = _rootNodeOrRaise(etree) Modified: lxml/trunk/src/lxml/saxparser.pxi ============================================================================== --- lxml/trunk/src/lxml/saxparser.pxi (original) +++ lxml/trunk/src/lxml/saxparser.pxi Tue Apr 27 21:41:38 2010 @@ -340,7 +340,7 @@ cdef list _data cdef list _element_stack cdef object _element_stack_pop - cdef _Element _last + cdef _Element _last # may be None cdef bint _in_tail def __init__(self, *, element_factory=None, parser=None): Modified: lxml/trunk/src/lxml/schematron.pxi ============================================================================== --- lxml/trunk/src/lxml/schematron.pxi (original) +++ lxml/trunk/src/lxml/schematron.pxi Tue Apr 27 21:41:38 2010 @@ -70,14 +70,16 @@ """ cdef schematron.xmlSchematron* _c_schema cdef xmlDoc* _c_schema_doc + def __cinit__(self): + self._c_schema = NULL + self._c_schema_doc = NULL + def __init__(self, etree=None, *, 
file=None): cdef _Document doc cdef _Element root_node cdef xmlNode* c_node cdef char* c_href cdef schematron.xmlSchematronParserCtxt* parser_ctxt - self._c_schema = NULL - self._c_schema_doc = NULL _Validator.__init__(self) if not config.ENABLE_SCHEMATRON: raise SchematronError, \ @@ -138,6 +140,7 @@ cdef int ret cdef int options + assert self._c_schema is not NULL, "Schematron instance not initialised" doc = _documentOrRaise(etree) root_node = _rootNodeOrRaise(etree) Modified: lxml/trunk/src/lxml/serializer.pxi ============================================================================== --- lxml/trunk/src/lxml/serializer.pxi (original) +++ lxml/trunk/src/lxml/serializer.pxi Tue Apr 27 21:41:38 2010 @@ -89,6 +89,7 @@ cdef int error_result if element is None: return None + _assertValidNode(element) c_method = _findOutputMethod(method) if c_method == OUTPUT_METHOD_TEXT: return _textToString(element._c_node, encoding, with_tail) @@ -151,10 +152,12 @@ cdef _Element element if isinstance(element_or_tree, _Element): + _assertValidNode(<_Element>element_or_tree) doc = (<_Element>element_or_tree)._doc c_doc = _plainFakeRootDoc(doc._c_doc, (<_Element>element_or_tree)._c_node, 0) else: doc = _documentOrRaise(element_or_tree) + _assertValidDoc(doc) c_doc = doc._c_doc with nogil: @@ -345,7 +348,7 @@ cdef object _close_filelike cdef _ExceptionContext _exc_context cdef _ErrorLog error_log - def __init__(self, filelike, exc_context=None, compression=None): + def __cinit__(self, filelike, exc_context=None, compression=None): if compression is not None and compression > 0: filelike = gzip.GzipFile( fileobj=filelike, mode=u'wb', compresslevel=compression) Modified: lxml/trunk/src/lxml/xinclude.pxi ============================================================================== --- lxml/trunk/src/lxml/xinclude.pxi (original) +++ lxml/trunk/src/lxml/xinclude.pxi Tue Apr 27 21:41:38 2010 @@ -20,6 +20,7 @@ property error_log: def __get__(self): + assert self._error_log is not None, 
"XInclude instance not initialised" return self._error_log.copy() def __call__(self, _Element node not None): @@ -32,6 +33,8 @@ # typed as elements. The included fragment is added between the two, # i.e. as a sibling, which does not conflict with traversal. cdef int result + _assertValidNode(node) + assert self._error_log is not None, "XPath evaluator not initialised" self._error_log.connect() __GLOBAL_PARSER_CONTEXT.pushImpliedContextFromParser( node._doc._parser) Modified: lxml/trunk/src/lxml/xmlerror.pxi ============================================================================== --- lxml/trunk/src/lxml/xmlerror.pxi (original) +++ lxml/trunk/src/lxml/xmlerror.pxi Tue Apr 27 21:41:38 2010 @@ -494,7 +494,7 @@ cdef void _receiveError(void* c_log_handler, xmlerror.xmlError* error) nogil: # no Python objects here, may be called without thread context ! # when we declare a Python object, Pyrex will INCREF(None) ! - if __DEBUG != 0: + if __DEBUG: _forwardError(c_log_handler, error) cdef void _receiveXSLTError(void* c_log_handler, char* msg, ...) 
nogil: Modified: lxml/trunk/src/lxml/xmlschema.pxi ============================================================================== --- lxml/trunk/src/lxml/xmlschema.pxi (original) +++ lxml/trunk/src/lxml/xmlschema.pxi Tue Apr 27 21:41:38 2010 @@ -36,6 +36,10 @@ cdef xmlschema.xmlSchema* _c_schema cdef bint _has_default_attributes cdef bint _add_attribute_defaults + def __cinit__(self): + self._c_schema = NULL + self._has_default_attributes = True # play safe + self._add_attribute_defaults = False def __init__(self, etree=None, *, file=None, attribute_defaults=False): cdef _Document doc @@ -45,9 +49,7 @@ cdef char* c_href cdef xmlschema.xmlSchemaParserCtxt* parser_ctxt - self._has_default_attributes = True # play safe self._add_attribute_defaults = attribute_defaults - self._c_schema = NULL _Validator.__init__(self) fake_c_doc = NULL if etree is not None: @@ -126,6 +128,7 @@ cdef xmlDoc* c_doc cdef int ret + assert self._c_schema is not NULL, "Schema instance not initialised" doc = _documentOrRaise(etree) root_node = _rootNodeOrRaise(etree) @@ -161,8 +164,6 @@ cdef _ParserSchemaValidationContext context context = NEW_SCHEMA_CONTEXT(_ParserSchemaValidationContext) context._schema = self - context._valid_ctxt = NULL - context._sax_plug = NULL context._add_default_attributes = (self._has_default_attributes and ( add_default_attributes or self._add_attribute_defaults)) return context @@ -172,6 +173,10 @@ cdef xmlschema.xmlSchemaValidCtxt* _valid_ctxt cdef xmlschema.xmlSchemaSAXPlugStruct* _sax_plug cdef bint _add_default_attributes + def __cinit__(self): + self._valid_ctxt = NULL + self._sax_plug = NULL + self._add_default_attributes = False def __dealloc__(self): self.disconnect() @@ -179,6 +184,7 @@ xmlschema.xmlSchemaFreeValidCtxt(self._valid_ctxt) cdef _ParserSchemaValidationContext copy(self): + assert self._schema is not None, "_ParserSchemaValidationContext not initialised" return self._schema._newSaxValidator( self._add_default_attributes) Modified: 
lxml/trunk/src/lxml/xpath.pxi ============================================================================== --- lxml/trunk/src/lxml/xpath.pxi (original) +++ lxml/trunk/src/lxml/xpath.pxi Tue Apr 27 21:41:38 2010 @@ -130,6 +130,13 @@ cdef _XPathContext _context cdef python.PyThread_type_lock _eval_lock cdef _ErrorLog _error_log + def __cinit__(self): + self._xpathCtxt = NULL + if config.ENABLE_THREADING: + self._eval_lock = python.PyThread_allocate_lock() + if self._eval_lock is NULL: + python.PyErr_NoMemory() + self._error_log = _ErrorLog() def __init__(self, namespaces, extensions, enable_regexp, smart_strings): @@ -139,17 +146,13 @@ import warnings warnings.warn(u"This version of libxml2 has a known XPath bug. " + \ u"Use it at your own risk.") - self._error_log = _ErrorLog() self._context = _XPathContext(namespaces, extensions, enable_regexp, None, smart_strings) - if config.ENABLE_THREADING: - self._eval_lock = python.PyThread_allocate_lock() - if self._eval_lock is NULL: - python.PyErr_NoMemory() property error_log: def __get__(self): + assert self._error_log is not None, "XPath evaluator not initialised" return self._error_log.copy() def __dealloc__(self): @@ -195,7 +198,7 @@ result = python.PyThread_acquire_lock( self._eval_lock, python.WAIT_LOCK) if result == 0: - raise ParserError, u"parser locking failed" + raise XPathError, u"XPath evaluator locking failed" return 0 cdef void _unlock(self): @@ -266,6 +269,8 @@ cdef xpath.xmlXPathContext* xpathCtxt cdef int ns_register_status cdef _Document doc + _assertValidNode(element) + _assertValidDoc(element._doc) self._element = element doc = element._doc _XPathEvaluatorBase.__init__(self, namespaces, extensions, @@ -300,6 +305,7 @@ cdef xpath.xmlXPathObject* xpathObj cdef _Document doc cdef char* c_path + assert self._xpathCtxt is not NULL, "XPath context not initialised" path = _utf8(_path) doc = self._element._doc @@ -351,6 +357,7 @@ cdef xmlDoc* c_doc cdef _Document doc cdef char* c_path + assert 
self._xpathCtxt is not NULL, "XPath context not initialised" path = _utf8(_path) doc = self._element._doc @@ -417,6 +424,8 @@ """ cdef xpath.xmlXPathCompExpr* _xpath cdef bytes _path + def __cinit__(self): + self._xpath = NULL def __init__(self, path, *, namespaces=None, extensions=None, regexp=True, smart_strings=True): @@ -440,6 +449,7 @@ cdef _Document document cdef _Element element + assert self._xpathCtxt is not NULL, "XPath context not initialised" document = _documentOrRaise(_etree_or_element) element = _rootNodeOrRaise(_etree_or_element) Modified: lxml/trunk/src/lxml/xslt.pxi ============================================================================== --- lxml/trunk/src/lxml/xslt.pxi (original) +++ lxml/trunk/src/lxml/xslt.pxi Tue Apr 27 21:41:38 2010 @@ -124,8 +124,7 @@ cdef xmlDoc* _xslt_doc_loader(char* c_uri, tree.xmlDict* c_dict, int parse_options, void* c_ctxt, xslt.xsltLoadType c_type) nogil: - # no Python objects here, may be called without thread context ! - # when we declare a Python object, Pyrex will INCREF(None) ! + # nogil => no Python objects here, may be called without thread context ! cdef xmlDoc* c_doc cdef xmlDoc* result cdef void* c_pcontext @@ -186,7 +185,7 @@ See `XSLT`. 
""" cdef xslt.xsltSecurityPrefs* _prefs - def __init__(self, *, read_file=True, write_file=True, create_dir=True, + def __cinit__(self, *, read_file=True, write_file=True, create_dir=True, read_network=True, write_network=True): self._prefs = xslt.xsltNewSecurityPrefs() if self._prefs is NULL: @@ -271,10 +270,12 @@ cdef xslt.xsltTransformContext* _xsltCtxt cdef _ReadOnlyElementProxy _extension_element_proxy cdef dict _extension_elements - def __init__(self, namespaces, extensions, enable_regexp, - build_smart_strings): + def __cinit__(self): self._xsltCtxt = NULL self._extension_elements = EMPTY_DICT + + def __init__(self, namespaces, extensions, enable_regexp, + build_smart_strings): if extensions is not None and extensions: for ns_name_tuple, extension in extensions.items(): if ns_name_tuple[0] is None: @@ -320,7 +321,7 @@ quote escaping. """ cdef bytes strval - def __init__(self, strval): + def __cinit__(self, strval): self.strval = _utf8(strval) @@ -356,6 +357,9 @@ cdef XSLTAccessControl _access_control cdef _ErrorLog _error_log + def __cinit__(self): + self._c_style = NULL + def __init__(self, xslt_input, *, extensions=None, regexp=True, access_control=None): cdef xslt.xsltStylesheet* c_style @@ -413,7 +417,8 @@ self._xslt_resolver_context._c_style_doc is not NULL: tree.xmlFreeDoc(self._xslt_resolver_context._c_style_doc) # this cleans up the doc copy as well - xslt.xsltFreeStylesheet(self._c_style) + if self._c_style is not NULL: + xslt.xsltFreeStylesheet(self._c_style) property error_log: u"The log of errors and warnings of an XSLT execution." 
@@ -477,6 +482,7 @@ cdef tree.xmlDict* c_dict cdef char** params + assert self._c_style is not NULL, "XSLT stylesheet not initialised" input_doc = _documentOrRaise(_input) root_node = _rootNodeOrRaise(_input) @@ -645,6 +651,7 @@ cdef XSLT _copyXSLT(XSLT stylesheet): cdef XSLT new_xslt cdef xmlDoc* c_doc + assert stylesheet._c_style is not NULL, "XSLT stylesheet not initialised" new_xslt = NEW_XSLT(XSLT) # without calling __init__() new_xslt._access_control = stylesheet._access_control new_xslt._error_log = _ErrorLog() @@ -668,6 +675,11 @@ cdef char* _buffer cdef Py_ssize_t _buffer_len cdef Py_ssize_t _buffer_refcnt + def __cinit__(self): + self._buffer = NULL + self._buffer_len = 0 + self._buffer_refcnt = 0 + cdef _saveToStringAndSize(self, char** s, int* l): cdef _Document doc cdef int r @@ -719,7 +731,7 @@ def __getbuffer__(self, Py_buffer* buffer, int flags): cdef int l if buffer is NULL: - return # LOCK + return if self._buffer is NULL or flags & python.PyBUF_WRITABLE: self._saveToStringAndSize(&buffer.buf, &l) buffer.len = l @@ -748,7 +760,7 @@ def __releasebuffer__(self, Py_buffer* buffer): if buffer is NULL: - return # UNLOCK + return if buffer.buf is self._buffer: self._buffer_refcnt -= 1 if self._buffer_refcnt == 0: @@ -778,9 +790,6 @@ result = <_XSLTResultTree>_newElementTree(doc, None, _XSLTResultTree) result._xslt = xslt result._profile = profile - result._buffer = NULL - result._buffer_refcnt = 0 - result._buffer_len = 0 return result # functions like "output" and "write" are a potential security risk, but we @@ -831,6 +840,7 @@ cdef _Element result_node cdef char* c_href cdef xmlAttr* c_attr + _assertValidNode(self) if self._c_node.content is NULL: raise ValueError, u"PI lacks content" hrefs = _FIND_PI_HREF(u' ' + funicode(self._c_node.content)) @@ -852,6 +862,7 @@ # ID reference to embedded stylesheet # try XML:ID lookup + _assertValidDoc(self._doc) c_href += 1 # skip leading '#' c_attr = tree.xmlGetID(self._c_node.doc, c_href) if c_attr is not NULL 
and c_attr.doc is self._c_node.doc: Modified: lxml/trunk/src/lxml/xsltext.pxi ============================================================================== --- lxml/trunk/src/lxml/xsltext.pxi (original) +++ lxml/trunk/src/lxml/xsltext.pxi Tue Apr 27 21:41:38 2010 @@ -37,6 +37,7 @@ cdef xmlNode* c_parent cdef xmlNode* c_node cdef xmlNode* c_context_node + assert context._xsltCtxt is not NULL, "XSLT context not initialised" c_context_node = _roNodeOf(node) #assert c_context_node.doc is context._xsltContext.node.doc, \ # "switching input documents during transformation is not currently supported" @@ -80,6 +81,7 @@ cdef xmlNode* c_parent cdef xslt.xsltTransformContext* c_ctxt = context._xsltCtxt cdef xmlNode* c_old_output_parent = c_ctxt.insert + assert context._xsltCtxt is not NULL, "XSLT context not initialised" # output_parent node is used for adding results instead of # elements list used in apply_templates, that's easier and allows to From scoder at codespeak.net Wed Apr 28 14:45:03 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 28 Apr 2010 14:45:03 +0200 (CEST) Subject: [Lxml-checkins] r74168 - in lxml/trunk: . doc Message-ID: <20100428124503.CB4F6282B9D@codespeak.net> Author: scoder Date: Wed Apr 28 14:45:02 2010 New Revision: 74168 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/build.txt Log: r5572 at lenny: sbehnel | 2010-04-28 14:43:42 +0200 require Cython 0.13 Modified: lxml/trunk/doc/build.txt ============================================================================== --- lxml/trunk/doc/build.txt (original) +++ lxml/trunk/doc/build.txt Wed Apr 28 14:45:02 2010 @@ -46,9 +46,9 @@ you want to be an lxml developer, then you do need a working Cython installation. You can use EasyInstall_ to install it:: - easy_install Cython>=0.12 + easy_install Cython>=0.13 -lxml currently requires Cython 0.12, later release versions should +lxml currently requires Cython 0.13, later release versions should work as well. 
@@ -226,7 +226,6 @@
 
     STATIC_DEPS=true sudo easy_install lxml
 
-
 
 Static linking on Windows
 -------------------------

From scoder at codespeak.net Wed Apr 28 14:45:07 2010
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Wed, 28 Apr 2010 14:45:07 +0200 (CEST)
Subject: [Lxml-checkins] r74169 - lxml/trunk
Message-ID: <20100428124507.A57AD282B9D@codespeak.net>

Author: scoder
Date: Wed Apr 28 14:45:05 2010
New Revision: 74169

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/INSTALL.txt
Log:
 r5573 at lenny: sbehnel | 2010-04-28 14:44:55 +0200
 note on using lxml.etree together with python-libxml2 bindings

Modified: lxml/trunk/INSTALL.txt
==============================================================================
--- lxml/trunk/INSTALL.txt (original)
+++ lxml/trunk/INSTALL.txt Wed Apr 28 14:45:05 2010
@@ -9,8 +9,9 @@
   1 Requirements
   2 Installation
   3 Building lxml from sources
-  4 MS Windows
-  5 MacOS-X
+  4 Using lxml with python-libxml2
+  5 MS Windows
+  6 MacOS-X
 
 
 Requirements
@@ -90,6 +91,23 @@
 .. _`mailing list`: http://codespeak.net/mailman/listinfo/lxml-dev
 
 
+Using lxml with python-libxml2
+------------------------------
+
+If you want to use lxml together with the official libxml2 Python
+bindings (maybe because one of your dependencies uses it), you must
+build lxml statically. Otherwise, the two packages will interfere in
+places where the libxml2 library requires global configuration, which
+can have any kind of effect from disappearing functionality to crashes
+in either of the two.
+
+To get a static build, either pass the ``--static-deps`` option to the
+setup.py script, or run ``easy_install`` with the ``STATIC_DEPS``
+environment variable set to true, i.e.
+
+    STATIC_DEPS=true easy_install lxml
+
+
 MS Windows
 ----------