From rampeters at gmail.com Sun Apr 1 16:09:10 2007
From: rampeters at gmail.com (Ram Peters)
Date: Sun, 1 Apr 2007 10:09:10 -0400
Subject: [lxml-dev] lxml objectify
Message-ID: <81b45360704010709x1358a95fw274e339a048b0aad@mail.gmail.com>

Breakfast at Tiffany's
Movie
Classic

Borat
Movie
Comedy

How do you represent DVD id=1 and its elements, and DVD id=2 and its
elements, as children of the root "Library"?

Like this?

from lxml import etree
from lxml import objectify
root = objectify.Element("Library")
child[1] = objectify.Element("DVD", id="1")
root.new_child = child[1]

Thank you

From jholg at gmx.de Mon Apr 2 14:50:39 2007
From: jholg at gmx.de (jholg at gmx.de)
Date: Mon, 02 Apr 2007 14:50:39 +0200
Subject: [lxml-dev] lxml objectify
In-Reply-To: <81b45360704010709x1358a95fw274e339a048b0aad@mail.gmail.com>
References: <81b45360704010709x1358a95fw274e339a048b0aad@mail.gmail.com>
Message-ID: <20070402125039.321780@gmx.net>

> Breakfast at Tiffany's
> Movie
> Classic
>
> Borat
> Movie
> Comedy
>
> How do you represent DVD id=1 and its elements, and DVD id=2 and its
> elements, as children of the root "Library"?

This should give you an idea:

>>> root = objectify.Element("Library")
>>> root.DVD = [ objectify.Element("DVD", id="1"), objectify.Element("DVD", id="2") ]
>>> root.DVD[0].title = "Breakfast at Tiffany's"
>>> root.DVD[1].title = "Borat"
>>> print objectify.dump(root)
Library = None [ObjectifiedElement]
    DVD = None [ObjectifiedElement]
      * id = '1'
        title = "Breakfast at Tiffany's" [StringElement]
    DVD = None [ObjectifiedElement]
      * id = '2'
        title = 'Borat' [StringElement]
>>> print etree.tostring(root, pretty_print=True)
Breakfast at Tiffany's
Borat
>>>

HTH,
Holger

-- 
"Feel free" - 10 GB Mailbox, 100 FreeSMS/month ...
Try GMX TopMail now: http://www.gmx.net/de/go/topmail

From cz at gocept.com Mon Apr 2 17:56:45 2007
From: cz at gocept.com (Christian Zagrodnick)
Date: Mon, 2 Apr 2007 17:56:45 +0200
Subject: [lxml-dev] ObjectPath for "current node"
Message-ID:

Hi,

in the object paths can be relative, like '.foo.bar'. Shouldn't it be
possible then to create the object path of '.' referencing the current
node? There would be a more general way to use path.find and path.setattr
then. At least for me anyway :)

-- 
Christian Zagrodnick

gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891

From JCheng at opsware.com Tue Apr 3 22:41:33 2007
From: JCheng at opsware.com (Jeff Cheng)
Date: Tue, 3 Apr 2007 13:41:33 -0700
Subject: [lxml-dev] Document is not valid XML Schema
Message-ID: <3B8C9773FAA87E448B042757CCD1FDA70150E35F@mayhem.opsware.com>

I am trying to validate XML files against their respective schemas.
However, lxml complains that the schemas are not valid.

Python 2.5 (r25:51908, Mar 13 2007, 08:13:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> f = file("linux-definitions-schema.xsd")
>>> xmlschema_doc = etree.parse(f)
>>> xmlschema = etree.XMLSchema(xmlschema_doc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xmlschema.pxi", line 61, in etree.XMLSchema.__init__
etree.XMLSchemaParseError: Document is not valid XML Schema

The schemas are from Mitre
(http://oval.mitre.org/language/download/schema/version5.2/index.html#downloads)
and are assumed to be valid.

I am using python-2.5, lxml-1.2.1, and libxml2-2.6.26 on cygwin.

Any help would be greatly appreciated.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20070403/888d35d2/attachment.htm

From jholg at gmx.de Wed Apr 4 15:06:39 2007
From: jholg at gmx.de (jholg at gmx.de)
Date: Wed, 04 Apr 2007 15:06:39 +0200
Subject: [lxml-dev] [objectify] patch/changes proposal: xsiannotate, deannotate
Message-ID: <20070404130639.321050@gmx.net>

Hi all,

I suggest

1. adding two functions to lxml.objectify:

def xsiannotate(element_or_tree, ignore_old=True):
    """Recursively annotates the elements of an XML tree with 'xsi:type'
    attributes.

    If the 'ignore_old' keyword argument is True (the default), current
    'xsi:type' attributes will be ignored and replaced. Otherwise, they will
    be checked and only replaced if they no longer fit the current text value.
    """
    [...]

Note: This will simply take the first schema type in the
PyType.xmlSchemaTypes list.

def deannotate(element_or_tree, pytype=True, xsi=True):
    """Recursively de-annotate the elements of an XML tree by removing
    'pytype' and/or 'type' attributes.

    If the 'pytype' keyword argument is True (the default), 'pytype'
    attributes will be removed. If the 'xsi' keyword argument is True (the
    default), 'xsi:type' attributes will be removed.
    """
    [...]

2. Patching annotate() so that it leaves pytype="str" as is if
ignore_old=False. Currently it starts type guessing/xsi:type lookup, because
PyType('str', ...) has no type_check function.

3. Modifying the objectify.Element() factory to default nsmap to

    nsmap = { "py": PYTYPE_NAMESPACE, "xsi": XML_SCHEMA_INSTANCE_NS }

if it is None. This keeps the namespace information in non-root nodes nice
and clean with the cool new 1.3 lookup-if-ns-is-defined-up-in-the-tree
functionality.

4. Patching DataElement so that it allows someone to use an _xsitype argument
that is not registered (or even plain wrong). Currently, this raises a
KeyError, whereas using an unknown pytype defaults to StringElement.

5. Restructuring the pytype <--> XML Schema type mapping a bit, as e.g. the
XML Schema type "integer" fits a Python long better than a Python int with
regard to value space.

a) I propose the following for non-fractional types:

    pytype = PyType('int', int, IntElement)
    pytype.xmlSchemaTypes = ("int", "short", "byte", "unsignedShort",
                             "unsignedByte",)
    pytype.register()

    pytype = PyType('long', long, LongElement)
    pytype.xmlSchemaTypes = ("integer", "nonPositiveInteger", "negativeInteger",
                             "long", "nonNegativeInteger", "unsignedLong",
                             "unsignedInt", "positiveInteger",)
    pytype.register()

(Anything that is guaranteed to fit into a signed 32-bit integer becomes a
Python int, everything else a Python long. Maybe slightly arbitrary, but OK
for 32-bit machines :-) This does not have big implications in practice;
it's more or less for consistency.

One thing remains: xsiannotate()-ing an IntElement >= 2**31 will still give
it xsi:type "int", which is not really valid for that schema type. This could
be addressed with a more elaborate type_check for PyType("int", ...), but I'm
unsure about the performance drawback and whether it's worth the effort.

b) Add all (non-list) XML Schema datatypes that restrict "string" to
PyType('str', ...). As StringElement is the default, these end up as
StringElement today anyway. Adding them can result in faster lookup, as no
type guessing will be invoked, and it is just for completeness... It also
does not hurt someone who defines a custom class that handles a special
schema datatype, as that will override the objectify default. Something
along the lines of:

    pytype = PyType('str', None, StringElement)
    pytype.xmlSchemaTypes = ("string", "normalizedString", "token", "language",
                             "Name", "NCName", "ID", "IDREF", "ENTITY",
                             "NMTOKEN", )

What do you say?

I've attached the patch/doc/tests for the proposed behaviour, for
discussion, based on trunk versions of 2007/04/03.

Holger

-- 
"Feel free" - 10 GB Mailbox, 100 FreeSMS/month ...
Jetzt GMX TopMail testen: http://www.gmx.net/de/go/topmail -------------- next part -------------- *** ./src/lxml/objectify.pyx.ORIG Tue Apr 3 16:19:47 2007 --- ./src/lxml/objectify.pyx Wed Apr 4 12:15:36 2007 *************** *** 711,719 **** if text is None: return 0 text = text.lower() ! if text == 'false': return 0 ! elif text == 'true': return 1 else: raise ValueError, "Invalid boolean value: '%s'" % text --- 711,719 ---- if text is None: return 0 text = text.lower() ! if text in ('false', '0'): return 0 ! elif text in ('true', '1'): return 1 else: raise ValueError, "Invalid boolean value: '%s'" % text *************** *** 882,894 **** cdef _registerPyTypes(): pytype = PyType('int', int, IntElement) ! pytype.xmlSchemaTypes = ("integer", "positiveInteger", "negativeInteger", ! "nonNegativeInteger", "nonPositiveInteger", ! "int", "unsignedInt", "short", "unsignedShort") pytype.register() pytype = PyType('long', long, LongElement) ! pytype.xmlSchemaTypes = ("long", "unsignedLong") pytype.register() pytype = PyType('float', float, FloatElement) --- 882,896 ---- cdef _registerPyTypes(): pytype = PyType('int', int, IntElement) ! pytype.xmlSchemaTypes = ("int", "short", "byte", "unsignedShort", ! "unsignedByte",) ! pytype.register() pytype = PyType('long', long, LongElement) ! pytype.xmlSchemaTypes = ("integer", "nonPositiveInteger", "negativeInteger", ! "long", "nonNegativeInteger", "unsignedLong", ! "unsignedInt", "positiveInteger",) pytype.register() pytype = PyType('float', float, FloatElement) *************** *** 900,906 **** pytype.register() pytype = PyType('str', None, StringElement) ! pytype.xmlSchemaTypes = ("string", "normalizedString") pytype.register() pytype = PyType('none', None, NoneElement) --- 902,910 ---- pytype.register() pytype = PyType('str', None, StringElement) ! pytype.xmlSchemaTypes = ("string", "normalizedString", "token", "language", ! "Name", "NCName", "ID", "IDREF", "ENTITY", ! 
"NMTOKEN", ) pytype.register() pytype = PyType('none', None, NoneElement) *************** *** 1425,1431 **** """Recursively annotates the elements of an XML tree with 'pytype' attributes. ! If the 'ignore_old' keyword argument is True (the default), current attributes will be ignored and replaced. Otherwise, they will be checked and only replaced if they no longer fit the current text value. """ --- 1429,1435 ---- """Recursively annotates the elements of an XML tree with 'pytype' attributes. ! If the 'ignore_old' keyword argument is True (the default), current 'pytype' attributes will be ignored and replaced. Otherwise, they will be checked and only replaced if they no longer fit the current text value. """ *************** *** 1450,1461 **** c_node, _PYTYPE_NAMESPACE, _PYTYPE_ATTRIBUTE_NAME) if old_value is not None and old_value != TREE_PYTYPE: pytype = _PYTYPE_DICT.get(old_value) ! if pytype is not None: value = textOf(c_node) try: if not (pytype).type_check(value): pytype = None ! except ValueError: pytype = None if pytype is None: --- 1454,1467 ---- c_node, _PYTYPE_NAMESPACE, _PYTYPE_ATTRIBUTE_NAME) if old_value is not None and old_value != TREE_PYTYPE: pytype = _PYTYPE_DICT.get(old_value) ! # StrType does not have a typecheck but is the default anyway, ! # so just accept it if given as type information ! if pytype not in (None, StrType): value = textOf(c_node) try: if not (pytype).type_check(value): pytype = None ! except IGNORABLE_ERRORS: pytype = None if pytype is None: *************** *** 1502,1507 **** --- 1508,1639 ---- _cstr(pytype.name)) tree.END_FOR_EACH_ELEMENT_FROM(c_node) + def xsiannotate(element_or_tree, ignore_old=True): + """Recursively annotates the elements of an XML tree with 'xsi:type' + attributes. + + If the 'ignore_old' keyword argument is True (the default), current + 'xsi:type' attributes will be ignored and replaced. Otherwise, they will be + checked and only replaced if they no longer fit the current text value. 
+ """ + cdef _Element element + cdef _Document doc + cdef int ignore + cdef tree.xmlNode* c_node + cdef tree.xmlNs* c_ns + cdef python.PyObject* dict_result + element = cetree.rootNodeOrRaise(element_or_tree) + doc = element._doc + ignore = bool(ignore_old) + + StrType = _PYTYPE_DICT.get('str') + c_node = element._c_node + tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_node, c_node, 1) + xsitype = None + pytype = None + value = None + if not ignore: + # check that old value is valid + xsitype = cetree.attributeValueFromNsName(c_node, + _XML_SCHEMA_INSTANCE_NS, + "type") + if xsitype is not None: + dict_result = python.PyDict_GetItem(_SCHEMA_TYPE_DICT, xsitype) + if dict_result is not NULL: + pytype = dict_result + # StrType does not have a typecheck but is the default anyway, + # so just accept it if given as type information + if pytype not in (None, StrType): + value = textOf(c_node) + try: + if not (pytype).type_check(value): + xsitype = None + except IGNORABLE_ERRORS: + xsitype = None + + if xsitype is None: + # check for pytype hint + value = cetree.attributeValueFromNsName( + c_node, _PYTYPE_NAMESPACE, _PYTYPE_ATTRIBUTE_NAME) + + if value is not None: + if value != TREE_PYTYPE: + pytype = _PYTYPE_DICT.get(value) + if pytype not in (None, StrType): + value = textOf(c_node) + try: + if not (pytype).type_check(value): + pytype = None + except IGNORABLE_ERRORS: + pytype = None + if pytype is not None: + try: + # pytype->xsi:type is a 1:n mapping, simply take first item + xsitype = (pytype)._schema_types[0] + except IndexError: + xsitype = None + else: + xsitype = TREE_PYTYPE + if xsitype is None: + # try to guess type + if cetree.findChildForwards(c_node, 0) is NULL: + # element has no children => data class + if value is None: + value = textOf(c_node) + if value is not None: + for type_check, tested_pytype in _TYPE_CHECKS: + try: + if type_check(value) is not False: + pytype = tested_pytype + break + except IGNORABLE_ERRORS: + pass + else: + pytype = StrType + try: + # 
pytype->xsi:type is a 1:n mapping so simply take the first + xsitype = (pytype)._schema_types[0] + except IndexError: + xsitype = None + + if xsitype is None or xsitype == TREE_PYTYPE: + # delete attribute if it exists + cetree.delAttributeFromNsName(c_node, _XML_SCHEMA_INSTANCE_NS, "type") + else: + # update or create attribute + c_ns = cetree.findOrBuildNodeNs(doc, c_node, _XML_SCHEMA_INSTANCE_NS) + tree.xmlSetNsProp(c_node, c_ns, "type", _cstr(xsitype)) + tree.END_FOR_EACH_ELEMENT_FROM(c_node) + + + def deannotate(element_or_tree, pytype=True, xsi=True): + """Recursively de-annotate the elements of an XML tree by removing 'pytype' + and/or 'type' attributes. + + If the 'pytype' keyword argument is True (the default), 'pytype' attributes + will be removed. If the 'xsi' keyword argument is True (the default), + 'xsi:type' attributes will be removed. + """ + cdef _Element element + cdef tree.xmlNode* c_node + + element = cetree.rootNodeOrRaise(element_or_tree) + c_node = element._c_node + if pytype is True and xsi is True: + tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_node, c_node, 1) + removed = cetree.delAttributeFromNsName(c_node, _PYTYPE_NAMESPACE, _PYTYPE_ATTRIBUTE_NAME) + removed = cetree.delAttributeFromNsName(c_node, _XML_SCHEMA_INSTANCE_NS, "type") + tree.END_FOR_EACH_ELEMENT_FROM(c_node) + elif pytype is True: + tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_node, c_node, 1) + removed = cetree.delAttributeFromNsName(c_node, _PYTYPE_NAMESPACE, _PYTYPE_ATTRIBUTE_NAME) + tree.END_FOR_EACH_ELEMENT_FROM(c_node) + else: + tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_node, c_node, 1) + removed = cetree.delAttributeFromNsName(c_node, _XML_SCHEMA_INSTANCE_NS, "type") + tree.END_FOR_EACH_ELEMENT_FROM(c_node) + + ################################################################################ # Module level parser setup *************** *** 1558,1563 **** --- 1690,1697 ---- _attributes = attrib if _pytype is None: _pytype = TREE_PYTYPE + if nsmap is None: + nsmap = { "py": PYTYPE_NAMESPACE, "xsi": 
XML_SCHEMA_INSTANCE_NS } _attributes[PYTYPE_ATTRIBUTE] = _pytype return _makeElement(_tag, None, _attributes, nsmap) *************** *** 1566,1576 **** """Create a new element with a Python value and XML attributes taken from keyword arguments or a dictionary passed as second argument. ! Automatically adds a 'pyval' attribute for the Python type of the value, ! if the type can be identified. If '_pyval' or '_xsi' are among the keyword arguments, they will be used instead. """ - cdef _Element element if attrib is not None: if python.PyDict_Size(_attributes): attrib.update(_attributes) --- 1700,1709 ---- """Create a new element with a Python value and XML attributes taken from keyword arguments or a dictionary passed as second argument. ! Automatically adds a 'pytype' attribute for the Python type of the value, ! if the type can be identified. If '_pytype' or '_xsi' are among the keyword arguments, they will be used instead. """ if attrib is not None: if python.PyDict_Size(_attributes): attrib.update(_attributes) *************** *** 1578,1584 **** if _xsi is not None: python.PyDict_SetItem(_attributes, XML_SCHEMA_INSTANCE_TYPE_ATTR, _xsi) if _pytype is None: ! _pytype = _SCHEMA_TYPE_DICT[_xsi].name if python._isString(_value): strval = _value --- 1711,1720 ---- if _xsi is not None: python.PyDict_SetItem(_attributes, XML_SCHEMA_INSTANCE_TYPE_ATTR, _xsi) if _pytype is None: ! # allow for s.o. using unregistered or even wrong xsi:type names ! pytype_lookup = _SCHEMA_TYPE_DICT.get(_xsi) ! if pytype_lookup is not None: ! 
_pytype = pytype_lookup.name if python._isString(_value): strval = _value -------------- next part -------------- *** ./doc/objectify.txt.ORIG Wed Apr 4 12:19:54 2007 --- ./doc/objectify.txt Wed Apr 4 14:35:18 2007 *************** *** 693,698 **** --- 693,753 ---- s = '5' [StringElement] * xsi:type = 'string' + Again, there is a utility function ``xsiannotate()`` that recursively + generates the "xsi:type" attribute for the elements of a tree:: + + >>> root = objectify.fromstring('''\ + ... test5true + ... ''') + >>> print objectify.dump(root) + root = None [ObjectifiedElement] + a = 'test' [StringElement] + b = 5 [IntElement] + c = True [BoolElement] + + >>> objectify.xsiannotate(root) + + >>> print objectify.dump(root) + root = None [ObjectifiedElement] + a = 'test' [StringElement] + * xsi:type = 'string' + b = 5 [IntElement] + * xsi:type = 'int' + c = True [BoolElement] + * xsi:type = 'boolean' + + Note, however, that ``xsiannotate()`` will always use the first XML Schema + datatype that is defined for any given Python type, see also + `Defining additional data classes`_. + + The utility function ``deannotate()`` can be used to get rid of 'py:pytype' + and/or 'xsi:type' information:: + + >>> root = objectify.fromstring('''\ + ... + ... 5 + ... 5 + ... 5 + ... ''') + >>> objectify.annotate(root) + >>> print objectify.dump(root) + root = None [ObjectifiedElement] + d = 5.0 [FloatElement] + * xsi:type = 'double' + * py:pytype = 'float' + l = 5L [LongElement] + * xsi:type = 'long' + * py:pytype = 'long' + s = '5' [StringElement] + * xsi:type = 'string' + * py:pytype = 'str' + >>> objectify.deannotate(root) + >>> print objectify.dump(root) + root = None [ObjectifiedElement] + d = 5 [IntElement] + l = 5 [IntElement] + s = 5 [IntElement] + For convenience, the ``DataElement()`` factory creates an Element with a Python value in one step. 
You can pass the required Python type name or the XSI type name:: *************** *** 714,721 **** >>> root.x = objectify.DataElement(5, _xsi="integer") >>> print objectify.dump(root) root = None [ObjectifiedElement] ! x = 5 [IntElement] ! * py:pytype = 'int' * xsi:type = 'integer' There is a side effect of the type lookup. If you assign a string value using --- 769,776 ---- >>> root.x = objectify.DataElement(5, _xsi="integer") >>> print objectify.dump(root) root = None [ObjectifiedElement] ! x = 5L [LongElement] ! * py:pytype = 'long' * xsi:type = 'integer' There is a side effect of the type lookup. If you assign a string value using -------------- next part -------------- *** ./src/lxml/tests/test_objectify.py.ORIG Wed Apr 4 10:45:47 2007 --- ./src/lxml/tests/test_objectify.py Wed Apr 4 12:18:13 2007 *************** *** 13,18 **** --- 13,22 ---- from lxml import objectify + XML_SCHEMA_INSTANCE_NS = "http://www.w3.org/2001/XMLSchema-instance" + XML_SCHEMA_INSTANCE_TYPE_ATTR = "{%s}type" % XML_SCHEMA_INSTANCE_NS + XML_SCHEMA_NIL_ATTR = "{%s}nil" % XML_SCHEMA_INSTANCE_NS + xml_str = '''\ *************** *** 28,34 **** """Test cases for lxml.objectify """ etree = etree ! def XML(self, xml): return self.etree.XML(xml, self.parser) --- 32,38 ---- """Test cases for lxml.objectify """ etree = etree ! def XML(self, xml): return self.etree.XML(xml, self.parser) *************** *** 356,375 **** XML = self.XML root = XML('''\ ! 5 ! 5 ! 5 ''') ! self.assert_(isinstance(root.a[0], objectify.IntElement)) ! self.assertEquals(5, root.a[0]) ! ! self.assert_(isinstance(root.a[1], objectify.StringElement)) ! self.assertEquals("5", root.a[1]) ! ! self.assert_(isinstance(root.a[2], objectify.FloatElement)) ! self.assertEquals(5.0, root.a[2]) def test_type_str_sequence(self): XML = self.XML --- 360,428 ---- XML = self.XML root = XML('''\ ! true ! false ! 1 ! 0 ! ! 5 ! 5 ! ! 5 ! 5 ! 5 ! 5 ! 5 ! 5 ! 5 ! 5 ! 5 ! 5 ! ! 5 ! 5 ! 5 ! 5 ! 5 ! 5 ! 5 ! 5 ! ! 5 ! 5 ! 5 ! 5 ! 5 ! ! ''') ! 
for b in root.b: ! self.assert_(isinstance(b, objectify.BoolElement)) ! self.assertEquals(True, root.b[0]) ! self.assertEquals(False, root.b[1]) ! self.assertEquals(True, root.b[2]) ! self.assertEquals(False, root.b[3]) ! ! for f in root.f: ! self.assert_(isinstance(f, objectify.FloatElement)) ! self.assertEquals(5, f) ! ! for s in root.s: ! self.assert_(isinstance(s, objectify.StringElement)) ! self.assertEquals("5", s) ! ! for l in root.l: ! self.assert_(isinstance(l, objectify.LongElement)) ! self.assertEquals(5l, l) ! ! for i in root.i: ! self.assert_(isinstance(i, objectify.IntElement)) ! self.assertEquals(5, i) ! ! self.assert_(isinstance(root.n, objectify.NoneElement)) ! self.assertEquals(None, root.n) def test_type_str_sequence(self): XML = self.XML *************** *** 444,453 **** root.b = False self.assertFalse(root.b) ! def test_type_annotation(self): XML = self.XML root = XML(u'''\ ! 5 test 1.1 --- 497,667 ---- root.b = False self.assertFalse(root.b) ! def test_pytype_annotation(self): XML = self.XML root = XML(u'''\ ! ! 5 ! test ! 1.1 ! \uF8D2 ! true ! ! ! 5 ! 5 ! 23 ! 42 ! 300 ! 2 ! ! ''') ! objectify.annotate(root) ! ! child_types = [ c.get(objectify.PYTYPE_ATTRIBUTE) ! for c in root.iterchildren() ] ! self.assertEquals("int", child_types[0]) ! self.assertEquals("str", child_types[1]) ! self.assertEquals("float", child_types[2]) ! self.assertEquals("str", child_types[3]) ! self.assertEquals("bool", child_types[4]) ! self.assertEquals("none", child_types[5]) ! self.assertEquals(None, child_types[6]) ! self.assertEquals("float", child_types[7]) ! self.assertEquals("float", child_types[8]) ! self.assertEquals("str", child_types[9]) ! self.assertEquals("int", child_types[10]) ! self.assertEquals("int", child_types[11]) ! self.assertEquals("int", child_types[12]) ! ! self.assertEquals("true", root.n.get(XML_SCHEMA_NIL_ATTR)) ! ! def test_pytype_annotation_use_old(self): ! XML = self.XML ! root = XML(u'''\ ! ! 5 ! test ! 1.1 ! \uF8D2 ! true ! ! ! 5 ! 5 ! 
23 ! 42 ! 300 ! 2 ! ! ''') ! objectify.annotate(root, ignore_old=False) ! ! child_types = [ c.get(objectify.PYTYPE_ATTRIBUTE) ! for c in root.iterchildren() ] ! self.assertEquals("int", child_types[0]) ! self.assertEquals("str", child_types[1]) ! self.assertEquals("float", child_types[2]) ! self.assertEquals("str", child_types[3]) ! self.assertEquals("bool", child_types[4]) ! self.assertEquals("none", child_types[5]) ! self.assertEquals(None, child_types[6]) ! self.assertEquals("float", child_types[7]) ! self.assertEquals("float", child_types[8]) ! self.assertEquals("str", child_types[9]) ! self.assertEquals("str", child_types[10]) ! self.assertEquals("float", child_types[11]) ! self.assertEquals("long", child_types[12]) ! ! self.assertEquals("true", root.n.get(XML_SCHEMA_NIL_ATTR)) ! ! def test_xsitype_annotation(self): ! XML = self.XML ! root = XML(u'''\ ! ! 5 ! test ! 1.1 ! \uF8D2 ! true ! ! ! 5 ! 5 ! 23 ! 42 ! 300 ! 2 ! ! ''') ! objectify.xsiannotate(root) ! ! child_types = [ c.get(XML_SCHEMA_INSTANCE_TYPE_ATTR) ! for c in root.iterchildren() ] ! self.assertEquals("int", child_types[0]) ! self.assertEquals("string", child_types[1]) ! self.assertEquals("float", child_types[2]) ! self.assertEquals("string", child_types[3]) ! self.assertEquals("boolean", child_types[4]) ! self.assertEquals(None, child_types[5]) ! self.assertEquals(None, child_types[6]) ! self.assertEquals("int", child_types[7]) ! self.assertEquals("int", child_types[8]) ! self.assertEquals("int", child_types[9]) ! self.assertEquals("string", child_types[10]) ! self.assertEquals("float", child_types[11]) ! self.assertEquals("integer", child_types[12]) ! ! self.assertEquals("true", root.n.get(XML_SCHEMA_NIL_ATTR)) ! ! def test_xsitype_annotation_use_old(self): ! XML = self.XML ! root = XML(u'''\ ! ! 5 ! test ! 1.1 ! \uF8D2 ! true ! ! ! 5 ! 5 ! 23 ! 42 ! 300 ! 2 ! ! ''') ! objectify.xsiannotate(root, ignore_old=False) ! ! child_types = [ c.get(XML_SCHEMA_INSTANCE_TYPE_ATTR) ! 
for c in root.iterchildren() ] ! self.assertEquals("int", child_types[0]) ! self.assertEquals("string", child_types[1]) ! self.assertEquals("float", child_types[2]) ! self.assertEquals("string", child_types[3]) ! self.assertEquals("boolean", child_types[4]) ! self.assertEquals(None, child_types[5]) ! self.assertEquals(None, child_types[6]) ! self.assertEquals("double", child_types[7]) ! self.assertEquals("float", child_types[8]) ! self.assertEquals("string", child_types[9]) ! self.assertEquals("string", child_types[10]) ! self.assertEquals("float", child_types[11]) ! self.assertEquals("integer", child_types[12]) ! ! self.assertEquals("true", root.n.get(XML_SCHEMA_NIL_ATTR)) ! ! def test_deannotation(self): ! XML = self.XML ! root = XML(u'''\ ! 5 test 1.1 *************** *** 456,464 **** --- 670,756 ---- 5 + 5 + 23 + 42 + 300 + 2 + + ''') + objectify.deannotate(root) + + for c in root.getiterator(): + self.assertEquals(None, c.get(XML_SCHEMA_INSTANCE_TYPE_ATTR)) + self.assertEquals(None, c.get(objectify.PYTYPE_ATTRIBUTE)) + + self.assertEquals("true", root.n.get(XML_SCHEMA_NIL_ATTR)) + + def test_pytype_deannotation(self): + XML = self.XML + root = XML(u'''\ + + 5 + test + 1.1 + \uF8D2 + true + + + 5 + 5 + 23 + 42 + 300 + 2 + + ''') + objectify.xsiannotate(root) + objectify.deannotate(root, xsi=False) + + child_types = [ c.get(XML_SCHEMA_INSTANCE_TYPE_ATTR) + for c in root.iterchildren() ] + self.assertEquals("int", child_types[0]) + self.assertEquals("string", child_types[1]) + self.assertEquals("float", child_types[2]) + self.assertEquals("string", child_types[3]) + self.assertEquals("boolean", child_types[4]) + self.assertEquals(None, child_types[5]) + self.assertEquals(None, child_types[6]) + self.assertEquals("int", child_types[7]) + self.assertEquals("int", child_types[8]) + self.assertEquals("int", child_types[9]) + self.assertEquals("string", child_types[10]) + self.assertEquals("float", child_types[11]) + self.assertEquals("integer", child_types[12]) + + 
self.assertEquals("true", root.n.get(XML_SCHEMA_NIL_ATTR)) + + for c in root.getiterator(): + self.assertEquals(None, c.get(objectify.PYTYPE_ATTRIBUTE)) + + def test_xsitype_deannotation(self): + XML = self.XML + root = XML(u'''\ + + 5 + test + 1.1 + \uF8D2 + true + + + 5 + 5 + 23 + 42 + 300 + 2 ''') objectify.annotate(root) + objectify.deannotate(root, pytype=False) child_types = [ c.get(objectify.PYTYPE_ATTRIBUTE) for c in root.iterchildren() ] *************** *** 470,475 **** --- 762,777 ---- self.assertEquals("none", child_types[5]) self.assertEquals(None, child_types[6]) self.assertEquals("float", child_types[7]) + self.assertEquals("float", child_types[8]) + self.assertEquals("str", child_types[9]) + self.assertEquals("int", child_types[10]) + self.assertEquals("int", child_types[11]) + self.assertEquals("int", child_types[12]) + + self.assertEquals("true", root.n.get(XML_SCHEMA_NIL_ATTR)) + + for c in root.getiterator(): + self.assertEquals(None, c.get(XML_SCHEMA_INSTANCE_TYPE_ATTR)) def test_change_pytype_attribute(self): XML = self.XML *************** *** 881,887 **** self.assertEquals( etree.tostring(new_root), etree.tostring(root)) - def test_suite(): suite = unittest.TestSuite() --- 1183,1188 ---- From jholg at gmx.de Wed Apr 4 15:26:35 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Wed, 04 Apr 2007 15:26:35 +0200 Subject: [lxml-dev] Document is not valid XML Schema Message-ID: <20070404132635.36750@gmx.net> Hi, one thing I've noticed - they seem to have different schema notions for each platform & version, and some seem to use Schematron (I'm not familiar with that): Complete Schema - has all documentation embedded and the Schematron mark-up. Minimal Schema - includes the raw xml schema only. Maybe you used a "Complete Schema" and should rather use a "Minimal schema" for lxml, which has W3C XML Schema and RelaxNG support? FWIW, Holger -- "Feel free" - 10 GB Mailbox, 100 FreeSMS/Monat ... 
Try GMX TopMail now: http://www.gmx.net/de/go/topmail

From albert.brandl at tttech.com Thu Apr 5 15:12:59 2007
From: albert.brandl at tttech.com (Albert Brandl)
Date: Thu, 5 Apr 2007 15:12:59 +0200
Subject: [lxml-dev] Document is not valid XML Schema
In-Reply-To: <3B8C9773FAA87E448B042757CCD1FDA70150E35F@mayhem.opsware.com>
References: <3B8C9773FAA87E448B042757CCD1FDA70150E35F@mayhem.opsware.com>
Message-ID: <20070405131259.GC23892@tttech.com>

On Tue, Apr 03, 2007 at 01:41:33PM -0700, Jeff Cheng wrote:
> I am trying to validate XML files against their respective schemas.
> However, lxml complains that the schemas are not valid.

You might get more information if you catch the exception and evaluate
the error log. Here is a description of how to do this:

http://codespeak.net/lxml/api.html#error-handling-on-exceptions

Regards,
Albert

From rampeters at gmail.com Fri Apr 6 21:33:45 2007
From: rampeters at gmail.com (Ram Peters)
Date: Fri, 6 Apr 2007 15:33:45 -0400
Subject: [lxml-dev] Parsing Received XML: Getting Childs and Assign
Message-ID: <81b45360704061233p7fb29de2kecef91b4fd58b80@mail.gmail.com>

Breakfast at Tiffany's
Movie
Classic

Borat
Movie
Comedy

How to parse this xml received from a client using lxml?
I will be using lxml objectify. First I need to get first child (This
is where I am stuck.) and assign it to the python model. Get second
child and assign it to the python model, so on. I looked at the
documentation, it's kind of hard to grasp for a newb.

Thank you.

From stefan_ml at behnel.de Sat Apr 7 09:13:19 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 07 Apr 2007 09:13:19 +0200
Subject: [lxml-dev] ObjectPath for "current node"
In-Reply-To:
References:
Message-ID: <4617448F.1040203@behnel.de>

Hi,

Christian Zagrodnick wrote:
> in the object paths can be relative, like '.foo.bar'. Shouldn't it be
> possible then to create the object path of '.' referencing the current
> node?

Good idea. Implemented on the trunk.
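A minimal sketch of relative object paths in use, assuming Python 3 and a current lxml release rather than the 2007 trunk discussed here; the sample document is made up for illustration. A leading dot starts the path at whatever element you pass in, so one ObjectPath can be reused with find() and setattr() on different subtrees, and the bare "." path is the "current node" case from the request above.

```python
from lxml import objectify

# Hypothetical sample document, made up for this sketch.
root = objectify.fromstring("<root><foo><bar>1</bar></foo></root>")

# Absolute path: the first segment must match the tag of the element passed in.
absolute = objectify.ObjectPath("root.foo.bar")

# Relative path: the leading dot means "start at the element passed in",
# so the root tag is not part of the path.
relative = objectify.ObjectPath(".foo.bar")
print(relative.find(root).text)  # text of the existing <bar> element

# setattr() replaces the value at the end of the path (creating it if missing).
relative.setattr(root, "2")
print(absolute.find(root).text)

# The bare "." path refers to the element it is applied to.
current = objectify.ObjectPath(".")
print(current.find(root.foo).tag)
```

Because relative paths ignore the starting element's tag, the same path object works no matter which element it is applied to.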
Stefan

From stefan_ml at behnel.de Sat Apr 7 18:11:37 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 07 Apr 2007 18:11:37 +0200
Subject: [lxml-dev] Parsing Received XML: Getting Childs and Assign
In-Reply-To: <81b45360704061233p7fb29de2kecef91b4fd58b80@mail.gmail.com>
References: <81b45360704061233p7fb29de2kecef91b4fd58b80@mail.gmail.com>
Message-ID: <4617C2B9.1090607@behnel.de>

Hi,

Ram Peters wrote:
> Breakfast at Tiffany's
> Movie
> Classic
>
> Borat
> Movie
> Comedy
>
> How to parse this xml received from a client using lxml?
> I will be using lxml objectify.

At the end of this section in the docs, you will find the command that
does it:

http://codespeak.net/lxml/dev/objectify.html#creating-objectify-trees

namely:

>>> root = objectify.fromstring("")

Or, if you want to parse from a file, set up a parser as this section
describes

http://codespeak.net/lxml/dev/objectify.html#setting-up-lxml-objectify

and then do something like this:

>>> et = etree.parse(myfilename, parser)

You might also want to read the doc page on parsing:

http://codespeak.net/lxml/dev/parsing.html

> First I need to get first child (This
> is where I am stuck.)

>>> root.DVD

> and assign it to the python model.

???

> Get second
> child and assign it to the python model, so on. I looked at the
> documentation, it's kind of hard to grasp for a newb.

Why don't you run a loop over them?

>>> for dvd in root.DVD:
...     print dvd.get("id")
1
2

I have a slight intuition that it might also help you to read the Python
tutorial first.
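A minimal sketch of the loop suggested above, assuming Python 3 and a current lxml (the thread itself uses Python 2 print statements). The Dvd dataclass is a hypothetical stand-in for the "python model" from the question. The child tags <format> and <genre> are also assumptions: the XML tags of the original message were lost, and only <title> appears in Holger's earlier reply.

```python
from dataclasses import dataclass
from lxml import objectify

# Assumed input: tag names other than <Library>, <DVD> and <title> are guesses.
XML = b"""\
<Library>
  <DVD id="1">
    <title>Breakfast at Tiffany's</title>
    <format>Movie</format>
    <genre>Classic</genre>
  </DVD>
  <DVD id="2">
    <title>Borat</title>
    <format>Movie</format>
    <genre>Comedy</genre>
  </DVD>
</Library>
"""

@dataclass
class Dvd:
    """Hypothetical Python model for one <DVD> element."""
    dvd_id: int
    title: str
    media_format: str
    genre: str

def load_library(xml_bytes):
    """Parse the XML and map every <DVD> child of the root to a Dvd object."""
    root = objectify.fromstring(xml_bytes)
    dvds = []
    # root.DVD is the first <DVD> child; iterating over it yields all
    # <DVD> siblings in document order.
    for dvd in root.DVD:
        dvds.append(Dvd(
            dvd_id=int(dvd.get("id")),
            # .text gives the plain string content of each child element.
            title=dvd.title.text,
            media_format=dvd.format.text,
            genre=dvd.genre.text,
        ))
    return dvds

for item in load_library(XML):
    print(item)
```

Reading .text keeps every field a plain string, so the model does not depend on objectify's type guessing for the element values.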
Regards, Stefan From stefan_ml at behnel.de Sat Apr 7 18:14:53 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 07 Apr 2007 18:14:53 +0200 Subject: [lxml-dev] Document is not valid XML Schema In-Reply-To: <20070404132635.36750@gmx.net> References: <20070404132635.36750@gmx.net> Message-ID: <4617C37D.8020106@behnel.de> Hi, jholg at gmx.de wrote: > one thing I've noticed - they seem to have different schema notions for > each platform & version, and some seem to use Schematron (I'm not familiar with that): just a quick note here: you can compile lxml's current trunk with Schematron support if you uncomment the respective line at the end of the etree.pyx file. Have fun, Stefan From cz at gocept.com Tue Apr 10 11:12:07 2007 From: cz at gocept.com (Christian Zagrodnick) Date: Tue, 10 Apr 2007 11:12:07 +0200 Subject: [lxml-dev] ObjectPath for "current node" References: <4617448F.1040203@behnel.de> Message-ID: On 2007-04-07 09:13:19 +0200, Stefan Behnel said: > Hi, > > Christian Zagrodnick wrote: >> in the object paths can be relative, like '.foo.bar'. Shouldn't it be >> possible then to create the object path of '.' referencing the current >> node? > > Good idea. Implemented on the trunk. Great! Thanks :) From stefan_ml at behnel.de Tue Apr 10 20:45:56 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 10 Apr 2007 20:45:56 +0200 Subject: [lxml-dev] [objectify] patch/changes proposal: xsiannotate, deannotate In-Reply-To: <20070404130639.321050@gmx.net> References: <20070404130639.321050@gmx.net> Message-ID: <461BDB64.2050809@behnel.de> Hi Holger, thanks a lot for the patch. I took a deeper look at it this morning and it doesn't really look like the cleanest one on earth to me. I applied it anyway and cleaned it up to match my idea of what you were going after.
The new patch is attached, please verify that this is what you wanted. jholg at gmx.de wrote: > Hi all, I suggest > > 1. adding two functions to lxml.objectify: > > def xsiannotate(element_or_tree, ignore_old=True): """Recursively annotates > the elements of an XML tree with 'xsi:type' attributes. > > If the 'ignore_old' keyword argument is True (the default), current > 'xsi:type' attributes will be ignored and replaced. Otherwise, they will > be checked and only replaced if they no longer fit the current text value. > """ [...] Sure. I think that's helpful as objectify supports two annotations after all. > Note: Will simply take the first schema type in PyType.xmlSchemaTypes list. > Hmmm. I guess that should do, but I'd prefer having that documented. > def deannotate(element_or_tree, pytype=True, xsi=True): """Recursively > de-annotate the elements of an XML tree by removing 'pytype' and/or 'type' > attributes. > > If the 'pytype' keyword argument is True (the default), 'pytype' attributes > will be removed. If the 'xsi' keyword argument is True (the default), > 'xsi:type' attributes will be removed. """ [...] Sure, definitely helpful for cleanup purposes. > 2. Patching annotate() so that it allows for leaving pytype="str" as is if > ignore_old=False. Currently it will start type-guessing/xsi-type lookup as > PyType(str,...) uses no type_check function. I think that's the right thing to do. > 3. Modifying the objectify.Element() factory to default nsmap to nsmap = { > "py": PYTYPE_NAMESPACE, "xsi": XML_SCHEMA_INSTANCE_NS } if it is None. This > keeps namespace-information in non-root nodes nice and clean with the cool > new 1.3 lookup-if-ns-is-defined-up-in-the-tree functionality. Cool, hum? 8o] Ok, although not everyone will use annotations, we already add them internally in DataElement() if we can figure out the type, so this is also helpful. > 4. Patch DataElement so that it allows s.o. using an _xsitype argument that > is not registered (or even plain wrong). 
Currently, this raises a KeyError, > whereas using an unknown pytype defaults to StringElement. Sure, why not. We're all adults, right? > 5. Restructure pytype<-->XML Schema type mapping a bit, as e.g. XML Schema > type integer fits better to a Python long than a Python int regarding value > space. Definitely. And since Python transmogrifies ints into longs already if it has to, assuming longs can never hurt. > (Anything that fits in 32bit becomes a Python int, everything else a Python > long. Maybe slightly arbitrary, but ok for 32bit-machines :-) Sure, no one will ever need more than 32 bits to address those 670KB of memory, right? :) Have you checked what the XML Schema datatypes spec says here? I know that C doesn't really define an int across platforms, but they do, right? > This does not > have big implications in practice, it's more or less for consistency. One > thing remains: xsiannotate()-ing an IntElement >=2**31 will still xsi:type > that as "int", which is not really valid regarding schema types. This could > be addressed by using a more elaborate type_check for PyType("int",...) but > I'm unsure about performance drawback and if it's worth the effort. Whatever. Just wait until someone complains. :) From the point of view of objectify's internal use of type annotations, I can't see a major difference here, so whatever we change in the future should not impact current programs (famous last words...) Note that you can always override this by hand by replacing the 'int()' function with something that additionally checks the resulting value. Requires a bit of shuffling in the PyType registry, but since most people will not care anyway... > b) Add all (non-list) XML Schema datatypes that restrict "string" to > PyType('str', ...) As StringElement is the default these end up in > StringElement anyway today. Adding them can result in faster lookup as no > type-guessing will be invoked, and just for completeness...
Sure, and it definitely doesn't hurt as it's still only a lookup in a rather small dictionary. Thanks for the effort, Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: secondtry.patch Type: text/x-patch Size: 30083 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070410/59f342b0/attachment-0001.bin From jimrees at itasoftware.com Wed Apr 11 01:05:47 2007 From: jimrees at itasoftware.com (Jim Rees) Date: Tue, 10 Apr 2007 19:05:47 -0400 Subject: [lxml-dev] greetings, and another bug... Message-ID: Itamar told me I'd get best results by joining this list. First off lxml.etree is great. I'm relatively new to both Python and XML, and this is the only way to code XML stuff. I have found a few bugs, the first set of which Itamar may have already forwarded along. Today I found another. A valid XSD construct fails to validate. (Not the document to be validated, but the schema doc itself). If the minInclusive/maxInclusive facets are removed, the problem goes away. xmllint running against the same libxml2 shlibs has no problem with this. This has been consistent across 1.1.2, 1.2.1, and 1.3.beta. import lxml.etree as ET import sys trivial_schema = """ """ schematree = ET.XML(trivial_schema) validator = ET.XMLSchema(schematree) trivial_document = """ 99.99999999999999999999 """ doctree = ET.XML(trivial_document) validator.assertValid(doctree) print "Okay." From jholg at gmx.de Wed Apr 11 09:12:37 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Wed, 11 Apr 2007 09:12:37 +0200 Subject: [lxml-dev] [objectify] patch/changes proposal: xsiannotate, deannotate In-Reply-To: <461BDB64.2050809@behnel.de> References: <20070404130639.321050@gmx.net> <461BDB64.2050809@behnel.de> Message-ID: <20070411071237.117000@gmx.net> Hi Stefan, > and cleaned it up to match my idea of what you were going after. The new > patch > is attached, please verify that this is what you wanted. 
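The `xsiannotate()` and `deannotate()` functions proposed in this patch thread are part of `lxml.objectify` today. The sketch below assumes a current lxml; the `count` child is illustrative, and it deliberately checks only that the `xsi:type` annotation appears and both kinds of annotation disappear again, not which XML Schema type name objectify picks.

```python
from lxml import objectify

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
PYTYPE = "{http://codespeak.net/lxml/objectify/pytype}pytype"

root = objectify.Element("Library")
root.count = 2  # objectify records a py:pytype hint on the new child

objectify.xsiannotate(root)           # add xsi:type attributes to data elements
xsi_type = root.count.get(XSI_TYPE)   # a prefixed XML Schema type name
print("xsi:type after xsiannotate:", xsi_type)

objectify.deannotate(root, pytype=True, xsi=True)  # strip both kinds again
print("after deannotate:", root.count.get(XSI_TYPE), root.count.get(PYTYPE))
```

After `deannotate()`, objectify falls back to guessing the type from the text, so `root.count` still behaves as an integer.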
Just tested the new patch and it works fine for me. > Have you checked what the XML Schema datatypes spec says here? I know that > C > doesn't really define an int across platforms, but they do, right? Right: """ [Definition:] int is ·derived· from long by setting the value of ·maxInclusive· to be 2147483647 and ·minInclusive· to be -2147483648. The ·base type· of int is long. ... """ I've taken the Schema types <--> Python type mapping from the XML Schema Datatypes spec. The "widest" or least restricted Schema type is the first in each type registration, e.g. "string" for the Schema types that are string beasts. Thanks a lot, Holger From novalis at openplans.org Thu Apr 12 23:20:06 2007 From: novalis at openplans.org (David Turner) Date: Thu, 12 Apr 2007 17:20:06 -0400 Subject: [lxml-dev] Weird bug Message-ID: <1176412806.14910.60.camel@novalis.openplans.org> I'm trying to write some code that uses lxml, and I run into a weird memory error. Unfortunately, I can't seem to create a small testcase. So this bug report probably won't be very useful. How to reproduce: Check out the following code: http://codespeak.net/svn/z3/deliverance/branches/parallel python setup.py develop python deliverance/test_wsgi.py This will sometimes run just fine (that is, produce no output).
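Holger's quote gives the `xs:int` bounds, which is what the "more elaborate type_check" idea from the proposal would need to respect. A pure-Python range check could look like the sketch below; the function name is hypothetical, it chooses only among `int`, `long` and the unbounded `integer` (ignoring narrower types such as `short`), and the `xs:long` bounds are those of the XML Schema Datatypes spec.

```python
XSD_INT_MIN, XSD_INT_MAX = -2**31, 2**31 - 1     # xs:int facets
XSD_LONG_MIN, XSD_LONG_MAX = -2**63, 2**63 - 1   # xs:long facets

def xsd_type_for_int(value: int) -> str:
    """Hypothetical helper: pick 'int', 'long' or 'integer' for a Python int."""
    if XSD_INT_MIN <= value <= XSD_INT_MAX:
        return "int"
    if XSD_LONG_MIN <= value <= XSD_LONG_MAX:
        return "long"
    return "integer"  # xs:integer has no bounds

for v in (2387, 2**31, -2**63, 10**20):
    print(v, xsd_type_for_int(v))
```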
Sometimes, it will give the following error, which doesn't really seem to matter, since it's "most likely raised during interpreter shutdown": Exception in thread Thread-70 (most likely raised during interpreter shutdown): Traceback (most recent call last): File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap File "/home/novalis/deliverance/src/deliverance/transcluder/threadpool.py", line 91, in run File "/home/novalis/deliverance/src/deliverance/transcluder/tasklist.py", line 87, in get File "/usr/lib64/python2.4/threading.py", line 197, in wait exceptions.TypeError: 'NoneType' object is not callable Unhandled exception in thread started by Error in sys.excepthook: Original exception was: [nothing is printed here] ------------ And sometimes, there's an error in the actual test: --------- Traceback (most recent call last): File "deliverance/test_wsgi.py", line 361, in ? x[0](*x[1:]) File "deliverance/test_wsgi.py", line 156, in do_aggregate html_string_compare(res.body, res2.body) File "deliverance/test_wsgi.py", line 61, in html_string_compare raise ValueError( ValueError: Comparison failed between actual: ================== I am a title Some text

Paragraph one

Paragraph two

external body text

expected: ================== I am a title Some text

Paragraph one

Paragraph two

external body text

Report: children length differs, 4 != 3 children 1 do not match: head ------------ Running valgrind shows a couple of memory errors. The first is in xmlFreeNode, when it attempts to get the dict from a doc that has been freed. The node in question is created at line 327 of tasklist.py in transcluder -- but the error comes later, during garbage collection. If anyone has any ideas, I'm all ears. From stefan_ml at behnel.de Fri Apr 13 08:44:25 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2007 08:44:25 +0200 Subject: [lxml-dev] Weird bug In-Reply-To: <1176412806.14910.60.camel@novalis.openplans.org> References: <1176412806.14910.60.camel@novalis.openplans.org> Message-ID: <461F26C9.1030907@behnel.de> Hi, David Turner wrote: > I'm trying to write some code that uses lxml, and I run into a weird > memory error. > > Unfortunately, I can't seem to create a small testcase. So this bug > report probably won't be very useful. Thanks for the report. However, I can't see anything related to lxml from your stack traces, so before I try to reproduce this, would you mind trying it with the latest trunk version of lxml? You didn't state which version you were using, so I assume it was a release version. > Running valgrind shows a couple of memory errors. The first is in > xmlFreeNode, when it attempts to get the dict from a doc that has been > freed. The node in question is created at line 327 of tasklist.py in > transcluder -- but the error comes later, during garbage collection. Could you send me the valgrind log? bzip2 is fine. Thanks, Stefan From stefan_ml at behnel.de Fri Apr 13 18:05:14 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2007 18:05:14 +0200 Subject: [lxml-dev] greetings, and another bug... In-Reply-To: References: Message-ID: <461FAA3A.2000808@behnel.de> Hi, Jim Rees wrote: > Itamar told me I'd get best results by joining this list. Definitely the best place for it. > First off lxml.etree is great. 
I'm relatively new to both Python > and XML, and this is the only way to code XML stuff. Not sure what you mean with "the only way", but I guess you were just rephrasing the obvious "the best way". ;) > I have found a few bugs, the first set of which Itamar may have > already forwarded along. I don't think he did. I would like to see them reported on the list so that we can see what to do about them. > Today I found another. A valid XSD > construct fails to validate. (Not the document to be validated, but > the schema doc itself). If the minInclusive/maxInclusive facets are > removed, the problem goes away. xmllint running against the same > libxml2 shlibs has no problem with this. > > This has been consistent across 1.1.2, 1.2.1, and 1.3.beta. > > import lxml.etree as ET > import sys > > trivial_schema = """ > > > > > > > > > > """ > > schematree = ET.XML(trivial_schema) > validator = ET.XMLSchema(schematree) > > trivial_document = """ > 99.99999999999999999999 > """ > > doctree = ET.XML(trivial_document) > > validator.assertValid(doctree) > > print "Okay." Okay, I tested this and I can't see any problems with the current trunk nor with 1.2. I'm using libxml2 2.6.27 here, what's the version reported by lxml on your side? Regards, Stefan From novalis at openplans.org Fri Apr 13 18:17:19 2007 From: novalis at openplans.org (David Turner) Date: Fri, 13 Apr 2007 12:17:19 -0400 Subject: [lxml-dev] Weird bug In-Reply-To: <461F26C9.1030907@behnel.de> References: <1176412806.14910.60.camel@novalis.openplans.org> <461F26C9.1030907@behnel.de> Message-ID: <1176481039.21362.19.camel@novalis.openplans.org> On Fri, 2007-04-13 at 08:44 +0200, Stefan Behnel wrote: > Hi, > > David Turner wrote: > > I'm trying to write some code that uses lxml, and I run into a weird > > memory error. > > > > Unfortunately, I can't seem to create a small testcase. So this bug > > report probably won't be very useful. > > Thanks for the report. 
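For reports like Jim's, it helps to quote the versions lxml itself reports and to reduce the problem to a tiny schema. The sketch below assumes current lxml and Python 3; since the schema in the report was stripped by the archive, the decimal element with `minInclusive`/`maxInclusive` facets is a hypothetical stand-in, and only clearly valid and invalid values are checked.

```python
from lxml import etree

# Versions worth quoting in a bug report (each is a tuple of integers).
print("lxml:", etree.LXML_VERSION)
print("libxml2 compiled against:", etree.LIBXML_COMPILED_VERSION)
print("libxml2 in use:", etree.LIBXML_VERSION)

# Hypothetical schema: the one in the report was stripped from the archive.
DECIMAL_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="value">
    <xs:simpleType>
      <xs:restriction base="xs:decimal">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="100"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.XML(DECIMAL_XSD))

def is_valid(text):
    """Validate a single <value> element against the facet schema."""
    return schema.validate(etree.XML(f"<value>{text}</value>"))

print(is_valid("42.5"), is_valid("150"))
```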
However, I can't see anything related to lxml from your > stack traces, so before I try to reproduce this, would you mind trying it with > the latest trunk version of lxml? You didn't state which version you were > using, so I assume it was a release version. Actually, I'm using the latest trunk, and libxml2.6.27 > > Running valgrind shows a couple of memory errors. The first is in > > xmlFreeNode, when it attempts to get the dict from a doc that has been > > freed. The node in question is created at line 327 of tasklist.py in > > transcluder -- but the error comes later, during garbage collection. > > Could you send me the valgrind log? bzip2 is fine. It's small, so I attached it here. -------------- next part -------------- A non-text attachment was scrubbed... Name: valgrind.log.21815.bz2 Type: application/x-bzip Size: 1175 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070413/c3d00e71/attachment.bin From ianb at colorstudy.com Sun Apr 15 00:47:46 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Sat, 14 Apr 2007 17:47:46 -0500 Subject: [lxml-dev] finding the line number of a parsed element In-Reply-To: <46005CC9.9010803@behnel.de> References: <200703151442.15024.srichter@cosmos.phy.tufts.edu> <45FABCA6.7030700@behnel.de> <46005CC9.9010803@behnel.de> Message-ID: <46215A12.5010701@colorstudy.com> Stefan Behnel wrote: > Hi everyone, > > Stefan Behnel wrote: >> There is no API for it, but internally, we have this information for parsed >> trees, at least the line number - note that exceptions contain the line number >> already. So we could easily add a property "_line" to elements that returns >> the line number at which the element was parsed (*if* it was parsed). I don't >> like the fact so much that libxml2 puts a zero there > > Sorry for the FUD. I just checked and found that libxml2 is actually smarter > than I remembered from the last time I looked at this. It gives you a 1 for > the first line in the parser. 
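The line-number feature discussed in this thread is available in lxml as the `sourceline` property of elements. A minimal sketch with current lxml, using a made-up three-line document: parsed elements report the line they started on, while elements created in code report `None`.

```python
from lxml import etree

XML_TEXT = """<Library>
  <DVD id="1"/>
  <DVD id="2"/>
</Library>"""

root = etree.fromstring(XML_TEXT)
positions = [(dvd.get("id"), dvd.sourceline) for dvd in root]
print(root.sourceline, positions)    # parsed elements know their line

created = etree.SubElement(root, "DVD", id="3")
print(created.sourceline)            # None: built in code, never parsed
```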
So it's actually easy to distinguish between "no > line known" and "parsed in line x". > > That makes "el.line" a perfectly working API. I called it "el.sourceline" > though, to make it clearer that only parsing XML source produces it, not > creating Elements in any other way. I also made it writable, just in case > someone wants to add line numbers to generated trees or something. Is there a file or resource name in there somewhere too? This would be nice to have if, say, you were using xinclude to combine elements from different sources. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From ianb at colorstudy.com Sun Apr 15 02:32:52 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Sat, 14 Apr 2007 19:32:52 -0500 Subject: [lxml-dev] el.attrib.pop() Message-ID: <462172B4.9010207@colorstudy.com> Should the .attrib object have a pop method? -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From ianb at colorstudy.com Sun Apr 15 03:13:13 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Sat, 14 Apr 2007 20:13:13 -0500 Subject: [lxml-dev] LXML-based doctest output checker Message-ID: <46217C29.1070207@colorstudy.com> I have a rough but probably useful output checker for doctest that uses lxml to parse (and HTML). It's a bit like formencode.doctest_xml_compare (which uses ElementTree), but I think the output is nicer and of course the lxml aspect. It's pretty rough and highly untested, and the way it injects its output comparison into doctest is kind of lame. I suppose there's a way you could subclass the parser to use this output checker, but I didn't look into it too closely. A quick grep of doctest.py doesn't make it look easy. 
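Current lxml ships an XML-aware doctest checker, `lxml.doctestcompare.LXMLOutputChecker`, in the spirit of the checker Ian describes. The sketch below compares it with the plain `doctest.OutputChecker` on two serializations that differ only in attribute order; the DVD markup is illustrative.

```python
import doctest
from lxml.doctestcompare import LXMLOutputChecker, PARSE_XML

want = '<DVD id="1" genre="Classic"><title>Borat</title></DVD>'
got = '<DVD genre="Classic" id="1"><title>Borat</title></DVD>'

plain = doctest.OutputChecker().check_output(want, got, 0)          # string compare
markup = LXMLOutputChecker().check_output(want, got, PARSE_XML)     # tree compare
different = LXMLOutputChecker().check_output(want, '<CD id="1"/>', PARSE_XML)
print(plain, markup, different)
```

To use it for a whole doctest run, an instance can be passed to `doctest.DocTestRunner(checker=LXMLOutputChecker())`.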
For now it's here: http://svn.pythonpaste.org/Paste/WSGIFilter/trunk/wsgifilter/lxmldoctest.py -- but really we have a number of little routines around lxml that we should probably break off somewhere, as they aren't directly related to WSGIFilter. Mostly they are HTML-related things, which probably don't belong in lxml directly. Though I dunno, lxml.html? It's a grab-bag though, so I'm not really proposing that at this time. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From stefan_ml at behnel.de Sun Apr 15 11:17:18 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2007 11:17:18 +0200 Subject: [lxml-dev] finding the line number of a parsed element In-Reply-To: <46215A12.5010701@colorstudy.com> References: <200703151442.15024.srichter@cosmos.phy.tufts.edu> <45FABCA6.7030700@behnel.de> <46005CC9.9010803@behnel.de> <46215A12.5010701@colorstudy.com> Message-ID: <4621ED9E.3000505@behnel.de> Hi Ian, Ian Bicking wrote: >> Stefan Behnel wrote: >>> There is no API for it, but internally, we have this information for >>> parsed >>> trees, at least the line number - note that exceptions contain the >>> line number > Is there a file or resource name in there somewhere too? This would be > nice to have if, say, you were using xinclude to combine elements from > different sources. No, that's only stored at a per-document level (which makes sense IMHO). Regards, Stefan From stefan_ml at behnel.de Sun Apr 15 11:43:44 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2007 11:43:44 +0200 Subject: [lxml-dev] el.attrib.pop() In-Reply-To: <462172B4.9010207@colorstudy.com> References: <462172B4.9010207@colorstudy.com> Message-ID: <4621F3D0.4010404@behnel.de> Hi Ian, Ian Bicking wrote: > Should the .attrib object have a pop method? it's not in ET, but I wouldn't know why lxml shouldn't have it. "attrib" should look as much like a dict as possible.
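The dict-style `pop()` on `.attrib` discussed here is available in current lxml. A short sketch with an illustrative DVD element: `pop()` removes and returns the value, accepts a default for missing keys, and raises `KeyError` without one.

```python
from lxml import etree

dvd = etree.Element("DVD", id="1", genre="Classic")
genre = dvd.attrib.pop("genre")            # removes and returns the value
rating = dvd.attrib.pop("rating", None)    # default instead of KeyError
print(genre, rating, etree.tostring(dvd))  # Classic None b'<DVD id="1"/>'
```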
It's implemented in the trunk now. Stefan From ianb at colorstudy.com Sun Apr 15 20:36:04 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 15 Apr 2007 13:36:04 -0500 Subject: [lxml-dev] finding the line number of a parsed element In-Reply-To: <4621ED9E.3000505@behnel.de> References: <200703151442.15024.srichter@cosmos.phy.tufts.edu> <45FABCA6.7030700@behnel.de> <46005CC9.9010803@behnel.de> <46215A12.5010701@colorstudy.com> <4621ED9E.3000505@behnel.de> Message-ID: <46227094.4070406@colorstudy.com> Stefan Behnel wrote: > Hi Ian, > > Ian Bicking wrote: >>> Stefan Behnel wrote: >>>> There is no API for it, but internally, we have this information for >>>> parsed >>>> trees, at least the line number - note that exceptions contain the >>>> line number >> Is there a file or resource name in there somewhere too? This would be >> nice to have if, say, you were using xinclude to combine elements from >> different sources. > > No, that's only stored at a per-document level (which makes sense IMHO). What would you do then if you create a document with multiple sources? E.g., if you use xinclude to include elements from different sources into a single document. The line numbers will be nonsense at that point, and there's no clear place to keep track of the real source. 
-- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From tseaver at palladion.com Sun Apr 15 20:56:58 2007 From: tseaver at palladion.com (Tres Seaver) Date: Sun, 15 Apr 2007 14:56:58 -0400 Subject: [lxml-dev] finding the line number of a parsed element In-Reply-To: <46227094.4070406@colorstudy.com> References: <200703151442.15024.srichter@cosmos.phy.tufts.edu> <45FABCA6.7030700@behnel.de> <46005CC9.9010803@behnel.de> <46215A12.5010701@colorstudy.com> <4621ED9E.3000505@behnel.de> <46227094.4070406@colorstudy.com> Message-ID: <4622757A.9060909@palladion.com> Ian Bicking wrote: > Stefan Behnel wrote: >> Hi Ian, >> >> Ian Bicking wrote: >>>> Stefan Behnel wrote: >>>>> There is no API for it, but internally, we have this information for >>>>> parsed >>>>> trees, at least the line number - note that exceptions contain the >>>>> line number >>> Is there a file or resource name in there somewhere too? This would be >>> nice to have if, say, you were using xinclude to combine elements from >>> different sources. >> No, that's only stored at a per-document level (which makes sense IMHO). > > What would you do then if you create a document with multiple sources? > E.g., if you use xinclude to include elements from different sources > into a single document. The line numbers will be nonsense at that > point, and there's no clear place to keep track of the real source. Logically, wouldn't the xincluded node have its "own" document reference, with correct filename / URL, since it is just "borrowed" into the including document? I don't know if lxml's / ETree's semantics support such a notion, however. Tres.
-- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com From stefan_ml at behnel.de Sun Apr 15 21:31:11 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2007 21:31:11 +0200 Subject: [lxml-dev] finding the line number of a parsed element In-Reply-To: <4622757A.9060909@palladion.com> References: <200703151442.15024.srichter@cosmos.phy.tufts.edu> <45FABCA6.7030700@behnel.de> <46005CC9.9010803@behnel.de> <46215A12.5010701@colorstudy.com> <4621ED9E.3000505@behnel.de> <46227094.4070406@colorstudy.com> <4622757A.9060909@palladion.com> Message-ID: <46227D7F.2070100@behnel.de> Tres Seaver wrote: > Ian Bicking wrote: >>>>> Is there a file or resource name in there somewhere too? This would be >>>>> nice to have if, say, you were using xinclude to combine elements from >>>>> different sources. >>>> No, that's only stored at a per-document level (which makes sense IMHO). >>> What would you do then if you create a document with multiple sources? >>> E.g., if you use xinclude to include elements from different sources >>> into a single document. The line numbers will be nonsense at that >>> point, and there's no clear place to keep track of the real source. Right. What else should the line number be? It's the line in which the element was found by the parser. If you mix elements from different documents, this information becomes meaningless. > Logically, wouldn't the xincluded node have its "own" document > reference, with correct filename / URL, since it is just "borrowed" into > the including document? No. It will refer to the document that contains it (after the inclusion).
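Stefan's point that an element always belongs to the document that currently contains it can be observed through the per-document `docinfo.URL`. The sketch below assumes current lxml; the two example.com URLs are hypothetical labels passed as `base_url`, and `append()` moves the element from one document into the other.

```python
from lxml import etree

# Hypothetical URLs, used only as base_url labels for the two documents.
a = etree.fromstring("<a><x/></a>", base_url="http://example.com/a.xml")
b = etree.fromstring("<b/>", base_url="http://example.com/b.xml")

x = a[0]
before = x.getroottree().docinfo.URL   # the document it was parsed in

b.append(x)                            # moves <x> into the other document
after = x.getroottree().docinfo.URL    # now reports b's URL
print(before, after)
```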
> I don't know if lxml's / ETree's semantics > support such a notion, however. No. All elements in a document should always refer to this document. Stefan From jholg at gmx.de Mon Apr 16 11:59:01 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Mon, 16 Apr 2007 11:59:01 +0200 Subject: [lxml-dev] [objectify] schema type registry: QNames for xsi:type? Message-ID: <20070416095901.169710@gmx.net> Hi, I just detected a problem with the xsi-types in the objectify type registry in that they are not QNames: >>> schematree = etree.fromstring(""" ... ... ... ... ... ... ... ... ... ... ... """) >>> schema = etree.XMLSchema(schematree) >>> msg = etree.fromstring("""2387""") >>> print schema.validate(msg) 1 >>> print objectify.dump(msg) root = None [ObjectifiedElement] s = 2387 [IntElement] * xsi:type = 'xsd:string' >>> Note that s is an IntElement whereas it should be a StringElement. This goes away if changing its xsi:type to "string"; however, the doc instance then isn't valid against the schema anymore: >>> msg = etree.fromstring("""2387""") >>> print schema.validate(msg) 0 >>> print objectify.dump(msg) root = None [ObjectifiedElement] s = '2387' [StringElement] * xsi:type = 'string' >>> Is it easily possible to use QNames in the xsi-type lookup system? Holger From jimrees at itasoftware.com Tue Apr 17 17:46:41 2007 From: jimrees at itasoftware.com (Jim Rees) Date: Tue, 17 Apr 2007 11:46:41 -0400 Subject: [lxml-dev] greetings, and another bug... In-Reply-To: <461FAA3A.2000808@behnel.de> References: <461FAA3A.2000808@behnel.de> Message-ID: <82EA2B88-9E52-41F4-826C-9A356B94E413@itasoftware.com> On Apr 13, 2007, at 12:05 PM, Stefan Behnel wrote: >> I have found a few bugs, the first set of which Itamar may have >> already forwarded along. > > I don't think he did. I would like to see them reported on the list > so that we > can see what to do about them.
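The empty-element representation that Jim and Stefan discuss in the next exchange can be checked directly with current lxml: an element whose text is `None` serializes as a self-closing tag, one whose text is `""` gets an explicit end tag, and since XML treats both forms the same, parsing either back yields `text` of `None`.

```python
from lxml import etree

unset = etree.Element("Empty")            # text is None
blank = etree.Element("Empty")
blank.text = ""                            # explicitly empty text

print(etree.tostring(unset))               # b'<Empty/>'
print(etree.tostring(blank))               # b'<Empty></Empty>'

# Both forms mean the same thing in XML, so parsing gives text None either way.
print(etree.fromstring(etree.tostring(blank)).text)   # None
```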
Here's my original bug script for the first set of bugs. It reproduces against libxml version at least up to 2.6.20, and lxml version at least up to 1.3.beta. The issues here are what seem to be improper caching of successful validation results, and a minor one regarding inconsistent empty element representations. -------------- next part -------------- A non-text attachment was scrubbed... Name: lxmlbugs.py Type: text/x-python-script Size: 2047 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070417/48cc3c48/attachment.bin -------------- next part -------------- From stefan_ml at behnel.de Tue Apr 17 18:02:20 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 17 Apr 2007 18:02:20 +0200 Subject: [lxml-dev] greetings, and another bug... In-Reply-To: <82EA2B88-9E52-41F4-826C-9A356B94E413@itasoftware.com> References: <461FAA3A.2000808@behnel.de> <82EA2B88-9E52-41F4-826C-9A356B94E413@itasoftware.com> Message-ID: <4624EF8C.2050509@behnel.de> Hi, thanks for the reports. A quick shot on the easy one: Jim Rees wrote: > emptynode = ET.Element("Empty") > emptynode2 = ET.Element("Empty") > emptynode2.text = '' > > print "An empty node with unset text outputs as", ET.tostring(emptynode) > print "That string parses back in with text set to", str(ET.fromstring(ET.tostring(emptynode)).text) > print > > print "An empty node with text set to the empty string outputs as", ET.tostring(emptynode2) > print "That string parses back in with text set to", str(ET.fromstring(ET.tostring(emptynode2)).text) > print "... and re-outputs as", ET.tostring(ET.fromstring(ET.tostring(emptynode2))) On my side, this writes: > An empty node with unset text outputs as I like that. > That string parses back in with text set to None Nice. > An empty node with text set to the empty string outputs as Cool. > That string parses back in with text set to None Not really a bug as XML does not distinguish between and , so technically, this is ok. > ... 
and re-outputs as As expected. I'm pretty far from calling this a bug. I'd rather see it as a nice feature of lxml that it tries to map the empty Python string to something meaningful. I believe, if you want to make a text empty, you're well off with setting it to None. So, if you rather pass the empty string, there's likely a reason for it. Stefan From jeroen at xos.nl Fri Apr 20 13:32:56 2007 From: jeroen at xos.nl (Jeroen van Holst) Date: Fri, 20 Apr 2007 13:32:56 +0200 Subject: [lxml-dev] XSLT parameter ignored? Message-ID: <4628A4E8.2080907@xos.nl> Hello, I'm trying to pass a parameter to an XSLT object as follows: xslt = etree.parse(stylesheet) style = etree.XSLT(xslt) params = {'profile.lang': 'en'} result = style(doc, params) The stylesheet is applied, but the parameter is ignored. This works in libxslt, so what am I missing? TIA, Jeroen -- -- Jeroen van Holst -- X/OS Experts in Open Systems BV | Phone: +31 20 6938364 -- Amsterdam, The Netherlands | Fax: +31 20 6948204 From stefan_ml at behnel.de Fri Apr 20 13:58:59 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 20 Apr 2007 13:58:59 +0200 Subject: [lxml-dev] XSLT parameter ignored? In-Reply-To: <4628A4E8.2080907@xos.nl> References: <4628A4E8.2080907@xos.nl> Message-ID: <4628AB03.5030406@behnel.de> Hi, Jeroen van Holst wrote: > I'm trying to pass a parameter to an XSLT object as follows: > > xslt = etree.parse(stylesheet) > style = etree.XSLT(xslt) > params = {'profile.lang': 'en'} > result = style(doc, params) > > The stylesheet is applied, but the parameter is ignored. This works in > libxslt, so what am I missing? One thing you're missing is that lxml is not libxslt. It has a different API. Have you tried result = style(doc, **params) ? Stefan From jeroen at xos.nl Fri Apr 20 14:25:07 2007 From: jeroen at xos.nl (Jeroen van Holst) Date: Fri, 20 Apr 2007 14:25:07 +0200 Subject: [lxml-dev] XSLT parameter ignored? 
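Stefan's suggestion to unpack the dictionary with `**params` still applies in current lxml, where XSLT parameters must be passed as keyword arguments. One further detail: parameter values are evaluated as XPath expressions, so a literal string such as `en` has to be quoted, for example with `etree.XSLT.strparam()`. The stylesheet below is a hypothetical stand-in; only the `profile.lang` parameter name comes from the thread.

```python
from lxml import etree

# Hypothetical stylesheet; only the profile.lang parameter name is from the thread.
XSL = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="profile.lang" select="'de'"/>
  <xsl:template match="/">
    <out><xsl:value-of select="$profile.lang"/></out>
  </xsl:template>
</xsl:stylesheet>"""

style = etree.XSLT(etree.XML(XSL))
doc = etree.XML("<doc/>")

# Parameter values are XPath expressions: wrap literal strings with strparam()
# (or quote them yourself, e.g. "'en'"), then unpack the dict as keywords.
params = {"profile.lang": etree.XSLT.strparam("en")}
result = style(doc, **params)
print(result.getroot().text)           # en
print(style(doc).getroot().text)       # de (the stylesheet default)
```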
In-Reply-To: <4628AB03.5030406@behnel.de> References: <4628A4E8.2080907@xos.nl> <4628AB03.5030406@behnel.de> Message-ID: <4628B123.9050607@xos.nl> Hi Stefan, Stefan Behnel wrote: > One thing you're missing is that lxml is not libxslt. It has a different API. > > Have you tried > > result = style(doc, **params) > > I realize it's different, but wrongly expected the functional equivalent for passing parameters that can't be specified via name = value. Thanks for your suggestion, it works! -- -- Jeroen van Holst -- X/OS Experts in Open Systems BV | Phone: +31 20 6938364 -- Amsterdam, The Netherlands | Fax: +31 20 6948204 From dsoulayrol at free.fr Fri Apr 20 14:56:28 2007 From: dsoulayrol at free.fr (David Soulayrol) Date: Fri, 20 Apr 2007 14:56:28 +0200 Subject: [lxml-dev] Misc questions Message-ID: <1177073788.10008.10.camel@dsoulayr.neotip> Hello, I'm trying my first script with lxml, and here are some (more or less) blockers I have: * How do I generate the XML header in the output ? From documentation, I thought write() would do, but I can't get it. * Is there a way to manage DTDs ? I think I read that something is ready in CVS. Is it true ? Note that it could be the moment for me to learn XML schemas or Relax NG (which one should I choose ? :) ) * For readability only, is there a way to specify the id that is created for each namespace used ? * At last, I discovered by chance the pretty_print argument of the write method. But I did not read anything about this in documentation, nor in the pydocs. Is there another interesting source of documentation to get complete function signatures (apart from the source) ? Thanks, -- David. 
From stefan_ml at behnel.de Fri Apr 20 15:49:21 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 20 Apr 2007 15:49:21 +0200 Subject: [lxml-dev] Misc questions In-Reply-To: <1177073788.10008.10.camel@dsoulayr.neotip> References: <1177073788.10008.10.camel@dsoulayr.neotip> Message-ID: <4628C4E1.6040809@behnel.de> Hi, David Soulayrol wrote: > * How do I generate the XML header in the output ? From documentation, I > thought write() would do, but I can't get it. > > * Is there a way to manage DTDs ? I think I read that something is ready > in CVS. Is it true ? Note that it could be the moment for me to learn > XML schemas or Relax NG (which one should I choose ? :) ) Please refer to the in-development docs of lxml: http://codespeak.net/lxml/dev/ If you then still want to choose, learn RNG. > * For readability only, is there a way to specify the id that is created > for each namespace used ? You mean the namespace prefix? Pass a dictionary to Element()'s "nsmap" argument. > * At last, I discovered by chance the pretty_print argument of the write > method. But I did not read anything about this in documentation, nor in > the pydocs. It's in there now. See api.txt or api.html respectively. The dev version of the docs will replace the old docs with the next release. > Is there another interesting source of documentation to get > complete function signatures (apart from the source) ? That's a bit tricky to generate from Pyrex source. But you can try help() on a lot of objects by now. If you still find anything missing from the docs, we'd appreciate a patch to the text files in the doc directory.
Stefan From stefan_ml at behnel.de Sat Apr 21 16:18:00 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 21 Apr 2007 16:18:00 +0200 Subject: [lxml-dev] Weird bug In-Reply-To: <1176481039.21362.19.camel@novalis.openplans.org> References: <1176412806.14910.60.camel@novalis.openplans.org> <461F26C9.1030907@behnel.de> <1176481039.21362.19.camel@novalis.openplans.org> Message-ID: <462A1D18.3030808@behnel.de> Hi, David Turner wrote: > On Fri, 2007-04-13 at 08:44 +0200, Stefan Behnel wrote: >> David Turner wrote: >>> I'm trying to write some code that uses lxml, and I run into a weird >>> memory error. >>> >>> Unfortunately, I can't seem to create a small testcase. So this bug >>> report probably won't be very useful. >>> Running valgrind shows a couple of memory errors. The first is in >>> xmlFreeNode, when it attempts to get the dict from a doc that has been >>> freed. The node in question is created at line 327 of tasklist.py in >>> transcluder -- but the error comes later, during garbage collection. >> >> Could you send me the valgrind log? bzip2 is fine. > > It's small, so I attached it here. sadly, that doesn't tell me much. Also, I can't easily get your example to run, so I won't be able to test it. You appear to run a patched version of libxml2 (2.6.17, you said) as the line numbers from your valgrind trace don't match the sources. I can see that you are using iteration and you seem to be using threads. Threading is most likely required to reproduce this bug and the iteration is likely related, but I can't tell what happens here without reproducing it. Stefan From stefan_ml at behnel.de Sun Apr 22 20:41:37 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 22 Apr 2007 20:41:37 +0200 Subject: [lxml-dev] [objectify] schema type registry: QNames for xsi:type? In-Reply-To: <20070416095901.169710@gmx.net> References: <20070416095901.169710@gmx.net> Message-ID: <462BAC61.8040109@behnel.de> Hi Holger, finally coming back to this. 
jholg at gmx.de wrote: > I just detected a problem with the xsi-types in objectify type registry > in that they are not QNames: > >>>> schematree = etree.fromstring(""" > ... > ... > ... > ... > ... > ... > ... > ... > ... > ... > ... """) >>>> schema = etree.XMLSchema(schematree) >>>> msg = etree.fromstring("""2387""") >>>> print schema.validate(msg) > 1 >>>> print objectify.dump(msg) > root = None [ObjectifiedElement] > s = 2387 [IntElement] > * xsi:type = 'xsd:string' > > Note that s is an IntElement whereas it should be a StringElement. > This goes away if changing its xsi:type to "string"; however, the doc > instance then isn't valid against the schema anymore: > >>>> msg = etree.fromstring("""2387""") >>>> print schema.validate(msg) > 0 >>>> print objectify.dump(msg) > root = None [ObjectifiedElement] > s = '2387' [StringElement] > * xsi:type = 'string' > > > Is it easily possible to use QNames in the xsi-type lookup system? I believe this would be the right thing to do, as lxml should be consistent. If XMLSchema handles it one way, objectify should handle it the same way. However, it is actually harder than you might think. In ET, namespaces use the Clark notation, but the standard requires prefixes here. Assuming that people always use "xsd" as prefix is error prone, so we'd have to look up the right prefix for each element when we store it and make sure the namespace is declared. We should definitely use the xsd prefix if we declare it internally, to make it less likely that users deploy the same prefix for a different namespace. More importantly, when we look up the type, we'd have to check for the namespace referenced by the prefix to make sure it's an XMLSchema type. Alternatively, we could switch to writing out the prefixed version internally and just ignore the prefix when figuring out the type. That would prevent people from using data types from other namespaces, but that's an unlikely use case anyway.
If you want to do that, you can stick to registering a Python type. I wouldn't mind changing it to the prefixed version - as usual: better now than later. Changing this means that the typed XML that newer versions of objectify write out will not be read as expected by version 1.2. Sounds acceptable to me. I would like to hear other opinions on this before the release of 1.3, which will define the way this will be handled in the future. Stefan From scel at users.sourceforge.net Sun Apr 22 22:05:42 2007 From: scel at users.sourceforge.net (Torsten Rehn) Date: Sun, 22 Apr 2007 22:05:42 +0200 Subject: [lxml-dev] Bug in XPath evaluation Message-ID: <1177272342.7781.26.camel@gentop> Hi list, here's what I have: poc.xml: some text poc.py: #!/usr/bin/env python from lxml import etree DocTree = etree.parse("poc.xml") QueryResult = DocTree.xpath("//myns:mynode") The result (with added version info): [gentop][scel@/home/scel/workspace/lxmlbug] > ./poc.py lxml.etree: (1, 2, 1, 0) libxml used: (2, 6, 27) libxml compiled: (2, 6, 27) libxslt used: (1, 1, 17) libxslt compiled: (1, 1, 17) Traceback (most recent call last): File "./poc.py", line 9, in ? QueryResult = DocTree.xpath("//myns:mynode") File "etree.pyx", line 1256, in etree._ElementTree.xpath File "xpath.pxi", line 75, in etree._XPathEvaluatorBase.evaluate File "xpath.pxi", line 212, in etree.XPathDocumentEvaluator.__call__ File "xpath.pxi", line 105, in etree._XPathEvaluatorBase._handle_result File "xpath.pxi", line 93, in etree._XPathEvaluatorBase._raise_parse_error etree.XPathSyntaxError: error in xpath expression The expression however, is valid (or I'm just insanely stupid). I tested the same query on the same data using http://dmag.upf.edu/contorsion/query.jsp and it worked just as it should. Strangely, //*[name()='myns:mynode'] works with lxml. Regards, Torsten -- Torsten Rehn -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 827 bytes Desc: This is a digitally signed message part Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070422/5ed9597b/attachment.pgp From stefan_ml at behnel.de Mon Apr 23 08:24:51 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2007 08:24:51 +0200 Subject: [lxml-dev] Bug in XPath evaluation - not a bug :) In-Reply-To: <1177272342.7781.26.camel@gentop> References: <1177272342.7781.26.camel@gentop> Message-ID: <462C5133.80006@behnel.de> Hi, Torsten Rehn wrote: > poc.xml: > > > > > some text > > > > poc.py: > > #!/usr/bin/env python > from lxml import etree > DocTree = etree.parse("poc.xml") > QueryResult = DocTree.xpath("//myns:mynode") You should pass the namespace-prefix mapping to lxml. See the docs on this topic: http://codespeak.net/lxml/dev/xpathxslt.html#xpath > The result (with added version info): > > [gentop][scel@/home/scel/workspace/lxmlbug] > ./poc.py > lxml.etree: (1, 2, 1, 0) > libxml used: (2, 6, 27) > libxml compiled: (2, 6, 27) > libxslt used: (1, 1, 17) > libxslt compiled: (1, 1, 17) > Traceback (most recent call last): > File "./poc.py", line 9, in ? > QueryResult = DocTree.xpath("//myns:mynode") > File "etree.pyx", line 1256, in etree._ElementTree.xpath > File "xpath.pxi", line 75, in etree._XPathEvaluatorBase.evaluate > File "xpath.pxi", line 212, in etree.XPathDocumentEvaluator.__call__ > File "xpath.pxi", line 105, in etree._XPathEvaluatorBase._handle_result > File "xpath.pxi", line 93, in etree._XPathEvaluatorBase._raise_parse_error > etree.XPathSyntaxError: error in xpath expression As expected. Undefined prefixes are invalid. Stefan From jholg at gmx.de Mon Apr 23 10:01:31 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Mon, 23 Apr 2007 10:01:31 +0200 Subject: [lxml-dev] [objectify] schema type registry: QNames for xsi:type? 
In-Reply-To: <462BAC61.8040109@behnel.de> References: <20070416095901.169710@gmx.net> <462BAC61.8040109@behnel.de> Message-ID: <20070423080131.114650@gmx.net> Hi Stefan, > > Is it easily possible to use QNames in the xsi-type lookup system? > > I believe this would be the right thing to do, as lxml should be > consistent. > If XMLSchema handles it one way, objectify should handle it the same way. > > However, it is actually harder than you might think. In ET, namespaces use the > Clark notation, but the standard requires prefixes here. Assuming that people > always use "xsd" as prefix is error prone, so we'd have to look up the right > prefix for each element when we store it and make sure the namespace is > declared. We should definitely use the xsd prefix if we declare it internally, > to make it less likely that users deploy the same prefix for a different > namespace. > > More importantly, when we look up the type, we'd have to check for the > namespace referenced by the prefix to make sure it's an XMLSchema type. > > Alternatively, we could switch to writing out the prefixed version internally > and just ignore the prefix when figuring out the type. That would prevent > people from using data types from other namespaces, but that's an unlikely use > case anyway. If you want to do that, you can stick to registering a Python > type. I could certainly live with that for my application :-). > I wouldn't mind changing it to the prefixed version - as usual: better now > than later. Changing this means that the typed XML that newer versions of > objectify write out will not be read as expected by version 1.2. Sounds > acceptable to me. Again, this would work for me. I guess the objectify.Element() factory should then have the schema namespace added to its _DEFAULT_NSMAP, right? > I would like to hear other opinions on this before the release of 1.3, which > will define the way this will be handled in the future.
When parsing a document that declares the schema namespace, will the prefixed write-out be able to pick up this prefix, or will it always use "xsd"? Holger From scel at users.sourceforge.net Mon Apr 23 17:54:34 2007 From: scel at users.sourceforge.net (Torsten Rehn) Date: Mon, 23 Apr 2007 17:54:34 +0200 Subject: [lxml-dev] Bug in XPath evaluation - not a bug :) In-Reply-To: <462C5133.80006@behnel.de> References: <1177272342.7781.26.camel@gentop> <462C5133.80006@behnel.de> Message-ID: <1177343674.7781.23.camel@gentop> On Mon, 2007-04-23 at 08:24 +0200, Stefan Behnel wrote: > You should pass the namespace-prefix mapping to lxml. See the docs on this topic: > > http://codespeak.net/lxml/dev/xpathxslt.html#xpath Ah, looking at the development version's page obviously helps ;) > > etree.XPathSyntaxError: error in xpath expression > As expected. Undefined prefixes are invalid. But it is valid XPath 1.0, isn't it? I'm just a little confused by the term "XPath Syntax Error". As far as I understand the issue, the problem is not with the syntax but with lxml (or whatever lies beneath) not supporting some of it (which is ok with the W3C recommendation). I'm making that much of a problem out of it because my app processes XML documents that use namespaces quite extensively. And these namespaces may be different for every XML doc that comes along, so I would have to scan the file for xmlns attributes first (and then call the .xpath() method with the second argument as described on the page you posted), which is kind of ugly in my opinion. In my specific scenario it is a lot harder to get the namespace URI than to get the namespace prefix. Is there a good reason I am overlooking or why can I use name() in a predicate to find my node without the URI, but cannot use the better looking abbreviated syntax without an explicit predicate?
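You may not need to scan the raw text for `xmlns` attributes. lxml already exposes the prefix-to-URI bindings in scope for each element through the `nsmap` property, so the collection step can be short. Below is a minimal sketch, assuming Python 3 and a current lxml. The document, the `urn:example:*` URIs and the helper names are hypothetical.

```python
from lxml import etree

# Hypothetical document: "print" and "eshop" mark which channel an item is for.
DOC = b"""<items xmlns:print="urn:example:print" xmlns:eshop="urn:example:eshop">
  <print:item><name>TurboItem</name></print:item>
  <eshop:item><name>SuperItem</name></eshop:item>
</items>"""


def prefix_map(root):
    """Collect prefix -> namespace URI bindings declared anywhere under root.

    Element.nsmap holds the bindings in scope for that element. The default
    namespace is stored under the key None, which cannot be used as an XPath
    prefix, so it is skipped. If a prefix is bound to different URIs in
    different places, the last binding seen wins (a simplifying assumption).
    """
    mapping = {}
    for el in root.iter(etree.Element):  # elements only, no comments or PIs
        for prefix, uri in el.nsmap.items():
            if prefix is not None:
                mapping[prefix] = uri
    return mapping


def find_prefixed(root, expr):
    # Prefixes in expr are resolved against the mapping passed in here,
    # not against the prefixes written in the document.
    return root.xpath(expr, namespaces=prefix_map(root))


if __name__ == "__main__":
    root = etree.fromstring(DOC)
    print([item.findtext("name") for item in find_prefixed(root, "//print:item")])
```

A single flat mapping cannot represent one prefix bound to different URIs in different parts of a document. The sketch just keeps the last binding it sees.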
From stefan_ml at behnel.de Mon Apr 23 18:09:36 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2007 18:09:36 +0200 Subject: [lxml-dev] Bug in XPath evaluation - not a bug :) In-Reply-To: <1177343674.7781.23.camel@gentop> References: <1177272342.7781.26.camel@gentop> <462C5133.80006@behnel.de> <1177343674.7781.23.camel@gentop> Message-ID: <462CDA40.1070803@behnel.de> Hi, Torsten Rehn wrote: > On Mon, 2007-04-23 at 08:24 +0200, Stefan Behnel wrote: >> You should pass the namespace-prefix mapping to lxml. See the docs on this topic: >> >> http://codespeak.net/lxml/dev/xpathxslt.html#xpath > > Ah, looking at the development version's page obviously helps ;) Actually it's reading the documentation which helps: http://codespeak.net/lxml/api.html#xpath It's been in there for at least a year. >>> etree.XPathSyntaxError: error in xpath expression >> As expected. Undefined prefixes are invalid. > > But it is valid XPath 1.0, isn't it? I'm just a little confused by the > term "XPath Syntax Error". As far as I understand the issue, the problem > is not with the syntax but with lxml (or whatever lies beneath) not > supporting some of it (which is ok with the W3C recommendation). > I'm making that much of a problem out of it because my app processes XML > documents that use namespaces quite extensively. And these namespaces > may be different for every XML doc that comes along, so I would have to > scan the file for xmlns attributes first (and then call the .xpath() > method with the second argument as described on the page you posted), So you're really ignoring the namespace and just looking at the prefix? That's definitely an unusual use case.
What's the use in accepting any namespace in an XPath expression as long as the prefix is the same? I mean, honestly, the prefix doesn't tell you anything, right? Stefan From faassen at startifact.com Mon Apr 23 22:06:09 2007 From: faassen at startifact.com (Martijn Faassen) Date: Mon, 23 Apr 2007 22:06:09 +0200 Subject: [lxml-dev] Bug in XPath evaluation - not a bug :) In-Reply-To: <1177343674.7781.23.camel@gentop> References: <1177272342.7781.26.camel@gentop> <462C5133.80006@behnel.de> <1177343674.7781.23.camel@gentop> Message-ID: Torsten Rehn wrote: > On Mon, 2007-04-23 at 08:24 +0200, Stefan Behnel wrote: >> You should pass the namespace-prefix mapping to lxml. See the docs on this topic: >> >> http://codespeak.net/lxml/dev/xpathxslt.html#xpath > > Ah, looking at the development version's page obviously helps ;) > >>> etree.XPathSyntaxError: error in xpath expression >> As expected. Undefined prefixes are invalid. > > But it is valid XPath 1.0, isn't it? I'm just a little confused by the > term "XPath Syntax Error". As far as I understand the issue, the problem > is not with the syntax but with lxml (or whatever lies beneath) not > supporting some of it (which is ok with the W3C recommendation). I think it is indeed confusing that we call it an XPath Syntax Error. The XPath expression is indeed correct; we just haven't supplied it with enough information. I wonder if there's a way we can detect this specific problem and raise something like an XPathNamespaceError instead? I think this one bites people quite frequently, as people often forget that the prefixes in XPath are not looked up in the document but are independent of it, just like the prefixes between documents are independent.
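You can show directly that XPath prefixes are independent of the document's prefixes. The prefix in the expression only has to be bound, in the mapping passed to `xpath()`, to the same namespace URI the document uses. Below is a small sketch, assuming a current lxml. It reports an unbound prefix through a subclass of `etree.XPathError`; the exact subclass and message have varied between versions, so the sketch catches the base class. The document and URI are made up.

```python
from lxml import etree

SHOP_NS = "urn:example:shop"  # hypothetical namespace URI
# The document happens to bind that URI to the prefix "a".
root = etree.fromstring(b'<a:shop xmlns:a="urn:example:shop"><a:item/></a:shop>')


def count_items(prefix):
    # Bind a prefix of our own choosing to the document's namespace URI.
    # Matching is done on the URI, so the prefix need not be "a".
    return len(root.xpath("//%s:item" % prefix, namespaces={prefix: SHOP_NS}))


def unbound_prefix_rejected():
    # Without a mapping, "a" is unknown to the XPath engine even though the
    # document declares it; lxml raises a subclass of XPathError.
    try:
        root.xpath("//a:item")
    except etree.XPathError:
        return True
    return False


if __name__ == "__main__":
    print(count_items("a"), count_items("b"), unbound_prefix_rejected())
```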
Regards, Martijn From faassen at startifact.com Mon Apr 23 22:13:54 2007 From: faassen at startifact.com (Martijn Faassen) Date: Mon, 23 Apr 2007 22:13:54 +0200 Subject: [lxml-dev] Bug in XPath evaluation - not a bug :) In-Reply-To: <462CDA40.1070803@behnel.de> References: <1177272342.7781.26.camel@gentop> <462C5133.80006@behnel.de> <1177343674.7781.23.camel@gentop> <462CDA40.1070803@behnel.de> Message-ID: Hey, Stefan Behnel wrote: [Torsten Rehn] >> I'm making that much of a problem out of it because my app processes XML >> documents that use namespaces quite extensively. And these namespaces >> may be different for every XML doc that comes along, so I would have to >> scan the file for xmlns attributes first (and then call the .xpath() >> method with the second argument as described on the page you posted), Unfortunately any lxml implementation of this behavior would have to do the same internally, so this is not an easy one to implement. > So you're really ignoring the namespace and just looking at the prefix? That's > definitely an unusual use case. Agreed, that is indeed odd. Makes me want to find out more. :) You have documents that use namespaces extensively, but they vary widely in the kinds of namespace URIs they use for the same prefixes? How did you arrive in such a situation? > What's the use in accepting any namespace in an XPath expression as long as > the prefix is the same? I mean, honestly, the prefix doesn't tell you > anything, right? To make sure Torsten understands, ignoring the prefixes and looking at namespace URIs *is* the proper behavior for XML software. The prefixes are nothing but a shortcut, a temporary name, to refer to the namespace URI. 
This leads to confusion, and is why the ElementTree API in fact includes the whole namespace URI in the element names instead: "{http://mynamespace}foo" ("Clark notation"). ElementTree is rather strict in ignoring the prefixes entirely, which can be a bit frustrating if you are interested in the presentation of the XML document in the end. lxml follows ElementTree but offers various ways to do things with prefixes. Unfortunately in xpath the compromise is to use prefixes only to spell out the XPath expression, as using the fully qualified names would not be XPath compatible. Occasionally we've had some discussions about offering an API to do XPath queries using Clark notation. Regards, Martijn From stefan_ml at behnel.de Tue Apr 24 08:20:37 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 24 Apr 2007 08:20:37 +0200 Subject: [lxml-dev] Bug in XPath evaluation - not a bug :) In-Reply-To: References: <1177272342.7781.26.camel@gentop> <462C5133.80006@behnel.de> <1177343674.7781.23.camel@gentop> <462CDA40.1070803@behnel.de> Message-ID: <462DA1B5.3000903@behnel.de> Hi Martijn, just a quick note here. Martijn Faassen wrote: > fully qualified names would not be XPath compatible. Occasionally we've > had some discussions about offering an API to do XPath queries using > Clark notation. >>> from lxml import etree >>> root = etree.Element("{testns}root") >>> etree.SubElement(root, "{testns}test") <Element {testns}test at ...> >>> find = etree.ETXPath("{testns}test") >>> find(root) [<Element {testns}test at ...>] I guess that's actually still missing from the docs - it's been in there for a while... Stefan From gary at zope.com Tue Apr 24 13:43:09 2007 From: gary at zope.com (Gary Poster) Date: Tue, 24 Apr 2007 07:43:09 -0400 Subject: [lxml-dev] lxml Mac OS X probs? Message-ID: Hi all. I saw here http://www.openplans.org/projects/bbq-sprint/nudgenudge the following text at the bottom: """Note that the Deliverance middleware requires lxml to do the theming which is known to have problems on certain platforms, e.g.
Mac OS X.""" Googling for such only found problems in 2005 and 2006, and I didn't see an obvious reference to these on the lxml main page or FAQ. Can anyone give me an idea of why that comment might have been made, and what the current issues are, if any? Pointing to a web page would be fine... Thanks Gary From stefan_ml at behnel.de Tue Apr 24 14:39:52 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 24 Apr 2007 14:39:52 +0200 Subject: [lxml-dev] lxml Mac OS X probs? In-Reply-To: References: Message-ID: <462DFA98.7060608@behnel.de> Gary Poster wrote: > http://www.openplans.org/projects/bbq-sprint/nudgenudge > > the following text at the bottom: > > """Note that the Deliverance middleware requires lxml to do the > theming which is known to have problems on certain platforms, e.g. > Mac OS X.""" I am not aware of any problems with lxml on any platform. And I would also like to have such a statement clarified. Stefan From faassen at startifact.com Tue Apr 24 14:50:01 2007 From: faassen at startifact.com (Martijn Faassen) Date: Tue, 24 Apr 2007 14:50:01 +0200 Subject: [lxml-dev] Bug in XPath evaluation - not a bug :) In-Reply-To: <462DA1B5.3000903@behnel.de> References: <1177272342.7781.26.camel@gentop> <462C5133.80006@behnel.de> <1177343674.7781.23.camel@gentop> <462CDA40.1070803@behnel.de> <462DA1B5.3000903@behnel.de> Message-ID: <8928d4e90704240550x3cdd58fapaed7ddaaa334b0e7@mail.gmail.com> Hey, On 4/24/07, Stefan Behnel wrote: > just a quick note here. > > Martijn Faassen wrote: > > fully qualified names would not be XPath compatible. Occasionally we've > > had some discussions about offering an API to do XPath queries using > > Clark notation. > > >>> from lxml import etree > >>> root = etree.Element("{testns}root") > >>> etree.SubElement(root, "{testns}test") > <Element {testns}test at ...> > > >>> find = etree.ETXPath("{testns}test") > >>> find(root) > [<Element {testns}test at ...>] > > I guess that's actually still missing from the docs - it's been in there for a > while... Yeah.
I remember discussions on this, but I didn't remember it getting implemented. Cool! The docs still need tender loving care from a dedicated volunteer, and that shouldn't be you. Nobody can give the excuse that they don't know Pyrex here either, so we should have masses of volunteers standing up to contribute. :) Regards, Martijn From faassen at startifact.com Tue Apr 24 15:02:10 2007 From: faassen at startifact.com (Martijn Faassen) Date: Tue, 24 Apr 2007 15:02:10 +0200 Subject: [lxml-dev] lxml and binary eggs on Linux Message-ID: Hi there, In the past we've been in the habit of providing binary eggs for lxml on Linux. We've been less diligent about this recently, which is actually a good thing. I would in fact ask everybody to stop uploading binary eggs for Linux, and only do so for Windows. Why? Python interpreters in the Linux world are compiled with different options. Prominent here is UCS2 versus UCS4 for the internal unicode encoding. An egg compiled for a UCS4 python doesn't work on a UCS2 python and vice versa. There are other potential issues, such as the location of various shared libraries that might differ per platform. Uploading a binary egg means that we risk making life worse for some users, as they'll be stuck with a non-working egg. If we only upload the source (including the generated C code), lxml will compile and install itself and this should be reliable on all Linux boxes. This does however mean that people need to install the libxml2 and libxslt headers on their system (libxml2-dev and libxslt-dev on Debian/Ubuntu), otherwise the compile would fail. It would also mean we need to modify our installation instructions. Unfortunately I don't see any other way to make lxml installation more reliable on Linux, though. On Windows, because nobody has a compiler and the platform is more uniform (practically everybody runs the same compiled version of Python), we don't have this problem.
In fact we have the problem that nobody has a compiler, so we certainly need to continue uploading the binary eggs. Comments? Regards, Martijn From gary at zope.com Tue Apr 24 15:54:44 2007 From: gary at zope.com (Gary Poster) Date: Tue, 24 Apr 2007 09:54:44 -0400 Subject: [lxml-dev] lxml Mac OS X probs? In-Reply-To: <462DFA98.7060608@behnel.de> References: <462DFA98.7060608@behnel.de> Message-ID: <7515E5B4-06DF-4624-B62D-E07E62A0F996@zope.com> On Apr 24, 2007, at 8:39 AM, Stefan Behnel wrote: > > > Gary Poster wrote: >> http://www.openplans.org/projects/bbq-sprint/nudgenudge >> >> the following text at the bottom: >> >> """Note that the Deliverance middleware requires lxml to do the >> theming which is known to have problems on certain platforms, e.g. >> Mac OS X.""" > > I am not aware of any problems with lxml on any platform. > > And I would also like to have such a statement clarified. Thank you Stefan. I suppose this link would be the place to do that (http://www.openplans.org/projects/bbq-sprint/contact_project_admins), but your reply is good enough for me ATM. Gary From ltucker at openplans.org Tue Apr 24 16:58:54 2007 From: ltucker at openplans.org (Luke Tucker) Date: Tue, 24 Apr 2007 10:58:54 -0400 Subject: [lxml-dev] lxml Mac OS X probs? In-Reply-To: <462DFA98.7060608@behnel.de> References: <462DFA98.7060608@behnel.de> Message-ID: <1177426734.4049.184.camel@ltucker.openplans.org> My guess is this refers to the fact that Deliverance is known to segfault on OS X out of the box. This segfault occurs when calling lxml. (I believe this can be reproduced by running the tests.) From what we've gathered, this appears to be mainly related to troublesome versions of libxml2 and libxslt that are installed on many OS X boxes. These problems do not appear to happen on other platforms, or on OS X for those who have installed later versions of these libraries.
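If you suspect a crash comes from the libxml2/libxslt builds installed on a system, check which library versions lxml was compiled against and which ones it actually loads at runtime. This is the same information as the version printout earlier in this thread. Below is a small sketch, assuming a current lxml, using the version constants that `lxml.etree` exports. The helper name is hypothetical.

```python
from lxml import etree


def library_versions():
    """Return lxml's own version plus the libxml2/libxslt versions it was
    compiled against and the ones it is running with, as tuples of ints."""
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 compiled": etree.LIBXML_COMPILED_VERSION,
        "libxml2 runtime": etree.LIBXML_VERSION,
        "libxslt compiled": etree.LIBXSLT_COMPILED_VERSION,
        "libxslt runtime": etree.LIBXSLT_VERSION,
    }


if __name__ == "__main__":
    for name, version in library_versions().items():
        print("%-17s %s" % (name, ".".join(map(str, version))))
```

A mismatch between the compiled and runtime versions may hint that a different library than expected is being picked up, for example an older system copy.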
- Luke On Tue, 2007-04-24 at 14:39 +0200, Stefan Behnel wrote: > > Gary Poster wrote: > > http://www.openplans.org/projects/bbq-sprint/nudgenudge > > > > the following text at the bottom: > > > > """Note that the Deliverance middleware requires lxml to do the > > theming which is known to have problems on certain platforms, e.g. > > Mac OS X.""" > > I am not aware of any problems with lxml on any platform. > > And I would also like to have such a statement clarified. > > Stefan > > _______________________________________________ > lxml-dev mailing list > lxml-dev at codespeak.net > http://codespeak.net/mailman/listinfo/lxml-dev From scel at users.sourceforge.net Tue Apr 24 17:19:07 2007 From: scel at users.sourceforge.net (Torsten Rehn) Date: Tue, 24 Apr 2007 17:19:07 +0200 Subject: [lxml-dev] Bug in XPath evaluation - not a bug :) In-Reply-To: References: <1177272342.7781.26.camel@gentop> <462C5133.80006@behnel.de> <1177343674.7781.23.camel@gentop> <462CDA40.1070803@behnel.de> Message-ID: <1177427947.7804.66.camel@gentop> On Mon, 2007-04-23 at 22:13 +0200, Martijn Faassen wrote: > > So you're really ignoring the namespace and just looking at the prefix? That's > > definitely an unusual use case. > > Agreed, that is indeed odd. Makes me want to find out more. :) You have > documents that use namespaces extensively, but they vary widely in the > kinds of namespace URIs they use for the same prefixes? How did you > arrive in such a situation? I think we got a slight misunderstanding here. In my situation, each prefix belongs to exactly one namespace. Here's an example of what I'd like to do: Let's say there is a store that has both a print catalogue and an online shop. For whatever reason (this is a very stupid example) we want some of the items being sold to appear in the print catalogue and some others in the eshop. Here is the XML data that describes the items we sell: TurboItem 23 SuperItem 42 Now I want some way to "tag" each item either for print or eshop. 
But (and here's the twist) without altering the structure of the XML data. That means that I can't add an attribute to each element or "encapsulate" the items like this: ... ... However, adding namespace prefixes (and their xmlns definitions) is acceptable. If it had worked the way I intended it to in the beginning, the XPath expression "//print:item" would have returned all items that go into the print catalogue. Now why do I want to avoid using the namespace URIs in the expression? In what I'm actually up to, there are a lot more options than just print and eshop. It shall be easy for users to handle a larger number of these "options" and requiring users to write out namespace-uris just isn't convenient. Prefixes, however, are. The only solution I see right now is to scan the XML data prior to the XPath query in order to map each prefix to its namespace-uri. I do understand now that this is such an exotic use case that it wouldn't make much sense to have lxml do these mappings automatically if the second argument of .xpath() is omitted. The reason I gave this rather lengthy example was to find out if anyone reading this has an idea of an alternative solution for my problem (applying metadata to specific parts of an XML document without making the XPath expressions to address these parts too complex). Regards, Torsten
From jholg at gmx.de Tue Apr 24 18:18:53 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Tue, 24 Apr 2007 18:18:53 +0200 Subject: [lxml-dev] Bug in XPath evaluation - not a bug :) In-Reply-To: <1177427947.7804.66.camel@gentop> References: <1177272342.7781.26.camel@gentop> <462C5133.80006@behnel.de> <1177343674.7781.23.camel@gentop> <462CDA40.1070803@behnel.de> <1177427947.7804.66.camel@gentop> Message-ID: <20070424161853.232180@gmx.net> Hi, > The only solution I see right now is to scan the XML data prior to the > XPath query in order to map each prefix to its namespace-uri. > I do understand now that this is such an exotic use case that it > wouldn't make much sense to have lxml do these mappings automatically if > the second argument of .xpath() is omitted. > The reason I gave this rather lengthy example was to find out if anyone > reading this has an idea of an alternative solution for my problem > (applying metadata to specific parts of an XML document without making > the XPath expressions to address these parts too complex). Maybe you can take advantage of nsmap (don't get confused by the result output, I'm using the lxml.objectify notion)? >>> root = etree.fromstring(""" ... ... 1 ... 1.2 ... 1.2 ... 1 ... 2 ... 2 ... what ... is ... this ... good ... for? ... ... 2006/08/09 13:19:01.000000+02:00 ... from another namespace ... ... ... ... 387.38 ... ... ... ... ... ... ... 387.38 ... ... ... ... ... ... ... 387.38 ... ... ... ... ...
""") >>> prefixDict = dict(root.nsmap) >>> del prefixDict[None] >>> prefixDict[''] = root.nsmap[None] >>> print etree.XPath('//other:x', prefixDict)(root) [Decimal("387.38"), Decimal("387.38"), Decimal("387.38")] What's not so nice is that nsmap uses None for the empty prefix whereas XPath seems to expect an empty string in the prefix-URI-dict. Plus I'm not sure if you can simply use the root element nsmap, as I did here. Holger From scel at users.sourceforge.net Tue Apr 24 18:40:49 2007 From: scel at users.sourceforge.net (Torsten Rehn) Date: Tue, 24 Apr 2007 18:40:49 +0200 Subject: [lxml-dev] Bug in XPath evaluation - not a bug :) In-Reply-To: <20070424161853.232180@gmx.net> References: <1177272342.7781.26.camel@gentop> <462C5133.80006@behnel.de> <1177343674.7781.23.camel@gentop> <462CDA40.1070803@behnel.de> <1177427947.7804.66.camel@gentop> <20070424161853.232180@gmx.net> Message-ID: <1177432849.9124.4.camel@gentop> I'll look into that, but it seems as if it were just what I've been looking for. Thank you :) Torsten From philipp at weitershausen.de Tue Apr 24 22:04:50 2007 From: philipp at weitershausen.de (Philipp von Weitershausen) Date: Tue, 24 Apr 2007 22:04:50 +0200 Subject: [lxml-dev] lxml Mac OS X probs?
In-Reply-To: <7515E5B4-06DF-4624-B62D-E07E62A0F996@zope.com> References: <462DFA98.7060608@behnel.de> <7515E5B4-06DF-4624-B62D-E07E62A0F996@zope.com> Message-ID: <462E62E2.2020707@weitershausen.de> Gary Poster wrote: > On Apr 24, 2007, at 8:39 AM, Stefan Behnel wrote: > >> >> Gary Poster wrote: >>> http://www.openplans.org/projects/bbq-sprint/nudgenudge >>> >>> the following text at the bottom: >>> >>> """Note that the Deliverance middleware requires lxml to do the >>> theming which is known to have problems on certain platforms, e.g. >>> Mac OS X.""" >> I am not aware of any problems with lxml on any platform. >> >> And I would also like to have such a statement clarified. > > Thank you Stefan. I suppose this link would be the place to do that > (http://www.openplans.org/projects/bbq-sprint/contact_project_admins), but your reply is good enough for me ATM. Initially, lxml would segfault for me. NudgeNudge being a toy project, I didn't have the time to investigate further, and it worked on Linux. I was sprinting with Ian Bicking on this project, and when I got the OSX segfault, he told me that it was a common occurrence on that platform. I assumed it was a known issue. A few weeks ago, I changed the setup of the application (Gary: from zope.app.twisted to serving via paste.deploy), which magically made the problem go away on OSX... -- http://worldcookery.com -- Professional Zope documentation and training From fairwinds at eastlink.ca Wed Apr 25 22:48:09 2007 From: fairwinds at eastlink.ca (David Pratt) Date: Wed, 25 Apr 2007 17:48:09 -0300 Subject: [lxml-dev] Building lxml on PPC Mac 10.4 Message-ID: <462FBE89.1030805@eastlink.ca> Hi. I am having trouble building lxml on PPC Mac with my buildouts and also easy_setup. It seems to build with warnings, but it killed my Python as soon as it was accessed. The following is what happens with the build. I previously compiled and ran it on PPC Python without trouble.
I am now running a Universally build python 2.4.4 with OSX 10.4.9 on a PPC. Many thanks. Regards, David Buildout build ============== zc.buildout.easy_install: Getting new distribution for lxml Building lxml version 1.3.beta warning: no previously-included files found matching 'doc/pyrex.txt' warning: no previously-included files found matching 'src/lxml/etree.pxi' /usr/bin/ld: for architecture i386 /usr/bin/ld: warning /opt/local/lib/libxslt.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: warning /opt/local/lib/libexslt.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: warning /opt/local/lib/libxml2.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: warning /opt/local/lib/libz.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: for architecture ppc /usr/bin/ld: warning can't open dynamic library: /Developer/SDKs/MacOSX10.4u.sdk/opt/local/lib/libiconv.2.dylib referenced from: /opt/local/lib/libxslt.dylib (checking for undefined symbols may be affected) (No such file or directory, errno = 2) /usr/bin/ld: for architecture i386 /usr/bin/ld: warning /opt/local/lib/libxslt.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: warning /opt/local/lib/libexslt.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: warning /opt/local/lib/libxml2.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: warning /opt/local/lib/libz.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: for architecture 
ppc /usr/bin/ld: warning can't open dynamic library: /Developer/SDKs/MacOSX10.4u.sdk/opt/local/lib/libiconv.2.dylib referenced from: /opt/local/lib/libxslt.dylib (checking for undefined symbols may be affected) (No such file or directory, errno = 2) zc.buildout.easy_install: Got lxml 1.3beta An easy_setup build =================== Searching for lxml Reading http://cheeseshop.python.org/pypi/lxml/ Reading http://cheeseshop.python.org/pypi/lxml/1.3beta Reading http://codespeak.net/lxml Reading http://cheeseshop.python.org/pypi/lxml/1.2.1 Best match: lxml 1.3beta Downloading http://cheeseshop.python.org/packages/source/l/lxml/lxml-1.3beta.tar.gz Processing lxml-1.3beta.tar.gz Running lxml-1.3beta/setup.py -q bdist_egg --dist-dir /tmp/easy_install-uCUEox/lxml-1.3beta/egg-dist-tmp-tOf7Pb Building lxml version 1.3.beta warning: no previously-included files found matching 'doc/pyrex.txt' warning: no previously-included files found matching 'src/lxml/etree.pxi' /usr/bin/ld: for architecture ppc /usr/bin/ld: warning can't open dynamic library: /Developer/SDKs/MacOSX10.4u.sdk/opt/local/lib/libiconv.2.dylib referenced from: /opt/local/lib/libxslt.dylib (checking for undefined symbols may be affected) (No such file or directory, errno = 2) /usr/bin/ld: for architecture i386 /usr/bin/ld: warning /opt/local/lib/libxslt.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: warning /opt/local/lib/libexslt.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: warning /opt/local/lib/libxml2.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: warning /opt/local/lib/libz.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: for architecture ppc /usr/bin/ld: warning can't open dynamic library: 
/Developer/SDKs/MacOSX10.4u.sdk/opt/local/lib/libiconv.2.dylib referenced from: /opt/local/lib/libxslt.dylib (checking for undefined symbols may be affected) (No such file or directory, errno = 2) /usr/bin/ld: for architecture i386 /usr/bin/ld: warning /opt/local/lib/libxslt.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: warning /opt/local/lib/libexslt.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: warning /opt/local/lib/libxml2.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) /usr/bin/ld: warning /opt/local/lib/libz.dylib cputype (18, architecture ppc) does not match cputype (7) for specified -arch flag: i386 (file not loaded) Adding lxml 1.3beta to easy-install.pth file Installed /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/lxml-1.3beta-py2.4-macosx-10.3-fat.egg Processing dependencies for lxml From my logs: ============ Apr 25 14:32:05 Mac-PG crashdump[27220]: Python crashed Apr 25 14:32:15 Mac-PG crashdump[27220]: crash report written to: /Users/davidpratt/Library/Logs/CrashReporter/Python.crash.log From fairwinds at eastlink.ca Wed Apr 25 23:42:20 2007 From: fairwinds at eastlink.ca (David Pratt) Date: Wed, 25 Apr 2007 18:42:20 -0300 Subject: [lxml-dev] Building lxml on PPC Mac 10.4 In-Reply-To: <462FBE89.1030805@eastlink.ca> References: <462FBE89.1030805@eastlink.ca> Message-ID: <462FCB3C.8040009@eastlink.ca> Here are some further details of my system that may be helpful in diagnosing this. Many thanks. Regards, David ======== OSX: 10.4.9 gcc: powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5367) libxml2 @2.6.27_0 (active) - from mac ports libxslt @1.1.20_0 (active) - from mac ports python: Python 2.4.4 (#1, Oct 18 2006, 10:34:39) [GCC 4.0.1 (Apple Computer, Inc. 
build 5341)] on darwin Python from python.org using dmg installer and universal built for OSX 10.3 + From fairwinds at eastlink.ca Thu Apr 26 05:29:23 2007 From: fairwinds at eastlink.ca (David Pratt) Date: Thu, 26 Apr 2007 00:29:23 -0300 Subject: [lxml-dev] Building lxml on PPC Mac 10.4 In-Reply-To: <462FCB3C.8040009@eastlink.ca> References: <462FBE89.1030805@eastlink.ca> <462FCB3C.8040009@eastlink.ca> Message-ID: <46301C93.4020308@eastlink.ca> Solved the problem with the build by removing mac ports version of libxml2 and libxslt and using mac defaults. Many thanks. Regards, David From stefan_ml at behnel.de Thu Apr 26 20:53:12 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 26 Apr 2007 20:53:12 +0200 Subject: [lxml-dev] lxml and binary eggs on Linux In-Reply-To: References: Message-ID: <4630F518.8040408@behnel.de> Hi Martijn, Martijn Faassen wrote: > In the past
we've been in the habit of providing binary eggs for lxml on > Linux. We've been less diligent about this recently, which is actually a > good thing. I would in fact ask everybody to stop uploading binary eggs > for Linux, and only do so for Windows. I think this makes sense. While Linux is definitely not a straightforward platform for binaries, it's a rather uniform platform for source installations (as long as we don't require the most recent dependency versions to be installed). And we shouldn't forget that Debian and related distributions come with ready-to-install versions of lxml well integrated into their package management system. Any volunteers for a rewrite of build.txt? Stefan From stefan_ml at behnel.de Thu Apr 26 22:19:14 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 26 Apr 2007 22:19:14 +0200 Subject: [lxml-dev] Call for contribution towards lxml 1.3 Message-ID: <46310942.7030001@behnel.de> Hi all, lxml 1.3 is nearing completion. There were some major changes under the hood, but the most visible part of the new release is actually the new layout of the documentation site, which should make it much more accessible. As usual, the preview is here: http://codespeak.net/lxml/dev/ Some of you have mentioned their impression that it's hard to help out on lxml as it's written in Pyrex, not Python. Although the current code looks very C-ish in many places, this is more of a performance optimisation than a real requirement. Pyrex actually makes it possible to work on the code in a very Python-like style, and to make the C-ification a matter of later improvement. So Python(-like) implementations of new features are definitely welcome. A non-optimised implementation of an interesting feature is much better than the lack of this feature would be. So, everyone is invited to get involved in making the code even better than it is today. But there is another area where help is appreciated. A very important area in fact: *documentation*.
While there is quite a bit of documentation both on ElementTree and lxml, there are certainly places where lxml's API and its way of doing XML are hard to access, especially for new users and those who have a fixed (should I say: Java-ish?) mindset on XML. If you want to contribute, helping out in this area is warmly appreciated. Here are a few ideas that would be truly helpful for lxml's user base. * I would love to see lxml's own tutorial that gets the main ideas and the most useful features across without caring too much about ElementTree (which already has a tutorial). * Some statistics: what /are/ the most useful features of lxml? What do people like or use most? What parts of lxml should be more accessible? Which parts are so well done that people grasp their usage immediately (and should therefore be promoted as an eye-catcher)? * We could benefit from a Wiki where users could contribute code examples, best practices, work-arounds or tool snippets. We should also start linking to external pages, blogs, presentations on lxml or ElementTree that others might find interesting. Obviously, this list is not complete, so if you want to contribute, I hope you will easily find places to do so. Please help us in making lxml 1.3 the best release ever - and the most accessible one! Have fun, Stefan From aguilar.roger at hotmail.com Thu Apr 26 23:14:05 2007 From: aguilar.roger at hotmail.com (Róger Aguilar) Date: Thu, 26 Apr 2007 15:14:05 -0600 Subject: [lxml-dev] Error installing lxml Message-ID: From matthew at linuxfromscratch.org Thu Apr 26 23:21:17 2007 From: matthew at linuxfromscratch.org (Matthew Burgess) Date: Thu, 26 Apr 2007 22:21:17 +0100 Subject: [lxml-dev] Error installing lxml In-Reply-To: References: Message-ID: <200704262221.17667.matthew@linuxfromscratch.org> On Thursday 26 April 2007 22:14, Róger Aguilar wrote: > The data provider has a installation script that should works fine, BUT when tries > to install lxml I get this error: [root at ocaria build]# > > src/lxml/etree.c:33:28: libxml/xmlsave.h: No such file or directory > src/lxml/etree.c:35:30: libxml/xmlstring.h: No such file or directory You need to install the libxml2 development package. On my system, Kubuntu, this is called libxml2-dev. Regards, Matt. From fairwinds at eastlink.ca Thu Apr 26 23:24:04 2007 From: fairwinds at eastlink.ca (David Pratt) Date: Thu, 26 Apr 2007 18:24:04 -0300 Subject: [lxml-dev] Error installing lxml In-Reply-To: References: Message-ID: <46311874.4050408@eastlink.ca> Hi Roger. lxml requires libxml2 and libxslt to be installed before you can perform an easy_install. You will need to install these packages on your system. Regards, David Róger Aguilar wrote: > Hi, my name is Roger and i work in a scientific institute, I am a linux > newbie. > > I was trying to install a data provider that uses lxml.
The data > provider has a installation script that should works fine, BUT when > tries to install lxml I get this error: > > [root at ocaria build]# ../python/bin/easy_install lxml > Searching for lxml > Reading http://cheeseshop.python.org/pypi/lxml/ > Reading http://cheeseshop.python.org/pypi/lxml/1.3beta > Reading http://codespeak.net/lxml > Reading http://cheeseshop.python.org/pypi/lxml/1.2.1 > Best match: lxml 1.3beta > Downloading > http://cheeseshop.python.org/packages/source/l/lxml/lxml-1.3beta.tar.gz > Processing lxml-1.3beta.tar.gz > Running lxml-1.3beta/setup.py -q bdist_egg --dist-dir > /tmp/easy_install-rV9hqn/lxml-1.3beta/egg-dist-tmp--d63_J > Building lxml version 1.3.beta > warning: no previously-included files found matching 'doc/pyrex.txt' > warning: no previously-included files found matching 'src/lxml/etree.pxi' > src/lxml/etree.c:33:28: libxml/xmlsave.h: No such file or directory > src/lxml/etree.c:35:30: libxml/xmlstring.h: No such file or directory > src/lxml/etree.c:46:26: libxslt/xslt.h: No such file or directory > src/lxml/etree.c:47:32: libxslt/xsltconfig.h: No such file or directory > src/lxml/etree.c:48:35: libxslt/xsltInternals.h: No such file or directory > src/lxml/etree.c:49:32: libxslt/extensions.h: No such file or directory > src/lxml/etree.c:50:31: libxslt/documents.h: No such file or directory > src/lxml/etree.c:51:31: libxslt/transform.h: No such file or directory > src/lxml/etree.c:52:31: libxslt/xsltutils.h: No such file or directory > src/lxml/etree.c:53:30: libxslt/security.h: No such file or directory > src/lxml/etree.c:54:27: libxslt/extra.h: No such file or directory > src/lxml/etree.c:55:28: libexslt/exslt.h: No such file or directory > src/lxml/etree.c:421: syntax error before "xmlError" > src/lxml/etree.c:423: syntax error before '}' token > src/lxml/etree.c:434: syntax error before "xmlError" > src/lxml/etree.c:436: syntax error before '}' token > src/lxml/etree.c:446: field `__pyx_base' has incomplete type > 
src/lxml/etree.c:447: confused by earlier errors, bailing out > error: Setup script exited with error: command 'gcc' failed with exit > status 1 > > I suppose it's something wrong with the environment, but don't know what. > If someone could help me with this I'll be very grateful. > > Thanks From aguilar.roger at hotmail.com Thu Apr 26 23:26:20 2007 From: aguilar.roger at hotmail.com (Róger Aguilar) Date: Thu, 26 Apr 2007 15:26:20 -0600 Subject: [lxml-dev] Error installing lxml In-Reply-To: <46311874.4050408@eastlink.ca> Message-ID: From pawel at praterm.com.pl Thu Apr 26 23:39:35 2007 From: pawel at praterm.com.pl (Paweł Pałucha) Date: Thu, 26 Apr 2007 23:39:35 +0200 Subject: [lxml-dev] Error installing lxml In-Reply-To: References: Message-ID: <46311C17.9040604@praterm.com.pl> Róger Aguilar wrote: > lixml2 and libxslt are already installed. But you need libxml2 and libxslt _development_ packages. What Linux distribution do you use? Pawel Palucha From aguilar.roger at hotmail.com Thu Apr 26 23:42:49 2007 From: aguilar.roger at hotmail.com (Róger Aguilar) Date: Thu, 26 Apr 2007 15:42:49 -0600 Subject: [lxml-dev] Error installing lxml In-Reply-To: <46311C17.9040604@praterm.com.pl> Message-ID: From faassen at startifact.com Fri Apr 27 00:09:03 2007 From: faassen at startifact.com (Martijn Faassen) Date: Fri, 27 Apr 2007 00:09:03 +0200 Subject: [lxml-dev] Call for contribution towards lxml 1.3 In-Reply-To: <46310942.7030001@behnel.de> References: <46310942.7030001@behnel.de> Message-ID: Hi there, Stefan Behnel wrote: > But there is another area where help is appreciated. A very important area in > fact: *documentation*. While there is quite a bit of documentation both on > ElementTree and lxml, there are certainly places where lxml's API and its way > of doing XML are hard to access, especially for new users and those who have a > fixed (should I say: Java-ish?) mindset on XML. If you want to contribute, > helping out in this area is warmly appreciated. Here are a few ideas that > would be truly helpful for lxml's user base. I think the lxml documentation project is a great initiative and I encourage everybody to join in! Besides the topics Stefan mentioned, I think we should consider creating complete API documentation for lxml looking similar to what's on www.python.org for the core library. I think this should include both the ElementTree API and the lxml extensions in one place. lxml extensions to the API should be marked in the docs. I think having a clear overview of the API will help people find and use the numerous somewhat hidden treasures that exist in lxml. So, API volunteers, you don't already need to be an expert on the lxml API. Writing a bit of API doc would be a good way to *become* an expert, though. I will be happy to help get any API docs volunteers on their way, so if you start this, you won't be on your own. I'm excited about this documentation project and I'm hoping we'll get a few great new contributors!
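Martijn's idea of marking lxml's extensions inside a combined API reference could start from a mechanical comparison of the two APIs. The sketch below is a minimal illustration, not part of lxml. It assumes that an "extension" is simply a public attribute offered by the extended class and missing from the base class. The helper `api_extensions` and the two stand-in classes are hypothetical, chosen so the example runs without lxml installed. With lxml available, one could try comparing `xml.etree.ElementTree.Element` against `lxml.etree._Element` in the same way, though a human would still need to review the raw output.

```python
import inspect


def api_extensions(base, extended):
    """Return public names that `extended` offers but `base` does not.

    Hypothetical helper: "public" means the name does not start with an
    underscore. The result is sorted so it can be pasted into docs.
    """
    base_names = {n for n in dir(base) if not n.startswith("_")}
    return sorted(
        n for n in dir(extended)
        if not n.startswith("_") and n not in base_names
    )


# Stand-in classes for illustration only; they are not lxml's classes.
class BaseElement:
    def find(self, path):
        """Find the first matching subelement."""

    def findall(self, path):
        """Find all matching subelements."""


class ExtendedElement(BaseElement):
    def xpath(self, expression):
        """Evaluate an XPath expression (extension)."""

    def getparent(self):
        """Return the parent element (extension)."""


# Print each extension with the first line of its docstring as a summary.
for name in api_extensions(BaseElement, ExtendedElement):
    doc = inspect.getdoc(getattr(ExtendedElement, name)) or ""
    first_line = doc.splitlines()[0] if doc else ""
    print(f"{name} [extension]: {first_line}")
```

A name-based diff like this cannot detect when a shared method behaves differently in lxml, so it only finds candidates for the "extension" markers; the reference text itself still has to be written by hand.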
Regards, Martijn From jholg at gmx.de Fri Apr 27 11:10:58 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Fri, 27 Apr 2007 11:10:58 +0200 Subject: [lxml-dev] Call for contribution towards lxml 1.3 In-Reply-To: <46310942.7030001@behnel.de> References: <46310942.7030001@behnel.de> Message-ID: <20070427091058.279240@gmx.net> Hi Stefan, hi all, > > lxml 1.3 is nearing completion. There were some major changes under the > hood, Is there a planned release date? Do you plan to get the xsi:type="xsd:" thingie into 1.3? I'd love to have this in, and I might be able to contribute if needed, but I would need to know how much time is left until the final 1.3 release, because I certainly will not be able to do so until the end of next week. > [...] > But there is another area where help is appreciated. A very important area > in > fact: *documentation*. While there is quite a bit of documentation both on > ElementTree and lxml, there are certainly places where lxml's API and its > way > of doing XML are hard to access, especially for new users and those who > have a > fixed (should I say: Java-ish?) mindset on XML. If you want to contribute, > helping out in this area is warmly appreciated. Here are a few ideas that > would be truly helpful for lxml's user base. > > * I would love to see lxml's own tutorial that gets the main ideas and the > most useful features across without caring too much about ElementTree > (which > already has a tutorial). I do have a kind of tutorial introduction to lxml.objectify, but we wrap some of the entry points in our custom API, and we use some extensions (namely datetime and decimal), so it does not currently match out-of-the-box objectify 1-to-1. As the official objectify documentation is fairly tutorial-like itself, maybe I could look for places where I could add enhancements to it. Regarding API documentation, I vote for reference docs that are actually generated from docstrings or source code documentation. What about pydoc?
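The pydoc suggestion can be tried with the standard library alone. What follows is a minimal sketch, not lxml's actual documentation tooling: `render_api_doc` wraps `pydoc.render_doc` with the plain-text renderer, and `api_signature` tries `inspect.signature` first. C-implemented callables (such as Pyrex-compiled functions) often expose no signature to `inspect`, so the sketch assumes, hypothetically, that their docstrings start with a `name(args)` line and falls back to that. All three helper names are made up for illustration.

```python
import inspect
import pydoc


def signature_from_doc(name, doc):
    """Return the first docstring line if it looks like 'name(...)', else None.

    Assumed convention (hypothetical): a C-level docstring spells out its
    call signature on the first line, e.g. "parse(source, parser=None)".
    """
    lines = inspect.cleandoc(doc or "").splitlines()
    if lines and lines[0].startswith(name + "("):
        return lines[0]
    return None


def api_signature(obj):
    """Return 'name(args)' for obj, or None if no signature can be found."""
    name = getattr(obj, "__name__", type(obj).__name__)
    try:
        # Works for Python-level functions and for C callables that
        # happen to carry signature metadata.
        return name + str(inspect.signature(obj))
    except (ValueError, TypeError):
        # No introspectable signature: fall back to the docstring convention.
        return signature_from_doc(name, inspect.getdoc(obj))


def render_api_doc(obj):
    """Render plain-text reference documentation for obj via pydoc."""
    return pydoc.render_doc(obj, renderer=pydoc.plaintext)
```

Usage would be along the lines of `print(render_api_doc(some_module))`, where `some_module` is a placeholder; whether any real lxml docstrings follow the first-line signature convention would need checking.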
> * Some statistics: what /are/ the most useful features of lxml? What do > people > like or use most? What parts of lxml should be more accessible? Which > parts > are so well done that people grasp their usage immediately (and should > therefore be promoted as an eye-catcher)? For me, that's 1. standards-compliance by intelligently building on libxml2/libxslt 2. feature-richness: covers extremely convenient XML handling plus Schema/RelaxNG validation and XSLT 3. stability and maturity 4. extensibility 5. performance > * We could benefit from a Wiki where users could contribute code examples, > best practices, work-arounds or tool snippets. We should also start > linking to > external pages, blogs, presentations on lxml or ElementTree that others > might > find interesting. A wiki would be nice. I really think lxml has the potential to be THE Python XML toolkit. The only thing that might sometimes keep users away from it is the dependency on the massive libxml2, which can be addressed by a good build/dependency system. And, as said, building on libxml2 is of course also lxml's biggest advantage. Btw I for one don't like eggs; I like to package libraries in my platform package format. Anyone know about a tool to convert an egg to a Sun package? Keep up the superb work, Holger From stefan_ml at behnel.de Fri Apr 27 12:26:30 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 27 Apr 2007 12:26:30 +0200 Subject: [lxml-dev] Call for contribution towards lxml 1.3 In-Reply-To: References: <46310942.7030001@behnel.de> Message-ID: <4631CFD6.5090104@behnel.de> Martijn Faassen wrote: > Stefan Behnel wrote: >> But there is another area where help is appreciated. A very important area in >> fact: *documentation*.
While there is quite a bit of documentation both on >> ElementTree and lxml, there are certainly places where lxml's API and its way >> of doing XML are hard to access, especially for new users and those who have a >> fixed (should I say: Java-ish?) mindset on XML. If you want to contribute, >> helping out in this area is warmly appreciated. Here are a few ideas that >> would be truly helpful for lxml's user base. > > Besides the topics Stefan mentioned, I think we should consider creating > complete API documentation for lxml looking similar to what's on > www.python.org for the core library. Definitely. Docstrings are an important point here. They serve both for online docs via help() and as a source for extracting docs into other formats. I'm not aware of any doc-gen tools that read Pyrex, though. While we could import the module and see what we get, we'd also need support for figuring out the signatures of methods and functions, which C classes don't provide. Any ideas? Stefan From faassen at startifact.com Fri Apr 27 14:09:59 2007 From: faassen at startifact.com (Martijn Faassen) Date: Fri, 27 Apr 2007 14:09:59 +0200 Subject: [lxml-dev] Call for contribution towards lxml 1.3 In-Reply-To: <20070427091058.279240@gmx.net> References: <46310942.7030001@behnel.de> <20070427091058.279240@gmx.net> Message-ID: <4631E817.3090606@startifact.com> jholg at gmx.de wrote: [snip useful thoughts] > Btw I for one don't like eggs; I like to package libraries in my > platform package format. Anyone know about a tool to convert an egg > to a Sun package? Converting eggs themselves, I don't know. Distutils/setuptools, however, is pluggable and should have the information to build all kinds of package formats, including tarballs, eggs, and rpms. This would be the right area to look into to get native package support. In addition, the zc.buildout infrastructure that I experimented with in the past does provide nice ways to get an lxml setup that includes libxml2 and so on.
Unfortunately it only makes sense if you develop the rest of your application as a buildout. zc.buildout is rumored to be growing support for RPM-based deployment and such, so that might be something else to explore.
Regards, Martijn From jholg at gmx.de Fri Apr 27 14:51:09 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Fri, 27 Apr 2007 14:51:09 +0200 Subject: [lxml-dev] Call for contribution towards lxml 1.3 In-Reply-To: <4631E817.3090606@startifact.com> References: <46310942.7030001@behnel.de> <20070427091058.279240@gmx.net> <4631E817.3090606@startifact.com> Message-ID: <20070427125109.258080@gmx.net> Hi, > jholg at gmx.de wrote: > [snip useful thoughts] > > Btw I for one don't like eggs; I like to package libraries in my > > platform package format. Anyone know about a tool to convert an egg > > to a Sun package? > > Converting eggs themselves, I don't know. Distutils/setuptools, however, > is pluggable and should have the information to build all kinds > of package formats, including tarballs, eggs, and rpms. This would be > the right area to look into to get native package support. > [...] I happen to have a bdist_sunpkg distutils command class that does the job. I'm still waiting for my company to allow me to officially contribute it to Python, what with the agreement you have to sign these days. Until then, it's Python patch item 1589266 ;-): https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1589266&group_id=5470 However, the current egg-based distribution via setuptools tends to clutter things with egg-related files I'd rather not have. That happened with lxml, at least: I now have an unnecessary lxml-1.2.1-py2.4.egg-info directory that I can't seem to get rid of :-) While eggs might offer maximum ease of use for a lot of people, this can be different if you are a) not on Linux/Windows (I'm on SPARC Solaris) or b) not directly connected to the web from your workstation. And I for one do not like the easy_install notion of transparently downloading stuff. Thanks for your info, Holger
From cz at gocept.com Mon Apr 30 08:03:31 2007 From: cz at gocept.com (Christian Zagrodnick) Date: Mon, 30 Apr 2007 08:03:31 +0200 Subject: [lxml-dev] Call for contribution towards lxml 1.3 References: <46310942.7030001@behnel.de> Message-ID: On 2007-04-26 22:19:14 +0200, Stefan Behnel said: > Hi all, > > lxml 1.3 is nearing completion. There were some major changes under the hood, > but the most visible part of the new release is actually the new layout of > the documentation site, which should make it much more accessible. As usual, > the preview is here: > > http://codespeak.net/lxml/dev/ > > Some of you have mentioned their impression that it's hard to help out on lxml > as it's written in Pyrex, not Python. Although the current code looks very > C-ish in many places, this is more of a performance optimisation than a real > requirement. Pyrex actually makes it possible to work on the code in a very > Python-like style, and to make the C-ification a matter of later improvement. > So Python(-like) implementations of new features are definitely welcome. A > non-optimised implementation of an interesting feature is much better than the > lack of this feature would be. So, everyone is invited to get involved in > making the code even better than it is today. The problem for me always was that the Pyrex required was some special version. And if you'd just checkout the code you couldn't compile it just like that. If there's a way to "fix" that (like with a buildout) I'd be very willing to do changes, even in Pyrex. Pyrex doesn't look too strange to me. :) -- Christian Zagrodnick gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale www.gocept.com · fon. +49 345 12298894 · fax.
+49 345 12298891 From stefan_ml at behnel.de Mon Apr 30 08:27:44 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 30 Apr 2007 08:27:44 +0200 Subject: [lxml-dev] Call for contribution towards lxml 1.3 In-Reply-To: References: <46310942.7030001@behnel.de> Message-ID: <46358C60.9000103@behnel.de> Hi, Christian Zagrodnick wrote: > The problem for me always was that the Pyrex required was some special > version. And if you'd just checkout the code you couldn't compile it > just like that. If there's a way to "fix" that (like with a buildout) > I'd be very willing to do changes, even in Pyrex. There are currently two ways to get a working Pyrex. One is to download the lxml source distribution, which includes Pyrex. The other is to "svn co" the Pyrex source from the lxml repository. See http://codespeak.net/lxml/dev/build.html#pyrex The Subversion URL is: http://codespeak.net/svn/lxml/pyrex/ Here, it's actually sufficient to checkout the "Pyrex" directory under the lxml source tree, i.e.
    svn co http://codespeak.net/svn/lxml/trunk lxml
    cd lxml
    svn co http://codespeak.net/svn/lxml/pyrex/Pyrex Pyrex
That has the additional advantage that you can "svn up" both with a single command. Another thing to document ... Stefan From mike at it-loops.com Mon Apr 30 09:25:50 2007 From: mike at it-loops.com (Michael Guntsche) Date: Mon, 30 Apr 2007 09:25:50 +0200 Subject: [lxml-dev] Call for contribution towards lxml 1.3 In-Reply-To: <46358C60.9000103@behnel.de> References: <46310942.7030001@behnel.de> <46358C60.9000103@behnel.de> Message-ID: Stefan Behnel writes: > Here, it's actually sufficient to checkout the "Pyrex" directory under the > lxml source tree, i.e. > > svn co http://codespeak.net/svn/lxml/trunk lxml > cd lxml > svn co http://codespeak.net/svn/lxml/pyrex/Pyrex Pyrex > > That has the additional advantage that you can "svn up" both with a single command. You need to edit the svn:externals property so Pyrex gets updated as well. You can do the following.
    svn co http://codespeak.net/svn/lxml/trunk lxml
    svn ps svn:externals "Pyrex http://codespeak.net/svn/lxml/pyrex/Pyrex" lxml
    svn up lxml
This way everything gets updated when you do an "svn up". Maybe it makes sense to put svn:externals in trunk, since people who checkout from trunk need Pyrex anyway. Kind regards, Michael From stefan_ml at behnel.de Mon Apr 30 11:35:00 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 30 Apr 2007 11:35:00 +0200 Subject: [lxml-dev] Call for contribution towards lxml 1.3 In-Reply-To: References: <46310942.7030001@behnel.de> <46358C60.9000103@behnel.de> Message-ID: <4635B844.2080108@behnel.de> Hi Michael, Michael Guntsche wrote: > Stefan Behnel writes: >> Here, it's actually sufficient to checkout the "Pyrex" directory under the >> lxml source tree, i.e. >> >> svn co http://codespeak.net/svn/lxml/trunk lxml >> cd lxml >> svn co http://codespeak.net/svn/lxml/pyrex/Pyrex Pyrex >> >> That has the additional advantage that you can "svn up" both with a single command. > > You need to edit the svn:externals property so Pyrex gets updated as well. > You can do the following. > > svn co http://codespeak.net/svn/lxml/trunk lxml > svn ps svn:externals "Pyrex http://codespeak.net/svn/lxml/pyrex/Pyrex" > lxml > svn up lxml > > This way everything gets updated, when you do a "svn up". > Maybe it makes sense to put svn:externals in trunk, since people who > checkout from trunk need Pyrex anyway. I was always hoping we could get back to depending on a normal Pyrex release sooner rather than later, but I guess you're right. Since Greg doesn't follow a very open project management style, it's hard to predict when lxml will be able to build with an unpatched Pyrex release. I'll go with the above for now...
Stefan From martin at martinthomas.net Mon Apr 30 18:12:42 2007 From: martin at martinthomas.net (martin at martinthomas.net) Date: Mon, 30 Apr 2007 11:12:42 -0500 Subject: [lxml-dev] Whoops, Internal Error Message-ID: <20070430111242.5nn8bxf4ragowck0@64.40.144.195> Using the lxml rpm for FC6 and Python 2.4, I get an internal error when I try to validate a document against an XML Schema document. The XML document that I am trying to validate and the XML Schema I am validating against both came from NIST (contained in the 'Complete 1.1.3 Schema Bundle .zip' at http://nvd.nist.gov/scap/xccdf/xccdf.cfm). The error message reads: Internal error: xmlSchemaIDCRegisterMatchers, Could not find an augmented IDC item for an IDC definition. I'll write this up properly tonight and send in an error log, along with all the schema documents etc., unless someone tells me otherwise. Cheers // Martin