From martin at martinthomas.net Tue May 1 04:55:24 2007 From: martin at martinthomas.net (Martin Thomas) Date: Mon, 30 Apr 2007 21:55:24 -0500 Subject: [lxml-dev] Whoops, Internal Error In-Reply-To: <20070430111242.5nn8bxf4ragowck0@64.40.144.195> References: <20070430111242.5nn8bxf4ragowck0@64.40.144.195> Message-ID: <1177988124.7775.31.camel@tigger> I have attached the file to be validated and the schema that was causing a problem (there are 5 more schemas involved but I didn't think you needed them - either download them or ask me to email them) as well as a python script that I used to create the problem. The output is as follows: ios.xml:41:ERROR:SCHEMASV:SCHEMAV_INTERNAL: Internal error: xmlSchemaIDCRegisterMatchers, Could not find an augmented IDC item for an IDC definition. ios.xml:41:ERROR:SCHEMASV:SCHEMAV_INTERNAL: Internal error: xmlSchemaValidateElem, calling xmlSchemaValidateElemDecl(). ios.xml:41:ERROR:SCHEMASV:SCHEMAV_INTERNAL: Internal error: xmlSchemaDocWalk, calling xmlSchemaValidateElem(). The line number in ios.xml corresponds to a cpe-list that is defined in the attached schema. If I remove it from ios.xml, everything else passes. Cheers // Martin On Mon, 2007-04-30 at 11:12 -0500, martin at martinthomas.net wrote: > Using the lxml rpm for FC6 and Python 2.4, I get an internal error > when I try validating a document against a XMLschema document. The > xml document that I am trying to validate and the XMLschema which I am > validating against both came from NIST (contained in the 'Complete > 1.1.3 Schema Bundle .zip' at http://nvd.nist.gov/scap/xccdf/xccdf.cfm). > > The error message reads Internal error: xmlSchemaIDCRegisterMatchers, > Could not find an augmented IDC item for an IDC definition. > > I'll write this up properly tonight and send in an error log, along > with all the schema documents etc unless someone tells me otherwise. 
> > Cheers // Martin > > _______________________________________________ > lxml-dev mailing list > lxml-dev at codespeak.net > http://codespeak.net/mailman/listinfo/lxml-dev -------------- next part -------------- A non-text attachment was scrubbed... Name: cpe-1.0.xsd Type: application/xml Size: 7544 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070430/4016ba70/attachment-0002.xml -------------- next part -------------- A non-text attachment was scrubbed... Name: ios.xml Type: application/xml Size: 14944 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070430/4016ba70/attachment-0003.xml -------------- next part -------------- A non-text attachment was scrubbed... Name: xsd.py Type: text/x-python Size: 215 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070430/4016ba70/attachment-0001.py From stefan_ml at behnel.de Tue May 1 08:52:15 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 01 May 2007 08:52:15 +0200 Subject: [lxml-dev] Whoops, Internal Error In-Reply-To: <1177988124.7775.31.camel@tigger> References: <20070430111242.5nn8bxf4ragowck0@64.40.144.195> <1177988124.7775.31.camel@tigger> Message-ID: <4636E39F.1050900@behnel.de> Hi Martin, a quick test (after renaming cpe-1.0.xsd to xccdf-1.1.xsd) didn't show any problem with lxml trunk and libxml2 2.6.17. Since you didn't mention any of the versions of lxml or libxml2 you are using, I assume it's just a problem with an older libxml2 version. XML-Schema is still under development in libxml2, so any newer version is likely to provide better support and bug fixes. Please upgrade and retry. Regards, Stefan Martin Thomas wrote: > I have attached the file to be validated and the schema that was causing > a problem (there are 5 more schemas involved but I didn't think you > needed them - either download them or ask me to email them) as well as a > python script that I used to create the problem. 
> > The output is as follows: > ios.xml:41:ERROR:SCHEMASV:SCHEMAV_INTERNAL: Internal error: > xmlSchemaIDCRegisterMatchers, Could not find an augmented IDC item for > an IDC definition. > ios.xml:41:ERROR:SCHEMASV:SCHEMAV_INTERNAL: Internal error: > xmlSchemaValidateElem, calling xmlSchemaValidateElemDecl(). > ios.xml:41:ERROR:SCHEMASV:SCHEMAV_INTERNAL: Internal error: > xmlSchemaDocWalk, calling xmlSchemaValidateElem(). > > The line number in ios.xml corresponds to a cpe-list that is defined in > the attached schema. If I remove it from ios.xml, everything else > passes. > > Cheers // Martin > > > > On Mon, 2007-04-30 at 11:12 -0500, martin at martinthomas.net wrote: >> Using the lxml rpm for FC6 and Python 2.4, I get an internal error >> when I try validating a document against a XMLschema document. The >> xml document that I am trying to validate and the XMLschema which I am >> validating against both came from NIST (contained in the 'Complete >> 1.1.3 Schema Bundle .zip' at http://nvd.nist.gov/scap/xccdf/xccdf.cfm). >> >> The error message reads Internal error: xmlSchemaIDCRegisterMatchers, >> Could not find an augmented IDC item for an IDC definition. >> >> I'll write this up properly tonight and send in an error log, along >> with all the schema documents etc unless someone tells me otherwise. 
>> >> Cheers // Martin >> >> _______________________________________________ >> lxml-dev mailing list >> lxml-dev at codespeak.net >> http://codespeak.net/mailman/listinfo/lxml-dev >> >> ------------------------------------------------------------------------ >> >> from lxml import etree >> >> xsd = etree.ElementTree(file='xccdf-1.1.xsd') >> >> doc = etree.ElementTree(file='ios.xml') >> >> xsv = etree.XMLSchema(xsd) >> try: >> xsv.validate(doc) >> except Exception, e: >> pass >> >> print e.error_log >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> lxml-dev mailing list >> lxml-dev at codespeak.net >> http://codespeak.net/mailman/listinfo/lxml-dev From martin at martinthomas.net Tue May 1 16:06:34 2007 From: martin at martinthomas.net (martin at martinthomas.net) Date: Tue, 01 May 2007 09:06:34 -0500 Subject: [lxml-dev] Whoops, Internal Error Message-ID: <20070501090634.69ncpr4ilxukg0co@64.40.144.195> Stefan, I can reproduce this problem on Cygwin and Fedora Core 6 which have libxml2 2.6.26 and 2.6.26 respectively. Sorry if I have confused things by sending that schema doc. There are 6 different schema documents involved: xccdf-1.1.xsd xccdfp-1.1.xsd xml.xsd platform-0.2.3.xsd cpe-1.0.xsd simpledc20021212.xsd They are available from nist.gov in the zip file at the URL I gave earlier. I only attached the CPE schema because the element causing the internal error belongs in the CPE namespace and I didn't want to attach documents that are publicly available. As it turns out, this is an error in libxml2.. if I use xmllint, I get the same error message. I'll send them a bug report. Thanks // M Quoting Stefan Behnel : > Hi Martin, > > a quick test (after renaming cpe-1.0.xsd to xccdf-1.1.xsd) didn't show any > problem with lxml trunk and libxml2 2.6.17. 
Since you didn't mention any of > the versions of lxml or libxml2 you are using, I assume it's just a problem > with an older libxml2 version. XML-Schema is still under development in > libxml2, so any newer version is likely to provide better support and bug > fixes. Please upgrade and retry. > > Regards, > Stefan > > > Martin Thomas wrote: >> I have attached the file to be validated and the schema that was causing >> a problem (there are 5 more schemas involved but I didn't think you >> needed them - either download them or ask me to email them) as well as a >> python script that I used to create the problem. >> >> The output is as follows: >> ios.xml:41:ERROR:SCHEMASV:SCHEMAV_INTERNAL: Internal error: >> xmlSchemaIDCRegisterMatchers, Could not find an augmented IDC item for >> an IDC definition. >> ios.xml:41:ERROR:SCHEMASV:SCHEMAV_INTERNAL: Internal error: >> xmlSchemaValidateElem, calling xmlSchemaValidateElemDecl(). >> ios.xml:41:ERROR:SCHEMASV:SCHEMAV_INTERNAL: Internal error: >> xmlSchemaDocWalk, calling xmlSchemaValidateElem(). >> >> The line number in ios.xml corresponds to a cpe-list that is defined in >> the attached schema. If I remove it from ios.xml, everything else >> passes. >> >> Cheers // Martin >> >> >> >> On Mon, 2007-04-30 at 11:12 -0500, martin at martinthomas.net wrote: >>> Using the lxml rpm for FC6 and Python 2.4, I get an internal error >>> when I try validating a document against a XMLschema document. The >>> xml document that I am trying to validate and the XMLschema which I am >>> validating against both came from NIST (contained in the 'Complete >>> 1.1.3 Schema Bundle .zip' at http://nvd.nist.gov/scap/xccdf/xccdf.cfm). >>> >>> The error message reads Internal error: xmlSchemaIDCRegisterMatchers, >>> Could not find an augmented IDC item for an IDC definition. >>> >>> I'll write this up properly tonight and send in an error log, along >>> with all the schema documents etc unless someone tells me otherwise. 
>>> >>> Cheers // Martin >>> >>> _______________________________________________ >>> lxml-dev mailing list >>> lxml-dev at codespeak.net >>> http://codespeak.net/mailman/listinfo/lxml-dev >>> >>> ------------------------------------------------------------------------ >>> >>> from lxml import etree >>> >>> xsd = etree.ElementTree(file='xccdf-1.1.xsd') >>> >>> doc = etree.ElementTree(file='ios.xml') >>> >>> xsv = etree.XMLSchema(xsd) >>> try: >>> xsv.validate(doc) >>> except Exception, e: >>> pass >>> >>> print e.error_log >>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> lxml-dev mailing list >>> lxml-dev at codespeak.net >>> http://codespeak.net/mailman/listinfo/lxml-dev > From jholg at gmx.de Thu May 3 11:42:59 2007 From: jholg at gmx.de (Holger Joukl) Date: Thu, 03 May 2007 11:42:59 +0200 Subject: [lxml-dev] etree.XMLSchema generic error "Document is not valid XML Schema" Message-ID: <20070503094259.131360@gmx.net> Hi, is there any chance to expose more detailed information on why a schema document is not a valid XML Schema? A brief look at libxml2 schema API did not really tell me anything. Regards, Holger -------------- next part -------------- An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20070503/c712bb2d/attachment.htm From tseaver at palladion.com Thu May 3 19:20:12 2007 From: tseaver at palladion.com (Tres Seaver) Date: Thu, 03 May 2007 13:20:12 -0400 Subject: [lxml-dev] etree.XMLSchema generic error "Document is not valid XML Schema" In-Reply-To: <20070503094259.131360@gmx.net> References: <20070503094259.131360@gmx.net> Message-ID: <463A19CC.7020809@palladion.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Holger Joukl wrote: > Hi, > is there any chance to expose more detailed information on why a schema > document is not a valid XML Schema? > A brief look at libxml2 schema API did not really tell me anything. 
Can you try validating it with xmllint? If you get the same error message, then the bug / problem is in libxml2, rather than lxml. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGOhnM+gerLs4ltQ4RAufUAKCJXoSOljP09ufsvyS8O0jS+D+8XACg1n7W TigLDp78G/4O3wcdKqXmy7c= =209+ -----END PGP SIGNATURE----- From jholg at gmx.de Fri May 4 09:23:09 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Fri, 04 May 2007 09:23:09 +0200 Subject: [lxml-dev] etree.XMLSchema generic error "Document is not valid XML Schema" In-Reply-To: <463A19CC.7020809@palladion.com> References: <20070503094259.131360@gmx.net> <463A19CC.7020809@palladion.com> Message-ID: <20070504072309.223240@gmx.net> Hi, > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Holger Joukl wrote: > > Hi, > > is there any chance to expose more detailed information on why a schema > > document is not a valid XML Schema? > > A brief look at libxml2 schema API did not really tell me anything. > > Can you try validating it with xmllint? If you get the same error > message, then the bug / problem is in libxml2, rather than lxml. This is not a bug in libxml2 nor lxml, it's a feature request. I know that the schema is not a valid schema, but I'd like to see the reason why. lxml does currently not present any errors from the libxml2 layer when instantiating a schema. If this is possible at all, I don't know. Of course, it is always possible to use another tool, like a good schema editor, but for quick small hacks it would be really convenient to see what you missed. Holger -- "Feel free" - 10 GB Mailbox, 100 FreeSMS/Monat ... 
Jetzt GMX TopMail testen: http://www.gmx.net/de/go/topmail From erik.swanson at gmail.com Fri May 4 18:23:01 2007 From: erik.swanson at gmail.com (Erik Swanson) Date: Fri, 4 May 2007 09:23:01 -0700 Subject: [lxml-dev] lxml.sax.saxify breaks on comments; `make test` failure on MacPython 2.5.1 Message-ID: <57993d730705040923h56da9c8fta43bebbb85556e0@mail.gmail.com> There appears to be a bug with lxml.sax's handling of comments, as the following code causes lxml.sax.saxify to fail: """ import lxml.etree, lxml.sax, xml.sax.handler from cStringIO import StringIO p = lxml.etree.HTMLParser(remove_blank_text=True) h = xml.sax.handler.ContentHandler() f = StringIO("

bar

") t = lxml.etree.parse(f, p) lxml.sax.saxify(t, h) """ """ Traceback (most recent call last): File "saxBug.py", line 11, in lxml.sax.saxify(t, h) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml- 1.3beta-py2.5-macosx-10.4-i386.egg/lxml/sax.py", line 178, in saxify return ElementTreeProducer(element_or_tree, content_handler).saxify() File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml- 1.3beta-py2.5-macosx-10.4-i386.egg/lxml/sax.py", line 130, in saxify self._recursive_saxify(self._element, {}) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml- 1.3beta-py2.5-macosx-10.4-i386.egg/lxml/sax.py", line 160, in _recursive_saxify self._recursive_saxify(child, prefixes) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml- 1.3beta-py2.5-macosx-10.4-i386.egg/lxml/sax.py", line 160, in _recursive_saxify self._recursive_saxify(child, prefixes) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml- 1.3beta-py2.5-macosx-10.4-i386.egg/lxml/sax.py", line 149, in _recursive_saxify ns_uri, local_name = _getNsTag(element.tag) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/lxml- 1.3beta-py2.5-macosx-10.4-i386.egg/lxml/sax.py", line 8, in _getNsTag if tag[0] == '{': TypeError: 'builtin_function_or_method' object is unsubscriptable """ I have been able to replicate the above error with both release and svn lxml, as well as with both Apple-supplied libxml2/libxslt and up-to-date libraries. 
Also, and I doubt this is related, but `make test` fails for me on OS X 10.4.9 with MacPython 2.5.1 (python.org binary): """ python test.py -p -v TESTED VERSION: Python: (2, 5, 1, 'final', 0) lxml.etree: (1, 3, -1, 42667) libxml used: (2, 6, 28) libxml compiled: (2, 6, 28) libxslt used: (1, 1, 20) libxslt compiled: (1, 1, 20) 733/733 (100.0%): Doctest: xpathxslt.txt ====================================================================== FAIL: test_module_HTML_unicode ( lxml.tests.test_htmlparser.HtmlParserTestCaseBase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 260, in run testMethod() File "/Users/erik/Projects/lxml/src/lxml/tests/test_htmlparser.py", line 33, in test_module_HTML_unicode self.uhtml_str) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", line 334, in failUnlessEqual (msg or '%r != %r' % (first, second)) AssertionError: u'test \xc3\x83\xc2\xa1\xef\xa3\x92

page \xc3\x83\xc2\xa1\xef\xa3\x92 title

' != u'test \xc3\xa1\uf8d2

page \xc3\xa1\uf8d2 title

' ---------------------------------------------------------------------- Ran 733 tests in 1.380s FAILED (failures=1) """ -- Erik Swanson -------------- next part -------------- An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20070504/ccd2b4b6/attachment-0001.htm From stefan_ml at behnel.de Fri May 4 19:26:15 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 04 May 2007 19:26:15 +0200 Subject: [lxml-dev] lxml.sax.saxify breaks on comments; `make test` failure on MacPython 2.5.1 In-Reply-To: <57993d730705040923h56da9c8fta43bebbb85556e0@mail.gmail.com> References: <57993d730705040923h56da9c8fta43bebbb85556e0@mail.gmail.com> Message-ID: <463B6CB7.4050404@behnel.de> Hi, Erik Swanson wrote: > There appears to be a bug with lxml.sax's handling of comments, as the > following code causes lxml.sax.saxify to fail: > > """ > import lxml.etree , lxml.sax, xml.sax.handler > from cStringIO import StringIO > > p = lxml.etree.HTMLParser(remove_blank_text=True) > h = xml.sax.handler.ContentHandler() > f = StringIO("

bar

") > t = lxml.etree.parse(f, p) > lxml.sax.saxify(t, h) > """ ah, yes, thanks for the report. This is due to the way ElementTree handles Element.tag for comments and processing instructions. They actually return their factory functions and lxml.etree follows them for compatibility. But the real problem is obviously in lxml.sax. It should handle comments correctly. I'll fix it. > Also, and I doubt this is related, but `make test` fails for me on OS X > 10.4.9 with MacPython 2.5.1 (python.org binary): > > """ > python test.py -p -v > > TESTED VERSION: > Python: (2, 5, 1, 'final', 0) > lxml.etree : (1, 3, -1, 42667) > libxml used: (2, 6, 28) > libxml compiled: (2, 6, 28) > libxslt used: (1, 1, 20) > libxslt compiled: (1, 1, 20) > > 733/733 (100.0%): Doctest: xpathxslt.txt > > ====================================================================== > FAIL: test_module_HTML_unicode ( > lxml.tests.test_htmlparser.HtmlParserTestCaseBase) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", > line 260, in run > testMethod() > File "/Users/erik/Projects/lxml/src/lxml/tests/test_htmlparser.py", > line 33, in test_module_HTML_unicode > self.uhtml_str) > File > "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py", > line 334, in failUnlessEqual > (msg or '%r != %r' % (first, second)) > AssertionError: u'test > \xc3\x83\xc2\xa1\xef\xa3\x92

page > \xc3\x83\xc2\xa1\xef\xa3\x92 title

' != > u'test \xc3\xa1\uf8d2

page > \xc3\xa1\uf8d2 title

' > > ---------------------------------------------------------------------- > Ran 733 tests in 1.380s > > FAILED (failures=1) > """ Good to know. Not a big problem, but an annoying one, as it breaks the test suite. I'll look into that, too. Thanks for the reports, Stefan From stefan_ml at behnel.de Sat May 5 12:30:33 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 05 May 2007 12:30:33 +0200 Subject: [lxml-dev] prepending PIs and serialising them - finally Message-ID: <463C5CC9.1020805@behnel.de> Hi, since the problem came up again when fixing the SAX issue, I finally decided that it was time for a way to prepend processing instructions to a tree. Elements now have general methods 'el.addprevious(sibling)' and 'el.addnext(sibling)'. They move the new sibling either before or after the element. The methods also check that you can't create a second root node next to another (TypeError) and will discard the tail text if adding at the top level. So they will (try to) prevent you from creating broken XML. Only PIs and comments are allowed as siblings of a root node, but any element can be used inside the tree. While I was at it, I also fixed the issue with writing out comments and PIs that are siblings of a root node. Note that only root nodes are special cased here, so that you get the complete document if you serialise the root node. Elements in the tree will not write out their siblings if you serialise them. Another step towards a shining new lxml 1.3, I'd say. :) Have fun, Stefan From stefan_ml at behnel.de Sat May 5 19:10:06 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 05 May 2007 19:10:06 +0200 Subject: [lxml-dev] feedback on XPath docs? Message-ID: <463CBA6E.1050302@behnel.de> Hi all, I happened to take a deeper skip over the XPath documentation page of lxml, and since there was loads of stuff missing, I decided to give it half a rewrite. 
Now I'm interested in feedback to see if it is understandable or if any other important (or interesting) stuff is missing. http://codespeak.net/lxml/dev/xpathxslt.html Any comments? Stefan PS: feed back on the other doc pages will be appreciated, too. :) From stefan_ml at behnel.de Mon May 7 05:45:36 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 May 2007 05:45:36 +0200 Subject: [lxml-dev] etree.XMLSchema generic error "Document is not valid XML Schema" In-Reply-To: <20070503094259.131360@gmx.net> References: <20070503094259.131360@gmx.net> Message-ID: <463EA0E0.8020001@behnel.de> Hi, Holger Joukl wrote: > is there any chance to expose more detailed information on why a schema > document is not a valid XML Schema? > A brief look at libxml2 schema API did not really tell me anything. Have you looked at the error log of the exception? >>> from lxml.etree import XML, XMLSchema >>> try: ... XMLSchema(XML("")) ... except Exception, e: ... print e.error_log :0:ERROR:SCHEMASP:SCHEMAP_NOT_SCHEMA: The XML document '(null)' is not a schema document. I assume the reported errors here are more telling if you pass a real schema document, though. Stefan From jholg at gmx.de Mon May 7 11:29:40 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Mon, 07 May 2007 11:29:40 +0200 Subject: [lxml-dev] etree.XMLSchema generic error "Document is not valid XML Schema" In-Reply-To: <463EA0E0.8020001@behnel.de> References: <20070503094259.131360@gmx.net> <463EA0E0.8020001@behnel.de> Message-ID: <20070507092940.181680@gmx.net> Hi, > Have you looked at the error log of the exception? > > >>> from lxml.etree import XML, XMLSchema > >>> try: > ... XMLSchema(XML("")) > ... except Exception, e: > ... print e.error_log > :0:ERROR:SCHEMASP:SCHEMAP_NOT_SCHEMA: The XML document '(null)' > is > not a schema document. Oh, right. I wasn't aware of the error_log concept in lxml exceptions. > I assume the reported errors here are more telling if you pass a real > schema > document, though. 
They are, indeed. Thanks, Holger From jholg at gmx.de Mon May 7 13:26:35 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Mon, 07 May 2007 13:26:35 +0200 Subject: [lxml-dev] feedback on XPath docs? In-Reply-To: <463CBA6E.1050302@behnel.de> References: <463CBA6E.1050302@behnel.de> Message-ID: <20070507112635.225700@gmx.net> Hi, > > http://codespeak.net/lxml/dev/xpathxslt.html > > Any comments? I think it's easy to understand. I'd grant the getpath convenience its own paragraph + heading as it is a nice little goodie and does not seem especially related to XPath return values ("A related convenience method of ElementTree objects is getpath(element),[...]"). Suggestion: "XPath expression generation for Elements". If interesting stuff is missing I wouldn't know, as the documentation now covers more than what I've ever used of lxml's XPath capabilities. Holger From Curtis at DAYCOS.com Mon May 7 21:03:44 2007 From: Curtis at DAYCOS.com (Curtis Scheer) Date: Mon, 7 May 2007 14:03:44 -0500 Subject: [lxml-dev] relaxNQ errors Message-ID: <031936836C46D611BB1B00508BE7345D0511687D@gatekeeper.daycos.com> New to the lxml library so forgive me if I missed this in the documentation, I am trying to validate an xml file against RelaxNG. The function relaxng() appears to only return a 1 or 0 based on whether it is valid. Is there a way I can get the specific error as to why it failed? Thanks, Curtis -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20070507/ce9f2bff/attachment.htm From stefan_ml at behnel.de Mon May 7 22:19:41 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 May 2007 22:19:41 +0200 Subject: [lxml-dev] relaxNQ errors In-Reply-To: <031936836C46D611BB1B00508BE7345D0511687D@gatekeeper.daycos.com> References: <031936836C46D611BB1B00508BE7345D0511687D@gatekeeper.daycos.com> Message-ID: <463F89DD.4050401@behnel.de> Hi, Curtis Scheer wrote: > New to the lxml library so forgive me if I missed this in the > documentation, I am trying to validate an xml file against RelaxNG. The > function relaxng() appears to only return a 1 or 0 based on whether > it is valid. Is there a way I can get the specific error as to why it > failed? Please refer to the in-development docs of lxml, they are much easier to read. http://codespeak.net/lxml/dev/validation.html Hope it helps, Stefan From Curtis at DAYCOS.com Mon May 7 22:49:05 2007 From: Curtis at DAYCOS.com (Curtis Scheer) Date: Mon, 7 May 2007 15:49:05 -0500 Subject: [lxml-dev] relaxNQ errors Message-ID: <031936836C46D611BB1B00508BE7345D051168FC@gatekeeper.daycos.com> Thanks for the help, so far I am quite impressed with this library. -----Original Message----- From: Stefan Behnel [mailto:stefan_ml at behnel.de] Sent: Monday, May 07, 2007 3:20 PM To: Curtis Scheer Cc: lxml-dev at codespeak.net Subject: Re: [lxml-dev] relaxNQ errors Hi, Curtis Scheer wrote: > New to the lxml library so forgive me if I missed this in the > documentation, I am trying to validate an xml file against RelaxNG. The > function relaxng() appears to only return a 1 or 0 based on whether > it is valid. Is there a way I can get the specific error as to why it > failed? Please refer to the in-development docs of lxml, they are much easier to read.
http://codespeak.net/lxml/dev/validation.html Hope it helps, Stefan _______________________________________________ lxml-dev mailing list lxml-dev at codespeak.net http://codespeak.net/mailman/listinfo/lxml-dev From stefan_ml at behnel.de Sat May 12 17:35:57 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 12 May 2007 17:35:57 +0200 Subject: [lxml-dev] XPath exceptions Message-ID: <4645DEDD.9090604@behnel.de> Hi all, I've just rewritten the XPath exception generation code to provide better error messages on failure. lxml 1.3 will have two main XPath exceptions that inherit from XPathError: XPathSyntaxError and XPathEvalError. However, the problem is that they are not entirely, well, consistent. When you create an XPath object, you will nicely get an XPathSyntaxError if something goes wrong in the instantiation (parsing) and an XPathEvalError if you call evaluate(). But when you use the other two evaluators (or the xpath() method), parsing and evaluating are really one step, so you will always get an eval error. Meaning, you may get different exceptions from XPath() and the xpath() method for the same XPath expression. It is easy to raise different errors from the xpath() method depending on what libxml2 tells us about the error source. However, libxml2 also uses a generic error code for both syntax and eval errors if the error is not more specific (e.g. for a completely unparsable expression), so there are cases where lxml cannot tell what kind of error it was, so it would have to default to an eval error for the xpath() method. This would still give you different exceptions from the XPath() class and the xpath() method for the same unparsable expression. Another problem is backward compatibility: if we introduce a new exception for the evaluation, these errors will no longer be caught by existing code that catches XPathSyntaxError (or even plain SyntaxError). To make such code run with older versions of lxml, you would have to catch XPathError instead. 
So, it's easy to do, but you'd have to change your code if it uses the original exception. BTW, this fixes the issue of missing namespace prefixes raising a syntax error. You will now get an eval error saying "Undefined namespace prefix". I would like to hear opinions about this before it becomes the official behaviour of lxml 1.3. The trunk currently implements the variant that raises different errors from xpath(). Stefan From stefan_ml at behnel.de Sun May 13 20:27:07 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 13 May 2007 20:27:07 +0200 Subject: [lxml-dev] XPath exceptions In-Reply-To: <4645DEDD.9090604@behnel.de> References: <4645DEDD.9090604@behnel.de> Message-ID: <4647587B.2070408@behnel.de> Hi again, Stefan Behnel wrote: > It is easy to raise different errors from the xpath() method depending on what > libxml2 tells us about the error source. However, libxml2 also uses a generic > error code for both syntax and eval errors if the error is not more specific > (e.g. for a completely unparsable expression), so there are cases where lxml > cannot tell what kind of error it was, so it would have to default to an eval > error for the xpath() method. This would still give you different exceptions > from the XPath() class and the xpath() method for the same unparsable expression. I now believe that always raising an eval error here is more consistent and easier to handle. The semantics of raising different errors are flawed anyway, so having a single evaluation error for an evaluation function is as consistent as we can get. Note that this still breaks backwards compatibility as the XPath evaluators and the xpath() method no longer raise a syntax error but an eval error. You can work around this to support older lxml versions by catching XPathError.
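As a rough illustration of the workaround just described, the sketch below catches the common base class instead of one of the two subclasses. It is written in current Python syntax rather than the Python 2.4 style used elsewhere in this thread, and `try_xpath` is a hypothetical helper name, not part of lxml.

```python
from lxml import etree

root = etree.XML("<root><a/></root>")


def try_xpath(node, expression):
    """Hypothetical helper: return (result, error_message) for an XPath call."""
    try:
        return node.xpath(expression), None
    except etree.XPathError as exc:
        # XPathSyntaxError and XPathEvalError both derive from XPathError,
        # so this clause does not depend on which of the two is raised.
        return None, "%s: %s" % (type(exc).__name__, exc)


# A well-formed expression, and one that uses an undeclared namespace prefix.
result, error = try_xpath(root, "//a")
bad_result, bad_error = try_xpath(root, "//undefined:a")
```

Code that only needs to know "the XPath call failed" can rely on this single except clause; code that wants to distinguish syntax from evaluation problems would still have to inspect the concrete exception type.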
Stefan From eric at detede.com Mon May 14 13:10:48 2007 From: eric at detede.com (Eric Garin) Date: Mon, 14 May 2007 13:10:48 +0200 Subject: [lxml-dev] Details of XML Validation errors with lxml Message-ID: <288780425.20070514131048@detede.com> An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20070514/73400253/attachment.htm From stefan_ml at behnel.de Mon May 14 13:27:38 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 14 May 2007 13:27:38 +0200 Subject: [lxml-dev] Details of XML Validation errors with lxml In-Reply-To: <288780425.20070514131048@detede.com> References: <288780425.20070514131048@detede.com> Message-ID: <464847AA.8060006@behnel.de> Hi, Eric Garin wrote: > I've found some information > here http://codespeak.net/lxml/dev/api.html#error-handling-on-exceptions and most likely also http://codespeak.net/lxml/dev/validation.html > Where can I find the API documentation for error_log? The section in api.html which you mentioned above is already the main source. You might also try 'help(etree)', which will tell you what fields there are in the error log entries, or read the source code, class _LogEntry in http://codespeak.net/svn/lxml/trunk/src/lxml/xmlerror.pxi > I've found a bit by chance how to render the line number error but I > would like for example to be able to give to the users the offset (or > position in line) where the error occurs. Sadly, this information is not provided by libxml2, so lxml can't provide it either. Note also that the document might be constructed by hand (I don't know your application), in which case a line number would be meaningless already. This is obviously a problem in 1-line XML documents, but a workaround could be to pretty print the document, parse it back in and then run the validation a second time. This will give you a meaningful line number that you can then use to provide the user with more context such as the surrounding tags.
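The pretty-print-and-reparse workaround could look roughly like the sketch below. The tiny schema and the one-line document are made up for illustration, and the code uses current Python syntax; only the general approach comes from the message above.

```python
from lxml import etree

# Illustrative schema (an assumption): <root> holds one or more integer <a> elements.
schema = etree.XMLSchema(etree.XML("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:integer" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""))

# A document on a single line: every error would be reported for line 1.
one_line = etree.XML("<root><a>1</a><a>oops</a><a>3</a></root>")

# Pretty print, parse the result back in, and validate the reparsed tree.
pretty = etree.tostring(one_line, pretty_print=True)
reparsed = etree.fromstring(pretty)
is_valid = schema.validate(reparsed)

# Line numbers now refer to the pretty-printed text.
error_lines = [entry.line for entry in schema.error_log]
pretty_lines = pretty.decode("ascii").splitlines()
context = [pretty_lines[line - 1].strip() for line in error_lines]
```

Here `context` holds the offending source lines from the pretty-printed text, which is one way of showing the user the surrounding tags.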
Stefan

From stefan_ml at behnel.de Mon May 14 14:37:06 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 May 2007 14:37:06 +0200
Subject: [lxml-dev] Details of XML Validation errors with lxml
In-Reply-To: <464847AA.8060006@behnel.de>
References: <288780425.20070514131048@detede.com> <464847AA.8060006@behnel.de>
Message-ID: <464857F2.70209@behnel.de>

Stefan Behnel wrote:
> Eric Garin wrote:
>> I've found a bit by chance how to render the line number error but I
>> would like for example to be able to give to the users the offset (or
>> position in line) where the error occurs.
>
> Sadly, this information is not provided by libxml2

Sorry, looks like I was mistaken here. This information *is* provided by libxml2 in some contexts, so I'll try to make it available at the lxml API level. Please check the SVN trunk (becoming lxml 1.3) to see how things advance.

Stefan

From stefan_ml at behnel.de Mon May 14 22:44:08 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 May 2007 22:44:08 +0200
Subject: [lxml-dev] [objectify] schema type registry: QNames for xsi:type?
In-Reply-To: <20070423080131.114650@gmx.net>
References: <20070416095901.169710@gmx.net> <462BAC61.8040109@behnel.de> <20070423080131.114650@gmx.net>
Message-ID: <4648CA18.5000909@behnel.de>

Hi Holger,

jholg at gmx.de wrote:
>>> Is it easily possible to use QNames in the xsi-type lookup system?
>> I believe this would be the right thing to do, as lxml should be
>> consistent.

I gave it a preliminary implementation on the trunk, could you check if this works for you?

Stefan

From stefan_ml at behnel.de Tue May 15 16:56:52 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 15 May 2007 16:56:52 +0200
Subject: [lxml-dev] [objectify] schema type registry: QNames for xsi:type?
In-Reply-To: <20070515143723.18920@gmx.net> References: <20070416095901.169710@gmx.net> <462BAC61.8040109@behnel.de> <20070423080131.114650@gmx.net> <4648CA18.5000909@behnel.de> <20070515143723.18920@gmx.net> Message-ID: <4649CA34.8070801@behnel.de> Hi Holger, jholg at gmx.de wrote: >> jholg at gmx.de wrote: >>>>> Is it easily possible to use QNames in the xsi-type lookup system? >>>> I believe this would be the right thing to do, as lxml should be >>>> consistent. >> I gave it a preliminary implementation on the trunk, could you check if >> this >> works for you? > > With this small fix: > > *** src/lxml/objectify.pyx.ORIG Tue May 15 16:24:31 2007 > --- src/lxml/objectify.pyx Tue May 15 16:23:14 2007 > *************** > *** 1768,1774 **** > name = _xsi > for p, ns in nsmap.items(): > if ns == XML_SCHEMA_NS: > ! _xsi = prefix + ':' + _xsi > break > else: > raise TypeError, "XSD types require the XSD namespace" > --- 1768,1775 ---- > name = _xsi > for p, ns in nsmap.items(): > if ns == XML_SCHEMA_NS: > ! if p: > ! _xsi = p + ':' + _xsi > break > else: > raise TypeError, "XSD types require the XSD namespace" > > > it mostly works for me. Sure, thanks. > However, I detected that the nice nsmap-unification currently does not > handle attributes: > >>>> msg = etree.fromstring("""""") >>>> s = DataElement("234837", _xsi="string", nsmap={'myXSD': 'http://www.w3.org/2001/XMLSchema'}) >>>> >>>> print etree.tostring(s, pretty_print=1) > 234837 >>>> print etree.tostring(msg, pretty_print=1) > >>>> msg.s = s >>>> print etree.tostring(msg, pretty_print=1) > > 234837 > > > Now that nsmap-unification has taken place, the myXSD-prefix has not been changed to FOOBAR. > I fear this is not a nice one to fix as all the attribute-value prefixes had to be checked. So basically I think this is not a problem of objectify. Definitely not. While we could potentially look for attributes that we created ourselves, I don't think it is worth having lxml handle this. 
Users should take care when they use their own prefixes. Maybe worth a remark in the docs... > Not really related to this, I detected what I think is a slight inconsistency regarding nsmap when using None vs '' as nsmap-keys (=prefixes): > >>>> s = DataElement("234837", _xsi="string", nsmap={'': 'http://www.w3.org/2001/XMLSchema'}) >>>> print etree.tostring(s, pretty_print=1) > 234837 >>>> s = DataElement("234837", _xsi="string", nsmap={None: 'http://www.w3.org/2001/XMLSchema'}) >>>> print etree.tostring(s, pretty_print=1) > 234837 > > Though I'm not sure if this really a bug, a mere inconvenience, or even valid (is xmlns:=... allowed?) Maybe this is rather a user error, but I don't see why lxml should not just convert this to None internally. So, two fixes applied, please check again. :) Stefan From jholg at gmx.de Wed May 16 08:56:38 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Wed, 16 May 2007 08:56:38 +0200 Subject: [lxml-dev] Fwd: Re: XPath exceptions Message-ID: <20070516065638.229930@gmx.net> Hi, a quick question on nsmaps in XPath: Is it intentional that this >>> >>> root = etree.fromstring("what?") >>> root.xpath("//a", {'':'my/foo/bar/URI'}) ['what?'] >>> root.xpath("//a") ['what?'] >>> but >>> root = etree.fromstring("what?") >>> root.xpath("//a", {'':'my/foo/bar/URI'}) [] does not? Of course you should do: >>> root.xpath("//foo:a", {'foo':'my/foo/bar/URI'}) ['what?'] >>> Seems like an empty string prefix in an XPath-nsmap does not have the desired effect, basically it is ignored as ns-prefix. I notice that None is explicitly disallowed in an XPath-nsmap argument: >>> root.xpath("//a", {None:'my/foo/bar/URI'}) Traceback (most recent call last): File "", line 1, in ? 
File "etree.pyx", line 1042, in etree._Element.xpath File "xpath.pxi", line 222, in etree.XPathElementEvaluator.__init__ File "xpath.pxi", line 102, in etree._XPathEvaluatorBase.__init__ File "xpath.pxi", line 54, in etree._XPathContext.__init__ File "extensions.pxi", line 73, in etree._BaseContext.__init__ TypeError: empty namespace prefix is not supported in XPath >>> Maybe '' should internally get converted to None and thus raise the same error when used in an xpath nsmap argument dictionary? Holger From stefan_ml at behnel.de Wed May 16 10:19:33 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 16 May 2007 10:19:33 +0200 Subject: [lxml-dev] XPath exceptions In-Reply-To: <20070516065535.229910@gmx.net> References: <4645DEDD.9090604@behnel.de> <20070516065535.229910@gmx.net> Message-ID: <464ABE95.6010700@behnel.de> jholg at gmx.de wrote: > a quick question on nsmaps in XPath: > Is it intentional that this > >>>> root = etree.fromstring("what?") >>>> root.xpath("//a", {'':'my/foo/bar/URI'}) > ['what?'] >>>> root.xpath("//a") > ['what?'] > > but > >>>> root = etree.fromstring("what?") >>>> root.xpath("//a", {'':'my/foo/bar/URI'}) > [] > > does not? No. Passing an empty prefix should raise the same exception as passing None. Fixed on the trunk. Stefan From eric at detede.com Wed May 16 11:07:08 2007 From: eric at detede.com (Eric Garin) Date: Wed, 16 May 2007 11:07:08 +0200 Subject: [lxml-dev] find Message-ID: <221418083.20070516110708@detede.com> An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20070516/536cb907/attachment-0001.htm From stefan_ml at behnel.de Wed May 16 11:14:20 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 16 May 2007 11:14:20 +0200 Subject: [lxml-dev] find In-Reply-To: <221418083.20070516110708@detede.com> References: <221418083.20070516110708@detede.com> Message-ID: <464ACB6C.2060900@behnel.de> Hi, Eric Garin wrote: > Using the find method with a tag with namespace failed > > if node.find("ns:tag"): find*() uses "{qualified}tags". See http://effbot.org/zone/element.htm#xml-namespaces on this. Stefan From lkraider at gmail.com Wed May 16 21:22:32 2007 From: lkraider at gmail.com (Paul Eipper) Date: Wed, 16 May 2007 16:22:32 -0300 Subject: [lxml-dev] case-insensitive xpath search Message-ID: <2ee02670705161222v4ee082c6ie2d4abcbaa2721d2@mail.gmail.com> Hello, I wonder if this is the right place to ask, but I am trying to run a case-insensitive search on a XML using lxml XPath. This is the current search (not case-insensitive): keyword = "what to find" o.xpath( '//*[ contains( @*, "%s" ) ]' % keyword ) how could I make that be a case-insensitive search ? thanks, -- Paul Eipper Brasil From stefan_ml at behnel.de Wed May 16 21:42:34 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 16 May 2007 21:42:34 +0200 Subject: [lxml-dev] case-insensitive xpath search In-Reply-To: <2ee02670705161222v4ee082c6ie2d4abcbaa2721d2@mail.gmail.com> References: <2ee02670705161222v4ee082c6ie2d4abcbaa2721d2@mail.gmail.com> Message-ID: <464B5EAA.40400@behnel.de> Paul Eipper wrote: > I wonder if this is the right place to ask, but I am trying to run a > case-insensitive search on a XML using lxml XPath. > > This is the current search (not case-insensitive): > > keyword = "what to find" > o.xpath( '//*[ contains( @*, "%s" ) ]' % keyword ) > > how could I make that be a case-insensitive search ? It will be easy to do in lxml 1.3, as it will support regexps. 
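For illustration, a sketch of two ways to get a case-insensitive attribute match, assuming an lxml version that ships the EXSLT regular expression functions. The sample document is invented, Python 3 syntax is used, and the predicate tests each attribute on its own instead of passing the whole attribute set to contains():

```python
import re

from lxml import etree

# Invented sample data.
root = etree.XML(
    '<lib>'
    '<song artist="Andy" album="SimCity 4 Rush Hour Soundtrack"/>'
    '<song artist="Other" album="Something Else"/>'
    '</lib>')

# Variant 1: EXSLT regular expressions, with the "i" flag for ignoring case.
REGEXP_NS = "http://exslt.org/regular-expressions"
find_regexp = etree.XPath(
    "//*[@*[re:test(., $pattern, 'i')]]",
    namespaces={"re": REGEXP_NS})
# re.escape() keeps the keyword from being read as a regular expression.
regexp_hits = find_regexp(root, pattern=re.escape("simcity"))

# Variant 2: plain XPath 1.0, lower-casing each attribute with translate().
# This only folds the ASCII letters listed here.
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"
translate_hits = root.xpath(
    "//*[@*[contains(translate(., $upper, $lower), $needle)]]",
    upper=UPPER, lower=LOWER, needle="simcity")
```

Passing the keyword as an XPath variable, rather than formatting it into the expression with %, also avoids quoting problems when the keyword itself contains quotes.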
http://codespeak.net/lxml/dev/xpathxslt.html#the-xpath-class
http://www.exslt.org/regexp/index.html

If you don't want to wait: have you checked whether the XPath translate() function is available?

Stefan

From stefan_ml at behnel.de Wed May 16 22:40:05 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 16 May 2007 22:40:05 +0200
Subject: [lxml-dev] first take on an lxml.etree tutorial
Message-ID: <464B6C25.9030304@behnel.de>

Hi everyone,

I finally found some time to get started on a tutorial for lxml.etree.

http://codespeak.net/lxml/dev/tutorial.html
http://codespeak.net/svn/lxml/trunk/doc/tutorial.txt

The intention is to give beginners (regarding etree, ElementTree and to a certain extent even XML) an idea about how lxml.etree works and what features can help them in getting their problems solved. I happily borrowed ideas from Fredrik's ET tutorial.

Still, there's lots of stuff missing, so if someone feels ambitious and finds some time over the week-end, I'd be glad to add another name to the list of authors. :)

Everything from fixes and remarks to readily written sections will be very much appreciated. The source for the HTML page is the ReST text file above.

Hoping for some helpful hands,
Stefan

From jholg at gmx.de Thu May 17 18:34:01 2007
From: jholg at gmx.de (jholg at gmx.de)
Date: Thu, 17 May 2007 18:34:01 +0200
Subject: [lxml-dev] [objectify] schema type registry: QNames for xsi:type?
Message-ID: <20070517163401.76590@gmx.net>

Hi Stefan,

couldn't respond earlier as I have no svn access at work currently.

I've tested your changes and they work just perfectly for me. Find attached a little patch that adds some information on this topic to the objectify docs, and a test method also.

Thanks,
Holger
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20070517/59d45dc7/attachment.htm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/x-patch
Size: 5775 bytes
Desc: attachment
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070517/59d45dc7/attachment.bin

From lkraider at gmail.com Thu May 17 20:11:32 2007
From: lkraider at gmail.com (Paul Eipper)
Date: Thu, 17 May 2007 15:11:32 -0300
Subject: [lxml-dev] Question o xpath
Message-ID: <2ee02670705171111n3f44f5e3x74e346122472ae30@mail.gmail.com>

Hello again :)

I'm hitting an issue... hope someone can help me. Say I have this data:

***snip***
6:1, 128 kbps#44100Hz, Joint stereo
***snip***

If I do an xpath search like this, I get a result:

>>> o.xpath('//*[ contains( @*, "Andy" ) ]' )
[]

but if I try to search for this string:

>>> o.xpath('//*[ contains( @*, "SimCity" ) ]' )
[]

...I get no result.

Is the problem in the xpath query? What am I missing here? I thought " @* " is supposed to look at all tag attributes?

thanks,
--
Paul Eipper
Brasil

From etiffany at alum.mit.edu Fri May 18 16:16:10 2007
From: etiffany at alum.mit.edu (Eric Tiffany)
Date: Fri, 18 May 2007 10:16:10 -0400
Subject: [lxml-dev] python crashes in xmlDictFree inside Zope
Message-ID:

I have been prototyping some XMLSchema parsing/validating using lxml 1.3beta. Everything works great from python 2.4.4 started from the command line, or running from inside Eclipse.

However, when I moved my code over to my Plone product, python crashes when Zope is initializing the product. I am creating my XMLSchema object there. The code is essentially the same, running under the same python and with the same libs (afaik).

Some earlier attempts (with python 2.4.3) gave me this error:

python(11139) malloc: *** Deallocation of a pointer not malloced: 0x80; This could be a double free(), or free() called with the middle of an allocated block; Try setting environment variable MallocHelp to see tools to help debug

I'm not seeing that now, as I have "upgraded" to python 2.4.4 which seems to be stripped or something.
I don't have a test case immediately available, but here is the stack backtrace from the python crash log: OS Version: 10.4.9 (Build 8P2137) Report Version: 4 Command: Python Path: /opt/local/Library/Frameworks/Python.framework/Versions/2.4/Resources/Python .app/Contents/MacOS/Python Parent: bash [11055] Version: 2.4a0 (2.4alpha1) PID: 11135 Thread: 0 Exception: EXC_BAD_ACCESS (0x0001) Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000001 Thread 0 Crashed: 0 libxml2.2.dylib 0x033f6294 xmlDictFree + 45 1 libxml2.2.dylib 0x033f62e2 xmlDictFree + 123 2 libxml2.2.dylib 0x033f62e2 xmlDictFree + 123 3 etree.so 0x06406a08 __pyx_f_5etree_14_ParserContext_initThreadDictRef + 63 (etree.c:19725) 4 etree.so 0x06406a5b __pyx_f_5etree_14_ParserContext_initParserDict + 35 (etree.c:19740) 5 etree.so 0x06431bca __pyx_f_5etree_11_BaseParser__parseDocFromFile + 99 (etree.c:21340) 6 etree.so 0x0646a8f4 __pyx_f_5etree__parseDocument + 395 (etree.c:22486) 7 etree.so 0x0646cc50 __pyx_f_5etree_parse + 176 (etree.c:10300) 8 org.python.python 0x0027faca PyEval_EvalFrame + 22777 (ceval.c:3568) 9 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 (ceval.c:2741) 10 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 (ceval.c:3661) 11 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 (ceval.c:2741) 12 org.python.python 0x0027e49f PyEval_EvalFrame + 17102 (ceval.c:3661) 13 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 (ceval.c:3651) 14 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 (ceval.c:3651) 15 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 (ceval.c:3651) 16 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 (ceval.c:3651) 17 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 (ceval.c:3651) 18 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 (ceval.c:3651) 19 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 (ceval.c:3651) 20 org.python.python 0x0027ebaa PyEval_EvalFrame + 18905 (ceval.c:3651) 21 org.python.python 0x0027ebaa PyEval_EvalFrame + 
18905 (ceval.c:3651) 22 org.python.python 0x00280665 PyEval_EvalCodeEx + 1774 (ceval.c:2741) 23 org.python.python 0x002808a5 PyEval_EvalCode + 87 (ceval.c:490) 24 org.python.python 0x002a75fd PyRun_FileExFlags + 200 (pythonrun.c:1285) 25 org.python.python 0x002a78e0 PyRun_SimpleFileExFlags + 640 (pythonrun.c:869) 26 org.python.python 0x002b13ec Py_Main + 3199 (main.c:493) 27 org.python.python 0x00001fae 0x1000 + 4014 28 org.python.python 0x00001ed5 0x1000 + 3797 Thread 0 crashed with X86 Thread State (32-bit): eax: 0x00000000 ebx: 0x033f6275 ecx: 0x00000000 edx: 0x00000000 edi: 0x00000001 esi: 0x05856fd0 ebp: 0xbfffcfe8 esp: 0xbfffcfb0 ss: 0x0000001f efl: 0x00010286 eip: 0x033f6294 cs: 0x00000017 ds: 0x0000001f es: 0x0000001f fs: 0x00000000 gs: 0x00000037 Binary Images Description: 0x1000 - 0x2fff org.python.python 2.4a0 (2.4alpha1) /opt/local/Library/Frameworks/Python.framework/Versions/2.4/Resources/Python .app/Contents/MacOS/Python 0xa0000 - 0xa1fff icglue.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/icglue.so 0xb8000 - 0xb9fff time.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/time.so 0xc3000 - 0xc6fff strop.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/strop.so 0x205000 - 0x2d9fff org.python.python 2.4a0 (2.2) /opt/local/Library/Frameworks/Python.framework/Versions/2.4/Python 0x575000 - 0x576fff cStringIO.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/cStringIO.so 0x580000 - 0x581fff collections.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/collections.so 0x5cc000 - 0x5d2fff _socket.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/_socket.so 0x5e7000 - 0x5e8fff _ssl.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/_ssl.so 0x705000 - 0x737fff libssl.0.9.8.dylib 
/opt/local/lib/libssl.0.9.8.dylib 0x74b000 - 0x75cfff libz.1.dylib /opt/local/lib/libz.1.dylib 0x761000 - 0x763fff struct.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/struct.so 0x76f000 - 0x771fff binascii.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/binascii.so 0x7bc000 - 0x7bdfff math.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/math.so 0x7c5000 - 0x7c6fff _random.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/_random.so 0x7cd000 - 0x7cefff fcntl.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/fcntl.so 0x7d6000 - 0x7d7fff md5.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/md5.so 0x7de000 - 0x7dffff sha.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/sha.so 0x7e6000 - 0x7e6fff _bisect.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/_bisect.so 0x1008000 - 0x10f7fff libcrypto.0.9.8.dylib /opt/local/lib/libcrypto.0.9.8.dylib 0x11a0000 - 0x11abfff datetime.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/datetime.so 0x120e000 - 0x1211fff array.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/array.so 0x1222000 - 0x1222fff grp.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/grp.so 0x126e000 - 0x127bfff cPickle.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/cPickle.so 0x129a000 - 0x129dfff _Res.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/_Res.so 0x12ac000 - 0x12b2fff _File.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/_File.so 0x12cb000 - 0x12ccfff MacOS.so 
/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/MacOS.so 0x1348000 - 0x1374fff pyexpat.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/pyexpat.so 0x1427000 - 0x1427fff _weakref.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/_weakref.so 0x142d000 - 0x142dfff _initgroups.so /Applications/Plone-2.5.2/lib/python/initgroups/_initgroups.so 0x1433000 - 0x1433fff crypt.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/crypt.so 0x1439000 - 0x143afff _heapq.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/_heapq.so 0x1442000 - 0x1442fff _Missing.so /Applications/Plone-2.5.2/lib/python/Missing/_Missing.so 0x144e000 - 0x1450fff cPersistence.so /Applications/Plone-2.5.2/lib/python/persistent/cPersistence.so 0x145b000 - 0x145cfff TimeStamp.so /Applications/Plone-2.5.2/lib/python/persistent/TimeStamp.so 0x1463000 - 0x1464fff cPickleCache.so /Applications/Plone-2.5.2/lib/python/persistent/cPickleCache.so 0x146e000 - 0x146ffff _zope_interface_coptimizations.so /Applications/Plone-2.5.2/lib/python/zope/interface/_zope_interface_coptimiz ations.so 0x1477000 - 0x1477fff _MultiMapping.so /Applications/Plone-2.5.2/lib/python/MultiMapping/_MultiMapping.so 0x14fe000 - 0x1502fff cAccessControl.so /Applications/Plone-2.5.2/lib/python/AccessControl/cAccessControl.so 0x1510000 - 0x1512fff _ExtensionClass.so /Applications/Plone-2.5.2/lib/python/ExtensionClass/_ExtensionClass.so 0x151c000 - 0x1520fff _Acquisition.so /Applications/Plone-2.5.2/lib/python/Acquisition/_Acquisition.so 0x152e000 - 0x152ffff _Record.so /Applications/Plone-2.5.2/lib/python/Record/_Record.so 0x1537000 - 0x1539fff cDocumentTemplate.so /Applications/Plone-2.5.2/lib/python/DocumentTemplate/cDocumentTemplate.so 0x1585000 - 0x1591fff parser.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li 
b-dynload/parser.so 0x15e8000 - 0x15eafff zlib.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/zlib.so 0x1633000 - 0x1635fff operator.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/operator.so 0x167f000 - 0x1681fff _zope_proxy_proxy.so /Applications/Plone-2.5.2/lib/python/zope/proxy/_zope_proxy_proxy.so 0x168f000 - 0x1691fff itertools.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/itertools.so 0x16a0000 - 0x16a0fff _zope_i18nmessageid_message.so /Applications/Plone-2.5.2/lib/python/zope/i18nmessageid/_zope_i18nmessageid_ message.so 0x16e7000 - 0x16e7fff _zope_thread.so /Applications/Plone-2.5.2/lib/python/zope/thread/_zope_thread.so 0x16ee000 - 0x16f1fff _proxy.so /Applications/Plone-2.5.2/lib/python/zope/security/_proxy.so 0x1701000 - 0x1702fff _zope_security_checker.so /Applications/Plone-2.5.2/lib/python/zope/security/_zope_security_checker.so 0x174a000 - 0x174afff _zope_hookable.so /Applications/Plone-2.5.2/lib/python/zope/hookable/_zope_hookable.so 0x1790000 - 0x1790fff _ComputedAttribute.so /Applications/Plone-2.5.2/lib/python/ComputedAttribute/_ComputedAttribute.so 0x17d6000 - 0x17d9fff _zope_app_container_contained.so /Applications/Plone-2.5.2/lib/python/zope/app/container/_zope_app_container_ contained.so 0x17e8000 - 0x17e8fff _Persistence.so /Applications/Plone-2.5.2/lib/python/Persistence/_Persistence.so 0x17ee000 - 0x17eefff _MethodObject.so /Applications/Plone-2.5.2/lib/python/MethodObject/_MethodObject.so 0x17f4000 - 0x17f5fff select.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/select.so 0x20c8000 - 0x20d3fff _OOBTree.so /Applications/Plone-2.5.2/lib/python/BTrees/_OOBTree.so 0x20f0000 - 0x20f0fff stopper.so /Applications/Plone-2.5.2/lib/python/Products/ZCTextIndex/stopper.so 0x20f6000 - 0x20f6fff okascore.so /Applications/Plone-2.5.2/lib/python/Products/ZCTextIndex/okascore.so 
0x2245000 - 0x2250fff _OIBTree.so /Applications/Plone-2.5.2/lib/python/BTrees/_OIBTree.so 0x226d000 - 0x2278fff _IOBTree.so /Applications/Plone-2.5.2/lib/python/BTrees/_IOBTree.so 0x2297000 - 0x22a3fff _IIBTree.so /Applications/Plone-2.5.2/lib/python/BTrees/_IIBTree.so 0x2abd000 - 0x2ae7fff _imaging.so /opt/local/lib/python2.4/site-packages/PIL/_imaging.so 0x2b79000 - 0x2b94fff libjpeg.62.dylib /opt/local/lib/libjpeg.62.dylib 0x2be2000 - 0x2be4fff _csv.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/_csv.so 0x2bf0000 - 0x2bf0fff _ThreadLock.so /Applications/Plone-2.5.2/lib/python/ThreadLock/_ThreadLock.so 0x2f85000 - 0x2f87fff unicodedata.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/li b-dynload/unicodedata.so 0x3285000 - 0x32a8fff libxml2mod.so /opt/local/lib/python2.4/site-packages/libxml2mod.so 0x3337000 - 0x3427fff libxml2.2.dylib /opt/local/lib/libxml2.2.dylib 0x3456000 - 0x354bfff libiconv.2.dylib /opt/local/lib/libiconv.2.dylib 0x3c05000 - 0x3c10fff _fsBTree.so /Applications/Plone-2.5.2/lib/python/BTrees/_fsBTree.so 0x52eb000 - 0x52ecfff ZopeSplitter.so /Applications/Plone-2.5.2/lib/python/Products/PluginIndexes/TextIndex/Splitt er/ZopeSplitter/ZopeSplitter.so 0x56c5000 - 0x56d0fff libexslt.0.dylib /opt/local/lib/libexslt.0.dylib 0x5a05000 - 0x5a2efff libxslt.1.dylib /opt/local/lib/libxslt.1.dylib 0x6405000 - 0x6477fff etree.so /opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/si te-packages/lxml-1.3beta-py2.4-macosx-10.4-i386.egg/lxml/etree.so 0x8fe00000 - 0x8fe4afff dyld 46.12 /usr/lib/dyld 0x90000000 - 0x90170fff libSystem.B.dylib /usr/lib/libSystem.B.dylib 0x901c0000 - 0x901c2fff libmathCommon.A.dylib /usr/lib/system/libmathCommon.A.dylib 0x901c4000 - 0x90201fff com.apple.CoreText 1.1.2 (???) 
/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/CoreText.framework/Versions/A/CoreText 0x90228000 - 0x902fefff ATS /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/ATS.framework/Versions/A/ATS 0x9031e000 - 0x90773fff com.apple.CoreGraphics 1.258.61 (???) /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/CoreGraphics.framework/Versions/A/CoreGraphics 0x9080a000 - 0x908d2fff com.apple.CoreFoundation 6.4.7 (368.28) /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundatio n 0x90910000 - 0x90910fff com.apple.CoreServices 10.4 (???) /System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices 0x90912000 - 0x90a05fff libicucore.A.dylib /usr/lib/libicucore.A.dylib 0x90a55000 - 0x90ad4fff libobjc.A.dylib /usr/lib/libobjc.A.dylib 0x90afd000 - 0x90b61fff libstdc++.6.dylib /usr/lib/libstdc++.6.dylib 0x90bd0000 - 0x90bd7fff libgcc_s.1.dylib /usr/lib/libgcc_s.1.dylib 0x90bdc000 - 0x90c4ffff com.apple.framework.IOKit 1.4.6 (???) 
/System/Library/Frameworks/IOKit.framework/Versions/A/IOKit 0x90c64000 - 0x90c76fff libauto.dylib /usr/lib/libauto.dylib 0x90c7c000 - 0x90f22fff com.apple.CoreServices.CarbonCore 682.21 /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Carb onCore.framework/Versions/A/CarbonCore 0x90f65000 - 0x90fcdfff com.apple.CoreServices.OSServices 4.1 /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/OSSe rvices.framework/Versions/A/OSServices 0x91006000 - 0x91044fff com.apple.CFNetwork 129.20 /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CFNe twork.framework/Versions/A/CFNetwork 0x91057000 - 0x91067fff com.apple.WebServices 1.1.3 (1.1.0) /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/WebS ervicesCore.framework/Versions/A/WebServicesCore 0x91072000 - 0x910f1fff com.apple.SearchKit 1.0.5 /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Sear chKit.framework/Versions/A/SearchKit 0x9112b000 - 0x91149fff com.apple.Metadata 10.4.4 (121.36) /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Meta data.framework/Versions/A/Metadata 0x91155000 - 0x91163fff libz.1.dylib /usr/lib/libz.1.dylib 0x91166000 - 0x91305fff com.apple.security 4.5.2 (29774) /System/Library/Frameworks/Security.framework/Versions/A/Security 0x91403000 - 0x9140bfff com.apple.DiskArbitration 2.1.1 /System/Library/Frameworks/DiskArbitration.framework/Versions/A/DiskArbitrat ion 0x91412000 - 0x91419fff libbsm.dylib /usr/lib/libbsm.dylib 0x9141d000 - 0x91443fff com.apple.SystemConfiguration 1.8.6 /System/Library/Frameworks/SystemConfiguration.framework/Versions/A/SystemCo nfiguration 0x91455000 - 0x914cbfff com.apple.audio.CoreAudio 3.0.4 /System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio 0x9151c000 - 0x9151cfff com.apple.ApplicationServices 10.4 (???) 
/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Applicat ionServices 0x9151e000 - 0x9154afff com.apple.AE 314 (313) /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/AE.framework/Versions/A/AE 0x9155d000 - 0x91631fff com.apple.ColorSync 4.4.9 /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/ColorSync.framework/Versions/A/ColorSync 0x9166c000 - 0x916dffff com.apple.print.framework.PrintCore 4.6 (177.13) /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/PrintCore.framework/Versions/A/PrintCore 0x9170d000 - 0x917b6fff com.apple.QD 3.10.24 (???) /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/QD.framework/Versions/A/QD 0x917dc000 - 0x91827fff com.apple.HIServices 1.5.2 (???) /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/HIServices.framework/Versions/A/HIServices 0x91846000 - 0x9185cfff com.apple.LangAnalysis 1.6.3 /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/LangAnalysis.framework/Versions/A/LangAnalysis 0x91868000 - 0x91883fff com.apple.FindByContent 1.5 /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/FindByContent.framework/Versions/A/FindByContent 0x9188e000 - 0x918cbfff com.apple.LaunchServices 182 /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/LaunchServices.framework/Versions/A/LaunchServices 0x918df000 - 0x918ebfff com.apple.speech.synthesis.framework 3.5 /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/SpeechSynthesis.framework/Versions/A/SpeechSynthesis 0x918f2000 - 0x91931fff com.apple.ImageIO.framework 1.5.4 /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/ImageIO.framework/Versions/A/ImageIO 0x91944000 - 0x919f6fff libcrypto.0.9.7.dylib /usr/lib/libcrypto.0.9.7.dylib 0x91a3c000 - 0x91a52fff libcups.2.dylib 
/usr/lib/libcups.2.dylib 0x91a57000 - 0x91a75fff libJPEG.dylib /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/ImageIO.framework/Versions/A/Resources/libJPEG.dylib 0x91a7a000 - 0x91ad9fff libJP2.dylib /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/ImageIO.framework/Versions/A/Resources/libJP2.dylib 0x91aeb000 - 0x91aeffff libGIF.dylib /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/ImageIO.framework/Versions/A/Resources/libGIF.dylib 0x91af1000 - 0x91b75fff libRaw.dylib /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/ImageIO.framework/Versions/A/Resources/libRaw.dylib 0x91b79000 - 0x91bb6fff libTIFF.dylib /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/ImageIO.framework/Versions/A/Resources/libTIFF.dylib 0x91bbc000 - 0x91bd6fff libPng.dylib /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/ImageIO.framework/Versions/A/Resources/libPng.dylib 0x91bdb000 - 0x91bddfff libRadiance.dylib /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Framewor ks/ImageIO.framework/Versions/A/Resources/libRadiance.dylib 0x91bdf000 - 0x91cbdfff libxml2.2.dylib /usr/lib/libxml2.2.dylib 0x91cda000 - 0x91cdafff com.apple.Accelerate 1.3.1 (Accelerate 1.3.1) /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate 0x91cdc000 - 0x91d6afff com.apple.vImage 2.5 /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vImage .framework/Versions/A/vImage 0x91d71000 - 0x91d71fff com.apple.Accelerate.vecLib 3.3.1 (vecLib 3.3.1) /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib .framework/Versions/A/vecLib 0x91d73000 - 0x91dccfff libvMisc.dylib /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib .framework/Versions/A/libvMisc.dylib 0x91dd5000 - 0x91df9fff libvDSP.dylib 
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib .framework/Versions/A/libvDSP.dylib 0x91e01000 - 0x9220afff libBLAS.dylib /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib .framework/Versions/A/libBLAS.dylib 0x92244000 - 0x925f8fff libLAPACK.dylib /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib .framework/Versions/A/libLAPACK.dylib 0x92625000 - 0x92712fff libiconv.2.dylib /usr/lib/libiconv.2.dylib 0x92714000 - 0x92791fff com.apple.DesktopServices 1.3.6 /System/Library/PrivateFrameworks/DesktopServicesPriv.framework/Versions/A/D esktopServicesPriv 0x927d2000 - 0x92a02fff com.apple.Foundation 6.4.8 (567.29) /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation 0x92baa000 - 0x92baafff com.apple.Carbon 10.4 (???) /System/Library/Frameworks/Carbon.framework/Versions/A/Carbon 0x92bac000 - 0x92bbcfff com.apple.ImageCapture 3.0.4 /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/ImageCaptu re.framework/Versions/A/ImageCapture 0x92bcb000 - 0x92bd3fff com.apple.speech.recognition.framework 3.6 /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/SpeechReco gnition.framework/Versions/A/SpeechRecognition 0x92bd9000 - 0x92bdffff com.apple.securityhi 2.0.1 (24742) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/SecurityHI .framework/Versions/A/SecurityHI 0x92be5000 - 0x92c76fff com.apple.ink.framework 101.2.1 (71) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/Ink.framew ork/Versions/A/Ink 0x92c8a000 - 0x92c8efff com.apple.help 1.0.3 (32.1) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/Help.frame work/Versions/A/Help 0x92c91000 - 0x92caffff com.apple.openscripting 1.2.5 (???) 
/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/OpenScript ing.framework/Versions/A/OpenScripting 0x92cc1000 - 0x92cc7fff com.apple.print.framework.Print 5.2 (192.4) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/Print.fram ework/Versions/A/Print 0x92ccd000 - 0x92d30fff com.apple.htmlrendering 66.1 (1.1.3) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HTMLRender ing.framework/Versions/A/HTMLRendering 0x92d57000 - 0x92d98fff com.apple.NavigationServices 3.4.4 (3.4.3) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/Navigation Services.framework/Versions/A/NavigationServices 0x92dbf000 - 0x92dcdfff com.apple.audio.SoundManager 3.9.1 /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/CarbonSoun d.framework/Versions/A/CarbonSound 0x92dd4000 - 0x92dd9fff com.apple.CommonPanels 1.2.3 (73) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/CommonPane ls.framework/Versions/A/CommonPanels 0x92dde000 - 0x930d3fff com.apple.HIToolbox 1.4.9 (???) /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox. framework/Versions/A/HIToolbox -- From hongqn at gmail.com Fri May 18 18:32:18 2007 From: hongqn at gmail.com (Qiangning Hong) Date: Sat, 19 May 2007 00:32:18 +0800 Subject: [lxml-dev] etree.tostring generate invalid XML? Message-ID: >>> from lxml import etree >>> e = lxml.etree.Element('root') >>> e.text = u'\x08' >>> xml = etree.tostring(e, 'utf8') >>> xml '\x08' >>> etree.XML(xml) >>> etree.XML(xml) Traceback (most recent call last): File "", line 1, in File "etree.pyx", line 1749, in etree.XML File "parser.pxi", line 934, in etree._parseMemoryDocument File "parser.pxi", line 830, in etree._parseDoc File "parser.pxi", line 516, in etree._BaseParser._parseDoc File "parser.pxi", line 619, in etree._handleParseResult File "parser.pxi", line 590, in etree._raiseParseError etree.XMLSyntaxError: line 1: PCDATA invalid Char value 8 Shouldn't xml be '' ? 
Is it a bug of lxml? -- Qiangning Hong http://www.douban.com/people/hongqn/ From stefan_ml at behnel.de Sun May 20 09:11:57 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 20 May 2007 09:11:57 +0200 Subject: [lxml-dev] Question o xpath In-Reply-To: <2ee02670705171111n3f44f5e3x74e346122472ae30@mail.gmail.com> References: <2ee02670705171111n3f44f5e3x74e346122472ae30@mail.gmail.com> Message-ID: <464FF4BD.70308@behnel.de> Hi, Paul Eipper wrote: > ***snip*** > time="Dec 13 13:36 2006"> > 6:1, 128 kbps#44100Hz, Joint stereo > album="SimCity 4 Rush Hour Soundtrack" year="2003"> > > ***snip*** > > if I do a xpath search like this, I get a result: > >>>> o.xpath('//*[ contains( @*, "Andy" ) ]' ) > [] > > but if I try to search for this string: >>>> o.xpath('//*[ contains( @*, "SimCity" ) ]' ) > [] > > ...I get no result. > > Is the problem on the xpath query ? What am I missing here ? I thought > " @* " is supposed to look at all tag attributes ? The problem is that the contains() function requires a string as parameter, not a node set (such as a set of attributes, as in your example). The way XPath converts a node set to a string is: take the first entry and convert that to a string. In your case, this happens to hit the "artist" attribute, not the "album" attribute (no guarantee here!). Consider reading a good book or some other documentation on XPath, to learn the full details. This expression should do what you want: //*[ @*[contains(., $whatyouwant) ]] then set "whatyouwant" to whatever you are looking for during an evaluation. Stefan From stefan_ml at behnel.de Sun May 20 09:25:42 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 20 May 2007 09:25:42 +0200 Subject: [lxml-dev] etree.tostring generate invalid XML? In-Reply-To: References: Message-ID: <464FF7F6.3080004@behnel.de> Hi, without looking it up, I don't think this is a bug (and definitely not in lxml). The XML spec simply forbids certain characters in serialised XML. 
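As a rough illustration of the point above, here is a minimal sketch that checks text against the Char production of XML 1.0 and base64-encodes binary data before storing it as element text. It assumes Python 3 and uses the standard library's xml.etree.ElementTree instead of lxml, so that it runs without lxml installed; the helper names (is_xml10_char, invalid_xml_chars, set_binary_text, get_binary_text) are made up for this example and are not part of any library.

```python
import base64
import xml.etree.ElementTree as ET


def is_xml10_char(ch):
    """True if ch matches the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def invalid_xml_chars(text):
    """Return the characters in text that XML 1.0 does not allow."""
    return [ch for ch in text if not is_xml10_char(ch)]


def set_binary_text(element, data):
    """Store bytes as base64 text so the serialised XML stays well-formed."""
    element.text = base64.b64encode(data).decode("ascii")


def get_binary_text(element):
    """Decode the base64 text of an element back into bytes."""
    return base64.b64decode(element.text)


# U+0008 (backspace) is outside the allowed character ranges.
print(invalid_xml_chars("ok\x08"))

# Round trip: encode the byte, serialise, parse again, decode.
root = ET.Element("root")
set_binary_text(root, b"\x08")
serialized = ET.tostring(root)
restored = get_binary_text(ET.fromstring(serialized))
print(serialized, restored)
```

The same base64 helpers should work unchanged on lxml.etree elements, since they only touch the .text attribute.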
Qiangning Hong wrote: > >>> from lxml import etree > >>> e = lxml.etree.Element('root') > >>> e.text = u'\x08' > >>> xml = etree.tostring(e, 'utf8') > >>> xml > '\x08' Don't tell me you didn't expect that. :) >>>> etree.XML(xml) Interesting, no output here? >>>> etree.XML(xml) > Traceback (most recent call last): > File "", line 1, in > File "etree.pyx", line 1749, in etree.XML > File "parser.pxi", line 934, in etree._parseMemoryDocument > File "parser.pxi", line 830, in etree._parseDoc > File "parser.pxi", line 516, in etree._BaseParser._parseDoc > File "parser.pxi", line 619, in etree._handleParseResult > File "parser.pxi", line 590, in etree._raiseParseError > etree.XMLSyntaxError: line 1: PCDATA invalid Char value 8 > > Shouldn't xml be '' ? Is it a bug of lxml? When you're dealing with binary data in XML, you should always encode it in a way that makes it 'XML compatible', such as uuencode, base64 or whatever. If you want, you can ask on the libxml2 mailing list, but I doubt they'll tell you anything different. You might get an answer, though, that gives you a bit more insight into what goes on. Stefan From stefan_ml at behnel.de Sun May 20 10:33:19 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 20 May 2007 10:33:19 +0200 Subject: [lxml-dev] python crashes in xmlDictFree inside Zope In-Reply-To: References: Message-ID: <465007CF.2090405@behnel.de> Hi, from the stack traces I can see that you are on MacOS-X. Could you check which libxml2 version you are using? Just run "test.py" from the source distribution or look at "lxml.etree.LIBXML*". There are known issues with older versions of libxml2, so especially when you are using XML-Schema, you should care for installing a recent version.
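For reference, a small sketch of the version check. The helper name libxml2_versions is made up for this example; the constants it reads are the "lxml.etree.LIBXML*" attributes mentioned above plus LXML_VERSION. It is written for Python 3 and returns an empty dict if lxml is not importable, so it can be run anywhere.

```python
def libxml2_versions():
    """Return the version tuples exposed by lxml.etree, or {} without lxml."""
    try:
        from lxml import etree
    except ImportError:
        return {}
    names = ("LXML_VERSION", "LIBXML_VERSION", "LIBXML_COMPILED_VERSION")
    # LIBXML_VERSION is the library loaded at runtime;
    # LIBXML_COMPILED_VERSION is the one lxml was built against.
    return {name: getattr(etree, name, None) for name in names}


for name, version in sorted(libxml2_versions().items()):
    print(name, version)
```

If the runtime and compiled versions differ, that mismatch is worth mentioning in a bug report along with the traceback.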
Stefan From etiffany at alum.mit.edu Sun May 20 21:29:42 2007 From: etiffany at alum.mit.edu (Eric Tiffany) Date: Sun, 20 May 2007 15:29:42 -0400 Subject: [lxml-dev] python crashes in xmlDictFree inside Zope In-Reply-To: <303c45610705201224i4e2ba2dctecff6ae67adbe1e4@mail.gmail.com> References: <465007CF.2090405@behnel.de> <303c45610705201224i4e2ba2dctecff6ae67adbe1e4@mail.gmail.com> Message-ID: <303c45610705201229p43f3fba5n5cc24f45986f00ca@mail.gmail.com> Thanks, I am using the latest "darwinports" version of libxml2, which I just updated -- this problem occurs with the version below. Note that this problem doesn't happen with the same code run from the python shell -- only when used inside Zope. [sorry for the repeat post -- sent from wrong acct first time] ET >>> lxml.etree.LIBXML_VERSION (2, 6, 28) >>> lxml.etree.LIBXML_COMPILED_VERSION (2, 6, 28) On 5/20/07, Stefan Behnel wrote: > Hi, > > from the stack traces I can see that you are on MacOS-X. Could you check which > libxml2 version you are using? Just run "test.py" from the source distribution > or look at "lxml.etree.LIBXML*". There are known issues with older versions of > libxml2, so especially when you are using XML-Schema, you should care for > installing a recent version. > > Stefan > From stefan_ml at behnel.de Mon May 21 06:47:12 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 21 May 2007 06:47:12 +0200 Subject: [lxml-dev] [objectify] schema type registry: QNames for xsi:type? In-Reply-To: <20070517163401.76590@gmx.net> References: <20070517163401.76590@gmx.net> Message-ID: <46512450.5070007@behnel.de> Hi Holger, jholg at gmx.de wrote: > couldn't respond earlier as I have no svn access at work currently. > I've tested your changes and they work just perfect for me. > > Find attached a little patch that adds some information on this topic > to the objectify docs, and a test method also. thanks for the doc patch.
I had to beautify it in a couple of places (after all, it's more of a documentation thing than a test), and that led me to a larger restructuring of the existing page (it now actually *has* a structure). Thanks again, Stefan From stefan_ml at behnel.de Mon May 21 08:21:11 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 21 May 2007 08:21:11 +0200 Subject: [lxml-dev] python crashes in xmlDictFree inside Zope In-Reply-To: References: Message-ID: <46513A57.2050405@behnel.de> Hi, Eric Tiffany wrote: > I have been prototyping some XMLSchema parsing/validating using lxml > 1.3beta. > > Everything works great from python 2.4.4 started from the command line, or > running from inside Eclipse. > > However, when I moved my code over to my Plone product, python crashes when > Zope is initializing the product. I am creating my XMLSchema object there. > > Some earlier attempts (with python 2.4.3) gave me this error: > > python(11139) malloc: *** Deallocation of a pointer not malloced: 0x80; This > could be a double free(), or free() called with the middle of an allocated > block; Try setting environment variable MallocHelp to see tools to help > debug > > OS Version: 10.4.9 (Build 8P2137) > Report Version: 4 > > Version: 2.4a0 (2.4alpha1) Is this the Python version? I'm asking because you said it was 2.4.3. lxml.etree behaves differently for Python <= 2.4.2, as there are known bugs with threading in earlier versions. If you're sure you can reproduce the *same* bug with a version older than 2.4.2 (and libxml2 2.6.28, as you also mentioned), that would completely shift the focus of the bug hunt. Is there any way to detect MacOS-X at the C level? In that case, we could try to disable thread concurrency support completely for this platform - in case that's the source of the segfault. You can try to see if this would fix the problem by passing the option "--without-threading" to setup.py when building lxml.
Could you please try that with your current setup and report back to the list? Another question: are you using a custom parser (i.e. passing a second argument to the parse() function) here or is it the default parser that crashes here? Stefan From sidnei at enfoldsystems.com Mon May 21 16:08:13 2007 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Mon, 21 May 2007 11:08:13 -0300 Subject: [lxml-dev] Resolvers not passed on to sub-documents? Message-ID: This might or might not have been fixed recently. I am using lxml 1.2. I'm writing a self-contained test for this. If you have an XSLT that includes another XSLT, which in turn includes a third one, the resolvers don't seem to be passed on. theme.xsl: xsl:import sub.xsl sub.xsl: xsl:import common.xsl The custom resolver I passed when parsing theme.xsl is used to resolve 'sub.xsl' but not to resolve 'common.xsl'. I remember discussing a similar issue, maybe this is a problem with libxml2? -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From stefan_ml at behnel.de Mon May 21 16:28:01 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 21 May 2007 16:28:01 +0200 Subject: [lxml-dev] Resolvers not passed on to sub-documents? In-Reply-To: References: Message-ID: <4651AC71.9080608@behnel.de> Hi, Sidnei da Silva wrote: > This might or might not have been fixed recently. I am using lxml 1.2. > I'm writing a self-contained test for this. > > If you have an XSLT that includes another XSLT, which in turn includes > a third one, the resolvers don't seem to be passed on. > > theme.xsl: > xsl:import sub.xsl > > sub.xsl: > xsl:import common.xsl > > The custom resolver I passed when parsing theme.xsl is used to resolve > 'sub.xsl' but not to resolve 'common.xsl'. That's quite possible as keeping track of parsed documents in XSLT isn't the simplest thing on earth.
We had the same problem with XInclude (where nothing would help), but it should be fixable for XSLT. Could you check if the attached (and completely untested) patch fixes the problem? It's against the trunk, but should apply to 1.2 also. In case this patch helps, 'mind coming up with a test case for it? That could easily convince me that I should include it in lxml 1.3. :) Stefan Index: src/lxml/xslt.pxi =================================================================== --- src/lxml/xslt.pxi (Revision 43508) +++ src/lxml/xslt.pxi (Arbeitskopie) @@ -141,6 +141,8 @@ c_doc = _xslt_resolve_stylesheet(c_uri, c_pcontext) if c_doc is not NULL: python.PyGILState_Release(gil_state) + if c_type == xslt.XSLT_LOAD_STYLESHEET: + c_doc._private = c_pcontext return c_doc c_doc = _xslt_resolve_from_python(c_uri, c_pcontext, parse_options, &error) @@ -151,6 +153,8 @@ _xslt_store_resolver_exception(c_uri, c_pcontext, c_type) python.PyGILState_Release(gil_state) + if c_doc is not NULL and c_type == xslt.XSLT_LOAD_STYLESHEET: + c_doc._private = c_pcontext return c_doc cdef xslt.xsltDocLoaderFunc XSLT_DOC_DEFAULT_LOADER From sidnei at enfoldsystems.com Mon May 21 17:41:00 2007 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Mon, 21 May 2007 12:41:00 -0300 Subject: [lxml-dev] Resolvers not passed on to sub-documents? In-Reply-To: <4651AC71.9080608@behnel.de> References: <4651AC71.9080608@behnel.de> Message-ID: Yup! That fixes it! I will come up with a test to add to the test suite. -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From sidnei at enfoldsystems.com Tue May 22 06:42:47 2007 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Tue, 22 May 2007 01:42:47 -0300 Subject: [lxml-dev] Building LXML Trunk Message-ID: Hi, I've tried to build lxml from trunk today, on Win32. 
Got the following error: src\lxml\etree.c(880) : error C2059: syntax error : ')' src\lxml\etree.c(881) : error C2059: syntax error : ')' src\lxml\etree.c(882) : error C2059: syntax error : ')' src\lxml\etree.c(883) : error C2059: syntax error : ')' I'm attaching etree.c. -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 -------------- next part -------------- A non-text attachment was scrubbed... Name: etree.c.bz2 Type: application/x-bzip2 Size: 133412 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070522/69e0a303/attachment-0001.bin From sidnei at enfoldsystems.com Tue May 22 19:56:41 2007 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Tue, 22 May 2007 14:56:41 -0300 Subject: [lxml-dev] Building LXML Trunk Message-ID: Hi, I've tried to build lxml from trunk today, on Win32. Got the following error: src\lxml\etree.c(880) : error C2059: syntax error : ')' src\lxml\etree.c(881) : error C2059: syntax error : ')' src\lxml\etree.c(882) : error C2059: syntax error : ')' src\lxml\etree.c(883) : error C2059: syntax error : ')' Any clue? Smells like a Pyrex issue? 
-- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From cz at gocept.com Wed May 23 08:50:39 2007 From: cz at gocept.com (Christian Zagrodnick) Date: Wed, 23 May 2007 08:50:39 +0200 Subject: [lxml-dev] Bug in objectify node[:].index Message-ID: Hi, the following little script fails at the last assert: ---------------------------- import lxml.objectify tree = lxml.objectify.fromstring( """\ foo bar foo bar """) trees = tree.findall('//a[@name="tree"]') print trees foo_tree = trees[0] assert foo_tree.get('name') == 'tree' parent = foo_tree.getparent() assert parent.tag == 'root' node_list = parent[foo_tree.tag] import pdb; pdb.set_trace() foo_index = node_list[:].index(foo_tree) assert foo_index == 3, foo_index # FAILS: foo_index is 0 ---------------------------- So, foo_index == 0. Which is foo. Apparently the .index only looks at the text or something?! Anyway, all I *actually* want is to remove the nodes found by the xpath. The way you'd think it would be 'normal' doesn't work unfortunately: (Pdb) p parent.index(foo_tree) 2 (Pdb) del parent[2] *** TypeError: deleting items not supported by root element This is obviously because of the sort of strange list/attribute handling (i.e. parent is parent[0]) But then there is parent.remove(...) which works. Apart from the bug above objectify feels kind of strange sometimes... :/ But then at other times it's really nice again :) -- Christian Zagrodnick gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale www.gocept.com · fon. +49 345 12298894 · fax.
+49 345 12298891 From jholg at gmx.de Wed May 23 09:44:25 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Wed, 23 May 2007 09:44:25 +0200 Subject: [lxml-dev] Bug in objectify node[:].index In-Reply-To: References: Message-ID: <20070523074425.103190@gmx.net> > Hi, > > following little script fails at the last assert: > > ---------------------------- > import lxml.objectify > > > tree = lxml.objectify.fromstring( > """\ > > foo > bar > foo > bar > > """) > > trees = tree.findall('//a[@name="tree"]') > print trees > > foo_tree = trees[0] > assert foo_tree.get('name') == 'tree' > > parent = foo_tree.getparent() > assert parent.tag == 'root' > > > node_list = parent[foo_tree.tag] > import pdb; pdb.set_trace() > foo_index = node_list[:].index(foo_tree) > assert foo_index == 3, foo_index # FAILS: foo_index is 0 > > ---------------------------- > > So, foo_index == 0. Which is foo. Apparently the > .index only looks at the text or something?! Note that you use [].index, not ObjectifiedElement.index, with the slice you apply: >>> node_list[:].index >>> parent.index >>> node_list[:].index(foo_tree) 0 >>> parent.index(foo_tree) 2 >>> Seems like [].index returns the first list item that compares equal to its argument. As StringElements behave much like strings, this is what happens here, as the first element in your list also has the element.text "foo": >>> node_list[:][0] == foo_tree True >>> node_list[:][1] == foo_tree False >>> node_list[:][2] == foo_tree True >>> >>> print foo_tree.text foo >>> print node_list[:][0].text foo >>> So I'd rather not say this is a bug... > Anyway, all I *actually* want is to remove the nodes found by the > xpath. The way you'd think it would be 'normal' doesn't work > unfortunately: > > > (Pdb) p parent.index(foo_tree) > 2 > (Pdb) del parent[2] > *** TypeError: deleting items not supported by root element > > This is obviously because of the sort of strange list/attribute > handling (i.e.
parent is parent[0]) It is different from the lxml.etree (ElementTree) API but clearly stated in the docs: Indexed access returns siblings (aka "neighbour" elements with the same name) rather than children, and every ObjectifiedElement has list behaviour; unindexed access is just a *shortcut* to retrieve the first sibling. Cheers, Holger -- GMX SmartSurfer helps you save up to 70% of your online costs! Ideal for modem and ISDN: http://www.gmx.net/de/go/smartsurfer From cz at gocept.com Wed May 23 15:27:19 2007 From: cz at gocept.com (Christian Zagrodnick) Date: Wed, 23 May 2007 15:27:19 +0200 Subject: [lxml-dev] Bug in objectify node[:].index References: <20070523074425.103190@gmx.net> Message-ID: On 2007-05-23 09:44:25 +0200, jholg at gmx.de said: > >> Hi, >> > >> following little script fails at the last assert: >> > >> ---------------------------- >> import lxml.objectify >> > >> > >> tree = lxml.objectify.fromstring( >> """\ >> >> foo >> bar >> foo >> bar >> >> """) >> > >> trees = tree.findall('//a[@name="tree"]') >> print trees >> > >> foo_tree = trees[0] >> assert foo_tree.get('name') == 'tree' >> > >> parent = foo_tree.getparent() >> assert parent.tag == 'root' >> > >> > >> node_list = parent[foo_tree.tag] >> import pdb; pdb.set_trace() >> foo_index = node_list[:].index(foo_tree) >> assert foo_index == 3, foo_index # FAILS: foo_index is 0 >> > >> ---------------------------- >> > >> So, foo_index == 0. Which is foo. Apparently the > >> .index only looks at the text or something?! > > Note that you use [].index, not ObjectifiedElement.index, with the > slice you apply: > >>>> node_list[:].index > >>>> parent.index > >>>> node_list[:].index(foo_tree) > 0 >>>> parent.index(foo_tree) > 2 >>>> > > > Seems like [].index returns the first list item that compares equal to > its argument.
As StringElements behave much like strings, this is what > happens here, as the first element in your list also has the element.text > "foo": >>>> node_list[:][0] == foo_tree > True >>>> node_list[:][1] == foo_tree > False >>>> node_list[:][2] == foo_tree > True >>>> >>>> print foo_tree.text > foo >>>> print node_list[:][0].text > foo >>>> > > > So I'd rather not say this is a bug... Oh I see. Actually I was trying to see if [0] was equal to [2]. In my case they were not, but I probably did something wrong. And as I indeed thought they should be wrong I haven't looked closer. > >> Anyway, all I *actually* want is to remove the nodes found by the > >> xpath. The way you'd think it would be 'normal' doesn't work > >> unfortunately: >> > >> > >> (Pdb) p parent.index(foo_tree) >> 2 >> (Pdb) del parent[2] >> *** TypeError: deleting items not supported by root element >> > >> This is obviously because of the sort of strange list/attribute > >> handling (i.e. parent is parent[0]) > > It is different from the lxml.etree (ElementTree) API but clearly stated > in the docs: Indexed access returns siblings (aka "neighbour" elements > with the same name) rather than children, and every ObjectifiedElement > has list behaviour; unindexed access is just a *shortcut* to retrieve > the first sibling. Yeah. In general I even know that. All I'm saying is that it feels strange. Thanks by the way :) -- Christian Zagrodnick gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891 From stefan_ml at behnel.de Wed May 23 21:37:13 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 23 May 2007 21:37:13 +0200 Subject: [lxml-dev] Building LXML Trunk In-Reply-To: References: Message-ID: <465497E9.9090808@behnel.de> Hi Sidnei, Sidnei da Silva wrote: > I've tried to build lxml from trunk today, on Win32.
Got the following error: > > src\lxml\etree.c(880) : error C2059: syntax error : ')' > src\lxml\etree.c(881) : error C2059: syntax error : ')' > src\lxml\etree.c(882) : error C2059: syntax error : ')' > src\lxml\etree.c(883) : error C2059: syntax error : ')' > > Any clue? Smells like a Pyrex issue? Looks like it, yes. The problem lies in the following lines: void ((*registerGlobalFunctions)(struct __pyx_obj_5etree__BaseContext *,void (*),int ((void (*),PyObject *,PyObject *)))); void ((*registerLocalFunctions)(struct __pyx_obj_5etree__BaseContext *,void (*),int ((void (*),PyObject *,PyObject *)))); PyObject *((*unregisterAllFunctions)(struct __pyx_obj_5etree__BaseContext *,void (*),int ((void (*),PyObject *,PyObject *)))); PyObject *((*unregisterGlobalFunctions)(struct __pyx_obj_5etree__BaseContext *,void (*),int ((void (*),PyObject *,PyObject *)))); These are new in lxml 1.3. Looks like MS's "C" compiler can't handle that. Any idea how we could get this to work? I mean, without the obvious approach of switching to MinGW. :) Stefan From sidnei at enfoldsystems.com Wed May 23 21:53:34 2007 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Wed, 23 May 2007 16:53:34 -0300 Subject: [lxml-dev] Building LXML Trunk In-Reply-To: <465497E9.9090808@behnel.de> References: <465497E9.9090808@behnel.de> Message-ID: I don't have an idea myself, but I can ask Mark to take a look at it. My shallow C experience is not enough to parse that. :) On 5/23/07, Stefan Behnel wrote: > Hi Sidnei, > > Sidnei da Silva wrote: > > I've tried to build lxml from trunk today, on Win32. Got the following error: > > > > src\lxml\etree.c(880) : error C2059: syntax error : ')' > > src\lxml\etree.c(881) : error C2059: syntax error : ')' > > src\lxml\etree.c(882) : error C2059: syntax error : ')' > > src\lxml\etree.c(883) : error C2059: syntax error : ')' > > > > Any clue? Smells like a Pyrex issue? > > Looks like it, yes. 
The problem lies in the following lines: > > void ((*registerGlobalFunctions)(struct __pyx_obj_5etree__BaseContext *,void > (*),int ((void (*),PyObject *,PyObject *)))); > void ((*registerLocalFunctions)(struct __pyx_obj_5etree__BaseContext *,void > (*),int ((void (*),PyObject *,PyObject *)))); > PyObject *((*unregisterAllFunctions)(struct __pyx_obj_5etree__BaseContext > *,void (*),int ((void (*),PyObject *,PyObject *)))); > PyObject *((*unregisterGlobalFunctions)(struct __pyx_obj_5etree__BaseContext > *,void (*),int ((void (*),PyObject *,PyObject *)))); > > These are new in lxml 1.3. Looks like MS's "C" compiler can't handle that. > > Any idea how we could get this to work? I mean, without the obvious approach > of switching to MinGW. :) > > Stefan > -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From ianb at colorstudy.com Thu May 24 17:57:08 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 24 May 2007 10:57:08 -0500 Subject: [lxml-dev] lhtml Message-ID: <4655B5D4.2080101@colorstudy.com> I really want to take all our HTML-related routines and put them into a proper package -- right now they are scattered all over the place. lhtml seems like a nice name for this. I thought it would be good to also place it on codespeak, but I'm not sure where. New top-level? In /lxml/lhtml/(trunk|branches|tags) ? 
-- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From ianb at colorstudy.com Thu May 24 18:42:30 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 24 May 2007 11:42:30 -0500 Subject: [lxml-dev] lhtml In-Reply-To: <4655B5D4.2080101@colorstudy.com> References: <4655B5D4.2080101@colorstudy.com> Message-ID: <4655C076.7050901@colorstudy.com> Ian Bicking wrote: > I really want to take all our HTML-related routines and put them into a > proper package And maybe a bit of advice -- we could just do this as a set of functions (what we currently have), or potentially explore objectify and add the routines as methods. E.g., el.find_by_class('classname') This feels like a cleaner API, but I'm worried that it will mean problems when mixing non-objectify-HTML with other elements, and if there's problems with threads or memory overhead, or any other issues. I don't really mind functions, which is why I am unsure; OTOH, almost every function has a first argument of "el", which makes them seem like methods. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From stefan_ml at behnel.de Thu May 24 16:36:44 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 24 May 2007 16:36:44 +0200 Subject: [lxml-dev] Bug in objectify node[:].index In-Reply-To: References: Message-ID: <4655A2FC.1070204@behnel.de> Hi, I only noticed now that your paragraph below was a) well hidden and b) left unanswered. Christian Zagrodnick wrote: > Anyway, all I *actually* want is to remove the nodes found by the > xpath. The way you'd think it would be 'normal' doesn't work > unfortunately: > > (Pdb) p parent.index(foo_tree) > 2 > (Pdb) del parent[2] > *** TypeError: deleting items not supported by root element Sounds pretty inefficient to me, it even has to traverse the children twice. 
The following is usually much faster, clearer and also works in this case: parent = foo_tree.getparent() if parent is not None: parent.remove(foo_tree) Stefan From stefan_ml at behnel.de Fri May 25 08:10:27 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 25 May 2007 08:10:27 +0200 Subject: [lxml-dev] lhtml In-Reply-To: <4655B5D4.2080101@colorstudy.com> References: <4655B5D4.2080101@colorstudy.com> Message-ID: <46567DD3.8090700@behnel.de> Hi Ian, Ian Bicking wrote: > I really want to take all our HTML-related routines and put them into a > proper package -- right now they are scattered all over the place. > lhtml seems like a nice name for this. I thought it would be good to > also place it on codespeak, but I'm not sure where. New top-level? In > /lxml/lhtml/(trunk|branches|tags) ? Are they based on lxml.etree or rather ET compatible? In the first case, I'd make them a module of lxml (part of the same project), in the second case, it might be worth a separate project. Note that lxml already has quite a number of modules like lxml.sax and lxml.objectify, so lxml.html would nicely fit in here. Stefan From stefan_ml at behnel.de Fri May 25 08:14:30 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 25 May 2007 08:14:30 +0200 Subject: [lxml-dev] lhtml In-Reply-To: <4655C076.7050901@colorstudy.com> References: <4655B5D4.2080101@colorstudy.com> <4655C076.7050901@colorstudy.com> Message-ID: <46567EC6.2000201@behnel.de> Hi Ian, Ian Bicking wrote: > Ian Bicking wrote: >> I really want to take all our HTML-related routines and put them into a >> proper package > > And maybe a bit of advice -- we could just do this as a set of functions > (what we currently have), or potentially explore objectify and add the > routines as methods. E.g., el.find_by_class('classname') You're not using objectify as a base, are you? I mean, HTML is mainly about text, so objectify will not help you much. 
> This feels like a cleaner API, but I'm worried that it will mean > problems when mixing non-objectify-HTML with other elements, and if > there's problems with threads or memory overhead, or any other issues. > I don't really mind functions, which is why I am unsure; OTOH, almost > every function has a first argument of "el", which makes them seem like > methods. What about implementing the HTML namespace in a couple of Element subclasses and add the methods where they are appropriate? That sounds like a nice API to me. Any chance you could post your code somewhere so that I could take a look at what you're really contributing here? Stefan From ianb at colorstudy.com Fri May 25 17:34:11 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 25 May 2007 10:34:11 -0500 Subject: [lxml-dev] lhtml In-Reply-To: <46567DD3.8090700@behnel.de> References: <4655B5D4.2080101@colorstudy.com> <46567DD3.8090700@behnel.de> Message-ID: <465701F3.2060707@colorstudy.com> Stefan Behnel wrote: > Hi Ian, > > Ian Bicking wrote: >> I really want to take all our HTML-related routines and put them into a >> proper package -- right now they are scattered all over the place. >> lhtml seems like a nice name for this. I thought it would be good to >> also place it on codespeak, but I'm not sure where. New top-level? In >> /lxml/lhtml/(trunk|branches|tags) ? > > Are they based on lxml.etree or rather ET compatible? In the first case, I'd > make them a module of lxml (part of the same project), in the second case, it > might be worth a separate project. Many of them use xpath, getparent, or something lxml-specific. Also without lxml.etree.HTML the whole thing is rather academic. > Note that lxml already has quite a number of modules like lxml.sax and > lxml.objectify, so lxml.html would nicely fit in here. I had thought about that, but I don't know if it should have the same release schedule...? 
It's a somewhat random collection of functions that we've written here and there in other modules. I can clean them up, of course, but exactly what functionality is in there has been an on-demand sort of thing. Which is to say, it's young. OTOH, we could distribute it as a namespace package with its own release cycle but still in lxml.html. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From ianb at colorstudy.com Fri May 25 18:11:12 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 25 May 2007 11:11:12 -0500 Subject: [lxml-dev] lhtml In-Reply-To: <46567EC6.2000201@behnel.de> References: <4655B5D4.2080101@colorstudy.com> <4655C076.7050901@colorstudy.com> <46567EC6.2000201@behnel.de> Message-ID: <46570AA0.7030705@colorstudy.com> Stefan Behnel wrote: > Hi Ian, > > Ian Bicking wrote: >> Ian Bicking wrote: >>> I really want to take all our HTML-related routines and put them into a >>> proper package >> And maybe a bit of advice -- we could just do this as a set of functions >> (what we currently have), or potentially explore objectify and add the >> routines as methods. E.g., el.find_by_class('classname') > > You're not using objectify as a base, are you? I mean, HTML is mainly about > text, so objectify will not help you much. I'm not using it now, no. But if I used objectify as a base, it would be to add methods like .html_serialize() to elements, or any number of other handy methods. At least "handy" for dealing with the mixed content that HTML has, which is relatively uncommon in other XML. >> This feels like a cleaner API, but I'm worried that it will mean >> problems when mixing non-objectify-HTML with other elements, and if >> there's problems with threads or memory overhead, or any other issues. >> I don't really mind functions, which is why I am unsure; OTOH, almost >> every function has a first argument of "el", which makes them seem like >> methods. 
> > What about implementing the HTML namespace in a couple of Element subclasses > and add the methods where they are appropriate? That sounds like a nice API to me. The HTML() parser doesn't actually use namespaces. Well, maybe it does if you give it XHTML, or maybe you really have to use XML() to get that. It's never come up because I don't deal with any XHTML sites (because there are almost no XHTML sites ;). I'm not entirely clear on how namespaces fit in. Most of the methods would apply to all HTML elements, but HTML 4 elements aren't easy to distinguish. > Any chance you could post your code somewhere so that I could take a look at > what you're really contributing here? Sure; I started collecting a few of the routines from various libraries yesterday. There's still stuff in Deliverance and htmldiff that I haven't integrated. I haven't copied over any tests and there may be broken imports in many of the modules, but it should give you a vague idea of scope. (I'm actually looking for a home for htmldiff, so it's possible it could also go in this library; it's at https://svn.openplans.org/svn/opencore/trunk/opencore/nui/wiki/htmldiff2.py and https://svn.openplans.org/svn/opencore/trunk/opencore/nui/wiki/test_htmldiff2.txt) Anyway, it's not too big so I'll just attach the stuff I have collected. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers -------------- next part -------------- A non-text attachment was scrubbed... 
Name: lhtml.tar.gz Type: application/x-gzip Size: 5480 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070525/9e5fcd8c/attachment.bin From stefan_ml at behnel.de Fri May 25 20:58:48 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 25 May 2007 20:58:48 +0200 Subject: [lxml-dev] lhtml In-Reply-To: <46570AA0.7030705@colorstudy.com> References: <4655B5D4.2080101@colorstudy.com> <4655C076.7050901@colorstudy.com> <46567EC6.2000201@behnel.de> <46570AA0.7030705@colorstudy.com> Message-ID: <465731E8.40603@behnel.de> Hi Ian, Ian Bicking wrote: > Stefan Behnel wrote: >> Ian Bicking wrote: >>> Ian Bicking wrote: >>>> I really want to take all our HTML-related routines and put them >>>> into a proper package >>> And maybe a bit of advice -- we could just do this as a set of >>> functions (what we currently have), or potentially explore objectify >>> and add the routines as methods. E.g., el.find_by_class('classname') >> >> You're not using objectify as a base, are you? I mean, HTML is mainly >> about text, so objectify will not help you much. > > I'm not using it now, no. But if I used objectify as a base, it would > be to add methods like .html_serialize() to elements, or any number of > other handy methods. I don't know what you mean here. Maybe I'm just missing something that's more obvious to you, or are talking about custom element classes in general rather than objectify? http://codespeak.net/lxml/dev/element_classes.html http://codespeak.net/lxml/dev/objectify.html http://codespeak.net/lxml/dev/FAQ.html#what-is-the-difference-between-lxml-etree-and-lxml-objectify >>> This feels like a cleaner API, but I'm worried that it will mean >>> problems when mixing non-objectify-HTML with other elements, and if >>> there's problems with threads or memory overhead, or any other >>> issues. 
I don't really mind functions, which is why I am unsure;
>>> OTOH, almost every function has a first argument of "el", which makes
>>> them seem like methods.
>>
>> What about implementing the HTML namespace in a couple of Element
>> subclasses
>> and add the methods where they are appropriate? That sounds like a
>> nice API to me.
>
> The HTML() parser doesn't actually use namespaces.

True, I forgot. Still, you can use something like:

    >>> class HtmlElement(etree._ElementBase):
    ...     # your implementation here

    >>> # some more subclasses for different HTML tags, e.g. AnchorElement

    >>> HTML_CLASSES = {
    ...     "a" : AnchorElement,
    ...     # ...
    ... }

    >>> class HtmlLookup(etree.CustomElementClassLookup):
    ...     def lookup(self, node_type, document, namespace, name):
    ...         if node_type == "element":
    ...             return HTML_CLASSES.get(name, HtmlElement)
    ...         else:
    ...             return None # delegate

    >>> html_parser = etree.HTMLParser()
    >>> html_parser.setElementClassLookup(HtmlLookup())

    >>> def HTML(html):
    ...     return etree.HTML(html, html_parser)

That does almost the same as the Namespace classes would.

> I'm not entirely clear on how namespaces fit in. Most of the methods
> would apply to all HTML elements, but HTML 4 elements aren't easy to
> distinguish.

I would expect only HTML elements in an HTML4 document. That makes it rather
easy. But if you like, you can add any kind of special casing into the lookup
method above, such as: if the tag has a namespace that's not XHTML, return
None (i.e. the default Element class).

>> Any chance you could post your code somewhere so that I could take a
>> look at what you're really contributing here?
>
> Sure; I started collecting a few of the routines from various libraries
> yesterday. There's still stuff in Deliverance and htmldiff that I
> haven't integrated. I haven't copied over any tests and there may be
> broken imports in many of the modules, but it should give you a vague
> idea of scope.

I took a quick look at it and I totally like the doctestcompare module. I'd
love to use it for lxml's own doctests first of all, so, sure, that's a
perfect companion to lxml's other modules. You already have write access to
lxml's SVN repository, so there's not much of a problem with release cycles or
anything. If you want to add new stuff, that may even be a good reason for a
new version of lxml. :)

Questions:

doctest module:

- is there any reason why you require a call to "lxmldoctest.install()"? I'd
rather execute that immediately when you import the module. That's less
intrusive for doctests (which is the main use case after all).

- I'd like to call that module "lxml.xmldoctest" or something like that, so
that you can "import xmldoctest" in a doctest file, which is rather readable.

serialise.py:

- libxml2 actually has some internal support for serialising HTML, so maybe
it's worth looking at that first, in case we ever decide to wrap it.

parse, serialize and fixuplinks:

- I'll have to take a closer look at that to see if this makes sense in general.

__init__.py:

- some of this can be rewritten using plain XPath, e.g. get_parent_with_class
(there's now RegExp support in lxml 1.3) or get_text (basically what
'string()' does). contains_class_xpath is not really much better than an XPath
expression with variables, ditto for get_elements_by_class and get_rel_links,
e.g. the latter is better written as:

    get_rel_links = etree.XPath("descendant-or-self::a[@rel=$rel]")
    get_rel_links(el, rel="whatever")

> (I'm actually looking for a home for htmldiff, so it's
> possible it could also go in this library; it's at
> https://svn.openplans.org/svn/opencore/trunk/opencore/nui/wiki/htmldiff2.py
> and
> https://svn.openplans.org/svn/opencore/trunk/opencore/nui/wiki/test_htmldiff2.txt)

I'll take a look at this when I have a bit more time.
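
To illustrate the "XPath expression with variables" point made in the __init__.py notes, here is a minimal sketch of how such helpers could be precompiled. The get_rel_links expression is the one from the message; get_elements_by_class and get_text are assumptions about what the helpers of the same name would do, not the code from the attached tarball. The class test pads both sides with spaces so that "big" does not match "bigger".

```python
from lxml import etree

# Expression taken from the message: anchors whose rel attribute equals $rel.
get_rel_links = etree.XPath("descendant-or-self::a[@rel=$rel]")

# Assumed equivalent of a class-name helper: match whole class tokens only.
get_elements_by_class = etree.XPath(
    "descendant-or-self::*[contains("
    "concat(' ', normalize-space(@class), ' '), concat(' ', $name, ' '))]"
)

# Assumed equivalent of get_text: the XPath string value of the element.
get_text = etree.XPath("string()")

doc = etree.HTML(
    '<div class="vcard big">'
    '<a rel="tag" href="/t">t</a><a rel="me" href="/m">m</a>'
    '<p class="bigger">x</p></div>'
)

tag_links = get_rel_links(doc, rel="tag")
big_elements = get_elements_by_class(doc, name="big")
```

Note that @rel=$rel is an exact comparison, so a rel attribute holding several space-separated values would need the same token test as the class lookup.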
Stefan From ianb at colorstudy.com Fri May 25 21:44:52 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 25 May 2007 14:44:52 -0500 Subject: [lxml-dev] lhtml In-Reply-To: <465731E8.40603@behnel.de> References: <4655B5D4.2080101@colorstudy.com> <4655C076.7050901@colorstudy.com> <46567EC6.2000201@behnel.de> <46570AA0.7030705@colorstudy.com> <465731E8.40603@behnel.de> Message-ID: <46573CB4.60601@colorstudy.com> Stefan Behnel wrote: > Hi Ian, > > Ian Bicking wrote: >> Stefan Behnel wrote: >>> Ian Bicking wrote: >>>> Ian Bicking wrote: >>>>> I really want to take all our HTML-related routines and put them >>>>> into a proper package >>>> And maybe a bit of advice -- we could just do this as a set of >>>> functions (what we currently have), or potentially explore objectify >>>> and add the routines as methods. E.g., el.find_by_class('classname') >>> You're not using objectify as a base, are you? I mean, HTML is mainly >>> about text, so objectify will not help you much. >> I'm not using it now, no. But if I used objectify as a base, it would >> be to add methods like .html_serialize() to elements, or any number of >> other handy methods. > > I don't know what you mean here. Maybe I'm just missing something that's more > obvious to you, or are talking about custom element classes in general rather > than objectify? > > http://codespeak.net/lxml/dev/element_classes.html > http://codespeak.net/lxml/dev/objectify.html > http://codespeak.net/lxml/dev/FAQ.html#what-is-the-difference-between-lxml-etree-and-lxml-objectify I probably confused the terms/modules, since I haven't used any of them. I think you are right, I'm just thinking about a custom element class. >>>> This feels like a cleaner API, but I'm worried that it will mean >>>> problems when mixing non-objectify-HTML with other elements, and if >>>> there's problems with threads or memory overhead, or any other >>>> issues. 
I don't really mind functions, which is why I am unsure;
>>>> OTOH, almost every function has a first argument of "el", which makes
>>>> them seem like methods.
>>> What about implementing the HTML namespace in a couple of Element
>>> subclasses
>>> and add the methods where they are appropriate? That sounds like a
>>> nice API to me.
>> The HTML() parser doesn't actually use namespaces.
>
> True, I forgot. Still, you can use something like:
>
>     >>> class HtmlElement(etree._ElementBase):
>     ...     # your implementation here
>
>     >>> # some more subclasses for different HTML tags, e.g. AnchorElement
>
>     >>> HTML_CLASSES = {
>     ...     "a" : AnchorElement,
>     ...     # ...
>     ... }
>
>     >>> class HtmlLookup(etree.CustomElementClassLookup):
>     ...     def lookup(self, node_type, document, namespace, name):
>     ...         if node_type == "element":
>     ...             return HTML_CLASSES.get(name, HtmlElement)
>     ...         else:
>     ...             return None # delegate
>
>     >>> html_parser = etree.HTMLParser()
>     >>> html_parser.setElementClassLookup(HtmlLookup())
>
>     >>> def HTML(html):
>     ...     return etree.HTML(html, html_parser)
>
> That does almost the same as the Namespace classes would.

Yes, that's the sort of thing I was thinking about (but was fuzzy on the
details because I haven't tried it). It relies on a different parser from
lxml.etree.HTML, and I would guess that elements created with etree.Element
wouldn't necessarily use the right class. I'm just worried it adds more
confusion, because things act differently depending on how the element was
created or how a document is parsed.

Functions are fairly straight-forward in comparison -- they just do stuff.
They are also somewhat easier to document and browse through as a new user.

For instance, it would be amusing to have an AnchorElement.GET() method. But
what exactly would it do? Which HTTP library would it use? I don't know; if it
was a function then it wouldn't matter, you'd just implement however many
functions were necessary to do what people wanted to do.
And those functions may or may not be implemented in lxml.html -- someone else could distribute their own implementations using whatever library they liked. But not all methods are like GET(). find_by_class() is probably more obvious -- it gets all elements according to a class name, and multiple implementations aren't necessary. >> I'm not entirely clear on how namespaces fit in. Most of the methods >> would apply to all HTML elements, but HTML 4 elements aren't easy to >> distinguish. > > I would expect only HTML elements in an HTML4 document. That makes it rather > easy. But if you like, you can add any kind of special casing into the lookup > method above, such as: if the tag has a namespace that's not XHTML, return > None (i.e. the default Element class). > > >>> Any chance you could post your code somewhere so that I could take a >>> look at what you're really contributing here? >> Sure; I started collecting a few of the routines from various libraries >> yesterday. There's still stuff in Deliverance and htmldiff that I >> haven't integrated. I haven't copied over any tests and there may be >> broken imports in many of the modules, but it should give you a vague >> idea of scope. > > I took a quick look at it and I totally like the doctestcompare module. I'd > love to use it for lxml's own doctests first of all, so, sure, that's a > perfect companion to lxml's other modules. You already have write access to > lxml's SVN repository, so there's not much of a problem with release cycles or > anything. If you want to add new stuff, that may even be a good reason for a > new version of lxml. :) > > Questions: > > doctest module: > > - is there any reason why you require a call to "lxmldoctest.install()"? I'd > rather execute that immediately when you import the module. That's less > intrusive for doctests (which is the main use case after all). I dislike having modules do something to the system when you import them. 
OTOH, I dislike that I have to monkeypatch doctest to get the comparison
function in, but it's not practical to do anything else. So maybe I just have
to put up with it.

> - I'd like to call that module "lxml.xmldoctest" or something like that, so
> that you can "import xmldoctest" in a doctest file, which is rather readable.

I'd be surprised if this would actually work -- I'd expect that it would be
too late once you were running the doctest. But I haven't tried.

> serialise.py:
>
> - libxml2 actually has some internal support for serialising HTML, so maybe
> it's worth looking at that first, in case we ever decide to wrap it.

Sure; this was just the most expedient thing we figured out. The XSLT
serialization probably uses something else in libxml2, so maybe direct access
to that is possible. There's nothing terribly wrong about it either, it's just
a little roundabout (which isn't so bad if it is implemented in a reusable
function, of course -- but reimplementing it each time you want to serialize
HTML isn't so good).

> parse, serialize and fixuplinks:
>
> - I'll have to take a closer look at that to see if this makes sense in general.

The parse stuff is really just charset detection. I don't think lxml/libxml2
does this natively (checking the meta tag), but I'm not actually 100% sure. It
should include parsing HTML fragments too, which is a little hard (HTML()
interprets all text as complete documents, and adds in elements to make the
document valid, which often isn't what you'd want).

> __init__.py:
>
> - some of this can be rewritten using plain XPath, e.g. get_parent_with_class
> (there's now RegExp support in lxml 1.3) or get_text (basically what
> 'string()' does). contains_class_xpath is not really much better than an XPath
> expression with variables, ditto for get_elements_by_class and get_rel_links,
> e.g.
the latter is better written as:
>
>     get_rel_links = etree.XPath("descendant-or-self::a[@rel=$rel]")
>     get_rel_links(el, rel="whatever")

I tried doing class name matching with a regular expression, but never got it
to work. It might have been a bug in my or lxml's code, I'm not sure --
whatever it was, I was in a mind to move on ;). General CSS selector support
would be wonderful. But anyway, these are things that weren't obvious to me,
so I think it's still useful to include the functions even if their
implementation is fairly trivial.

I'm sure there's other implementation details that could be improved. Most of
those particular functions came from some microformat parsing, and most
microformats are just built on a small number of queries. Deliverance and
htmldiff had more stuff for modifying the structure, which is often quite
awkward with the ElementTree model (doing something that seems easy, like
removing a tag, is nontrivial). A lot of those things aren't that specific to
HTML, except that HTML has lots of situations where tags and text are mixed
together.

>> (I'm actually looking for a home for htmldiff, so it's
>> possible it could also go in this library; it's at
>> https://svn.openplans.org/svn/opencore/trunk/opencore/nui/wiki/htmldiff2.py
>> and
>>
> https://svn.openplans.org/svn/opencore/trunk/opencore/nui/wiki/test_htmldiff2.txt)
>
> I'll take a look at this when I have a bit more time.

Sure; note it's very much oriented towards human-readable diffs, not formal
diffs. Which fits HTML fairly well (where the tags are more like annotations
of the text), but not most other XML documents.
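
As a concrete case of the "removing a tag is nontrivial" remark above, here is a minimal sketch of such a helper for the ElementTree model as lxml implements it. The name drop_tag and the exact behaviour are chosen for illustration, not taken from Deliverance or htmldiff. The awkward part is that the element's own text and tail have to be reattached to whichever neighbour ends up next to them.

```python
from lxml import etree


def drop_tag(el):
    """Remove el from the tree but keep its text, children and tail in place."""
    parent = el.getparent()
    if parent is None:
        raise ValueError("cannot drop the root element")
    previous = el.getprevious()

    # el.text now follows the previous sibling, or starts the parent's text.
    if el.text:
        if previous is None:
            parent.text = (parent.text or "") + el.text
        else:
            previous.tail = (previous.tail or "") + el.text

    # el.tail follows el's last child if it has one; otherwise it follows
    # the same spot that el.text was appended to.
    if el.tail:
        if len(el):
            last = el[-1]
            last.tail = (last.tail or "") + el.tail
        elif previous is None:
            parent.text = (parent.text or "") + el.tail
        else:
            previous.tail = (previous.tail or "") + el.tail

    # Splice the children into the parent where el used to be.
    index = parent.index(el)
    parent[index:index + 1] = el[:]


root = etree.fromstring("<p>Hello <b>big <i>bad</i> world</b>!</p>")
drop_tag(root[0])
result = etree.tostring(root)
```

After the call the <b> wrapper is gone while "big", the <i> child and the trailing "!" stay in document order.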
-- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From ianb at colorstudy.com Fri May 25 22:34:26 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 25 May 2007 15:34:26 -0500 Subject: [lxml-dev] lhtml In-Reply-To: <46573CB4.60601@colorstudy.com> References: <4655B5D4.2080101@colorstudy.com> <4655C076.7050901@colorstudy.com> <46567EC6.2000201@behnel.de> <46570AA0.7030705@colorstudy.com> <465731E8.40603@behnel.de> <46573CB4.60601@colorstudy.com> Message-ID: <46574852.2000405@colorstudy.com> On the list-of-things-I-could-imagine-in-lxml.html, I've been wanting to reimplement formencode.htmlfill for a long time (http://formencode.org/htmlfill.html). It uses HTMLParser, and I hate HTMLParser. It fills in forms, and also fills in errors. I'd probably change the way errors are given, and separate it into two functions -- HTMLParser rewards doing everything in one pass, but with a document model it would both be much easier to write and using multiple passes is no problem. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org | Write code, do good | http://topp.openplans.org/careers From stefan_ml at behnel.de Fri May 25 22:37:44 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 25 May 2007 22:37:44 +0200 Subject: [lxml-dev] lhtml In-Reply-To: <46573CB4.60601@colorstudy.com> References: <4655B5D4.2080101@colorstudy.com> <4655C076.7050901@colorstudy.com> <46567EC6.2000201@behnel.de> <46570AA0.7030705@colorstudy.com> <465731E8.40603@behnel.de> <46573CB4.60601@colorstudy.com> Message-ID: <46574918.70900@behnel.de> Hi Ian, Ian Bicking wrote: > Stefan Behnel wrote: > It relies on a different parser from lxml.etree.HTML, and I would guess > that elements created with etree.Element wouldn't necessarily use the > right class. objectify replicates the XML() and Element() factories for exactly this purpose. lxml.html could do likewise. 
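
A minimal sketch of what replicating the factories could look like, assuming the names used in current lxml releases (etree.ElementBase and parser.set_element_class_lookup(), where the snippet earlier in the thread wrote etree._ElementBase and setElementClassLookup()). The has_class() method and href property are hypothetical examples of per-class helpers, not an existing lxml.html API.

```python
from lxml import etree


class HtmlElement(etree.ElementBase):
    """Hypothetical base class shared by all HTML elements."""

    def has_class(self, name):
        # True if `name` is one of the space-separated class tokens.
        return name in (self.get("class") or "").split()


class AnchorElement(HtmlElement):
    """Hypothetical subclass used for <a> tags only."""

    @property
    def href(self):
        return self.get("href")


HTML_CLASSES = {"a": AnchorElement}


class HtmlLookup(etree.CustomElementClassLookup):
    def lookup(self, node_type, document, namespace, name):
        if node_type == "element":
            return HTML_CLASSES.get(name, HtmlElement)
        return None  # fall back to the default classes


html_parser = etree.HTMLParser()
html_parser.set_element_class_lookup(HtmlLookup())


def HTML(text):
    """Parse a document so that its elements use the classes above."""
    return etree.HTML(text, html_parser)


def Element(tag, **attrib):
    """Create a standalone element through the same parser, and so the same lookup."""
    return html_parser.makeelement(tag, attrib=attrib)


root = HTML('<p class="note big">Hi <a href="/x">link</a></p>')
link = root.find(".//a")
new_link = Element("a", href="/y")
```

Routing both factories through one parser is what keeps parsed and hand-built elements on the same classes, which is the concern raised about etree.Element above.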
> Functions are fairly straight-forward in comparison
> -- they just do stuff. They are also somewhat easier to document and
> browse through as a new user.

ET has functions like tostring(), but methods like write() or xpath(). Some
are obvious decisions, others are a matter of taste. There's nothing bad to it.

> For instance, it would be amusing to have an AnchorElement.GET() method.
> But what exactly would it do? Which HTTP library would it use?

Well, lxml already has a custom resolver API. We should be able to reuse that
in some way. On the other hand, I'd give a clear vote for a function approach
here rather than a method. I think a do-what-I-mean GET() method is a sign of
overdesign.

> But not all methods are like GET(). find_by_class() is probably more
> obvious -- it gets all elements according to a class name, and multiple
> implementations aren't necessary.

Definitely. That's the kind of argument that works well for and against
functions and methods.

>> - I'd like to call that module "lxml.xmldoctest" or something like
>> that, so
>> that you can "import xmldoctest" in a doctest file, which is rather
>> readable.
>
> I'd be surprised if this would actually work -- I'd expect that it would
> be too late once you were running the doctest. But I haven't tried.

Me neither. But *if* it works, *then* requiring a call to install() shouldn't
be necessary.

>> parse, serialize and fixuplinks:
>>
>> - I'll have to take a closer look at that to see if this makes sense
>> in general.
>
> The parse stuff is really just charset detection. I don't think
> lxml/libxml2 does this natively (checking the meta tag), but I'm not
> actually 100% sure.

It does actually. You will see that when you pass in a unicode string that
contains a meta-tag with some byte encoding (say, UTF-8). This will break
immediately. Note, however, that libxml2 requires a bit of structure to
actually find the tag.
Simply prepending a complete HTML document with such a tag (which I've seen in
a couple of real-life broken HTML documents) will not work.

> It should include parsing HTML fragments too, which
> is a little hard (HTML() interprets all text as complete documents, and
> adds in elements to make the document valid, which often isn't what
> you'd want).

Maybe a simple approach here would be to check if a string starts with a known
inner HTML tag, then just prefix it with before parsing and return their child
(or children) after parsing.

>> __init__.py:
>>
>> - some of this can be rewritten using plain XPath, e.g.
>> get_parent_with_class
>> (there's now RegExp support in lxml 1.3) or get_text (basically what
>> 'string()' does). contains_class_xpath is not really much better than
>> an XPath
>> expression with variables, ditto for get_elements_by_class and
>> get_rel_links,
>> e.g. the latter is better written as:
>>
>>     get_rel_links = etree.XPath("descendant-or-self::a[@rel=$rel]")
>>     get_rel_links(el, rel="whatever")
>
> I tried doing class name matching with a regular expression, but never
> got it to work. It might have been a bug in my or lxml's code, I'm not
> sure -- whatever it was, I was in a mind to move on ;).

I recently fixed a few problems with the regexp support; it's quite possible
that those were what stopped you.

> General CSS
> selector support would be wonderful. But anyway, these are things that
> weren't obvious to me, so I think it's still useful to include the
> functions even if their implementation is fairly trivial.

Sure, little helpers keep you from (re-)writing them yourself, but more
interestingly, they encourage a certain programming style that can make your
life easier.

> I'm sure there's other implementation details that could be improved.
> Most of those particular functions came from some microformat parsing,
> and most microformats are just built on a small number of queries.
> Deliverance and htmldiff had more stuff for modifying the structure,
> which is often quite awkward with the ElementTree model (doing something
> that seems easy, like removing a tag, is nontrivial). A lot of those
> things aren't that specific to HTML, except that HTML has lots of
> situations where tags and text are mixed together.

That's just another reason for a wrapper API on top of lxml.etree that makes
working with HTML more intuitive. Fredrik wrote a nice factory class for
generating (X|HT)ML a while ago, I felt free to add it as "lxml.htmlbuilder"
(although I'm still waiting for his reply to see if it can stay there to
become part of lxml 1.3). But the other API side of parsing and treating HTML
documents in a convenient way is much more ambitious.

Stefan

From faassen at startifact.com Fri May 25 22:41:16 2007
From: faassen at startifact.com (Martijn Faassen)
Date: Fri, 25 May 2007 22:41:16 +0200
Subject: [lxml-dev] lhtml
In-Reply-To: <465701F3.2060707@colorstudy.com>
References: <4655B5D4.2080101@colorstudy.com> <46567DD3.8090700@behnel.de> <465701F3.2060707@colorstudy.com>
Message-ID:

Ian Bicking wrote:
> Stefan Behnel wrote:
[snip]
>> Note that lxml already has quite a number of modules like lxml.sax and
>> lxml.objectify, so lxml.html would nicely fit in here.
>
> I had thought about that, but I don't know if it should have the same
> release schedule...? It's a somewhat random collection of functions
> that we've written here and there in other modules. I can clean them
> up, of course, but exactly what functionality is in there has been an
> on-demand sort of thing. Which is to say, it's young.

If you're willing to help getting releases out of the door, I could imagine we
let the html functionality drive the release schedule for a while. It is not
like we have a lot of other features lined up right now, so this would be a
good way to actually drive a few new releases. What do you think, Ste