From jkrukoff at ltgc.com Fri Dec 1 02:50:06 2006
From: jkrukoff at ltgc.com (John Krukoff)
Date: Thu, 30 Nov 2006 18:50:06 -0700
Subject: [lxml-dev] find/findall not accepting qnames?
Message-ID: <1164937806.22052.44.camel@localhost>

I find it surprising that find & findall do not accept QName objects in
lxml, and instead require a manual cast to string, like so:

etree.XML( '' ).find( str( etree.QName( 'http://test', 'b' ) ) )

ElementTree appears to accept QNames transparently, in my limited
testing.

I haven't tested with earlier revisions, but after tracking down a
couple of bugs related to this, is this a change in behavior in recent
(1.1) lxml versions?

--
John Krukoff
Land Title Guarantee Company

From fredrik at pythonware.com Fri Dec 1 08:29:41 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 01 Dec 2006 08:29:41 +0100
Subject: [lxml-dev] find/findall not accepting qnames?
In-Reply-To: <1164937806.22052.44.camel@localhost>
References: <1164937806.22052.44.camel@localhost>
Message-ID:

John Krukoff wrote:

> I find it surprising that find & findall do not accept QName objects in
> lxml, and instead require a manual cast to string, like so:
>
> etree.XML( ' xmlns:ns0="http://test"/>' ).find( str( etree.QName( 'http://test',
> 'b' ) ) )
>
> ElementTree appears to accept QNames transparently, in my limited
> testing.

I find it surprising that it does, though. Not sure that's intentional ;-)

From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Dec 1 09:27:02 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 01 Dec 2006 09:27:02 +0100
Subject: [lxml-dev] find/findall not accepting qnames?
In-Reply-To: <1164937806.22052.44.camel@localhost>
References: <1164937806.22052.44.camel@localhost>
Message-ID: <456FE756.8050402@gkec.informatik.tu-darmstadt.de>

Hi,

John Krukoff wrote:
> I find it surprising that find & findall do not accept QName objects in
> lxml

True, that's about the only place where we do not parse the input ourselves
but hand it to the ElementPath module of ElementTree. All other places use
the same function for parsing tag names, so that makes QNames completely
transparent in lxml.

I agree that this is unexpected behaviour and since we accept QNames in loads
of other places, we should make it a special case if a QName is passed as
path to find*(). After all, it's a common use case to look for a specific tag
instead of a path.

BTW, using getiterator(tag) for this purpose should be a bit faster.

Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Dec 1 10:21:19 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 01 Dec 2006 10:21:19 +0100
Subject: [lxml-dev] lxml 1.1 problems with python 2.3
In-Reply-To: <456F102F.10802@infrae.com>
References: <456726B9.2050800@infrae.com> <45672C1B.6060707@palladion.com>
 <45682053.7040108@gkec.informatik.tu-darmstadt.de>
 <456872CF.7020400@palladion.com>
 <456A8C6A.9000508@gkec.informatik.tu-darmstadt.de>
 <456DF1B4.5010104@infrae.com>
 <456EA12E.4040504@gkec.informatik.tu-darmstadt.de>
 <456F102F.10802@infrae.com>
Message-ID: <456FF40F.1040209@gkec.informatik.tu-darmstadt.de>

Hi Martijn,

Martijn Faassen wrote:
> Stupid of me not to see it earlier, but that's because it's trying to
> import from lxml.local_doctest and you added it as local_doctest.

Ah, stupid me then. :)

>>> Things then fail with what looks like a new, unrelated issue:
>>>
>>> Traceback (most recent call last):
>>>   File "test.py", line 591, in ?
>>>     exitcode = main(sys.argv)
>>>   File "test.py", line 554, in main
>>>     test_cases = get_test_cases(test_files, cfg, tracer=tracer)
>>>   File "test.py", line 254, in get_test_cases
>>>     module = import_module(file, cfg, tracer=tracer)
>>>   File "test.py", line 197, in import_module
>>>     mod = __import__(modname)
>>>   File
>>> "/home/faassen/working/lxml/lxml-trunk/src/lxml/tests/test_objectify.py",
>>>
>>> line 16, in ?
>>>     from lxml import objectify
>>> ImportError:
>>> /home/faassen/working/lxml/lxml-trunk/src/lxml/objectify.so: undefined
>>> symbol: previousElement
>>
>> That's rather bizarre, previousElement is definitely a public function
>> (i.e.
>> defined in etree.so). I have no idea how that could be missing.
>
> It's consistently missing though in Python 2.3. Perhaps it accidentally
> gets turned off together with thread support? I did try to test this
> theory yesterday though on Python 2.4 by explicitly disabling tests, and
> that didn't help.

Ok, then, first thing to check: does "previousElement" turn up as a static
function in the generated src/lxml/etree.h?

Could you check what the preprocessor sees in objectify.c (gcc -E)? On my
side (Py 2.5), it sees the following:

-----------------------
...
static xmlNode (*((*nextElement)(xmlNode (*))));
static xmlNode (*((*previousElement)(xmlNode (*))));
...
{"nextElement", &nextElement},
{"previousElement", &previousElement},
...
__pyx_v_next = nextElement;
...
__pyx_v_next = previousElement;
...
-----------------------

I'm showing both functions here, as both are used in objectify, but only the
second seems to be missing according to your report.

If this looks the same on your side, I'm really out of ideas.

Stefan

From faassen at infrae.com Fri Dec 1 13:44:53 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Fri, 01 Dec 2006 13:44:53 +0100
Subject: [lxml-dev] lxml 1.1 problems with python 2.3
In-Reply-To: <456FF40F.1040209@gkec.informatik.tu-darmstadt.de>
References: <456726B9.2050800@infrae.com> <45672C1B.6060707@palladion.com>
 <45682053.7040108@gkec.informatik.tu-darmstadt.de>
 <456872CF.7020400@palladion.com>
 <456A8C6A.9000508@gkec.informatik.tu-darmstadt.de>
 <456DF1B4.5010104@infrae.com>
 <456EA12E.4040504@gkec.informatik.tu-darmstadt.de>
 <456F102F.10802@infrae.com>
 <456FF40F.1040209@gkec.informatik.tu-darmstadt.de>
Message-ID: <457023C5.8030908@infrae.com>

Stefan Behnel wrote:
[snip]
>> It's consistently missing though in Python 2.3. Perhaps it accidentally
>> gets turned off together with thread support? I did try to test this
>> theory yesterday though on Python 2.4 by explicitly disabling tests, and
>> that didn't help.
>
> Ok, then, first thing to check: does "previousElement" turn up as a static
> function in the generated src/lxml/etree.h?

The only references to previousElement (and nextElement) in etree.h are here:

extern DL_IMPORT(xmlNode) (*(nextElement(xmlNode (*))));
extern DL_IMPORT(xmlNode) (*(previousElement(xmlNode (*))));

> Could you check what the
> preprocessor sees in objectify.c (gcc -E)?

Hm, I wasn't previously familiar with gcc -E. I tried running it against
objectify.c but got a lot of missing includes for Python and libxml2 (which
is odd as these things are in /usr/include). I'm not quite sure how you
generate your output, but here's my reference to previousElement when I do
gcc -E:

extern DL_IMPORT(xmlNode) (*(nextElement(xmlNode (*))));
extern DL_IMPORT(xmlNode) (*(previousElement(xmlNode (*))));
...
__pyx_v_next = nextElement;
...
__pyx_v_next = previousElement;
...

Hm, is it possible I'm using the wrong version of Pyrex? I have lxml's
version installed for Python 2.4 but I guess I don't have that one for
Python 2.3...
Us having to maintain our own version of Pyrex rather sucks.

I just installed lxml's version of Pyrex, and now the tests start. We still
get some failures, though.

Most of them are because 'assertFalse' doesn't appear to exist. I added this
to HelperTestCase and made those errors go away.

There's also the use of operator.itemgetter, which was only introduced in
Python 2.4. I hacked up a simplistic implementation too.

Now we're down to one failure in Python 2.3:

======================================================================
FAIL: test_findall (lxml.tests.test_objectify.ObjectifyTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/faassen/working/lxml/src/lxml/tests/test_objectify.py", line 218, in test_findall
    root.getchildren()[:2])
  File "/usr/lib/python2.3/unittest.py", line 302, in failUnlessEqual
    raise self.failureException, \
AssertionError: [, ''] != [, '']

You'd think that this *should* be equal and thus succeed. Possibly some rich
comparison feature that doesn't exist yet in Python 2.3?

Back to you, Stefan. :)

Regards,

Martijn

From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Dec 1 14:39:55 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 01 Dec 2006 14:39:55 +0100
Subject: [lxml-dev] lxml 1.1 problems with python 2.3
In-Reply-To: <457023C5.8030908@infrae.com>
References: <456726B9.2050800@infrae.com> <45672C1B.6060707@palladion.com>
 <45682053.7040108@gkec.informatik.tu-darmstadt.de>
 <456872CF.7020400@palladion.com>
 <456A8C6A.9000508@gkec.informatik.tu-darmstadt.de>
 <456DF1B4.5010104@infrae.com>
 <456EA12E.4040504@gkec.informatik.tu-darmstadt.de>
 <456F102F.10802@infrae.com>
 <456FF40F.1040209@gkec.informatik.tu-darmstadt.de>
 <457023C5.8030908@infrae.com>
Message-ID: <457030AB.7020302@gkec.informatik.tu-darmstadt.de>

Hi Martijn,

Martijn Faassen wrote:
> Hm, I wasn't previously familiar with gcc -E.
> I tried running it against
> objectify.c but got a lot of missing includes for Python and libxml2

You can use the same command line that distutils uses to compile the module,
except for the "-c xxx.so" part.

> Us having to maintain our own version of Pyrex rather sucks.

Sure, but it's currently not that easy to push things upstream back into
Pyrex. Maybe Greg manages to get some work done over Christmas.

> I just installed lxml's version of Pyrex, and now the tests
> start.

Ah, finally. :)

> Now we're down to one failure in Python 2.3:
>
> ======================================================================
> FAIL: test_findall (lxml.tests.test_objectify.ObjectifyTestCase)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/faassen/working/lxml/src/lxml/tests/test_objectify.py",
> line 218, in test_findall
>     root.getchildren()[:2])
>   File "/usr/lib/python2.3/unittest.py", line 302, in failUnlessEqual
>     raise self.failureException, \
> AssertionError: [, ''] != [ b787f0cc>, '']
>
> You'd think that this *should* be equal and thus succeed. Possibly some
> rich comparison feature that doesn't exist yet in Python 2.3?

Or maybe it just works differently. That was a bad test case anyway, as
equality of objectified elements is not really well defined in general. It
can be type specific, which might be the problem here already. I changed that
to an identity test, so it should work now.

Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Dec 2 22:14:30 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Sat, 02 Dec 2006 22:14:30 +0100
Subject: [lxml-dev] XInclude does not support Resolvers?
In-Reply-To: <456DEEE0.9090606@infrae.com>
References: <1164728726.7952.134.camel@ltucker.openplans.org>
 <456C6D8B.6040004@gkec.informatik.tu-darmstadt.de>
 <456DEEE0.9090606@infrae.com>
Message-ID: <4571ECB6.9040200@gkec.informatik.tu-darmstadt.de>

Hi,

Martijn Faassen wrote:
> I'm fine with supporting something Python-based in addition to the
> libxml2 version, but I think the XInclude implementation in libxml2 has
> the benefit in that it's probably fairly complete and besides, *they*'re
> maintaining it, not us. :) So, I'm fine with adding our own XInclude
> support, as long as it's in addition and not a replacement, along the
> same lines as the way we support ElementTree's 'find' together with our
> own 'xpath'.

I copied ET's ElementInclude module over to lxml (trunk) and modified it a
bit. The related tests in ET's selftest.py pass (with one minor exception),
although the serialisations can look a little different (so I had to fix the
doctests a little).

The implementation is adapted in that it uses Element.getiterator() to find
the XInclude elements. I also had to extend lxml's API in order to make the
original parser of a document available at the API level. There is now a
'parser' property on _ElementTree that is used by ElementInclude to provide
the same parser configuration (including resolvers) as for the source
document.

It's not tested much, so I'd be glad if others could give it a try.

Hope it's useful,
Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Dec 4 08:37:47 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 04 Dec 2006 08:37:47 +0100
Subject: [lxml-dev] Customised xmlReconciliateNs() for lxml
Message-ID: <4573D04B.5060000@gkec.informatik.tu-darmstadt.de>

Hi all,

we had a couple of problems in the past that were related to the
xmlReconciliateNs() function in libxml2.
Basically, it cleans up the namespaces declared in a subtree after moving it
to a new position inside a document or from one document to another.

I rewrote this function in Pyrex and customised it to what we need in lxml.
It now tries to drop redundant declarations that were already available in
the new ancestors, and it avoids the bug that made lxml crash when parsing
with the COMPACT option. It also sets the new _Document reference in the same
step, which reduces the need for a second traversal step.

There may be other possible optimisations, but it's not always obvious how
they behave in the various possible use cases, so I'm a bit conservative
here. This is a pretty critical function, it can both make lxml crash and
break namespace handling...

Anyway, I hope that having this function inside lxml will help us to further
optimise it in the future.

Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Dec 4 08:49:22 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 04 Dec 2006 08:49:22 +0100
Subject: [lxml-dev] redundant namespace declarations
In-Reply-To: <4564B5CB.9050001@gkec.informatik.tu-darmstadt.de>
References: <20061120144135.GA23359@tttech.com>
 <4564B5CB.9050001@gkec.informatik.tu-darmstadt.de>
Message-ID: <4573D302.7090207@gkec.informatik.tu-darmstadt.de>

Hi again,

Stefan Behnel wrote:
> Albert Brandl wrote:
>> The problem occurs with the following code:
>>
>> nsmap = dict (foo="http://foo.org", bar = "http://bar.org")
>> e = Element("{http://foo.org}somefoo", nsmap = nsmap)
>> s = Element("{http://bar.org}somebar", nsmap = nsmap)
>> e.append(s)
>> et = ElementTree(e)
>> et.write("foo.xml", pretty_print = True)
>>
>> This code creates the following XML file:
>>
>>
>>
>>
>>
>> Is this a known bug?
>
> It's known - though not really a bug but rather an inconvenience. Currently,
> we use a function in libxml2 called xmlReconciliateNs() to fix the namespaces
> when merging trees.
> This function shows the above behaviour. To fix this, we'd
> have to implement our own version, which is a bit tricky and just wasn't
> important enough to try to get right so far. Note that even libxml2 had a
> (minor) bug up to version 2.6.26 here, so it's really not trivial to get this
> kind of thing right.

I finally took a(nother) shot at it and I now have an implementation that can
avoid this kind of problem. It's currently stored in the "nscleanup" branch,
but I will move it to the trunk ASAP. Please give it a try then, to see if it
works nicely for you in other cases where you encountered this.

Stefan

From faassen at startifact.com Mon Dec 4 16:53:19 2006
From: faassen at startifact.com (Martijn Faassen)
Date: Mon, 4 Dec 2006 15:53:19 +0000 (UTC)
Subject: [lxml-dev] Python 2.4.1 and threading
Message-ID:

Hi there,

I think I just discovered that lxml 1.1.2 doesn't work cleanly with
Python 2.4.1 either. It appears to work with Python 2.4.3 and 2.4.4, but
when I compile it for Python 2.4.1, it segfaults when you get an
error during parsing.

When I take lxml trunk and compile it without threading support, it
does all work with Python 2.4.1. This leads me to suspect threading
support is again the issue.

Stefan, perhaps you can turn off threading support not only in
Python 2.3 but also in (at least) Python 2.4.1.

Should we be going for a new release? Perhaps the world is ready for
a lxml 1.2. We haven't done a lot of changes except for the setup.py
stuff (that is, I can't see any mentioned in the CHANGES.txt..), but
I think those changes might warrant a new version number.

Regards,

Martijn

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Dec 4 17:29:07 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 04 Dec 2006 17:29:07 +0100
Subject: [lxml-dev] Python 2.4.1 and threading
In-Reply-To:
References:
Message-ID: <45744CD3.9000604@gkec.informatik.tu-darmstadt.de>

Martijn Faassen wrote:
> I think I just discovered that lxml 1.1.2 doesn't work cleanly with
> Python 2.4.1 either. It appears to work with Python 2.4.3 and 2.4.4, but
> when I compile it for Python 2.4.1, it segfaults when you get an
> error during parsing.
>
> When I take lxml trunk and compile it without threading support, it
> does all work with Python 2.4.1. This leads me to suspect threading
> support is again the issue.

Hmm, ok, I saw a couple of differences in the PyGILState_* API functions
between 2.3.6 and 2.4.4, so maybe they make a difference for us. It's still
possible that we're doing something wrong in lxml, but having it work with
newer versions makes me suspect that it's a race condition that was solved in
later Python versions.

> Stefan, perhaps you can turn off threading support not only in
> Python 2.3 but also in (at least) Python 2.4.1.

Ok, no problem. It's just a plain version number comparison.

> Should we be going for a new release? Perhaps the world is ready for
> a lxml 1.2. We haven't done a lot of changes except for the setup.py
> stuff (that is, I can't see any mentioned in the CHANGES.txt..), but
> I think those changes might warrant a new version number.

There are a couple of things that I expected to make it into 1.2, especially
the xmlReconciliateNs() replacement. But that one definitely needs more
testing before a release. There's also the integration of ElementInclude.py
that should be easier to integrate.
I don't think it's a good time to release "right now" or even next week, but
I agree that having a simpler-to-hack build process can become an opener and
should get a second-level version number to show that there may be things to
do to get it back working.

Stefan

From albert.brandl at tttech.com Wed Dec 6 10:21:12 2006
From: albert.brandl at tttech.com (Albert Brandl)
Date: Wed, 6 Dec 2006 10:21:12 +0100
Subject: [lxml-dev] redundant namespace declarations
In-Reply-To: <45745E7C.900@gkec.informatik.tu-darmstadt.de>
References: <20061120144135.GA23359@tttech.com>
 <4564B5CB.9050001@gkec.informatik.tu-darmstadt.de>
 <4573D302.7090207@gkec.informatik.tu-darmstadt.de>
 <20061204092647.GA1898@tttech.com>
 <45745E7C.900@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061206092112.GA19902@tttech.com>

Hi,

On Mon, Dec 04, 2006 at 06:44:28PM +0100, Stefan Behnel wrote:
>
> Well, maybe that won't be that soon. It's currently undecided if this will be
> in the next release, so it will stay out of the trunk for now.

No problem. I've adapted the code responsible for building the element tree
to use SubElement instead of append, so that "pretty_print = True" does what
I want.

A minor problem remains, but I have a workaround for this. If you try to
write the document to a file without write access, "write_c14n" raises a
C14NError:

>>> from lxml.etree import *
>>> et = ElementTree(Element("abc"))
>>> et.write_c14n("nonwritable.xml")
>>> et.write_c14n("notwritable.xml")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "etree.pyx", line 657, in etree._ElementTree.write_c14n
  File "serializer.pxi", line 224, in etree._tofilelikeC14N
etree.C14NError: C14N failed

But if you use "write" instead, lxml silently ignores the fact that the file
can't be written:

>>> et.write("notwritable.xml")
>>>

For now, I'm just writing to a StringIO buffer and opening the file manually,
so this is no serious problem for me.
But others might well be bitten by this bug, even more so since lxml does not
give any feedback about what just happened.

Best regards,

Albert Brandl

From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Dec 6 23:18:28 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 06 Dec 2006 23:18:28 +0100
Subject: [lxml-dev] redundant namespace declarations
In-Reply-To: <20061206092112.GA19902@tttech.com>
References: <20061120144135.GA23359@tttech.com>
 <4564B5CB.9050001@gkec.informatik.tu-darmstadt.de>
 <4573D302.7090207@gkec.informatik.tu-darmstadt.de>
 <20061204092647.GA1898@tttech.com>
 <45745E7C.900@gkec.informatik.tu-darmstadt.de>
 <20061206092112.GA19902@tttech.com>
Message-ID: <457741B4.7010103@gkec.informatik.tu-darmstadt.de>

Hi Albert,

Albert Brandl wrote:
> If you try to write the document to a file without write access,
> lxml silently ignores the fact that the file can't be written:
>
>.>>> et.write("notwritable.xml")
>.>>>

True, thanks for the report. Opening the file is done by libxml2 in this case
and we did not handle the case where it failed to do so.

Fixed on the trunk.

Stefan

From dsoulayrol at free.fr Mon Dec 11 15:15:34 2006
From: dsoulayrol at free.fr (David Soulayrol)
Date: Mon, 11 Dec 2006 15:15:34 +0100
Subject: [lxml-dev] About lxml status
Message-ID: <1165846534.30509.13.camel@dsoulayr.neotip>

Hello,

I am currently using 4Suite-XML for a project on my own, which makes use
of DOM, XSLT and XPath. I've always looked around if I could find other
libraries to replace 4Suite eventually, and I (re-) discovered libxml
today, and found a link to lxml from there.

What I'd like to know is (before I dive deeply in documentation) how you
would compare the XML support between 4Suite and lxml. Would you say
lxml is ready for common DOM manipulations, XSLT transformations and
simple XPath usage?
I ask this because I've read in http://xmlsoft.org/index.html:

"Document Object Model (DOM) http://www.w3.org/TR/DOM-Level-2-Core/ the
document model, but it doesn't implement the API itself, gdome2 does
this on top of libxml2"

'Hope I was not offending :)

Thanks,
--
David.

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Dec 11 15:24:26 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 11 Dec 2006 15:24:26 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To: <1165846534.30509.13.camel@dsoulayr.neotip>
References: <1165846534.30509.13.camel@dsoulayr.neotip>
Message-ID: <457D6A1A.7080904@gkec.informatik.tu-darmstadt.de>

Hi,

David Soulayrol wrote:
> I am currently using 4Suite-XML for a project on my own, which makes use
> of DOM, XSLT and XPath. I've always looked around if I could find other
> libraries to replace 4Suite eventually, and I (re-) discovered libxml
> today, and found a link to lxml from there.
>
> What I'd like to know is (before I dive deeply in documentation) how you
> would compare the XML support between 4Suite and lxml. Would you say
> lxml is ready for common DOM manipulations, XSLT transformations and
> simple XPath usage ?
>
> I ask this because I've read in http://xmlsoft.org/index.html:
>
> "Document Object Model (DOM) http://www.w3.org/TR/DOM-Level-2-Core/ the
> document model, but it doesn't implement the API itself, gdome2 does
> this on top of libxml2"

lxml does not implement the DOM API either. Instead, as the cheeseshop page
nicely states:

---------------------
lxml is a Pythonic binding for the libxml2 and libxslt libraries. It provides
safe and convenient access to these libraries using the ElementTree API.

It extends the ElementTree API significantly to offer support for XPath,
RelaxNG, XML Schema, XSLT, C14N and much more.
---------------------

Feel free to find out more from the documentation, it's full of examples:

http://codespeak.net/lxml/#documentation

Stefan

From ogrisel at nuxeo.com Mon Dec 11 15:34:41 2006
From: ogrisel at nuxeo.com (Olivier Grisel)
Date: Mon, 11 Dec 2006 15:34:41 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To: <1165846534.30509.13.camel@dsoulayr.neotip>
References: <1165846534.30509.13.camel@dsoulayr.neotip>
Message-ID:

David Soulayrol wrote:
> Hello,
>
> I am currently using 4Suite-XML for a project on my own, which makes use
> of DOM, XSLT and XPath. I've always looked around if I could find other
> libraries to replace 4Suite eventually, and I (re-) discovered libxml
> today, and found a link to lxml from there.
>
> What I'd like to know is (before I dive deeply in documentation) how you
> would compare the XML support between 4Suite and lxml. Would you say
> lxml is ready for common DOM manipulations, XSLT transformations and
> simple XPath usage ?
>
> I ask this because I've read in http://xmlsoft.org/index.html:
>
> "Document Object Model (DOM) http://www.w3.org/TR/DOM-Level-2-Core/ the
> document model, but it doesn't implement the API itself, gdome2 does
> this on top of libxml2"

lxml does not provide a DOM API implementation but an ElementTree API, which
is similar to DOM but simpler to use (more "pythonic"). As for XSLT and
XPath, lxml supports them out of the box.

If you really need a DOM API, then you should probably look at this project:

http://www.python.org/pypi/libxml2dom

--
Olivier

From faassen at startifact.com Mon Dec 11 20:30:58 2006
From: faassen at startifact.com (Martijn Faassen)
Date: Mon, 11 Dec 2006 20:30:58 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To: <457D6A1A.7080904@gkec.informatik.tu-darmstadt.de>
References: <1165846534.30509.13.camel@dsoulayr.neotip>
 <457D6A1A.7080904@gkec.informatik.tu-darmstadt.de>
Message-ID:

Hey,

Stefan Behnel wrote:
[snip]
> lxml does not implement the DOM API either.
> Instead, as the cheeseshop page
> nicely states:
>
> ---------------------
> lxml is a Pythonic binding for the libxml2 and libxslt libraries. It provides
> safe and convenient access to these libraries using the ElementTree API.

Note that the ElementTree API is a developing Python standard,
implemented by 3 separate libraries, ElementTree, cElementTree and lxml.
ElementTree and cElementTree have become part of the core Python
distribution as of Python 2.5.

A lot of ElementTree documentation can be found here:

http://effbot.org/zone/element-index.htm

You can do common DOM-style manipulations through this API, just in a
more convenient manner.

As to XPath and XSLT support, lxml has that, including the ability to
create extension functions and the like. There are differences in
feature set here and there, but overall lxml should be able to compete
with 4Suite.

Regards,

Martijn

From lee.brown at elecdev.com Mon Dec 11 20:53:18 2006
From: lee.brown at elecdev.com (Lee Brown)
Date: Mon, 11 Dec 2006 14:53:18 -0500
Subject: [lxml-dev] About lxml status
In-Reply-To:
Message-ID: <200612111953.kBBJr70e002706@mail.elecdev.com>

Greetings!

This discussion reminded me of a question that I've been pondering: will the
XPATH support in LXML eventually include support for axes and predicates?

Also, congratulations on the thorough Xinclude support. Other than 4Suite,
LXML is one of the few parsers that handle the "parse=" attribute and the
"fallback" elements correctly.

-----Original Message-----
From: lxml-dev-bounces at codespeak.net
[mailto:lxml-dev-bounces at codespeak.net] On Behalf Of Martijn Faassen
Sent: Monday, December 11, 2006 2:31 PM
To: lxml-dev at codespeak.net
Subject: Re: [lxml-dev] About lxml status

Hey,

Stefan Behnel wrote:
[snip]
> lxml does not implement the DOM API either. Instead, as the cheeseshop
> page nicely states:
>
> ---------------------
> lxml is a Pythonic binding for the libxml2 and libxslt libraries.
> It provides safe and convenient access to these libraries using the
> ElementTree API.

Note that the ElementTree API is a developing Python standard, implemented by
3 separate libraries, ElementTree, cElementTree and lxml. ElementTree and
cElementTree have become part of the core Python distribution as of
Python 2.5.

A lot of ElementTree documentation can be found here:

http://effbot.org/zone/element-index.htm

You can do common DOM-style manipulations through this API, just in a more
convenient manner.

As to XPath and XSLT support, lxml has that, including the ability to create
extension functions and the like. There are differences in feature set here
and there, but overall lxml should be able to compete with 4Suite.

Regards,

Martijn

_______________________________________________
lxml-dev mailing list
lxml-dev at codespeak.net
http://codespeak.net/mailman/listinfo/lxml-dev

From dsoulayrol at free.fr Mon Dec 11 23:10:53 2006
From: dsoulayrol at free.fr (David Soulayrol)
Date: Mon, 11 Dec 2006 23:10:53 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To:
References: <1165846534.30509.13.camel@dsoulayr.neotip>
 <457D6A1A.7080904@gkec.informatik.tu-darmstadt.de>
Message-ID: <1165875053.6426.3.camel@localhost>

Good evening.

> Hey,
>
> Stefan Behnel wrote:
> [snip]
> > lxml does not implement the DOM API either. Instead, as the cheeseshop page
> > nicely states:
> >
> > ---------------------
> > lxml is a Pythonic binding for the libxml2 and libxslt libraries. It provides
> > safe and convenient access to these libraries using the ElementTree API.
>
> Note that the ElementTree API is a developing Python standard,
> implemented by 3 separate libraries, ElementTree, cElementTree and lxml.
> ElementTree and cElementTree have become part of the core Python
> distribution as of Python 2.5.
>
> A lot of ElementTree documentation can be found here:
>
> http://effbot.org/zone/element-index.htm
>
> You can do common DOM-style manipulations through this API, just in a
> more convenient manner.

Thanks for all your answers, and thank you, Martijn, for these
clarifications. I'm not very used to Python 2.5 yet. I will have a deeper
look at the new Python features and the ElementTree API.

> As to XPath and XSLT support, lxml has that, including the ability to
> create extension functions and the like. There are differences in
> feature set here and there, but overall lxml should be able to compete
> with 4Suite.
>
> Regards,
>
> Martijn
>

Thanks again.
--
David

From faassen at startifact.com Tue Dec 12 18:59:55 2006
From: faassen at startifact.com (Martijn Faassen)
Date: Tue, 12 Dec 2006 18:59:55 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To: <200612111953.kBBJr70e002706@mail.elecdev.com>
References: <200612111953.kBBJr70e002706@mail.elecdev.com>
Message-ID:

Hello,

Lee Brown wrote:
> This discussion reminded me of a question that I've been pondering: will the
> XPATH support in LXML eventually include support for axes and predicates?

Could you explain how lxml is lacking in support for axes and predicates?
Possibly you've only been looking at '.find()', which is the
ElementTree-compatible but limited XPath implementation, and not at the full
'.xpath()' functionality?
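
A minimal sketch of that difference, assuming a made-up document (the
library/book/title/note element names and the year attribute are
illustrative only, not taken from any real schema). findall() takes the
ElementPath subset, while xpath() also accepts comparison predicates, axes
and XPath functions:

```python
from lxml import etree

# Hypothetical sample document, for illustration only.
root = etree.XML(
    "<library>"
    "<book year='1999'><title>Old</title></book>"
    "<book year='2006'><title>New</title><note>signed</note></book>"
    "</library>"
)

# find()/findall(): ElementTree-compatible path subset.
titles = [t.text for t in root.findall("book/title")]

# xpath(): a predicate with a numeric comparison.
recent = root.xpath("book[@year > 2000]/title/text()")

# xpath(): axes, evaluated relative to a context element.
note = root.xpath("//note")[0]
preceding = note.xpath("preceding-sibling::title/text()")
owner_year = note.xpath("ancestor::book/@year")

# xpath(): functions return plain Python values (count() gives a float).
count = root.xpath("count(//book)")

print(titles, recent, preceding, owner_year, count)
```

Text and attribute results come back as string objects, element results as
elements, so the two styles can be mixed freely on the same tree.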

Regards,

Martijn

From ceplm at seznam.cz Thu Dec 14 11:14:52 2006
From: ceplm at seznam.cz (Matej Cepl)
Date: Thu, 14 Dec 2006 10:14:52 +0000 (UTC)
Subject: [lxml-dev] jbrout fails to work second time on Fedora Core 6/RHEL 5b2
Message-ID:

Hi,

I am trying to package jbrout (a photo management application) for Fedora
Extras and I always get this error when running jbrout for the second time
(this time on RHEL 5beta2; the version of python-lxml is still just 1.0.3.2):

[matej at hubmaier ~]$ jbrout
GTK Accessibility Module initialized
Traceback (most recent call last):
  File "/usr/share/jbrout/jbrout.py", line 2164, in ?
    main()
  File "/usr/share/jbrout/jbrout.py", line 2133, in main
    JBrout.init(canModify)
  File "/usr/share/jbrout/db.py", line 1297, in init
    JBrout.db = DBPhotos( JBrout.getConfFile("db.xml") )
  File "/usr/share/jbrout/db.py", line 118, in __init__
    self.root = ElementTree(file=file).getroot()
  File "etree.pyx", line 1504, in etree.ElementTree
  File "parser.pxi", line 687, in etree._parseDocument
  File "parser.pxi", line 624, in etree._parseDocFromFile
  File "parser.pxi", line 364, in etree._BaseParser._parseDocFromFile
  File "parser.pxi", line 432, in etree._handleParseResult
  File "parser.pxi", line 403, in etree._raiseParseError
etree.XMLSyntaxError: line 1: PCDATA invalid Char value 2
[matej at hubmaier ~]$

Further discussion of this bug is on the jbrout list at
http://groups.google.com/group/jbrout/browse_thread/thread/73aaa54115930c5b

Could anybody help me make this package work?

Thanks a lot,

Matěj

--
http://www.ceplovi.cz/matej/blog/, Jabber: ceplmajabber.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB 25C3 E09F EF25 D964 84AC

My life has been full of terrible misfortunes most of which never happened.
-- Michel de Montaigne

From fredrik at pythonware.com Thu Dec 14 13:05:19 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 14 Dec 2006 13:05:19 +0100
Subject: [lxml-dev] jbrout fails to work second time on Fedora Core 6/RHEL 5b2
In-Reply-To:
References:
Message-ID:

Matej Cepl wrote:

>   File "/usr/share/jbrout/db.py", line 1297, in init
>     JBrout.db = DBPhotos( JBrout.getConfFile("db.xml") )
...
> etree.XMLSyntaxError: line 1: PCDATA invalid Char value 2
>
> Further discussion of this bug is on the jbrout list at
> http://groups.google.com/group/jbrout/browse_thread/thread/73aaa54115930c5b
>
> Could anybody help me make this package work?

the error message says that the "db.xml" file is broken. what do the first few lines in that file look like?

From marian.schubert at gmail.com Thu Dec 14 14:38:49 2006
From: marian.schubert at gmail.com (Marian Schubert)
Date: Thu, 14 Dec 2006 14:38:49 +0100
Subject: [lxml-dev] lxml segfaults while instantiating ElementBase
Message-ID:

Hello,

I guess it should not be instantiated, but still...

Python 2.4.4 (#2, Oct 20 2006, 00:23:25)
[GCC 4.1.2 20061015 (prerelease) (Debian 4.1.1-16.1)] on linux2
>>> from lxml.etree import ElementBase
>>> ElementBase()
Segmentation fault

lxml.etree:       (1, 1, 1, 0)
libxml used:      (2, 6, 27)
libxml compiled:  (2, 6, 26)
libxslt used:     (1, 1, 18)
libxslt compiled: (1, 1, 17)

cu,
Maio

From lee.brown at elecdev.com Thu Dec 14 14:51:57 2006
From: lee.brown at elecdev.com (Lee Brown)
Date: Thu, 14 Dec 2006 08:51:57 -0500
Subject: [lxml-dev] About lxml status
In-Reply-To:
Message-ID: <200612141351.kBEDpj0e021058@mail.elecdev.com>

Greetings!

I apologize for my mistaken presumption. I presumed it only supported basic XPath functions because the lxml API web page only shows basic examples.

I am at a significant disadvantage when it comes to lxml and the underlying libxml2/libxslt libraries, as I cannot read the C source.
(Back when I took formal programming courses, the three choices offered to engineering students were Basic, Fortran, and this hot, up-and-coming language called Pascal which was supposed to set the world on fire.) All of the documentation for the libxml2/libxslt libraries on xmlsoft.org is written from a C perspective and checking out the source code for lxml won't help me much, either.

So I am limited to whatever I can glean from the lxml web site examples and whatever I can discover using the usual Python code inspection techniques. (Which don't go very far when much of the functionality resides in precompiled binaries.) So really, the only way I'd be able to determine how far support for a given X-standard goes in lxml is to write a whole bunch of test cases. (This is how I figured out that lxml has broad support for the XInclude standard, even though the lxml API page states that "simple" XInclude support exists.)

Please don't infer from this that I have a negative tone towards lxml; I do not. I think it's absolutely great. I have tried pretty much every Python-based XML/XSLT/Xwhatever code base out there and lxml is really the only one that is robust enough, reliable enough, and FAST enough to be useful for production use.

I am currently using lxml in conjunction with Mod Python on an Apache web server to serve XML content data, merging dynamic data through XIncludes and transforming the output on-the-fly into XHTML using XSLT templates. It works great!

If there's one thing I'd like to add to the lxml "wish list" it would be some more in-depth examples on the web site - there are a lot more things I'd like to be doing with lxml if I could just figure out whether it will do them and how.
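
As an illustration of the kind of in-depth example asked for here, the following is a minimal sketch of '.xpath()' with predicates and axes. It is written for a current Python and lxml rather than the 1.1 release discussed in this thread, and the sample document and variable names are invented for the example.

```python
from lxml import etree

# A small invented document; none of these names come from the thread.
root = etree.XML(
    "<catalog>"
    "<book id='b1' lang='en'><title>First</title><price>10</price></book>"
    "<book id='b2' lang='de'><title>Second</title><price>25</price></book>"
    "<book id='b3' lang='en'><title>Third</title><price>40</price></book>"
    "</catalog>"
)

# Predicate on an attribute value.
english = root.xpath('book[@lang="en"]/title/text()')

# Predicate with a numeric comparison on a child element.
expensive = root.xpath('book[price > 20]/@id')

# The following-sibling axis: every book after b1.
after_b1 = root.xpath('book[@id="b1"]/following-sibling::book/@id')

# The ancestor axis, evaluated relative to a single element.
title = root.xpath('//title[text()="Second"]')[0]
ancestors = [el.tag for el in title.xpath('ancestor::*')]

# Positional predicate combined with XPath functions.
last_title = root.xpath('string(book[last()]/title)')
count = root.xpath('count(book)')
```

Each call returns ordinary Python values (lists of elements or strings, a string, or a float), so the results can be inspected interactively without reading any of the C sources.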
-----Original Message-----
From: lxml-dev-bounces at codespeak.net [mailto:lxml-dev-bounces at codespeak.net] On Behalf Of Martijn Faassen
Sent: Tuesday, December 12, 2006 1:00 PM
To: lxml-dev at codespeak.net
Subject: Re: [lxml-dev] About lxml status

Hello,

Lee Brown wrote:
> This discussion reminded me of a question that I've been pondering:
> will the XPATH support in LXML eventually include support for axes and predicates?

Could you explain how lxml is lacking in support for axes and predicates? Possibly you've only been looking at '.find()', which is the ElementTree-compatible but limited XPath implementation, and not at the full '.xpath()' functionality?

Regards,

Martijn

_______________________________________________
lxml-dev mailing list
lxml-dev at codespeak.net
http://codespeak.net/mailman/listinfo/lxml-dev

From hjh at alterras.de Thu Dec 14 15:49:40 2006
From: hjh at alterras.de (=?ISO-8859-1?Q?Hans-J=FCrgen?= Hay)
Date: Thu, 14 Dec 2006 15:49:40 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To: <200612141351.kBEDpj0e021058@mail.elecdev.com>
References: <200612141351.kBEDpj0e021058@mail.elecdev.com>
Message-ID: <1166107780.1729.63.camel@hera.local>

Dear Lee Brown,

I use lxml in the same context, but there is an issue I'd like to warn you about: it is not possible to access global precompiled XSLT stylesheets from different threads, and mod_python uses multiple threads. The only solutions I have found so far are to prevent mod_python from using multiple threads by globally setting

    PythonInterpreter somename

in the mod_python-related Apache config while using the prefork Apache module, or to build the stylesheet anew on each request. If you see a better solution, please tell me.

Regards
Hans

On Thursday, 14.12.2006 at 08:51 -0500, Lee Brown wrote:

I am currently using lxml in conjunction with Mod Python on an Apache web server to serve XML content data, merging dynamic data through XIncludes and transforming the output on-the-fly into XHTML using XSLT templates.
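
The second workaround mentioned in this message, building the stylesheet anew on each request, might look roughly like the sketch below. It assumes a current Python and lxml; the stylesheet, the `handle_request` function and the sample input are all invented for illustration and are not taken from anyone's actual handler.

```python
from lxml import etree

# Invented stylesheet source; in a real handler this would come from a file.
STYLESHEET = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/page">
    <html><body><h1><xsl:value-of select="title"/></h1></body></html>
  </xsl:template>
</xsl:stylesheet>
"""


def handle_request(xml_text):
    """Hypothetical request handler: compile, transform, serialize."""
    # Compiling inside the handler means the XSLT object never leaves
    # the thread that created it, at the cost of recompiling every time.
    transform = etree.XSLT(etree.XML(STYLESHEET))
    result = transform(etree.XML(xml_text))
    return str(result)


page = handle_request("<page><title>Hello</title></page>")
```

The trade-off is purely one of speed: nothing is shared between requests, so there is nothing to protect, but every request pays for parsing and compiling the stylesheet.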
From lee.brown at elecdev.com Thu Dec 14 16:46:36 2006
From: lee.brown at elecdev.com (Lee Brown)
Date: Thu, 14 Dec 2006 10:46:36 -0500
Subject: [lxml-dev] About lxml status
In-Reply-To: <1166107780.1729.63.camel@hera.local>
Message-ID: <200612141546.kBEFkO0e023428@mail.elecdev.com>

Greetings!

Thanks for the warning, but I've already run headfirst into that problem. My apache server is running the Win32MPM, where every request is a new thread, so there aren't any tricks I can play with the PythonInterpreter directive. (None that help, anyway.)

However, I did some benchmark tests and found that I can serve about 32 requests per second even with the overhead of recompiling the XSLT template new for each request. This is adequate for my needs, though a very busy website might have trouble.

One thing I haven't tried is to pre-compile my XSLT templates and cPickle them to disk files and then unpickle a copy to serve the request. A web server with a good file caching system might have to do very few actual disk reads but whether it is faster to unpickle a compiled template object than to just re-compile a new one remains unknown.

If anyone has a suggestion for building a thread-safe set of precompiled templates, I'm all ears....

-----Original Message-----
From: Hans-Jürgen Hay [mailto:hjh at alterras.de]
Sent: Thursday, December 14, 2006 9:50 AM
To: Lee Brown; lxml-dev at codespeak.net
Subject: Re: [lxml-dev] About lxml status

Dear Lee Brown,

I use lxml in the same context, but there is an issue I'd like to warn you about: it is not possible to access global precompiled XSLT stylesheets from different threads, and mod_python uses multiple threads. The only solutions I have found so far are to prevent mod_python from using multiple threads by globally setting

    PythonInterpreter somename

in the mod_python-related Apache config while using the prefork Apache module, or to build the stylesheet anew on each request. If you see a better solution, please tell me.
Regards
Hans

On Thursday, 14.12.2006 at 08:51 -0500, Lee Brown wrote:

I am currently using lxml in conjunction with Mod Python on an Apache web server to serve XML content data, merging dynamic data through XIncludes and transforming the output on-the-fly into XHTML using XSLT templates.

From hjh at alterras.de Thu Dec 14 17:31:48 2006
From: hjh at alterras.de (=?ISO-8859-1?Q?Hans-J=FCrgen?= Hay)
Date: Thu, 14 Dec 2006 17:31:48 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To: <200612141546.kBEFkO0e023428@mail.elecdev.com>
References: <200612141546.kBEFkO0e023428@mail.elecdev.com>
Message-ID: <1166113908.1729.87.camel@hera.local>

Greetings,

I found this out very late and it gave me serious headaches: pickle does not work with XSLT objects. Even if it did, it would probably perform much worse than building from source. But thanks for the tip; maybe the developers can help out a little with coarse-grained locking at a later stage. Using lxml with mod_python should be an interesting use case.

Regards
Hans

On Thursday, 14.12.2006 at 10:46 -0500, Lee Brown wrote:
> Greetings!
>
> Thanks for the warning, but I've already run headfirst into that problem. My
> apache server is running the Win32MPM, where every request is a new thread, so
> there aren't any tricks I can play with the PythonInterpreter directive. (None
> that help, anyway.)
>
> However, I did some benchmark tests and found that I can serve about 32 requests
> per second even with the overhead of recompiling the XSLT template new for each
> request. This is adequate for my needs, though a very busy website might have
> trouble.
>
> One thing I haven't tried is to pre-compile my XSLT templates and cPickle them
> to disk files and then unpickle a copy to serve the request. A web server with
> a good file caching system might have to do very few actual disk reads but
> whether it is faster to unpickle a compiled template object than to just
> re-compile a new one remains unknown.
>
> If anyone has a suggestion for building a thread-safe set of precompiled
> templates, I'm all ears....
>
> -----Original Message-----
> From: Hans-Jürgen Hay [mailto:hjh at alterras.de]
> Sent: Thursday, December 14, 2006 9:50 AM
> To: Lee Brown; lxml-dev at codespeak.net
> Subject: Re: [lxml-dev] About lxml status
>
> Dear Lee Brown,
>
> I use lxml in the same context, but there is an issue I'd like to warn you about:
> it is not possible to access global precompiled XSLT stylesheets from different
> threads, and mod_python uses multiple threads. The only solutions I have found so
> far are to prevent mod_python from using multiple threads by globally setting
>
> PythonInterpreter somename
>
> in the mod_python-related Apache config while using the prefork Apache module, or
> to build the stylesheet anew on each request. If you see a better solution, please
> tell me.
>
> Regards
> Hans
>
>
> On Thursday, 14.12.2006 at 08:51 -0500, Lee Brown wrote:
>
> I am currently using lxml in conjunction with Mod Python on an Apache web server
> to serve XML content data, merging dynamic data through XIncludes and
> transforming the output on-the-fly into XHTML using XSLT templates.
>

From ianb at colorstudy.com Thu Dec 14 22:49:40 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 14 Dec 2006 15:49:40 -0600
Subject: [lxml-dev] About lxml status
In-Reply-To: <200612141546.kBEFkO0e023428@mail.elecdev.com>
References: <200612141546.kBEFkO0e023428@mail.elecdev.com>
Message-ID: <4581C6F4.6000504@colorstudy.com>

Lee Brown wrote:
> Thanks for the warning, but I've already run headfirst into that problem. My
> apache server is running the Win32MPM, where every request is a new thread, so
> there aren't any tricks I can play with the PythonInterpreter directive. (None
> that help, anyway.)
>
> However, I did some benchmark tests and found that I can serve about 32 requests
> per second even with the overhead of recompiling the XSLT template new for each
> request.
> This is adequate for my needs, though a very busy website might have
> trouble.

I'm not clear exactly on the way threads and mod_python and all that work, but I imagine you could use a pool of templates. You'd do something like:

    try:
        tmpl = template_pool.pop()
    except IndexError:
        tmpl = compile_template()
    # then to return the template to the pool:
    template_pool.append(tmpl)

This is assuming that it's okay to move templates between threads, but not use them concurrently between threads. Or if they have to be used in the thread they were created in, you can use:

    import threading
    template_cache = threading.local()
    try:
        tmpl = template_cache.template
    except AttributeError:
        tmpl = template_cache.template = compile_template()

That's assuming that threads are long-lived, otherwise this won't change anything either.

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From lee.brown at elecdev.com Fri Dec 15 00:09:36 2006
From: lee.brown at elecdev.com (Lee Brown)
Date: Thu, 14 Dec 2006 18:09:36 -0500
Subject: [lxml-dev] About lxml status
In-Reply-To: <4581C6F4.6000504@colorstudy.com>
Message-ID: <200612142309.kBEN9M0e031831@mail.elecdev.com>

Greetings!

The Apache web server has several different MPMs (Multi-Processing Modules) available to it (unless you're running the Win32MPM, in which case that's the one you're stuck with.) But basically, the web server can spawn either processes or threads to handle incoming requests.

In the Win32MPM, each VHOST (virtual web site) runs as a separate OS process and each request that a VHOST receives is handled entirely as a thread within that process. Each thread invokes a chain of request handlers (code modules that handle specific tasks like authentication, authorization, content delivery, output filtering, and so forth) that are instantiated for that thread and then they die at the end of the request. Request threads may arrive simultaneously and are by nature very short-lived.
If a VHOST gets 32 simultaneous requests, 32 threads get created and then within a second or two all 32 threads are finished and terminated. (By default the Win32 MPM can have a maximum of 250 concurrent threads.)

What Mod Python does is to allow you to specify a python function that will handle a specific task or tasks in the chain in lieu of Apache's standard handlers. Mod Python's default behavior is to create a Python interpreter for each VHOST and this interpreter is responsible for executing the various handler functions in a thread-safe way for each request. (I have NO idea how it does it, nor is my state of confusion likely to change even if someone explains it to me.) The source code containing the function is imported as a module at interpreter startup in the normal 'Python' way, that is, executable code in the module defined outside of the handler function definition(s) is executed on import and is global to the handler function(s).

So, naively, I wrote some global code to pre-load and pre-compile all of my XSLT templates into a dictionary at startup. Then, within the handler function definition I look up the correct template in the dictionary and use it to transform the parsed XML source object. This worked just fine as long as one and only one thread was being executed at any given time. Simultaneous requests would either bomb out with a threading-related error or just hang until the server ran out of available threads and crashed.

Apparently, Mod Python can dole out handler functions in a thread-safe way, but any global objects you create at import time are not so lucky. Nor does there seem to be any way to share an object from one thread with another thread.

One way around this may be to pass a copy of the template dictionary to the handler function, that is, pass a literal copy instead of an object reference.
This would eliminate the time overhead of recompiling templates for each request at the expense of possibly having a lot of copies in-memory at one time. But since my server always seems to have plenty of free memory, I'll give it a try.

-----Original Message-----
From: Ian Bicking [mailto:ianb at colorstudy.com]
Sent: Thursday, December 14, 2006 4:50 PM
To: Lee Brown
Cc: 'Hans-Jürgen Hay'; lxml-dev at codespeak.net
Subject: Re: [lxml-dev] About lxml status

Lee Brown wrote:
> Thanks for the warning, but I've already run headfirst into that
> problem. My apache server is running the Win32MPM, where every
> request is a new thread, so there aren't any tricks I can play with
> the PythonInterpreter directive. (None that help, anyway.)
>
> However, I did some benchmark tests and found that I can serve about
> 32 requests per second even with the overhead of recompiling the XSLT
> template new for each request. This is adequate for my needs, though
> a very busy website might have trouble.

I'm not clear exactly on the way threads and mod_python and all that work, but I imagine you could use a pool of templates. You'd do something like:

    try:
        tmpl = template_pool.pop()
    except IndexError:
        tmpl = compile_template()
    # then to return the template to the pool:
    template_pool.append(tmpl)

This is assuming that it's okay to move templates between threads, but not use them concurrently between threads. Or if they have to be used in the thread they were created in, you can use:

    import threading
    template_cache = threading.local()
    try:
        tmpl = template_cache.template
    except AttributeError:
        tmpl = template_cache.template = compile_template()

That's assuming that threads are long-lived, otherwise this won't change anything either.
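
The thread-local variant quoted above can be fleshed out into a runnable sketch. Everything here is an assumption made for illustration: `compile_template` is a hypothetical stand-in for an expensive step such as compiling an XSLT stylesheet, and the worker threads simulate request handlers that each serve several requests. It uses only the standard library and a current Python.

```python
import threading

compile_count = 0
_count_lock = threading.Lock()


def compile_template():
    """Hypothetical stand-in for an expensive compile step."""
    global compile_count
    with _count_lock:
        compile_count += 1
    # The "compiled template" is just a callable that upper-cases text.
    return lambda text: text.upper()


# Each thread sees its own independent set of attributes on this object.
_cache = threading.local()


def get_template():
    """Return this thread's template, compiling it on first use only."""
    try:
        return _cache.template
    except AttributeError:
        _cache.template = compile_template()
        return _cache.template


def handle_request(text):
    return get_template()(text)


def worker(results, index):
    # Three requests served by one thread share one compiled template.
    results[index] = [handle_request("req") for _ in range(3)]


results = [None] * 4
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Four threads serving three requests each trigger four compilations rather than twelve. As the quoted message notes, the saving disappears when every request gets a brand-new thread, which is the situation described for the Win32 MPM.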
--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From mcepl at redhat.com Fri Dec 15 09:15:44 2006
From: mcepl at redhat.com (Matej Cepl)
Date: Fri, 15 Dec 2006 08:15:44 +0000 (UTC)
Subject: [lxml-dev] jbrout fails to work second time on Fedora Core 6/RHEL 5b2
References:
Message-ID:

Fredrik Lundh wrote:

> the error message says that the "db.xml" file is broken. what does the
> first few lines in that file look like?

The file is available in its entirety at http://www.ceplovi.cz/matej/tmp/db.xml

Thanks a lot for the answer,

Matěj

--
http://www.ceplovi.cz/matej/blog/, Jabber: ceplmajabber.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB 25C3 E09F EF25 D964 84AC

Scouts are saving aluminum cans, bottles and other items to be recycled. Proceeds will be used to cripple children.
-- from a church bulletin

From faassen at startifact.com Tue Dec 19 21:42:38 2006
From: faassen at startifact.com (Martijn Faassen)
Date: Tue, 19 Dec 2006 21:42:38 +0100
Subject: [lxml-dev] lxml segfaults while instantiating ElementBase
In-Reply-To:
References:
Message-ID:

Hello,

Marian Schubert wrote:
> i guess it should not be instantiated but still...
>
> Python 2.4.4 (#2, Oct 20 2006, 00:23:25)
> [GCC 4.1.2 20061015 (prerelease) (Debian 4.1.1-16.1)] on linux2
>>>> from lxml.etree import ElementBase
>>>> ElementBase()
> Segmentation fault
>
>
> lxml.etree: (1, 1, 1, 0)
> libxml used: (2, 6, 27)
> libxml compiled: (2, 6, 26)
> libxslt used: (1, 1, 18)
> libxslt compiled: (1, 1, 17)

Good point. We should be hiding this from import if we can. Stefan, any ideas?

Regards,

Martijn

From faassen at startifact.com Tue Dec 19 21:45:51 2006
From: faassen at startifact.com (Martijn Faassen)
Date: Tue, 19 Dec 2006 21:45:51 +0100
Subject: [lxml-dev] lxml documentation volunteers?
Message-ID:

Hi there,

I think it's time to give the lxml documentation a reworking, in particular our API documentation.
It must become clearer to users what is in the API; I find myself having to look at the code or do a dir() far too often to find out whether a feature is supported.

What I would wish for is API documentation similar to the module documentation on python.org. Something close to that would be familiar to users, so they can get started with lxml quickly. I hope we can also continue to doctest code samples in the documentation.

For completeness I think we would need to integrate the existing ElementTree API documentation so we have a one-stop shop for people using lxml, instead of the scattered situation now. We should mark where an API is taken from ElementTree so that people can easily write compatible code.

Such API documentation would also help us identify possible ElementTree APIs we haven't implemented yet, and it might also suggest APIs we want to implement that are currently missing.

Any volunteers for this work?

Regards,

Martijn

From howesteve at gmail.com Wed Dec 20 10:04:32 2006
From: howesteve at gmail.com (Steve Howe)
Date: Wed, 20 Dec 2006 07:04:32 -0200
Subject: [lxml-dev] Processing instruction doubt
Message-ID: <200612200704.32908.howesteve@gmail.com>

Hello all,

This is probably a rather stupid question, but suppose I have an ElementTree instance and I want to add a processing instruction to it (in my case, an xml-stylesheet PI). How do I add that PI in the correct location of the tree (i.e. before the root) without serializing?

...

Before someone answers "add the PI into the XSLT before serializing it": the ElementTree is received by a function from an end user, and I have no control over what's received.

Thanks.

--
Best Regards,
Steve Howe

From faassen at startifact.com Thu Dec 21 17:46:02 2006
From: faassen at startifact.com (Martijn Faassen)
Date: Thu, 21 Dec 2006 17:46:02 +0100
Subject: [lxml-dev] relax ng bug: validation twice doesn't give same answer
Message-ID:

Hi there,

We just ran into the following problem with lxml's RelaxNG validation.
Validating a document with a RelaxNG schema gives the right result (invalid). Validating it again with the same schema, however, gives valid!

The script at the end of this message is a minimal test case that demonstrates the problem. Tested with lxml 1.1.2 and libxml2 2.6.24, and also 2.6.26 on another machine.

Judging from an earlier thread last year, it's possible that this is a libxml2 bug:

http://codespeak.net/pipermail/lxml-dev/2005-September/000423.html

This was more than a year ago though, and it's somewhat surprising this still wasn't fixed. I found the bug report:

http://bugzilla.gnome.org/show_bug.cgi?id=315883

and have added something to it in the hope it'll spur some activity in confirming it...

Regards,

Martijn

from lxml import etree
from StringIO import StringIO

v = etree.RelaxNG(etree.parse(StringIO('''\
''')))

# this is an invalid document
d = etree.parse(StringIO('''\
'''))

first = v.validate(d)   # returns 0, what is expected
second = v.validate(d)  # returns 1!
assert first == second, "Validity isn't the same over time"

From Holger.Joukl at LBBW.de Fri Dec 29 16:16:12 2006
From: Holger.Joukl at LBBW.de (Holger Joukl)
Date: Fri, 29 Dec 2006 16:16:12 +0100
Subject: [lxml-dev] [objectify] __MATCH_PATH_SEGMENT regex modification suggestion
Message-ID:

Hi,

I suggest loosening the __MATCH_PATH_SEGMENT regex a little to allow for more possible element names, which are sometimes outside of my control.

Currently ObjectPath chokes on paths like 'root.a-x.a-y'. While such names are often inconvenient at best, I found that Python itself is quite non-restrictive with respect to attribute names:

python2.4
Python 2.4.3 (#2, Nov 20 2006, 16:26:48)
[GCC 2.95.2 19991024 (release)] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo(object):
...     pass
...
>>> setattr(Foo, 'a-b', "hmm")
>>>

Hence, I propose to change the regex to:

cdef object __MATCH_PATH_SEGMENT
__MATCH_PATH_SEGMENT = re.compile(
    r"(\.?)\s*(?:\{([^}]*)\})?\s*([^.{}]+)\s*(?:\[\s*([-0-9]+)\s*\])?",
    re.U).match

(Changed: ([^.{}]+) replaces (\w+))

Holger

The contents of this e-mail are confidential. If you are not the named addressee or if this transmission has been addressed to you in error, please notify the sender immediately and then delete this e-mail. Any unauthorized copying and transmission is forbidden. E-Mail transmission cannot be guaranteed to be secure. If verification is required, please request a hard copy version.

From diapriid at gmail.com Sat Dec 30 23:20:24 2006
From: diapriid at gmail.com (Matt)
Date: Sat, 30 Dec 2006 16:20:24 -0600
Subject: [lxml-dev] lxml on OS X compile problems
Message-ID: <19d6b9770612301420h8fe9718s47d538e786ff392a@mail.gmail.com>

Hi All,

My first question: has anyone successfully built lxml on OS X 10.3.9?
And next, has anyone encountered the following error?

Specifics:

OS X 10.3.9
gcc 3.3
python 2.4.4
lxml 1.1.2

libxml2 config line:

./configure \
    --with-python=/Library/Frameworks/Python.framework/Versions/2.4/

libxslt config line:

./configure \
    --with-python=/Library/Frameworks/Python.framework/Versions/2.4/ \
    --prefix=/usr/local \
    --with-libxml-prefix=/usr/local \
    --with-libxml-include-prefix=/usr/local/include \
    --with-libxml-libs-prefix=/usr/local/lib

Results:

xsltproc was compiled against libxml 20611, libxslt 10109 and libexslt 807
libxslt 10117 was compiled against libxml 20626
libexslt 807 was compiled against libxml 20611

I'm a little worried that xsltproc wasn't compiled against the same libraries, but I'm not sure how to resolve this. I've played with installing various versions of libxslt and libxml and they seem to be installing, though the libxml2 headers may not all be installing properly. The present error occurs since I've copied the missing headers xmlstring.h and xmlsave.h to /usr/include/. Using Pyrex to create the objectify.c file doesn't help.
>python setup.py install
Building lxml version 1.2.dev-35415
running install
running bdist_egg
running egg_info
writing src/lxml.egg-info/PKG-INFO
writing top-level names to src/lxml.egg-info/top_level.txt
writing dependency_links to src/lxml.egg-info/dependency_links.txt
reading manifest template 'MANIFEST.in'
warning: no files found matching 'objectify.c' under directory 'src/lxml'
warning: no files found matching '*.html' under directory 'doc'
warning: no files found matching '*.py' under directory 'Pyrex'
writing manifest file 'src/lxml.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.3-ppc/egg
running install_lib
running build_py
running build_ext
building 'lxml.etree' extension
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -I/usr/include/libxml2 -I/Library/Frameworks/Python.framework/Versions/2.4/include/python2.4 -c src/lxml/etree.c -o build/temp.macosx-10.3-ppc-2.4/src/lxml/etree.o -w
In file included from src/lxml/etree.c:27:
/usr/include/libxml2/libxml/xmlstring.h:28: error: redefinition of `xmlChar'
/usr/include/libxml2/libxml/tree.h:107: error: `xmlChar' previously declared here
/usr/include/libxml2/libxml/xmlstring.h:40: error: syntax error before "xmlChar"
/usr/include/libxml2/libxml/xmlstring.h:41: error: parse error before "xmlStrdup"
<... and much more ...>

Any hints?

Thanks for your time,

Matt

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20061230/2a366ad6/attachment.htm