From jkrukoff at ltgc.com Fri Dec 1 02:50:06 2006
From: jkrukoff at ltgc.com (John Krukoff)
Date: Thu, 30 Nov 2006 18:50:06 -0700
Subject: [lxml-dev] find/findall not accepting qnames?
Message-ID: <1164937806.22052.44.camel@localhost>
I find it surprising that find & findall do not accept QName objects in
lxml, and instead require a manual cast to string, like so:
etree.XML( '' ).find( str( etree.QName( 'http://test',
'b' ) ) )
ElementTree appears to accept QNames transparently, in my limited
testing.
I haven't tested with earlier revisions, but after tracking down a
couple of bugs related to this, is this a change in behavior in recent
(1.1) lxml versions?
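For reference, a minimal sketch of the workaround described above. The sample document is invented for illustration, and the fallback import of the standard library's ElementTree is an assumption made only so the snippet also runs where lxml is not installed.

```python
# Sketch of the str(QName) workaround; the sample document is hypothetical.
try:
    from lxml import etree  # the library under discussion
except ImportError:
    import xml.etree.ElementTree as etree  # offers QName/find for this case too

# Hypothetical document: root <a> with one child <b> in the http://test namespace.
root = etree.XML('<a xmlns:ns0="http://test"><ns0:b/></a>')

qname = etree.QName('http://test', 'b')

# str() yields the Clark notation "{namespace}localname", which find() accepts.
path = str(qname)
found = root.find(path)
```

Converting explicitly keeps the call independent of whether a given version of find() accepts QName objects directly.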
--
John Krukoff
Land Title Guarantee Company
From fredrik at pythonware.com Fri Dec 1 08:29:41 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 01 Dec 2006 08:29:41 +0100
Subject: [lxml-dev] find/findall not accepting qnames?
In-Reply-To: <1164937806.22052.44.camel@localhost>
References: <1164937806.22052.44.camel@localhost>
Message-ID:
John Krukoff wrote:
> I find it surprising that find & findall do not accept QName objects in
> lxml, and instead require a manual cast to string, like so:
>
> etree.XML( ' xmlns:ns0="http://test"/>' ).find( str( etree.QName( 'http://test',
> 'b' ) ) )
>
> ElementTree appears to accept QNames transparently, in my limited
> testing.
I find it surprising that it does, though. Not sure that's intentional ;-)
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Dec 1 09:27:02 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 01 Dec 2006 09:27:02 +0100
Subject: [lxml-dev] find/findall not accepting qnames?
In-Reply-To: <1164937806.22052.44.camel@localhost>
References: <1164937806.22052.44.camel@localhost>
Message-ID: <456FE756.8050402@gkec.informatik.tu-darmstadt.de>
Hi,
John Krukoff wrote:
> I find it surprising that find & findall do not accept QName objects in
> lxml
True, that's about the only place where we do not parse the input ourselves
but hand it to the ElementPath module of ElementTree. All other places use the
same function for parsing tag names, so that makes QNames completely
transparent in lxml.
I agree that this is unexpected behaviour and since we accept QNames in loads
of other places, we should make it a special case if a QName is passed as path
to find*(). After all, it's a common use case to look for a specific tag
instead of a path.
BTW, using getiterator(tag) for this purpose should be a bit faster.
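A small sketch of the difference, under a few assumptions: the sample document is made up, `iter(tag)` is used because it is the current name for `getiterator(tag)`, and the standard-library fallback import is only there so the snippet runs without lxml.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical document with the target tag at two different depths.
root = etree.XML(
    '<a xmlns:ns0="http://test"><ns0:b/><c><ns0:b/></c></a>'
)
tag = str(etree.QName('http://test', 'b'))

# find(tag) returns only the first matching direct child ...
first_child = root.find(tag)

# ... while iter(tag) walks the whole subtree in document order.
all_matches = list(root.iter(tag))
```

Note that the two calls are not interchangeable: the iterator also reports matches below the first level, so it fits best when any element with that tag is wanted.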
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Dec 1 10:21:19 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 01 Dec 2006 10:21:19 +0100
Subject: [lxml-dev] lxml 1.1 problems with python 2.3
In-Reply-To: <456F102F.10802@infrae.com>
References: <456726B9.2050800@infrae.com> <45672C1B.6060707@palladion.com> <45682053.7040108@gkec.informatik.tu-darmstadt.de> <456872CF.7020400@palladion.com> <456A8C6A.9000508@gkec.informatik.tu-darmstadt.de> <456DF1B4.5010104@infrae.com>
<456EA12E.4040504@gkec.informatik.tu-darmstadt.de>
<456F102F.10802@infrae.com>
Message-ID: <456FF40F.1040209@gkec.informatik.tu-darmstadt.de>
Hi Martijn,
Martijn Faassen wrote:
> Stupid of me not to see it earlier, but that's because it's trying to
> import from lxml.local_doctest and you added it as local_doctest.
Ah, stupid me then. :)
>>> Things then fail with what looks like a new, unrelated issue:
>>>
>>> Traceback (most recent call last):
>>> File "test.py", line 591, in ?
>>> exitcode = main(sys.argv)
>>> File "test.py", line 554, in main
>>> test_cases = get_test_cases(test_files, cfg, tracer=tracer)
>>> File "test.py", line 254, in get_test_cases
>>> module = import_module(file, cfg, tracer=tracer)
>>> File "test.py", line 197, in import_module
>>> mod = __import__(modname)
>>> File
>>> "/home/faassen/working/lxml/lxml-trunk/src/lxml/tests/test_objectify.py",
>>>
>>> line 16, in ?
>>> from lxml import objectify
>>> ImportError:
>>> /home/faassen/working/lxml/lxml-trunk/src/lxml/objectify.so: undefined
>>> symbol: previousElement
>>
>> That's rather bizarre, previousElement is definitely a public function
>> (i.e.
>> defined in etree.so). I have no idea how that could be missing.
>
> It's consistently missing though in Python 2.3. Perhaps it accidentally
> gets turned off together with thread support? I did try to test this
> theory yesterday though on Python 2.4 by explicitly disabling tests, and
> that didn't help.
Ok, then, first thing to check: does "previousElement" turn up as a static
function in the generated src/lxml/etree.h? Could you check what the
preprocessor sees in objectify.c (gcc -E)?
On my side (Py 2.5), it sees the following:
-----------------------
...
static xmlNode (*((*nextElement)(xmlNode (*))));
static xmlNode (*((*previousElement)(xmlNode (*))));
...
{"nextElement", &nextElement},
{"previousElement", &previousElement},
...
__pyx_v_next = nextElement;
...
__pyx_v_next = previousElement;
...
-----------------------
I'm showing both functions here, as both are used in objectify, but only the
second seems to be missing according to your report. If this looks the same on
your side, I'm really out of ideas.
Stefan
From faassen at infrae.com Fri Dec 1 13:44:53 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Fri, 01 Dec 2006 13:44:53 +0100
Subject: [lxml-dev] lxml 1.1 problems with python 2.3
In-Reply-To: <456FF40F.1040209@gkec.informatik.tu-darmstadt.de>
References: <456726B9.2050800@infrae.com> <45672C1B.6060707@palladion.com> <45682053.7040108@gkec.informatik.tu-darmstadt.de> <456872CF.7020400@palladion.com> <456A8C6A.9000508@gkec.informatik.tu-darmstadt.de> <456DF1B4.5010104@infrae.com> <456EA12E.4040504@gkec.informatik.tu-darmstadt.de> <456F102F.10802@infrae.com>
<456FF40F.1040209@gkec.informatik.tu-darmstadt.de>
Message-ID: <457023C5.8030908@infrae.com>
Stefan Behnel wrote:
[snip]
>> It's consistently missing though in Python 2.3. Perhaps it accidentally
>> gets turned off together with thread support? I did try to test this
>> theory yesterday though on Python 2.4 by explicitly disabling tests, and
>> that didn't help.
>
> Ok, then, first thing to check: does "previousElement" turn up as a static
> function in the generated src/lxml/etree.h?
The only references to previousElement (and nextElement) in etree.h are here:
extern DL_IMPORT(xmlNode) (*(nextElement(xmlNode (*))));
extern DL_IMPORT(xmlNode) (*(previousElement(xmlNode (*))));
> Could you check what the
> preprocessor sees in objectify.c (gcc -E)?
Hm, I wasn't previously familiar with gcc -E. I tried running it against
objectify.c but got a lot of missing includes for Python and libxml2
(which is odd as these things are in /usr/include).
I'm not quite sure how you generate your output, but here's my reference
to previousElement when I do gcc -E:
extern DL_IMPORT(xmlNode) (*(nextElement(xmlNode (*))));
extern DL_IMPORT(xmlNode) (*(previousElement(xmlNode (*))));
...
__pyx_v_next = nextElement;
...
__pyx_v_next = previousElement;
...
Hm, is it possible I'm using the wrong version of Pyrex? I have lxml's
version installed for Python 2.4 but I guess I don't have that one for
Python 2.3... Us having to maintain our own version of Pyrex rather
sucks. I just installed lxml's version of Pyrex, and now the tests
start. We still get some failures, though. Most of them are because
'assertFalse' doesn't appear to exist. I added this to HelperTestCase
and made those errors go away.
There's also the use of operator.itemgetter, which was only introduced
in Python 2.4. I hacked up a simplistic implementation too.
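The two compatibility shims mentioned above could look roughly like this. This is a sketch, not the code that was actually committed: `simple_itemgetter` is a hypothetical name, and the body of `assertFalse` is an assumption about what such a helper would do.

```python
import unittest


def simple_itemgetter(index):
    """Minimal stand-in for operator.itemgetter with a single index."""
    def getter(obj):
        return obj[index]
    return getter


try:
    from operator import itemgetter  # available from Python 2.4 on
except ImportError:
    itemgetter = simple_itemgetter  # simplistic fallback for older versions


class HelperTestCase(unittest.TestCase):
    # Only define the helper when the running unittest module lacks it.
    if not hasattr(unittest.TestCase, 'assertFalse'):
        def assertFalse(self, value, msg=None):
            if value:
                raise self.failureException(
                    msg or '%r is not false' % (value,))


# Usage: sort pairs by their first item with the fallback implementation.
pairs = [('b', 2), ('a', 1)]
sorted_pairs = sorted(pairs, key=simple_itemgetter(0))
```

The fallback only covers the single-index form, which is all a "simplistic implementation" needs for sorting by one field.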
Now we're down to one failure in Python 2.3:
======================================================================
FAIL: test_findall (lxml.tests.test_objectify.ObjectifyTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/faassen/working/lxml/src/lxml/tests/test_objectify.py",
line 218, in test_findall
root.getchildren()[:2])
File "/usr/lib/python2.3/unittest.py", line 302, in failUnlessEqual
raise self.failureException, \
AssertionError: [, ''] != [, '']
You'd think that this *should* be equal and thus succeed. Possibly some
rich comparison feature that doesn't exist yet in Python 2.3? Back to
you, Stefan. :)
Regards,
Martijn
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Dec 1 14:39:55 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri, 01 Dec 2006 14:39:55 +0100
Subject: [lxml-dev] lxml 1.1 problems with python 2.3
In-Reply-To: <457023C5.8030908@infrae.com>
References: <456726B9.2050800@infrae.com> <45672C1B.6060707@palladion.com> <45682053.7040108@gkec.informatik.tu-darmstadt.de> <456872CF.7020400@palladion.com> <456A8C6A.9000508@gkec.informatik.tu-darmstadt.de> <456DF1B4.5010104@infrae.com> <456EA12E.4040504@gkec.informatik.tu-darmstadt.de> <456F102F.10802@infrae.com>
<456FF40F.1040209@gkec.informatik.tu-darmstadt.de>
<457023C5.8030908@infrae.com>
Message-ID: <457030AB.7020302@gkec.informatik.tu-darmstadt.de>
Hi Martijn,
Martijn Faassen wrote:
> Hm, I wasn't previously familiar with gcc -E. I tried running it against
> objectify.c but got a lot of missing includes for Python and libxml2
You can use the same command line that distutils uses to compile the module,
except for the "-c xxx.so" part.
> Us having to maintain our own version of Pyrex rather sucks.
Sure, but it's currently not that easy to push things upstream back into
Pyrex. Maybe Greg manages to get some work done over Christmas.
> I just installed lxml's version of Pyrex, and now the tests
> start.
Ah, finally. :)
> Now we're down to one failure in Python 2.3:
>
> ======================================================================
> FAIL: test_findall (lxml.tests.test_objectify.ObjectifyTestCase)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/home/faassen/working/lxml/src/lxml/tests/test_objectify.py",
> line 218, in test_findall
> root.getchildren()[:2])
> File "/usr/lib/python2.3/unittest.py", line 302, in failUnlessEqual
> raise self.failureException, \
> AssertionError: [, ''] != [ b787f0cc>, '']
>
> You'd think that this *should* be equal and thus succeed. Possibly some
> rich comparison feature that doesn't exist yet in Python 2.3?
Or maybe just works differently. That was a bad test case anyway, as equality
of objectified elements is not really well defined in general. It can be type
specific, which might be the problem here already.
I changed that to an identity test, so it should work now.
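To illustrate why an identity test is more robust here, a small sketch. `Loose` and `assert_same_objects` are hypothetical names; the actual test change in lxml may well look different.

```python
def assert_same_objects(actual, expected):
    """Fail unless both sequences hold the very same objects, in order."""
    assert len(actual) == len(expected), 'length mismatch'
    for index, (a, e) in enumerate(zip(actual, expected)):
        assert a is e, 'item %d is a different object' % index


class Loose(object):
    """Hypothetical type whose equality is deliberately permissive."""

    def __eq__(self, other):
        return True

    def __hash__(self):
        return 0


first, second = Loose(), Loose()

# Equality depends on the type's __eq__ and says nothing about identity.
equal_by_value = [first] == [second]

# Identity does not depend on how (or whether) the type defines equality.
assert_same_objects([first], [first])
```

With type-specific equality, `==` on two lists can succeed or fail for reasons unrelated to what the test wants to check, whereas `is` only asks whether the same objects came back.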
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Dec 2 22:14:30 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Sat, 02 Dec 2006 22:14:30 +0100
Subject: [lxml-dev] XInclude does not support Resolvers?
In-Reply-To: <456DEEE0.9090606@infrae.com>
References: <1164728726.7952.134.camel@ltucker.openplans.org>
<456C6D8B.6040004@gkec.informatik.tu-darmstadt.de>
<456DEEE0.9090606@infrae.com>
Message-ID: <4571ECB6.9040200@gkec.informatik.tu-darmstadt.de>
Hi,
Martijn Faassen wrote:
> I'm fine with supporting something Python-based in addition to the
> libxml2 version, but I think the XInclude implementation in libxml2 has
> the benefit in that it's probably fairly complete and besides, *they*'re
> maintaining it, not us. :) So, I'm fine with adding our own XInclude
> support, as long as it's in addition and not a replacement, along the
> same lines as the way we support ElementTree's 'find' together with our
> own 'xpath'.
I copied ET's ElementInclude module over to lxml (trunk) and modified it a
bit. The related tests in ET's selftest.py pass (with one minor exception),
although the serialisations can look a little different (so I had to fix the
doctests a little).
The implementation is adapted in that it uses Element.getiterator() to find
the XInclude elements. I also had to extend lxml's API in order to make the
original parser of a document available at the API level. There is now a
'parser' property on _ElementTree that is used by ElementInclude to provide
the same parser configuration (including resolvers) as for the source document.
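As a rough illustration of how ElementInclude is driven, here is a sketch using the standard library's `xml.etree.ElementInclude`, the module the lxml version was copied from. The in-memory `DOCUMENTS` table and the `loader` function are assumptions for the example; lxml's own parser and resolver plumbing is not shown.

```python
from xml.etree import ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files", so the sketch needs no disk access.
DOCUMENTS = {
    'part.xml': '<part>included content</part>',
}


def loader(href, parse, encoding=None):
    """Resolve an xi:include href from DOCUMENTS instead of the file system."""
    source = DOCUMENTS[href]
    if parse == 'xml':
        return ET.XML(source)
    return source  # parse="text": hand back the raw text


root = ET.XML(
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="part.xml"/>'
    '</doc>'
)

# Expand xi:include elements in place.
ElementInclude.include(root, loader=loader)
included = root[0]
```

The custom loader plays the role that resolvers play on the parser side: it decides where the referenced resource comes from.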
It's not tested much, so I'd be glad if others could give it a try.
Hope it's useful,
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Dec 4 08:37:47 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 04 Dec 2006 08:37:47 +0100
Subject: [lxml-dev] Customised xmlReconciliateNs() for lxml
Message-ID: <4573D04B.5060000@gkec.informatik.tu-darmstadt.de>
Hi all,
we had a couple of problems in the past that were related to the
xmlReconciliateNs() function in libxml2. Basically, it cleans up the
namespaces declared in a subtree after moving it to a new position inside a
document or from one document to another.
I rewrote this function in Pyrex and customised it to what we need in lxml. It
now tries to drop redundant declarations that were already available in the
new ancestors, and it avoids the bug that made lxml crash when parsing with
the COMPACT option. It also sets the new _Document reference in the same step,
which reduces the need for a second traversal step. There may be other
possible optimisations, but it's not always obvious how they behave in the
various possible use cases, so I'm a bit conservative here. This is a pretty
critical function, it can both make lxml crash and break namespace handling...
Anyway, I hope that having this function inside lxml will help us to further
optimise it in the future.
Stefan
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Dec 4 08:49:22 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 04 Dec 2006 08:49:22 +0100
Subject: [lxml-dev] redundant namespace declarations
In-Reply-To: <4564B5CB.9050001@gkec.informatik.tu-darmstadt.de>
References: <20061120144135.GA23359@tttech.com>
<4564B5CB.9050001@gkec.informatik.tu-darmstadt.de>
Message-ID: <4573D302.7090207@gkec.informatik.tu-darmstadt.de>
Hi again,
Stefan Behnel wrote:
> Albert Brandl wrote:
>> The problem occurs with the following code:
>>
>> nsmap = dict (foo="http://foo.org", bar = "http://bar.org")
>> e = Element("{http://foo.org}somefoo", nsmap = nsmap)
>> s = Element("{http://bar.org}somebar", nsmap = nsmap)
>> e.append(s)
>> et = ElementTree(e)
>> et.write("foo.xml", pretty_print = True)
>>
>> This code creates the following XML file:
>>
>>
>>
>>
>>
>> Is this a known bug?
>
> It's known - though not really a bug but rather an inconvenience. Currently,
> we use a function in libxml2 called xmlReconciliateNs() to fix the namespaces
> when merging trees. This function shows the above behaviour. To fix this, we'd
> have to implement our own version, which is a bit tricky and just wasn't
> important enough to try to get right so far. Note that even libxml2 had a
> (minor) bug up to version 2.6.26 here, so it's really not trivial to get this
> kind of thing right.
I finally took a(nother) shot at it and I now have an implementation that can
avoid this kind of problem. It's currently stored in the "nscleanup" branch,
but I will move it to the trunk ASAP. Please give it a try then, to see if it
works nicely for you in other cases where you encountered this.
Stefan
From faassen at startifact.com Mon Dec 4 16:53:19 2006
From: faassen at startifact.com (Martijn Faassen)
Date: Mon, 4 Dec 2006 15:53:19 +0000 (UTC)
Subject: [lxml-dev] Python 2.4.1 and threading
Message-ID:
Hi there,
I think I just discovered that lxml 1.1.2 doesn't work cleanly with
Python 2.4.1 either. It appears to work with Python 2.4.3 and 2.4.4, but
when I compile it for Python 2.4.1, it segfaults when you get an
error during parsing.
When I take lxml trunk and compile it without threading support, it
does all work with Python 2.4.1. This leads me to suspect threading
support is again the issue.
Stefan, perhaps you can turn off threading support not only in
Python 2.3 but also in (at least) Python 2.4.1.
Should we be going for a new release? Perhaps the world is ready for
a lxml 1.2. We haven't done a lot of changes except for the setup.py
stuff (that is, I can't see any mentioned in the CHANGES.txt..), but
I think those changes might warrant a new version number.
Regards,
Martijn
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Dec 4 17:29:07 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 04 Dec 2006 17:29:07 +0100
Subject: [lxml-dev] Python 2.4.1 and threading
In-Reply-To:
References:
Message-ID: <45744CD3.9000604@gkec.informatik.tu-darmstadt.de>
Martijn Faassen wrote:
> I think I just discovered that lxml 1.1.2 doesn't work cleanly with
> Python 2.4.1 either. It appears to work with Python 2.4.3 and 2.4.4, but
> when I compile it for Python 2.4.1, it segfaults when you get an
> error during parsing.
>
> When I take lxml trunk and compile it without threading support, it
> does all work with Python 2.4.1. This leads me to suspect threading
> support is again the issue.
Hmm, ok, I saw a couple of differences in the PyGILState_* API functions
between 2.3.6 and 2.4.4, so maybe they make a difference for us. It's still
possible that we're doing something wrong in lxml, but the fact that it works
with newer versions makes me suspect a race condition that was fixed in
later Python versions.
> Stefan, perhaps you can turn off threading support not only in
> Python 2.3 but also in (at least) Python 2.4.1.
Ok, no problem. It's just a plain version number comparison.
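Such a comparison might look like the sketch below. The function name and the threshold are assumptions: 2.4.3 is the first release reported as working in this thread, and the status of 2.4.2 is not stated anywhere.

```python
import sys

# Assumption: first release reported to work in this thread.
# Nothing is said about 2.4.2, so this threshold is a guess.
FIRST_KNOWN_GOOD = (2, 4, 3)


def threading_enabled(version_info=None):
    """Return True when the interpreter is at least FIRST_KNOWN_GOOD."""
    if version_info is None:
        version_info = sys.version_info
    # Tuples compare element by element, so (2, 4, 1) < (2, 4, 3).
    return tuple(version_info[:3]) >= FIRST_KNOWN_GOOD
```

Comparing tuples rather than version strings avoids surprises such as "2.10" sorting before "2.4".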
> Should we be going for a new release? Perhaps the world is ready for
> a lxml 1.2. We haven't done a lot of changes except for the setup.py
> stuff (that is, I can't see any mentioned in the CHANGES.txt..), but
> I think those changes might warrant a new version number.
There are a couple of things that I expected to make it into 1.2, especially
the xmlReconciliateNs() replacement. But that one definitely needs more
testing before a release. There's also the integration of ElementInclude.py,
which should be the easier of the two.
I don't think it's a good time to release "right now" or even next week, but I
agree that having a simpler-to-hack build process can become an opener and
should get a second-level version number to show that there may be things to
do to get it back working.
Stefan
From albert.brandl at tttech.com Wed Dec 6 10:21:12 2006
From: albert.brandl at tttech.com (Albert Brandl)
Date: Wed, 6 Dec 2006 10:21:12 +0100
Subject: [lxml-dev] redundant namespace declarations
In-Reply-To: <45745E7C.900@gkec.informatik.tu-darmstadt.de>
References: <20061120144135.GA23359@tttech.com>
<4564B5CB.9050001@gkec.informatik.tu-darmstadt.de>
<4573D302.7090207@gkec.informatik.tu-darmstadt.de>
<20061204092647.GA1898@tttech.com>
<45745E7C.900@gkec.informatik.tu-darmstadt.de>
Message-ID: <20061206092112.GA19902@tttech.com>
Hi,
On Mon, Dec 04, 2006 at 06:44:28PM +0100, Stefan Behnel wrote:
>
> Well, maybe that won't be that soon. It's currently undecided if this will be
> in the next release, so it will stay out of the trunk for now.
No problem. I've adapted the code responsible for building the element
tree to use SubElement instead of append, so that "pretty_print = True"
does what I want.
A minor problem remains, but I have a workaround for this. If you try to
write the document to a file without write access, "write_c14n" raises a
C14NError:
>>> from lxml.etree import *
>>> et = ElementTree(Element("abc"))
>>> et.write_c14n("nonwritable.xml")
>>> et.write_c14n("notwritable.xml")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "etree.pyx", line 657, in etree._ElementTree.write_c14n
File "serializer.pxi", line 224, in etree._tofilelikeC14N
etree.C14NError: C14N failed
But if you use "write" instead, lxml silently ignores the fact that the
file can't be written:
>>> et.write("notwritable.xml")
>>>
For now, I'm just writing to a StringIO buffer and opening the file
manually, so this is no serious problem for me. But others might
well be bitten by this bug, even more so since lxml does not give
any feedback about what just happened.
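The workaround can be sketched as follows. `write_tree` is a hypothetical helper name, `io.BytesIO` stands in for the StringIO buffer on current Python versions, and the standard-library fallback import is only there so the snippet runs without lxml.

```python
import io

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def write_tree(tree, path):
    """Serialise to memory first, then write with plain file I/O."""
    buffer = io.BytesIO()
    tree.write(buffer)  # serialising to memory cannot hit file permissions
    # open() raises IOError/OSError if the target cannot be written,
    # so a failure is never silently ignored.
    with open(path, 'wb') as handle:
        handle.write(buffer.getvalue())


tree = etree.ElementTree(etree.Element('abc'))
```

Calling `write_tree(tree, "notwritable.xml")` on a read-only target then fails loudly in `open()` instead of returning as if nothing had happened.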
Best regards,
Albert Brandl
From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Dec 6 23:18:28 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed, 06 Dec 2006 23:18:28 +0100
Subject: [lxml-dev] redundant namespace declarations
In-Reply-To: <20061206092112.GA19902@tttech.com>
References: <20061120144135.GA23359@tttech.com> <4564B5CB.9050001@gkec.informatik.tu-darmstadt.de> <4573D302.7090207@gkec.informatik.tu-darmstadt.de> <20061204092647.GA1898@tttech.com> <45745E7C.900@gkec.informatik.tu-darmstadt.de>
<20061206092112.GA19902@tttech.com>
Message-ID: <457741B4.7010103@gkec.informatik.tu-darmstadt.de>
Hi Albert,
Albert Brandl wrote:
> If you try to write the document to a file without write access,
> lxml silently ignores the fact that the file can't be written:
>
> >>> et.write("notwritable.xml")
> >>>
True, thanks for the report. Opening the file is done by libxml2 in this case
and we did not handle the case where it failed to do so. Fixed on the trunk.
Stefan
From dsoulayrol at free.fr Mon Dec 11 15:15:34 2006
From: dsoulayrol at free.fr (David Soulayrol)
Date: Mon, 11 Dec 2006 15:15:34 +0100
Subject: [lxml-dev] About lxml status
Message-ID: <1165846534.30509.13.camel@dsoulayr.neotip>
Hello,
I am currently using 4Suite-XML for a project on my own, which makes use
of DOM, XSLT and XPath. I've always looked around if I could find other
libraries to replace 4Suite eventually, and I (re-) discovered libxml
today, and found a link to lxml from there.
What I'd like to know is (before I dive deeply in documentation) how you
would compare the XML support between 4Suite and lxml. Would you say
lxml is ready for common DOM manipulations, XSLT transformations and
simple XPath usage ?
I ask this because I've read in http://xmlsoft.org/index.html:
"Document Object Model (DOM) http://www.w3.org/TR/DOM-Level-2-Core/ the
document model, but it doesn't implement the API itself, gdome2 does
this on top of libxml2"
'Hope I was not offending :)
Thanks,
--
David.
From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Dec 11 15:24:26 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon, 11 Dec 2006 15:24:26 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To: <1165846534.30509.13.camel@dsoulayr.neotip>
References: <1165846534.30509.13.camel@dsoulayr.neotip>
Message-ID: <457D6A1A.7080904@gkec.informatik.tu-darmstadt.de>
Hi,
David Soulayrol wrote:
> I am currently using 4Suite-XML for a project on my own, which makes use
> of DOM, XSLT and XPath. I've always looked around if I could find other
> libraries to replace 4Suite eventually, and I (re-) discovered libxml
> today, and found a link to lxml from there.
>
> What I'd like to know is (before I dive deeply in documentation) how you
> would compare the XML support between 4Suite and lxml. Would you say
> lxml is ready for common DOM manipulations, XSLT transformations and
> simple XPath usage ?
>
> I ask this because I've read in http://xmlsoft.org/index.html:
>
> "Document Object Model (DOM) http://www.w3.org/TR/DOM-Level-2-Core/ the
> document model, but it doesn't implement the API itself, gdome2 does
> this on top of libxml2"
lxml does not implement the DOM API either. Instead, as the cheeseshop page
nicely states:
---------------------
lxml is a Pythonic binding for the libxml2 and libxslt libraries. It provides
safe and convenient access to these libraries using the ElementTree API.
It extends the ElementTree API significantly to offer support for XPath,
RelaxNG, XML Schema, XSLT, C14N and much more.
---------------------
Feel free to find out more from the documentation, it's full of examples:
http://codespeak.net/lxml/#documentation
Stefan
From ogrisel at nuxeo.com Mon Dec 11 15:34:41 2006
From: ogrisel at nuxeo.com (Olivier Grisel)
Date: Mon, 11 Dec 2006 15:34:41 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To: <1165846534.30509.13.camel@dsoulayr.neotip>
References: <1165846534.30509.13.camel@dsoulayr.neotip>
Message-ID:
David Soulayrol wrote:
> Hello,
>
> I am currently using 4Suite-XML for a project on my own, which makes use
> of DOM, XSLT and XPath. I've always looked around if I could find other
> libraries to replace 4Suite eventually, and I (re-) discovered libxml
> today, and found a link to lxml from there.
>
> What I'd like to know is (before I dive deeply in documentation) how you
> would compare the XML support between 4Suite and lxml. Would you say
> lxml is ready for common DOM manipulations, XSLT transformations and
> simple XPath usage ?
>
> I ask this because I've read in http://xmlsoft.org/index.html:
>
> "Document Object Model (DOM) http://www.w3.org/TR/DOM-Level-2-Core/ the
> document model, but it doesn't implement the API itself, gdome2 does
> this on top of libxml2"
lxml does not provide a DOM API implementation but an ElementTree API, which is
similar to DOM but simpler to use (more "pythonic"). As for XSLT and XPath, lxml
supports them out of the box.
If you really need a DOM API, then you should probably look at this project:
http://www.python.org/pypi/libxml2dom
--
Olivier
From faassen at startifact.com Mon Dec 11 20:30:58 2006
From: faassen at startifact.com (Martijn Faassen)
Date: Mon, 11 Dec 2006 20:30:58 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To: <457D6A1A.7080904@gkec.informatik.tu-darmstadt.de>
References: <1165846534.30509.13.camel@dsoulayr.neotip>
<457D6A1A.7080904@gkec.informatik.tu-darmstadt.de>
Message-ID:
Hey,
Stefan Behnel wrote:
[snip]
> lxml does not implement the DOM API either. Instead, as the cheeseshop page
> nicely states:
>
> ---------------------
> lxml is a Pythonic binding for the libxml2 and libxslt libraries. It provides
> safe and convenient access to these libraries using the ElementTree API.
Note that the ElementTree API is a developing Python standard,
implemented by 3 separate libraries, ElementTree, cElementTree and lxml.
ElementTree and cElementTree have become part of the core Python
distribution as of Python 2.5.
A lot of ElementTree documentation can be found here:
http://effbot.org/zone/element-index.htm
You can do common DOM-style manipulations through this API, just in a
more convenient manner.
As to XPath and XSLT support, lxml has that, including the ability to
create extension functions and the like. There are differences in
feature set here and there, but overall lxml should be able to compete
with 4Suite.
Regards,
Martijn
From lee.brown at elecdev.com Mon Dec 11 20:53:18 2006
From: lee.brown at elecdev.com (Lee Brown)
Date: Mon, 11 Dec 2006 14:53:18 -0500
Subject: [lxml-dev] About lxml status
In-Reply-To:
Message-ID: <200612111953.kBBJr70e002706@mail.elecdev.com>
Greetings!
This discussion reminded me of a question that I've been pondering: will the
XPATH support in LXML eventually include support for axes and predicates?
Also, congratulations on the thorough Xinclude support. Other than 4Suite, LXML
is one of the few parsers that handle the "parse=" attribute and the "fallback"
elements correctly.
From dsoulayrol at free.fr Mon Dec 11 23:10:53 2006
From: dsoulayrol at free.fr (David Soulayrol)
Date: Mon, 11 Dec 2006 23:10:53 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To:
References: <1165846534.30509.13.camel@dsoulayr.neotip>
<457D6A1A.7080904@gkec.informatik.tu-darmstadt.de>
Message-ID: <1165875053.6426.3.camel@localhost>
Good evening.
> Hey,
>
> Stefan Behnel wrote:
> [snip]
> > lxml does not implement the DOM API either. Instead, as the cheeseshop page
> > nicely states:
> >
> > ---------------------
> > lxml is a Pythonic binding for the libxml2 and libxslt libraries. It provides
> > safe and convenient access to these libraries using the ElementTree API.
>
> Note that the ElementTree API is a developing Python standard,
> implemented by 3 separate libraries, ElementTree, cElementTree and lxml.
> ElementTree and cElementTree have become part of the core Python
> distribution as of Python 2.5.
>
> A lot of ElementTree documentation can be found here:
>
> http://effbot.org/zone/element-index.htm
>
> You can do common DOM-style manipulations through this API, just in a
> more convenient manner.
Thanks for all your answers, and thank you Martijn for these details. I'm not
very used to Python 2.5 yet. I will have a deeper look in the new Python
features and the ElementTree API.
> As to XPath and XSLT support, lxml has that, including the ability to
> create extension functions and the like. There are differences in
> feature set here and there, but overall lxml should be able to compete
> with 4Suite.
>
> Regards,
>
> Martijn
>
Thanks again.
--
David
From faassen at startifact.com Tue Dec 12 18:59:55 2006
From: faassen at startifact.com (Martijn Faassen)
Date: Tue, 12 Dec 2006 18:59:55 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To: <200612111953.kBBJr70e002706@mail.elecdev.com>
References:
<200612111953.kBBJr70e002706@mail.elecdev.com>
Message-ID:
Hello,
Lee Brown wrote:
> This discussion reminded me of a question that I've been pondering: will the
> XPATH support in LXML eventually include support for axes and predicates?
Could you explain how lxml is lacking in support for axes and
predicates? Possibly you've only been looking at '.find()', which is the
ElementTree-compatible but limited XPath implementation, and not at the
full '.xpath()' functionality?
Regards,
Martijn
From ceplm at seznam.cz Thu Dec 14 11:14:52 2006
From: ceplm at seznam.cz (Matej Cepl)
Date: Thu, 14 Dec 2006 10:14:52 +0000 (UTC)
Subject: [lxml-dev] jbrout fails to work second time on Fedora Core 6/RHEL
5b2
Message-ID:
Hi,
I'm trying to package jbrout (photo management application) for Fedora Extras
and I always get this error when running jbrout for the second time (this
time on RHEL 5beta2; the version of python-lxml is still just 1.0.3.2):
[matej at hubmaier ~]$ jbrout
GTK Accessibility Module initialized
Traceback (most recent call last):
  File "/usr/share/jbrout/jbrout.py", line 2164, in ?
    main()
  File "/usr/share/jbrout/jbrout.py", line 2133, in main
    JBrout.init(canModify)
  File "/usr/share/jbrout/db.py", line 1297, in init
    JBrout.db = DBPhotos( JBrout.getConfFile("db.xml") )
  File "/usr/share/jbrout/db.py", line 118, in __init__
    self.root = ElementTree(file=file).getroot()
  File "etree.pyx", line 1504, in etree.ElementTree
  File "parser.pxi", line 687, in etree._parseDocument
  File "parser.pxi", line 624, in etree._parseDocFromFile
  File "parser.pxi", line 364, in etree._BaseParser._parseDocFromFile
  File "parser.pxi", line 432, in etree._handleParseResult
  File "parser.pxi", line 403, in etree._raiseParseError
etree.XMLSyntaxError: line 1: PCDATA invalid Char value 2
[matej at hubmaier ~]$
Further discussion of this bug is on the jbrout list at
http://groups.google.com/group/jbrout/browse_thread/thread/73aaa54115930c5b
Could anybody help me make this package work?
Thanks a lot,
Matěj
--
http://www.ceplovi.cz/matej/blog/, Jabber: ceplmajabber.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB 25C3 E09F EF25 D964 84AC
My life has been full of terrible misfortunes most of which never
happened.
-- Michel de Montaigne
From fredrik at pythonware.com Thu Dec 14 13:05:19 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 14 Dec 2006 13:05:19 +0100
Subject: [lxml-dev] jbrout fails to work second time on Fedora Core
6/RHEL 5b2
In-Reply-To:
References:
Message-ID:
Matej Cepl wrote:
> File "/usr/share/jbrout/db.py", line 1297, in init
> JBrout.db = DBPhotos( JBrout.getConfFile("db.xml") )
...
> etree.XMLSyntaxError: line 1: PCDATA invalid Char value 2
>
> Further discussion of this bug is on jbrout list at
> http://groups.google.com/group/jbrout/browse_thread/thread/73aaa54115930c5b
>
> Could anybody help me how to make this package work?
the error message says that the "db.xml" file is broken. what do the
first few lines in that file look like?
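"invalid Char value 2" points at a control character (byte value 2) in the text of the file, which XML 1.0 does not allow. A minimal sketch for locating such bytes, assuming the file is ASCII or UTF-8 encoded; the helper name find_illegal_chars and the sample data are made up for illustration:

```python
import re

# Control characters that XML 1.0 forbids in a document
# (tab, newline and carriage return are allowed).
ILLEGAL_XML_CHARS = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_chars(data):
    """Return (line, column, byte value) for each forbidden control byte."""
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for match in ILLEGAL_XML_CHARS.finditer(line):
            hits.append((lineno, match.start() + 1, line[match.start()]))
    return hits


# Made-up sample; a real check would read the file in binary mode instead.
sample = b"<db><photo comment='bad\x02byte'/></db>"
print(find_illegal_chars(sample))
```

For a real file, the bytes would come from something like open("db.xml", "rb").read().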
From marian.schubert at gmail.com Thu Dec 14 14:38:49 2006
From: marian.schubert at gmail.com (Marian Schubert)
Date: Thu, 14 Dec 2006 14:38:49 +0100
Subject: [lxml-dev] lxml segfaults while instantiating ElementBase
Message-ID:
Hello,
i guess it should not be instantiated but still...
Python 2.4.4 (#2, Oct 20 2006, 00:23:25)
[GCC 4.1.2 20061015 (prerelease) (Debian 4.1.1-16.1)] on linux2
>>> from lxml.etree import ElementBase
>>> ElementBase()
Segmentation fault
lxml.etree: (1, 1, 1, 0)
libxml used: (2, 6, 27)
libxml compiled: (2, 6, 26)
libxslt used: (1, 1, 18)
libxslt compiled: (1, 1, 17)
cu,
Maio
From lee.brown at elecdev.com Thu Dec 14 14:51:57 2006
From: lee.brown at elecdev.com (Lee Brown)
Date: Thu, 14 Dec 2006 08:51:57 -0500
Subject: [lxml-dev] About lxml status
In-Reply-To:
Message-ID: <200612141351.kBEDpj0e021058@mail.elecdev.com>
Greetings!
I apologize for my mistaken presumption. I presumed it only supported basic
XPath functions because the lxml API web page only shows basic
examples.
I am at a significant disadvantage when it comes to LXML and the underlying
libxml2/libxslt libraries as I cannot read the C source. (Back when I took
formal programming courses, the three choices offered to engineering students
were Basic, Fortran, and this hot, up-and-coming language called Pascal which
was supposed to set the world on fire.) All of the documentation for the
libxml2/libxslt libraries on xmlsoft.org is written from a C perspective and
checking out the source code for lxml won't help me much, either. So I am
limited to whatever I can glean from the lxml web site examples and whatever I
can discover using the usual Python code inspection techniques. (Which don't go
very far when much of the functionality resides in precompiled binaries.)
So really, the only way I'd be able to determine how far support for a given
X-standard goes in lxml is to write a whole bunch of test cases. (This is how I
figured out that lxml has broad support for the XInclude standard, even though
the lxml API page states that "simple" XInclude support exists.)
Please don't infer from this that I have a negative tone towards lxml; I do not.
I think it's absolutely great. I have tried pretty much every Python-based
XML/XSLT/Xwhatever code base out there and lxml is really the only one that is
robust enough, reliable enough, and FAST enough to be useful for production use.
I am currently using lxml in conjunction with mod_python on an Apache web server
to serve XML content data, merging dynamic data through XIncludes and
transforming the output on-the-fly into XHTML using XSLT templates. It works
great! If there's one thing I'd like to add to the lxml "wish list" it would be
some more in-depth examples on the web site - there are a lot more things I'd like
to be doing with lxml if I could just figure out whether it will do them and how.
From hjh at alterras.de Thu Dec 14 15:49:40 2006
From: hjh at alterras.de (Hans-Jürgen Hay)
Date: Thu, 14 Dec 2006 15:49:40 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To: <200612141351.kBEDpj0e021058@mail.elecdev.com>
References: <200612141351.kBEDpj0e021058@mail.elecdev.com>
Message-ID: <1166107780.1729.63.camel@hera.local>
Dear Lee Brown,
I use lxml in the same context, but there is an issue I'd like to warn you
about: it is not possible to access global precompiled XSLT stylesheets from
different threads, yet mod_python uses multiple threads. The only
solution I have found so far is either to prevent mod_python from using
multiple threads, by globally setting
PythonInterpreter somename
in the mod_python-related Apache config while using the prefork Apache
module, or to build the stylesheet anew on each request. If you see a better
solution, please tell me.
Regards
Hans
On Thursday, 14.12.2006, at 08:51 -0500, Lee Brown wrote:
> I am currently using lxml in conjunction with Mod Python on an Apache web
> server to serve XML content data, merging dynamic data through Xincludes and
> transforming the output on-the-fly into XHTML using XSLT templates.
From lee.brown at elecdev.com Thu Dec 14 16:46:36 2006
From: lee.brown at elecdev.com (Lee Brown)
Date: Thu, 14 Dec 2006 10:46:36 -0500
Subject: [lxml-dev] About lxml status
In-Reply-To: <1166107780.1729.63.camel@hera.local>
Message-ID: <200612141546.kBEFkO0e023428@mail.elecdev.com>
Greetings!
Thanks for the warning, but I've already run headfirst into that problem. My
Apache server is running the Win32 MPM, where every request is a new thread, so
there aren't any tricks I can play with the PythonInterpreter directive. (None
that help, anyway.)
However, I did some benchmark tests and found that I can serve about 32 requests
per second even with the overhead of recompiling the XSLT template for each
request. This is adequate for my needs, though a very busy website might have
trouble.
One thing I haven't tried is to pre-compile my XSLT templates and cPickle them
to disk files and then unpickle a copy to serve the request. A web server with
a good file caching system might have to do very few actual disk reads but
whether it is faster to unpickle a compiled template object than to just
re-compile a new one remains unknown.
If anyone has a suggestion for building a thread-safe set of precompiled
templates, I'm all ears....
From hjh at alterras.de Thu Dec 14 17:31:48 2006
From: hjh at alterras.de (Hans-Jürgen Hay)
Date: Thu, 14 Dec 2006 17:31:48 +0100
Subject: [lxml-dev] About lxml status
In-Reply-To: <200612141546.kBEFkO0e023428@mail.elecdev.com>
References: <200612141546.kBEFkO0e023428@mail.elecdev.com>
Message-ID: <1166113908.1729.87.camel@hera.local>
Greetings,
I found this out very late, and it gave me serious headaches: pickle does
not work with XSLT objects. Even if it did, it would probably perform
much worse than building from source. But thanks for the tip; maybe the
developers can help out a little with coarse-grained locking at a later
stage. Using lxml with mod_python should be an interesting use case.
Regards
Hans
From ianb at colorstudy.com Thu Dec 14 22:49:40 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 14 Dec 2006 15:49:40 -0600
Subject: [lxml-dev] About lxml status
In-Reply-To: <200612141546.kBEFkO0e023428@mail.elecdev.com>
References: <200612141546.kBEFkO0e023428@mail.elecdev.com>
Message-ID: <4581C6F4.6000504@colorstudy.com>
Lee Brown wrote:
> Thanks for the warning, but I've already run headfirst into that problem. My
> apache server is running the Win32MPM, where every request is a new thread, so
> there aren't any tricks I can play with the PythonInterpreter directive. (None
> that help, anyway.)
>
> However, I did some benchmark tests and found that I can serve about 32 requests
> per second even with the overhead of recompiling the XSLT template new for each
> request. This is adequate for my needs, though a very busy website might have
> trouble.
I'm not clear exactly on the way threads and mod_python and all that
work, but I imagine you could use a pool of templates. You'd do
something like:
    try:
        tmpl = template_pool.pop()
    except IndexError:
        tmpl = compile_template()
    # then to return the template to the pool:
    template_pool.append(tmpl)
This is assuming that it's okay to move templates between threads, but
not use them concurrently between threads. Or if they have to be used
in the thread they were created in, you can use:
    import threading
    template_cache = threading.local()
    try:
        tmpl = template_cache.template
    except AttributeError:
        tmpl = template_cache.template = compile_template()
That's assuming that threads are long-lived, otherwise this won't change
anything either.
--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
From lee.brown at elecdev.com Fri Dec 15 00:09:36 2006
From: lee.brown at elecdev.com (Lee Brown)
Date: Thu, 14 Dec 2006 18:09:36 -0500
Subject: [lxml-dev] About lxml status
In-Reply-To: <4581C6F4.6000504@colorstudy.com>
Message-ID: <200612142309.kBEN9M0e031831@mail.elecdev.com>
Greetings!
The Apache web server has several different MPMs (Multi-Processing Modules)
available to it (unless you're running the Win32MPM, in which case that's the
one you're stuck with.) But basically, the web server can spawn either
processes or threads to handle incoming requests.
In the Win32MPM, each VHOST (virtual web site) runs as a separate OS process and
each request that a VHOST receives is handled entirely as a thread within that
process. Each thread invokes a chain of request handlers (code modules that
handle specific tasks like authentication, authorization, content delivery,
output filtering, and so forth) that are instantiated for that thread and then
they die at the end of the request.
Request threads may arrive simultaneously and are by nature very short-lived.
If a VHOST gets 32 simultaneous requests, 32 threads get created and then within
a second or two all 32 threads are finished and terminated. (By default the
Win32 MPM can have a maximum of 250 concurrent threads.)
What Mod Python does is to allow you to specify a python function that will
handle a specific task or tasks in the chain in lieu of Apache's standard
handlers. Mod Python's default behavior is to create a Python interpreter for
each VHOST and this interpreter is responsible for executing the various handler
functions in a thread-safe way for each request. (I have NO idea how it does it,
nor is my state of confusion likely to change even if someone explains it to
me.) The source code containing the function is imported as a module at
interpreter startup in the normal 'Python' way, that is, executable code in the
module defined outside of the handler function definition(s) is executed on
import and is global to the handler function(s).
So, naively, I wrote some global code to pre-load and pre-compile all of my XSLT
templates into a dictionary at startup. Then, within the handler function
definition I look up the correct template in the dictionary and use it to
transform the parsed XML source object. This worked just fine as long as one
and only one thread was being executed at any given time. Simultaneous requests
would either bomb out with a threading-related error or just hang until the
server ran out of available threads and crashed. Apparently, Mod Python can
dole out handler functions in a thread-safe way, but any global objects you
create at import time are not so lucky. Nor does there seem to be any way to
share an object from one thread with another thread.
One way around this may be to pass a copy of the template dictionary to the
handler function, that is, pass a literal copy instead of an object reference.
This would eliminate the time overhead of recompiling templates for each request
at the expense of possibly having a lot of copies in-memory at one time. But
since my server always seems to have plenty of free memory, I'll give it a try.
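Another option, if the only problem is two threads using the same template at the same moment, would be to keep the global dictionary and guard each template with a lock. The following is a rough sketch under that assumption, which this thread does not confirm for lxml; LockedTemplate, fake_template and handler are invented names, and fake_template merely stands in for a compiled XSLT object:

```python
import threading


class LockedTemplate:
    """Wrap a callable template so only one thread runs it at a time."""

    def __init__(self, template):
        self._template = template
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        with self._lock:
            return self._template(*args, **kwargs)


def fake_template(doc):
    # Hypothetical stand-in for a compiled XSLT object.
    return doc.upper()


# Built once at import time, like the global template dictionary above.
TEMPLATES = {"page": LockedTemplate(fake_template)}


def handler(name, doc):
    # Each request looks up its template and runs it under the lock.
    return TEMPLATES[name](doc)
```

The price is that requests for the same template are serialized, so this trades throughput for not having to recompile.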
From mcepl at redhat.com Fri Dec 15 09:15:44 2006
From: mcepl at redhat.com (Matej Cepl)
Date: Fri, 15 Dec 2006 08:15:44 +0000 (UTC)
Subject: [lxml-dev] jbrout fails to work second time on Fedora
Core 6/RHEL 5b2
References:
Message-ID:
Fredrik Lundh wrote:
> the error message says that the "db.xml" file is broken. what does the
> first few lines in that file look like?
The file is available in its entirety at
http://www.ceplovi.cz/matej/tmp/db.xml
Thanks a lot for the answer,
Matěj
--
http://www.ceplovi.cz/matej/blog/, Jabber: ceplmajabber.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB 25C3 E09F EF25 D964 84AC
Scouts are saving aluminum cans, bottles and other items to be
recycled. Proceeds will be used to cripple children.
-- from a church bulletin
From faassen at startifact.com Tue Dec 19 21:42:38 2006
From: faassen at startifact.com (Martijn Faassen)
Date: Tue, 19 Dec 2006 21:42:38 +0100
Subject: [lxml-dev] lxml segfaults while instantiating ElementBase
In-Reply-To:
References:
Message-ID:
Hello,
Marian Schubert wrote:
> i guess it should not be instantiated but still...
>
> Python 2.4.4 (#2, Oct 20 2006, 00:23:25)
> [GCC 4.1.2 20061015 (prerelease) (Debian 4.1.1-16.1)] on linux2
>>>> from lxml.etree import ElementBase
>>>> ElementBase()
> Segmentation fault
>
>
> lxml.etree: (1, 1, 1, 0)
> libxml used: (2, 6, 27)
> libxml compiled: (2, 6, 26)
> libxslt used: (1, 1, 18)
> libxslt compiled: (1, 1, 17)
Good point. We should be hiding this from import if we can. Stefan, any
ideas?
Regards,
Martijn
From faassen at startifact.com Tue Dec 19 21:45:51 2006
From: faassen at startifact.com (Martijn Faassen)
Date: Tue, 19 Dec 2006 21:45:51 +0100
Subject: [lxml-dev] lxml documentation volunteers?
Message-ID:
Hi there,
I think it's time to give the lxml documentation a reworking, in
particular our API documentation. It must become more clear for users
what is in the API; I find myself having to look at the code or do a
dir() far too often to find whether a feature is supported.
What I would wish for is API documentation similar to the module
documentation on python.org. Something close to that would be familiar
for users so they can get started with lxml quickly. I hope we can also
continue to doctest code samples in the documentation.
For completeness I think we would need to integrate the existing
ElementTree API documentation so we have a one-stop-shop for people
using lxml, instead of the scattered situation now. We should mark where
an API is taken from ElementTree so that people can easily write
compatible code.
Such API documentation would also help us identify possible ElementTree
APIs we haven't implemented yet, and it might also suggest APIs we want
to implement that are currently missing.
Any volunteers for this work?
Regards,
Martijn
From howesteve at gmail.com Wed Dec 20 10:04:32 2006
From: howesteve at gmail.com (Steve Howe)
Date: Wed, 20 Dec 2006 07:04:32 -0200
Subject: [lxml-dev] Processing instruction doubt
Message-ID: <200612200704.32908.howesteve@gmail.com>
Hello all,
This is probably a rather stupid question, but supposing I have an ElementTree
instance, and I want to add a processing instruction to it - in my case, an
xml-stylesheet PI - how do I add that PI in the correct location of the tree
(i.e. before the root) without serializing?
...
Before someone answers "add the PI into the XSLT before serializing it", the
ElementTree is received by a function from an end user and I have no control
over what's received.
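For reference, one way this can be done in current lxml releases is to add the PI as a preceding sibling of the root element. This is a sketch only, and it may not apply to the lxml version in use when this was written; the function name with_stylesheet_pi and the href value are made up for illustration:

```python
def with_stylesheet_pi(tree, href):
    # Imported inside the function so that lxml is only needed when called.
    from lxml import etree

    root = tree.getroot()
    pi = etree.ProcessingInstruction(
        "xml-stylesheet", 'type="text/xsl" href="%s"' % href
    )
    # Comments and PIs may be added as siblings of the root element.
    root.addprevious(pi)
    return tree
```

Serializing the tree object (rather than only the root element) afterwards should then include the PI ahead of the root.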
Thanks.
--
Best Regards,
Steve Howe
From faassen at startifact.com Thu Dec 21 17:46:02 2006
From: faassen at startifact.com (Martijn Faassen)
Date: Thu, 21 Dec 2006 17:46:02 +0100
Subject: [lxml-dev] relax ng bug: validation twice doesn't give same answer
Message-ID:
Hi there,
We just ran into the following problem with lxml's RelaxNG validation.
Validating a document with a RelaxNG schema gives the right result
(invalid) the first time. Validating the same document with the same schema
again, however, gives valid! The script below is a minimal test case that
demonstrates the problem. Tested with lxml 1.1.2 and libxml2 2.6.24, and also
2.6.26 on another machine.
Judging from an earlier thread last year, it's possible that this is a libxml2 bug:
http://codespeak.net/pipermail/lxml-dev/2005-September/000423.html
This was more than a year ago though and it's somewhat surprising this
still wasn't fixed. I found the bug report:
http://bugzilla.gnome.org/show_bug.cgi?id=315883
and have added something to it in the hope it'll spur some activity in
confirming it...
Regards,
Martijn
from lxml import etree
from StringIO import StringIO
v = etree.RelaxNG(etree.parse(StringIO('''\
''')))
# this is an invalid document
d = etree.parse(StringIO('''\
'''))
first = v.validate(d) # returns 0, what is expected
second = v.validate(d) # returns 1!
assert first == second, "Validity isn't the same over time"
From Holger.Joukl at LBBW.de Fri Dec 29 16:16:12 2006
From: Holger.Joukl at LBBW.de (Holger Joukl)
Date: Fri, 29 Dec 2006 16:16:12 +0100
Subject: [lxml-dev] [objectify] __MATCH_PATH_SEGMENT regex modification
suggestion
Message-ID:
Hi,
I suggest loosening the __MATCH_PATH_SEGMENT regex a little
to allow for more possible element names, which are sometimes
outside of my control.
Currently ObjectPath chokes on paths like 'root.a-x.a-y'.
While such names are often inconvenient at best, I found that
Python itself is quite non-restrictive with respect to attribute names:
python2.4
Python 2.4.3 (#2, Nov 20 2006, 16:26:48)
[GCC 2.95.2 19991024 (release)] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo(object):
...     pass
...
>>> setattr(Foo, 'a-b', "hmm")
>>>
Hence, I propose to change the regex to:
cdef object __MATCH_PATH_SEGMENT
__MATCH_PATH_SEGMENT = re.compile(
r"(\.?)\s*(?:\{([^}]*)\})?\s*([^.{}]+)\s*(?:\[\s*([-0-9]+)\s*\])?",
re.U).match
(Changed: ([^.{}]+) replaces (\w+))
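The effect of the change can be checked outside of lxml with a small sketch. The helper split_path below is made up for this illustration and is not how ObjectPath itself walks a path; the "old" pattern is simply the proposed one with (\w+) put back:

```python
import re

OLD = re.compile(
    r"(\.?)\s*(?:\{([^}]*)\})?\s*(\w+)\s*(?:\[\s*([-0-9]+)\s*\])?", re.U)
NEW = re.compile(
    r"(\.?)\s*(?:\{([^}]*)\})?\s*([^.{}]+)\s*(?:\[\s*([-0-9]+)\s*\])?", re.U)


def split_path(pattern, path):
    """Return the segment names, or None if the path cannot be consumed."""
    names = []
    pos = 0
    while pos < len(path):
        match = pattern.match(path, pos)
        if match is None or match.end() == pos:
            return None
        names.append(match.group(3))
        pos = match.end()
    return names


print(split_path(OLD, "root.a-x.a-y"))  # old pattern stops at the '-'
print(split_path(NEW, "root.a-x.a-y"))  # new pattern accepts the names

# One side effect worth checking: [^.{}]+ also matches '[' and ']', so an
# index such as 'a[1]' ends up inside the name group with the new pattern.
print(split_path(OLD, "root.a[1]"))
print(split_path(NEW, "root.a[1]"))


# The attribute-name observation from above, with getattr added:
class Foo(object):
    pass


setattr(Foo, "a-b", "hmm")
print(getattr(Foo, "a-b"))
```

If the index behaviour matters, excluding the square brackets from the name character class as well might be an option, though that goes beyond what is proposed here.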
Holger
From diapriid at gmail.com Sat Dec 30 23:20:24 2006
From: diapriid at gmail.com (Matt)
Date: Sat, 30 Dec 2006 16:20:24 -0600
Subject: [lxml-dev] lxml on OS X compile problems
Message-ID: <19d6b9770612301420h8fe9718s47d538e786ff392a@mail.gmail.com>
Hi All,
My first question - has anyone successfully built lxml on OS X 10.3.9?
And next, have they encountered the following error?
specifics-
OS X 10.3.9
gcc 3.3
python 2.4.4
lxml 1.1.2
libxml2 config line -
./configure \
--with-python=/Library/Frameworks/Python.framework/Versions/2.4/
libxslt config line-
./configure \
--with-python=/Library/Frameworks/Python.framework/Versions/2.4/ \
--prefix=/usr/local \
--with-libxml-prefix=/usr/local \
--with-libxml-include-prefix=/usr/local/include \
--with-libxml-libs-prefix=/usr/local/lib
Results-
xsltproc was compiled against libxml 20611, libxslt 10109 and libexslt 807
libxslt 10117 was compiled against libxml 20626
libexslt 807 was compiled against libxml 20611
I'm a little worried that xsltproc wasn't compiled against the same
libraries, but I'm not sure how to resolve this. I've played with
installing various versions of libxslt and libxml, and they seem to be
installing, though all the libxml2 headers may not be installing properly.
The present error occurs since I've copied the missing headers xmlstring.h and
xmlsave.h to /usr/include/. Using Pyrex to create the objectify.c file
doesn't help.
>python setup.py install
Building lxml version 1.2.dev-35415
running install
running bdist_egg
running egg_info
writing src/lxml.egg-info/PKG-INFO
writing top-level names to src/lxml.egg-info/top_level.txt
writing dependency_links to src/lxml.egg-info/dependency_links.txt
reading manifest template 'MANIFEST.in'
warning: no files found matching 'objectify.c' under directory 'src/lxml'
warning: no files found matching '*.html' under directory 'doc'
warning: no files found matching '*.py' under directory 'Pyrex'
writing manifest file 'src/lxml.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.3-ppc/egg
running install_lib
running build_py
running build_ext
building 'lxml.etree' extension
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd
-fno-common -dynamic -DNDEBUG -g -O3 -I/usr/include/libxml2
-I/Library/Frameworks/Python.framework/Versions/2.4/include/python2.4 -c
src/lxml/etree.c -o build/temp.macosx-10.3-ppc-2.4/src/lxml/etree.o -w
In file included from src/lxml/etree.c:27:
/usr/include/libxml2/libxml/xmlstring.h:28: error: redefinition of `xmlChar'
/usr/include/libxml2/libxml/tree.h:107: error: `xmlChar' previously declared
here
/usr/include/libxml2/libxml/xmlstring.h:40: error: syntax error before
"xmlChar"
/usr/include/libxml2/libxml/xmlstring.h:41: error: parse error before
"xmlStrdup"
<... and much more ...>
Any hints?
Thanks for your time,
Matt