From rcdailey at gmail.com Wed Aug 1 23:02:28 2007
From: rcdailey at gmail.com (Robert Dailey)
Date: Wed, 1 Aug 2007 16:02:28 -0500
Subject: [lxml-dev] lxml 1.3.3 released
In-Reply-To:
References: <46A8D0A6.30000@behnel.de>
    <496954360707291335m392786b9vae956191a9eeae3b@mail.gmail.com>
    <46ADD80F.2050700@behnel.de>
Message-ID: <496954360708011402s2cf4931bl443861150c8be97b@mail.gmail.com>

Thanks a ton Silva!

On 7/31/07, Sidnei da Silva wrote:
>
> Hey there,
>
> Sorry for the delay, I was taking a short vacation in LA, just arrived
> in Houston now.
>
> I've built 1.3.3 for Python 2.4 and 2.5 and uploaded them to the
> cheeseshop.
>
> --
> Sidnei da Silva
> Enfold Systems http://enfoldsystems.com
> Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
>

From jholg at gmx.de Thu Aug 2 11:33:50 2007
From: jholg at gmx.de (jholg at gmx.de)
Date: Thu, 02 Aug 2007 11:33:50 +0200
Subject: [lxml-dev] schema typifier for objectify
Message-ID: <20070802093350.155020@gmx.net>

Hi,

I'm interested in having a "schema typifier" for lxml.objectify that can
enrich an instance document with type information taken from an XML Schema
(xsi:type and/or py:pytype). Similar to the annotate()/pyannotate() stuff,
but making use of the schema type definitions.

I'm wondering how one realized such thing best: Is there a possibility to
make use of the lxml/libxml2 validation capabilities, i.e. to somehow hook
this into the validation proceedings?

Any hints welcome,
Holger

From stefan_ml at behnel.de Thu Aug 2 13:52:05 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 02 Aug 2007 13:52:05 +0200
Subject: [lxml-dev] [xml] Catching error messages in schematron
In-Reply-To: <20070802112659.GF5712@redhat.com>
References: <469FA76C.60905@behnel.de> <20070802112659.GF5712@redhat.com>
Message-ID: <46B1C565.1000503@behnel.de>

Hi Daniel,

Daniel Veillard wrote:
> On Thu, Jul 19, 2007 at 08:03:24PM +0200, Stefan Behnel wrote:
>> currently (as of libxml2 2.6.29), the schematron implementation writes
>> validation error messages to stderr with a plain fprintf in
>> xmlSchematronReportOutput (happily annotated with a "TODO").
>>
>> However, to make schematron usable in lxml, I need a way to propagate these
>> errors to the Python level. I would therefore like to see them passed into
>> __xmlRaiseError().
>
> The schematron code is really not complete, that sure could be fixed
> but a lot more may be needed. I asked for a definitive set of test for
> the ISO version and never got an answer, so I got discouraged and dropped
> working on it at the time. Of course now I have far less time for libxml2

Ah, that's too bad. Can you comment on how complete you consider the current
Schematron implementation?

Stefan

From mwm-keyword-lxml.9112b8 at mired.org Fri Aug 3 16:55:07 2007
From: mwm-keyword-lxml.9112b8 at mired.org (Mike Meyer)
Date: Fri, 3 Aug 2007 10:55:07 -0400
Subject: [lxml-dev] easy_install of 1.3.3 not working
Message-ID: <20070803105507.1c252108@bhuda.mired.org>

Hi,

I'm trying to install lxml-1.3.3 using easy-install, and it isn't so
easy. In particular (typescript edited for brevity):

Script started on Fri 03 Aug 2007 10:38:17 AM EDT
mwm$ sudo easy_install --verbose lxml==1.3.3
Searching for lxml==1.3.3
[...]
Best match: lxml 1.3.3
Downloading http://codespeak.net/lxml/lxml-1.3.3.tgz
Processing lxml-1.3.3.tgz
Unpacking lxml-1.3.3// to /tmp/easy_install-d0p9oh/lxml-1.3.3/
[...]
copying src/lxml/builder.py -> build/lib.linux-x86_64-2.5/lxml
running build_ext
building 'lxml.etree' extension
creating build/temp.linux-x86_64-2.5
creating build/temp.linux-x86_64-2.5/src
creating build/temp.linux-x86_64-2.5/src/lxml
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/include/libxml2 -I/usr/local/include/python2.5 -c src/lxml/etree.c -o build/temp.linux-x86_64-2.5/src/lxml/etree.o -w
gcc -pthread -shared build/temp.linux-x86_64-2.5/src/lxml/etree.o -L/usr/lib64 -L/usr/local/lib -lxslt -lexslt -lxml2 -lz -lm -lpython2.5 -o build/lib.linux-x86_64-2.5/lxml/etree.so
/usr/bin/ld: /usr/lib64/libxslt.a(xslt.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
/usr/lib64/libxslt.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
error: Setup script exited with error: command 'gcc' failed with exit status 1
mwm$ exit
Script done on Fri 03 Aug 2007 10:39:03 AM EDT

Checking the mail list archives turns up building python --enable-shared,
and I did that, and nothing changed. The problem appears to be with
libxslt, not python in any case.

Anyone got any clues as to what to help with this?

thanks,
http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.

From stefan_ml at behnel.de Fri Aug 3 17:46:07 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 03 Aug 2007 17:46:07 +0200
Subject: [lxml-dev] easy_install of 1.3.3 not working
In-Reply-To: <20070803105507.1c252108@bhuda.mired.org>
References: <20070803105507.1c252108@bhuda.mired.org>
Message-ID: <46B34DBF.4040909@behnel.de>

Mike Meyer wrote:
> I'm trying to install lxml-1.3.3 using easy-install, and it isn't so
> easy. In particular (typescript edited for brevity):
>
> /usr/bin/ld: /usr/lib64/libxslt.a(xslt.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC

You need to provide (export) the CFLAGS that apply to your platform,
especially the -fPIC option is required on x86_64.

Stefan

From gregwillden at gmail.com Wed Aug 8 21:30:50 2007
From: gregwillden at gmail.com (Greg Willden)
Date: Wed, 8 Aug 2007 14:30:50 -0500
Subject: [lxml-dev] Round Trip w/ Processing Instructions (Newbie)
Message-ID: <903323ff0708081230u4e99d2a0g6544a10974293584@mail.gmail.com>

Hello all,

I love the etree stuff and I'm excited about the API additions that this
team is adding to make it better. Thanks for all your work.

That said I'm really new to ElementTree.

I am using lxml.etree version 1.3.3 on both Python 2.4.3 and 2.5.1.

I have a question about processing instructions and round trip file
processing.

Let's say I have a file 'test.xml' containing the following

Then I run these commands

t1=etree.parse('test.xml')
t1.write('testout.xml', xml_declaration=True)

testout.xml looks like this:

I see that t1.docinfo.doctype contains:

So why isn't the PI written to the file?

I am parsing/modifying a file for consumption by another program that
refuses to load the file without the PI.

Thanks
Greg

--
Linux. Because rebooting is for adding hardware.

From stefan_ml at behnel.de Thu Aug 9 09:37:52 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 09 Aug 2007 09:37:52 +0200
Subject: [lxml-dev] Round Trip w/ Processing Instructions (Newbie)
In-Reply-To: <903323ff0708081230u4e99d2a0g6544a10974293584@mail.gmail.com>
References: <903323ff0708081230u4e99d2a0g6544a10974293584@mail.gmail.com>
Message-ID: <46BAC450.1010809@behnel.de>

Hi Greg,

Greg Willden wrote:
> I love the etree stuff and I'm excited about the API additions that this
> team is adding to make it better. Thanks for all your work.

:)

> I have a question about processing instructions and round trip file
> processing.
> Let's say I have a file 'test.xml' containing the following
>
> Then I run these commands
>
> t1=etree.parse('test.xml')
> t1.write('testout.xml', xml_declaration=True)
>
> testout.xml looks like this:
>
> I see that t1.docinfo.doctype contains:
>
> So why isn't the PI written to the file?
>
> I am parsing/modifying a file for consumption by another program that
> refuses to load the file without the PI.

Hmm, the problem here is that libxml2 lacks a good backwards compatible way of
serialising documents including things like processing instructions, internal
subsets (DTDs) and doctype declarations (which you have here, it's not a PI).

I'll look into it (once again). IIRC, we serialise PIs by now, but not
internal subsets and doctypes. We should do that when serialising root nodes
(especially ElementTree objects).
Stefan

From gregwillden at gmail.com Thu Aug 9 15:37:16 2007
From: gregwillden at gmail.com (Greg Willden)
Date: Thu, 9 Aug 2007 08:37:16 -0500
Subject: [lxml-dev] Fwd: Round Trip w/ Processing Instructions (Newbie)
In-Reply-To: <903323ff0708090631y44c28ab9m703df93c1c557689@mail.gmail.com>
References: <903323ff0708081230u4e99d2a0g6544a10974293584@mail.gmail.com>
    <46BAC450.1010809@behnel.de>
    <903323ff0708090631y44c28ab9m703df93c1c557689@mail.gmail.com>
Message-ID: <903323ff0708090637t5c37a620i8462247933946684@mail.gmail.com>

It looks like this message didn't go to the list.

---------- Forwarded message ----------

Hi Stefan,

On 8/9/07, Stefan Behnel wrote:
>
> > I see that t1.docinfo.doctype contains:
> >
> > So why isn't the PI written to the file?
>
> Hmm, the problem here is that libxml2 lacks a good backwards compatible
> way of serialising documents including things like processing
> instructions, internal subsets (DTDs) and doctype declarations (which you
> have here, it's not a PI).
>
> I'll look into it (once again). IIRC, we serialise PIs by now, but not
> internal subsets and doctypes. We should do that when serialising root
> nodes (especially ElementTree objects).

Sorry I've got my terminology mixed up. Obviously I'm not an XML expert.
Just enough to be dangerous ;-)

Thanks for looking into it. I searched the mailing list to see if this
stuff had come up before and I had seen a lot of similar questions so I'm
sorry if it's a FAQ or something.

You mention serializing root nodes. That brings up another question I had
along these lines.

Given the same XML file:

If I do an etree.parse('test.xml') I get an ElementTree object but
If I were to have that same data as a string and call etree.XML() or
etree.fromstring() then all I get is an Element. That seems really odd to
me. If this question has been discussed (I'm guessing it has), I'd be happy
with an answer like "take a look at this thread for the discussion".

I guess it's probably more of an ElementTree API question/confusion than a
problem with lxml.etree though.

Thanks again for your work.
Greg

--
Linux. Because rebooting is for adding hardware.

From stefan_ml at behnel.de Sat Aug 11 19:08:00 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 11 Aug 2007 19:08:00 +0200
Subject: [lxml-dev] schema typifier for objectify
In-Reply-To: <20070802093350.155020@gmx.net>
References: <20070802093350.155020@gmx.net>
Message-ID: <46BDECF0.3070805@behnel.de>

jholg at gmx.de wrote:
> I'm interested in having a "schema typifier" for lxml.objectify that can
> enrich an instance document with type information taken from an XML Schema
> (xsi:type and/or py:pytype). Similar to the annotate()/pyannotate() stuff,
> but making use of the schema type definitions.
>
> I'm wondering how one realized such thing best: Is there a possibility to
> make use of the lxml/libxml2 validation capabilities, i.e. to somehow hook
> this into the validation proceedings?

Hmm, there was a similar question on the libxml2 mailing list recently about
creating a valid document from a schema. You might take a look there.
http://mail.gnome.org/archives/xml/2007-July/msg00073.html
http://mail.gnome.org/archives/xml/2007-August/msg00056.html

Stefan

From stefan_ml at behnel.de Mon Aug 13 15:08:41 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 13 Aug 2007 15:08:41 +0200
Subject: [lxml-dev] Round Trip w/ Processing Instructions
In-Reply-To: <903323ff0708090637t5c37a620i8462247933946684@mail.gmail.com>
References: <903323ff0708081230u4e99d2a0g6544a10974293584@mail.gmail.com>
    <46BAC450.1010809@behnel.de>
    <903323ff0708090631y44c28ab9m703df93c1c557689@mail.gmail.com>
    <903323ff0708090637t5c37a620i8462247933946684@mail.gmail.com>
Message-ID: <46C057D9.1020401@behnel.de>

Greg Willden wrote:
> On 8/9/07, *Stefan Behnel* wrote:
>
>     > I see that t1.docinfo.doctype contains:
>     >
>     > So why isn't the PI written to the file?
>
>     I'll look into it (once again). IIRC, we serialise PIs by now, but not
>     internal subsets and doctypes. We should do that when serialising
>     root nodes (especially ElementTree objects).
>
> Thanks for looking into it. I searched the mailing list to see if this
> stuff had come up before and I had seen a lot of similar questions so
> I'm sorry if it's a FAQ or something.

If it's fixed, it's no longer worth a FAQ entry. :)

I committed something to the trunk for now (lxml 2.0). It changes the current
behaviour in that it serialises an internal DTD when you serialise an
ElementTree object (not a plain Element). I think that makes sense when you
consider an ElementTree a document wrapper around a tree of Elements.

I then think we should also restrict the current behaviour of serialising
comments and PIs next to the top Element to ElementTree serialisation only.
That way, you could easily be specific about your intention by serialising an
Element (no sibling elements or DTDs, just the plain Element and its children)
or an ElementTree (with DTD, sibling PIs and comments).

I'll consider backporting it to lxml 1.3.x also (most people won't notice
anyway :).
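
The Element versus ElementTree distinction described here can be sketched roughly as follows. This is only an illustration, assuming an lxml release that already contains the trunk change (2.0 or later) is installed; the inline document and the "test.dtd" system identifier are made up for the example, and the exact whitespace of the output may differ between versions.

```python
import io
from lxml import etree

# Made-up document with a doctype declaration. "test.dtd" is never loaded,
# because the default parser does not fetch external DTDs.
xml = b'<!DOCTYPE root SYSTEM "test.dtd">\n<root><child/></root>'
tree = etree.parse(io.BytesIO(xml))

# Serialising the ElementTree (the document wrapper) keeps the doctype.
as_document = etree.tostring(tree)

# Serialising the root Element gives only the element and its children.
as_element = etree.tostring(tree.getroot())

print(tree.docinfo.doctype)
print(as_document)
print(as_element)
```

Choosing between the two calls is then a way of stating the intention: document-level output with the doctype, or just the element subtree.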
> You mention serializing root nodes. That brings up another question I > had along these lines. > > Given the same XML file: > > > > > If I do an etree.parse('test.xml') I get an ElementTree object but If I > were to have that same data as a string and call etree.XML() or > etree.fromstring() then all I get is an Element. That seems really odd > to me. If this question has been discussed (I'm guessing it has), I'd > be happy with an answer like "take a look at this thread for the > discussion". > > I guess it's probably more of an ElementTree API question/confusion than > a problem with lxml.etree though. It's an ElementTree thing. We're compatible, so that's how it works. :) I think it initially came from the "parse()" method on the ElementTree class, but I'm not sure. You may have to ask Fredrik Lundh here. Stefan From utizoc at gmail.com Mon Aug 13 21:32:12 2007 From: utizoc at gmail.com (Bruno Deferrari) Date: Mon, 13 Aug 2007 16:32:12 -0300 Subject: [lxml-dev] Problem with xpath behaviour. Message-ID: Hi, I'm using lxml 1.3.3, libxml2 2.6.29 and python 2.5.1, and I'm getting a weird behaviour when using xpath, for example: >>> from lxml import etree >>> html = 'text1

<p>text2</p>text3text4text5'
>>> doc = etree.HTML(html)
>>> etree.tostring(doc.xpath('//p')[0])
'<p>text2</p>text3'

Shouldn't I be getting just '<p>text2</p>' ?

From utizoc at gmail.com Mon Aug 13 21:41:29 2007
From: utizoc at gmail.com (Bruno Deferrari)
Date: Mon, 13 Aug 2007 16:41:29 -0300
Subject: [lxml-dev] Problem with xpath behaviour.
In-Reply-To:
References:
Message-ID:

Just in case, under the same system, but using libxml2dom, I'm getting
'<p>text2</p>' as the result:

>>> doc2 = libxml2dom.parseString(html, html=True)
>>> doc2.xpath("//p")[0].toString()
'<p>lala</p>'

From stefan_ml at behnel.de Tue Aug 14 08:37:35 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 14 Aug 2007 08:37:35 +0200
Subject: [lxml-dev] Problem with xpath behaviour.
In-Reply-To:
References:
Message-ID: <46C14DAF.6040107@behnel.de>

Bruno Deferrari wrote:
> Hi, I'm using lxml 1.3.3, libxml2 2.6.29 and python 2.5.1, and I'm
> getting a weird behaviour when using xpath, for example:
>
> >>> from lxml import etree
> >>> html = 'text1<p>text2</p>text3text4text5'
> >>> doc = etree.HTML(html)
> >>> etree.tostring(doc.xpath('//p')[0])
> '<p>text2</p>text3'
>
> Shouldn't I be getting just '<p>text2</p>' ?

Take one step back. What "doc.xpath('//p')[0]" returns is an Element with the
tag "p", no children, the text "text2" and the tail "text3". When you
serialise it, it becomes exactly the string you get.

If you do not want that behaviour, consider using the XPath() class and
wrapping it with a function that copies the result element and strips off its
tail. Or, wrap tostring() with a function that ignores the tail of a single
element that is passed.

Alternatively, consider using the still-not-released-but-close lxml.html
module for HTML handling. It comes with loads of handy HTML tools and also
provides ways to deal with this 'issue'.

http://codespeak.net/svn/lxml/branch/html/

Stefan

From utizoc at gmail.com Tue Aug 14 14:02:15 2007
From: utizoc at gmail.com (Bruno Deferrari)
Date: Tue, 14 Aug 2007 09:02:15 -0300
Subject: [lxml-dev] Problem with xpath behaviour.
In-Reply-To: <46C14DAF.6040107@behnel.de>
References: <46C14DAF.6040107@behnel.de>
Message-ID:

Ok, I made a function that checks if the element has a tail, and if it
does it copies it and sets it to None before calling etree.tounicode()

Thanks.

From dkuhlman at rexx.com Thu Aug 16 02:31:25 2007
From: dkuhlman at rexx.com (Dave Kuhlman)
Date: Wed, 15 Aug 2007 17:31:25 -0700
Subject: [lxml-dev] Problem with ":" char in tag names
Message-ID: <20070816003125.GB42148@cutter.rexx.com>

I've been using lxml and think it is great, but ...

I recently installed lxml-1.3.3. Now I find that the following
gives me an error:

    In [3]: from lxml import etree
    In [4]: etree.Element('abc:def')
    ------------------------------------------------------------
    Traceback (most recent call last):
      File "", line 1, in
      File "etree.pyx", line 1801, in etree.Element
      File "apihelpers.pxi", line 101, in etree._makeElement
      File "apihelpers.pxi", line 723, in etree._getNsTag
    ValueError: Invalid tag name

It's because of the ":" in the tag name.

That's critical for me, because I use lxml in my rst2odt project to
produce OpenOffice ODF .odt files.
See: http://www.rexx.com/~dkuhlman/odtwriter.html

An ODF/.odt file is a zipped archive of XML files. Those XML files
contain many tags that contain colons.

Here are the relevant portions of the XML spec, I believe:

http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-starttags
http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Name

Aren't I correct that a colon should be allowed in a tag name?

In apihelpers.pxi, it looks like the following lines were added in
lxml version 1.3.3 and which I believe are raising the exception:

    elif cstd.strchr(c_tag, c':') is not NULL:
        raise ValueError, "Invalid tag name"

Is there a reason for that?

Hoping for enlightenment.

Dave

--
Dave Kuhlman
http://www.rexx.com/~dkuhlman

From stefan_ml at behnel.de Thu Aug 16 08:42:18 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 16 Aug 2007 08:42:18 +0200
Subject: [lxml-dev] Problem with ":" char in tag names
In-Reply-To: <20070816003125.GB42148@cutter.rexx.com>
References: <20070816003125.GB42148@cutter.rexx.com>
Message-ID: <46C3F1CA.5030303@behnel.de>

Hi David,

Dave Kuhlman wrote:
> I've been using lxml and think it is great

:)

>, but ...

;) I just knew there was more to come...

> I recently installed lxml-1.3.3. Now I find that the following
> gives me an error:
>
>     In [3]: from lxml import etree
>     In [4]: etree.Element('abc:def')
>     ------------------------------------------------------------
>     Traceback (most recent call last):
>       File "", line 1, in
>       File "etree.pyx", line 1801, in etree.Element
>       File "apihelpers.pxi", line 101, in etree._makeElement
>       File "apihelpers.pxi", line 723, in etree._getNsTag
>     ValueError: Invalid tag name
>
> It's because of the ":" in the tag name.
>
> That's critical for me, because I use lxml in my rst2odt project to
> produce OpenOffice ODF .odt files. See:
> http://www.rexx.com/~dkuhlman/odtwriter.html
>
> An ODF/.odt file is a zipped archive of XML files. Those XML files
> contain many tags that contain colons.
>
> Here are the relevant portions of the XML spec, I believe:
>
> http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-starttags
> http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Name
>
> Aren't I correct that a colon should be allowed in a tag name?
>
> In apihelpers.pxi, it looks like the following lines were added in
> lxml version 1.3.3 and which I believe are raising the exception:
>
>     elif cstd.strchr(c_tag, c':') is not NULL:
>         raise ValueError, "Invalid tag name"
>
> Is there a reason for that?

lxml (read: libxml2) supports XML 1.0 (don't think there were any relevant
changes in 1.1, which you cite above) and is generally namespace aware. This
means that ":" is considered a separator between a namespace prefix and the
tag name, and is therefore not allowed as part of a plain (namespace-less) tag
name.

You mentioned ODF, which is heavily based on namespaces, and AFAIA, it doesn't
use prefixes for anything but namespace references. So you should be fine with
the general namespace support in lxml.etree.

http://codespeak.net/lxml/dev/tutorial.html#namespaces

Does that 'enlighten' you? :)

Stefan

From faassen at startifact.com Fri Aug 17 14:50:37 2007
From: faassen at startifact.com (Martijn Faassen)
Date: Fri, 17 Aug 2007 14:50:37 +0200
Subject: [lxml-dev] Problem with ":" char in tag names
In-Reply-To: <20070816003125.GB42148@cutter.rexx.com>
References: <20070816003125.GB42148@cutter.rexx.com>
Message-ID:

Dave Kuhlman wrote:
> I've been using lxml and think it is great, but ...
>
> I recently installed lxml-1.3.3. Now I find that the following
> gives me an error:
>
>     In [3]: from lxml import etree
>     In [4]: etree.Element('abc:def')
>     ------------------------------------------------------------
>     Traceback (most recent call last):
>       File "", line 1, in
>       File "etree.pyx", line 1801, in etree.Element
>       File "apihelpers.pxi", line 101, in etree._makeElement
>       File "apihelpers.pxi", line 723, in etree._getNsTag
>     ValueError: Invalid tag name
>
> It's because of the ":" in the tag name.

As another data point: by coincidence yesterday I saw a discussion of
some other project who also ran into this problem.

http://groups.google.com/group/html5lib-discuss/browse_thread/thread/9997a2468ab2b362

No idea about the context there.

Regards,

Martijn

From faassen at startifact.com Fri Aug 17 14:54:16 2007
From: faassen at startifact.com (Martijn Faassen)
Date: Fri, 17 Aug 2007 14:54:16 +0200
Subject: [lxml-dev] Problem with ":" char in tag names
In-Reply-To: <46C3F1CA.5030303@behnel.de>
References: <20070816003125.GB42148@cutter.rexx.com>
    <46C3F1CA.5030303@behnel.de>
Message-ID:

Hey,

> lxml (read: libxml2) supports XML 1.0 (don't think there were any relevant
> changes in 1.1, which you cite above) and is generally namespace aware. This
> means that ":" is considered a separator between a namespace prefix and the
> tag name, and is therefore not allowed as part of a plain (namespace-less) tag
> name.

What used to happen if you put a colon in a tag name? What would people
expect to happen?

I wonder whether it'd be possible to support namespace prefixes the
proper way this way. I.e. if I write:

    Element('foo:bar', nsmap={'foo': 'blah'})

that could be equivalent to:

    Element('{blah}bar', nsmap={'foo': 'blah'})

The nice thing is that you could avoid having to write '{%s}foo' %
my_namespace a lot.

Of course this has consequences for other areas, such as 'tag', so I'm
not sure whether this is a good idea, but throwing it in.
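
To make the proposed equivalence concrete, here is a minimal sketch of such an expansion done outside the library. The helper name `expand_prefix` is hypothetical (it is not part of lxml or ElementTree), and the standard library's `xml.etree.ElementTree` is used only because it shares the `{namespace}tag` notation.

```python
import xml.etree.ElementTree as ET

def expand_prefix(tag, nsmap):
    """Hypothetical helper: turn 'prefix:local' into '{uri}local'."""
    prefix, sep, local = tag.partition(":")
    if not sep:
        return tag  # no prefix, nothing to expand
    return "{%s}%s" % (nsmap[prefix], local)

nsmap = {"foo": "blah"}
el = ET.Element(expand_prefix("foo:bar", nsmap))
print(el.tag)
```

Note that `el.tag` then reports the expanded name rather than the 'foo:bar' that was typed, which is the kind of consequence for 'tag' raised in the message.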
It's definitely another extension on ElementTree, which can't really do
this kind of stuff well due to the lack of parent pointers.

Regards,

Martijn

From stefan_ml at behnel.de Fri Aug 17 21:14:54 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 17 Aug 2007 21:14:54 +0200
Subject: [lxml-dev] Problem with ":" char in tag names
In-Reply-To:
References: <20070816003125.GB42148@cutter.rexx.com>
    <46C3F1CA.5030303@behnel.de>
Message-ID: <46C5F3AE.8080006@behnel.de>

Martijn Faassen wrote:
>> lxml (read: libxml2) supports XML 1.0 (don't think there were any relevant
>> changes in 1.1, which you cite above) and is generally namespace aware. This
>> means that ":" is considered a separator between a namespace prefix and the
>> tag name, and is therefore not allowed as part of a plain (namespace-less) tag
>> name.
>
> What used to happen if you put a colon in a tag name? What would people
> expect to happen?

Well, lxml.etree previously accepted those as part of a tag name. This means
that you could do this:

    >>> root = etree.Element("some:root")
    >>> print etree.tostring(root)

which allowed you to use namespace prefixes without declaring namespaces, i.e.
it really helps you in writing out broken XML.

It also allowed you to do this, which I think people did:

    >>> root = etree.XML('')
    >>> root.append( etree.Element("p:other") )
    >>> print etree.tostring(root)

Looks correct, right? However, it nicely breaks all namespace aware XML stuff
that works on the in-memory tree:

    >>> print root, root[0]
    >>> print root.xpath("//p:other")
    Traceback (most recent call last):
    ...
    etree.XPathEvalError: Undefined namespace prefix
    >>> print root.xpath("//p:other", {"p":"http://whatever/"})
    []

So raising an exception here *really* prevents a lot of pitfalls and helps
people fix their programs.

> I wonder whether it'd be possible to support namespace prefixes the
> proper way this way. I.e. if I write:
>
>     Element('foo:bar', nsmap={'foo': 'blah'})
>
> that could be equivalent to:
>
>     Element('{blah}bar', nsmap={'foo': 'blah'})

No. There should be one way to do this. We already use prefixes in XPath,
which causes a lot of annoyance for new users.

BTW, this is an extremely rare use pattern. Normally, you would either work on
an XML document that already comes with its pre-defined prefixes, or you would
define an nsmap once (as you show above) and then stick to using
SubElement(..., "{ns}tag") without redefining the prefixes.

Note that lxml nicely reassigns prefixes now when inserting an element into an
existing tree, so there really is no need to assign prefixes more than once
(if at all).

> The nice thing is that you could avoid having to write '{%s}foo' %
> my_namespace a lot.

Feel free to assign it to a global constant or to use the E factory as in
lxml.html.builder.

> Of course this has consequences for other areas, such as 'tag', so I'm
> not sure whether this is a good idea, but throwing it in.

Right, it would let ".tag" return something other than what you passed into
the Element() function.

> It's definitely another extension on ElementTree, which can't really do
> this kind of stuff well due to the lack of parent pointers.

Right, so it would unnecessarily add an additional namespace definition
pattern that is not supported by ET and at the same time allow the pitfalls
that the users who reported the problem currently run into. Meaning: it would
let people write programs that would stop working the day they wanted to
switch to ET or the day they started using XPath. Great.

No, this change is definitely a bug fix. I'm sorry for people who were not
aware of this bug in the past and accidentally misused it, but this has to
change.
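
The pitfall can be reproduced with any ElementTree-style library that does not check tag names. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in (an assumption made for illustration; it is not the lxml code discussed in this thread): the element created with a colon in its name is not in the namespace at all, so a namespace-aware lookup does not find it.

```python
import xml.etree.ElementTree as ET

NS = "http://whatever/"
root = ET.fromstring('<p:root xmlns:p="http://whatever/"/>')

# A plain tag name that merely looks prefixed: no namespace is attached.
root.append(ET.Element("p:other"))
# The same kind of element created properly, in {namespace}tag notation.
root.append(ET.Element("{%s}proper" % NS))

looks_prefixed = root.find("{%s}other" % NS)   # lookup by namespace fails
properly_named = root.find("{%s}proper" % NS)  # lookup by namespace succeeds
print(looks_prefixed, properly_named.tag)
```

The serialised output of both children may look alike to a human reader, which is what makes the mistake easy to miss until a namespace-aware tool is applied.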
Stefan

From stefan_ml at behnel.de Fri Aug 17 21:30:46 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 17 Aug 2007 21:30:46 +0200
Subject: [lxml-dev] Problem with ":" char in tag names
In-Reply-To:
References: <20070816003125.GB42148@cutter.rexx.com>
Message-ID: <46C5F766.1050705@behnel.de>

Martijn Faassen wrote:
> Dave Kuhlman wrote:
>> I've been using lxml and think it is great, but ...
>>
>> I recently installed lxml-1.3.3. Now I find that the following
>> gives me an error:
>>
>>     In [3]: from lxml import etree
>>     In [4]: etree.Element('abc:def')
>>     ------------------------------------------------------------
>>     Traceback (most recent call last):
>>       File "", line 1, in
>>       File "etree.pyx", line 1801, in etree.Element
>>       File "apihelpers.pxi", line 101, in etree._makeElement
>>       File "apihelpers.pxi", line 723, in etree._getNsTag
>>     ValueError: Invalid tag name
>>
>> It's because of the ":" in the tag name.
>
> As another data point: by coincidence yesterday I saw a discussion of
> some other project who also ran into this problem.
>
> http://groups.google.com/group/html5lib-discuss/browse_thread/thread/9997a2468ab2b362
>
> No idea about the context there.

Hmmm, I really wonder how many people used this 'feature' to work around
having to implement proper namespace support...

Stefan

From ianb at colorstudy.com Fri Aug 17 23:10:40 2007
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 17 Aug 2007 16:10:40 -0500
Subject: [lxml-dev] Problem with ":" char in tag names
In-Reply-To: <46C5F766.1050705@behnel.de>
References: <20070816003125.GB42148@cutter.rexx.com>
    <46C5F766.1050705@behnel.de>
Message-ID: <46C60ED0.1050606@colorstudy.com>

Stefan Behnel wrote:
>>> It's because of the ":" in the tag name.
>> As another data point: by coincidence yesterday I saw a discussion of
>> some other project who also ran into this problem.
>>
>> http://groups.google.com/group/html5lib-discuss/browse_thread/thread/9997a2468ab2b362
>>
>> No idea about the context there.
>
> Hmmm, I really wonder how many people used this 'feature' to work around
> having to implement proper namespace support...

One of the places where this recently came up is that Facebook is using
markup with fb:*: http://wiki.developers.facebook.com/index.php/FBML

--
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org
            : Write code, do good : http://topp.openplans.org/careers

From stefan_ml at behnel.de Sat Aug 18 07:56:25 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 18 Aug 2007 07:56:25 +0200
Subject: [lxml-dev] Problem with ":" char in tag names
In-Reply-To: <46C60ED0.1050606@colorstudy.com>
References: <20070816003125.GB42148@cutter.rexx.com>
    <46C5F766.1050705@behnel.de>
    <46C60ED0.1050606@colorstudy.com>
Message-ID: <46C68A09.4000307@behnel.de>

Ian Bicking wrote:
> Stefan Behnel wrote:
>>>> It's because of the ":" in the tag name.
>>> As another data point: by coincidence yesterday I saw a discussion of
>>> some other project who also ran into this problem.
>>>
>>> http://groups.google.com/group/html5lib-discuss/browse_thread/thread/9997a2468ab2b362
>>>
>>> No idea about the context there.
>>
>> Hmmm, I really wonder how many people used this 'feature' to work around
>> having to implement proper namespace support...
>
> One of the places where this recently came up is that Facebook is using
> markup with fb:*: http://wiki.developers.facebook.com/index.php/FBML

No, they are not. They are using a well-defined namespace:

http://wiki.developers.facebook.com/index.php/FBML_DTD

If you use unnamespaced "fb:*" tag names here, you will also break validation
against their XSD.
Stefan

From stefan_ml at behnel.de Sat Aug 18 08:11:33 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 18 Aug 2007 08:11:33 +0200
Subject: [lxml-dev] Problem with ":" char in tag names
In-Reply-To:
References: <20070816003125.GB42148@cutter.rexx.com>
Message-ID: <46C68D95.6020105@behnel.de>

Martijn Faassen wrote:
> Dave Kuhlman wrote:
>> I've been using lxml and think it is great, but ...
>>
>> I recently installed lxml-1.3.3. Now I find that the following
>> gives me an error:
>>
>>     In [3]: from lxml import etree
>>     In [4]: etree.Element('abc:def')
>>     ------------------------------------------------------------
>>     Traceback (most recent call last):
>>       File "", line 1, in
>>       File "etree.pyx", line 1801, in etree.Element
>>       File "apihelpers.pxi", line 101, in etree._makeElement
>>       File "apihelpers.pxi", line 723, in etree._getNsTag
>>     ValueError: Invalid tag name
>>
>> It's because of the ":" in the tag name.
>
> As another data point: by coincidence yesterday I saw a discussion of
> some other project who also ran into this problem.
>
> http://groups.google.com/group/html5lib-discuss/browse_thread/thread/9997a2468ab2b362

Hmmm, I don't know. Maybe we should revert the behaviour for 1.3.4 and just
keep it for 2.0, which actually tests tag names against the spec instead of
just looking for ':'. Projects that use those tag names are now aware that
this is not supposed to be allowed (as the link above suggests), so changing
the behaviour in 2.0 gives them the time to fix their software.

We could maybe raise a Warning if we encounter problematic usage. At least, I
would make it clear in the release notes that this is *only* for temporary
convenience.

Opinions?

Stefan

From stefan_ml at behnel.de Sat Aug 18 12:42:13 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 18 Aug 2007 12:42:13 +0200
Subject: [lxml-dev] lxml.html landed
Message-ID: <46C6CD05.6020302@behnel.de>

Hi all,

I finally merged the lxml.html branch into the trunk. This includes the
package lxml.html, the module lxml.cssselect, and the XML/HTML doctest
support. All were written by Ian Bicking, with only a couple of minor changes
and comments by myself. Thanks Ian!

One more thing I changed for the merge was to let get_element_by_id() raise an
exception if no default is passed (like getattr() does). That way, you can be
explicit about what you want: None or an exception.

I took another quick look at the parser functions and I think they look OK
now. The alpha cycle of lxml 2.0 will hopefully tell us if they work as people
expect.

I'll try to take another look at the pending objectify changes by Holger. Once
I get through them, the time for the first alpha release of lxml 2.0 has come.
You can expect it at the end of August.

So, a call to all contributors and those who want to help out:

*Please help out looking through the new APIs and the trunk documentation!*

There were loads of changes and the new modules could well benefit from a
couple of additional paragraphs in the docs. As usual, the HTML version is at

http://codespeak.net/lxml/dev/

Thanks for the great code, Ian!

Stefan

From faassen at startifact.com Sat Aug 18 18:22:51 2007
From: faassen at startifact.com (Martijn Faassen)
Date: Sat, 18 Aug 2007 18:22:51 +0200
Subject: [lxml-dev] Problem with ":" char in tag names
In-Reply-To: <8928d4e90708180922mf16b648o302f5195b6bea03e@mail.gmail.com>
References: <20070816003125.GB42148@cutter.rexx.com>
    <46C3F1CA.5030303@behnel.de>
    <46C5F3AE.8080006@behnel.de>
    <8928d4e90708180922mf16b648o302f5195b6bea03e@mail.gmail.com>
Message-ID: <8928d4e90708180922w7679075epd3d09d23de80fe8a@mail.gmail.com>

[whoops reply only sent to Stefan while I meant to include the list]

Hey Stefan,

I agree that this is a bugfix; sorry for the confusion. I never tried
to use namespace prefixes this way and used Clark notation consistently.
My feedback is not coming from the perspective of supporting broken use, but wondering whether we cannot make lxml easier to use. On 8/17/07, Stefan Behnel wrote: [snip in the past people used to be able to construct programs that looked like they produced correct XML but actually didn't] > So raising an exception here *really* prevents a lot of pitfalls and helps > people fix their programs. Okay, it is clear that the previous behavior was at best undefined, so raising an error is a good idea. It does indicate one thing though - if we *wanted* to write a feature that explicitly used namespace prefixes, we could, as it's certainly not doing anything else. :) > > I wonder whether it'd be possible to support namespace prefixes the > > proper way this way. I.e. if I write: > > > > Element('foo:bar', nsmap={'foo': 'blah'}) > > > > that could be equivalent to: > > > > Element('{blah}bar', nsmap={'foo': 'blah'}) > > No. There should be one way to do this. We already use prefixes in XPath, > which causes a lot of annoyance for new users. Yes, and prefixes are used in the XML serialization. The way we have both prefixes and Clarke notation *already* creates a lot of confusion for users. In addition, the Clarke notation pattern forces one to write code like this: SubElement(el, '{%s}foo' % MY_NS) > BTW, this is an extremely rare use pattern. Normally, you would either work on > an XML document that already comes with its pre-defined prefixes, or you would > define an nsmap once (as you show above) and then stick to using > SubElement(..., "{ns}tag") without redefining the prefixes. Yes, but this is a very common pattern: SubElement(el, '{%s}foo' % MY_NS) i.e. people generally don't want to spell out their entire namespace URI over and over again when constructing XML.
Therefore I started to wonder whether we could create a convenience that uses namespace prefixes *and* does the right thing: SubElement(el, 'myns:foo') will work *if* myns has been defined as a namespace prefix in the context of 'el'. Of course, accessing tags through .tag would still return Clarke notation. We can have various objections against this. We can for instance say, this is a bad idea as it is surprising behavior. After all, if you set a tag and then get it, you'll get something else back. Then again, since XML parsing already has this behavior and thus the user will have to be familiar with it anyway, I don't think it's that surprising in the end. It might actually be a useful convenience that will make some code look cleaner. Another objection is that we should have only one way to do it. But we don't, really. In order to set namespaces for elements, we currently have a number of ways to arrange your code. One is to use the "%s" pattern. Another is to use a custom factory specific to your codebase. And we *already* have two ways to get namespace information into the application - through namespace prefixes in the parser, and through Clarke notation in the API. > Note that lxml nicely reassigns prefixes now when inserting an element into an > existing tree, so there really is no need to assign prefixes more than once > (if at all). Assigning prefixes, sure. *Using* prefixes is what I'm talking about. > > The nice thing is that you could avoid having to write '{%s}foo' % > > my_namespace a lot. > > Feel free to assign it to a global constant or to use the E factory as in > lxml.html.builder. Yes, remember that I've used lxml before. :) I often use a global constant. It still means I scatter "{%s}foo" % MY_GLOBAL_CONSTANT throughout my code. Meanwhile, I *already* have a "global constant" that I also set somewhere, in the XML, namely my namespace map. I can of course create my own factory, which I've also frequently done.
That runs the risk of obscuring otherwise clear use of the lxml API. (then again, the application's concerns may often force a factory on the developer anyway). > > Of course this has consequences for other areas, such as 'tag', so I'm > > not sure whether this is a good idea, but throwing it in. > > Right, it would let ".tag" return something other than what you passed into > the Element() function. Yes. If we make this change, we'd also need to figure out what happens if you explicitly *set* tag. Should we allow: foo.tag = 'foo:bar' allowing potentially inconsistent behavior as you can set something and then get back something else in Clarke notation, or still forbid it? > > It's definitely another extension on ElementTree, which can't really do > > this kind of stuff well due to the lack of parent pointers. > > Right, so it would unnecessarily add an additional namespace definition > pattern that is not supported by ET and at the same time allow the pitfalls > that the users who reported the problem currently run into. Meaning: it would > let people write programs that would stop working the day they wanted to > switch to ET or the day they started using XPath. Great. I think there are two potential drawbacks: * allow users to write programs that will stop working as soon as they switch back to ET. This is a drawback. It's also a drawback that already exists - we have many many extensions above the ElementTree API and people's programs will stop working if they don't stick to the common subset. * allow users to write programs that stop working when they switch to XPath. I don't understand why you say this. Of course it's broken *now*. I'm not advocating the current broken behavior at all, and support disabling undefined behavior. I'm just wondering whether we shouldn't support this behavior explicitly *and do the right thing*. We can still raise an exception as soon as someone uses an undefined namespace prefix, of course.
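For illustration, the prefix lookup proposed here could be sketched roughly as follows. This is not lxml API: `resolve_prefixed_tag` is a hypothetical helper, the namespace URI is made up, and the sketch assumes a plain prefix -> namespace mapping (lxml exposes one per element as `el.nsmap`).

```python
def resolve_prefixed_tag(tag, nsmap):
    """Turn 'prefix:name' into '{namespace}name' using a prefix -> URI mapping.

    Hypothetical helper, not part of lxml. Tags that already carry a
    '{namespace}' part and tags without a prefix are returned unchanged.
    """
    if tag.startswith("{") or ":" not in tag:
        return tag
    prefix, local_name = tag.split(":", 1)
    try:
        namespace = nsmap[prefix]
    except KeyError:
        # An undefined prefix is an error, as suggested in the thread.
        raise ValueError("undefined namespace prefix: %r" % prefix)
    return "{%s}%s" % (namespace, local_name)


nsmap = {"myns": "http://example.org/myns"}  # made-up namespace for the example
resolved_tag = resolve_prefixed_tag("myns:foo", nsmap)
```

A sub element factory built on top of this would resolve the tag first and then hand the result to SubElement(), so `.tag` keeps returning the `{namespace}name` form.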
> No, this change is definitely a bug fix. I'm sorry for people who were not > aware of this bug in the past and accidentally misused it, but this has to change. Sorry for the confusion in my original reply. I didn't mean to say the bugfix should be rolled back. It's indeed a bugfix and I support it. It just led me to think we might have an opportunity there for improving our API. Regards, Martijn From stefan_ml at behnel.de Sun Aug 19 18:56:05 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 19 Aug 2007 18:56:05 +0200 Subject: [lxml-dev] Problem with ":" char in tag names In-Reply-To: <8928d4e90708180922w7679075epd3d09d23de80fe8a@mail.gmail.com> References: <20070816003125.GB42148@cutter.rexx.com> <46C3F1CA.5030303@behnel.de> <46C5F3AE.8080006@behnel.de> <8928d4e90708180922mf16b648o302f5195b6bea03e@mail.gmail.com> <8928d4e90708180922w7679075epd3d09d23de80fe8a@mail.gmail.com> Message-ID: <46C87625.5050509@behnel.de> Hi Martijn, Martijn Faassen wrote: > I agree that this is a bugfix; sorry for the confusion. I never tried > to use namespace prefixes this way and used Clarke notation > consistently. My feedback is not coming from the perspective of > supporting broken use, but wondering whether we cannot make lxml > easier to use. I understand. I just doubt it would become easier to use. > Yes, and prefixes are used in the XML serialization. The way we have > both prefixes and Clarke notation *already* creates a lot of confusion > for users. That's why I would rather prefer getting it eliminated in XPath (with ETXPath) than introducing it in other parts of the API. >> Note that lxml nicely reassigns prefixes now when inserting an element into >> an existing tree, so there really is no need to assign prefixes more than >> once (if at all). > > Assigning prefixes, sure. *Using* prefixes is what I'm talking about. But prefixes are error prone and this behaviour makes them even more error prone. 
Prefixes are not equivalent to namespaces as more than one prefix can map to the same namespace, in different parts of a document or even concurrently. And since lxml.etree adapts namespace prefixes when merging documents or adding new elements, you can get surprising behaviour depending on the source of the document you are working on. If you only generate XML from scratch without interacting with external code, you may be fine with prefix notation, but if you work on existing documents or pipe XML through external libraries, you may end up being surprised why lxml.etree starts throwing exceptions at you when you continue working on the document you just got back. Allowing prefix notation in tag names encourages people to write code that makes assumptions about their data that may not be true for 100% equivalent data. And if you are aware of the potential pitfalls of such a feature, I doubt that you would use it except for a very limited number of use cases. > In addition, the Clarke notation pattern forces one to write code like this: > > SubElement(el, '{%s}foo' % MY_NS) > > i.e. people generally don't want to spell out their entire namespace > URI over and over again when constructing XML. I absolutely see that problem. But I do not think that supporting prefix notation is a good way to solve this. I mean, the most common case where this really hurts is that you use one single namespace in your application and have to repeat it for every SubElement. But it's easy to write a factory that wraps SubElement() and simply copies the namespace of the parent over to the new child (if it doesn't provide one itself), something like this: def SameNamespaceSubElement(parent, tag, *args, **kwargs): if not tag.startswith("{") and parent.tag.startswith("{"): tag = parent.tag[:parent.tag.index("}")+1] + tag return etree.SubElement(parent, tag, *args, **kwargs) (plus QName() support, plus a better name, etc.) 
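Spelled out as a runnable snippet, the factory above might look like this. The function name and the namespace URI are placeholders, QName() support is still left out, and the import falls back to the standard library's ElementTree only so that the sketch runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:  # the stdlib ElementTree API is enough for this sketch
    import xml.etree.ElementTree as etree


def same_namespace_subelement(parent, tag, *args, **kwargs):
    """Create a child that inherits the parent's namespace unless it names one."""
    if not tag.startswith("{") and parent.tag.startswith("{"):
        # Copy the '{namespace}' part of the parent's tag in front of the new tag.
        tag = parent.tag[:parent.tag.index("}") + 1] + tag
    return etree.SubElement(parent, tag, *args, **kwargs)


MY_NS = "http://example.org/myns"  # made-up namespace for the example
root = etree.Element("{%s}root" % MY_NS)
child = same_namespace_subelement(root, "foo")
other = same_namespace_subelement(root, "{http://example.org/other}bar")
```

Here `child` ends up in the namespace of `root`, while `other` keeps the namespace it was given explicitly.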
>>> The nice thing is that you could avoid having to write '{%s}foo' % >>> my_namespace a lot. >> Feel free to assign it to a global constant or to use the E factory as in >> lxml.html.builder. > > Yes, remember that I've used lxml before. :) :) > I often use a global constant. It still means I scatter "{%s}foo" % > MY_GLOBAL_CONSTANT throughout my code. Meanwhile, I *already* have a > "global constant" that I also set somewhere, in the XML, namely my > namespace map. I rather meant something like a module that keeps constants for every tag name in a namespace, a bit like in lxml.html.builder, just with strings instead of factories. >>> Of course this has consequences for other areas, such as 'tag', so I'm >>> not sure whether this is a good idea, but throwing it in. >> Right, it would let ".tag" return something other than what you passed into >> the Element() function. > > Yes. If we make this change, we'd also need to figure out what happens > if you explictily *set* tag. Should we allow: > > foo.tag = 'foo:bar' >>> foo.tag = 'foo:bar' >>> print foo.tag {http://whatever}bar Perfectly understandable. If we implement that, I'm all for documenting it in a doctest. People will have to see it to believe it. :) Stefan From stefan_ml at behnel.de Mon Aug 20 09:41:51 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 20 Aug 2007 09:41:51 +0200 Subject: [lxml-dev] objectify factories Message-ID: <46C945BF.9020805@behnel.de> Hi Holger, I finally looked a little closer at the latest patches you sent me. They contain good ideas, but I'm still not very comfortable with the implementation. I think the PT() factory should be folded into DataElement(), as it's just a special case. I attached a patch that merges a part of your latest factory patch and removes the need for the PT factory. Please check if it does what you wanted and if anything is still missing. 
Regarding the TypedElementMaker, I think that if we write one that is adapted to objectify, we should not stop half-way. We should remove the "typemap" thing and just use the type inference mechanisms that objectify already provides. You can take a look into that, if you want, otherwise I will try to come up with an implementation when I find the time (which may be after the release of 2.0alpha1). Holger, is this ok for you or is there any reason we should not go this way? Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: partial-merge-holger-pt.patch Type: text/x-diff Size: 5788 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070820/c5300782/attachment.bin From faassen at startifact.com Mon Aug 20 16:12:35 2007 From: faassen at startifact.com (Martijn Faassen) Date: Mon, 20 Aug 2007 16:12:35 +0200 Subject: [lxml-dev] Problem with ":" char in tag names In-Reply-To: <46C87625.5050509@behnel.de> References: <20070816003125.GB42148@cutter.rexx.com> <46C3F1CA.5030303@behnel.de> <46C5F3AE.8080006@behnel.de> <8928d4e90708180922mf16b648o302f5195b6bea03e@mail.gmail.com> <8928d4e90708180922w7679075epd3d09d23de80fe8a@mail.gmail.com> <46C87625.5050509@behnel.de> Message-ID: <8928d4e90708200712n911accxf95ad8f77516b0e8@mail.gmail.com> Hey, On 8/19/07, Stefan Behnel wrote: [snip] > > Yes, and prefixes are used in the XML serialization. The way we have > > both prefixes and Clarke notation *already* creates a lot of confusion > > for users. > > That's why I would rather prefer getting it eliminated in XPath (with ETXPath) > than introducing it in other parts of the API. I assume ETXPath is a non-compliant way to express XPath expressions that uses Clarke notation? Those will look very very long. I'm not sure whether that will make people's life easier at all... Suddenly standard XPath examples fail to work. 
We must face it: namespace prefixes *are* something that people need to worry about when dealing with XML anyway. We cannot make them go away from our APIs entirely. My proposal is an attempt to make the best of it. > >> Note that lxml nicely reassigns prefixes now when inserting an element into > >> an existing tree, so there really is no need to assign prefixes more than > >> once (if at all). > > > > Assigning prefixes, sure. *Using* prefixes is what I'm talking about. > > But prefixes are error prone and this behaviour makes them even more error > prone. Prefixes are not equivalent to namespaces as more than one prefix can > map to the same namespace, in different parts of a document or even > concurrently. This is behavior that any XML programmer will need to be aware of anyway. I don't think that in most XML handling code this will be error-prone. People can understand that namespace definitions get inherited through the XML tree. We can see prefixes as variables "acquired" through the XML tree. If I use prefix 'a' on some node, the system will walk up the parent chain until it finds the definition of 'a'. If the definition can not be found, this is an error. > And since lxml.etree adapts namespace prefixes when merging > documents or adding new elements, you can get surprising behaviour depending > on the source of the document you are working on. That is indeed a greater potential cause for errors. Under what circumstances does this happen in practice? I imagine this is a bigger problem when merging documents than when adding new elements, right? > If you only generate XML > from scratch without interacting with external code, you may be fine with > prefix notation, but if you work on existing documents or pipe XML through > external libraries, you may end up being surprised why lxml.etree starts > throwing exceptions at you when you continue working on the document you just > got back. That's a good point. 
Another question is what would happen with the default namespace - it would be scary to have unprefixed names suddenly turn into namespaced names. > Allowing prefix notation in tag names encourages people to write code that > makes assumptions about their data that may not be true for 100% equivalent > data. And if you are aware of the potential pitfalls of such a feature, I > doubt that you would use it except for a very limited number of use cases. That's the question :is this set of use cases really "very limited"? In many many use cases, for instance almost all of my own, XML documents only use a single namespace, or at most a few. Possibly these (in my opinion very common) use cases would be served by another strategy than meaningful namespace prefixes. > > In addition, the Clarke notation pattern forces one to write code like this: > > > > SubElement(el, '{%s}foo' % MY_NS) > > > > i.e. people generally don't want to spell out their entire namespace > > URI over and over again when constructing XML. > > I absolutely see that problem. But I do not think that supporting prefix > notation is a good way to solve this. > I mean, the most common case where this > really hurts is that you use one single namespace in your application and have > to repeat it for every SubElement. But it's easy to write a factory that wraps > SubElement() and simply copies the namespace of the parent over to the new > child (if it doesn't provide one itself), something like this: > > def SameNamespaceSubElement(parent, tag, *args, **kwargs): > if not tag.startswith("{") and parent.tag.startswith("{"): > tag = parent.tag[:parent.tag.index("}")+1] + tag > return etree.SubElement(parent, tag, *args, **kwargs) > > (plus QName() support, plus a better name, etc.) In order to construct code like this more easily, it would be nice by the way if elements had their namespace URI and namespace prefix available as attributes (plus the namespace prefix -> namespace mapping). 
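Until elements expose such attributes, the namespace URI and the local name can be pulled out of `.tag` with a small helper. A sketch only: `split_tag` is a made-up name, and the prefix -> namespace mapping is not covered here.

```python
def split_tag(tag):
    """Split '{namespace}name' into (namespace, name).

    The namespace is None when the tag carries no '{namespace}' part.
    """
    if tag.startswith("{"):
        namespace, local_name = tag[1:].split("}", 1)
        return namespace, local_name
    return None, tag


namespace, local_name = split_tag("{http://example.org/myns}foo")
```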
I do end up constructing a factory frequently. The above features would allow me to construct a sub element factory that uses namespace prefixes. We could then play with the feel of this and see whether we can eventually move such a factory into the core (and in what form). Regards, Martijn From stefan_ml at behnel.de Mon Aug 20 23:58:37 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 20 Aug 2007 23:58:37 +0200 Subject: [lxml-dev] Problem with ":" char in tag names In-Reply-To: <8928d4e90708200712n911accxf95ad8f77516b0e8@mail.gmail.com> References: <20070816003125.GB42148@cutter.rexx.com> <46C3F1CA.5030303@behnel.de> <46C5F3AE.8080006@behnel.de> <8928d4e90708180922mf16b648o302f5195b6bea03e@mail.gmail.com> <8928d4e90708180922w7679075epd3d09d23de80fe8a@mail.gmail.com> <46C87625.5050509@behnel.de> <8928d4e90708200712n911accxf95ad8f77516b0e8@mail.gmail.com> Message-ID: <46CA0E8D.6000602@behnel.de> Hi Martijn, Martijn Faassen wrote: > On 8/19/07, Stefan Behnel wrote: > [snip] >>> Yes, and prefixes are used in the XML serialization. The way we have >>> both prefixes and Clarke notation *already* creates a lot of confusion >>> for users. >> That's why I would rather prefer getting it eliminated in XPath (with ETXPath) >> than introducing it in other parts of the API. > > I assume ETXPath is a non-compliant way to express XPath expressions > that uses Clarke notation? Right. > Those will look very very long. I'm not > sure whether that will make people's life easier at all... Sure, that's a problem. I'm not considering ETXPath a perfect replacement, but XPath() is the last part of the API that really forces you into using prefixes. Before I added the FAQ entry, we had enough newbees asking how to figure out which prefix they had to use for their XPath expression. >> And since lxml.etree adapts namespace prefixes when merging >> documents or adding new elements, you can get surprising behaviour depending >> on the source of the document you are working on. 
> > That is indeed a greater potential cause for errors. Under what > circumstances does this happen in practice? I imagine this is a bigger > problem when merging documents than when > adding new elements, right? Definitely. If you manage to find a working prefix to append a first element and then continue to append elements below it with the same prefix, it will continue to work as expected. So, as I said, as long as you just generate, you're fine. But then, the E factory is much better for XML generation already. >>> In addition, the Clarke notation pattern forces one to write code like this: >>> >>> SubElement(el, '{%s}foo' % MY_NS) >>> >>> i.e. people generally don't want to spell out their entire namespace >>> URI over and over again when constructing XML. >> I absolutely see that problem. But I do not think that supporting prefix >> notation is a good way to solve this. > >> I mean, the most common case where this >> really hurts is that you use one single namespace in your application and have >> to repeat it for every SubElement. But it's easy to write a factory that wraps >> SubElement() and simply copies the namespace of the parent over to the new >> child (if it doesn't provide one itself), something like this: >> >> def SameNamespaceSubElement(parent, tag, *args, **kwargs): >> if not tag.startswith("{") and parent.tag.startswith("{"): >> tag = parent.tag[:parent.tag.index("}")+1] + tag >> return etree.SubElement(parent, tag, *args, **kwargs) >> >> (plus QName() support, plus a better name, etc.) > > In order to construct code like this more easily, it would be nice by > the way if elements had their namespace URI and namespace prefix > available as attributes (plus the namespace prefix -> namespace > mapping). :) I also thought about that when I wrote the factory above. Then we should add "tagname" and "namespace" properties to Element. Hmmm, I hate adding stuff to that class, though. It always cuts down the available names for objectify... 
> I do end up constructing a factory frequently. The above features > would allow me to construct a sub element factory that uses namespace > prefixes. We could then play with the feel of this and see whether we > can eventually move such a factory into the core (and in what form). We could add an experimental factory module for the alpha phase of 2.0. Just a few functions that people could try out and that could be folded back into etree. BTW, the above also looks much more efficient in Pyrex code. What about a factory with a fixed namespace, one which inherits the namespace, one which sets the text in one step, ... Stefan From stefan_ml at behnel.de Wed Aug 22 20:29:56 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 22 Aug 2007 20:29:56 +0200 Subject: [lxml-dev] objectify factories In-Reply-To: <46C945BF.9020805@behnel.de> References: <46C945BF.9020805@behnel.de> Message-ID: <46CC80A4.7010607@behnel.de> Stefan Behnel wrote: > Regarding the TypedElementMaker, I think that if we write one that is adapted > to objectify, we should not stop half-way. We should remove the "typemap" > thing and just use the type inference mechanisms that objectify already > provides. You can take a look into that, if you want, otherwise I will try to > come up with an implementation when I find the time (which may be after the > release of 2.0alpha1). ... or a bit before :) This code is what I think might work well for objectify. Any general comments before I go any further? 
Stefan

cdef class ElementMaker:
    cdef object _makeelement
    cdef object _namespace
    cdef object _nsmap
    def __init__(self, namespace=None, nsmap=None, makeelement=None):
        self._nsmap = nsmap
        if namespace is None:
            self._namespace = None
        else:
            self._namespace = "{%s}" % namespace
        if makeelement is not None:
            assert callable(makeelement)
            self._makeelement = makeelement
        else:
            self._makeelement = None

    def __getattr__(self, tag):
        if tag[0] != "{" and self._namespace is not None:
            tag = self._namespace + tag
        return _ObjectifyElementMakerCaller(
            self._makeelement, tag, self._nsmap)

cdef class _ObjectifyElementMakerCaller:
    cdef object _tag
    cdef object _nsmap
    cdef object _element_factory
    def __init__(self, element_factory, tag, nsmap):
        self._element_factory = element_factory
        self._tag = tag
        self._nsmap = nsmap

    def __call__(self, *children, **attrib):
        cdef _ObjectifyElementMakerCaller elementMaker
        cdef python.PyObject* pytype
        cdef _Element element
        if self._element_factory is None:
            element = cetree.makeElement(
                self._tag, None, objectify_parser, None, None,
                attrib, self._nsmap)
        else:
            element = self._element_factory(self._tag, attrib, self._nsmap)

        for child in children:
            if child is None:
                if len(children) == 1:
                    cetree.setAttributeValue(
                        element, XML_SCHEMA_INSTANCE_NIL_ATTR, "true")
            elif python._isString(child):
                _add_text(element, child)
            elif isinstance(child, _Element):
                cetree.appendChild(element, child)
            elif isinstance(child, _ObjectifyElementMakerCaller):
                elementMaker = <_ObjectifyElementMakerCaller>child
                if elementMaker._element_factory is None:
                    child = cetree.makeElement(
                        elementMaker._tag, element._doc, objectify_parser,
                        None, None, None, None)
                else:
                    child = elementMaker._element_factory(
                        (<_ObjectifyElementMakerCaller>child)._tag)
                cetree.appendChild(element, child)
            else:
                pytype = python.PyDict_GetItem(
                    _PYTYPE_DICT, _typename(child))
                if pytype is not NULL:
                    (pytype)._stringify(element, child)
                else:
                    child = str(child)
                    _add_text(element, child)
        return element

From sidnei at
enfoldsystems.com Wed Aug 22 23:43:56 2007 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Wed, 22 Aug 2007 18:43:56 -0300 Subject: [lxml-dev] Possible 'freeing null pointer' problem with 'document()' function Message-ID: Hi there, Just a heads up in case anyone is facing similar issues. I have written an Apache output filter using lxml and it has been working fine so far, but after some changes to an xsl file it started crashing Apache. I've tracked down the issue to two variable definitions that were using the 'document()' function to include an external xml file. I wrote a script that performs the same transformation outside of Apache, and surprisingly it works just fine. The only difference is that the script resolves to a file in the filesystem, and the filter in Apache uses urlopen(). I'm testing this on Windows, and when the crash happens and I select debug the current line is in free() and the previous item in the stack is etree.pyd, but I couldn't get a debug build going to see where exactly in etree.pyd. This happens with both lxml 1.2.1 and 1.3.3, using libxml2 2.6.28 and libxslt 1.1.19. Any clues appreciated... -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From felwert at uni-bremen.de Thu Aug 23 10:10:05 2007 From: felwert at uni-bremen.de (Frederik Elwert) Date: Thu, 23 Aug 2007 10:10:05 +0200 Subject: [lxml-dev] Context of custom XPath functions Message-ID: <1187856605.6349.1.camel@FredDesk> Hello! Currently, I'm trying to implement some of the XForms functions with lxml. The most urgent one is instance(), which gets the instance data from an XForms model. But if you're not familiar with XForms, don't bother, just think of id() or something similar: With id(), it is necessary to access the context of the function call, i.e., the tree in which it is called. id() is supposed to return an element node (or node-set) matching the given argument.
So id() has to know about the tree, and not only about its given argument. So how would one implement id() or any similar function with lxml? As far as I got with custom functions, they only can handle the arguments they get passed directly, but they don't know about the broader context. Any suggestions how I could solve this? Thanks in advance, Frederik From stefan_ml at behnel.de Thu Aug 23 12:42:56 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 23 Aug 2007 12:42:56 +0200 Subject: [lxml-dev] Context of custom XPath functions In-Reply-To: <1187856605.6349.1.camel@FredDesk> References: <1187856605.6349.1.camel@FredDesk> Message-ID: <46CD64B0.5090705@behnel.de> Hi, Frederik Elwert wrote: > Currently, I'm trying to implement some of the XForms functions with > lxml. The most urgent one is instance(), which gets the instance data > from an XForms model. But if you're not familiar with XForms, don't > bother, just think of id() or something similar: > > With id(), it is necessary to access the context of the function call, > i.e., the tree in which it is called. id() is supposed to return an > element node (or node-set) matching the given argument. So id() has to > know about the tree, and not only about its given argument. I assume that instance() does not get any nodes as parameter. Otherwise you could call their getroottree() to retrieve the ElementTree of the current document. > So how would one implement id() or any similar function with lxml? As > far as I got with custom functions, they only can handle the arguments > they get passed directly, but they don't know about the broader context. > Any suggestions how I could solve this? One way would be to define the functions local to each XPath call and provide them with the necessary context yourself. If you want a more global solution, you may have noticed that functions receive a (currently empty) context object as first parameter.
Maybe we should make that a real context object or a dictionary that includes the reference to the current document or its root node? Stefan From felwert at uni-bremen.de Thu Aug 23 13:15:54 2007 From: felwert at uni-bremen.de (Frederik Elwert) Date: Thu, 23 Aug 2007 13:15:54 +0200 Subject: [lxml-dev] Context of custom XPath functions In-Reply-To: <46CD64B0.5090705@behnel.de> References: <1187856605.6349.1.camel@FredDesk> <46CD64B0.5090705@behnel.de> Message-ID: <1187867754.6349.22.camel@FredDesk> Hi, On Thursday, 23.08.2007, at 12:42 +0200, Stefan Behnel wrote: > Frederik Elwert wrote: > > Currently, I'm trying to implement some of the XForms functions with > > lxml. The most urgent one is instance(), which gets the instance data > > from an XForms model. But if you're not familiar with XForms, don't > > bother, just think of id() or something similar: > > > > With id(), it is necessary to access the context of the function call, > > i.e., the tree in which it is called. id() is supposed to return an > > element node (or node-set) matching the given argument. So id() has to > > know about the tree, and not only about its given argument. > > I assume that instance() does not get any nodes as parameter. Otherwise you > could call their getroottree() to retrieve the ElementTree of the current > document. Right, instance() gets a string as parameter, if any. So this doesn't work. > > > So how would one implement id() or any similar function with lxml? As > > far as I got with custom functions, they only can handle the arguments > > they get passed directly, but they don't know about the broader context. > > Any suggestions how I could solve this? > > One way would be to define the functions local to each XPath call and provide > them with the necessary context yourself. I'm not sure I really understand how one would do this. But it sounds interesting, so could you give an example or further reference?
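One way to read the suggestion quoted above (functions local to each XPath call, given their context by the caller) is a closure passed through the `extensions` argument of `etree.XPath`. A rough sketch, assuming a made-up document layout and a stand-in `instance()` that simply looks elements up by their `id` attribute; it is not the XForms definition of instance(), and it needs lxml installed.

```python
from lxml import etree


def make_instance_function(root):
    """Build an XPath extension function that closes over the given root element."""
    def instance(context, instance_id):
        # 'context' is the object lxml passes as first argument; it is not used
        # here because the closure already knows the document root.
        return [el for el in root.iter("instance") if el.get("id") == instance_id]
    return instance


# Made-up document layout, for illustration only.
root = etree.XML(
    '<model>'
    '<instance id="a"><x>1</x></instance>'
    '<instance id="b"><x>2</x></instance>'
    '</model>'
)

# The function is registered for this one XPath object only, not globally.
find_x = etree.XPath(
    "instance('b')/x/text()",
    extensions={(None, "instance"): make_instance_function(root)},
)
result = find_x(root)
```

The drawback is that a new function object has to be created for every document, which is the reason a real context object would be the more global solution.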
I'm interested in a solution that I could get to work with current lxml, if possible. > If you want a more global solution, you may have noticed that functions > receive a (currently empty) context object as first parameter. Maybe we should > make that a real context object or a dictionary that includes the reference to > the current document or its root node? I already thought about that, too. This would be great! Maybe a dict is a good idea, since one could add further context information in the future. I'm not sure, if the document itself or the root node would be the best choice. Another XForms function, current(), "Returns the context node used to initialize the evaluation of the containing XPath expression." . So maybe this information is most useful in general. One knows which element is the context for the xpath-function, and from this, one can get the doc and root by getroottree() etc. And the context node seems to be introduces quite well in XPath, as I just read in the XPath spec . The introduction defines the context of XPath expressions as: * a node (the context node) * a pair of non-zero positive integers (the context position and the context size) * a set of variable bindings * a function library * the set of namespace declarations in scope for the expression So maybe this could be used as a reference for what to pass in a context dict. Function library and namespaces are present, anyway, so they man not be needed. Is this the case with XSLT-variables? I would guess so. So a context dict might contain the context node, and, if useful for anyone, context position and size. But whatever you find most practical, I really like the idea of context information for XPath functions! Regards, Frederik -- Frederik Elwert, M.A. Feldstr. 
79A 28203 Bremen 0421.277 85 30 ICQ# 255-031-612 JabberID freedo at jabber.bettercom.de From stefan_ml at behnel.de Thu Aug 23 13:33:27 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 23 Aug 2007 13:33:27 +0200 Subject: [lxml-dev] Possible 'freeing null pointer' problem with 'document()' function In-Reply-To: References: Message-ID: <46CD7087.7060501@behnel.de> Hi Sidnei, Sidnei da Silva wrote: > Just a heads up in case anyone is facing similar issues. > > I have written a Apache output filter using lxml and it has been > working fine so far, but after some changes to a xsl file it started > crashing Apache. I've tracked down the issue to two variable > definitions that were using the 'document()' function to include a > external xml file. > > select="document('../vocabularies/geographic_vocabularies.xml')"/> > select="document('../vocabularies/workspace_vocabularies.xml')"/> > > I wrote a script that performs the same transformation outside of > Apache, and surprisingly it works just fine. The only difference being > that the script resolves to a file in the filesystem, and the filter > in Apache uses urlopen(). > > I'm testing this on Windows, and when the crash happens and I select > debug the current line is in free() and the previous item in the stack > is etree.pyd, but I couldn't get a debug build going to see where > exactly in etree.pyd. > > This happens with both lxml 1.2.1 and 1.3.3, using libxml2 2.6.28 and > libxslt 1.1.19. > > Any clues appreciated... Hmmm, I need more detail here. Any chance you could send me a working (i.e. crashing) example of acceptable size so that I could run it through valgrind? 
Stefan From sidnei at enfoldsystems.com Thu Aug 23 15:28:04 2007 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Thu, 23 Aug 2007 10:28:04 -0300 Subject: [lxml-dev] Possible 'freeing null pointer' problem with 'document()' function In-Reply-To: <46CD7087.7060501@behnel.de> References: <46CD7087.7060501@behnel.de> Message-ID: On 8/23/07, Stefan Behnel wrote: > Hmmm, I need more detail here. Any chance you could send me a working (i.e. > crashing) example of acceptable size so that I could run it through valgrind? Yeah, that's the trick ;( For now I can only reproduce it inside Apache, but I've witnessed crashes both on Windows and on Linux. The code is not sharing parser or xslt objects between threads that I can spot. In fact all the processing is happening on one single thread, on the very first request. I've just got libxml2 and libxslt upgraded to newer versions (2.6.29 and 1.1.21) on my Linux box and will give it a last try there to see if the problem goes away. -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From stefan_ml at behnel.de Thu Aug 23 15:43:22 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 23 Aug 2007 15:43:22 +0200 Subject: [lxml-dev] Possible 'freeing null pointer' problem with 'document()' function In-Reply-To: References: Message-ID: <46CD8EFA.405@behnel.de> Sidnei da Silva wrote: > I wrote a script that performs the same transformation outside of > Apache, and surprisingly it works just fine. The only difference being > that the script resolves to a file in the filesystem, and the filter > in Apache uses urlopen(). Hmm, are you passing the file name straight into etree in the script? Because the code for parsing from a file-like object is very different from the code that parses from a filename. You're using Python resolvers? 
Stefan From sidnei at enfoldsystems.com Thu Aug 23 15:49:52 2007 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Thu, 23 Aug 2007 10:49:52 -0300 Subject: [lxml-dev] Possible 'freeing null pointer' problem with 'document()' function In-Reply-To: <46CD8EFA.405@behnel.de> References: <46CD8EFA.405@behnel.de> Message-ID: On 8/23/07, Stefan Behnel wrote: > > Sidnei da Silva wrote: > > I wrote a script that performs the same transformation outside of > > Apache, and surprisingly it works just fine. The only difference being > > that the script resolves to a file in the filesystem, and the filter > > in Apache uses urlopen(). > > Hmm, are you passing the file name straight into etree in the script? Because > the code for parsing from a file-like object is very different from the code > that parses from a filename. > > You're using Python resolvers? Yes, sorry. Let me be more explicit. Both the script and the Apache filter are using urlopen() in a Python resolver. The only difference is that the filter is passing http:// urls into the Python resolver -> urlopen(), and the script is passing file:// urls into the Python resolver -> urlopen(). urlopen() then returns a file-like object, but I'm reading that into a StringIO object which is then returned with resolve_file(). -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From stefan_ml at behnel.de Thu Aug 23 21:59:11 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 23 Aug 2007 21:59:11 +0200 Subject: [lxml-dev] Context of custom XPath functions In-Reply-To: <1187867754.6349.22.camel@FredDesk> References: <1187856605.6349.1.camel@FredDesk> <46CD64B0.5090705@behnel.de> <1187867754.6349.22.camel@FredDesk> Message-ID: <46CDE70F.1010603@behnel.de> Hi, Frederik Elwert wrote: > On Thursday, 23.08.2007, at 12:42 +0200, Stefan Behnel wrote: >> Frederik Elwert wrote: >>> So how would one implement id() or any similar function with lxml?
As >>> far as I got with custom functions, they only can handle the arguments >>> they get passed directly, but they don't know about the broader context. >>> Any suggestions how I could solve this? >> One way would be to define the functions local to each XPath call and provide >> them with the necessary context yourself. > > I'm not sure I really understand how one would do this. But it sounds > interesting, so could you give an example or further reference? I'm > interested in a solution that I could get to work with current lxml, if > possible. There is an older API in lxml.etree that supports per-call extension definitions. It's not very well documented, but you can pass a list of [ {(ns, name):function} ] dicts as "extensions" kw arg into the constructor of XPath evaluators. It should work with any 1.x version of lxml.etree. >> If you want a more global solution, you may have noticed that functions >> receive a (currently empty) context object as first parameter. Maybe we should >> make that a real context object or a dictionary that includes the reference to >> the current document or its root node? > > I already thought about that, too. This would be great! Maybe a dict is > a good idea, since one could add further context information in the > future. > > I'm not sure if the document itself or the root node would be the best > choice. Another XForms function, current(), "Returns the context node > used to initialize the evaluation of the containing XPath expression." > So maybe this information > is most useful in general. One knows which element is the context for > the xpath-function, and from this, one can get the doc and root by > getroottree() etc. > > And the context node seems to be introduced quite well in XPath, as I > just read in the XPath spec.
The > introduction defines the context of XPath expressions as: > > * a node (the context node) > * a pair of non-zero positive integers (the context position and > the context size) > * a set of variable bindings > * a function library > * the set of namespace declarations in scope for the expression > > So maybe this could be used as a reference for what to pass in a context > dict. Function library and namespaces are present, anyway, so they may > not be needed. Is this the case with XSLT-variables? I would guess so. > So a context dict might contain the context node, and, if useful for > anyone, context position and size. > > But whatever you find most practical, I really like the idea of context > information for XPath functions! Hmmm, having the context node available is obviously desirable, but it's not a straightforward thing. Internally, lxml.etree does loads of C-ish stuff to make sure the underlying C-tree stays consistent and allocated as long as there are Python references to it. When we pass an Element for the current context node, we may end up passing a node that is not part of the current document (in XSLT, for example). It might even be a temporary node, which is not linked to any of the documents that lxml.etree takes care of. So I think this might get us into a lot of trouble if people start keeping a reference to that node to work with it outside the context of the current function call. Even deallocation might just crash in the case of a temporary node. It already starts with the context itself. Allowing people to keep a reference to the context to make it live outside of the function call is like pushing around crash bugs. Ok, most people will not do this, but it happens. I will really, really have to take a deep look into this before I consider this a good idea.
Stefan From felwert at uni-bremen.de Thu Aug 23 23:10:28 2007 From: felwert at uni-bremen.de (Frederik Elwert) Date: Thu, 23 Aug 2007 23:10:28 +0200 Subject: [lxml-dev] Context of custom XPath functions In-Reply-To: <46CDE70F.1010603@behnel.de> References: <1187856605.6349.1.camel@FredDesk> <46CD64B0.5090705@behnel.de> <1187867754.6349.22.camel@FredDesk> <46CDE70F.1010603@behnel.de> Message-ID: <1187903428.6349.50.camel@FredDesk> Hi, On Thursday, 23.08.2007, at 21:59 +0200, Stefan Behnel wrote: > There is an older API in lxml.etree that supports per-call extension > definitions. It's not very well documented, but you can pass a list of > > [ {(ns, name):function} ] > > dicts as "extensions" kw arg into the constructor of XPath evaluators. It > should work with any 1.x version of lxml.etree. Ok, thank you. I just played a bit with this, and I came up with a custom element class that provides these functions, so they can access the context information. In addition to the xpath method, they provide a new "xfpath" method:

    class XFormsElement(ET.ElementBase):

        def instance(self, _, idref):
            tree = self.getroottree()
            namespaces = {'xf': 'http://www.w3.org/2002/xforms'}
            result = tree.xpath('//xf:instance[@id="%s"]' % idref, namespaces)
            return(result)

        def xfpath(self, expr):
            extensions = {(None, 'instance'): self.instance}
            return self.xpath(expr, extensions=extensions)

    myelement.xfpath("instance('scales')")

I'm not sure if this is really the best solution, but it works. > Hmmm, having the context node available is obviously desirable, but it's not a > straight forward thing. Internally, lxml.etree does loads of C-ish stuff to > make sure the underlying C-tree stays consistent and allocated as long as > there are Python references to it. When we pass an Element for the current > context node, we may end up passing a node that is not part of the current > document (in XSLT, for example).
It might even be a temporary node, which is > not linked to any of the documents that lxml.etree takes care of. So I think > this might get us into a lot of trouble if people start keeping a reference to > that node to work with it outside the context of the current function call. > Even deallocation might just crash in the case of a temporary node. > > It already starts with the context itself. Allowing people to keep a reference > to the context to make it live outside of the function call is like pushing > around crash bugs. Ok, most people will not do this, but it happens. > > I will really, really have to take a deep look into this before I consider > this a good idea. Ok, I understand. I don't want to get you into trouble... :-) I think it would be practical to have that context information, but it seems not to be needed too often - otherwise the topic might have come up earlier. So maybe it's just not worth the trouble. Regards, Frederik From stefan_ml at behnel.de Fri Aug 24 00:05:18 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 24 Aug 2007 00:05:18 +0200 Subject: [lxml-dev] Context of custom XPath functions In-Reply-To: <1187867754.6349.22.camel@FredDesk> References: <1187856605.6349.1.camel@FredDesk> <46CD64B0.5090705@behnel.de> <1187867754.6349.22.camel@FredDesk> Message-ID: <46CE049E.3070209@behnel.de> Hi, I investigated this a little. It looks like such a context is not as hard to implement as I first thought. We can just throw an exception for the dangerous cases for now, so that we can see if people actually use them. And since the pointer to the libxml2 XPath context is cleared before returning from the function, we can determine the case where users keep the context alive and just throw an exception as well. Frederik Elwert wrote: > * a node (the context node) We can instantiate the context Element if it's from the current document, that should not do any harm. 
> * a pair of non-zero positive integers (the context position and > the context size) libxml2 makes them available, but since I don't quite get what they are used for, I'll leave these out for now. (Feel free to convince me that they are needed). > * a set of variable bindings We can get these from libxml2's XPath context by traversing a hash table. However, I'm not sure how these behave for XSLT, so I'll also just leave these out for now. > * a function library Who cares, really? Just call another XPath expression if you really need to access an XPath function. > * the set of namespace declarations in scope for the expression That's just the nsmap of the context node. libxml2's XPath context also has a pointer for the value of the here() function, maybe that's interesting, too. Ok, so I think it would be enough for now to provide the context with a property "context_node" and maybe with a call-local dictionary that allows functions to keep state. Would that be enough for you? BTW, this is definitely lxml 2.0 stuff, it won't go into lxml 1.3. Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: function-xpath-context.patch Type: text/x-diff Size: 3596 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070824/dc6fc48a/attachment.bin From stefan_ml at behnel.de Fri Aug 24 09:41:21 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 24 Aug 2007 09:41:21 +0200 Subject: [lxml-dev] Context of custom XPath functions In-Reply-To: <1187903428.6349.50.camel@FredDesk> References: <1187856605.6349.1.camel@FredDesk> <46CD64B0.5090705@behnel.de> <1187867754.6349.22.camel@FredDesk> <46CDE70F.1010603@behnel.de> <1187903428.6349.50.camel@FredDesk> Message-ID: <46CE8BA1.1050700@behnel.de> Hi, Frederik Elwert wrote: > Am Donnerstag, den 23.08.2007, 21:59 +0200 schrieb Stefan Behnel: >> There is an older API in lxml.etree that supports per-call extension >> definitions. 
It's not very well documented, but you can pass a list of >> >> [ {(ns, name):function} ] >> >> dicts as "extensions" kw arg into the constructor of XPath evaluators. It >> should work with any 1.x version of lxml.etree. It actually /is/ documented, sorry: http://codespeak.net/lxml/extensions.html#evaluator-local-extensions (It's been a while since I wrote these docs, so I keep forgetting what's in there - and what isn't :) > I just played a bit with this, and I came up with a custom element class > that provides these functions, so they can access the context > information. In addition to the xpath method, they provide a new > "xfpath" method:
>
> class XFormsElement(ET.ElementBase):
>
>     def instance(self, _, idref):
>         tree = self.getroottree()
>         namespaces = {'xf': 'http://www.w3.org/2002/xforms'}
>         result = tree.xpath('//xf:instance[@id="%s"]' % idref, namespaces)
>         return(result)
>
>     def xfpath(self, expr):
>         extensions = {(None, 'instance'): self.instance}
>         return self.xpath(expr, extensions=extensions)
>
> myelement.xfpath("instance('scales')")
>
> I'm not sure if this is really the best solution, but it works. Well, it's maybe not the most efficient thing ever, as it requires re-parsing the XPath expression each time. But it works with most lxml versions and solves your problem, which is worth more than peak performance IMHO. I also like how simple it is, BTW. I'll also commit the context stuff for lxml 2.0, I think that really is a helpful addition.
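For readers following the thread, the general idea of giving a function its document context yourself can be sketched with a plain closure. This is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml.etree, it does not go through lxml's "extensions" mechanism at all, and make_instance_lookup and the sample document are hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET

XF_NS = "http://www.w3.org/2002/xforms"


def make_instance_lookup(root):
    """Return an instance(idref) function bound to one document root.

    Hypothetical helper: the closure carries the tree, so the caller
    only has to pass the string ID, as with the XForms instance() call.
    """
    tag = "{%s}instance" % XF_NS

    def instance(idref):
        # Compare the attribute in Python rather than building a query
        # string, which avoids quoting problems in the ID value.
        return [el for el in root.iter(tag) if el.get("id") == idref]

    return instance


# Small made-up document with two xf:instance elements.
doc = ET.fromstring(
    '<model xmlns="http://www.w3.org/2002/xforms">'
    '<instance id="scales"><data/></instance>'
    '<instance id="other"/>'
    '</model>'
)

instance = make_instance_lookup(doc)
matches = instance("scales")
```

The bound method self.instance in the class above plays the same role as the closure here: in both cases the function reaches the tree through state it carries itself, not through its arguments.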
Stefan From felwert at uni-bremen.de Fri Aug 24 09:58:07 2007 From: felwert at uni-bremen.de (Frederik Elwert) Date: Fri, 24 Aug 2007 09:58:07 +0200 Subject: [lxml-dev] Context of custom XPath functions In-Reply-To: <46CE049E.3070209@behnel.de> References: <1187856605.6349.1.camel@FredDesk> <46CD64B0.5090705@behnel.de> <1187867754.6349.22.camel@FredDesk> <46CE049E.3070209@behnel.de> Message-ID: <1187942287.7450.9.camel@FredDesk> Hi, On Friday, 24.08.2007, at 00:05 +0200, Stefan Behnel wrote: > I investigated this a little. It looks like such a context is not as hard to > implement as I first thought. We can just throw an exception for the dangerous > cases for now, so that we can see if people actually use them. And since the > pointer to the libxml2 XPath context is cleared before returning from the > function, we can determine the case where users keep the context alive and > just throw an exception as well. That sounds good! > > Frederik Elwert wrote: > > * a node (the context node) > > We can instantiate the context Element if it's from the current document, that > should not do any harm. That's the major point, I think. > > * a pair of non-zero positive integers (the context position and > > the context size) > > libxml2 makes them available, but since I don't quite get what they are used > for, I'll leave these out for now. (Feel free to convince me that they are > needed). No, I totally agree. I just wanted to see how XPath defines the context here. But I wouldn't know for what to use this information. Same for the next points... > > > * a set of variable bindings > > We can get these from libxml2's XPath context by traversing a hash table. > However, I'm not sure how these behave for XSLT, so I'll also just leave these > out for now. > > > > * a function library > > Who cares, really? Just call another XPath expression if you really need to > access an XPath function.
> > > > * the set of namespace declarations in scope for the expression > > That's just the nsmap of the context node. > > libxml2's XPath context also has a pointer for the value of the here() > function, maybe that's interesting, too. Erm, I just have no idea what the here() function provides, so I can say that I don't need it for now... :-) > Ok, so I think it would be enough for now to provide the context with a > property "context_node" and maybe with a call-local dictionary that allows > functions to keep state. > > Would that be enough for you? Definitely. Thanks for all the investigations. > BTW, this is definitely lxml 2.0 stuff, it won't go into lxml 1.3. Sure. I have a working solution for now. But I really think this is a good and powerful thing to have, so I'll be looking forward to lxml 2.0! Thanks again, Frederik From sidnei at enfoldsystems.com Fri Aug 24 21:01:30 2007 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Fri, 24 Aug 2007 16:01:30 -0300 Subject: [lxml-dev] Possible 'freeing null pointer' problem with 'document()' function In-Reply-To: References: <46CD8EFA.405@behnel.de> Message-ID: Ok, I didn't finish setting up my environment on Linux (Parallels behaving badly today). But I've found some information about mod_python and C-based Python extensions that seems relevant. The Xapian folks pointed me to this piece of documentation, which gives hints about using Xapian with mod_python and mentions issues about multiple interpreters and acquiring locks. http://svn.xapian.org/trunk/xapian-bindings/python/docs/bindings.html?r1=9201&r2=9202&view=patch Using the trick described there, setting 'PythonInterpreter main_interpreter' solves my crash, but leaves me wondering how I can write a reproducible test case without having more knowledge about multiple interpreters. Is that enough information for you to work with, Stefan? Maybe it's a known issue that just needs to be documented?
Or is it a real bug involving Python resolvers and multiple interpreters? -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From sidnei at enfoldsystems.com Fri Aug 24 21:16:28 2007 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Fri, 24 Aug 2007 16:16:28 -0300 Subject: [lxml-dev] Possible 'freeing null pointer' problem with 'document()' function In-Reply-To: References: <46CD8EFA.405@behnel.de> Message-ID: FYI, Mark Hammond described a solution for the multiple interpreter issue here: http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=185 That issue mentions deadlocks, my guess is that the crash I experienced is the result of a *missing* lock. But that's only a guess. :) -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214 From stefan_ml at behnel.de Fri Aug 24 23:40:41 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 24 Aug 2007 23:40:41 +0200 Subject: [lxml-dev] Possible 'freeing null pointer' problem with 'document()' function In-Reply-To: References: <46CD8EFA.405@behnel.de> Message-ID: <46CF5059.3070904@behnel.de> Hi Sidnei, Sidnei da Silva wrote: > FYI, Mark Hammond described a solution for the multiple interpreter issue here: > > http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=185 > > That issue mentions deadlocks, my guess is that the crash I > experienced is the result of a *missing* lock. But that's only a > guess. :) That's actually quite possible. lxml.etree tries to find a reasonable trade-off between expensive locking and cheap telling people not to do certain stuff. As threading is rare in the Python world, I assume that most people won't use threading (or whatever kind of interpreter concurrency) anyway, so making that trade-off is worth it. Currently, lxml.etree uses locks around parsers, XPath evaluators and Element instantiation. 
The first two need locking as they reuse their contexts for performance reasons, and the Element factory needs to make sure Element proxy objects represent a libxml2 node exclusively. There may well be a few other places that require locking when people really use them concurrently. BTW, have you tried passing the "--without-threading" option to setup.py? That would give you a build that avoids releasing the GIL completely, and might also solve your problem. I would also like to see your original setup run with the current trunk, which features a rewrite of the XPath function code, including a few threading fixes. Another thing: I'm planning to switch lxml from Pyrex to Cython sometime next week and I would like to get some feedback then if the generated code compiles nicely under MSVC before releasing 2.0alpha1. Otherwise, I might have to fix Cython to make it work. Luckily, the Cython maintainers are much more open to patches than Greg... Cython can be easy_installed, so I'm just waiting for the next official release to let lxml depend on it and drop the svn:external on our patched Pyrex version. Stefan From felwert at uni-bremen.de Sun Aug 26 17:13:27 2007 From: felwert at uni-bremen.de (Frederik Elwert) Date: Sun, 26 Aug 2007 15:13:27 +0000 Subject: [lxml-dev] Keep track of file URL in Elements Message-ID: <1188141207.6369.34.camel@FredDesk> Hello! XForms turns out to be quite challenging... Sorry to bother you again. I need to access a document given by a link in an attribute. When using XSLT, this would be a typical use case for document(). But when not using XSLT, I need to mimic its behaviour, since document() doesn't work in ordinary XPath expressions. So I have something like this in an XML document: Since the link is relative to the document's location, and not to the current working directory, I need to get some absolute path for this. But the information for the document's location is only stored in the ElementTree initially generated by etree.parse().
The descendants know neither about the document's location, nor about their "parent" ElementTree, as far as I could see. So would it somehow be possible to get the document's URL information in a custom xpath function? The only thing I could think of is searching for ElementTrees using dir(), seeing if the current element is in that tree using getpath(), and then using the tree's docinfo. But that's really ugly. Maybe another possibility would be to use a custom Element class and add an attribute containing this information to each element while parsing the XML file. But (a) I'm not sure if this is possible at all, and (b) all elements would have a new, arbitrary attribute that I would have to get rid of before writing the tree back. So maybe one of you has an idea how this could be done using lxml? Kind regards, Frederik From stefan_ml at behnel.de Sun Aug 26 17:22:02 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 26 Aug 2007 17:22:02 +0200 Subject: [lxml-dev] Keep track of file URL in Elements In-Reply-To: <1188141207.6369.34.camel@FredDesk> References: <1188141207.6369.34.camel@FredDesk> Message-ID: <46D19A9A.4070402@behnel.de> Frederik Elwert wrote: > XForms turns out to be quite challenging... Sorry to bother you again. > > I need to access a document given by a link in an attribute. When using > XSLT, this would be a typical use case for document(). But when not using > XSLT, I need to mimic its behaviour, since document() doesn't work in > ordinary XPath expressions. > > So I have something like this in an XML document: > > > > Since the link is relative to the document's location, and not to the > current working directory, I need to get some absolute path for this. > But the information for the document's location is only stored in the > ElementTree initially generated by etree.parse(). The descendants know > neither about the document's location, nor about their "parent" > ElementTree, as far as I could see.
Doesn't element.getroottree() do what you want? Stefan From jholg at gmx.de Mon Aug 27 12:39:21 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Mon, 27 Aug 2007 12:39:21 +0200 Subject: [lxml-dev] objectify factories In-Reply-To: <46C945BF.9020805@behnel.de> References: <46C945BF.9020805@behnel.de> Message-ID: <20070827103921.301120@gmx.net> Hi, back from the ever-too-short holidays... > I finally looked a little closer at the latest patches you sent me. They > contain good ideas, but I'm still not very comfortable with the > implementation. I think the PT() factory should be folded into > DataElement(), as it's just a special case. I attached a patch that merges a part of your > latest factory patch and removes the need for the PT factory. Please check if > it does what you wanted and if anything is still missing. I'm having a bit of trouble applying the patch, as some of it seems to already be in the trunk now, so maybe I'm mistaken, but: How does this remove the need for PT(), which uses the python type name of its argument as pytype? Wouldn't folding this into DataElement() change DataElement behaviour significantly, which currently just operates on the string literal type-lookup? > Regarding the TypedElementMaker, I think that if we write one that is adapted > to objectify, we should not stop half-way. We should remove the "typemap" > thing and just use the type inference mechanisms that objectify already > provides. You can take a look into that, if you want, otherwise I will try to > come up with an implementation when I find the time (which may be after the > release of 2.0alpha1). I saw you already posted an implementation, so I'll have a look at that. Cheers, Holger From jholg at gmx.de Mon Aug 27 12:53:00 2007 From: jholg at gmx.de (jholg at gmx.de) Date: Mon, 27 Aug 2007 12:53:00 +0200 Subject: [lxml-dev] should _setElementValue add type attributes? In-Reply-To: <46BDEDFC.3000207@behnel.de> References: <46AEE8CD.1030002@behnel.de> <20070731135020.172290@gmx.net> <46BDEDFC.3000207@behnel.de> Message-ID: <20070827105300.301110@gmx.net> Hi, I discussed this with Stefan before and I'm anxious to know if this is the way to go (maybe as switchable behaviour), removing the need for a beast like the discussed PT() factory, as well as making type behaviour arguably more "straightforward", at the cost of auto-adding py:pytype attributes:

    # _setElementValue implementation that auto-adds type(RVAL).__name__ as
    # py:pytype
    cdef _setElementValue(_Element element, value):
        if value is None:
            cetree.setAttributeValue(
                element, XML_SCHEMA_INSTANCE_NIL_ATTR, "true")
        elif isinstance(value, _Element):
            _replaceElement(element, value)
        else:
            cetree.delAttributeFromNsName(
                element._c_node, _XML_SCHEMA_INSTANCE_NS, "nil")
            if not python._isString(value):
                pytype_name = type(value).__name__
                if isinstance(value, bool):
                    value = _lower_bool(value)
                else:
                    value = str(value)
            else:
                pytype_name = "str"
            cetree.setAttributeValue(element, PYTYPE_ATTRIBUTE, pytype_name)
            cetree.setNodeText(element._c_node, value)

I'm +1 for that. By making it switchable we could cater for those who don't care about the types that much but who do not want to see any non-explicitly created attributes. Holger From stefan_ml at behnel.de Mon Aug 27 12:59:43 2007 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 27 Aug 2007 12:59:43 +0200 Subject: [lxml-dev] objectify factories In-Reply-To: <20070827103921.301120@gmx.net> References: <46C945BF.9020805@behnel.de> <20070827103921.301120@gmx.net> Message-ID: <46D2AE9F.40105@behnel.de> Hi Holger, jholg at gmx.de wrote: > back from the ever-too-short holidays... welcome back to work. :) >> I finally looked a little closer at the latest patches you sent me. They >> contain good ideas, but I'm still not very comfortable with the >> implementation. I think the PT() factory should be folded into >> DataElement(), as it's just a special case. I attached a patch that >> merges a part of your latest factory patch and removes the need for the >> PT factory. Please check if it does what you wanted and if anything is >> still missing. > > I'm having a bit of trouble applying the patch, as some of it seems to already be in > the trunk now Sorry, it already is in the trunk. > so maybe I'm mistaken, but: How does this remove the need for > PT(), which uses the python type name of its argument as pytype? Wouldn't > folding this into DataElement() change DataElement behaviour significantly, > which currently just operates on the string literal type-lookup? Ok, I was mistaken. I just applied the bit that special-cased ObjectifyDataElements that were passed into DataElement() ... and even that might need some rework. In the case where you pass an ODE *and* a _pytype, you'd have to convert the value to a string and process it with the normal machinery. You're right, PT() solves a different purpose. I think it makes sense to add it. (I know, I keep changing my mind here, but it really looks like a helpful little factory). >> Regarding the TypedElementMaker, I think that if we write one that is >> adapted to objectify, we should not stop half-way.
>> We should remove the "typemap" thing and just use the type inference
>> mechanisms that objectify already provides. You can take a look into
>> that, if you want, otherwise I will try to come up with an implementation
>> when I find the time (which may be after the release of 2.0alpha1).
>
> I saw you already posted an implementation, so I'll have a look at that.

Thanks.

Stefan

From stefan_ml at behnel.de Mon Aug 27 13:08:12 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 27 Aug 2007 13:08:12 +0200
Subject: [lxml-dev] should _setElementValue add type attributes?
In-Reply-To: <20070827105300.301110@gmx.net>
References: <46AEE8CD.1030002@behnel.de> <20070731135020.172290@gmx.net>
 <46BDEDFC.3000207@behnel.de> <20070827105300.301110@gmx.net>
Message-ID: <46D2B09C.7060207@behnel.de>

jholg at gmx.de wrote:
> I discussed this with Stefan before and I'm anxious to know if this is the
> way to go (maybe as switchable behaviour), removing the need for a beast like
> the discussed PT() factory, as well as making type behaviour arguably more
> "straightforward", at the cost of auto-adding py:pytype attributes:
[_setElementValue implementation that auto-adds type(RVAL).__name__ as py:pytype]
> I'm +1 for that.

Actually, you were the one who proposed it in the first place, so there's
nothing to add to. :)

> By making it switchable we could cater for those who don't
> care about the types that much but who do not want to see any non-explicitly
> created attributes.

I dislike the idea of adding a switch here. We already add pytype attributes
in a couple of places, so people who do not like it will have to deannotate()
their XML anyway (or not use objectify...).

I think that always adding a pytype will give us more predictable behaviour.
On the other hand, we could just check if the pytype the type inference
mechanism returns is the type of the value, and only add the attribute if that
is not the case. What do you think?
It would not work if you exchange annotated data with other machines that use
different setups, but if you do that, you'd probably annotate everything by
hand anyway.

Stefan

From jholg at gmx.de Mon Aug 27 13:29:01 2007
From: jholg at gmx.de (jholg at gmx.de)
Date: Mon, 27 Aug 2007 13:29:01 +0200
Subject: [lxml-dev] objectify factories
In-Reply-To: <46D2AE9F.40105@behnel.de>
References: <46C945BF.9020805@behnel.de> <20070827103921.301120@gmx.net>
 <46D2AE9F.40105@behnel.de>
Message-ID: <20070827112901.301100@gmx.net>

> welcome back to work. :)

Thanks!

> You're right, PT() solves a different purpose. I think it makes sense to
> add it. (I know, I keep changing my mind here, but it really looks like a
> helpful little factory).

Maybe we don't need it after all if general behaviour changes to auto-add
python type name of RVALs.

Holger

From jholg at gmx.de Mon Aug 27 13:43:10 2007
From: jholg at gmx.de (jholg at gmx.de)
Date: Mon, 27 Aug 2007 13:43:10 +0200
Subject: [lxml-dev] should _setElementValue add type attributes?
In-Reply-To: <46D2B09C.7060207@behnel.de>
References: <46AEE8CD.1030002@behnel.de> <20070731135020.172290@gmx.net>
 <46BDEDFC.3000207@behnel.de> <20070827105300.301110@gmx.net>
 <46D2B09C.7060207@behnel.de>
Message-ID: <20070827114310.301090@gmx.net>

Hi,

> > I discussed this with Stefan before and I'm anxious to know if this is
> > the way to go (maybe as switchable behaviour), removing the need for a
> > beast like the discussed PT() factory, as well as making type behaviour
> > arguably more "straightforward", at the cost of auto-adding py:pytype
> > attributes:
> [_setElementValue implementation that auto-adds type(RVAL).__name__ as
> py:pytype]
> > I'm +1 for that.
>
> Actually, you were the one who proposed it in the first place, so there's
> nothing to add to.
> :)

Yes, but I admit I was unsure then if this muddies the API by making

>>> root = objectify.Element("root")
>>> root.x = "3"

behave differently from

>>> root = objectify.fromstring("""3""")

Kind of losing sort of a symmetry. But then again, we actually *do* have more
information in the first case, namely the python type, so we should use it.
Now I think that practicality beats purity here.

> > By making it switchable we could cater for those who don't
> > care about the types that much but who do not want to see any
> > non-explicitly created attributes.
>
> I dislike the idea of adding a switch here. We already add pytype
> attributes in a couple of places, so people who do not like it will have
> to deannotate() their XML anyway (or not use objectify...).

Right, there's also TREE attributes and stuff.

> I think that always adding a pytype will give us more predictable
> behaviour.
> On the other hand, we could just check if the pytype the type inference
> mechanism returns is the type of the value, and only add the attribute if
> that is not the case. What do you think? It would not work if you exchange
> annotated data with other machines that use different setups, but if you
> do that, you'd probably annotate everything by hand anyway.

I'd rather always add the pytype, then. I just think this is simpler. And if
you want to exchange data with other machines, better xsiannotate() to fall
back to XML standard types, or deannotate() and rely on type inference.

Holger

From stefan_ml at behnel.de Mon Aug 27 19:36:14 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 27 Aug 2007 19:36:14 +0200
Subject: [lxml-dev] should _setElementValue add type attributes?
In-Reply-To: <20070827114310.301090@gmx.net>
References: <46AEE8CD.1030002@behnel.de> <20070731135020.172290@gmx.net>
 <46BDEDFC.3000207@behnel.de> <20070827105300.301110@gmx.net>
 <46D2B09C.7060207@behnel.de> <20070827114310.301090@gmx.net>
Message-ID: <46D30B8E.3050407@behnel.de>

jholg at gmx.de wrote:
> I admit I was unsure then if this muddies the API by making
>
>>>> root = objectify.Element("root")
>>>> root.x = "3"
> behave differently from
>>>> root = objectify.fromstring("""3""")
>
> Kind of losing sort of a symmetry.

I can't see much of a symmetry there anyway. I'm more concerned about putting
in "3" and getting back the number 3, than putting in "3" and getting back a
number. The latter sounds natural to me.

>> I think that always adding a pytype will give us more predictable
>> behaviour.
>> On the other hand, we could just check if the pytype the type inference
>> mechanism returns is the type of the value, and only add the attribute if
>> that is not the case. What do you think? It would not work if you exchange
>> annotated data with other machines that use different setups, but if you
>> do that, you'd probably annotate everything by hand anyway.
>
> I'd rather always add the pytype, then. I just think this is simpler. And
> if you want to exchange data with other machines, better xsiannotate() to
> fall back to XML standard types, or deannotate() and rely on type inference.

Sure. So be it. :)

(... for lxml 2.0, that is)

Stefan

From stefan_ml at behnel.de Mon Aug 27 19:49:38 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 27 Aug 2007 19:49:38 +0200
Subject: [lxml-dev] should _setElementValue add type attributes?
In-Reply-To: <20070827114310.301090@gmx.net>
References: <46AEE8CD.1030002@behnel.de> <20070731135020.172290@gmx.net>
 <46BDEDFC.3000207@behnel.de> <20070827105300.301110@gmx.net>
 <46D2B09C.7060207@behnel.de> <20070827114310.301090@gmx.net>
Message-ID: <46D30EB2.6060100@behnel.de>

jholg at gmx.de wrote:
>>>> root = objectify.Element("root")
>>>> root.x = "3"
> behave differently from
>>>> root = objectify.fromstring("""3""")
>
> Kind of losing sort of a symmetry.

What bothers me more (and where I do see a symmetry) is:

>>> root = objectify.fromstring("true")

>>> # now
>>> root.flag
True
>>> root.flag = "true"
>>> root.flag
True

>>> # then
>>> root.flag
True
>>> root.flag = "true"
>>> root.flag
'true'

I'm not sure what to think about that. It would be wrong to special case it,
but it kinda feels wrong the way it would work in the future...

Stefan

From jholg at gmx.de Mon Aug 27 23:03:00 2007
From: jholg at gmx.de (jholg at gmx.de)
Date: Mon, 27 Aug 2007 23:03:00 +0200
Subject: [lxml-dev] should _setElementValue add type attributes?
In-Reply-To: <46D30EB2.6060100@behnel.de>
References: <46AEE8CD.1030002@behnel.de> <20070731135020.172290@gmx.net>
 <46BDEDFC.3000207@behnel.de> <20070827105300.301110@gmx.net>
 <46D2B09C.7060207@behnel.de> <20070827114310.301090@gmx.net>
 <46D30EB2.6060100@behnel.de>
Message-ID: <20070827210300.321980@gmx.net>

Hi Stefan,

> jholg at gmx.de wrote:
> >>>> root = objectify.Element("root")
> >>>> root.x = "3"
> > behave differently from
> >>>> root = objectify.fromstring("""3""")
> >
> > Kind of losing sort of a symmetry.
>
> What bothers me more (and where I do see a symmetry) is:
>
> >>> root = objectify.fromstring("true")
>
> >>> # now
> >>> root.flag
> True
> >>> root.flag = "true"
> >>> root.flag
> True
>
> >>> # then
> >>> root.flag
> True
> >>> root.flag = "true"
> >>> root.flag
> 'true'
>
> I'm not sure what to think about that. It would be wrong to special case
> it, but it kinda feels wrong the way it would work in the future...
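
The "now" and "then" sessions quoted above can be approximated without lxml.
What follows is only a minimal sketch on top of the standard library's
xml.etree.ElementTree. The attribute name "pytype" and the helpers infer(),
set_value() and get_value() are hypothetical stand-ins chosen for this
illustration; they are not objectify's API or its implementation.

```python
import xml.etree.ElementTree as ET

PYTYPE = "pytype"  # hypothetical attribute name standing in for py:pytype

# Converters for the type names this sketch treats as "registered".
PARSERS = {"bool": lambda t: t == "true", "int": int, "float": float, "str": str}


def infer(text):
    """Guess a Python value from an XML text literal (type inference)."""
    if text in ("true", "false"):
        return text == "true"
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def set_value(element, value, annotate):
    """Store value as text; if annotate is set, record type(value).__name__."""
    if isinstance(value, bool):
        element.text = "true" if value else "false"
    else:
        element.text = str(value)
    if annotate:
        element.set(PYTYPE, type(value).__name__)
    else:
        element.attrib.pop(PYTYPE, None)


def get_value(element):
    """Prefer a stored, known type name; otherwise fall back to inference."""
    name = element.get(PYTYPE)
    if name in PARSERS:
        return PARSERS[name](element.text)
    return infer(element.text)


flag = ET.fromstring("<flag>true</flag>")
parsed = get_value(flag)                 # no annotation, so inference decides

set_value(flag, "true", annotate=False)  # "now": the string is re-inferred
now = get_value(flag)

set_value(flag, "true", annotate=True)   # "then": the str type is remembered
then = get_value(flag)

print(repr(parsed), repr(now), repr(then))
```

In this sketch the only difference between the two behaviours is whether the
setter records the Python type name; reading a value back is the same code in
both cases.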

Hm, not for me (any more :). I think this is just the same case as having a
literal 3 in the XML document. When parsing XML from a string or a file with
no type information whatsoever, there is really only 2 things we can do:

1. Make strings of everything.
2. Use type-inference provided by the lookup mechanisms.

(1) does not make much sense as we would not really need objectify at all
(except for the syntactic sugar of its __setattr__-API).

On the other hand, when setting elements by hand, i.e. in Python code, we
well know the (python-)type information: For me, it begins to rather feel
more natural to do:

>>> # then
>>> root.flag = True # real live python boolean object
>>> root.flag
True
>>> root.flag.text
"true"

instead of

>>> # now
>>> root.flag = "true"
>>> root.flag
True

which is, in the end, pretty much the same as

>>> # now
>>> root.three = "3"
>>> root.three
3

So, let's go for the auto-pytype-addition in _setElementValue, without
special-casing, imo.

Holger

From stefan_ml at behnel.de Tue Aug 28 09:16:57 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 28 Aug 2007 09:16:57 +0200
Subject: [lxml-dev] should _setElementValue add type attributes?
In-Reply-To: <20070827210300.321980@gmx.net>
References: <46AEE8CD.1030002@behnel.de> <20070731135020.172290@gmx.net>
 <46BDEDFC.3000207@behnel.de> <20070827105300.301110@gmx.net>
 <46D2B09C.7060207@behnel.de> <20070827114310.301090@gmx.net>
 <46D30EB2.6060100@behnel.de> <20070827210300.321980@gmx.net>
Message-ID: <46D3CBE9.2080109@behnel.de>

Hi Holger,

jholg at gmx.de wrote:
> For me, it begins to rather feel more natural to do:
>>>> # then
>>>> root.flag = True # real live python boolean object
>>>> root.flag
> True
>>>> root.flag.text
> "true"
>
> instead of
>>>> # now
>>>> root.flag = "true"
>>>> root.flag
> True
>
> which is, in the end, pretty much the same as
>>>> # now
>>>> root.three = "3"
>>>> root.three
> 3
>
> So, let's go for the auto-pytype-addition in _setElementValue, without
> special-casing, imo.

Fine, no special casing here.

One more thing, though: we shouldn't store Python type hints that were not
registered as their instantiation wouldn't work anyway. So I added a lookup
before the attribute setter call.

So, the new rules are:

- what you put in comes back out (as long as the type is registered)

- for non-annotated XML data, type inference is used to determine the return
  type (which may be ambiguous in some cases).

Simple enough, I'd say.

Stefan

From mantegazza at ill.fr Thu Aug 30 11:18:18 2007
From: mantegazza at ill.fr (Frédéric Mantegazza)
Date: Thu, 30 Aug 2007 11:18:18 +0200
Subject: [lxml-dev] ERESTARTNOHAND error
Message-ID: <200708301118.19261.mantegazza@ill.fr>

Hello,

Some months ago, I migrated some part of our code from libxml2/libxslt
modules to the lxml module (which is much more pythonic!). But my code never
worked, because of a seg. fault :o(

The crash occurs at:

    result = self.__xslFilter.apply(tree, {'filename':"'%s'" % baseName})

This code is used inside a Pyro server (Pyro is a client/server framework.
see http://pyro.sourceforge.net), so in a multi-threaded environment.

I ran the server using strace to see what happens: I got an ERESTARTNOHAND
error. This error is very strange, and Google does not give useful
information...

I tested my code outside of Pyro: it works. I made a simple Pyro server with
just this part of the code: it works too. It only crashes when using the
real server, which makes a lot of multi-threaded calls.

Another guy had a similar problem, but the error was in the python mysql
client wrapper, only under stressed usage.

It seems that there is a problem in either lxml or pymysql when used in a
stressed multi-threaded way. I use locks in my server to avoid re-entrant
calls to lxml, so I'm sure the problem is not there; I feel that it is
related to the number of threads running...

Does anyone have an idea about this problem?

Thanks,

PS: I also use lxml from mod_python, and all is working fine.

-- 
   Frédéric

From stefan_ml at behnel.de Thu Aug 30 12:28:12 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 30 Aug 2007 12:28:12 +0200
Subject: [lxml-dev] lxml 1.3.4 released
Message-ID: <46D69BBC.4070701@behnel.de>

Hi all,

I just released lxml 1.3.4 to PyPI. It has a minor bug fix and a few
compatibility enhancements both backwards and forwards. Changelog follows.

One of the changes is that the exception that 1.3.3 raised for tag names
containing a ':' was changed into a warning. lxml 2.0 will be strict here.

Have fun,
Stefan


1.3.4 (2007-08-30)

Features added

* The ElementMaker in lxml.builder now accepts the keyword arguments
  namespace and nsmap to set a namespace and nsmap for the Elements it
  creates.

* The docinfo on ElementTree objects has new properties internalDTD and
  externalDTD that return a DTD object for the internal or external subset
  of the document respectively.

* Serialising an ElementTree now includes any internal DTD subsets that are
  part of the document, as well as comments and PIs that are siblings of the
  root node.

Bugs fixed

* Parsing with the no_network option could fail

Other changes

* lxml now raises a TagNameWarning about tag names containing ':' instead of
  an Error as 1.3.3 did. The reason is that a number of projects currently
  misuse the previous lack of tag name validation to generate namespace
  prefixes without declaring namespaces. Apart from the danger of generating
  broken XML this way, it also breaks most of the namespace-aware tools in
  XML, including XPath, XSLT and validation. lxml 1.3.x will continue to
  support this bug with a Warning, while lxml 2.0 will be strict about
  well-formed tag names (not only regarding ':').

* Serialising an Element no longer includes its comment and PI siblings
  (only ElementTree serialisation includes them).

From jholg at gmx.de Thu Aug 30 13:44:48 2007
From: jholg at gmx.de (jholg at gmx.de)
Date: Thu, 30 Aug 2007 13:44:48 +0200
Subject: [lxml-dev] objectify factories
In-Reply-To: <46CC80A4.7010607@behnel.de>
References: <46C945BF.9020805@behnel.de> <46CC80A4.7010607@behnel.de>
Message-ID: <20070830114448.280910@gmx.net>

Hi,

> Stefan Behnel wrote:
> > Regarding the TypedElementMaker, I think that if we write one that is
> > adapted to objectify, we should not stop half-way. We should remove the
> > "typemap" thing and just use the type inference mechanisms that
> > objectify already provides. You can take a look into that, if you want,
> > otherwise I will try to come up with an implementation when I find the
> > time (which may be after the release of 2.0alpha1).
>
> ... or a bit before :)
>
> This code is what I think might work well for objectify. Any general
> comments before I go any further?
>
> Stefan
> [...]

I finally looked at the new ElementMaker implementation, and it works just
fine for me. Attached patch adds tests for it (essentially the very same I
already had for the initial implementation), plus some small tests that
cover DataElement() "none" vs "NoneType" name compatibility measures.

However, this ElementMaker does not add type annotation, as the
TypedElementMaker I proposed at some point did. Question: Do we want/need a
TypedElementMaker? I'd say yes, otherwise the E-Factory isn't very useful
for someone who wants "strong typing".

Also, I think it might make sense to have a PT() factory after all, just to
add the possibility to hand in attributes:

>>> root.x = 3
>>> root.x.set("foo", "bar")
>>> # shortcut
>>> root.x = PT(3, foo="bar")

What do you say?

Holger
-------------- next part --------------
A non-text attachment was scrubbed...
Name: efactory_none_compat_tests.patch
Type: application/octet-stream
Size: 4738 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070830/0016d779/attachment.obj

From stefan_ml at behnel.de Thu Aug 30 14:45:55 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 30 Aug 2007 14:45:55 +0200
Subject: [lxml-dev] objectify factories
In-Reply-To: <20070830114448.280910@gmx.net>
References: <46C945BF.9020805@behnel.de> <46CC80A4.7010607@behnel.de>
 <20070830114448.280910@gmx.net>
Message-ID: <46D6BC03.4050103@behnel.de>

Hi Holger,

jholg at gmx.de wrote:
> I finally looked at the new ElementMaker implementation, and it works just
> fine for me. Attached patch adds tests for it (essentially the very same I
> already had for the initial implementation), plus some small tests that
> cover DataElement() "none" vs "NoneType" name compatibility measures.
>
> However, this ElementMaker does not add type annotation, as the
> TypedElementMaker I proposed at some point did. Question: Do we want/need a
> TypedElementMaker? I'd say yes, otherwise the E-Factory isn't very useful
> for someone who wants "strong typing".

Given the current behaviour of _setElementValue(), I'd say it should just go
and annotate everything it produces.

> Also, I think it might make sense to have a PT() factory after all, just to
> add the possibility to hand in attributes:
>
> >>> root.x = 3
> >>> root.x.set("foo", "bar")
> >>> # shortcut
> >>> root.x = PT(3, foo="bar")

DataElement can add attributes for you.

Stefan

From sidnei at enfoldsystems.com Thu Aug 30 14:53:40 2007
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Thu, 30 Aug 2007 09:53:40 -0300
Subject: [lxml-dev] lxml 1.3.4 released
In-Reply-To: <46D69BBC.4070701@behnel.de>
References: <46D69BBC.4070701@behnel.de>
Message-ID:

Hi Stefan,

Just in case you didn't notice yet, the download link for 1.3.4 on
http://codespeak.net/lxml is broken.

-- 
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214

From sidnei at enfoldsystems.com Thu Aug 30 15:10:04 2007
From: sidnei at enfoldsystems.com (Sidnei da Silva)
Date: Thu, 30 Aug 2007 10:10:04 -0300
Subject: [lxml-dev] lxml 1.3.4 released
In-Reply-To: <46D69BBC.4070701@behnel.de>
References: <46D69BBC.4070701@behnel.de>
Message-ID:

Windows binaries for Python 2.4 and 2.5 are now available.

sitting-here-at-a-PyConBrasil-talk-ly,

-- 
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214

From jholg at gmx.de Thu Aug 30 16:27:30 2007
From: jholg at gmx.de (jholg at gmx.de)
Date: Thu, 30 Aug 2007 16:27:30 +0200
Subject: [lxml-dev] objectify factories
In-Reply-To: <46D6BC03.4050103@behnel.de>
References: <46C945BF.9020805@behnel.de> <46CC80A4.7010607@behnel.de>
 <20070830114448.280910@gmx.net> <46D6BC03.4050103@behnel.de>
Message-ID: <20070830142730.280890@gmx.net>

> Given the current behaviour of _setElementValue(), I'd say it should just
> go and annotate everything it produces.

Meaning an additional TypedElementMaker, right? I think it is actually nice
to have the not-annotating ElementMaker as a choice.
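
The choice between an annotating and a not-annotating element factory could
look roughly like the sketch below. It is written against the standard
library's xml.etree.ElementTree, not against lxml, and everything in it is an
assumption made for illustration: the class name SketchElementMaker, the
"annotate" keyword and the plain "pytype" attribute are hypothetical and do
not describe the ElementMaker or TypedElementMaker discussed in this thread.

```python
import xml.etree.ElementTree as ET

PYTYPE = "pytype"  # hypothetical stand-in for objectify's py:pytype attribute


def _to_text(value):
    """Serialise a Python value as element text (lower-case booleans)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


class SketchElementMaker:
    """Hypothetical E-factory: maker.tag(child_or_value, ..., attr="x")."""

    def __init__(self, annotate=True):
        self._annotate = annotate

    def __getattr__(self, tag):
        # Only reached for names that are not real attributes, i.e. tag names.
        if tag.startswith("_"):
            raise AttributeError(tag)

        def make(*children, **attrib):
            element = ET.Element(tag, attrib)
            for child in children:
                if isinstance(child, ET.Element):
                    element.append(child)
                else:
                    # A data value: store its text, optionally with its type.
                    element.text = _to_text(child)
                    if self._annotate:
                        element.set(PYTYPE, type(child).__name__)
            return element

        return make


typed = SketchElementMaker(annotate=True)
plain = SketchElementMaker(annotate=False)

typed_root = typed.root(typed.x(3), typed.flag(True))
plain_root = plain.root(plain.x(3, foo="bar"))

print(ET.tostring(typed_root, encoding="unicode"))
print(ET.tostring(plain_root, encoding="unicode"))
```

A single factory with a switch keeps both use cases in one class; whether
that is preferable to two separate factories is exactly the open question in
the surrounding messages.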

> > Also, I think it might make sense to have a PT() factory after all, just
> > to add the possibility to hand in attributes:
> >
> > >>> root.x = 3
> > >>> root.x.set("foo", "bar")
> > >>> # shortcut
> > >>> root.x = PT(3, foo="bar")
>
> DataElement can add attributes for you.

Yes it does but not the same way. DataElement() always uses the literal of
the RVAL to infer the type, if not explicitly given, i.e. it does not make
use of the python type of an RVAL. PT() otoh does work just like the new
__setattr__/_setElementValue.

Maybe modify DataElement() instead of introducing PT(), then?

Holger

From stefan_ml at behnel.de Thu Aug 30 21:33:29 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 30 Aug 2007 21:33:29 +0200
Subject: [lxml-dev] ERESTARTNOHAND error
In-Reply-To: <200708301118.19261.mantegazza@ill.fr>
References: <200708301118.19261.mantegazza@ill.fr>
Message-ID: <46D71B89.1050401@behnel.de>

Salut!

Frédéric Mantegazza wrote:
> Some months ago, I migrated some part of our code from libxml2/libxslt
> modules to the lxml module (which is much more pythonic!). But my code
> never worked, because of a seg. fault :o(
>
> The crash occurs at:
>
>     result = self.__xslFilter.apply(tree, {'filename':"'%s'" % baseName})
>
> This code is used inside a Pyro server (Pyro is a client/server framework.
> see http://pyro.sourceforge.net), so in a multi-threaded environment.
>
> I ran the server using strace to see what happens: I got an ERESTARTNOHAND
> error. This error is very strange, and Google does not give useful
> information...
>
> I tested my code outside of Pyro: it works. I made a simple Pyro server
> with just this part of the code: it works too. It only crashes when using
> the real server, which makes a lot of multi-threaded calls.
>
> Another guy had a similar problem, but the error was in the python mysql
> client wrapper, only on stressed usage.
>
> It seems that there is a problem in either lxml or pymysql when used in a
> stressed multi-threaded way. I use locks in my server to avoid re-entrant
> calls to lxml, so I'm sure the problem is not there; I feel that it is
> related to the number of threads running...
>
> Does anyone have an idea about this problem?

Which version of lxml are you using? Could you try with 1.3.4?

You can also build lxml from sources and pass the "--without-threading"
option, which might work for you.

Stefan

From gregwillden at gmail.com Fri Aug 31 06:09:52 2007
From: gregwillden at gmail.com (Greg Willden)
Date: Thu, 30 Aug 2007 23:09:52 -0500
Subject: [lxml-dev] _elementpath.pyc and py2exe problem
Message-ID: <903323ff0708302109r9d29a57s85b4911856f95346@mail.gmail.com>

Hi All,

This is probably a question for py2exe developers but I'll ask it here too.

I am trying to build a win32 .exe file with py2exe and it appears that
py2exe cannot locate _elementpath.pyc correctly. The file does not get
copied into the library.zip file. I can manually (or programmatically with
python's zipfile module) add _elementpath.pyc to the zip file and then the
application works fine.

Has anyone experienced this? Any ideas why?

Thanks
Greg

-- 
Linux. Because rebooting is for adding hardware.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20070830/70b0d861/attachment-0001.htm

From stefan_ml at behnel.de Fri Aug 31 09:42:57 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 31 Aug 2007 09:42:57 +0200
Subject: [lxml-dev] _elementpath.pyc and py2exe problem
In-Reply-To: <903323ff0708302109r9d29a57s85b4911856f95346@mail.gmail.com>
References: <903323ff0708302109r9d29a57s85b4911856f95346@mail.gmail.com>
Message-ID: <46D7C681.6070706@behnel.de>

Hi,

Greg Willden wrote:
> This is probably a question for py2exe developers but I'll ask it here too.
>
> I am trying to build a win32 .exe file with py2exe and it appears that
> py2exe cannot locate _elementpath.