From herve.cauwelier at free.fr Wed Sep 2 18:05:19 2009 From: herve.cauwelier at free.fr (Hervé Cauwelier) Date: Wed, 02 Sep 2009 18:05:19 +0200 Subject: [lxml-dev] copying an element will "reset" is nsmap Message-ID: <4A9E97BF.2070009@free.fr> Hi, I have another unexpected behaviour. I have a root element with many namespaces declared. When I introspect the first child of the root, the nsmap is full. When I "deepcopy" it, only the namespace used in the tag name is kept. If the name is not qualified, nsmap turns empty. (Sorry if I don't use the right XML terminology.) I tried to reassign nsmap but the attribute is read-only. How could I preserve the nsmap attribute of unattached elements? Regards, Hervé -- lxml.etree: (2, 2, 2, 0) libxml used: (2, 7, 3) libxml compiled: (2, 7, 3) libxslt used: (1, 1, 24) libxslt compiled: (1, 1, 24) From howesteve at gmail.com Wed Sep 2 21:19:13 2009 From: howesteve at gmail.com (Steve Howe) Date: Wed, 2 Sep 2009 16:19:13 -0300 Subject: [lxml-dev] Python 3.1 binary on Windows? In-Reply-To: <1n6jo1parl8po.iyhfdc2pakhw.dlg@40tude.net> References: <1n6jo1parl8po.iyhfdc2pakhw.dlg@40tude.net> Message-ID: <200909021619.13985.howesteve@gmail.com> Hello all, > Hi, > > maybe it's just me being stupid for overlooking something, but are there > Windows binaries built for Python 3.1 out there? Shouldn't this message be on a Python-specific list? Or at *least* marked as "OFF-TOPIC"? That said, you can choose between the official download on the Python page: http://www.python.org/download/ ... or ActivePython's distribution: http://www.activestate.com/activepython/python3/ -- Best Regards, Steve Howe From stefan_ml at behnel.de Wed Sep 2 21:32:33 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 02 Sep 2009 21:32:33 +0200 Subject: [lxml-dev] Python 3.1 binary on Windows? 
In-Reply-To: <200909021619.13985.howesteve@gmail.com> References: <1n6jo1parl8po.iyhfdc2pakhw.dlg@40tude.net> <200909021619.13985.howesteve@gmail.com> Message-ID: <4A9EC851.4090104@behnel.de> Steve Howe wrote: >> maybe it's just me being stupid for overlooking something, but are there >> Windows binaries built for Python 3.1 out there? > Shouldn't this message be on Python-specific list ? Or at *least* marked as > "OFF-TOPIC" ? I would guess from the context of this mailing list that the OP meant binary builds of lxml for that platform. I think that would be a question for Sidnei. Stefan From stefan_ml at behnel.de Wed Sep 2 21:39:28 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 02 Sep 2009 21:39:28 +0200 Subject: [lxml-dev] copying an element will "reset" is nsmap In-Reply-To: <4A9E97BF.2070009@free.fr> References: <4A9E97BF.2070009@free.fr> Message-ID: <4A9EC9F0.4030504@behnel.de> Hervé Cauwelier wrote: > Hi, I have another unexpected behaviour. > > I have a root element with many namespaces declared. When I introspect > the first child of the root, the nsmap is full. When I "deepcopy" it, > only the the namespace used in the tag name is kept. If the name is not > qualified, nsmap turns empty. (Sorry if I don't use the right XML semantic.) > > I tried to reaffect nsmap but the attribute is read-only. > > How could I preserve the nsmap attribute of unattached elements? My guess is that the namespaces end up being declared either on the new document (libxml2) node, or at the element where they are first used in the copied tree. If the latter, there's not that much that lxml can do about it (I guess). If the former, we might be able to fake something by adding the document-wide namespace declarations to the .nsmap of the root element. That might be a nice feature anyway. Could you check if the namespaces end up in .nsmap dicts at deeper tree levels or not? 
(a short example script would be helpful, BTW) Stefan From stefan_ml at behnel.de Wed Sep 2 21:40:14 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 02 Sep 2009 21:40:14 +0200 Subject: [lxml-dev] getname() method on 'smart' attribute string values In-Reply-To: <4A932658.9000408@gmail.com> References: <4A932658.9000408@gmail.com> Message-ID: <4A9ECA1E.5000004@behnel.de> Nicholas Dudfield wrote: >> You can give it a try on the trunk, if you like. >> >> https://codespeak.net/viewvc/?view=rev&revision=67010 >> >> http://codespeak.net/lxml/build.html >> >> Stefan >> > I also have need for this functionality and also a bugfix from a > revision ahead of the stable version 2.2.2 available for windows. > > I heard libxml2/lxml is a PITA to build on windows so being > inexperienced I'll not bother attempting it before ruling out alternatives. > > Is there a build bot with dist zips of the latest revisions available > anywhere for windows ? Not that I know of. Stefan From stefan_ml at behnel.de Wed Sep 2 21:45:29 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 02 Sep 2009 21:45:29 +0200 Subject: [lxml-dev] current trunk includes static build for libiconv In-Reply-To: <4A912D83.9080407@urheberrecht.org> References: <4A85C9B8.1070809@behnel.de> <4A912D83.9080407@urheberrecht.org> Message-ID: <4A9ECB59.9010802@behnel.de> Pascal Oberndoerfer wrote: > I copied the new 'buildlibxml.py' into a clean lxml-2.2.2 directory and > started a build with '-- static-deps'. 
Everything seems to work fine > (libiconv, libxml2, and libxslt build nicely) except form some minor > errors like: > > - 'make[3]: [install-data-local] Error 71 (ignored)' > - 'make[2]: [xsltproc.html] Error 4 (ignored)' > > Unfortunately -- after installing -- I get this ImportError on doing > 'import lxml.etree': > >> ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.5/ >> lib/python2.5/site-packages/lxml-2.2.2-py2.5-macosx-10.3-ppc.egg/lxml/ >> etree.so, 2): Symbol not found: _libiconv_close >> >> Referenced from: /Library/Frameworks/Python.framework/Versions/2.5/ >> lib/python2.5/site-packages/lxml-2.2.2-py2.5-macosx-10.3-ppc.egg/lxml/ >> etree.so >> >> Expected in: dynamic lookup Could you post the command line options of the calls to gcc that distutils print during the build? > As the problem is AFAICT only related to the PPC platform > (and if running MacOS X 10.4.x?), would it make sense to > build libiconv statically only > > 'if platform.processor() == 'powerpc' and major_version == 8:'? > > This could possibly help avoid any side effects on Intel or 10.5 > systems. Just a thought... I doubt that there are any side effects. By now, I actually think it's better to provide a really static build with all external dependencies, rather than relying on things being set up by the system itself. If you request a static build, you get one, that's all. 
Stefan From stefan_ml at behnel.de Wed Sep 2 21:46:58 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 02 Sep 2009 21:46:58 +0200 Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X In-Reply-To: <7911b3bb0908130516x21c02613xeaecf2a79b064f36@mail.gmail.com> References: <4A8197DE.70800@simplistix.co.uk> <4A82C4AA.4090601@simplistix.co.uk> <4A82D4E2.9030606@behnel.de> <4A82E081.6090707@simplistix.co.uk> <4A82E747.5030708@behnel.de> <7911b3bb0908130516x21c02613xeaecf2a79b064f36@mail.gmail.com> Message-ID: <4A9ECBB2.7090702@behnel.de> Gael Pasgrimaud wrote: > On Wed, Aug 12, 2009 at 6:01 PM, Stefan Behnel wrote: >> we have binaries for 10.5: >> >> http://pypi.python.org/pypi/lxml/2.2.2 > > I still dont understand why my OSX 10.5 always want to compile lxml. To be honest, I have no idea. But that's definitely a distutils/setuptools/easyinstall thing, not an lxml problem. Stefan From sidnei at enfoldsystems.com Wed Sep 2 21:49:22 2009 From: sidnei at enfoldsystems.com (Sidnei da Silva) Date: Wed, 2 Sep 2009 16:49:22 -0300 Subject: [lxml-dev] Python 3.1 binary on Windows? In-Reply-To: <4A9EC851.4090104@behnel.de> References: <1n6jo1parl8po.iyhfdc2pakhw.dlg@40tude.net> <200909021619.13985.howesteve@gmail.com> <4A9EC851.4090104@behnel.de> Message-ID: On Wed, Sep 2, 2009 at 4:32 PM, Stefan Behnel wrote: > > Steve Howe wrote: >>> maybe it's just me being stupid for overlooking something, but are there >>> Windows binaries built for Python 3.1 out there? >> Shouldn't this message be on Python-specific list ? Or at *least* marked as >> "OFF-TOPIC" ? > > I would guess from the context of this mailing list that the OP meant > binary builds of lxml for that platform. > > I think that would be a question for Sidnei. Uhm. Yeah. I thought I had those but I have only 3.0. When is the next release of lxml due? If it will take a while I can upload a 3.1 build of lxml 2.2.2. 
-- Sidnei From ted at milo.com Wed Sep 2 21:27:07 2009 From: ted at milo.com (Ted Dziuba) Date: Wed, 2 Sep 2009 12:27:07 -0700 Subject: [lxml-dev] Python 3.1 binary on Windows? In-Reply-To: <200909021619.13985.howesteve@gmail.com> References: <1n6jo1parl8po.iyhfdc2pakhw.dlg@40tude.net> <200909021619.13985.howesteve@gmail.com> Message-ID: <6451ccbf0909021227g7aa8bcfeoa265bb2bfb8dfc23@mail.gmail.com> I believe he's looking for pre-built lxml binaries that target Python 3.1 on Windows. On Wed, Sep 2, 2009 at 12:19 PM, Steve Howe wrote: > Hello all, > > Hi, > > > > maybe it's just me being stupid for overlooking something, but are there > > Windows binaries built for Python 3.1 out there? > Shouldn't this message be on Python-specific list ? Or at *least* marked as > "OFF-TOPIC" ? > > That said, you can choose between the official download on the Python page: > > http://www.python.org/download/ > > ... or ActivePython's distribution: > > http://www.activestate.com/activepython/python3/ > > -- > Best Regards, > Steve Howe > -- Ted Dziuba Co-Founder and Engineer Milo.com, Inc. 165 University Avenue Palo Alto, CA, 94301 http://milo.com Cell: (609)-665-2639 From stefan_ml at behnel.de Wed Sep 2 21:58:40 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 02 Sep 2009 21:58:40 +0200 Subject: [lxml-dev] Python 3.1 binary on Windows? 
In-Reply-To: References: <1n6jo1parl8po.iyhfdc2pakhw.dlg@40tude.net> <200909021619.13985.howesteve@gmail.com> <4A9EC851.4090104@behnel.de> Message-ID: <4A9ECE70.1000003@behnel.de> Sidnei da Silva wrote: > On Wed, Sep 2, 2009 at 4:32 PM, Stefan Behnel wrote: >> Steve Howe wrote: >>>> maybe it's just me being stupid for overlooking something, but are there >>>> Windows binaries built for Python 3.1 out there? >>> Shouldn't this message be on Python-specific list ? Or at *least* marked as >>> "OFF-TOPIC" ? >> I would guess from the context of this mailing list that the OP meant >> binary builds of lxml for that platform. >> >> I think that would be a question for Sidnei. > > Uhm. Yeah. I thought I had those but I have only 3.0. When is the next > release of lxml due? I'd like to get a 2.2.3 ready soon and a 2.3 at about the same time, but there's no deadline. > If it will take a while I can upload a 3.1 build of lxml 2.2.2. Please do, that will certainly help some users. The main problem on 3.x is still the lack of available external (binary) packages. Stefan From herve.cauwelier at free.fr Thu Sep 3 10:56:28 2009 From: herve.cauwelier at free.fr (Hervé Cauwelier) Date: Thu, 03 Sep 2009 10:56:28 +0200 Subject: [lxml-dev] copying an element will "reset" is nsmap In-Reply-To: <4A9EC9F0.4030504@behnel.de> References: <4A9E97BF.2070009@free.fr> <4A9EC9F0.4030504@behnel.de> Message-ID: <4A9F84BC.3060902@free.fr> Stefan Behnel wrote: > Could you check if the namespaces end up in .nsmap dicts at deeper tree > levels or not? (a short example script would be helpful, BTW) Here is a console session: http://bpaste.net/show/34/ I guess the latter hypothesis is the right one. Couldn't you copy the nsmap along with the other properties, in the __copy__ method? Hervé 
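For readers following the thread, here is a minimal sketch of the behaviour described above. The namespace URIs and element names are made up for illustration (the original console session is only linked), and the comments describe what the thread reports rather than guaranteed output:

```python
from copy import deepcopy

from lxml import etree

# Hypothetical document: prefixes "a" and "b" are declared on the root only.
root = etree.fromstring(
    '<root xmlns:a="urn:example:a" xmlns:b="urn:example:b">'
    '<a:child><plain/></a:child>'
    '</root>'
)

child = root[0]   # qualified tag name, uses prefix "a"
plain = child[0]  # unqualified tag name

# While attached, both elements see every declaration made by their ancestors.
print(child.nsmap)
print(plain.nsmap)

# After deepcopy, each copy lives in a fresh document and only carries the
# declarations that its own subtree needs.
child_copy = deepcopy(child)
plain_copy = deepcopy(plain)
print(child_copy.nsmap)  # only the namespace of the tag name is expected
print(plain_copy.nsmap)  # expected to be empty

# .nsmap cannot be assigned on an existing element.
try:
    plain_copy.nsmap = {"b": "urn:example:b"}
except AttributeError as exc:
    print("read-only:", exc)
```

The copies are not broken: they simply no longer inherit declarations from ancestors they were detached from.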
From stefan_ml at behnel.de Thu Sep 3 11:46:29 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 03 Sep 2009 11:46:29 +0200 Subject: [lxml-dev] copying an element will "reset" is nsmap In-Reply-To: <4A9F84BC.3060902@free.fr> References: <4A9E97BF.2070009@free.fr> <4A9EC9F0.4030504@behnel.de> <4A9F84BC.3060902@free.fr> Message-ID: <4A9F9075.8010307@behnel.de> Hervé Cauwelier wrote: > Stefan Behnel wrote: >> Could you check if the namespaces end up in .nsmap dicts at deeper tree >> levels or not? (a short example script would be helpful, BTW) > > Here is a console session: > http://bpaste.net/show/34/ Copying this here:
>>> from lxml import etree
>>> root = etree.fromstring(''' ... ''')
>>> root.nsmap
{'a': 'a', 'c': 'c', 'b': 'b'}
>>> four = root.xpath('descendant::four')[0]
>>> four.nsmap
{'a': 'a', 'c': 'c', 'b': 'b'}
>>> from copy import deepcopy
>>> fourprime = deepcopy(four)
>>> fourprime.nsmap
{}
That's absolutely correct behaviour. The copied subtree does not use any namespaces, so there is no reason why it should have any namespace declarations on it. > Couldn't you copy the nsmap along with the other properties, in the > __copy__ method? Why? Stefan From herve.cauwelier at free.fr Thu Sep 3 12:35:49 2009 From: herve.cauwelier at free.fr (Hervé Cauwelier) Date: Thu, 03 Sep 2009 12:35:49 +0200 Subject: [lxml-dev] copying an element will "reset" is nsmap In-Reply-To: <4A9F9075.8010307@behnel.de> References: <4A9E97BF.2070009@free.fr> <4A9EC9F0.4030504@behnel.de> <4A9F84BC.3060902@free.fr> <4A9F9075.8010307@behnel.de> Message-ID: <4A9F9C05.6080200@free.fr> Stefan Behnel wrote: > That's absolutely correct behaviour. The copied subtree does not use any > namespaces, so there is no reason why it should have any namespace > declarations on it. > > >> Couldn't you copy the nsmap along with the other properties, in the >> __copy__ method? > > Why? 
Given the element "fourprime" from the previous example, when I add an attribute qualified with the namespace "b", lxml will generate a new prefix like "ns0". But I need to follow the same prefix convention for comparison in unit tests against the expected result. On a side note, I verified that when adding my copy to the tree, identical namespaces are merged, not repeated. I know you'll tell me prefixes are just sugar and only the URI matters. But when working on OpenDocument files, the prefixes are well known and repeated in the specification. And I'm trying to generate OD objects that match the examples. Hervé From philipp.reichmuth+gmane at gmail.com Thu Sep 3 21:55:47 2009 From: philipp.reichmuth+gmane at gmail.com (Philipp Reichmuth) Date: Thu, 3 Sep 2009 21:55:47 +0200 Subject: [lxml-dev] Python 3.1 binary on Windows? References: <1n6jo1parl8po.iyhfdc2pakhw.dlg@40tude.net> <200909021619.13985.howesteve@gmail.com> <4A9EC851.4090104@behnel.de> Message-ID: <1imaw9ksp0t13.oyg3hy1wkp5j.dlg@40tude.net> On Wed, 2 Sep 2009 16:49:22 -0300, Sidnei da Silva wrote: >> I would guess from the context of this mailing list that the OP meant >> binary builds of lxml for that platform. >> >> I think that would be a question for Sidnei. > > Uhm. Yeah. I thought I had those but I have only 3.0. When is the next > release of lxml due? If it will take a while I can upload a 3.1 build > of lxml 2.2.2. Thanks. This was indeed exactly what I meant. So I'll be waiting, and if in the meantime you do find the time to upload a 3.1 build, it would be much appreciated. Philipp From yaniv at aknin.name Mon Sep 7 08:21:47 2009 From: yaniv at aknin.name (Yaniv Aknin) Date: Mon, 7 Sep 2009 09:21:47 +0300 Subject: [lxml-dev] etree.XMLParser: two possible bugs? Message-ID: Hi, I'm far from being an xml/libxml/lxml guru, so I feel a bit hesitant reporting these two 'peculiarities' I have with lxml. 
However, I've managed to reduce things to simple enough code that I don't know how to explain in any terms other than lxml/libxml bugs, and I'd appreciate it if the list could confirm my findings before I report them as bugs. Attached is a short piece of my code, called lxml_test.py, defining a class called TerminationDetectingXMLParser. As the name suggests, this is a stream XML parser that 'knows' whether or not it has seen the end of the document it is being fed with. I have two peculiarities with this code: 1. While the code works for: >>> td.feed('') >>> td.feed('') >>> td.feed('') ...it doesn't work for: >>> td.feed('') It seems to me that etree.XMLParser doesn't call end() on the treebuilder in the latter case, which, humbly, I think is a bug; etree.XMLParser has everything it needs to call end() on the target. 2. The second peculiarity is a bit weirder. Much to my dismay we have two Python environments built and maintained by different teams. Both environments are 2.6.2 and lxml 2.2.2, though one uses slightly different libxml/libxslt versions. In one environment, peculiarity (1) described above exists, but the code runs well. In the other, an AssertionError is raised from saxparser.pxi during lxml.etree.TreeBuilder.close. This is despite the XML input being really really trivial. I have attached two ASCII tty-screenshots of the two interpreters running against the same code, one with the assertion error, the other showing just peculiarity (1). Am I using the library incorrectly? Are these libxml issues? Should I open bugs for these? What workaround should I use? Thanks in advance, - Yaniv -------------- next part -------------- A non-text attachment was scrubbed... 
Name: screenshot-interpreter-two Type: application/octet-stream Size: 887 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20090907/314dd166/attachment.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: screenshot-interpreter-one Type: application/octet-stream Size: 410 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20090907/314dd166/attachment-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: lxml_test.py Type: application/octet-stream Size: 1561 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20090907/314dd166/attachment-0002.obj From stefan_ml at behnel.de Mon Sep 7 08:52:01 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 Sep 2009 08:52:01 +0200 Subject: [lxml-dev] etree.XMLParser: two possible bugs? In-Reply-To: References: Message-ID: <4AA4AD91.7050501@behnel.de> Hi, Yaniv Aknin wrote: > I'm far from being an xml/libxml/lxml guru, so I feel a bit unconfident > reporting these two 'peculiarities' I have with lxml. Thanks for the report. > Attached is a short piece of my code, called lxml_test.py, defining a class > called TerminationDetectingXMLParser. As the name suggests, this is a stream > XML parser that 'knows' whether or not it has seen the end of the document > it is being fed with. Without having looked at the code, I assume you mean "when the root tag has been closed in the input data", right? > I have two peculiarities with this code: > 1. While the code works for: > >>> td.feed('') > >>> td.feed('') > >>> td.feed('') > ...it doesn't work for: > >>> td.feed('') > It seems to me that etree.XMLParser doesn't call end() on the treebuilder in > the latter case, which, humbly, I think is a bug; etree.XMLParser has > everything it needs to call end() on the target. It may or may not. 
In any case, it doesn't guarantee that the last closing .end() gets called on the parser target before you call .close() on the feed parser. You can't rely on that, especially since the behaviour is influenced by both lxml and libxml2. Any reason you can't call .close()? > 2. The second peculiarity is a bit weirder. Much to my dismay we have two > Python environments built and maintained by different teams. Both > environments are 2.6.2 and lxml 2.2.2, though one uses slightly different > libxml/libxslt versions. In one environment, peculiarity (1) described above > exists, but the code runs well. In the other, an AssertionError is raised > from saxparser.pxi during lxml.etree.TreeBuilder.close. This is despite the > XML input being really really trivial. You didn't say which libxml2 versions you are using. There are known bugs in several libxml2 releases, so different libxml2 versions may show different behaviour. > I have attached two ASCII tty-screehshots of the two interpreters running > against the same code, one with the assertion error, the other showing just > peculiarity (1). You sent them without filename extension, so your mail program failed to attach MIME types and mine can't show them. I'll look at them as soon as I get to it. > What workaround should I use? Depends on what you are trying to achieve. Stefan From yaniv at aknin.name Mon Sep 7 10:41:11 2009 From: yaniv at aknin.name (Yaniv Aknin) Date: Mon, 7 Sep 2009 11:41:11 +0300 Subject: [lxml-dev] etree.XMLParser: two possible bugs? In-Reply-To: <4AA4AD91.7050501@behnel.de> References: <4AA4AD91.7050501@behnel.de> Message-ID: On Mon, Sep 7, 2009 at 9:52 AM, Stefan Behnel wrote: > Hi, > > Yaniv Aknin wrote: > > I'm far from being an xml/libxml/lxml guru, so I feel a bit unconfident > > reporting these two 'peculiarities' I have with lxml. > > Thanks for the report. > > > Attached is a short piece of my code, called lxml_test.py, defining a > class > > called TerminationDetectingXMLParser. 
As the name suggests, this is a > stream > > XML parser that 'knows' whether or not it has seen the end of the > document > > it is being fed with. > > Without having looked at the code, I assume you mean "when the root tag has > been closed in the input data", right? > > Yes. > > > I have two peculiarities with this code: > > 1. While the code works for: > > >>> td.feed('') > > >>> td.feed('') > > >>> td.feed('') > > ...it doesn't work for: > > >>> td.feed('') > > It seems to me that etree.XMLParser doesn't call end() on the treebuilder > in > > the latter case, which, humbly, I think is a bug; etree.XMLParser has > > everything it needs to call end() on the target. > > It may or may not. In any case, it doesn't guarantee that the last closing > .end() gets called on the parser target before you call .close() on the > feed parser. You can't rely on that, especially since the behaviour is > influenced by both lxml and libxml2. > > Any reason you can't call .close()? > > Ah. I didn't know that. I took this bit of the API: "The parser will parse as much of the XML stream as it can on each call." ( http://codespeak.net/lxml/api/lxml.etree._FeedParser-class.html#feed) to mean that end() will be call if possible. I can not call close() because I myself don't know when the root tag has been closed or not. I'm reading a single stream with one or more XMLs in it, and I need to know when each root tag ends. > > 2. The second peculiarity is a bit weirder. Much to my dismay we have two > > Python environments built and maintained by different teams. Both > > environments are 2.6.2 and lxml 2.2.2, though one uses slightly different > > libxml/libxslt versions. In one environment, peculiarity (1) described > above > > exists, but the code runs well. In the other, an AssertionError is raised > > from saxparser.pxi during lxml.etree.TreeBuilder.close. This is despite > the > > XML input being really really trivial. > > You didn't say which libxml2 versions you are using. 
There are known bugs > in several libxml2 releases, so different libxml2 versions may show > different behaviour. > > The versions are inside the screenshots I attached (without extensions, I'm sorry). They are as follows: Python interpreter one (no assertion error): lxml.etree: (2, 2, 2, 0) libxml used: (2, 6, 23) libxml compiled: (2, 6, 23) libxslt used: (1, 1, 15) libxslt compiled: (1, 1, 15) Python interpreter two (raises assertion error): lxml.etree: (2, 2, 2, 0) libxml used: (2, 7, 2) libxml compiled: (2, 7, 2) libxslt used: (1, 1, 24) libxslt compiled: (1, 1, 24) The full traceback is in the screenshots. > > I have attached two ASCII tty-screehshots of the two interpreters running > > against the same code, one with the assertion error, the other showing > just > > peculiarity (1). > > You sent them without filename extension, so your mail program failed to > attach MIME types and mine can't show them. I'll look at them as soon as I > get to it. > > Oops. Thanks for the extra effort. > > > What workaround should I use? > > Depends on what you are trying to achieve. > > A feed based parser which knows "by itself" when the root tag has been closed. I think the most convenient interface would be a stream parser that when fed with data beyond the root element, raises an exception with just the extra data (so it could be re-fed into a new parser). Again, many thanks for your help. > Stefan > > From stefan_ml at behnel.de Mon Sep 7 10:55:45 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 Sep 2009 10:55:45 +0200 Subject: [lxml-dev] etree.XMLParser: two possible bugs? In-Reply-To: References: <4AA4AD91.7050501@behnel.de> Message-ID: <4AA4CA91.8000009@behnel.de> Yaniv Aknin wrote: > On Mon, Sep 7, 2009 at 9:52 AM, Stefan Behnel wrote: >> Any reason you can't call .close()? >> > Ah. 
I didn't know that. I took this bit of the API: "The parser will parse > as much of the XML stream as it can on each call." ( > http://codespeak.net/lxml/api/lxml.etree._FeedParser-class.html#feed) to > mean that end() will be call if possible. Hmm, yes, that can be misunderstood, I guess. > I can not call close() because I myself don't know when the root tag has > been closed or not. I'm reading a single stream with one or more XMLs in it, > and I need to know when each root tag ends. Sounds like you lack an application level protocol for your data stream. Can you control the source? > Python interpreter one (no assertion error): > lxml.etree: (2, 2, 2, 0) > libxml used: (2, 6, 23) > libxml compiled: (2, 6, 23) > libxslt used: (1, 1, 15) > libxslt compiled: (1, 1, 15) > Python interpreter two (raises assertion error): > lxml.etree: (2, 2, 2, 0) > libxml used: (2, 7, 2) > libxml compiled: (2, 7, 2) > libxslt used: (1, 1, 24) > libxslt compiled: (1, 1, 24) Those versions can hardly be called "slightly different". 2.6.23 is pretty old, and 2.7.2 contains various changes compared to the 2.6 series. If you want predictable results, you should use the same versions on all machines. Otherwise, you'll end up putting a lot of time into hard to reproduce problems that turn out to be due to different dependency versions. >>> What workaround should I use? >> Depends on what you are trying to achieve. >> >> A feed based parser which knows "by itself" when the root tag has been > closed. I think the most convenient interface would be a stream parser that > when fed with data beyond the root element, raises an exception with just > the extra data (so it could be re-fed into a new parser). I'd certainly invest into a smarter stream protocol here. Stefan From yaniv at aknin.name Mon Sep 7 11:17:59 2009 From: yaniv at aknin.name (Yaniv Aknin) Date: Mon, 7 Sep 2009 12:17:59 +0300 Subject: [lxml-dev] etree.XMLParser: two possible bugs? 
In-Reply-To: <4AA4CA91.8000009@behnel.de> References: <4AA4AD91.7050501@behnel.de> <4AA4CA91.8000009@behnel.de> Message-ID: On Mon, Sep 7, 2009 at 11:55 AM, Stefan Behnel wrote: > > > Yaniv Aknin wrote: > > On Mon, Sep 7, 2009 at 9:52 AM, Stefan Behnel wrote: > >> Any reason you can't call .close()? > >> > > Ah. I didn't know that. I took this bit of the API: "The parser will > parse > > as much of the XML stream as it can on each call." ( > > http://codespeak.net/lxml/api/lxml.etree._FeedParser-class.html#feed) to > > mean that end() will be call if possible. > > Hmm, yes, that can be misunderstood, I guess. > > > > I can not call close() because I myself don't know when the root tag has > > been closed or not. I'm reading a single stream with one or more XMLs in > it, > > and I need to know when each root tag ends. > > Sounds like you lack an application level protocol for your data stream. > Can you control the source? > > > > Python interpreter one (no assertion error): > > lxml.etree: (2, 2, 2, 0) > > libxml used: (2, 6, 23) > > libxml compiled: (2, 6, 23) > > libxslt used: (1, 1, 15) > > libxslt compiled: (1, 1, 15) > > Python interpreter two (raises assertion error): > > lxml.etree: (2, 2, 2, 0) > > libxml used: (2, 7, 2) > > libxml compiled: (2, 7, 2) > > libxslt used: (1, 1, 24) > > libxslt compiled: (1, 1, 24) > > Those versions can hardly be called "slightly different". 2.6.23 is pretty > old, and 2.7.2 contains various changes compared to the 2.6 series. If you > want predictable results, you should use the same versions on all machines. > Otherwise, you'll end up putting a lot of time into hard to reproduce > problems that turn out to be due to different dependency versions. > > I'm trying to make us all use the same 'interpreter distribution', it's just difficult. It's been hard enough to get everyone to use 2.6.2... But I understand this isn't your problem. 
Either way, the AssertionError arises when using libxml 2.7.2, which is more recent (and one micro release behind the latest, 2.7.3). How can I tell if the AssertionError is an lxml bug or a libxml bug? > > >>> What workaround should I use? > >> Depends on what you are trying to achieve. > >> > >> A feed based parser which knows "by itself" when the root tag has been > > closed. I think the most convenient interface would be a stream parser > that > > when fed with data beyond the root element, raises an exception with just > > the extra data (so it could be re-fed into a new parser). > > I'd certainly invest into a smarter stream protocol here. > > I can not control the source :-/ No alternatives? > Stefan > From stefan_ml at behnel.de Fri Sep 11 09:13:11 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 Sep 2009 09:13:11 +0200 Subject: [lxml-dev] etree.XMLParser: two possible bugs? In-Reply-To: References: <4AA4AD91.7050501@behnel.de> <4AA4CA91.8000009@behnel.de> Message-ID: <4AA9F887.7040009@behnel.de> Yaniv Aknin wrote: > I'm trying to make us all use the same 'interpreter distribution', it's just > difficult. It's been hard enough to get everyone to use 2.6.2... Note that you can build lxml statically against its dependencies using "--static-deps". Distributing the result to all developer machines would relieve you from having to care about the installed library versions. Here's a description (for MacOS-X, but also works for other systems): http://codespeak.net/lxml/build.html#building-lxml-on-macos-x > Either way, the AssertionError arises when using libxml 2.7.2, which is more > recent (and one micro behind the latest, 2.7.3). How can I tell if the > AssertionError is an lxml bug or a libxml bug? 
Looking at the code, my educated guess is that you are continuing to use the same tree builder for a new XML document. That won't work. >>>>> What workaround should I use? >>>> Depends on what you are trying to achieve. >>>> >>>> A feed based parser which knows "by itself" when the root tag has been >>> closed. I think the most convenient interface would be a stream parser >> that >>> when fed with data beyond the root element, raises an exception with just >>> the extra data (so it could be re-fed into a new parser). >> I'd certainly invest into a smarter stream protocol here. >> > I can not control the source :-/ > No alternatives? Not really, no. Sending consecutive XML documents through a stream is a really stupid protocol as it effectively makes the stream non-XML. XML parsers are not made for parsing non-XML, so the receiver will end up having to write a new parser. Regarding your proposal, the lxml-side handler cannot easily know what data was fed into the parser, as the parser has already decoded and unescaped the data at that point. So any exception raised from lxml can't reproduce the original data. Unless there are any assumptions about the stream that you can make, or (preferably) any changes to it, such as sending '\0' bytes between the documents or some other way of separating them, I'd say you'll have a really hard time getting this to work. Stefan From stefan_ml at behnel.de Fri Sep 11 09:32:05 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 Sep 2009 09:32:05 +0200 Subject: [lxml-dev] copying an element will "reset" is nsmap In-Reply-To: <4A9F9C05.6080200@free.fr> References: <4A9E97BF.2070009@free.fr> <4A9EC9F0.4030504@behnel.de> <4A9F84BC.3060902@free.fr> <4A9F9075.8010307@behnel.de> <4A9F9C05.6080200@free.fr> Message-ID: <4AA9FCF5.4080701@behnel.de> Hervé Cauwelier wrote: > But I need to follow the same prefix convention for comparison in unit > tests against the expected result. 
There is some prefix/formatting agnostic XML comparison code in
lxml.doctestcompare. It's meant for doctests, but it might also work in
other cases.

> On a side note, I verified that when adding my copy to the tree,
> identical namespaces are merged, not repeated.
>
> I know you'll tell me prefixes are just sugar and only the URI matters.
> But when working on OpenDocument files, the prefixes are well known and
> repeated in the specification. And I'm trying to generate OD objects
> that match the examples.

One thing you can try is to append the root element to a new element that
defines the namespaces, i.e.

    ns_def_root = etree.Element("ROOT", nsmap=my_nsmap)

    root = copy.deepcopy(some_other_root)

    ns_def_root.append(root)
    print etree.tostring(root, encoding=unicode)
    ns_def_root.remove(root)

I didn't test this, but it should work.

Does that help?

Stefan

From herve.cauwelier at free.fr  Fri Sep 11 11:35:36 2009
From: herve.cauwelier at free.fr (Hervé Cauwelier)
Date: Fri, 11 Sep 2009 11:35:36 +0200
Subject: [lxml-dev] copying an element will "reset" is nsmap
In-Reply-To: <4AA9FCF5.4080701@behnel.de>
References: <4A9E97BF.2070009@free.fr> <4A9EC9F0.4030504@behnel.de> <4A9F84BC.3060902@free.fr> <4A9F9075.8010307@behnel.de> <4A9F9C05.6080200@free.fr> <4AA9FCF5.4080701@behnel.de>
Message-ID: <4AAA19E8.50403@free.fr>

Stefan Behnel wrote:
> One thing you can try is to append the root element to a new element that
> defines the namespaces, i.e.
>
>     ns_def_root = etree.Element("ROOT", nsmap=my_nsmap)
>
>     root = copy.deepcopy(some_other_root)
>
>     ns_def_root.append(root)
>     print etree.tostring(root, encoding=unicode)
>     ns_def_root.remove(root)
>
> I didn't test this, but it should work.
>
> Does that help?

Great! I was using a similar technique to parse XML fragments with
prefixes, but without the namespace declarations. I didn't think about
applying it to existing elements.
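
For reference, the wrapper technique discussed in this thread can be sketched for current Python 3 and lxml roughly as follows. This is only an illustration under assumptions: the namespace URIs, the `serialize_with_nsmap` helper and the sample document are made up for the example and do not come from the thread.

```python
import copy
from lxml import etree

# Hypothetical prefix-to-URI map, standing in for the well-known
# OpenDocument prefixes mentioned in the thread.
NSMAP = {
    "office": "urn:example:office",
    "text": "urn:example:text",
}


def serialize_with_nsmap(element, nsmap):
    """Serialize a deep copy of `element` while it is temporarily attached
    to a wrapper element that declares every namespace in `nsmap`."""
    wrapper = etree.Element("ROOT", nsmap=nsmap)
    clone = copy.deepcopy(element)
    wrapper.append(clone)
    try:
        # Serializing a child element also writes out the namespace
        # declarations that are in scope from its ancestors.
        return etree.tostring(clone, encoding="unicode")
    finally:
        # Detach again so the wrapper can be thrown away.
        wrapper.remove(clone)


doc = etree.fromstring(
    '<office:body xmlns:office="urn:example:office" '
    'xmlns:text="urn:example:text"><text:p>hello</text:p></office:body>'
)
paragraph = doc[0]
xml = serialize_with_nsmap(paragraph, NSMAP)
print(xml)
```

The copy is serialized while it still sits under the wrapper, so the prefixes declared there are available to the serializer; the original `paragraph` stays where it was in `doc`.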

I also tested what happens when the "root" of your example is attached
to another parent. It will disappear from "ns_def_root", which is logical
since an element can reference a single parent.

I just wonder if the now useless "ns_def_root" will be properly
garbage-collected.

Thank you very much, I owe you a beer when you come by Paris!

Hervé

From stefan_ml at behnel.de  Fri Sep 11 11:55:49 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 11 Sep 2009 11:55:49 +0200
Subject: [lxml-dev] copying an element will "reset" is nsmap
In-Reply-To: <4AAA19E8.50403@free.fr>
References: <4A9E97BF.2070009@free.fr> <4A9EC9F0.4030504@behnel.de> <4A9F84BC.3060902@free.fr> <4A9F9075.8010307@behnel.de> <4A9F9C05.6080200@free.fr> <4AA9FCF5.4080701@behnel.de> <4AAA19E8.50403@free.fr>
Message-ID: <4AAA1EA5.3080704@behnel.de>

Hervé Cauwelier wrote:
> Stefan Behnel wrote:
>> One thing you can try is to append the root element to a new element that
>> defines the namespaces, i.e.
>>
>>     ns_def_root = etree.Element("ROOT", nsmap=my_nsmap)
>>
>>     root = copy.deepcopy(some_other_root)
>>
>>     ns_def_root.append(root)
>>     print etree.tostring(root, encoding=unicode)
>>     ns_def_root.remove(root)
>>
>> I didn't test this, but it should work.
>
> Great!

Happy to hear that.

> I also tested what happens when the "root" of your example is attached
> to another parent. It will disappear from "ns_def_root", which is logic
> since an element can reference a single parent.

Yep.

> I just wonder if the now useless "ns_def_root" will be properly
> garbage-collected.

No doubt. This is lxml 2.2, not 1.2.

> Thank you very much, I owe you a beer when you come by Paris!

Let's see. The Gare de l'Est is less than 7 hours by train from my place,
but still...

Stefan

From stefan_ml at behnel.de  Fri Sep 11 12:56:08 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 11 Sep 2009 12:56:08 +0200
Subject: [lxml-dev] [Fwd: [xml] Release of libxml2-2.7.4]
Message-ID: <4AAA2CC8.70106@behnel.de>

Hi,

just wanted to give a note that libxml2 2.7.4 was released, which includes
quite a number of bug fixes. Some of those are likely relevant to lxml
users, e.g. those for the HTML parser or a couple of XML Schema improvements.

I didn't do much testing, but so far, lxml passes its test suite nicely
with this release.

Kudos to Daniel Veillard for the work he keeps putting into libxml2.

Have fun,

Stefan

-------- Original-Message --------
Subject: [xml] Release of libxml2-2.7.4
Date: Thu, 10 Sep 2009 18:51:18 +0200
From: Daniel Veillard
To: xml at gnome.org

Better late than never, but an awful lot of pending bugs got fixed.
Still no major improvement except adding symbol versioning to libxml2
shared libs, which is fairly important for long term maintenance, but
not worth jumping to 2.8.0

Tarball and signed rpms available at ftp://xmlsoft.org/libxml2/

There are still a few things which I would have loved to put in the
release like per context error handling and the like but I prefer a
(nearly) bug fix only release that people can upgrade to without
troubles and then work on changing more stuff:

Improvements:
- Autoregenerate libxml2.syms automated checkings (Daniel Veillard)
- Add symbol versioning to libxml2 shared libs (Daniel Veillard)

Documentation:
- 544910 typo: "renciliateNs" (Leonid Evdokimov)
- Add VxWorks to list of OSes (Daniel Veillard)
- Regenerate the documentation and update for git (Daniel Veillard)
- 560524 - xmlTextReaderLocalName description (Daniel Veillard)
- Added sponsoring by AOE media for the server (Daniel Veillard)
- updated URLs for GNOME (Vincent Lefevre)
- more warnings about xmlCleanupThreads and xmlCleanupParser (Daniel Veillard)

Portability:
- 593857 try to work around thread pbm MinGW 4.4 (Daniel Veillard)
- 594250 rename ATTRIBUTE_ALLOC_SIZE to avoid clashes (Daniel Veillard)
- Fix Windows build * relaxng.c: fix windows build (Rob Richards)
- Fix the globals.h to use XMLPUBFUN (Paul Smith)
- Problem with extern extern in header (Daniel Veillard)
- Add -lnetwork for compiling on Haiku (Scott McCreary)
- Runtest portability patch for Solaris (Tim Rice)
- Small patch to accommodate the Haiku OS (Scott McCreary)
- 584605 package VxWorks folder in the distribution (Daniel Veillard)
- 574017 Realloc too expensive on most platform (Daniel Veillard)
- Fix windows build (Rob Richards)
- 545579 doesn't compile without schema support (Daniel Veillard)
- xmllint use xmlGetNodePath when not compiled in (Daniel Veillard)
- Try to avoid __imp__xmlFree link trouble on msys (Daniel Veillard)
- Allow to select the threading system on Windows (LRN)
- Fix Solaris binary links, cleanups (Daniel Veillard)
- Bug 571059 - MSVC doesn't work with the bakefile (Intron)
- fix ATTRIBUTE_PRINTF header clash (Belgabor and Mike Hommey)
- fixes for Borland/CodeGear/Embarcadero compilers (Eric Zurcher)

Bug fixes:
- 594514 memory leaks - duplicate initialization (MOD)
- Wrong block opening in htmlNodeDumpOutputInternal (Daniel Veillard)
- 492317 Fix Relax-NG validation problems (Daniel Veillard)
- 558452 fight with reg test and error report (Daniel Veillard)
- 558452 RNG compilation of optional multiple child (Daniel Veillard)
- 579746 XSD validation not correct / nilable groups (Daniel Veillard)
- 502960 provide namespace stack when parsing entity (Daniel Veillard)
- 566012 part 2 fix regression tests and push mode (Daniel Veillard)
- 566012 autodetected encoding and encoding conflict (Daniel Veillard)
- 584220 xpointer(/) and xinclude problems (Daniel Veillard)
- 587663 Incorrect Attribute-Value Normalization (Daniel Veillard)
- 444994 HTML chunked failure for attribute with <> (Daniel Veillard)
- Fix end of buffer char being split in XML parser (Daniel Veillard)
- Non ASCII character may be split at buffer end (Adiel Mittmann)
- 440226 Add xmlXIncludeProcessTreeFlagsData API (Stefan Behnel)
- 572129 speed up parsing of large HTML text nodes (Markus Kull)
- Fix HTML parsing with 0 character in CDATA (Daniel Veillard)
- Fix SetGenericErrorFunc and SetStructured clash (Wang Lam)
- 566012 Incomplete EBCDIC parsing support (Martin Kögler)
- 541335 HTML avoid creating 2 head or 2 body element (Daniel Veillard)
- 541237 error correcting missing end tags in HTML (Daniel Veillard)
- 583439 missing line numbers in push mode (Daniel Veillard)
- 587867 xmllint --html --xmlout serializing as HTML (Daniel Veillard)
- 559501 avoid select and use poll for nanohttp (Raphael Prevost)
- 559410 - Regexp bug on (...)? constructs (Daniel Veillard)
- Fix a small problem on previous HTML parser patch (Daniel Veillard)
- 592430 - HTML parser runs into endless loop (Daniel Veillard)
- 447899 potential double free in xmlFreeTextReader (Daniel Veillard)
- 446613 small validation bug mixed content with NS (Daniel Veillard)
- Fix the problem of revalidating a doc with RNG (Daniel Veillard)
- Fix xmlKeepBlanksDefault to not break indent (Nick Wellnhofer)
- 512131 refs from externalRef part need to be added (Daniel Veillard)
- 512131 crash in xmlRelaxNGValidateFullElement (Daniel Veillard)
- 588441 allow '.' in HTML Names even if invalid (Daniel Veillard)
- 582913 Fix htmlSetMetaEncoding() to be nicer (Daniel Veillard)
- 579317 Try to find the HTML encoding information (Daniel Veillard)
- 575875 don't output charset=html (Daniel Veillard)
- 571271 fix semantic of xsd:all with minOccurs=0 (Daniel Veillard)
- 570702 fix a bug in regexp determinism checking (Daniel Veillard)
- 567619 xmlValidateNotationUse missing param test (Daniel Veillard)
- 574393 - utf-8 filename magic for compressed files (Hans Breuer)
- Fix a couple of problems in the parser (Daniel Veillard)
- 585505 - Document ids and refs populated by XSD (Wayne Jensen)
- 582906 XSD validating multiple imports of the same schema (Jason Childs)
- Bug 582887 - problems validating complex schemas (Jason Childs)
- Bug 579729 - fix XSD schemas parsing crash (Miroslav Bajtos)
- 576368 - htmlChunkParser with special attributes (Jiri Netolicky)
- Bug 565747 - relax anyURI data character checking (Vincent Lefevre)
- Preserve attributes of include start on tree copy (Petr Pajas)
- Skip silently unrecognized XPointer schemes (Jakub Wilk)
- Fix leak on SAX1, xmllint --sax1 option and debug (Daniel Veillard)
- potential NULL dereference on non-glibc (Jim Meyering)
- Fix an XSD validation crash (Daniel Veillard)
- Fix a regression in streaming entities support (Daniel Veillard)
- Fix a couple of ABI issues with C14N 1.1 (Aleksey Sanin)
- Aleksey Sanin support for c14n 1.1 (Aleksey Sanin)
- reader bug fix with entities - use options from current parser context for external entities (Rob Richards)
- 581612 use %s to printf strings (Christian Persch)
- 584605 change the threading initialization sequence (Igor Novoseltsev)
- 580705 keep line numbers in HTML parser (Aaron Patterson)
- 581803 broken HTML table attributes init (Roland Steiner)
- do not set error code in xmlNsWarn (Rob Richards)
- 564217 fix structured error handling problems - reuse options from current parser for entities (Rob Richards)
- xmlXPathRegisterNs should not allow empty prefixes (Daniel Veillard)
- add a missing check in xmlAddSibling (Kris Breuker)
- avoid leaks on errors (Jinmei Tatuya)

Cleanups:
- Chasing dead assignments reported by clang-scan (Daniel Veillard)
- A few more safety cleanup raised by scan (Daniel Veillard)
- Fixing assorted potential problems raised by scan (Daniel Veillard)
- Potential uninitialized arguments raised by scan (Daniel Veillard)
- Fix a bunch of scan 'dead increments' and cleanup (Daniel Veillard)
- Remove a pedantic warning (Daniel Veillard)
- 555833 always use rm -f in uninstall-local (Daniel Veillard)
- 542394 xmlRegisterOutputCallbacks MAX_INPUT_CALLBACK (Daniel Veillard)
- Make xmlRecoverDoc const (Martin Trappel) (Daniel Veillard)
- Both args of xmlStrcasestr are const (Daniel Veillard)
- hide the nbParse* variables used for debugging (Mike Hommey)
- 570806 changed include of config.h (William M. Brack)
- cleanups and error reports when xmlTextWriterVSprintf fails (Jinmei Tatuya)

Thanks everybody who helped by submitting ideas, bug reports or patches !

Enjoy !

Daniel

From stefan_ml at behnel.de  Fri Sep 11 13:16:26 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 11 Sep 2009 13:16:26 +0200
Subject: [lxml-dev] etree.XMLParser: two possible bugs?
In-Reply-To: <4AA9F887.7040009@behnel.de>
References: <4AA4AD91.7050501@behnel.de> <4AA4CA91.8000009@behnel.de> <4AA9F887.7040009@behnel.de>
Message-ID: <4AAA318A.9030505@behnel.de>

... replying to myself ...

Stefan Behnel wrote:
>>>>>> What workaround should I use?
>>>>> Depends on what you are trying to achieve.
>>>>>
>>>>> A feed based parser which knows "by itself" when the root tag has been
>>>> closed. I think the most convenient interface would be a stream parser
>>> that
>>>> when fed with data beyond the root element, raises an exception with just
>>>> the extra data (so it could be re-fed into a new parser).
>>> I'd certainly invest into a smarter stream protocol here.
>>>
>> I can not control the source :-/
>> No alternatives?
>
> Not really, no. Sending consecutive XML documents through a stream is a
> really stupid protocol as it effectively makes the stream non-XML. XML
> parsers are not made for parsing non-XML, so the receiver will end up
> having to write a new parser.

Thinking about this some more: you may get away with a rather simple
solution *if* the stream only contains XML fragments, not complete
documents with XML declarations or doctype declarations. What you might
try in that case is to actually fix the XML fragments stream into
well-formed XML by inserting a fake root element at the beginning.
That way, you are always parsing a well-formed XML document (assuming that
the incoming XML document fragments are well-formed and do not contain XML
declarations or DTD subsets), and you can see the start of a new document
when the parser inserts a second element next to the document root you
were just parsing. Something like:

    # open a fake root element; the tag name is arbitrary
    parser.feed("<root>")
    while True:
        doc_root = root_element_wherever_you_get_it_from
        # wait until a second child shows up: the first one is then complete
        while len(doc_root) < 2:
            parser.feed(more_data)
        doc_fragment_root = doc_root[0]
        del doc_root[0]
        handle_fragment(doc_fragment_root)

However, as I said, this will fail if the partial documents in the stream
contain anything that must not appear inside of the root element of an XML
document (which isn't unlikely if you can't fix the source).

Stefan

From gael at gawel.org  Sat Sep 12 12:36:29 2009
From: gael at gawel.org (Gael Pasgrimaud)
Date: Sat, 12 Sep 2009 12:36:29 +0200
Subject: [lxml-dev] problems trying to install lxml 2.2 on Mac OS X
In-Reply-To: <4A9ECBB2.7090702@behnel.de>
References: <4A8197DE.70800@simplistix.co.uk> <4A82C4AA.4090601@simplistix.co.uk> <4A82D4E2.9030606@behnel.de> <4A82E081.6090707@simplistix.co.uk> <4A82E747.5030708@behnel.de> <7911b3bb0908130516x21c02613xeaecf2a79b064f36@mail.gmail.com> <4A9ECBB2.7090702@behnel.de>
Message-ID: <7911b3bb0909120336v41b674ceh2e8805658f8ac80e@mail.gmail.com>

On Wed, Sep 2, 2009 at 9:46 PM, Stefan Behnel wrote:
>
> Gael Pasgrimaud wrote:
>> On Wed, Aug 12, 2009 at 6:01 PM, Stefan Behnel wrote:
>>> we have binaries for 10.5:
>>>
>>> http://pypi.python.org/pypi/lxml/2.2.2
>>
>> I still don't understand why my OSX 10.5 always wants to compile lxml.
>
> To be honest, I have no idea. But that's definitely a
> distutils/setuptools/easyinstall thing, not an lxml problem.
>

Well, I think it's more OSX related.
I've build my own egg on my system and it work fine: (tttt)gawel:~/tmp/tttt% easy_install http://release.ingeniweb.com/third-party-dist/lxml-2.2.2-py2.4-macosx-10.3-fat.egg Downloading http://release.ingeniweb.com/third-party-dist/lxml-2.2.2-py2.4-macosx-10.3-fat.egg Processing lxml-2.2.2-py2.4-macosx-10.3-fat.egg creating /Users/gawel/tmp/tttt/lib/python2.4/site-packages/lxml-2.2.2-py2.4-macosx-10.3-fat.egg Extracting lxml-2.2.2-py2.4-macosx-10.3-fat.egg to /Users/gawel/tmp/tttt/lib/python2.4/site-packages Adding lxml 2.2.2 to easy-install.pth file Installed /Users/gawel/tmp/tttt/lib/python2.4/site-packages/lxml-2.2.2-py2.4-macosx-10.3-fat.egg Processing dependencies for lxml==2.2.2 Finished processing dependencies for lxml==2.2.2 As you notice, it's for macosx-10.3, not 10.5 (but I assume I have a 10.5.8). I don't know the difference. Maybe the XCode version installed on the system... I can build an egg for py 2.4 2.5 2.6 if you want to add this on pypi even if it seems that I'm the only one that have this problem. -- Gael From uwe.ml at family-hoffmann.de Sun Sep 13 09:45:42 2009 From: uwe.ml at family-hoffmann.de (Uwe Hoffmann) Date: Sun, 13 Sep 2009 09:45:42 +0200 Subject: [lxml-dev] namespace inconsistency with attributes Message-ID: <1252827942.6953.3.camel@schreihals> hi, here is a python session showing a namespace inconsistency (from my point of view). It seems that a roundtrip (reading and saving a xml document ) is not possible with lxml under this circumstances. Maybe there is a misunderstanding on my side. user at workstation:~/tmp$ python Python 2.6.2 (release26-maint, Apr 19 2009, 01:58:18) [GCC 4.3.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import lxml.etree as et >>> eTree=et.fromstring(""" ... ... ... ... 
""") >>> bluff=eTree.findall("{urn:ns1}bluff")[0] >>> print bluff.attrib["{urn:ns1}val"] abc >>> bluff.attrib["{urn:ns1}val"]="def" >>> print bluff.attrib["{urn:ns1}val"] def >>> #so far everything went well >>> ts=et.tostring(eTree, xml_declaration=True, encoding="UTF-8") >>> print ts >>> # val="def" --> attribute namespace was >>> # lost because of default namespace ? >>> eTree=et.fromstring(ts) >>> bluff=eTree.findall("{urn:ns1}bluff")[0] >>> print bluff.attrib["{urn:ns1}val"] Traceback (most recent call last): File "", line 1, in File "lxml.etree.pyx", line 1893, in lxml.etree._Attrib.__getitem__ (src/lxml/lxml.etree.c:19259) KeyError: '{urn:ns1}val' >>> regards uwe hoffmann From dave.brk at gmail.com Mon Sep 14 15:22:36 2009 From: dave.brk at gmail.com (Dov Reshef) Date: Mon, 14 Sep 2009 16:22:36 +0300 Subject: [lxml-dev] Problem installing lxml 2.2.2 for python 3.1 Message-ID: Hi all I'm trying to install lxml 2.2.2 for python 3.1. (I'm using the egg for python version 3, simply unpacking it to the site-packages folder). However, when I try to use it in my code I get "ImportError: DLL load failed", which if I understand it correctly, means that it can't find the etree.dll even though it's right there in my site-packages folder (etree.pyd). Thanks Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20090914/ca1eff73/attachment.htm From stefan_ml at behnel.de Mon Sep 14 17:21:50 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 14 Sep 2009 17:21:50 +0200 Subject: [lxml-dev] Problem installing lxml 2.2.2 for python 3.1 In-Reply-To: References: Message-ID: <4AAE5F8E.9010905@behnel.de> Dov Reshef wrote: > I'm trying to install lxml 2.2.2 for python 3.1. (I'm using the egg for > python version 3, simply unpacking it to the site-packages folder). 
However, > when I try to use it in my code I get "ImportError: DLL load failed", which > if I understand it correctly, means that it can't find the etree.dll even > though it's right there in my site-packages folder (etree.pyd). 3.1 isn't 3.0 compatible, I guess. We don't currently have binary eggs for 3.1, sorry. Stefan From stefan_ml at behnel.de Mon Sep 14 17:44:51 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 14 Sep 2009 17:44:51 +0200 Subject: [lxml-dev] namespace inconsistency with attributes In-Reply-To: <1252827942.6953.3.camel@schreihals> References: <1252827942.6953.3.camel@schreihals> Message-ID: <4AAE64F3.7050201@behnel.de> Uwe Hoffmann wrote: > here is a python session showing a namespace inconsistency (from my > point of view). It seems that a roundtrip (reading and > saving a xml document ) is not possible with lxml under this > circumstances. Maybe there is a misunderstanding on my side. > > user at workstation:~/tmp$ python > Python 2.6.2 (release26-maint, Apr 19 2009, 01:58:18) > [GCC 4.3.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import lxml.etree as et >>>> eTree=et.fromstring(""" > ... ... xmlns:o="urn:ns2" > ... xmlns:ns1="urn:ns1" > ... > > ... > ... > ... """) >>>> bluff=eTree.findall("{urn:ns1}bluff")[0] >>>> print bluff.attrib["{urn:ns1}val"] > abc >>>> bluff.attrib["{urn:ns1}val"]="def" >>>> print bluff.attrib["{urn:ns1}val"] > def >>>> #so far everything went well >>>> ts=et.tostring(eTree, xml_declaration=True, encoding="UTF-8") >>>> print ts > > > > >>>> # val="def" --> attribute namespace was >>>> # lost because of default namespace ? >>>> eTree=et.fromstring(ts) >>>> bluff=eTree.findall("{urn:ns1}bluff")[0] >>>> print bluff.attrib["{urn:ns1}val"] > Traceback (most recent call last): > File "", line 1, in > File "lxml.etree.pyx", line 1893, in lxml.etree._Attrib.__getitem__ > (src/lxml/lxml.etree.c:19259) > KeyError: '{urn:ns1}val' Yes, I agree that this is a bug. 
Attributes in namespaces deserve some more special casing. Could you file a bug report? Thanks, Stefan From sidnei.da.silva at gmail.com Tue Sep 15 05:30:29 2009 From: sidnei.da.silva at gmail.com (Sidnei da Silva) Date: Tue, 15 Sep 2009 00:30:29 -0300 Subject: [lxml-dev] Python 3.1 binary on Windows? In-Reply-To: <1imaw9ksp0t13.oyg3hy1wkp5j.dlg@40tude.net> References: <1n6jo1parl8po.iyhfdc2pakhw.dlg@40tude.net> <200909021619.13985.howesteve@gmail.com> <4A9EC851.4090104@behnel.de> <1imaw9ksp0t13.oyg3hy1wkp5j.dlg@40tude.net> Message-ID: >> Uhm. Yeah. I thought I had those but I have only 3.0. When is the next >> release of lxml due? If it will take a while I can upload a 3.1 build >> of lxml 2.2.2. > > Thanks. This was indeed exactly what I meant. So I'll be waiting, and if in > the meantime you do find the time to upload a 3.1 build, it would be much > appreciated. I managed to build an installer-based one, but somehow building an egg didnt work out. http://pypi.python.org/packages/3.1/l/lxml/lxml-2.2.2.win32-py3.1.exe#md5=ea35d6b3a4d24f8aac60a98a6c99c9b1 Please give it a try. Cheers! -- Sidnei From frank at chagford.com Thu Sep 17 10:05:26 2009 From: frank at chagford.com (Frank Millman) Date: Thu, 17 Sep 2009 10:05:26 +0200 Subject: [lxml-dev] Question about schema validation Message-ID: <20090917080538.823AE71F9@ctb-mesg-2-1.saix.net> Hi all I am using both lxml (version 2.2.2) and XmlCopyEditor (XCE) to validate some instance documents against a schema. I have come across a situation where XCE reports an error, but lxml does not. AFAICT XCE is correct. Could this be a bug in lxml? Here is a snippet from the instance document - Here is the relevant section from the schema - The error message reported by XCE is 'ID buyerName is not unique'. Any thoughts? 

Frank Millman

From stefan_ml at behnel.de  Thu Sep 17 16:31:54 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 17 Sep 2009 16:31:54 +0200
Subject: [lxml-dev] Question about schema validation
In-Reply-To: <20090917080538.823AE71F9@ctb-mesg-2-1.saix.net>
References: <20090917080538.823AE71F9@ctb-mesg-2-1.saix.net>
Message-ID: <4AB2485A.3090302@behnel.de>

Frank Millman wrote:
> I have come across a situation where XCE reports an error, but lxml does
> not. AFAICT XCE is correct. Could this be a bug in lxml?
> [...]
> The error message reported by XCE is 'ID buyerName is not unique'.

XML Schema support is known to have minor issues in libxml2. Please make
sure you use the latest version of libxml2. Then try to validate the
document using xmllint (a tool that comes with libxml2), and if that fails
(well, or fails to fail in this case), report the problem on the libxml2
mailing list.

Stefan

From daniel.albeseder at tttech.com  Thu Sep 17 17:46:58 2009
From: daniel.albeseder at tttech.com (Daniel Albeseder)
Date: Thu, 17 Sep 2009 17:46:58 +0200
Subject: [lxml-dev] Question about schema validation
In-Reply-To: <20090917080538.823AE71F9@ctb-mesg-2-1.saix.net>
References: <20090917080538.823AE71F9@ctb-mesg-2-1.saix.net>
Message-ID: <1253202418.26787.7.camel@tttdal>

On Thu, 2009-09-17 at 10:05 +0200, Frank Millman wrote:
> > ...
> >
> Here is the relevant section from the schema -
>
> The error message reported by XCE is 'ID buyerName is not unique'.
>
> Any thoughts?

Sounds correct to me. The schema type xsd:ID defines an identifier that
must be unique within the whole document. If two IDs have the same value,
this is an error. If you don't want the scope of the ID to cover the whole
file, you would need to use xsd:key or xsd:unique inside the schema.
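
As a rough illustration of the xsd:ID rule described here, the following sketch validates one document with distinct IDs and one with a repeated ID. The schema is a made-up minimal one (the schema and instance posted to the list were lost in archiving); only the element name `resourceParameter`, the attribute `id` and the value `buyerName` are taken from the error messages quoted in the thread. The rejection of the duplicate assumes a libxml2 that performs this check, which the thread reports for 2.7.4.

```python
from lxml import etree

# Hypothetical minimal schema: a list of elements whose "id" is an xs:ID.
SCHEMA = etree.XMLSchema(etree.fromstring(b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="resources">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="resourceParameter" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="id" type="xs:ID" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""))

unique_ids = etree.fromstring(
    b'<resources><resourceParameter id="buyerName"/>'
    b'<resourceParameter id="sellerName"/></resources>'
)
duplicate_ids = etree.fromstring(
    b'<resources><resourceParameter id="buyerName"/>'
    b'<resourceParameter id="buyerName"/></resources>'
)

# Each document is validated once; xs:ID values must be unique per document.
unique_ok = SCHEMA.validate(unique_ids)
duplicate_ok = SCHEMA.validate(duplicate_ids)

print("distinct IDs valid:", unique_ok)
print("repeated ID valid:", duplicate_ok)
for error in SCHEMA.error_log:
    print(error.message)
```

If uniqueness is only wanted within part of the document, the schema would use xsd:key or xsd:unique with a selector instead of the xs:ID type, as suggested above.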
regards Daniel From lei at ipac.caltech.edu Thu Sep 17 20:00:26 2009 From: lei at ipac.caltech.edu (Mary Lei) Date: Thu, 17 Sep 2009 11:00:26 -0700 Subject: [lxml-dev] lxml.html does not support form.enctype Message-ID: <4AB2793A.8030608@ipac.caltech.edu> Looks like lxml.html does not provide access to form enctype. Any suggestions on how to get enctype from form? -- Mary Lei Software Testing IPAC-NExScl Rm: KS-233 MS: 220-6 Phone: 395-1998 From stefan_ml at behnel.de Fri Sep 18 09:24:27 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 18 Sep 2009 09:24:27 +0200 Subject: [lxml-dev] lxml.html does not support form.enctype In-Reply-To: <4AB2793A.8030608@ipac.caltech.edu> References: <4AB2793A.8030608@ipac.caltech.edu> Message-ID: <4AB335AB.3070404@behnel.de> Mary Lei wrote: > Looks like lxml.html does not provide access > to form enctype. > Any suggestions on how to get enctype from form? What about form.get('enctype') ? Stefan From frank at chagford.com Fri Sep 18 10:49:14 2009 From: frank at chagford.com (Frank Millman) Date: Fri, 18 Sep 2009 10:49:14 +0200 Subject: [lxml-dev] Question about schema validation In-Reply-To: <4AB2485A.3090302@behnel.de> Message-ID: <20090918084924.E304D5F13@ctb-mesg-1-2.saix.net> Stefan Behnel wrote: > > Frank Millman wrote: > > I have come across a situation where XCE reports an error, but lxml > > does not. AFAICT XCE is correct. Could this be a bug in lxml? > > [...] > > The error message reported by XCE is 'ID buyerName is not unique'. > > XML Schema support is known to have minor issues in libxml2. > Please make sure you use the latest version of libxml2. Then > try to validate the document using xmllint (a tool that comes > with libxml2), and if that fails (well, or fails to fail in > this case), report the problem on the libxml2 mailing list. > It is fixed in libxml2 2.7.4. Thanks, Stefan. 
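
Since the outcome depends on the libxml2 release, it can help to check which library lxml is actually running against. A small sketch using the version tuples that `lxml.etree` exposes; the 2.7.4 threshold is simply the release named in this thread, and the variable names are illustrative:

```python
from lxml import etree

# "used" is the libxml2 loaded at runtime, "compiled" the one lxml was
# built against; both are tuples such as (2, 7, 4).
runtime_libxml2 = etree.LIBXML_VERSION
compiled_libxml2 = etree.LIBXML_COMPILED_VERSION

# 2.7.4 is the release reported in this thread as fixing the xsd:ID check.
has_id_fix = runtime_libxml2 >= (2, 7, 4)

print("lxml.etree:      ", etree.LXML_VERSION)
print("libxml used:     ", runtime_libxml2)
print("libxml compiled: ", compiled_libxml2)
print("xsd:ID uniqueness check expected:", has_id_fix)
```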
Frank From jholg at gmx.de Fri Sep 18 17:05:14 2009 From: jholg at gmx.de (jholg at gmx.de) Date: Fri, 18 Sep 2009 17:05:14 +0200 Subject: [lxml-dev] Question about schema validation In-Reply-To: <20090918084924.E304D5F13@ctb-mesg-1-2.saix.net> References: <20090918084924.E304D5F13@ctb-mesg-1-2.saix.net> Message-ID: <20090918150514.223110@gmx.net> Hi, > Stefan Behnel wrote: > > > > Frank Millman wrote: > > > I have come across a situation where XCE reports an error, but lxml > > > does not. AFAICT XCE is correct. Could this be a bug in lxml? > > > [...] > > > The error message reported by XCE is 'ID buyerName is not unique'. > > > > XML Schema support is known to have minor issues in libxml2. > > Please make sure you use the latest version of libxml2. Then > > try to validate the document using xmllint (a tool that comes > > with libxml2), and if that fails (well, or fails to fail in > > this case), report the problem on the libxml2 mailing list. > > > > It is fixed in libxml2 2.7.4. > > FWIW, oxygen (Xerces) reports both errors: E [Xerces] cvc-attribute.3: The value 'buyerName' of attribute 'id' on element 'resourceParameter' is not valid with respect to its type, 'ID'. E [Xerces] cvc-id.2: There are multiple occurrences of ID value 'buyerName'. The XMLSchema coverage got me curious. Here's bits I found out: A statement on the libxml2 mailing list wrt XMLSchema support: http://www.mail-archive.com/xml at gnome.org/msg06791.html W3C offers some schema test suite consisting of schemas, instance docs, and expected validation results, although the info is a bit scattered (I'm not even sure which is the latest "official" version of the tests). 
With a little test harness that I whipped up I get these results (for the test suite found here: http://www.w3.org/XML/2004/xml-schema-test-suite/xmlschema2006-11-06/xsts-2007-06-20.tar.gz) *** Running test suite suite.xml *** Ran 39420 tests, 37473 ok, 0 failed, 1718 non-canonical failed (canonical test states: ['stable']) Non-canonical failures: {'queried': 54, 'accepted': 1664} 229 tests could not be run: {'schema ../msData/particles/particlesZ015.xsd manually excluded': 1, 'schema ../msData/datatypes/datatypes.xsd manually excluded': 1, 'schema ../msData/particles/particlesZ012.xsd manually excluded': 1, 'XMLSyntaxError': 3, 'XMLSchemaParseError': 221, 'schema ../msData/additional/test264908_1.xsd manually excluded': 2} No guarantees that these are sensible results... Some tests I cannot run due to segfaults when instantiating etree.XMLSchema so I manually exclude them. So either: - the test for ID uniqueness is not in "stable" state - there is no test for it - it's buried in schemas that I have to exclude - my test harness is horribly broken ;-) I'm running lxml 2.2.2 with libxml2 (2, 6, 32) and libxslt (1, 1, 23) on solaris. 

Holger

From dineshbvadhia at hotmail.com  Fri Sep 18 20:18:29 2009
From: dineshbvadhia at hotmail.com (Dinesh B Vadhia)
Date: Fri, 18 Sep 2009 11:18:29 -0700
Subject: [lxml-dev] xpath problem
Message-ID:

I'm having a problem with xpath and here are the details:

Xml fragment:

    Ships and Shipping

Xpath:

    '/nitf/head/docdata/identifiedcontent/classifier[@class="indexing_service" and @type="descriptor"]'

Code:

    descriptors_list = tree.xpath('/nitf/head/docdata/identifiedcontent/classifier[@class="indexing_service" and @type="descriptor"]')
    for descriptors_element in descriptors_list:
        print descriptors_element.tag, descriptors_element.text
        for child in descriptors_element.getchildren():
            print child.tag, child.text

But, the code isn't picking up the required text eg. 'Ships and Shipping'.

Any ideas what is incorrect?

Dinesh

From lei at ipac.caltech.edu  Fri Sep 18 20:25:18 2009
From: lei at ipac.caltech.edu (Mary Lei)
Date: Fri, 18 Sep 2009 11:25:18 -0700
Subject: [lxml-dev] lxml.html does not support form.enctype
In-Reply-To: <4AB335AB.3070404@behnel.de>
References: <4AB2793A.8030608@ipac.caltech.edu> <4AB335AB.3070404@behnel.de>
Message-ID: <4AB3D08E.9070406@ipac.caltech.edu>

Works fine. Thanks.

Stefan Behnel wrote:
> Mary Lei wrote:
>> Looks like lxml.html does not provide access
>> to form enctype.
>> Any suggestions on how to get enctype from form?
>
> What about
>
>     form.get('enctype')
>
> ?
> > Stefan -- Mary Lei Software Testing IPAC-NExScl Rm: KS-233 MS: 220-6 Phone: 395-1998 From dineshbvadhia at hotmail.com Fri Sep 18 22:08:52 2009 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Fri, 18 Sep 2009 13:08:52 -0700 Subject: [lxml-dev] xpath problem In-Reply-To: <433ebc870909181244m55c00433vb56bbceb19434d59@mail.gmail.com> References: <433ebc870909181244m55c00433vb56bbceb19434d59@mail.gmail.com> Message-ID: oops! works perfectly now. thank-you. From: Chuck Bearden Sent: Friday, September 18, 2009 12:44 PM To: Dinesh B Vadhia Subject: Re: [lxml-dev] xpath problem On Fri, Sep 18, 2009 at 1:18 PM, Dinesh B Vadhia wrote: > I'm having a problem with? xpath and here are the details: > > Xml fragment: > > ? ? ? > ? ? ? ? ? > ? ? ? ? ? ? ? Ships and > Shipping > > Xpath: > '/nitf/head/docdata/identifiedcontent/classifier[@class="indexing_service" > and @type="descriptor"]' > Code: > descriptors_list = tree.xpath('/nitf/head/docdata/identifiedcontent/classifier[@class="indexing_service" and @type="descriptor"]') Hi Dinesh, check your spelling. You have an element 'identified-content' in your fragment (which is lacking the 'nitf' element specified in the XPath, btw), but the XPath refers to 'identifiedcontent'. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: http://codespeak.net/pipermail/lxml-dev/attachments/20090918/184a5e4e/attachment.htm From manu3d at gmail.com Sun Sep 20 13:16:17 2009 From: manu3d at gmail.com (Emanuele D'Arrigo) Date: Sun, 20 Sep 2009 12:16:17 +0100 Subject: [lxml-dev] xpath check, selective xslt Message-ID: <915dc91d0909200416p59026e97of0c1fe9a65e1a1c1@mail.gmail.com> Hi everybody, a couple of questions: 1) is it possible to check if an element is selected by a particular xpath expression -without- checking if the element is part of the whole node-set returned by the expression evaluation? 2) let's assume we have an xslt-transformed ElementTree and that we add an untransformed branch to it. 
Is it possible to apply the same xslt transformation selectively, only
to the added branch but not to the rest of the tree? Of course this
wouldn't help in those situations where transformed(tree+branch) !=
transformed(tree) + transformed(branch). But excluding those cases, is
there a way to do it?

Thanks for your help!

Manu

From jholg at gmx.de Mon Sep 21 17:12:20 2009
From: jholg at gmx.de (jholg at gmx.de)
Date: Mon, 21 Sep 2009 17:12:20 +0200
Subject: [lxml-dev] objectify __setattr__/addattr bug
Message-ID: <20090921151220.289710@gmx.net>

Hi,

funny how s.th. like this could go undetected in all lxml.objectify
versions:

python2.4 -i -c 'from lxml import etree, objectify; objectify.enable_recursive_str(); print etree.__version__; print etree.LIBXML_VERSION, etree.LIBXSLT_VERSION; root=objectify.Element("root")'
2.2.2
(2, 6, 32) (1, 1, 23)
>>> root.s1 = '???'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "lxml.objectify.pyx", line 246, in lxml.objectify.ObjectifiedElement.__setattr__ (src/lxml/lxml.objectify.c:3061)
  File "lxml.objectify.pyx", line 524, in lxml.objectify._appendValue (src/lxml/lxml.objectify.c:5771)
  File "lxml.objectify.pyx", line 552, in lxml.objectify._setElementValue (src/lxml/lxml.objectify.c:6033)
  File "public-api.pxi", line 76, in lxml.etree.setNodeText (src/lxml/lxml.etree.c:118529)
  File "apihelpers.pxi", line 650, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:15144)
  File "apihelpers.pxi", line 1247, in lxml.etree._utf8 (src/lxml/lxml.etree.c:19727)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes
>>> root.addattr('s2', '???')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "lxml.objectify.pyx", line 261, in lxml.objectify.ObjectifiedElement.addattr (src/lxml/lxml.objectify.c:3228)
  File "lxml.objectify.pyx", line 524, in lxml.objectify._appendValue (src/lxml/lxml.objectify.c:5771)
  File "lxml.objectify.pyx", line 552, in lxml.objectify._setElementValue (src/lxml/lxml.objectify.c:6033)
  File "public-api.pxi", line 76, in lxml.etree.setNodeText (src/lxml/lxml.etree.c:118529)
  File "apihelpers.pxi", line 650, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:15144)
  File "apihelpers.pxi", line 1247, in lxml.etree._utf8 (src/lxml/lxml.etree.c:19727)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes
>>> print root
root = None [ObjectifiedElement]
    s1 = u'' [StringElement]
      * py:pytype = 'str'
    s2 = u'' [StringElement]
      * py:pytype = 'str'
>>>

So while the assignment (string) rvalue is unacceptable an element gets
created nonetheless; I'd say this is a bug.

This fixes it:

$ svn diff
Index: src/lxml/lxml.objectify.pyx
===================================================================
--- src/lxml/lxml.objectify.pyx (revision 67827)
+++ src/lxml/lxml.objectify.pyx (working copy)
@@ -523,9 +523,10 @@
         for item in value:
             _appendValue(parent, tag, item)
     else:
-        new_element = cetree.makeSubElement(
-            parent, tag, None, None, None, None)
+        new_element = cetree.makeElement(
+            tag, parent._doc, None, None, None, None, None)
         _setElementValue(new_element, value)
+        cetree.appendChild(parent, new_element)
 
 cdef _setElementValue(_Element element, value):
     cdef python.PyObject* _pytype

Add some test(s) and check it in?

Holger

From panos at binaryarchitects.com Mon Sep 21 21:29:27 2009
From: panos at binaryarchitects.com (Panos Lambrianides)
Date: Mon, 21 Sep 2009 12:29:27 -0700
Subject: [lxml-dev] Help with a problem
Message-ID:

Hello,

I apologize up front if this is an issue that you have seen before.
I spent quite a bit of time searching for a solution on google but could
not find anything.

I have the following problem. I am trying to do a very simple xpath
search on an xml file as follows:

Python 2.6 (r26:66714, Sep 21 2009, 17:48:26)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree as etree
>>> f= open('out.xml','r')
>>> tree = etree.parse(f)
>>> tree.xpath('/form/input')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: ElementTree instance has no attribute 'xpath'

As you can see it complains that ElementTree has no xpath method
associated with it. What am I missing here? I have lxml version 2.2.2
installed on Ubuntu 9.04 with python development version 2.6 and libxslt
version 1.1.25.

On what may be a related note, importing etree gives the following error:

>>> from lxml import etree
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /usr/local/lib/python2.6/site-packages/lxml/etree.so: undefined symbol: xsltProcessOneNode
>>>

Thanks in advance for any help,

-Panos

From manu3d at gmail.com Mon Sep 21 23:19:19 2009
From: manu3d at gmail.com (Emanuele D'Arrigo)
Date: Mon, 21 Sep 2009 22:19:19 +0100
Subject: [lxml-dev] Help with a problem
In-Reply-To:
References:
Message-ID: <915dc91d0909211419h1e55db9em7fc5ac3a0c2d0103@mail.gmail.com>

2009/9/21 Panos Lambrianides

> I apologize up front if this is an issue that you have seen before. I
> spent quite a bit of time searching for a solution on google but could
> not find anything.
>
> I have the following problem.
> I am trying to do a very simple xpath search
> on an xml file as follows:
> *>>> import xml.etree.ElementTree as etree*
> >>> f= open('out.xml','r')
> >>> tree = etree.parse(f)
> >>> tree.xpath('/form/input')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: ElementTree instance has no attribute 'xpath'
>

Panos, the problem here is that you are importing python's standard
ElementTree library rather than lxml's etree. The ElementTree object in
the standard library does not have an xpath() method. You should replace
the import statement with "from lxml import etree". I understand you are
having a problem with that too but on that issue I can't help you. It
seems to be some kind of dependency issue. Maybe Stephen will be able to
shed some light.

Manu

From panos at binaryarchitects.com Tue Sep 22 00:05:55 2009
From: panos at binaryarchitects.com (Panos Lambrianides)
Date: Mon, 21 Sep 2009 22:05:55 +0000 (UTC)
Subject: [lxml-dev] Help with a problem
References:
Message-ID:

Thanks to those who responded. There was a configuration issue with
lxml2. There was an issue with dynamic linking to an xslt dynamic
library. This issue has been resolved by reinstalling using easy_install
versus building from source.

Regards,

-Panos

From stefan_ml at behnel.de Sun Sep 27 18:22:56 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 27 Sep 2009 18:22:56 +0200
Subject: [lxml-dev] objectify __setattr__/addattr bug
In-Reply-To: <20090921151220.289710@gmx.net>
References: <20090921151220.289710@gmx.net>
Message-ID: <4ABF9160.3090200@behnel.de>

Hi,

jholg at gmx.de wrote:
> So while the assignment (string) rvalue is unacceptable an element gets
> created nonetheless; I'd say this is a bug.
>
> This fixes it:
>
> $ svn diff
> Index: src/lxml/lxml.objectify.pyx
> ===================================================================
> --- src/lxml/lxml.objectify.pyx (revision 67827)
> +++ src/lxml/lxml.objectify.pyx (working copy)
> @@ -523,9 +523,10 @@
>          for item in value:
>              _appendValue(parent, tag, item)
>      else:
> -        new_element = cetree.makeSubElement(
> -            parent, tag, None, None, None, None)
> +        new_element = cetree.makeElement(
> +            tag, parent._doc, None, None, None, None, None)
>          _setElementValue(new_element, value)
> +        cetree.appendChild(parent, new_element)
>
>  cdef _setElementValue(_Element element, value):
>      cdef python.PyObject* _pytype
>
> Add some test(s) and check it in?

Yes, please do. Thanks!

Stefan

From marcello at perathoner.de Mon Sep 28 13:19:06 2009
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 28 Sep 2009 13:19:06 +0200
Subject: [lxml-dev] Bug: Attribute names ignore their namespaces
Message-ID: <4AC09BAA.7040407@perathoner.de>

I want to delete the lang attribute from an html element that has both
lang and xml:lang attributes. I found I cannot do that consistently.

lxml deletes either lang or xml:lang at random. The results depend on
the order the two attributes are specified in the html source.
--- lxmlbug.py --------------------------------------

from lxml import etree

html = etree.XML ("")
del html.attrib['lang']
print etree.tostring (html)
# prints expected:

html = etree.XML ("")
del html.attrib['lang']
print etree.tostring (html)
# prints unexpected:

print
print "lxml.etree: ", etree.LXML_VERSION
print "libxml used: ", etree.LIBXML_VERSION
print "libxml compiled: ", etree.LIBXML_COMPILED_VERSION
print "libxslt used: ", etree.LIBXSLT_VERSION
print "libxslt compiled: ", etree.LIBXSLT_COMPILED_VERSION

-----------------------------------------------------------

$ python lxmlbug.py

lxml.etree: (2, 2, 2, 0)
libxml used: (2, 7, 5)
libxml compiled: (2, 7, 3)
libxslt used: (1, 1, 26)
libxslt compiled: (1, 1, 24)
$

--
Marcello Perathoner
webmaster at gutenberg.org

From stefan_ml at behnel.de Mon Sep 28 14:04:00 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 28 Sep 2009 14:04:00 +0200
Subject: [lxml-dev] Bug: Attribute names ignore their namespaces
In-Reply-To: <4AC09BAA.7040407@perathoner.de>
References: <4AC09BAA.7040407@perathoner.de>
Message-ID: <4AC0A630.6060205@behnel.de>

Hi,

Marcello Perathoner wrote:
> I want to delete the lang attribute from an html element that has both
> lang and xml:lang attributes. I found I cannot do that consistently.
>
> lxml deletes either lang or xml:lang at random. The results depend on
> the order the two attributes are specified in the html source.

Yes, looks like a bug to me. Could you please file a bug report on the
bug tracker?

Thanks,

Stefan

From stefan_ml at behnel.de Mon Sep 28 15:03:44 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 28 Sep 2009 15:03:44 +0200
Subject: [lxml-dev] lxml and libxslt-1.1.25
Message-ID: <4AC0B430.8050007@behnel.de>

Hi all,

there's a problem with lxml that shows when using libxslt 1.1.25. By
accident, lxml uses an internal function from libxslt
(xsltProcessOneNode) that wasn't declared in a public header file.
The use of this function is related to XSLT extension elements.

The problem didn't really show in earlier versions because the symbol
was exported, so lxml could use it on most platforms - until it was
declared "static" in libxslt 1.1.25.

Daniel Veillard reacted quickly and decided to officially export this
function starting from 1.1.26. So while you cannot build lxml with
libxslt 1.1.25, it will work perfectly with the already released libxslt
1.1.26.

Sorry for any inconvenience this may cause.

Stefan

From stefan_ml at behnel.de Mon Sep 28 15:46:10 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 28 Sep 2009 15:46:10 +0200
Subject: [lxml-dev] xpath check, selective xslt
In-Reply-To: <915dc91d0909200416p59026e97of0c1fe9a65e1a1c1@mail.gmail.com>
References: <915dc91d0909200416p59026e97of0c1fe9a65e1a1c1@mail.gmail.com>
Message-ID: <4AC0BE22.5010007@behnel.de>

Hi,

Emanuele D'Arrigo wrote:
> 1) is it possible to check if an element is selected by a particular xpath
> expression -without- checking if the element is part of the whole node-set
> returned by the expression evaluation?

Not that I'm aware of.

> 2) let's assume we have an xslt-transformed ElementTree and that we add an
> untransformed branch to it. Is it possible to apply the same xslt
> transformation selectively, only to the added branch but not to the rest of
> the tree? Of course this wouldn't help in those situations where
> transformed(tree+branch) != transformed(tree) + transformed(branch). But
> excluding those cases, is there a way to do it?

Why don't you apply the XSLT to the subtree *before* you insert it?
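
A minimal sketch of the suggestion above: transform the branch on its own and only then attach the result. It is not code from this thread. The stylesheet and the element names (`list`, `entry`, `group`, `item`) are made up for illustration, and the approach only helps when the templates do not depend on the branch's future ancestors.

```python
from lxml import etree

# Hypothetical stylesheet: renames every <item> to <entry> and copies
# everything else unchanged (identity template).
STYLESHEET = etree.XML(b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="item">
    <entry><xsl:apply-templates select="@*|node()"/></entry>
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>
""")
transform = etree.XSLT(STYLESHEET)

# Stand-in for a tree that has already been transformed.
tree_root = etree.XML(b"<list><entry>one</entry></list>")

# A new, untransformed branch that is not attached to the tree yet.
branch = etree.XML(b"<group><item>two</item></group>")

# Transform the branch by itself, then insert the transformed root.
transformed = transform(branch)
tree_root.append(transformed.getroot())

print(etree.tostring(tree_root).decode())
```

The already transformed part of the tree is never passed through the stylesheet a second time; only the new branch is.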

Stefan

From jholg at gmx.de Mon Sep 28 17:39:49 2009
From: jholg at gmx.de (jholg at gmx.de)
Date: Mon, 28 Sep 2009 17:39:49 +0200
Subject: [lxml-dev] objectify __setattr__/addattr bug
In-Reply-To: <4ABF9160.3090200@behnel.de>
References: <20090921151220.289710@gmx.net> <4ABF9160.3090200@behnel.de>
Message-ID: <20090928153949.249000@gmx.net>

Hi,

> jholg at gmx.de wrote:
> > So while the assignment (string) rvalue is unacceptable an element gets
> > created nonetheless; I'd say this is a bug.
> > [...]
> > Add some test(s) and check it in?
>
> Yes, please do.

Done: https://codespeak.net/viewvc/?view=rev&revision=67943

As the testcases need to use unicode data: I'm pretty sure I did not
break anything for Py3; I used the helper functions from common_imports
but can't currently really test this due to lack of a working Py3
environment.

Holger

From manu3d at gmail.com Tue Sep 29 11:32:26 2009
From: manu3d at gmail.com (Emanuele D'Arrigo)
Date: Tue, 29 Sep 2009 10:32:26 +0100
Subject: [lxml-dev] xpath check, selective xslt
In-Reply-To: <4AC0BE22.5010007@behnel.de>
References: <915dc91d0909200416p59026e97of0c1fe9a65e1a1c1@mail.gmail.com> <4AC0BE22.5010007@behnel.de>
Message-ID: <915dc91d0909290232r78552a9co2d5953c090f4a8cd@mail.gmail.com>

2009/9/28 Stefan Behnel

> Emanuele D'Arrigo wrote:
> > 1) is it possible to check if an element is selected by a particular
> > xpath expression -without- checking if the element is part of the
> > whole node-set returned by the expression evaluation?
>
> Not that I'm aware of.

Darn! It'd be useful! To retrieve a whole node-set to check if one
element is in it seems to be quite computationally expensive! Ok, I'll
try to survive. =)

> Why don't you apply the XSLT to the subtree *before* you insert it?

Because some transformations only apply "in context". I.e. suppose a
transformation only applies to a
element if it is a child of the element. I must first attach it to the
element and only then I can transform it. No?

Manu

From stefan_ml at behnel.de Tue Sep 29 11:50:41 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 29 Sep 2009 11:50:41 +0200
Subject: [lxml-dev] xpath check, selective xslt
In-Reply-To: <915dc91d0909290232r78552a9co2d5953c090f4a8cd@mail.gmail.com>
References: <915dc91d0909200416p59026e97of0c1fe9a65e1a1c1@mail.gmail.com> <4AC0BE22.5010007@behnel.de> <915dc91d0909290232r78552a9co2d5953c090f4a8cd@mail.gmail.com>
Message-ID: <4AC1D871.5080207@behnel.de>

Emanuele D'Arrigo wrote:
> 2009/9/28 Stefan Behnel
>
>> Emanuele D'Arrigo wrote:
>>> 1) is it possible to check if an element is selected by a particular
>>> xpath expression -without- checking if the element is part of the
>>> whole node-set returned by the expression evaluation?
>> Not that I'm aware of.
>
> Darn! It'd be useful! To retrieve a whole node-set to check if one element
> is in it seems to be quite computationally expensive! Ok, I'll try to
> survive. =)

Well, what you *could* do is write an XPath extension function in Python
that throws a dedicated SuccessException when called (so that the XPath
evaluation terminates), and then catch that from your calling code that
executes an XPath expression like this:

    //whatever[../parent and success(.)]

Then give the exception an attribute that stores the element you found,
and extract it where you catch the exception.

I never tested that, but it should give you an XPath expression that
shortcuts after the first hit (or N hits, if you add a counter to the
function).

Please report back if it works, that might make a nice FAQ entry.

>> Why don't you apply the XSLT to the subtree *before* you insert it?
>
> Because some transformations only apply "in context". I.e. suppose a
> transformation only applies to a element if it is a child of the
> element. I must first attach it to the element and only then I
> can transform it. No?

Ok, then why don't you apply it to the element *after* inserting it?

Stefan

From manu3d at gmail.com Tue Sep 29 13:07:32 2009
From: manu3d at gmail.com (Emanuele D'Arrigo)
Date: Tue, 29 Sep 2009 12:07:32 +0100
Subject: [lxml-dev] xpath check, selective xslt
In-Reply-To: <4AC1D871.5080207@behnel.de>
References: <915dc91d0909200416p59026e97of0c1fe9a65e1a1c1@mail.gmail.com> <4AC0BE22.5010007@behnel.de> <915dc91d0909290232r78552a9co2d5953c090f4a8cd@mail.gmail.com> <4AC1D871.5080207@behnel.de>
Message-ID: <915dc91d0909290407l11c7730fmcdefe0eb583e0a75@mail.gmail.com>

2009/9/29 Stefan Behnel

> I never tested that, but it should give you an XPath expression that
> shortcuts after the first hit (or N hits, if you add a counter to the
> function).

It would certainly reduce computation time as in the best case the
element in question is the first match. But it would still be expensive
for the worst case scenario where the element to be matched is the last.
And even in the best case the element in question might be deep in the
tree, requiring an expensive traversal anyway.

No, I think it'd be good to implement some kind of reverse xpath
evaluation, where the algorithm starts from the last component of the
xpath expression and works its way backward, from the element to be
verified toward the root. I.e. given the tree:

and given the element and the xpath "/alpha/bravo/*/delta", the
algorithm would first verify that the element's tag is "delta". Then it
would check that has a parent, any parent. Then it would check that this
parent is the child of "bravo", but as that's not the case the
evaluation would return False: the element is not matched by the given
xpath expression. I guess things might get tricky evaluating in reverse
more complex cases and functions...

> >> Why don't you apply the XSLT to the subtree *before* you insert it?
> >
> > Because some transformations only apply "in context". I.e. suppose a
> > transformation only applies to a element if it is a child of the
> > element. I must first attach it to the element and only then I
> > can transform it. No?
>
> Ok, then why don't you apply it to the element *after* inserting it?

IF the transformation is specific to the subtree sure, as the rest of
the document would be unaffected. But the general case is that the
transformation in question might be part of a bigger xslt file which
might have been applied already to the rest of the document. Applying
the same xslt file again to the whole document might have undesirable
consequences. =?

Manu

From roland.hedberg at adm.umu.se Tue Sep 29 15:59:24 2009
From: roland.hedberg at adm.umu.se (Roland Hedberg)
Date: Tue, 29 Sep 2009 15:59:24 +0200
Subject: [lxml-dev] lxml and xmlsec
Message-ID:

Hi!

I need xmlsec and I'd like to use lxml instead of libxml, is it
possible?

xmlsec is, as it's distributed, dependent on libxml, but I don't know
how hard it would be to put together a lxmlsec where libxml was
exchanged for lxml.

Has anyone done it?

--Roland

From stefan_ml at behnel.de Tue Sep 29 16:27:38 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 29 Sep 2009 16:27:38 +0200
Subject: [lxml-dev] xpath check, selective xslt
In-Reply-To: <915dc91d0909290407l11c7730fmcdefe0eb583e0a75@mail.gmail.com>
References: <915dc91d0909200416p59026e97of0c1fe9a65e1a1c1@mail.gmail.com> <4AC0BE22.5010007@behnel.de> <915dc91d0909290232r78552a9co2d5953c090f4a8cd@mail.gmail.com> <4AC1D871.5080207@behnel.de> <915dc91d0909290407l11c7730fmcdefe0eb583e0a75@mail.gmail.com>
Message-ID: <4AC2195A.1010307@behnel.de>

Emanuele D'Arrigo wrote:
> I think it'd be good to implement some kind of reverse xpath evaluation,
> where the algorithm starts from the last component of the xpath expression
> and works its way backward, from the element to be verified toward the root.
> I.e. given the tree:
>
> and given the element and the xpath "/alpha/bravo/*/delta", the
> algorithm would first verify that the element's tag is "delta". Then it
> would check that has a parent, any parent. Then it would check that
> this parent is the child of "bravo", but as that's not the case the
> evaluation would return False: the element is not matched by the given xpath
> expression. I guess things might get tricky evaluating in reverse more
> complex cases and functions...

"might get tricky" is clearly the wrong wording here.

>>>> Why don't you apply the XSLT to the subtree *before* you insert it?
>>> Because some transformations only apply "in context". I.e. suppose a
>>> transformation only applies to a element if it is a child of the
>>> element. I must first attach it to the element and only
>>> then I can transform it. No?
>> Ok, then why don't you apply it to the element *after* inserting it?
>
> IF the transformation is specific to the subtree sure, as the rest of the
> document would be unaffected. But the general case is that the
> transformation in question might be part of a bigger xslt file which might
> have been applied already to the rest of the document. Applying the same
> xslt file again to the whole document might have undesirable consequences.

Note how I wrote "element", not "document".

Stefan

From manu3d at gmail.com Tue Sep 29 16:50:09 2009
From: manu3d at gmail.com (Emanuele D'Arrigo)
Date: Tue, 29 Sep 2009 15:50:09 +0100
Subject: [lxml-dev] xpath check, selective xslt
In-Reply-To: <4AC2195A.1010307@behnel.de>
References: <915dc91d0909200416p59026e97of0c1fe9a65e1a1c1@mail.gmail.com> <4AC0BE22.5010007@behnel.de> <915dc91d0909290232r78552a9co2d5953c090f4a8cd@mail.gmail.com> <4AC1D871.5080207@behnel.de> <915dc91d0909290407l11c7730fmcdefe0eb583e0a75@mail.gmail.com> <4AC2195A.1010307@behnel.de>
Message-ID: <915dc91d0909290750n7234131ey5b5b8e81e0e8fc1a@mail.gmail.com>

2009/9/29 Stefan Behnel

> > I guess things might get tricky evaluating in reverse more
> > complex cases and functions...
>
> "might get tricky" is clearly the wrong wording here.

Hehehe, fair enough. =)

> >> Ok, then why don't you apply it to the element *after* inserting it?
> Note how I wrote "element", not "document".

I didn't know it was possible to apply an xslt transformation to an
element! That's exactly what I was after! It's true, the API clearly
states: "Calling this object on a tree or Element will execute the
XSLT". I just had no idea about it because I only looked here and all
examples refer to ElementTree objects being the input to XSLT objects.
Might be good to add an Element example or change an existing one to
handle an element.

Thank you though, this is really good news!

Manu

From stefan_ml at behnel.de Tue Sep 29 17:10:04 2009
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 29 Sep 2009 17:10:04 +0200
Subject: [lxml-dev] xpath check, selective xslt
In-Reply-To: <915dc91d0909290750n7234131ey5b5b8e81e0e8fc1a@mail.gmail.com>
References: <915dc91d0909200416p59026e97of0c1fe9a65e1a1c1@mail.gmail.com> <4AC0BE22.5010007@behnel.de> <915dc91d0909290232r78552a9co2d5953c090f4a8cd@mail.gmail.com> <4AC1D871.5080207@behnel.de> <915dc91d0909290407l11c7730fmcdefe0eb583e0a75@mail.gmail.com> <4AC2195A.1010307@behnel.de> <915dc91d0909290750n7234131ey5b5b8e81e0e8fc1a@mail.gmail.com>
Message-ID: <4AC2234C.5080209@behnel.de>

Emanuele D'Arrigo wrote:
> I didn't know it was possible to apply an xslt transformation to an element!
> That's exactly what I was after! It's true, the API clearly
> states: "Calling this object on a tree or Element will execute the
> XSLT". I just had no idea about it because I only looked
> here and all examples refer to ElementTree objects being the input
> to XSLT objects. Might be good to add an Element example or change
> an existing one to handle an element.
>
> Thank you though, this is really good news!

... not as good as that. I just checked, and it actually decouples the
element from the rest of the document before running the XSLT. So that
won't help in the case that the XSLT needs to refer to the ancestors.

I wonder if it would make sense to disable the decoupling for plain
Elements. The problem is that this might break code. OTOH, I expect
little XSLT code to really depend on this, and it's easy to work around
by wrapping the element in an ElementTree object.
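
One way to observe the behaviour described above is a stylesheet that reports which element it sees as the document root. This is only a sketch and not code from the thread: the stylesheet, the `seen-root` result element and the sample tree (borrowing the `alpha`/`bravo`/`delta` names from the earlier XPath example) are invented for illustration, and the outcome may depend on the lxml version. If the element is decoupled as described, the reported name should be that of the element passed in rather than that of the real document root.

```python
from lxml import etree

# Hypothetical stylesheet: emits the name of whatever element the
# transformation sees directly below the document node.
STYLESHEET = etree.XML(b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <seen-root><xsl:value-of select="name(/*)"/></seen-root>
  </xsl:template>
</xsl:stylesheet>
""")
transform = etree.XSLT(STYLESHEET)

root = etree.XML(b"<alpha><bravo><delta/></bravo></alpha>")
delta = root[0][0]

# Run the stylesheet on a plain Element deep inside the tree ...
result_for_element = transform(delta)
# ... and, for comparison, on the whole document.
result_for_document = transform(etree.ElementTree(root))

print("element input: ", result_for_element.getroot().text)
print("document input:", result_for_document.getroot().text)

# The source tree itself is left as it was after the transformation.
print("parent of delta:", delta.getparent().tag)
```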

Stefan

From reagle at mit.edu Wed Sep 30 23:28:54 2009
From: reagle at mit.edu (Joseph Reagle)
Date: Wed, 30 Sep 2009 17:28:54 -0400
Subject: [lxml-dev] Question about lxml.html.builder
Message-ID: <200909301728.55127.reagle@mit.edu>

I'm using Python 2.5.2 with lxml 2.1.1-1ubuntu1.

I'm trying to get some version of the lines involving 'wp_ps' to work:

    if opts.text:
        wp_ps = [E.P(p) for p in bio.wp_text.split('\n')]
        table.append(
            E.TR(
                E.TD(''),
                E.TD().extend(wp_ps),
                #E.TD(bio.wp_text, colspan='2'),
                E.TD(bio.eb_text, colspan='2'),
                valign="top",
            ),
        )

Simply, I want to take text with LFs and turn them into Ps.
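
One possible way to express this, offered as a sketch rather than as the answer given on the list: `extend()` modifies an element in place and returns None, so `E.TD().extend(wp_ps)` hands None to `E.TR`. Passing the paragraphs as positional children of `E.TD` avoids that. The sample strings below are made up and stand in for `bio.wp_text` and `bio.eb_text`, which are not shown in the message.

```python
from lxml import etree
from lxml.html import builder as E

# Hypothetical stand-ins for bio.wp_text and bio.eb_text.
wp_text = "First paragraph.\nSecond paragraph."
eb_text = "Other text."

table = E.TABLE()

# One <p> per line of the source text.
wp_ps = [E.P(line) for line in wp_text.split('\n')]

table.append(
    E.TR(
        E.TD(''),
        E.TD(*wp_ps),                  # children passed as arguments
        E.TD(eb_text, colspan='2'),
        valign="top",
    ),
)

print(etree.tostring(table, pretty_print=True).decode())
```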