From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 1 09:59:21 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed Mar 1 09:58:46 2006
Subject: [lxml-dev] Updated parser API
Message-ID: <44056269.1000406@gkec.informatik.tu-darmstadt.de>

Hi,

I updated the parser API according to the discussions (and the proposal of
Fredrik) that we had in November. It now uses an XMLParser class that simply
builds the libxml2 parse options in the constructor. I also added a global
function "set_default_parser" that globally sets the default parser (options),
or resets them if the supplied parser is None.

Although the internal implementation may change later on, I think it is better
to have this API in place *now* (i.e. for 0.9), so that we can simply add more
features (i.e. keyword arguments) later on without changing the API itself.
Since we already discussed this, I applied it directly to the trunk.

Note, however, that currently not all parse options are backed by test cases.
I added one that tests namespace stripping (in the new file test_parser.py),
but considering the fact that most of the functionality is implemented
entirely by libxml2, I (lazily) thought it's sufficient to test that the API
works in general.

Stefan

From delza at livingcode.org Wed Mar 1 16:09:22 2006
From: delza at livingcode.org (Dethe Elza)
Date: Wed Mar 1 16:09:58 2006
Subject: [lxml-dev] Call for Contribution: lxml 0.9
In-Reply-To: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de>
References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de>
Message-ID: <3E6EB389-CB8F-4F15-889C-414D30E2D0AA@livingcode.org>

The main feature I'd like to see would be an easy installer that includes
lxml's dependencies, maybe using easy_install. It's a complex project and the
installation is easy to get wrong.

Thanks for the work on this. Looking forward to 0.9.

--Dethe

"the city carries such a cargo of pathos and longing that daily life there
vaccinates us against revelation" -- Pain Not Bread, The Rise and Fall of
Human Breath

From Geraldjohn.M.Manipon at jpl.nasa.gov Wed Mar 1 19:02:31 2006
From: Geraldjohn.M.Manipon at jpl.nasa.gov (Gerald John M. Manipon)
Date: Wed Mar 1 19:02:54 2006
Subject: [lxml-dev] extracting namespace prefix map dict
Message-ID: <4405E1B7.3080001@jpl.nasa.gov>

Hi,

Is there a way to extract a namespace prefix map from
an etree _Element, i.e.:

xmlString=

...

and get a dict:
nsmapDict={'_default': "http://genesis.jpl.nasa.gov/sciflo",
'sfl': "http://genesis.jpl.nasa.gov/sciflo",
'xsd': "http://www.w3.org/2001/XMLSchema",
'xsi': "http://www.w3.org/2001/XMLSchema-instance"}

Currently I'm parsing the xml into a minidom and extracting
this info. Any help is greatly appreciated.

Thanks,
Gerald

From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 1 21:18:45 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed Mar 1 21:18:12 2006
Subject: [lxml-dev] extracting namespace prefix map dict
In-Reply-To: <4405E1B7.3080001@jpl.nasa.gov>
References: <4405E1B7.3080001@jpl.nasa.gov>
Message-ID: <440601A5.6090703@gkec.informatik.tu-darmstadt.de>

Gerald John M. Manipon wrote:
> Is there a way to extract a namespace prefix map from
> an etree _Element, i.e.:
>
> xmlString=
>
> xmlns:sfl="http://genesis.jpl.nasa.gov/sciflo"
> xmlns:xsd="http://www.w3.org/2001/XMLSchema"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>
> ...
>
>
> and get a dict:
> nsmapDict={'_default': "http://genesis.jpl.nasa.gov/sciflo",
> 'sfl': "http://genesis.jpl.nasa.gov/sciflo",
> 'xsd': "http://www.w3.org/2001/XMLSchema",
> 'xsi': "http://www.w3.org/2001/XMLSchema-instance"}
>
> Currently I'm parsing the xml into a minidom and extracting
> this info. Any help is greatly appreciated.

Hi,

there isn't an API for that currently.
It could be made available, but it
actually doesn't fit very well with the intentions of the ElementTree API.
ElementTree is not very concerned with prefixes at all, since it deploys James
Clark's tag notation ('{namespace}elementname').

Maybe you could tell us what you are actually trying to do with this
information?

Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 1 21:36:32 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed Mar 1 21:35:57 2006
Subject: [lxml-dev] Better Installer (was: Call for Contribution: lxml 0.9)
In-Reply-To: <3E6EB389-CB8F-4F15-889C-414D30E2D0AA@livingcode.org>
References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de>
 <3E6EB389-CB8F-4F15-889C-414D30E2D0AA@livingcode.org>
Message-ID: <440605D0.3050000@gkec.informatik.tu-darmstadt.de>

Dethe Elza wrote:
> The main feature I'd like to see would be an easy installer that includes
> lxml's dependencies, maybe using easy_install. It's a complex project
> and the installation is easy to get wrong.

Hmm, I'm not quite sure what could be done better here. What you'd have to do
for 0.9 is:

* install libxml2 and libxslt (which lxml can't do for you)
* tar zxf lxml-0.9.tar.gz
* cd lxml-0.9
* run "python setup.py install" (or bdist_egg or whatever you run normally)

That doesn't sound very error prone to me...

But then, that's mainly how it works on Linux. You seem to be on Apple, so I
imagine what you're looking for is a readily installable darwin port? Maybe
even without compilation?

I guess then we'd have to find someone who uses MacOS-X and can provide a
port...

Stefan

From Geraldjohn.M.Manipon at jpl.nasa.gov Wed Mar 1 22:07:28 2006
From: Geraldjohn.M.Manipon at jpl.nasa.gov (Gerald John M. Manipon)
Date: Wed Mar 1 22:07:56 2006
Subject: [lxml-dev] extracting namespace prefix map dict
In-Reply-To: <440601A5.6090703@gkec.informatik.tu-darmstadt.de>
References: <4405E1B7.3080001@jpl.nasa.gov>
 <440601A5.6090703@gkec.informatik.tu-darmstadt.de>
Message-ID: <44060D10.4040800@jpl.nasa.gov>

I'm just using it to pass into the xpath() method.

Stefan Behnel wrote:
> Gerald John M. Manipon wrote:
>
>>Is there a way to extract a namespace prefix map from
>>an etree _Element, i.e.:
>>
>>xmlString=
>>
>>> xmlns:sfl="http://genesis.jpl.nasa.gov/sciflo"
>> xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>
>> ...
>>
>>
>>and get a dict:
>>nsmapDict={'_default': "http://genesis.jpl.nasa.gov/sciflo",
>>'sfl': "http://genesis.jpl.nasa.gov/sciflo",
>>'xsd': "http://www.w3.org/2001/XMLSchema",
>>'xsi': "http://www.w3.org/2001/XMLSchema-instance"}
>>
>>Currently I'm parsing the xml into a minidom and extracting
>>this info. Any help is greatly appreciated.
>
>
>
> Hi,
>
> there isn't an API for that currently. It could be made available, but it
> actually doesn't fit very well with the intentions of the ElementTree API.
> ElementTree is not very concerned with prefixes at all, since it deploys James
> Clark's tag notation ('{namespace}elementname').
>
> Maybe you could tell us what you are actually trying to do with this information?
>
> Stefan

From delza at livingcode.org Thu Mar 2 03:23:38 2006
From: delza at livingcode.org (Dethe Elza)
Date: Thu Mar 2 03:24:17 2006
Subject: [lxml-dev] Better Installer (was: Call for Contribution: lxml 0.9)
In-Reply-To: <440605D0.3050000@gkec.informatik.tu-darmstadt.de>
References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de>
 <3E6EB389-CB8F-4F15-889C-414D30E2D0AA@livingcode.org>
 <440605D0.3050000@gkec.informatik.tu-darmstadt.de>
Message-ID: <94E66656-3AE4-4B11-BB0D-52BF97579D42@livingcode.org>

>> The main feature I'd like to see would be an easy installer that
>> includes
>> lxml's dependencies, maybe using easy_install. It's a complex
>> project
>> and the installation is easy to get wrong.
>
> Hmm, I'm not quite sure what could be done better here. What you'd
> have to do
> for 0.9 is:
>
> * install libxml2 and libxslt (which lxml can't do for you)

Why not? Other python extensions install their dependencies.

> * tar zxf lxml-0.9.tar.gz
> * cd lxml-0.9
> * run "python setup.py install" (or bdist_egg or whatever you run
> normally)
>
> That doesn't sound very error prone to me...
>
> But then, that's mainly how it works on Linux. You seem to be on
> Apple, so I
> imagine what you're looking for is a readily installable darwin
> port? Maybe
> even without compilation?

OS X is my main platform, but I actually encountered trouble installing on
Windows, where the steps were:

1. Read install instructions for lxml
2. Follow these to libxml site
3. Find download link on crowded page
4. Find download link to Windows binaries
5. Read install instructions and figure out *which* libraries are needed
   (out of: libxml2, libxslt, openssl, iconv, zlib, xmlsec, and xsldbg) and
   in what order. Appear to need four libraries: openssl, iconv, libxml2,
   and libxslt in that order.
6. Download and unzip each set of binaries.
7. Get the exes in my %PATH%
8. Collect the headers in ~/include/ and the DLLs in ~/lib/ and point the
   config file at them
9. run python setup.py install
10. have it fail
11. Run out of time to futz with it and give up

> I guess then we'd have to find someone who uses MacOS-X and can
> provide a port...

On OS X it isn't nearly such a problem, more of an inconvenience. If
necessary I can make an egg at some point for OS X.

>
> Stefan

--Dethe

Choosing software is not a neutral act. It must be done consciously; the
debate over free and proprietary software can't be limited to the differences
in the applications' features and ergonomics. To choose an operating system,
or software, or network architecture is to choose a kind of society.
--Lemaire and Decroocq (trans. by Tim Bray)

From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 2 07:48:57 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu Mar 2 07:48:26 2006
Subject: [lxml-dev] Better Installer under Windows
In-Reply-To: <94E66656-3AE4-4B11-BB0D-52BF97579D42@livingcode.org>
References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de>
 <3E6EB389-CB8F-4F15-889C-414D30E2D0AA@livingcode.org>
 <440605D0.3050000@gkec.informatik.tu-darmstadt.de>
 <94E66656-3AE4-4B11-BB0D-52BF97579D42@livingcode.org>
Message-ID: <44069559.9060800@gkec.informatik.tu-darmstadt.de>

Dethe Elza wrote:
>>> The main feature I'd like to see would be an easy installer that includes
>>> lxml's dependencies, maybe using easy_install. It's a complex project
>>> and the installation is easy to get wrong.
>>
>> Hmm, I'm not quite sure what could be done better here. What you'd
>> have to do
>> for 0.9 is:
>>
>> * install libxml2 and libxslt (which lxml can't do for you)
>
> Why not? Other python extensions install their dependencies.

It's not a problem as long as it's only about Python dependencies. EasyInstall
can do that. It's not a problem if it's only self-contained C extensions for
Python. EasyInstall can do that, too.
However, libxml2 and libxslt are written in plain C, with their own further
dependencies and their installation very much depends on the operating system,
its version, the processor architecture, the availability of a C compiler, ...

It would be hard work for us to handle all of that in Python. And it's not up
to the developers of /lxml/ to provide better ways of installing its
dependencies under the various types of systems. If installing libxml2 is a
problem, it's a problem with libxml2, not lxml.

> OS X is my main platform, but I actually encountered trouble installing
> on Windows, where the steps were:
>
> 1. Read install instructions for lxml
> 2. Follow these to libxml site
> 3. Find download link on crowded page
> 4. Find download link to Windows binaries
> 5. Read install instructions and figure out *which* libraries are needed
> (out of: libxml2, libxslt, openssl, iconv, zlib, xmlsec, and xsldbg) and
> in what order. Appear to need four libraries: openssl, iconv, libxml2,
> and libxslt in that order.
> 6. Download and unzip each set of binaries.
> 7. Get the exes in my %PATH%
> 8. Collect the headers in ~/include/ and the DLLs in ~/lib/ and point
> the config file at them

I understand that this is a problem and that it keeps people from using
libxml2/libxslt under certain proprietary systems. But you could argue that
this is due to the lack of package management in these systems.

I actually think I remember having seen that Cygwin comes with a reasonable
installer that allows you to install various libraries by simply selecting
them. If I'm not mistaken, libxml2/xslt should have been amongst them.

http://cygwin.com/

> 9. run python setup.py install
> 10. have it fail

Well, most 'bug' reports about 'having it fail' that we get on the list are
actually because of Pyrex rather than lxml. That will change. However, we
can't really help you with the way you set up your system before trying to
compile lxml. There just isn't much we can do about that.

We could only bundle the libraries in a Windows installer. But then, what good
is it to have libraries if you include them statically? That would mean having
to respond to every security announcement to provide a new installer
ourselves - also for every version of Python and so on...

That's really a problem with the way installations on Microsoft systems work,
not with lxml. Having to provide all dependencies yourself in the single
installer binary of your software and overwriting the libraries that were
provided by other installers simply isn't the right way to do it.

>> I guess then we'd have to find someone who uses MacOS-X and can
>> provide a port...
>
> On OS X it isn't nearly such a problem, more of an inconvenience. If
> necessary I can make an egg at some point for OS X.

That would be helpful. MacOS *has* package management. Also, MacOS-X libraries
tend to be a lot more - hum - static? outdated? - so that installers make more
sense there.

Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 2 07:55:01 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu Mar 2 07:54:21 2006
Subject: [lxml-dev] extracting namespace prefix map dict
In-Reply-To: <44060D10.4040800@jpl.nasa.gov>
References: <4405E1B7.3080001@jpl.nasa.gov>
 <440601A5.6090703@gkec.informatik.tu-darmstadt.de>
 <44060D10.4040800@jpl.nasa.gov>
Message-ID: <440696C5.4070305@gkec.informatik.tu-darmstadt.de>

> Stefan Behnel wrote:
>> Gerald John M. Manipon wrote:
>>
>>> Is there a way to extract a namespace prefix map from
>>> an etree _Element, i.e.:
>>>
>>> xmlString=
>>>
>>>
>>> xmlns:sfl="http://genesis.jpl.nasa.gov/sciflo"
>>> xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>>
>>> ...
>>>
>>>
>>> and get a dict:
>>> nsmapDict={'_default': "http://genesis.jpl.nasa.gov/sciflo",
>>> 'sfl': "http://genesis.jpl.nasa.gov/sciflo",
>>> 'xsd': "http://www.w3.org/2001/XMLSchema",
>>> 'xsi': "http://www.w3.org/2001/XMLSchema-instance"}
>>>
>>> Currently I'm parsing the xml into a minidom and extracting
>>> this info. Any help is greatly appreciated.
>>
>> there isn't an API for that currently. It could be made available, but it
>> actually doesn't fit very well with the intentions of the ElementTree
>> API.
>> ElementTree is not very concerned with prefixes at all, since it
>> deploys James
>> Clark's tag notation ('{namespace}elementname').
>>
>> Maybe you could tell us what you are actually trying to do with this
>> information?

Gerald John M. Manipon wrote:
> I'm just using it to pass into the xpath() method.

Then don't do it. You only need to define prefixes that appear in your XPath
expressions, where you can use whatever prefix you choose as long as you
define it in the provided dictionary. That will get better in lxml 0.9 (and
already is in the current SVN of the scoder2 branch).

In any case, there is no reason for extracting the prefixes used in the XML.
Just use new ones. The namespace URI will make the match between them.

Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 2 13:55:28 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu Mar 2 13:55:50 2006
Subject: [lxml-dev] Clean up of extension function implementation
Message-ID: <4406EB40.1090204@gkec.informatik.tu-darmstadt.de>

Hi all,

I did a lot of cleaning up of my code regarding extension functions and I hope
it's now pretty close to 'ready for merging'.

You can look at doc/extensions.txt in the scoder2 branch for some examples.

One problem, however, remains: the first argument to extension functions,
which previously contained the current XPath evaluator. I absolutely cannot
see a reason for adding this argument to the call.
The only usable thing in
the evaluator is the 'evaluate(path)' method, but I wouldn't even bet on it
being re-entrant, so I can only hope that no existing code actually uses it.

I did not want to break any legacy code by happily changing the argument order
of the call, so I just kept that argument in there and added a line in the
documentation stating that it should not be used (reserved for future
extensions :). The new implementation simply passes None.

It looks a bit ugly that way, but, well, we /may/ still succeed in finding a
use for it some time after lxml 1.0 ...

Have fun,
Stefan

From paul at zope-europe.org Thu Mar 2 14:25:14 2006
From: paul at zope-europe.org (Paul Everitt)
Date: Thu Mar 2 14:25:59 2006
Subject: [lxml-dev] Re: Call for Contribution: lxml 0.9
In-Reply-To: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de>
References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de>
Message-ID:

Hi Stefan. Thanks for the work on this and sending the heads-up note.

Before writing and checking in a test, I want to find out if something is
known behavior. I have an XSLT that uses XML data in the stylesheet itself.
Essentially:

---- xslt preamble stuff ----

klkdk

.....later....

It works with xsltproc but doesn't work with lxml. I wonder if there's
something about the resolver that is the issue.

Should I file a case for this, or is this known behavior?

Also, I'd file a case about not wiring in libxml2's HTMLParser, but I suppose
that's not a bug, it's just a missing feature. :^) I *really* need that wired
in at some point, but for now, I can just ship some other tool that converts
HTML -> well-formed XML.

--Paul

Stefan Behnel wrote:
> Hello, everyone!
>
> Martijn and I realised that there has not been a release of lxml for quite a
> while - despite a lot of important changes to the code base.
>
> We therefore agreed to prepare the release of lxml 0.9 for
>
> *Monday, March 20*.
>
> The remaining time will be used to get the trunk ready for a release. It will
> include the current feature set of the trunk plus the support for extension
> functions that is currently implemented in the scoder2 branch. The latter
> still needs a bit of discussion and merging. Any comments on this are appreciated.
>
>
> For you, this announcement means the following.
>
> - If you want any features included, make sure you send requests and patches
> to the list within the next days or try to fix a date when a patch will be
> available. Remember that we may have to discuss patches before applying them.
>
> - If you feel that you especially like or dislike anything about the new
> features and the way they are made available through APIs, feel free to tell
> us over the list.
>
> - If you feel that anything from past discussions has been left out that
> should be worth remembering for a new release, speak up now.
>
> - We know that there are still a few places where documentation would help
> users to understand new features. A good way to contribute would therefore be
> to look through the documentation in the "doc" directory and see if it is
> understandable or if there is anything you might want to add.
>
> - In the same line, usage examples in the form of doctests would be nice to
> have for more API functions. They also belong into the files in the "doc"
> directory, which are (or must be) called from the related test cases in
> src/lxml/tests.
>
> Happy coding,
> Stefan

From paul at zope-europe.org Thu Mar 2 14:26:53 2006
From: paul at zope-europe.org (Paul Everitt)
Date: Thu Mar 2 14:32:03 2006
Subject: [lxml-dev] Re: Clean up of extension function implementation
In-Reply-To: <4406EB40.1090204@gkec.informatik.tu-darmstadt.de>
References: <4406EB40.1090204@gkec.informatik.tu-darmstadt.de>
Message-ID: <4406F29D.7080004@zope-europe.org>

Stefan Behnel wrote:
> Hi all,
>
> I did a lot of cleaning up of my code regarding extension functions and I hope
> it's now pretty close to 'ready for merging'.

I'll try to give it a shot today.
I was having lots of problems with it a few
weeks ago, but was working offline (while traveling) and didn't file any
reports, I just reverted to 0.8. My problems weren't with the namespace stuff
directly, just scoder2.

> You can look at doc/extensions.txt in the scoder2 branch for some examples.
>
> One problem, however, remains: the first argument to extension functions,
> which previously contained the current XPath evaluator. I absolutely cannot
> see a reason for adding this argument to the call. The only usable thing in
> the evaluator is the 'evaluate(path)' method, but I wouldn't even bet on it
> being re-entrant, so I can only hope that no existing code actually uses it.
>
> I did not want to break any legacy code by happily changing the argument order
> of the call, so I just kept that argument in there and added a line in the
> documentation stating that it should not be used (reserved for future
> extensions :). The new implementation simply passes None.
>
> It looks a bit ugly that way, but, well, we /may/ still succeed in finding a
> use for it some time after lxml 1.0 ...

Sorry, I don't have an opinion on this.

--Paul

From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 2 16:11:14 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu Mar 2 16:11:40 2006
Subject: [lxml-dev] document('') stylesheet access in XSLT
In-Reply-To:
References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de>
Message-ID: <44070B12.6000809@gkec.informatik.tu-darmstadt.de>

Paul Everitt wrote:
> Before writing and checking in a test, I want to find out if something
> is known behavior. I have an XSLT that uses XML data in the stylesheet
> itself. Essentially:
>
> ---- xslt preamble stuff ----
>
> klkdk
>
> .....later....
>
>
> It works with xsltproc but doesn't work with lxml. I wonder if there's
> something about the resolver that is the issue.
>
> Should I file a case for this, or is this known behavior?

Hmmm, I can reproduce that. This results in an empty root tag 'test':

---------------
from lxml import etree

xslt = etree.XSLT(etree.XML('''\
'''))

print xslt(etree.XML('')),
---------------

Looks like libxslt doesn't know how to access document('') without an XSLT
specific entity loader. You can run xsltproc with the '--load-trace' option to
see that the entity resolver handles that access, too. According to strace, it
even reads in the file twice, so we'd either have to provide the XSLT
processor with access to the original XSL file (which is impossible if it is
generated in memory) or write a replacement to the entity resolver that
handles that special case.

I personally consider that a bug in libxslt, though. document('') should be a
special case /in the library/.

> Also, I'd file a case about not wiring in libxml2's HTMLParser, but I
> suppose that's not a bug, it's just a missing feature. :^) I *really*
> need that wired in at some point, but for now, I can just ship some
> other tool that converts HTML -> well-formed XML.

Would be nice to have - but it's a little too close to 0.9 to implement the
interface code, which also has to be ElementTree compatible to a certain
extent.

Stefan

From paul at zope-europe.org Thu Mar 2 16:37:08 2006
From: paul at zope-europe.org (Paul Everitt)
Date: Thu Mar 2 16:38:50 2006
Subject: [lxml-dev] Re: document('') stylesheet access in XSLT
In-Reply-To: <44070B12.6000809@gkec.informatik.tu-darmstadt.de>
References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de>
 <44070B12.6000809@gkec.informatik.tu-darmstadt.de>
Message-ID: <44071124.7080701@zope-europe.org>

Stefan Behnel wrote:
> Paul Everitt wrote:
>> Before writing and checking in a test, I want to find out if something
>> is known behavior. I have an XSLT that uses XML data in the stylesheet
>> itself. Essentially:
>>
>> ---- xslt preamble stuff ----
>>
>> klkdk
>>
>> .....later....
>>
>>
>> It works with xsltproc but doesn't work with lxml.
>> I wonder if there's
>> something about the resolver that is the issue.
>>
>> Should I file a case for this, or is this known behavior?
>
> Hmmm, I can reproduce that. This results in an empty root tag 'test':
>
> ---------------
> from lxml import etree
>
> xslt = etree.XSLT(etree.XML('''\
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>
>
>
>
>
>
>
>
> '''))
>
> print xslt(etree.XML('')),
> ---------------
>
> Looks like libxslt doesn't know how to access document('') without an XSLT
> specific entity loader. You can run xsltproc with the '--load-trace' option to
> see that the entity resolver handles that access, too. According to strace, it
> even reads in the file twice, so we'd either have to provide the XSLT
> processor with access to the original XSL file (which is impossible if it is
> generated in memory) or write a replacement to the entity resolver that
> handles that special case.

Ok, point made. :^) I'm currently working around this by putting the nodes I
want into the input document. However, that makes me pay a penalty on every
request, instead of just once when the stylesheet is generated.

I can live with it, though. It's already lots faster than other templating
approaches. :^)

--Paul

From bkc at murkworks.com Thu Mar 2 17:21:54 2006
From: bkc at murkworks.com (Brad Clements)
Date: Thu Mar 2 17:22:40 2006
Subject: [lxml-dev] Re: document('') stylesheet access in XSLT
In-Reply-To: <44071124.7080701@zope-europe.org>
References: <44070B12.6000809@gkec.informatik.tu-darmstadt.de>
Message-ID: <4406D552.8397.276F7B5C@bkc.murkworks.com>

On 2 Mar 2006 at 16:37, Paul Everitt wrote:

> Stefan Behnel wrote:
> > Paul Everitt wrote:
> >> Before writing and checking in a test, I want to find out if
> >> something is known behavior. I have an XSLT that uses XML data in
> >> the stylesheet itself. Essentially:
> >>
> >>

Sorry to barge in here..

I've been watching lxml for a while, waiting for it to support a custom
resolver so I can use it in Paste and Zope, etc.
Currently I'm using libxslt with Paste in my "tal2xslt" project. I rely on
document('') as a way to inject "system-wide" constants into the .xsl files
that are generated from the input TAL.

I register a global resolver with libxslt and tag the URI associated with a
given document so it can be tied back to the originating request.

I use libxml2.readDoc() to associate a custom URI with each document. Each
URI I load uses a custom scheme so I know it's "mine".

Anyway, in the resolver if the scheme doesn't match my fake scheme, I just
hand it back to libxslt and let it resolve itself.

Can you do the same in lxml?

I poked around http://codespeak.net/svn/lxml/trunk/src/lxml/ but couldn't find
the module where you set the resolver.

> Ok, point made. :^) I'm currently working around this by putting the
> nodes I want into the input document. However, that makes me pay a
> penalty on every request, instead of just once when the stylesheet is
> generated.
>
> I can live with it, though. It's already lots faster than other
> templating approaches. :^)

Yeah, I love it. Edge side includes done via xml/xsl in the client. It's
great.

--
Brad Clements, bkc@murkworks.com (315)268-1000
http://www.murkworks.com
AOL-IM or SKYPE: BKClements

From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 2 22:16:18 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu Mar 2 22:15:29 2006
Subject: [lxml-dev] Re: document('') stylesheet access in XSLT
In-Reply-To: <4406D552.8397.276F7B5C@bkc.murkworks.com>
References: <44070B12.6000809@gkec.informatik.tu-darmstadt.de>
 <4406D552.8397.276F7B5C@bkc.murkworks.com>
Message-ID: <440760A2.7010006@gkec.informatik.tu-darmstadt.de>

Brad Clements wrote:
> On 2 Mar 2006 at 16:37, Paul Everitt wrote:
>
>> Stefan Behnel wrote:
>>> Paul Everitt wrote:
>>>> Before writing and checking in a test, I want to find out if
>>>> something is known behavior. I have an XSLT that uses XML data in
>>>> the stylesheet itself.
>>>> Essentially:
>
>>>>
>>>>
>
> Sorry to barge in here..

You're very welcome.

> I've been watching lxml for a while, waiting for it to support a custom resolver so I
> can use it in Paste and Zope, etc.
>
> Currently I'm using libxslt with Paste in my "tal2xslt" project. I rely on document('')
> as a way to inject "system-wide" constants into the .xsl files that are generated
> from the input TAL.
>
> I register a global resolver with libxslt and tag the URI associated with a given
> document so it can be tied back to the originating request.
>
> I use libxml2.readDoc() to associate a custom URI with each document. Each
> URI I load uses a custom scheme so I know it's "mine".
>
> Anyway, in the resolver if the scheme doesn't match my fake scheme, I just hand
> it back to libxslt and let it resolve itself.
>
> Can you do the same in lxml?
>
> I poked around http://codespeak.net/svn/lxml/trunk/src/lxml/ but couldn't find the
> module where you set the resolver.

That's because we don't. :)

We currently do not care about resolving at all. Everything that works does so
because libxml2/xslt handles it itself. Everything that doesn't work - uhm,
well, doesn't work.

I would absolutely like to see a resolver API in lxml. So, if you're
interested in getting it there, you could provide some example code showing
how you set it up. Then we could see what would be a good way of extending the
current API to support resolvers - especially custom ones. We would especially
need to see at what granularity you can set them: at a function-local level,
at an object level or at a global level?

Stefan

From bkc at murkworks.com Fri Mar 3 14:42:27 2006
From: bkc at murkworks.com (Brad Clements)
Date: Fri Mar 3 14:43:19 2006
Subject: [lxml-dev] Re: document('') stylesheet access in XSLT
In-Reply-To: <440760A2.7010006@gkec.informatik.tu-darmstadt.de>
References: <4406D552.8397.276F7B5C@bkc.murkworks.com>
Message-ID: <44080173.14541.2C03BAE1@bkc.murkworks.com>

On 2 Mar 2006 at 22:16, Stefan Behnel wrote:

> > I poked around http://codespeak.net/svn/lxml/trunk/src/lxml/ but
> > couldn't find the module where you set the resolver.
>
> That's because we don't. :)

Great, at least I know I'm not blind now.

>
> We currently do not care about resolving at all. Everything that works
> does so because libxml2/xslt handles it itself. Everything that
> doesn't work - uhm, well, doesn't work.

In that case, I wonder about the original poster's claim that document('')
doesn't work.

I use xsltproc and xml starlet all the time, I don't think xsltproc sets a
resolver at all, but I haven't looked at the source to be certain. Anyway,
document('') works in both xsltproc and xml starlet.

> ones. We would especially need to see at what granularity you can set
> them: at a function-local level, at an object level or at a global
> level?

The last I knew, libxslt only allowed setting the resolver on a per-process
level.

Also, I am using the Python level interface, not the C level interface.

I'll paste below the current hacky code I use. One note, I'm not using the
most recent libxslt. In the version of libxslt I'm using now, if a resolver
raises an exception, libxslt treats that to mean "continue with default
resolution".

Apparently the semantics have changed in more recent libxslt versions. Since I
don't want default resolution to occur when an exception happens, I return an
empty string in that case. I want to point out that this code isn't really
correct in that respect. This code is still in development.

# imports needed by the code below (Python 2)
import StringIO
import traceback
import urlparse
from threading import RLock
from urllib import unquote

import libxml2

_resolver_context = {}
_scheme = "memory"
_resolver_lock = RLock()

def _Resolver(URL,ID,ctxt):
    try:
        context, new_uri = extract_uri_and_context(URL)
        if not context:
            prefix = URL.split(':')[0]
            if prefix == _scheme:
                print "no context for %r" % URL
            # not ours: hand it back for default resolution
            return None

        resolver = get_resolver(context)
        if not resolver:
            print "Could not find resolver for context %r, url %r" % (context, URL)
            return ''

        return StringIO.StringIO(resolver(new_uri))
    except:
        print "unexpected exception in resolver"
        traceback.print_exc()
        return ''

def add_context_to_uri(uri, context):
    """Mangle the uri, adding a fake scheme and context"""
    return urlparse.urlunsplit((_scheme, context, uri, '', ''))

def extract_uri_and_context(uri):
    """Un-mangle uri, returns (context, uri) or (None, uri) if not our scheme"""
    parts = urlparse.urlsplit(unquote(uri))
    context, new_uri = (None, uri)
    if parts[0] == _scheme:
        if parts[1]:
            context = parts[1]
            new_uri = parts[2]
        elif parts[2][:2] == '//':
            # buggy urlsplit sticks netloc in path
            context, new_uri = parts[2][2:].split('/',1)
            new_uri = "/" + new_uri
    return (context, new_uri)

def register_resolver(context, resolver):
    """add this context to list of resolvers"""
    _resolver_lock.acquire()
    try:
        obj = get_resolver(context)
        if not obj:
            _resolver_context[context] = (resolver, 1)
        else:
            old_resolver, use_count = obj
            _resolver_context[context] = (resolver, use_count+1)
    finally:
        _resolver_lock.release()

def unregister_resolver(context):
    """remove this context from the list of resolvers"""
    _resolver_lock.acquire()
    try:
        obj = get_resolver(context)
        if not obj:
            raise ValueError("resolver context %r not currently registered" % context)
        old_resolver, use_count = obj
        use_count -= 1
        if use_count > 0:
            _resolver_context[context] = (old_resolver, use_count)
        else:
            del _resolver_context[context]
    finally:
        _resolver_lock.release()

def get_resolver(context):
    """Return a resolver for the specified context"""
    _resolver_lock.acquire()
    try:
        obj = _resolver_context.get(context)
        if not obj:
            return obj
        resolver, use_count = obj
        return resolver
    finally:
        _resolver_lock.release()

libxml2.setEntityLoader(_Resolver) # this is a process-wide change,

--
Brad Clements, bkc@murkworks.com (315)268-1000
http://www.murkworks.com
AOL-IM or SKYPE: BKClements

From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Mar 3 15:28:07 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri Mar 3 15:28:01 2006
Subject: [lxml-dev] Re: document('') stylesheet access in XSLT
In-Reply-To: <44080173.14541.2C03BAE1@bkc.murkworks.com>
References: <4406D552.8397.276F7B5C@bkc.murkworks.com>
 <44080173.14541.2C03BAE1@bkc.murkworks.com>
Message-ID: <44085277.90400@gkec.informatik.tu-darmstadt.de>

Brad Clements wrote:
> On 2 Mar 2006 at 22:16, Stefan Behnel wrote:
>> We currently do not care about resolving at all. Everything that works
>> does so because libxml2/xslt handles it itself. Everything that
>> doesn't work - uhm, well, doesn't work.
>
> In that case, I wonder about the original poster's claim that document('') doesn't
> work.
>
> I use xsltproc and xml starlet all the time, I don't think xsltproc sets a resolver at
> all, but I haven't looked at the source to be certain. Anyway, document('') works in
> both xsltproc and xml starlet.

xsltproc does set its own resolver, I checked the source. But the main
handling is still done by the libxslt default resolver.

I also looked through the libxslt resolver and AFAICT it does not special case
document(''). That case is handled by the normal file lookup based on the
document base URL (which essentially results in using the base URL unchanged).
Hence the double read of the XSL file.

There were some places where the code seemed to check a list of in-memory
documents, so maybe that would be the place to hook in: provide a parsed
representation of the document referenced by its original name or something.
But, as usual, the documentation is not very telling.

>> ones.
We would especially need to see at what granularity you can set >> them: at a function-local level, at an object level or at a global >> level? > > The last I knew, libxslt only allowed setting the resolver on a per-process level. Ok, but that doesn't necessarily keep us from setting it when entering a function and resetting it at the end. Applying an XSLT is basically one function call, so that is a nicely enclosed code block without unpredictable concurrency problems. > Also, I am using the Python level interface, not the C level interface. Both are mostly identical in terms of function calls, so that still helps. Actually, it's even better since lxml is written in Pyrex, so Python code can be copied more or less as is. > I'll paste below the current hacky code I use. One note, I'm not using the most > recent libxslt. In the version of libxslt I'm using now, if a resolver raises an > exception, libxslt treats that to mean "continue with default resolution". > > apparently the semantics have changed in more recent libxslt versions.. Since I > don't want default resolution to occur when an exception happens, I return an > empty string in that case. I want to point out that this code isn't really correct in > that respect. This code is still in development. Thanks for posting it. I'll look through it as soon as I find the time. It's always better to have working code to read and discuss than fancy ideas and no one to implement them. 
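The set-on-entry, reset-on-exit idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: `set_entity_loader`, `scoped_entity_loader`, `resolve`, `apply_transform` and the module-level `_current_loader` slot are hypothetical stand-ins for the process-wide libxml2 loader, not lxml or libxml2 API.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for the single, process-wide entity loader slot.
_loader_lock = threading.RLock()
_current_loader = None


def set_entity_loader(loader):
    """Install a loader globally and return the one it replaced."""
    global _current_loader
    previous = _current_loader
    _current_loader = loader
    return previous


@contextmanager
def scoped_entity_loader(loader):
    """Install `loader` for one block of code, then restore the old one."""
    with _loader_lock:
        previous = set_entity_loader(loader)
        try:
            yield
        finally:
            # Runs even if the block raises, so the global state is restored.
            set_entity_loader(previous)


def resolve(url):
    """Resolve a URL with whatever loader is currently installed."""
    if _current_loader is None:
        return None  # nothing installed: fall back to default resolution
    return _current_loader(url)


def apply_transform(url):
    # Stands in for the single XSLT call that needs the custom resolver.
    with scoped_entity_loader(lambda u: "in-memory:" + u):
        return resolve(url)
```

Holding the lock for the whole call serializes transformations across threads, which is the price of wrapping a process-wide setting in this way.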
Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Mar 3 17:55:01 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Fri Mar 3 17:54:16 2006 Subject: [lxml-dev] document('') stylesheet access in XSLT In-Reply-To: <44071124.7080701@zope-europe.org> References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de> <44070B12.6000809@gkec.informatik.tu-darmstadt.de> <44071124.7080701@zope-europe.org> Message-ID: <440874E5.5030802@gkec.informatik.tu-darmstadt.de> Paul Everitt wrote: > Stefan Behnel wrote: >> from lxml import etree >> >> xslt = etree.XSLT(etree.XML('''\ >> > xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> >> >> >> >> >> >> >> >> ''')) >> >> print xslt(etree.XML('')), >> --------------- >> >> Looks like libxslt doesn't know how to access document('') without an >> XSLT specific entity loader. Ok, I played with it a bit and found that we can work around this in the case where the XSLT is read in from a file (which should be the majority of cases, I'd say). All we have to do is use the file parser functions from libxml2 in that case, which store the file URL in the document structure. This breaks the case where the document is modified in between (since the changes are not reflected by the file when it is re-read by libxslt), but that is a) a rare case and b) a bug in libxslt, which should recognise the case where the stylesheet itself is referenced. So, I committed the change to the trunk (revision 23950). Please try it to see if it fixes the cases you needed. 
Stefan From apaku at gmx.de Fri Mar 3 21:38:43 2006 From: apaku at gmx.de (Andreas Pakulat) Date: Fri Mar 3 21:38:39 2006 Subject: [lxml-dev] Updated parser API In-Reply-To: <44056269.1000406@gkec.informatik.tu-darmstadt.de> References: <44056269.1000406@gkec.informatik.tu-darmstadt.de> Message-ID: <20060303203843.GC15128@morpheus> On 01.03.06 09:59:21, Stefan Behnel wrote: > I updated the parser API according to the discussions (and the proposal of > Fredrik) that we had in November. It now uses an XMLParser class that simply > builds the libxml2 parse options in the constructor. I also added a global > function "set_default_parser" that globally sets the default parser (options), > or resets them if the supplied parser is None. Just a short question before I waste hours to try this out: Does this enable me to set "arbitrary" options on the XMLParser, so I could finally test the libxml-enhancement for removal of redundant namespaces (that is in CVS)? See http://bugzilla.gnome.org/show_bug.cgi?id=329347 for details. If the answer is yes, how would I specify the XML_DOM_RECONNS_REMOVEREDUND option? Andreas -- You had some happiness once, but your parents moved away, and you had to leave it behind. From behnel_ml at gkec.informatik.tu-darmstadt.de Sat Mar 4 12:51:45 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Sat Mar 4 12:51:10 2006 Subject: [lxml-dev] Updated parser API In-Reply-To: <20060303203843.GC15128@morpheus> References: <44056269.1000406@gkec.informatik.tu-darmstadt.de> <20060303203843.GC15128@morpheus> Message-ID: <44097F51.80007@gkec.informatik.tu-darmstadt.de> Andreas Pakulat wrote: > On 01.03.06 09:59:21, Stefan Behnel wrote: >> I updated the parser API according to the discussions (and the proposal of >> Fredrik) that we had in November. It now uses an XMLParser class that simply >> builds the libxml2 parse options in the constructor. 
I also added a global >> function "set_default_parser" that globally sets the default parser (options), >> or resets them if the supplied parser is None. > > Just a short question before I waste hours to try this out: > > Does this enable me to set "arbitrary" options on the XMLParser, so I > could finally test the libxml-enhancement for removal of redundant > namespaces (that is in CVS)? See > http://bugzilla.gnome.org/show_bug.cgi?id=329347 for details. Yes and no (my language has the wonderful word "jein" for that and I'd love to use it in English, too). Yes: It allows you to set options on the parser. No: The options must be available at compile time and mapped to keyword arguments by hand. Remember, we're talking about options of a C API here. > If the answer is yes, how would I specify the > XML_DOM_RECONNS_REMOVEREDUND option? This is not a parser option. I think it would rather be an option for a serializer, right? Maybe not even that since it modifies the state of the XML structure... So, no, there isn't currently an API for that. Maybe the best way to integrate the feature would be a method in ElementTree that explicitly traverses the tree to strip redundant declarations and possibly other leftovers from copying elements. Something like

class _ElementTree:
    ...
    def cleanup(self):
        # call libxml2 cleanup functions on self._context_node
        pass

Since this is an experimental feature, it will not be supported in lxml 0.9 anyway. But if you could come up with a patch that implements it, it would allow us to integrate it later on and also help others who have the same problem and can afford to use a CVS version of libxml2.
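To make the "mapped to keyword arguments by hand" part concrete, here is a minimal sketch of how a constructor can turn keyword arguments into a C-style option bitmask. The class name `ParserOptions` and the keyword names are hypothetical, and the flag values are meant to mirror libxml2's xmlParserOption enum but should be treated as assumptions rather than a copy of the real header.

```python
# Illustrative option bits (assumed values, named after libxml2's enum).
XML_PARSE_RECOVER = 1 << 0
XML_PARSE_NOENT = 1 << 1
XML_PARSE_NOBLANKS = 1 << 8
XML_PARSE_NONET = 1 << 11
XML_PARSE_NSCLEAN = 1 << 13

# Every supported keyword has to be listed here by hand.
_KEYWORD_TO_FLAG = {
    "recover": XML_PARSE_RECOVER,
    "resolve_entities": XML_PARSE_NOENT,
    "remove_blank_text": XML_PARSE_NOBLANKS,
    "no_network": XML_PARSE_NONET,
    "ns_clean": XML_PARSE_NSCLEAN,
}


class ParserOptions(object):
    """Collect keyword arguments into a single integer of option bits."""

    def __init__(self, **keywords):
        self.flags = 0
        for name, enabled in sorted(keywords.items()):
            if name not in _KEYWORD_TO_FLAG:
                # An option that was not compiled in cannot be requested.
                raise TypeError("unsupported parser option: %r" % name)
            if enabled:
                self.flags |= _KEYWORD_TO_FLAG[name]
```

With a table like this, adding a feature later only means adding one more entry, while a request for something outside the table (such as a tree-level namespace cleanup) is rejected up front.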
Stefan From apaku at gmx.de Sat Mar 4 13:20:30 2006 From: apaku at gmx.de (Andreas Pakulat) Date: Sat Mar 4 13:20:29 2006 Subject: [lxml-dev] Updated parser API In-Reply-To: <44097F51.80007@gkec.informatik.tu-darmstadt.de> References: <44056269.1000406@gkec.informatik.tu-darmstadt.de> <20060303203843.GC15128@morpheus> <44097F51.80007@gkec.informatik.tu-darmstadt.de> Message-ID: <20060304122030.GF13703@morpheus> On 04.03.06 12:51:45, Stefan Behnel wrote: > Andreas Pakulat wrote: > > On 01.03.06 09:59:21, Stefan Behnel wrote: > >> I updated the parser API according to the discussions (and the proposal of > >> Fredrik) that we had in November. It now uses an XMLParser class that simply > >> builds the libxml2 parse options in the constructor. I also added a global > >> function "set_default_parser" that globally sets the default parser (options), > >> or resets them if the supplied parser is None. > > > > Just a short question before I waste hours to try this out: > > > > Does this enable me to set "arbitrary" options on the XMLParser, so I > > could finally test the libxml-enhancement for removal of redundant > > namespaces (that is in CVS)? See > > http://bugzilla.gnome.org/show_bug.cgi?id=329347 for details. > > Yes and no (my language has the wonderful word "jein" for that and I'd love to > use it in english, too). :-) Tell me... > No: The options must be available at compile time and mapped to keyword > arguments by hand. Remember, we're talking about options of a C API here. Ok. As you've probably guessed from the question: I've no deep knowledge about how lxml "wraps" libxml or how libxml itself works/is used.. > > If the answer is yes, how would I specify the > > XML_DOM_RECONNS_REMOVEREDUND option? > > This is not a parser option. I think it would rather be an option for a > serializer, right? Maybe not even that since it modifies the state of the XML > structure... That may be, actually I have no idea. The implementation is in tree.c for libxml2. 
> Since this is an experimental feature, it will not be supported in lxml 0.9 > anyway. But if you could come up with a patch that implements it, it would > allow us to integrate it later on and also help others who have the same > problem and can afford to use a CVS version of libxml2. I'd really love to, but I don't have the time to do that any time soon, especially as I first need to understand how libxml2 works and then how lxml works and uses libxml2. Maybe I can start something in May..... Andreas -- Tomorrow will be cancelled due to lack of interest. From paul at zope-europe.org Sat Mar 4 16:50:05 2006 From: paul at zope-europe.org (Paul Everitt) Date: Sat Mar 4 16:50:36 2006 Subject: [lxml-dev] Re: document('') stylesheet access in XSLT In-Reply-To: <440874E5.5030802@gkec.informatik.tu-darmstadt.de> References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de> <44070B12.6000809@gkec.informatik.tu-darmstadt.de> <44071124.7080701@zope-europe.org> <440874E5.5030802@gkec.informatik.tu-darmstadt.de> Message-ID: <4409B72D.6020802@zope-europe.org> I'm stuck on compiling the trunk (the swig_sources error) for OS X. I'll boot a machine into Linux and give it a try. Sorry for the delay in replying! --Paul Stefan Behnel wrote: > Paul Everitt wrote: >> Stefan Behnel wrote: >>> from lxml import etree >>> >>> xslt = etree.XSLT(etree.XML('''\ >>> >> xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> >>> >>> >>> >>> >>> >>> >>> >>> ''')) >>> >>> print xslt(etree.XML('')), >>> --------------- >>> >>> Looks like libxslt doesn't know how to access document('') without an >>> XSLT specific entity loader. > > Ok, I played with it a bit and found that we can work around this in the case > where the XSLT is read in from a file (which should be the majority of cases, > I'd say). All we have to do is use the file parser functions from libxml2 in > that case, which store the file URL in the document structure. 
> > This breaks the case where the document is modified in between (since the > changes are not reflected by the file when it is re-read by libxslt), but that > is a) a rare case and b) a bug in libxslt, which should recognise the case > where the stylesheet itself is referenced. > > So, I committed the change to the trunk (revision 23950). > > Please try it to see if it fixes the cases you needed. > > Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Sun Mar 5 09:24:21 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Sun Mar 5 09:23:37 2006 Subject: [lxml-dev] swig_sources problem with Pyrex In-Reply-To: <4409DFD0.7030804@zope-europe.org> References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de> <44070B12.6000809@gkec.informatik.tu-darmstadt.de> <44071124.7080701@zope-europe.org> <440874E5.5030802@gkec.informatik.tu-darmstadt.de> <4409B72D.6020802@zope-europe.org> <4409C98E.3070707@gkec.informatik.tu-darmstadt.de> <4409DFD0.7030804@zope-europe.org> Message-ID: <440AA035.20905@gkec.informatik.tu-darmstadt.de> one for the archives ... Paul Everitt wrote: > Stefan Behnel wrote: >> Paul Everitt wrote: >>> I'm stuck on compiling the trunk (the swig_sources error) for OS X. >> >> Check if you have this line in Pyrex/Distutils/build_ext.py (line ~35) >> >> def swig_sources (self, sources, extension=None): > > I have: > > def swig_sources (self, sources): > if not self.extensions: > return Then change it. You may just as well use def swig_sources(self, sources, *otherargs): It doesn't matter. It's just there to allow either two or three arguments. 
Stefan From paul at zope-europe.org Sun Mar 5 09:49:44 2006 From: paul at zope-europe.org (Paul Everitt) Date: Sun Mar 5 09:50:28 2006 Subject: [lxml-dev] Re: swig_sources problem with Pyrex In-Reply-To: <440AA035.20905@gkec.informatik.tu-darmstadt.de> References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de> <44070B12.6000809@gkec.informatik.tu-darmstadt.de> <44071124.7080701@zope-europe.org> <440874E5.5030802@gkec.informatik.tu-darmstadt.de> <4409B72D.6020802@zope-europe.org> <4409C98E.3070707@gkec.informatik.tu-darmstadt.de> <4409DFD0.7030804@zope-europe.org> <440AA035.20905@gkec.informatik.tu-darmstadt.de> Message-ID: <440AA628.2050702@zope-europe.org> Stefan Behnel wrote: > one for the archives ... > > Paul Everitt wrote: >> Stefan Behnel wrote: >>> Paul Everitt wrote: >>>> I'm stuck on compiling the trunk (the swig_sources error) for OS X. >>> Check if you have this line in Pyrex/Distutils/build_ext.py (line ~35) >>> >>> def swig_sources (self, sources, extension=None): >> I have: >> >> def swig_sources (self, sources): >> if not self.extensions: >> return > > Then change it. You may just as well use > > def swig_sources(self, sources, *otherargs): > > It doesn't matter. It's just there to allow either two or three arguments. Hooray, that worked! Thanks, Stefan. I have the trunk compiled and installed now. And, I can happily report a test case for my malloc problems and segfault. :^) I'm on OS X, Python 2.4. The runpipeline.py in this SVN directory: http://codespeak.net/svn/z3/deliverance/trunk/lib/ ...works fine with lxml 0.8 but fails quite badly with the trunk. Specifically, in line 120 of: http://codespeak.net/svn/z3/deliverance/trunk/lib/runpipeline.py ...I have: resultdoc = processor.apply(contentdoc) I also noted, just by trying to limit the problem to a smaller test case, that applying a stylesheet repeatedly leads to an unusual problem. 
As shown below, all is fine, unless you re-use the same variable name for the transformation output: >>> from lxml import etree >>> xmldoc = etree.ElementTree(file="../contentdoc.xml") >>> xsldoc = etree.ElementTree(file="compiledtheme.xsl") >>> style = etree.XSLT(xsldoc) >>> result = style.apply(xmldoc) >>> xmldoc2 = etree.ElementTree(file="../contentdoc.xml") >>> result2 = style.apply(xmldoc2) >>> result3 = style.apply(xmldoc2) >>> result3 = style.apply(xmldoc2) python(20949) malloc: *** Deallocation of a pointer not malloced: 0x1; This could be a double free(), or free() called with the middle of an allocated block; Try setting environment variable MallocHelp to see tools to help debug python(20949) malloc: *** Deallocation of a pointer not malloced: 0x128b60; This could be a double free(), or free() called with the middle of an allocated block; Try setting environment variable MallocHelp to see tools to help debug python(20949) malloc: *** Deallocation of a pointer not malloced: 0x3bc0e0; This could be a double free(), or free() called with the middle of an allocated block; Try setting environment variable MallocHelp to see tools to help debug python(20949) malloc: *** Deallocation of a pointer not malloced: 0x390e40; This could be a double free(), or free() called with the middle of an allocated block; Try setting environment variable MallocHelp to see tools to help debug python(20949) malloc: *** error for object 0x390de0: double free python(20949) malloc: *** set a breakpoint in szone_error to debug --Paul From behnel_ml at gkec.informatik.tu-darmstadt.de Sun Mar 5 11:15:30 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Sun Mar 5 11:15:01 2006 Subject: [lxml-dev] error on trunk when applying stylesheets and copying elements In-Reply-To: <440AA628.2050702@zope-europe.org> References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de> <44070B12.6000809@gkec.informatik.tu-darmstadt.de> <44071124.7080701@zope-europe.org> 
<440874E5.5030802@gkec.informatik.tu-darmstadt.de> <4409B72D.6020802@zope-europe.org> <4409C98E.3070707@gkec.informatik.tu-darmstadt.de> <4409DFD0.7030804@zope-europe.org> <440AA035.20905@gkec.informatik.tu-darmstadt.de> <440AA628.2050702@zope-europe.org> Message-ID: <440ABA42.8000401@gkec.informatik.tu-darmstadt.de> Paul Everitt wrote: > I can happily report a test case for my malloc problems and > segfault. :^) I'm on OS X, Python 2.4. The runpipeline.py in this SVN > directory: > > http://codespeak.net/svn/z3/deliverance/trunk/lib/ > > ...works fine with lxml 0.8 but fails quite badly with the trunk. > Specifically, in line 120 of: > > http://codespeak.net/svn/z3/deliverance/trunk/lib/runpipeline.py > > ...I have: > > resultdoc = processor.apply(contentdoc) Sorry, I can't reproduce that. I seem to be missing some files, so I can't run the script completely. > I also noted, just by trying to limit the problem to a smaller test > case, that applying a stylesheet repeatedly leads to an unusual problem. > As shown below, all is fine, unless you re-use the same variable name > for the transformation output: It isn't that easy. Simply reusing the variable doesn't trigger a problem on my side. It seems to also depend on the stylesheet you use. Anyway, when I run your runpipeline.py through valgrind (as far as it runs), it brings up a problem in the changeDocumentBelow function, that calls the namespace reconsiliation functions of libxml2. You do an element.append somewhere and that seems to trigger it. I think it's this line: themeroot.append(copy.deepcopy(themedoc.getroot())) Maybe you can do some more testing, I'll also see what I can come up with. 
If you want to run valgrind yourself, use the command line from doc/valgrind.txt Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Sun Mar 5 13:00:49 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Sun Mar 5 13:00:01 2006 Subject: [lxml-dev] Request for help on testing new libxml2 feature In-Reply-To: <20060201195525.GA6040@morpheus.apaku.dnsalias.org> References: <20060201195525.GA6040@morpheus.apaku.dnsalias.org> Message-ID: <440AD2F1.1050502@gkec.informatik.tu-darmstadt.de> Andreas Pakulat wrote: > regarding the remove-redundant-namespaces issue there are news: > > kbuchcik implemented the xmlDOMWrapReconcileNamespaces in tree.c of > libxml2 so it should remove redundant NS decl. However neither do I have > any experience with libxml2 nor do I have the time to dig into it so > that I can build a test program for this. > > Thus I ask you guys here, who surely are libxml2 experts, if you could > help me out here. Either some "hack" for lxml that allows me to test > this or a small programm that takes an xml file and applies this > function to it's DOM tree (and outputs the result) would be really > great. Hi, Here is a trivial patch that simply calls the function after having copied an element between documents. I think this shouldn't do any harm since the new CVS options will be ignored by older libxml2 versions. Could you please apply it to the current lxml SVN version and test it against the libxml2 CVS version to see if it helps with the redundant namespace problems? 
Stefan

Index: src/lxml/etree.pyx
===================================================================
--- src/lxml/etree.pyx (revision 23981)
+++ src/lxml/etree.pyx (working copy)
@@ -1369,7 +1369,8 @@
     """
     changeDocumentBelowHelper(node._c_node, doc)
     tree.xmlReconciliateNs(doc._c_doc, node._c_node)
-
+    tree.xmlDOMWrapReconcileNamespaces(NULL, node._c_node, 1)
+
 cdef void changeDocumentBelowHelper(xmlNode* c_node, _Document doc):
     cdef ProxyRef* ref
     cdef xmlNode* c_current
Index: src/lxml/tree.pxd
===================================================================
--- src/lxml/tree.pxd (revision 23981)
+++ src/lxml/tree.pxd (working copy)
@@ -154,6 +154,8 @@
     cdef xmlDoc* xmlCopyDoc(xmlDoc* doc, int recursive)
     cdef xmlNode* xmlCopyNode(xmlNode* node, int extended)
     cdef int xmlReconciliateNs(xmlDoc* doc, xmlNode* tree)
+    cdef int xmlDOMWrapReconcileNamespaces(void* ctxt, xmlNode* tree,
+                                           int options)
 
     cdef xmlBuffer* xmlBufferCreate()
     cdef char* xmlBufferContent(xmlBuffer* buf)

From apaku at gmx.de Sun Mar 5 21:05:27 2006 From: apaku at gmx.de (Andreas Pakulat) Date: Sun Mar 5 21:05:30 2006 Subject: [lxml-dev] Request for help on testing new libxml2 feature In-Reply-To: <440AD2F1.1050502@gkec.informatik.tu-darmstadt.de> References: <20060201195525.GA6040@morpheus.apaku.dnsalias.org> <440AD2F1.1050502@gkec.informatik.tu-darmstadt.de> Message-ID: <20060305200527.GA25655@morpheus> On 05.03.06 13:00:49, Stefan Behnel wrote: > Andreas Pakulat wrote: > > regarding the remove-redundant-namespaces issue there are news: > > > > kbuchcik implemented the xmlDOMWrapReconcileNamespaces in tree.c of > > libxml2 so it should remove redundant NS decl. However neither do I have > > any experience with libxml2 nor do I have the time to dig into it so > > that I can build a test program for this. > > > > Thus I ask you guys here, who surely are libxml2 experts, if you could > > help me out here.
Either some "hack" for lxml that allows me to test > > this or a small programm that takes an xml file and applies this > > function to it's DOM tree (and outputs the result) would be really > > great. > > Hi, > > Here is a trivial patch that simply calls the function after having copied an > element between documents. If I understand that correctly it should also work if I create a new Element (with a namespace) and insert it as child right? If that is correct, than this doesn't help. I still get a an extra ns declaration: >>> print etree.tostring(tree) [25296 refs] >>> tree.append(etree.Element("{test}sub1")) [25296 refs] >>> print etree.tostring(tree) [25296 refs] >>> tree.append(etree.Element("{test}sub2")) [25296 refs] >>> print etree.tostring(tree) BTW: Stefan, the setup.py was correct in using xslt-config to get the compiling parameters which of course is part of libxslt, which I first forgot (and that's also why I got the missing function when importing). Andreas -- You will be misunderstood by everyone. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060305/ea3482ae/attachment.pgp From behnel_ml at gkec.informatik.tu-darmstadt.de Sun Mar 5 21:46:33 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Sun Mar 5 21:45:25 2006 Subject: [lxml-dev] Request for help on testing new libxml2 feature In-Reply-To: <20060305200527.GA25655@morpheus> References: <20060201195525.GA6040@morpheus.apaku.dnsalias.org> <440AD2F1.1050502@gkec.informatik.tu-darmstadt.de> <20060305200527.GA25655@morpheus> Message-ID: <440B4E29.8070906@gkec.informatik.tu-darmstadt.de> Andreas Pakulat wrote: > On 05.03.06 13:00:49, Stefan Behnel wrote: >> Here is a trivial patch that simply calls the function after having copied an >> element between documents. 
> > If I understand that correctly it should also work if I create a new > Element (with a namespace) and insert it as child right? > > If that is correct, than this doesn't help. I still get a an extra ns > declaration: > >>>> print etree.tostring(tree) > > [25296 refs] >>>> tree.append(etree.Element("{test}sub1")) > [25296 refs] >>>> print etree.tostring(tree) > > [25296 refs] >>>> tree.append(etree.Element("{test}sub2")) > [25296 refs] >>>> print etree.tostring(tree) > If I'm not mistaken, this is the expected behaviour from my patch. The problem is that it only fixes the tree of the element itself, not the entire tree. If you added the tree itself to a new tree, it should fix the current douplication of namespaces that you saw. I know this is not quite what was intended, but could you check that this happens? That would tell us that the libxml2 function works so far. We'd then have to fix a way for calling it at the right place... Stefan From apaku at gmx.de Sun Mar 5 23:40:00 2006 From: apaku at gmx.de (Andreas Pakulat) Date: Sun Mar 5 23:39:54 2006 Subject: [lxml-dev] Request for help on testing new libxml2 feature In-Reply-To: <440B4E29.8070906@gkec.informatik.tu-darmstadt.de> References: <20060201195525.GA6040@morpheus.apaku.dnsalias.org> <440AD2F1.1050502@gkec.informatik.tu-darmstadt.de> <20060305200527.GA25655@morpheus> <440B4E29.8070906@gkec.informatik.tu-darmstadt.de> Message-ID: <20060305224000.GD1461@morpheus> On 05.03.06 21:46:33, Stefan Behnel wrote: > > Andreas Pakulat wrote: > > On 05.03.06 13:00:49, Stefan Behnel wrote: > >> Here is a trivial patch that simply calls the function after having copied an > >> element between documents. > > > > If I understand that correctly it should also work if I create a new > > Element (with a namespace) and insert it as child right? > > > > If that is correct, than this doesn't help. 
I still get a an extra ns > > declaration: > > > >>>> print etree.tostring(tree) > > > > [25296 refs] > >>>> tree.append(etree.Element("{test}sub1")) > > [25296 refs] > >>>> print etree.tostring(tree) > > > > [25296 refs] > >>>> tree.append(etree.Element("{test}sub2")) > > [25296 refs] > >>>> print etree.tostring(tree) > > > > If I'm not mistaken, this is the expected behaviour from my patch. The problem > is that it only fixes the tree of the element itself, not the entire tree. If > you added the tree itself to a new tree, it should fix the current > douplication of namespaces that you saw. So the following should not happen, if I understand you correctly? >>> from lxml.etree import * [25180 refs] >>> doc = fromstring("") [25226 refs] >>> doc.append("{test}sub") Traceback (most recent call last): File "", line 1, in ? File "etree.pyx", line 397, in etree._Element.append TypeError: Argument 'element' has incorrect type (expected etree._Element, got str) [25278 refs] >>> doc.append(Element("{test}sub")) [25278 refs] >>> tostring(doc) '' [25280 refs] >>> doc2 = fromstring("
") [25284 refs] >>> doc2.append(doc) [25284 refs] >>> tostring(doc2) '
' [25284 refs] However this works: >>> doc = fromstring("") [25285 refs] >>> doc.append(Element("{test}sub")) [25285 refs] >>> tostring(doc) '' [25285 refs] So at least something works (my system lxml doesn't show this behaviour). However I think this is the normal ns-cleanup working and it doesn't fix the bug I reported with libxml... Andreas -- You have an unusual equipment for success. Be sure to use it properly. From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Mar 6 07:51:44 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Mon Mar 6 07:50:52 2006 Subject: [lxml-dev] Request for help on testing new libxml2 feature In-Reply-To: <20060305224000.GD1461@morpheus> References: <20060201195525.GA6040@morpheus.apaku.dnsalias.org> <440AD2F1.1050502@gkec.informatik.tu-darmstadt.de> <20060305200527.GA25655@morpheus> <440B4E29.8070906@gkec.informatik.tu-darmstadt.de> <20060305224000.GD1461@morpheus> Message-ID: <440BDC00.4010908@gkec.informatik.tu-darmstadt.de> Andreas Pakulat wrote: > So the following should not happen, if I understand you correctly? > >>>> from lxml.etree import * > [25180 refs] >>>> doc = fromstring("") > [25226 refs] >>>> doc.append("{test}sub") > Traceback (most recent call last): > File "", line 1, in ? > File "etree.pyx", line 397, in etree._Element.append > TypeError: Argument 'element' has incorrect type (expected etree._Element, got str) > [25278 refs] >>>> doc.append(Element("{test}sub")) > [25278 refs] >>>> tostring(doc) > '' > [25280 refs] >>>> doc2 = fromstring("
") > [25284 refs] >>>> doc2.append(doc) > [25284 refs] >>>> tostring(doc2) > '
' Correct, that should not happen, as my patch calls the DOMWrap function on "elem" at the before last line. So I would assume the modified libxml2 function doesn't solve the problem. > However this works: > >>>> doc = fromstring("") > [25285 refs] >>>> doc.append(Element("{test}sub")) > [25285 refs] >>>> tostring(doc) > '' > [25285 refs] > > So at least something works (my system lxml doesn't show this > behaviour). However I think this is the normal ns-cleanup working and it > doesn't fix the bug I reported with libxml... Right, the Element() call should create an ns0 prefix, which is then merged with the existing one. So that should just work as before. Stefan From K.Buchcik at 4commerce.de Mon Mar 6 14:06:37 2006 From: K.Buchcik at 4commerce.de (Kasimier Buchcik) Date: Mon Mar 6 14:11:21 2006 Subject: [lxml-dev] Request for help on testing new libxml2 feature In-Reply-To: <440BDC00.4010908@gkec.informatik.tu-darmstadt.de> References: <20060201195525.GA6040@morpheus.apaku.dnsalias.org> <440AD2F1.1050502@gkec.informatik.tu-darmstadt.de> <20060305200527.GA25655@morpheus> <440B4E29.8070906@gkec.informatik.tu-darmstadt.de> <20060305224000.GD1461@morpheus> <440BDC00.4010908@gkec.informatik.tu-darmstadt.de> Message-ID: <1141650397.1318.19.camel@librax> Hi, On Mon, 2006-03-06 at 07:51 +0100, Stefan Behnel wrote: > Andreas Pakulat wrote: [...] > > '
' > > Correct, that should not happen, as my patch calls the DOMWrap function on > "elem" at the before last line. So I would assume the modified libxml2 > function doesn't solve the problem. > > > > However this works: > > > >>>> doc = fromstring("") > > [25285 refs] > >>>> doc.append(Element("{test}sub")) > > [25285 refs] > >>>> tostring(doc) > > '' > > [25285 refs] > > > > So at least something works (my system lxml doesn't show this > > behaviour). However I think this is the normal ns-cleanup working and it > > doesn't fix the bug I reported with libxml... > > Right, the Element() call should create an ns0 prefix, which is then merged > with the existing one. So that should just work as before. The function xmlDOMWrapReconcileNamespaces() does not try to eliminate namespace declarations for different namespace prefixes. This is due to QNames in attribute/element content. QNames need a corresponding ns-prefix to be in scope; thus Libxml2 tries to avoid automatic renaming of prefixes. Example: y:myQNameValue An elimination of the ns-decl with the "y" prefix would break the QName. So if lxml does somehow create distinct ns-prefixes (I'm not familiar with lxml's mechanism here), then the current elimination mechanism won't be useful.
Regards, Kasimier From apaku at gmx.de Mon Mar 6 14:44:57 2006 From: apaku at gmx.de (Andreas Pakulat) Date: Mon Mar 6 14:44:54 2006 Subject: [lxml-dev] Request for help on testing new libxml2 feature In-Reply-To: <1141650397.1318.19.camel@librax> References: <20060201195525.GA6040@morpheus.apaku.dnsalias.org> <440AD2F1.1050502@gkec.informatik.tu-darmstadt.de> <20060305200527.GA25655@morpheus> <440B4E29.8070906@gkec.informatik.tu-darmstadt.de> <20060305224000.GD1461@morpheus> <440BDC00.4010908@gkec.informatik.tu-darmstadt.de> <1141650397.1318.19.camel@librax> Message-ID: <20060306134457.GA24288@morpheus> On 06.03.06 14:06:37, Kasimier Buchcik wrote: > The function xmlDOMWrapReconcileNamespaces() does not try to eliminate > namespace declarations for different namespace prefixes. But that's exactly what the libxml bugreport is about. > This is due to QNames in attribute/element content. QNames need a > corresponding ns-prefix to be in scope; thus Libxml2 tries to avoid > automatic renaming of prefixes. Now I'm not too familiar with the specs, but does ":" in element content need escaping? If not, then how can you distinguish a string content containing ":" at some point from a QName as element content, if you don't have an XML Schema at hand that tells you? > Example: > > > y:myQNameValue > For my personal use-case it would be sufficient if the bar element could take the prefix from foo and you leave the extra ns-decl in it so the QName is still in scope. > An elimination of the ns-decl with the "y" prefix would break the > QName. You could change it's prefix too, however you probably need to do that on the whole subtree of bar right? Andreas -- You will overcome the attacks of jealous associates. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060306/3b5cdc8c/attachment-0001.pgp From K.Buchcik at 4commerce.de Mon Mar 6 15:54:01 2006 From: K.Buchcik at 4commerce.de (Kasimier Buchcik) Date: Mon Mar 6 15:59:32 2006 Subject: [lxml-dev] Request for help on testing new libxml2 feature In-Reply-To: <20060306134457.GA24288@morpheus> References: <20060201195525.GA6040@morpheus.apaku.dnsalias.org> <440AD2F1.1050502@gkec.informatik.tu-darmstadt.de> <20060305200527.GA25655@morpheus> <440B4E29.8070906@gkec.informatik.tu-darmstadt.de> <20060305224000.GD1461@morpheus> <440BDC00.4010908@gkec.informatik.tu-darmstadt.de> <1141650397.1318.19.camel@librax> <20060306134457.GA24288@morpheus> Message-ID: <1141656841.1318.35.camel@librax> Hi, On Mon, 2006-03-06 at 14:44 +0100, Andreas Pakulat wrote: > On 06.03.06 14:06:37, Kasimier Buchcik wrote: > > The function xmlDOMWrapReconcileNamespaces() does not try to eliminate > > namespace declarations for different namespace prefixes. > > But that's exactly what the libxml bugreport is about. Then I'm not eager to implement this. But maybe someone else will enhance the function to do what you want. > > This is due to QNames in attribute/element content. QNames need a > > corresponding ns-prefix to be in scope; thus Libxml2 tries to avoid > > automatic renaming of prefixes. > > Now I'm not too familiar with the specs, but does ":" in element content > need escaping? If not, then how can you distinguish a string content > containing ":" at some point from a QName as element content, if you > don't have an XML Schema at hand that tells you? This is exactly the problem: the tree modification functions do not know where you intended to use QNames, so currently the only robust way to keep the correct prefix for a QName in scope is to avoid modifying the prefixes of ns-declarations with mechanisms that are unaware of QNames in text content. 
> > Example: > > > > > > y:myQNameValue > > > > For my personal use-case it would be sufficient if the bar element could > take the prefix from foo and you leave the extra ns-decl in it so the > QName is still in scope. Hmm, what you describe here is not an elimination of redundant ns-declarations. > > An elimination of the ns-decl with the "y" prefix would break the > > QName. > > You could change it's prefix too, however you probably need to do that > on the whole subtree of bar right? As you already correctly observed, we cannot change the prefix of the QName, since the Libxml2 functions do not know where you intended to use QNames and where not. Regards, Kasimier From apaku at gmx.de Mon Mar 6 16:11:58 2006 From: apaku at gmx.de (Andreas Pakulat) Date: Mon Mar 6 19:23:44 2006 Subject: [lxml-dev] Request for help on testing new libxml2 feature In-Reply-To: <1141656841.1318.35.camel@librax> References: <20060201195525.GA6040@morpheus.apaku.dnsalias.org> <440AD2F1.1050502@gkec.informatik.tu-darmstadt.de> <20060305200527.GA25655@morpheus> <440B4E29.8070906@gkec.informatik.tu-darmstadt.de> <20060305224000.GD1461@morpheus> <440BDC00.4010908@gkec.informatik.tu-darmstadt.de> <1141650397.1318.19.camel@librax> <20060306134457.GA24288@morpheus> <1141656841.1318.35.camel@librax> Message-ID: <20060306151158.GB2700@morpheus> On 06.03.06 15:54:01, Kasimier Buchcik wrote: > On Mon, 2006-03-06 at 14:44 +0100, Andreas Pakulat wrote: > > On 06.03.06 14:06:37, Kasimier Buchcik wrote: > > > The function xmlDOMWrapReconcileNamespaces() does not try to eliminate > > > namespace declarations for different namespace prefixes. > > > > But that's exactly what the libxml bugreport is about. > > Then I'm not eager to implement this. But maybe someone else will > enhance the function to do what you want. :-( > > > This is due to QNames in attribute/element content. QNames need a > > > corresponding ns-prefix to be in scope; thus Libxml2 tries to avoid > > > automatic renaming of prefixes. 
> > > > Now I'm not too familiar with the specs, but does ":" in element content > > need escaping? If not, then how can you distinguish a string content > > containing ":" at some point from a QName as element content, if you > > don't have an XML Schema at hand that tells you? > > This is exactly the problem: the tree modification functions do not know > where you intended to use QNames, so currently the only robust way to > keep the correct prefix for a QName in scope, is to avoid modifiying > prefixes of ns-declarations by QName-in-text-content ignorant > mechanisms. I guess these modification functions cannot use an XML Schema document that is referenced by the XML document? If they could, you could say: all element content that is not an element itself is a string, which would be OK with the XML spec, AFAIK. This way you would either know (from the schema) that the content is a QName (or can contain one) or treat it as simple text. > > > Example: > > > > > > > > > y:myQNameValue > > > > > > > For my personal use-case it would be sufficient if the bar element could > > take the prefix from foo and you leave the extra ns-decl in it so the > > QName is still in scope. > > Hmm, what you describe here is not an elimination of redundant > ns-declarations. Right, as I said this is just my use case, where a document like is turned into something like the following if I insert new elements using the Element class rather than SubElement: content And I'd like to avoid this extra namespace declaration. Also, I'm going to add new elements very often, so the XML document ends up being only machine-readable because it's cluttered with namespace declarations. Andreas -- You will pass away very quickly. 
From tseaver at palladion.com Mon Mar 6 22:12:47 2006 From: tseaver at palladion.com (Tres Seaver) Date: Mon Mar 6 22:13:10 2006 Subject: [lxml-dev] Re: Request for help on testing new libxml2 feature In-Reply-To: <20060306151158.GB2700@morpheus> References: <20060201195525.GA6040@morpheus.apaku.dnsalias.org> <440AD2F1.1050502@gkec.informatik.tu-darmstadt.de> <20060305200527.GA25655@morpheus> <440B4E29.8070906@gkec.informatik.tu-darmstadt.de> <20060305224000.GD1461@morpheus> <440BDC00.4010908@gkec.informatik.tu-darmstadt.de> <1141650397.1318.19.camel@librax> <20060306134457.GA24288@morpheus> <1141656841.1318.35.camel@librax> <20060306151158.GB2700@morpheus> Message-ID: <440CA5CF.4070004@palladion.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Andreas Pakulat wrote: > On 06.03.06 15:54:01, Kasimier Buchcik wrote: > >>>>This is due to QNames in attribute/element content. QNames need a >>>>corresponding ns-prefix to be in scope; thus Libxml2 tries to avoid >>>>automatic renaming of prefixes. >>> >>>Now I'm not too familiar with the specs, but does ":" in element content >>>need escaping? If not, then how can you distinguish a string content >>>containing ":" at some point from a QName as element content, if you >>>don't have an XML Schema at hand that tells you? >> >>This is exactly the problem: the tree modification functions do not know >>where you intended to use QNames, so currently the only robust way to >>keep the correct prefix for a QName in scope, is to avoid modifiying >>prefixes of ns-declarations by QName-in-text-content ignorant >>mechanisms. > > > I guess these modification function cannot use a xml schema document > that is references by the xml document? If they could, you could say: > All element content that is not an element itself is a string, which > would be OK with the XML spec, AFAIK. This way you would either know > (from the schema) that the content is a QName (or can contain one) or > treat it as simple text. 
Frankly, I think QNames in element text / attribute values are such a rare edge case that they could be neglected; making the namespace-compacting stuff an option leaves them "safe", while still allowing the dominant case to clean up nicely. Tres. - -- =================================================================== Tres Seaver +1 202-558-7113 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFEDKXP+gerLs4ltQ4RAibQAKC+ZdVGHbHvgFuWm00MNGNelWGgCgCgrqkd utSo2Umwp+f90jH4GzixUlE= =opUL -----END PGP SIGNATURE----- From bkc at murkworks.com Mon Mar 6 22:47:23 2006 From: bkc at murkworks.com (Brad Clements) Date: Mon Mar 6 22:47:45 2006 Subject: [lxml-dev] OT: document('') in xslt Message-ID: <440C679B.31813.3D32BD0F@bkc.murkworks.com> For those who use document('') in stylesheets, especially those xsl files sent to browsers. Using document('') in an xslt file causes Firefox and Mozilla to lock up. This appears to finally be fixed in today's build: https://bugzilla.mozilla.org/show_bug.cgi?id=205778

What    |Removed      |Added
----------------------------------------------------------------------------
Keywords|fixed1.8.0.2 |verified1.8.0.2

------- Comment #51 from walkerrunner@yahoo.com 2006-03-06 13:33 PST -------
Verified on Firefox 1.5.0.2 builds from 20060306 on Windows, Mac and Linux

-- Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com AOL-IM or SKYPE: BKClements -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20060306/1458e483/attachment.htm From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Mar 6 22:49:12 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Mon Mar 6 22:48:25 2006 Subject: [lxml-dev] Speed-up for moving elements inside the same document Message-ID: <440CAE58.1060801@gkec.informatik.tu-darmstadt.de> Hi,

I decided to dislike the idea of deep-traversing a tree (changeDocumentBelow) whenever its root element is moved to a different parent. So I tried to come up with some cases where this is not necessary. One such case is this:

-----------------------
from lxml.etree import Element, SubElement

a = Element('{a}test')

# create ten children with a child each which has another child
map(SubElement,
    map(SubElement,
        map(SubElement, [a]*10, ['{ns%s}e'%i for i in range(10)]),
        ['{x}c1']*10),
    ['{y}c2']*10)

for i in range(100000):
    el = a[0]
    a[-1] = el
-----------------------

Here, we basically change the order of the elements below a single parent. This case gets about 20% faster on my machine if we skip the traversal in changeDocumentBelow. It is obviously not necessary, since the document reference is not changed anyway. If the tree structure below the moved element is bigger, the gain also increases.

I added a third argument to the function that switches recursion on or off. It is set to False by the calling functions if the moved node and its new parent are in the same document. If this is the case, it means that all their children must also be in the same document (by induction).

I tried some other cases, but most of them seem to induce infinite loops in the libxml2 calls - I'm not quite sure why. Anyway, this one is a simpler one and works as expected. I would like to apply the attached patch, so, if others are interested, I'd be glad to get some feedback if it changes something in their real-world use cases...
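The idea behind the recursion switch can be sketched in plain Python. This is only an illustration under stated assumptions: `Node`, `move_node` and `change_document_below` are hypothetical toy names, not the functions from the attached patch, and the real code operates on libxml2 node structures rather than Python objects.

```python
class Node:
    """Toy stand-in for a tree node that remembers its owning document."""

    def __init__(self, doc, tag):
        self.doc = doc
        self.tag = tag
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


visited = []  # records which nodes the fix-up actually touched


def change_document_below(node, doc, recursive=True):
    """Point node (and, if requested, its whole subtree) at doc."""
    visited.append(node.tag)
    node.doc = doc
    if recursive:
        for child in node.children:
            change_document_below(child, doc, recursive=True)


def move_node(node, old_parent, new_parent):
    """Re-parent node; only walk its subtree if the document changes."""
    same_document = node.doc is new_parent.doc
    old_parent.children.remove(node)
    new_parent.children.append(node)
    change_document_below(node, new_parent.doc, recursive=not same_document)


# Usage: a move inside one document touches only the moved node itself,
# a move into another document walks the whole subtree.
doc_a, doc_b = object(), object()
root_a = Node(doc_a, "root_a")
sub = root_a.add(Node(doc_a, "sub"))
leaf = sub.add(Node(doc_a, "leaf"))
other = root_a.add(Node(doc_a, "other"))

move_node(sub, root_a, other)
same_doc_visits = list(visited)

del visited[:]
root_b = Node(doc_b, "root_b")
move_node(sub, other, root_b)
cross_doc_visits = list(visited)
```

The saving grows with the size of the subtree below the moved node, which matches the observation that bigger structures gain more.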
Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: avoid-recursion.patch Type: text/x-patch Size: 3866 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060306/c406978b/avoid-recursion.bin From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Mar 7 08:22:15 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Tue Mar 7 08:21:25 2006 Subject: [lxml-dev] OT: document('') in xslt In-Reply-To: <440C679B.31813.3D32BD0F@bkc.murkworks.com> References: <440C679B.31813.3D32BD0F@bkc.murkworks.com> Message-ID: <440D34A7.6020709@gkec.informatik.tu-darmstadt.de> Brad Clements wrote: > For those who use document('') in stylesheets, especially those xsl > files sent to browsers. Using document('') in an xslt file causes > Firefox and Mozilla to lock-up. > > This appears to finally be fixed in today's build: > > https://bugzilla.mozilla.org/show_bug.cgi?id=205778 It's pretty impressive how this bug dates back to 2003 ... And the funny thing: the problem is that they are using the same mechanism for resolving document('') as libxml2: reread the XSL document - great idea! Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Mar 7 11:18:00 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Tue Mar 7 11:17:06 2006 Subject: [lxml-dev] Benchmark script Message-ID: <440D5DD8.5090001@gkec.informatik.tu-darmstadt.de> Hi, as I was at it anyway, I wrote a benchmark script that allows benchmarking simple test cases against different XML trees. 
Here are the benchmarks I have for now, taken from the new file bench.py in the trunk:

    def bench_append_from_document(self, tree1, root1, tree2, root2):
        "1,2" # needs trees 1 and 2
        for el in root2:
            root1.append(root2[0])

    def bench_rotate_children(self, tree, root):
        "1 2" # runs on tree 1 or 2 independently
        for i in range(100):
            root[-1] = root[0]

    def bench_reorder(self, tree, root):
        "1 2"
        for i in range(len(root)//2):
            root[-i] = root[0]

The doc strings define the trees (or set of trees) that the benchmark is run against. Calling obviously uses reflection, so you can simply add new methods like the above to have them benchmarked.

As an example, here are the results before and after my last patch:

## current trunk:
# python bench.py
append_from_document (T1,T2 ) 0.0586 0.0584 0.0583 msec/pass, avg: 0.0584
reorder              (T1 ) 0.3182 0.3200 0.3183 msec/pass, avg: 0.3188
reorder              (T2 ) 0.0951 0.0948 0.0952 msec/pass, avg: 0.0950
rotate_children      (T1 ) 14.6780 14.4417 14.3684 msec/pass, avg: 14.4960
rotate_children      (T2 ) 0.8209 0.8150 0.8139 msec/pass, avg: 0.8166

## after applying the patch:
# python bench.py
append_from_document (T1,T2 ) 0.0582 0.0591 0.0583 msec/pass, avg: 0.0585
reorder              (T1 ) 0.2437 0.2629 0.2432 msec/pass, avg: 0.2499
reorder              (T2 ) 0.0926 0.0922 0.0923 msec/pass, avg: 0.0924
rotate_children      (T1 ) 7.6664 7.6860 7.7656 msec/pass, avg: 7.7060
rotate_children      (T2 ) 0.7434 0.7454 0.7599 msec/pass, avg: 0.7496

This is pretty significant in some cases, so I will apply the patch to the trunk. It also tells me that I will have to look at the rotate_children case as it is highly dependent on the type of tree (which it shouldn't be too much, I'd say).

So, as usual, feel free to test it and find new benchmark cases for it.
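How a runner might discover such methods by reflection and read the tree selection from the doc string can be sketched as follows. This is a guess at the mechanism, not the code from bench.py: `parse_tree_spec`, `collect_benchmarks`, `run` and the two toy benchmark methods are hypothetical, and plain lists stand in for XML trees.

```python
import time


class BenchMark:
    """Toy suite; the doc string of each method selects its trees."""

    def bench_count_children(self, tree, root):
        "1 2"  # runs on tree 1 or 2 independently
        len(root)

    def bench_merge(self, tree1, root1, tree2, root2):
        "1,2"  # needs trees 1 and 2 together
        root1 + root2


def parse_tree_spec(doc):
    """Turn '1 2' into [(1,), (2,)] and '1,2' into [(1, 2)]."""
    return [tuple(int(n) for n in group.split(",")) for group in doc.split()]


def collect_benchmarks(suite):
    """Find all bench_* methods and expand them per tree combination."""
    found = []
    for name in sorted(dir(suite)):
        if not name.startswith("bench_"):
            continue
        method = getattr(suite, name)
        for tree_set in parse_tree_spec(method.__doc__):
            found.append((name[len("bench_"):], tree_set, method))
    return found


def run(suite, trees, repeat=3):
    """Time every benchmark `repeat` times; returns msec per pass."""
    results = []
    for name, tree_set, method in collect_benchmarks(suite):
        args = []
        for n in tree_set:
            args.extend(trees[n])  # each entry is a (tree, root) pair
        timings = []
        for _ in range(repeat):
            start = time.perf_counter()
            method(*args)
            timings.append((time.perf_counter() - start) * 1000.0)
        results.append((name, tree_set, timings))
    return results


# Usage with stand-in "trees": tree number -> (tree, root).
trees = {1: ("tree1", [1, 2, 3]), 2: ("tree2", [4, 5])}
results = run(BenchMark(), trees)
labels = [(name, tree_set) for name, tree_set, _ in results]
```

With this layout, adding a benchmark only means adding another `bench_*` method with a doc string; nothing has to be registered by hand.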
Stefan From faassen at infrae.com Wed Mar 8 17:27:14 2006 From: faassen at infrae.com (Martijn Faassen) Date: Wed Mar 8 17:27:25 2006 Subject: [lxml-dev] Benchmark script In-Reply-To: <440D5DD8.5090001@gkec.informatik.tu-darmstadt.de> References: <440D5DD8.5090001@gkec.informatik.tu-darmstadt.de> Message-ID: <440F05E2.3000108@infrae.com> Stefan Behnel wrote: [snip] > So, as usual, feel free to test it and find new benchmark cases for it. It'd be interesting to see how lxml compares to (c)ElementTree in benchmarks like this. Regards, Martijn From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 8 18:04:42 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed Mar 8 18:04:00 2006 Subject: [lxml-dev] Benchmark script In-Reply-To: <440F05E2.3000108@infrae.com> References: <440D5DD8.5090001@gkec.informatik.tu-darmstadt.de> <440F05E2.3000108@infrae.com> Message-ID: <440F0EAA.5020207@gkec.informatik.tu-darmstadt.de> Martijn Faassen wrote: > Stefan Behnel wrote: > [snip] > >> So, as usual, feel free to test it and find new benchmark cases for it. > > It'd be interesting to see how lxml compares to (c)ElementTree in > benchmarks like this.

That's an easy one! :)

Install them, do a checkout of the trunk and run

    make
    python bench.py -i -a

(-i == in place, -a == use ElementTree and cElementTree if available)

It will take a while - and in the end you will not be very impressed. One of the results of the benchmarks is that ElementTree beats lxml sometimes and that cElementTree beats lxml in most cases. The problems are:

a) lxml has to deal with all sorts of backpointers and recursive clean-ups in the libxml2 structure, which (c)ElementTree doesn't.

b) I had to write the tests in a way that prevents lxml and ElementTree from benefitting from their different semantics (e.g.
the implicit removal of lxml elements when copying)

So, it's somewhat tricky to come up with comparable benchmarks - and if you find one, it's most likely one where ElementTree wins by default... Still, it's interesting to see the comparison. It tells you where lxml has its weaknesses. I've been putting some effort into removing or mitigating some of them during the last two days, so it's a bit better now. But the relation above still holds. Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 8 18:30:04 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed Mar 8 18:29:13 2006 Subject: [lxml-dev] Restructuring of namespace setting in Element() and SubElement() Message-ID: <440F149C.60004@gkec.informatik.tu-darmstadt.de> Hi,

just a quick status update. I committed a patch to the trunk (24115) that basically refactors the above functions and the _addNamespaces function. The two are the main API functions (and the oldest grown ones around, I guess), so as I was running a few benchmarks anyway, I wanted to check if I could come up with some tweaks to make them faster. I took a closer look at them and found a couple of potential bugs and inefficient indirections. The result is that (I think) they should be more straightforward to read now and about 10-30% faster according to my benchmark.

One of the problems was that the name and namespace of the new element were set twice. That was actually good, as they were set incorrectly the first time, so that bug was masked. However, that was neither very readable nor efficient.

The new implementation moves the namespace handling and prefix setting into the _addNamespaces function (which I renamed to _setNamespaces). It is therefore combined with the namespace setup from the nsmap dictionary, so that the prefixes and the C namespace structure only have to be generated once.

The test cases pass just as before. I also added a few new ones whenever I found sequences of code that made me wonder.
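The "generate prefixes only once" idea can be illustrated with a pure-Python sketch. Everything here is an assumption made for illustration: `split_clark` and `set_namespaces` are hypothetical helpers, not lxml's _setNamespaces, and the real implementation builds C-level libxml2 namespace structures instead of a dictionary. The sketch also assumes that all nsmap keys are non-empty prefix strings (no default namespace).

```python
def split_clark(tag):
    """Split '{uri}local' into (uri, local); uri is None for plain tags."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def set_namespaces(tag, nsmap=None):
    """Return (prefix, local_name, declarations) for a new element.

    declarations maps prefix -> URI and is built in a single pass from the
    user-supplied nsmap plus, if needed, one generated prefix for the tag.
    """
    declarations = dict(nsmap or {})
    uri, local = split_clark(tag)
    if uri is None:
        # Tag without a namespace: nothing to look up or generate.
        return None, local, declarations
    # Reuse a prefix from nsmap when it already maps to the tag's namespace.
    for prefix, mapped_uri in declarations.items():
        if mapped_uri == uri:
            return prefix, local, declarations
    # Otherwise generate the first free nsN prefix.
    n = 0
    while "ns%d" % n in declarations:
        n += 1
    prefix = "ns%d" % n
    declarations[prefix] = uri
    return prefix, local, declarations


generated = set_namespaces("{test}sub")
reused = set_namespaces("{test}sub", {"t": "test"})
skipped = set_namespaces("{test}sub", {"ns0": "other"})
plain = set_namespaces("sub")
```

Resolving the tag's prefix and the nsmap declarations in the same function is what avoids setting the name and namespace twice.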
So I can somewhat hope I didn't replace the old bugs with new ones... Have fun, Stefan From fredrik at pythonware.com Wed Mar 8 18:32:49 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Mar 8 18:40:16 2006 Subject: [lxml-dev] Re: Benchmark script References: <440D5DD8.5090001@gkec.informatik.tu-darmstadt.de><440F05E2.3000108@infrae.com> <440F0EAA.5020207@gkec.informatik.tu-darmstadt.de> Message-ID: Stefan Behnel wrote: > That's an easy one! :) > > Install them, do a checkout of the trunk and run > > make > python bench.py -i -a > > (-i == in place, -a == use ElementTree and cElementTree if available) > > It will take a while - and in the end you will not be very impressed. One of > the results of the benchmarks is that ElementTree beats lxml sometimes and > that cElementTree beats lxml in most cases. can you post a copy of the results ? From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 8 19:05:11 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed Mar 8 19:04:31 2006 Subject: [lxml-dev] Re: Benchmark script In-Reply-To: References: <440D5DD8.5090001@gkec.informatik.tu-darmstadt.de><440F05E2.3000108@infrae.com> <440F0EAA.5020207@gkec.informatik.tu-darmstadt.de> Message-ID: <440F1CD7.8090901@gkec.informatik.tu-darmstadt.de> Fredrik Lundh wrote: > Stefan Behnel wrote: > >> That's an easy one! :) >> >> Install them, do a checkout of the trunk and run >> >> make >> python bench.py -i -a >> >> (-i == in place, -a == use ElementTree and cElementTree if available) >> >> It will take a while - and in the end you will not be very impressed. One of >> the results of the benchmarks is that ElementTree beats lxml sometimes and >> that cElementTree beats lxml in most cases. > > can you post a copy of the results ? Here you go. Some of them look a bit bogus - I only have my laptop to run them. So, if the numbers are statistically questionable, you'd better run the benchmark yourself (or look for bugs in the implementation...). 
I compiled both lxml and cElementTree for my machine, in case you want to know.

The results are also a bit lengthy. They are based on four different trees: a deep one (depth 7, 3 children per node), two broader ones (children per node at each level: root-520-26 and root-26-520) and a small one (root-26-3). Some of the benchmarks need two trees to run, so the script runs them with all possible combinations of the four trees. This allows you to see to what extent the size of the source and destination trees of whatever-the-benchmark-does matters.

The benchmark script itself is here: http://codespeak.net/svn/lxml/trunk/bench.py

It's pretty easy to extend, just add a new benchmark method to the BenchMark class and run the script. The number of trees that it needs is automatically inferred from its number of arguments.

Stefan
-------------- next part --------------
Preparing test suites and trees ...
Running benchmark on etree, ElementTree, cElementTree

Setup times for trees in seconds:
etree        : 0.3495, 0.3159, 0.0705, 0.0024
ElementTree  : 0.2363, 0.4836, 0.0681, 0.0010
cElementTree : 0.0379, 0.0163, 0.0090, 0.0001

etree         append_elements (T1 ) 2.6187 2.7251 2.8163 msec/pass, best: 2.6187
ElementTree   append_elements (T1 ) 0.3295 0.3307 0.3304 msec/pass, best: 0.3295
cElementTree  append_elements (T1 ) 0.0845 0.2258 0.0835 msec/pass, best: 0.0835
etree         append_elements (T2 ) 19.6533 20.0642 18.7933 msec/pass, best: 18.7933
ElementTree   append_elements (T2 ) 5.1974 4.9660 4.9829 msec/pass, best: 4.9660
cElementTree  append_elements (T2 ) 0.8010 0.8131 1.3228 msec/pass, best: 0.8010
etree         append_elements (T3 ) 0.2354 0.1766 0.1877 msec/pass, best: 0.1766
ElementTree   append_elements (T3 ) 0.1077 0.1066 0.1087 msec/pass, best: 0.1066
cElementTree  append_elements (T3 ) 0.0321 0.0330 0.0339 msec/pass, best: 0.0321
etree         append_elements (T4 ) 0.8439 0.8159 1.3547 msec/pass, best: 0.8159
ElementTree   append_elements (T4 ) 0.2296 0.2230 0.2199 msec/pass, best: 0.2199
cElementTree  append_elements (T4 ) 0.0380 0.0364 0.0351 msec/pass, best: 0.0351

etree         append_from_document (T1,T2 ) 23.6859 17.6534 14.8655 msec/pass, best: 14.8655
ElementTree   append_from_document (T1,T2 ) 8.1959 8.4057 10.5931 msec/pass, best: 8.1959
cElementTree  append_from_document (T1,T2 ) 0.4955 0.5058 0.4992 msec/pass, best: 0.4955
etree         append_from_document (T1,T3 ) 1.0019 1.0577 1.6441 msec/pass, best: 1.0019
ElementTree   append_from_document (T1,T3 ) 0.0825 0.0910 0.0844 msec/pass, best: 0.0825
cElementTree  append_from_document (T1,T3 ) 0.0320 0.0305 0.0351 msec/pass, best: 0.0305
etree         append_from_document (T1,T4 ) 1.2400 1.7096 1.9829 msec/pass, best: 1.2400
ElementTree   append_from_document (T1,T4 ) 0.1672 0.1701 0.1688 msec/pass, best: 0.1672
cElementTree  append_from_document (T1,T4 ) 0.0514 0.0545 0.0548 msec/pass, best: 0.0514
etree         append_from_document (T2,T1 ) 27.1049 33.2417 33.2802 msec/pass, best: 27.1049
ElementTree   append_from_document (T2,T1 ) 1.1742 1.1052 3.5085 msec/pass, best: 1.1052
cElementTree  append_from_document (T2,T1 ) 0.0577 0.0683 0.0610 msec/pass, best: 0.0577
etree         append_from_document (T2,T3 ) 0.9404 0.8878 0.9161 msec/pass, best: 0.8878
ElementTree   append_from_document (T2,T3 ) 0.0856 0.0889 0.0909 msec/pass, best: 0.0856
cElementTree  append_from_document (T2,T3 ) 0.0325 0.0311 0.0352 msec/pass, best: 0.0311
etree         append_from_document (T2,T4 ) 1.6886 2.0050 1.9701 msec/pass, best: 1.6886
ElementTree   append_from_document (T2,T4 ) 1.6004 2.6041 2.7645 msec/pass, best: 1.6004
cElementTree  append_from_document (T2,T4 ) 0.0462 0.0494 0.0462 msec/pass, best: 0.0462
etree         append_from_document (T3,T1 ) 23.2682 24.6881 24.8672 msec/pass, best: 23.2682
ElementTree   append_from_document (T3,T1 ) 2.5431 2.8296 4.5559 msec/pass, best: 2.5431
cElementTree  append_from_document (T3,T1 ) 0.0609 0.0653 0.0613 msec/pass, best: 0.0609
etree         append_from_document (T3,T2 ) 11.9439 13.6612 13.1227 msec/pass, best: 11.9439
ElementTree   append_from_document (T3,T2 ) 5.9978 6.3030 6.3133 msec/pass, best: 5.9978
cElementTree  append_from_document (T3,T2 ) 0.5033 0.5084 0.5037 msec/pass, best: 0.5033
etree         append_from_document (T3,T4 ) 0.3002 0.3107 0.2871 msec/pass, best: 0.2871
ElementTree   append_from_document (T3,T4 ) 0.2445 0.1626 0.1605 msec/pass, best: 0.1605
cElementTree  append_from_document (T3,T4 ) 0.0384 0.0334 0.0338 msec/pass, best: 0.0334
etree         append_from_document (T4,T1 ) 24.0083 25.5595 25.3945 msec/pass, best: 24.0083
ElementTree   append_from_document (T4,T1 ) 0.1816 0.1750 0.1844 msec/pass, best: 0.1750
cElementTree  append_from_document (T4,T1 ) 0.0653 0.0677 0.0679 msec/pass, best: 0.0653
etree         append_from_document (T4,T2 ) 13.1701 13.0942 13.2213 msec/pass, best: 13.0942
ElementTree   append_from_document (T4,T2 ) 5.0062 5.2756 5.7883 msec/pass, best: 5.0062
cElementTree  append_from_document (T4,T2 ) 0.5210 0.5819 0.5079 msec/pass, best: 0.5079
etree         append_from_document (T4,T3 ) 0.5563 1.2271 0.6637 msec/pass, best: 0.5563
ElementTree   append_from_document (T4,T3 ) 0.0805 0.0755 0.0736 msec/pass, best: 0.0736
cElementTree  append_from_document (T4,T3 ) 0.0276 0.0250 0.0254 msec/pass, best: 0.0250

etree         clear (T1 ) 8.5457 7.9606 7.4105 msec/pass, best: 7.4105
ElementTree   clear (T1 ) 17.8255 18.8504 17.0351 msec/pass, best: 17.0351
cElementTree  clear (T1 ) 2.1414 2.3636 2.9883 msec/pass, best: 2.1414
etree         clear (T2 ) 16.9409 7.0505 17.4210 msec/pass, best: 7.0505
ElementTree   clear (T2 ) 38.2996 37.6155 27.9877 msec/pass, best: 27.9877
cElementTree  clear (T2 ) 2.4261 2.3950 2.3893 msec/pass, best: 2.3893
etree         clear (T3 ) 0.4994 0.4925 0.4937 msec/pass, best: 0.4925
ElementTree   clear (T3 ) 4.3199 4.4805 4.4340 msec/pass, best: 4.3199
cElementTree  clear (T3 ) 0.4300 0.4456 0.4420 msec/pass, best: 0.4300
etree         clear (T4 ) 0.0230 0.0251 0.0234 msec/pass, best: 0.0230
ElementTree   clear (T4 ) 0.0522 0.0501 0.0515 msec/pass, best: 0.0501
cElementTree  clear (T4 ) 0.0070 0.0069 0.0069 msec/pass, best: 0.0069

etree         create_subelements (T1 ) 0.8444 0.8633 0.8598 msec/pass, best: 0.8444
ElementTree   create_subelements (T1 ) 0.5108 0.4110 0.4049 msec/pass, best: 0.4049
cElementTree  create_subelements (T1 ) 0.0613 0.0671 0.0666 msec/pass, best: 0.0613
etree         create_subelements (T2 ) 16.4280 15.6007 17.4013 msec/pass, best: 15.6007
ElementTree   create_subelements (T2 ) 6.4784 6.5251 6.5057 msec/pass, best: 6.4784
cElementTree  create_subelements (T2 ) 0.6267 0.5625 0.5439 msec/pass, best: 0.5439
etree         create_subelements (T3 ) 0.1782 0.1495 0.1590 msec/pass, best: 0.1495
ElementTree   create_subelements (T3 ) 0.1202 0.1270 0.1189 msec/pass, best: 0.1189
cElementTree  create_subelements (T3 ) 0.0295 0.0281 0.0321 msec/pass, best: 0.0281
etree         create_subelements (T4 ) 0.7118 0.7083 0.7148 msec/pass, best: 0.7083
ElementTree   create_subelements (T4 ) 0.2945 0.2923 0.2944 msec/pass, best: 0.2923
cElementTree  create_subelements (T4 ) 0.0219 0.0216 0.0228 msec/pass, best: 0.0216

etree         insert_from_document (T1,T2 ) 33.0793 23.2559 23.3976 msec/pass, best: 23.2559
ElementTree   insert_from_document (T1,T2 ) 9.5483 10.5742 11.7306 msec/pass, best: 9.5483
cElementTree  insert_from_document (T1,T2 ) 1.5568 1.3950 1.3709 msec/pass, best: 1.3709
etree         insert_from_document (T1,T3 ) 3.3307 3.2447 3.6380 msec/pass, best: 3.2447
ElementTree   insert_from_document (T1,T3 ) 0.1029 0.0960 0.1061 msec/pass, best: 0.0960
cElementTree  insert_from_document (T1,T3 ) 0.0406 0.0423 0.0431 msec/pass, best: 0.0406
etree         insert_from_document (T1,T4 ) 1.7418 2.0915 2.1230 msec/pass, best: 1.7418
ElementTree   insert_from_document (T1,T4 ) 0.2430 0.2482 0.2488 msec/pass, best: 0.2430
cElementTree  insert_from_document (T1,T4 ) 0.0777 0.0859 0.0809 msec/pass, best: 0.0777
etree         insert_from_document (T2,T1 ) 34.0752 31.5777 35.4281 msec/pass, best: 31.5777
ElementTree   insert_from_document (T2,T1 ) 2.6473 2.5387 4.7799 msec/pass, best: 2.5387
cElementTree  insert_from_document (T2,T1 ) 0.1180 0.1313 0.1245 msec/pass, best: 0.1180
etree         insert_from_document (T2,T3 ) 1.2871 1.4429 1.5116 msec/pass, best: 1.2871
ElementTree   insert_from_document (T2,T3 ) 0.1033 0.1039 0.1071 msec/pass, best: 0.1033
cElementTree  insert_from_document (T2,T3 ) 0.0471 0.0501 0.0499 msec/pass, best: 0.0471
etree         insert_from_document (T2,T4 ) 1.9967 1.9259 2.0996 msec/pass, best: 1.9259
ElementTree   insert_from_document (T2,T4 ) 2.2778 2.2210 2.9068 msec/pass, best: 2.2210
cElementTree  insert_from_document (T2,T4 ) 0.1068 0.1087 0.1072 msec/pass, best: 0.1068
etree         insert_from_document (T3,T1 ) 20.2503 23.9721 21.2222 msec/pass, best: 20.2503
ElementTree   insert_from_document (T3,T1 ) 3.3407 2.8879 3.7921 msec/pass, best: 2.8879
cElementTree  insert_from_document (T3,T1 ) 0.1046 0.0877 0.0832 msec/pass, best: 0.0832
etree         insert_from_document (T3,T2 ) 18.5366 20.5677 21.1396 msec/pass, best: 18.5366
ElementTree   insert_from_document (T3,T2 ) 25.1812 8.6554 8.6306 msec/pass, best: 8.6306
cElementTree  insert_from_document (T3,T2 ) 1.3172 1.3074 1.3212 msec/pass, best: 1.3074
etree         insert_from_document (T3,T4 ) 0.3818 0.3595 0.3517 msec/pass, best: 0.3517
ElementTree   insert_from_document (T3,T4 ) 0.3012 0.2785 0.2326 msec/pass, best: 0.2326
cElementTree  insert_from_document (T3,T4 ) 0.0627 0.0581 0.0589 msec/pass, best: 0.0581
etree         insert_from_document (T4,T1 ) 24.0863 25.1853 24.8056 msec/pass, best: 24.0863
ElementTree   insert_from_document (T4,T1 ) 0.2589 0.2595 0.2536 msec/pass, best: 0.2536
cElementTree  insert_from_document (T4,T1 ) 0.0936 0.0983 0.0967 msec/pass, best: 0.0936
etree         insert_from_document (T4,T2 ) 21.3952 22.1335 22.2327 msec/pass, best: 21.3952
ElementTree   insert_from_document (T4,T2 ) 6.7473 6.9179 7.2210 msec/pass, best: 6.7473
cElementTree  insert_from_document (T4,T2 ) 1.3629 1.3921 1.3564 msec/pass, best: 1.3564
etree         insert_from_document (T4,T3 ) 0.5727 0.6047 0.5907 msec/pass, best: 0.5727
ElementTree   insert_from_document (T4,T3 ) 0.0950 0.0929 0.0976 msec/pass, best: 0.0929
cElementTree  insert_from_document (T4,T3 ) 0.0338 0.0373 0.0339 msec/pass, best: 0.0338

etree         reorder (T1 ) 13.7370 13.4423 14.8938 msec/pass, best: 13.4423
ElementTree   reorder (T1 ) 0.1945 0.8481 1.6554 msec/pass, best: 0.1945
cElementTree  reorder (T1 ) 0.0658 0.0709 0.0663 msec/pass, best: 0.0658
etree         reorder (T2 ) 18.3126 20.8942 20.8194 msec/pass, best: 18.3126
ElementTree   reorder (T2 ) 5.2827 5.5325 5.6348 msec/pass, best: 5.2827
cElementTree  reorder (T2 ) 1.1879 1.1766 1.1588 msec/pass, best: 1.1588
etree         reorder (T3 ) 0.0151 0.0163 0.0158 msec/pass, best: 0.0151
ElementTree   reorder (T3 ) 0.0417 0.0436 0.0410 msec/pass, best: 0.0410
cElementTree  reorder (T3 ) 0.0247 0.0242 0.0233 msec/pass, best: 0.0233
etree         reorder (T4 ) 0.1560 0.1559 0.1550 msec/pass, best: 0.1550
ElementTree   reorder (T4 ) 0.1251 0.1220 0.1210 msec/pass, best: 0.1210
cElementTree  reorder (T4 ) 0.0213 0.0214 0.0236 msec/pass, best: 0.0213

etree         reorder_slice (T1 ) 12.0721 13.7968 15.4897 msec/pass, best: 12.0721
ElementTree   reorder_slice (T1 ) 0.2232 0.2023 0.2043 msec/pass, best: 0.2023
cElementTree  reorder_slice (T1 ) 0.0591 0.0645 0.0654 msec/pass, best: 0.0591
etree         reorder_slice (T2 ) 20.7641 21.2409 20.5686 msec/pass, best: 20.5686
ElementTree   reorder_slice (T2 ) 5.5664 5.8027 5.8425 msec/pass, best: 5.5664
cElementTree  reorder_slice (T2 ) 1.2022 1.1533 1.2419 msec/pass, best: 1.1533
etree         reorder_slice (T3 ) 0.0146 0.0133 0.0170 msec/pass, best: 0.0133
ElementTree   reorder_slice (T3 ) 0.0406 0.0449 0.0415 msec/pass, best: 0.0406
cElementTree  reorder_slice (T3 ) 0.0232 0.0232 0.0237 msec/pass, best: 0.0232
etree         reorder_slice (T4 ) 0.1669 0.1637 0.1630 msec/pass, best: 0.1630
ElementTree   reorder_slice (T4 ) 0.1264 0.1288 0.1269 msec/pass, best: 0.1264
cElementTree  reorder_slice (T4 ) 0.0229 0.0211 0.0217 msec/pass, best: 0.0211

etree         replace_children (T1 ) 8.0109 8.5548 8.8374 msec/pass, best: 8.0109
ElementTree   replace_children (T1 ) 19.4384 19.5091 19.5183 msec/pass, best: 19.4384
cElementTree  replace_children (T1 ) 2.0199 2.0196 2.0339 msec/pass, best: 2.0196
etree         replace_children (T2 ) 28.7735 29.8588 31.2469 msec/pass, best: 28.7735
ElementTree   replace_children (T2 ) 28.4966 30.1117 27.4139 msec/pass, best: 27.4139
cElementTree  replace_children (T2 ) 2.9685 3.0111 3.1016 msec/pass, best: 2.9685
etree         replace_children (T3 ) 0.7104 0.6892 0.7067 msec/pass, best: 0.6892
ElementTree   replace_children (T3 ) 4.5159 4.5425 4.4612 msec/pass, best: 4.4612
cElementTree  replace_children (T3 ) 0.4480 0.4526 0.4497 msec/pass, best: 0.4480
etree         replace_children (T4 ) 0.9078 0.9043 0.8908 msec/pass, best: 0.8908
ElementTree   replace_children (T4 ) 0.3466 0.3405 0.3500 msec/pass, best: 0.3405
cElementTree  replace_children (T4 ) 0.0381 0.0384 0.0368 msec/pass, best: 0.0368

etree         rotate_children (T1 ) 27.8303 28.3840 28.3697 msec/pass, best: 27.8303
ElementTree   rotate_children (T1 ) 1.2922 1.1920 1.1607 msec/pass, best: 1.1607
cElementTree  rotate_children (T1 ) 0.1670 0.1657 0.1693 msec/pass, best: 0.1657
etree         rotate_children (T2 ) 7.5876 8.5908 8.6592 msec/pass, best: 7.5876
ElementTree   rotate_children (T2 ) 1.7101 1.3314 1.3988 msec/pass, best: 1.3314
cElementTree  rotate_children (T2 ) 0.4529 0.4038 0.5509 msec/pass, best: 0.4038
etree         rotate_children (T3 ) 4.5568 4.5938 4.4606 msec/pass, best: 4.4606
ElementTree   rotate_children (T3 ) 0.6129 0.5613 0.5740 msec/pass, best: 0.5613
cElementTree  rotate_children (T3 ) 0.1412 0.1379 0.1383 msec/pass, best: 0.1379
etree         rotate_children (T4 ) 0.8887 0.8646 0.8843 msec/pass, best: 0.8646
ElementTree   rotate_children (T4 ) 0.5087 0.5158 0.5187 msec/pass, best: 0.5087
cElementTree  rotate_children (T4 ) 0.1269 0.1282 0.1233 msec/pass, best: 0.1233

From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 8 22:38:24 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed Mar 8 22:37:34 2006 Subject: [lxml-dev] Re: Benchmark script In-Reply-To: <440F1CD7.8090901@gkec.informatik.tu-darmstadt.de> 
References: <440D5DD8.5090001@gkec.informatik.tu-darmstadt.de><440F05E2.3000108@infrae.com>
	<440F0EAA.5020207@gkec.informatik.tu-darmstadt.de>
	<440F1CD7.8090901@gkec.informatik.tu-darmstadt.de>
Message-ID: <440F4ED0.6090103@gkec.informatik.tu-darmstadt.de>

Stefan Behnel wrote:
> Fredrik Lundh wrote:
>> Stefan Behnel wrote:
>>> in the end you will not be very impressed. One of
>>> the results of the benchmarks is that ElementTree beats lxml sometimes and
>>> that cElementTree beats lxml in most cases.
>>
>> can you post a copy of the results ?
>
> Here you go. [...]

Hi,

just wanted to add a note that the results of the comparison actually do
depend on what benchmarks you run. Here's an example where lxml is pretty
competitive. I'll add more benchmarks to the script over time.

    def bench_remove_children(self, root):
        for child in root:
            root.remove(child)

    def bench_remove_children_reversed(self, root):
        for child in reversed(root[:]):
            root.remove(child)

Stefan
-------------- next part --------------
etree        remove_nodes (T1 ) 7.1806 7.8207 7.5744 msec/pass, best: 7.1806
ElementTree  remove_nodes (T1 ) 9.9351 10.3656 10.2434 msec/pass, best: 9.9351
cElementTree remove_nodes (T1 ) 1.1753 1.1938 1.1930 msec/pass, best: 1.1753
etree        remove_nodes (T2 ) 10.7585 11.1049 11.0373 msec/pass, best: 10.7585
ElementTree  remove_nodes (T2 ) 163.0597 163.6652 163.5988 msec/pass, best: 163.0597
cElementTree remove_nodes (T2 ) 4.5629 4.6345 4.6852 msec/pass, best: 4.5629
etree        remove_nodes (T3 ) 0.4624 0.4984 0.4635 msec/pass, best: 0.4624
ElementTree  remove_nodes (T3 ) 3.1772 3.1977 3.1922 msec/pass, best: 3.1772
cElementTree remove_nodes (T3 ) 0.3280 0.3293 0.3732 msec/pass, best: 0.3280
etree        remove_nodes (T4 ) 0.1855 0.1881 0.2297 msec/pass, best: 0.1855
ElementTree  remove_nodes (T4 ) 0.4550 0.4557 0.4541 msec/pass, best: 0.4541
cElementTree remove_nodes (T4 ) 0.0217 0.0215 0.0216 msec/pass, best: 0.0215
etree        remove_nodes_reversed (T1 ) 8.6140 7.7091 7.2880 msec/pass, best: 7.2880
ElementTree  remove_nodes_reversed (T1 ) 19.3782 18.9275 18.9883 msec/pass, best: 18.9275
cElementTree remove_nodes_reversed (T1 ) 2.2662 2.2573 2.2869 msec/pass, best: 2.2573
etree        remove_nodes_reversed (T2 ) 12.0817 12.6200 13.0714 msec/pass, best: 12.0817
ElementTree  remove_nodes_reversed (T2 ) 648.7812 647.7359 671.5900 msec/pass, best: 647.7359
cElementTree remove_nodes_reversed (T2 ) 12.8756 12.9411 12.8823 msec/pass, best: 12.8756
etree        remove_nodes_reversed (T3 ) 0.3564 0.3560 0.3582 msec/pass, best: 0.3560
ElementTree  remove_nodes_reversed (T3 ) 4.4038 4.4472 4.3877 msec/pass, best: 4.3877
cElementTree remove_nodes_reversed (T3 ) 0.4653 0.4736 0.4737 msec/pass, best: 0.4653
etree        remove_nodes_reversed (T4 ) 0.1996 0.1971 0.1951 msec/pass, best: 0.1951
ElementTree  remove_nodes_reversed (T4 ) 1.6460 1.6219 1.6266 msec/pass, best: 1.6219
cElementTree remove_nodes_reversed (T4 ) 0.0534 0.0503 0.0538 msec/pass, best: 0.0503

From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 9 09:15:51 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu Mar 9 09:16:18 2006
Subject: [lxml-dev] New benchmark results
Message-ID: <440FE437.7050506@gkec.informatik.tu-darmstadt.de>

Here we go, this is *much* better.

I looked over one of the bottlenecks in the API, the _elementFactory
function. I found that the main problem was my implementation of the
namespace element lookup, which translated to extremely inefficient C code. I
rewrote that with explicit Python API calls and that gives us between 10% and
400% speedup in the benchmarks, depending on the API intensiveness of the
tests. Note that lxml is faster than ElementTree in the benchmark tree setup
now.

Now, the main area where lxml is substantially slower than ElementTree is
when moving elements between documents by hand. This is rare and can be
avoided pretty often, I guess. Note that moving an entire XML tree to a new
document in the obvious one-step operation is *not* so expensive in lxml.
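
A rough sketch of the difference between moving elements "by hand" and moving a whole tree in one step, as described above. It is written against the standard library's xml.etree.ElementTree as a stand-in, which is an assumption: lxml.etree offers a largely compatible API but is not imported here. The helper names build_tree, move_by_hand and move_in_one_step are made up for illustration and are not taken from bench.py.

```python
import xml.etree.ElementTree as ET  # stand-in (assumption), not lxml itself


def build_tree(tag, n_children):
    """Build a small tree: one root with n_children empty child elements."""
    root = ET.Element(tag)
    for i in range(n_children):
        ET.SubElement(root, "child%d" % i)
    return root


def move_by_hand(source, target):
    """Move children one at a time: one remove/append pair per element."""
    for child in list(source):  # iterate over a copy, since source is mutated
        source.remove(child)
        target.append(child)


def move_in_one_step(source_parent, subtree, target):
    """Move an entire subtree with a single remove and a single append."""
    source_parent.remove(subtree)
    target.append(subtree)


# Element-by-element move between two documents.
doc_a = build_tree("a", 100)
doc_b = build_tree("b", 0)
move_by_hand(doc_a, doc_b)

# One-step move of a whole subtree between two documents.
doc_c = ET.Element("c")
payload = build_tree("payload", 100)
doc_c.append(payload)
doc_d = ET.Element("d")
move_in_one_step(doc_c, payload, doc_d)
```

The first variant performs one API call pair per element, the second a constant number of calls regardless of subtree size. In lxml an element has a single parent, so append() on its own relocates it; the explicit remove() calls are there for the stand-in.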
What matters is the overhead of creating a Python representation of elements,
which is purely API related. Everything that runs internally runs mainly at
the speed of libxml2. A special beauty here is the deepcopy benchmark.

There still is a certain overhead in the API-to-libxml2 mapping, so programs
that extensively create elements one after the other are better off with
ElementTree and cElementTree. This advantage already diminishes when
SubElement is used, for which lxml is now faster than ElementTree.

In all other cases, I wouldn't see a major advantage for the two. Especially
none that should keep people from using lxml. Depending on the tree
size/structure and the operation, lxml and cElementTree can both take the
lead.

Since the APIs are largely compatible, I see the main decision maker as
follows:

* If your problem is small and not performance critical, use whatever you like

* If you create tons of independent elements, use cElementTree or consider
  changing your algorithms :)

* If your program is mainly API bound, use cElementTree

* If your program is dominated by large XML structures, try both lxml and
  cElementTree

* If you often need to copy large trees, consider lxml

* If you need features like XPath, XSLT, RNG, custom APIs, ... - use lxml

* If you need those features and more speed, send a patch to this list

Have fun,
Stefan
-------------- next part --------------
Preparing test suites and trees ...
Running benchmark on etree, ElementTree, cElementTree Setup times for trees in seconds: etree : 0.1827, 0.1651, 0.0371, 0.0008 ElementTree : 0.2251, 0.2564, 0.0574, 0.0010 cElementTree : 0.0378, 0.0176, 0.0088, 0.0001 etree append_elements (T1 ) 2.1396 2.2991 2.2673 msec/pass, best: 2.1396 ElementTree append_elements (T1 ) 0.3253 0.3294 0.3451 msec/pass, best: 0.3253 cElementTree append_elements (T1 ) 0.1065 0.0814 0.0795 msec/pass, best: 0.0795 etree append_elements (T2 ) 9.5490 9.8586 9.8201 msec/pass, best: 9.5490 ElementTree append_elements (T2 ) 4.9283 4.9106 4.9252 msec/pass, best: 4.9106 cElementTree append_elements (T2 ) 0.7983 0.8127 0.8101 msec/pass, best: 0.7983 etree append_elements (T3 ) 0.1166 0.1110 0.1099 msec/pass, best: 0.1099 ElementTree append_elements (T3 ) 0.1056 0.1081 0.1097 msec/pass, best: 0.1056 cElementTree append_elements (T3 ) 0.0308 0.0326 0.0314 msec/pass, best: 0.0308 etree append_elements (T4 ) 0.6297 0.4669 0.4662 msec/pass, best: 0.4662 ElementTree append_elements (T4 ) 0.2243 0.2092 0.2179 msec/pass, best: 0.2092 cElementTree append_elements (T4 ) 0.0375 0.0376 0.0355 msec/pass, best: 0.0355 etree append_from_document (T1,T2 ) 17.5267 12.2076 12.4225 msec/pass, best: 12.2076 ElementTree append_from_document (T1,T2 ) 7.2593 8.3792 8.1089 msec/pass, best: 7.2593 cElementTree append_from_document (T1,T2 ) 0.4959 0.5207 0.4878 msec/pass, best: 0.4878 etree append_from_document (T1,T3 ) 0.8263 0.8588 0.8604 msec/pass, best: 0.8263 ElementTree append_from_document (T1,T3 ) 0.0907 0.0806 0.0843 msec/pass, best: 0.0806 cElementTree append_from_document (T1,T3 ) 0.0324 0.0324 0.0348 msec/pass, best: 0.0324 etree append_from_document (T1,T4 ) 1.6374 1.8921 1.7790 msec/pass, best: 1.6374 ElementTree append_from_document (T1,T4 ) 0.1675 0.1767 0.1696 msec/pass, best: 0.1675 cElementTree append_from_document (T1,T4 ) 0.0590 0.0578 0.0561 msec/pass, best: 0.0561 etree append_from_document (T2,T1 ) 46.6267 44.5527 33.6675 msec/pass, best: 
33.6675 ElementTree append_from_document (T2,T1 ) 1.6526 1.2971 1.9397 msec/pass, best: 1.2971 cElementTree append_from_document (T2,T1 ) 0.0589 0.0589 0.0579 msec/pass, best: 0.0579 etree append_from_document (T2,T3 ) 0.8258 0.8747 0.8474 msec/pass, best: 0.8258 ElementTree append_from_document (T2,T3 ) 0.0849 0.0896 0.0837 msec/pass, best: 0.0837 cElementTree append_from_document (T2,T3 ) 0.0333 0.0438 0.0326 msec/pass, best: 0.0326 etree append_from_document (T2,T4 ) 1.6711 1.8064 1.8190 msec/pass, best: 1.6711 ElementTree append_from_document (T2,T4 ) 2.0507 2.4498 2.7369 msec/pass, best: 2.0507 cElementTree append_from_document (T2,T4 ) 0.0447 0.0477 0.0477 msec/pass, best: 0.0447 etree append_from_document (T3,T1 ) 23.7482 23.7627 24.5559 msec/pass, best: 23.7482 ElementTree append_from_document (T3,T1 ) 3.3216 3.7194 3.4479 msec/pass, best: 3.3216 cElementTree append_from_document (T3,T1 ) 0.0639 0.1068 0.0728 msec/pass, best: 0.0639 etree append_from_document (T3,T2 ) 9.1862 9.7372 10.1377 msec/pass, best: 9.1862 ElementTree append_from_document (T3,T2 ) 6.2612 7.0192 6.8504 msec/pass, best: 6.2612 cElementTree append_from_document (T3,T2 ) 0.4900 0.5061 0.5034 msec/pass, best: 0.4900 etree append_from_document (T3,T4 ) 0.1238 0.1324 0.1245 msec/pass, best: 0.1238 ElementTree append_from_document (T3,T4 ) 0.1831 0.1520 0.1646 msec/pass, best: 0.1520 cElementTree append_from_document (T3,T4 ) 0.0322 0.0579 0.0326 msec/pass, best: 0.0322 etree append_from_document (T4,T1 ) 23.8607 24.0458 24.7333 msec/pass, best: 23.8607 ElementTree append_from_document (T4,T1 ) 0.1792 0.1784 0.1760 msec/pass, best: 0.1760 cElementTree append_from_document (T4,T1 ) 0.0677 0.0696 0.0694 msec/pass, best: 0.0677 etree append_from_document (T4,T2 ) 9.6456 10.0888 10.2759 msec/pass, best: 9.6456 ElementTree append_from_document (T4,T2 ) 4.9667 5.1865 5.2433 msec/pass, best: 4.9667 cElementTree append_from_document (T4,T2 ) 0.4923 0.5122 1.4466 msec/pass, best: 0.4923 etree 
append_from_document (T4,T3 ) 0.5469 0.5745 0.5633 msec/pass, best: 0.5469 ElementTree append_from_document (T4,T3 ) 0.0769 0.0754 0.0798 msec/pass, best: 0.0754 cElementTree append_from_document (T4,T3 ) 0.0248 0.0252 0.0253 msec/pass, best: 0.0248 etree clear (T1 ) 6.8760 6.9473 6.9588 msec/pass, best: 6.8760 ElementTree clear (T1 ) 17.0238 17.1312 17.4397 msec/pass, best: 17.0238 cElementTree clear (T1 ) 2.1736 2.1512 2.1850 msec/pass, best: 2.1512 etree clear (T2 ) 6.9850 7.1344 7.1863 msec/pass, best: 6.9850 ElementTree clear (T2 ) 28.4882 17.6843 17.8492 msec/pass, best: 17.6843 cElementTree clear (T2 ) 2.4351 2.4483 2.4590 msec/pass, best: 2.4351 etree clear (T3 ) 0.5403 0.5393 0.5486 msec/pass, best: 0.5393 ElementTree clear (T3 ) 4.3183 4.3355 4.3676 msec/pass, best: 4.3183 cElementTree clear (T3 ) 0.3575 0.3696 0.3694 msec/pass, best: 0.3575 etree clear (T4 ) 0.0243 0.0230 0.0225 msec/pass, best: 0.0225 ElementTree clear (T4 ) 0.0497 0.0468 0.0491 msec/pass, best: 0.0468 cElementTree clear (T4 ) 0.0065 0.0069 0.0066 msec/pass, best: 0.0065 etree create_subelements (T1 ) 0.4447 0.4291 0.4390 msec/pass, best: 0.4291 ElementTree create_subelements (T1 ) 0.4023 0.4029 0.4081 msec/pass, best: 0.4023 cElementTree create_subelements (T1 ) 0.0650 0.0624 0.0642 msec/pass, best: 0.0624 etree create_subelements (T2 ) 6.4463 6.5367 6.4625 msec/pass, best: 6.4463 ElementTree create_subelements (T2 ) 6.4680 6.4867 6.5039 msec/pass, best: 6.4680 cElementTree create_subelements (T2 ) 0.5437 0.5510 0.5644 msec/pass, best: 0.5437 etree create_subelements (T3 ) 0.0982 0.0962 0.0966 msec/pass, best: 0.0962 ElementTree create_subelements (T3 ) 0.1408 0.1207 0.1170 msec/pass, best: 0.1170 cElementTree create_subelements (T3 ) 0.0279 0.0268 0.0265 msec/pass, best: 0.0265 etree create_subelements (T4 ) 0.3160 0.3148 0.3172 msec/pass, best: 0.3148 ElementTree create_subelements (T4 ) 0.3121 0.2866 0.2813 msec/pass, best: 0.2813 cElementTree create_subelements (T4 ) 0.0210 0.0210 
0.0212 msec/pass, best: 0.0210 etree deepcopy (T1 ) 17.9121 19.2798 18.9026 msec/pass, best: 17.9121 ElementTree deepcopy (T1 ) 805.1189 805.3489 872.4165 msec/pass, best: 805.1189 cElementTree deepcopy (T1 ) 505.1526 479.0999 474.0529 msec/pass, best: 474.0529 etree deepcopy (T2 ) 30.3835 31.2109 31.1521 msec/pass, best: 30.3835 ElementTree deepcopy (T2 ) 1027.0189 1034.5323 1110.9516 msec/pass, best: 1027.0189 cElementTree deepcopy (T2 ) 534.4865 499.5629 523.3296 msec/pass, best: 499.5629 etree deepcopy (T3 ) 3.2103 3.1447 3.1190 msec/pass, best: 3.1190 ElementTree deepcopy (T3 ) 188.7364 189.6626 192.9635 msec/pass, best: 188.7364 cElementTree deepcopy (T3 ) 124.1634 121.2389 121.2373 msec/pass, best: 121.2373 etree deepcopy (T4 ) 0.6969 0.6899 0.7117 msec/pass, best: 0.6899 ElementTree deepcopy (T4 ) 4.4663 4.3800 4.4279 msec/pass, best: 4.3800 cElementTree deepcopy (T4 ) 3.1183 3.1667 3.0353 msec/pass, best: 3.0353 etree getchildren (T1 ) 21.8075 22.5544 22.9289 msec/pass, best: 21.8075 ElementTree getchildren (T1 ) 0.1317 0.1362 0.1310 msec/pass, best: 0.1310 cElementTree getchildren (T1 ) 0.9512 0.9820 0.9762 msec/pass, best: 0.9512 etree getchildren (T2 ) 20.1090 17.5955 17.8679 msec/pass, best: 17.5955 ElementTree getchildren (T2 ) 1.4564 1.4540 1.4398 msec/pass, best: 1.4398 cElementTree getchildren (T2 ) 1.3277 1.3638 1.3366 msec/pass, best: 1.3277 etree getchildren (T3 ) 0.0357 0.0358 0.0353 msec/pass, best: 0.0353 ElementTree getchildren (T3 ) 0.0633 0.0640 0.0679 msec/pass, best: 0.0633 cElementTree getchildren (T3 ) 0.0240 0.0254 0.0255 msec/pass, best: 0.0240 etree getchildren (T4 ) 0.1035 0.0940 0.0942 msec/pass, best: 0.0940 ElementTree getchildren (T4 ) 0.0671 0.0709 0.0690 msec/pass, best: 0.0671 cElementTree getchildren (T4 ) 0.0276 0.0272 0.0280 msec/pass, best: 0.0272 etree insert_from_document (T1,T2 ) 24.8505 24.1553 25.6988 msec/pass, best: 24.1553 ElementTree insert_from_document (T1,T2 ) 8.7017 9.8994 9.8052 msec/pass, best: 8.7017 
cElementTree insert_from_document (T1,T2 ) 1.2120 1.2260 1.2225 msec/pass, best: 1.2120 etree insert_from_document (T1,T3 ) 1.0492 1.1782 1.3616 msec/pass, best: 1.0492 ElementTree insert_from_document (T1,T3 ) 0.0971 0.1012 0.1044 msec/pass, best: 0.0971 cElementTree insert_from_document (T1,T3 ) 0.0451 0.0489 0.0429 msec/pass, best: 0.0429 etree insert_from_document (T1,T4 ) 1.7054 1.8474 1.8518 msec/pass, best: 1.7054 ElementTree insert_from_document (T1,T4 ) 0.2471 0.2510 0.2550 msec/pass, best: 0.2471 cElementTree insert_from_document (T1,T4 ) 0.0815 0.0940 0.0868 msec/pass, best: 0.0815 etree insert_from_document (T2,T1 ) 32.2051 34.0006 34.0085 msec/pass, best: 32.2051 ElementTree insert_from_document (T2,T1 ) 2.2987 3.2959 2.1304 msec/pass, best: 2.1304 cElementTree insert_from_document (T2,T1 ) 0.1201 0.1238 0.1221 msec/pass, best: 0.1201 etree insert_from_document (T2,T3 ) 1.3658 1.2899 1.4140 msec/pass, best: 1.2899 ElementTree insert_from_document (T2,T3 ) 0.1069 0.1094 0.1085 msec/pass, best: 0.1069 cElementTree insert_from_document (T2,T3 ) 0.0470 0.0556 0.0524 msec/pass, best: 0.0470 etree insert_from_document (T2,T4 ) 2.6039 2.7250 2.7508 msec/pass, best: 2.6039 ElementTree insert_from_document (T2,T4 ) 2.2039 2.5299 2.5815 msec/pass, best: 2.2039 cElementTree insert_from_document (T2,T4 ) 0.1059 0.1134 0.1127 msec/pass, best: 0.1059 etree insert_from_document (T3,T1 ) 19.8009 25.0803 23.4647 msec/pass, best: 19.8009 ElementTree insert_from_document (T3,T1 ) 3.0262 4.1957 3.7703 msec/pass, best: 3.0262 cElementTree insert_from_document (T3,T1 ) 0.1040 0.1033 0.0954 msec/pass, best: 0.0954 etree insert_from_document (T3,T2 ) 15.1248 15.4261 16.6505 msec/pass, best: 15.1248 ElementTree insert_from_document (T3,T2 ) 7.8220 8.0915 8.5682 msec/pass, best: 7.8220 cElementTree insert_from_document (T3,T2 ) 1.1612 1.1757 1.2460 msec/pass, best: 1.1612 etree insert_from_document (T3,T4 ) 0.1667 0.4397 0.1828 msec/pass, best: 0.1667 ElementTree 
insert_from_document (T3,T4 ) 0.2935 0.2366 0.2363 msec/pass, best: 0.2363 cElementTree insert_from_document (T3,T4 ) 0.0566 0.0568 0.0734 msec/pass, best: 0.0566 etree insert_from_document (T4,T1 ) 23.9678 24.8289 24.7205 msec/pass, best: 23.9678 ElementTree insert_from_document (T4,T1 ) 0.2536 0.2725 0.2724 msec/pass, best: 0.2536 cElementTree insert_from_document (T4,T1 ) 0.0970 0.0986 0.0969 msec/pass, best: 0.0969 etree insert_from_document (T4,T2 ) 17.4943 17.6244 18.9083 msec/pass, best: 17.4943 ElementTree insert_from_document (T4,T2 ) 6.4159 6.5594 6.6816 msec/pass, best: 6.4159 cElementTree insert_from_document (T4,T2 ) 1.1943 1.1961 1.2079 msec/pass, best: 1.1943 etree insert_from_document (T4,T3 ) 0.5701 0.5741 0.5685 msec/pass, best: 0.5685 ElementTree insert_from_document (T4,T3 ) 0.1002 0.0967 0.0964 msec/pass, best: 0.0964 cElementTree insert_from_document (T4,T3 ) 0.0336 0.0379 0.0327 msec/pass, best: 0.0327 etree remove_children (T1 ) 6.9888 7.1405 7.1501 msec/pass, best: 6.9888 ElementTree remove_children (T1 ) 10.2381 10.3145 10.3398 msec/pass, best: 10.2381 cElementTree remove_children (T1 ) 1.1740 1.2146 1.2070 msec/pass, best: 1.1740 etree remove_children (T2 ) 8.0086 8.0501 8.0835 msec/pass, best: 8.0086 ElementTree remove_children (T2 ) 171.0074 165.6075 174.0978 msec/pass, best: 165.6075 cElementTree remove_children (T2 ) 4.5207 4.6791 4.7362 msec/pass, best: 4.5207 etree remove_children (T3 ) 0.5668 0.5530 0.5762 msec/pass, best: 0.5530 ElementTree remove_children (T3 ) 3.1857 3.1828 3.2323 msec/pass, best: 3.1828 cElementTree remove_children (T3 ) 0.2588 0.2608 0.2584 msec/pass, best: 0.2584 etree remove_children (T4 ) 0.0684 0.0753 0.0700 msec/pass, best: 0.0684 ElementTree remove_children (T4 ) 0.4579 0.4570 0.4711 msec/pass, best: 0.4570 cElementTree remove_children (T4 ) 0.0209 0.0210 0.0207 msec/pass, best: 0.0207 etree remove_children_reversed (T1 ) 11.9009 11.9145 11.8435 msec/pass, best: 11.8435 ElementTree 
remove_children_reversed (T1 ) 19.3379 18.8514 20.4126 msec/pass, best: 18.8514 cElementTree remove_children_reversed (T1 ) 2.2264 2.3288 2.2562 msec/pass, best: 2.2264 etree remove_children_reversed (T2 ) 12.9722 14.8116 12.7335 msec/pass, best: 12.7335 ElementTree remove_children_reversed (T2 ) 673.7946 684.9349 669.5603 msec/pass, best: 669.5603 cElementTree remove_children_reversed (T2 ) 13.0890 12.8449 12.9061 msec/pass, best: 12.8449 etree remove_children_reversed (T3 ) 0.5227 0.5268 0.5229 msec/pass, best: 0.5227 ElementTree remove_children_reversed (T3 ) 4.4230 5.2407 4.4353 msec/pass, best: 4.4230 cElementTree remove_children_reversed (T3 ) 0.4222 0.4244 0.4180 msec/pass, best: 0.4180 etree remove_children_reversed (T4 ) 0.0738 0.0731 0.0736 msec/pass, best: 0.0731 ElementTree remove_children_reversed (T4 ) 1.6637 1.6356 1.6363 msec/pass, best: 1.6356 cElementTree remove_children_reversed (T4 ) 0.0510 0.0484 0.0499 msec/pass, best: 0.0484 etree reorder (T1 ) 14.5806 16.2235 15.8373 msec/pass, best: 14.5806 ElementTree reorder (T1 ) 0.8176 2.1794 2.6297 msec/pass, best: 0.8176 cElementTree reorder (T1 ) 0.0634 0.0676 0.0605 msec/pass, best: 0.0605 etree reorder (T2 ) 18.1266 18.8180 18.8672 msec/pass, best: 18.1266 ElementTree reorder (T2 ) 5.5504 5.5097 5.7170 msec/pass, best: 5.5097 cElementTree reorder (T2 ) 1.1481 1.1568 1.1591 msec/pass, best: 1.1481 etree reorder (T3 ) 0.0130 0.0156 0.0155 msec/pass, best: 0.0130 ElementTree reorder (T3 ) 0.0437 0.0459 0.0446 msec/pass, best: 0.0437 cElementTree reorder (T3 ) 0.0244 0.0228 0.0230 msec/pass, best: 0.0228 etree reorder (T4 ) 0.0731 0.0780 0.0734 msec/pass, best: 0.0731 ElementTree reorder (T4 ) 0.1282 0.1242 0.1238 msec/pass, best: 0.1238 cElementTree reorder (T4 ) 0.0217 0.0218 0.0220 msec/pass, best: 0.0217 etree reorder_slice (T1 ) 14.5325 17.0733 15.9148 msec/pass, best: 14.5325 ElementTree reorder_slice (T1 ) 0.5114 1.8668 2.6623 msec/pass, best: 0.5114 cElementTree reorder_slice (T1 ) 0.0629 
0.0636 0.0633 msec/pass, best: 0.0629 etree reorder_slice (T2 ) 19.0733 22.6151 20.3017 msec/pass, best: 19.0733 ElementTree reorder_slice (T2 ) 5.5578 5.8006 5.7953 msec/pass, best: 5.5578 cElementTree reorder_slice (T2 ) 1.1598 1.1522 1.1647 msec/pass, best: 1.1522 etree reorder_slice (T3 ) 0.0145 0.0135 0.0133 msec/pass, best: 0.0133 ElementTree reorder_slice (T3 ) 0.0421 0.0435 0.0434 msec/pass, best: 0.0421 cElementTree reorder_slice (T3 ) 0.0225 0.0221 0.0229 msec/pass, best: 0.0221 etree reorder_slice (T4 ) 0.0796 0.0779 0.0786 msec/pass, best: 0.0779 ElementTree reorder_slice (T4 ) 0.1390 0.1334 0.1426 msec/pass, best: 0.1334 cElementTree reorder_slice (T4 ) 0.0259 0.0219 0.0217 msec/pass, best: 0.0217 etree replace_children (T1 ) 7.4184 7.9814 8.2130 msec/pass, best: 7.4184 ElementTree replace_children (T1 ) 19.4359 19.5994 19.5809 msec/pass, best: 19.4359 cElementTree replace_children (T1 ) 2.0112 2.0221 2.0522 msec/pass, best: 2.0112 etree replace_children (T2 ) 17.0823 19.4184 20.3127 msec/pass, best: 17.0823 ElementTree replace_children (T2 ) 26.4215 26.6201 26.9273 msec/pass, best: 26.4215 cElementTree replace_children (T2 ) 2.9001 2.8836 2.8504 msec/pass, best: 2.8504 etree replace_children (T3 ) 0.6777 0.6651 0.6562 msec/pass, best: 0.6562 ElementTree replace_children (T3 ) 5.2446 4.5064 4.6229 msec/pass, best: 4.5064 cElementTree replace_children (T3 ) 0.4712 0.4945 0.4554 msec/pass, best: 0.4554 etree replace_children (T4 ) 0.4540 0.5324 0.4619 msec/pass, best: 0.4540 ElementTree replace_children (T4 ) 0.3328 0.3317 0.3333 msec/pass, best: 0.3317 cElementTree replace_children (T4 ) 0.0388 0.0385 0.0382 msec/pass, best: 0.0382 etree rotate_children (T1 ) 27.4596 27.0203 27.8874 msec/pass, best: 27.0203 ElementTree rotate_children (T1 ) 1.2427 1.2087 1.1850 msec/pass, best: 1.1850 cElementTree rotate_children (T1 ) 0.1627 0.1654 0.1633 msec/pass, best: 0.1627 etree rotate_children (T2 ) 7.2919 7.9870 7.1657 msec/pass, best: 7.1657 ElementTree 
rotate_children (T2 ) 2.2686 2.5049 1.8473 msec/pass, best: 1.8473 cElementTree rotate_children (T2 ) 0.4213 0.4067 0.4378 msec/pass, best: 0.4067 etree rotate_children (T3 ) 3.9796 3.9531 4.7094 msec/pass, best: 3.9531 ElementTree rotate_children (T3 ) 0.6443 0.5682 0.5796 msec/pass, best: 0.5682 cElementTree rotate_children (T3 ) 0.1355 0.1322 0.1368 msec/pass, best: 0.1322 etree rotate_children (T4 ) 0.3995 0.3204 0.3173 msec/pass, best: 0.3173 ElementTree rotate_children (T4 ) 0.5233 0.5198 0.5278 msec/pass, best: 0.5198 cElementTree rotate_children (T4 ) 0.1210 0.1199 0.1275 msec/pass, best: 0.1199 etree set_attributes (T1 ) 6.0470 6.6676 6.6880 msec/pass, best: 6.0470 ElementTree set_attributes (T1 ) 0.1447 0.1446 0.1479 msec/pass, best: 0.1446 cElementTree set_attributes (T1 ) 0.0691 0.0665 0.0666 msec/pass, best: 0.0665 etree set_attributes (T2 ) 8.8447 9.5652 9.4956 msec/pass, best: 8.8447 ElementTree set_attributes (T2 ) 1.6476 1.6582 1.6447 msec/pass, best: 1.6447 cElementTree set_attributes (T2 ) 0.7479 0.7352 0.7474 msec/pass, best: 0.7352 etree set_attributes (T3 ) 0.0568 0.0928 0.0583 msec/pass, best: 0.0568 ElementTree set_attributes (T3 ) 0.0625 0.0613 0.0658 msec/pass, best: 0.0613 cElementTree set_attributes (T3 ) 0.0272 0.0240 0.0255 msec/pass, best: 0.0240 etree set_attributes (T4 ) 0.1534 0.1562 0.1530 msec/pass, best: 0.1530 ElementTree set_attributes (T4 ) 0.0779 0.0759 0.0762 msec/pass, best: 0.0759 cElementTree set_attributes (T4 ) 0.0268 0.0282 0.0261 msec/pass, best: 0.0261 etree setget_attributes (T1 ) 5.4190 6.6268 6.6813 msec/pass, best: 5.4190 ElementTree setget_attributes (T1 ) 0.2305 0.2438 0.2276 msec/pass, best: 0.2276 cElementTree setget_attributes (T1 ) 0.1521 0.0983 0.0920 msec/pass, best: 0.0920 etree setget_attributes (T2 ) 10.5145 11.2669 11.2211 msec/pass, best: 10.5145 ElementTree setget_attributes (T2 ) 3.2395 3.1300 3.1196 msec/pass, best: 3.1196 cElementTree setget_attributes (T2 ) 1.1320 1.1729 1.1571 msec/pass, best: 
1.1320
etree        setget_attributes (T3 ) 0.0784 0.0807 0.0913 msec/pass, best: 0.0784
ElementTree  setget_attributes (T3 ) 0.0977 0.0835 0.0889 msec/pass, best: 0.0835
cElementTree setget_attributes (T3 ) 0.0295 0.0285 0.0333 msec/pass, best: 0.0285
etree        setget_attributes (T4 ) 0.2418 0.2438 0.2392 msec/pass, best: 0.2392
ElementTree  setget_attributes (T4 ) 0.1654 0.1779 0.1539 msec/pass, best: 0.1539
cElementTree setget_attributes (T4 ) 0.0476 0.0458 0.0506 msec/pass, best: 0.0458

From fredrik at pythonware.com Thu Mar 9 15:02:40 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu Mar 9 15:08:56 2006
Subject: [lxml-dev] Re: New benchmark results
References: <440FE437.7050506@gkec.informatik.tu-darmstadt.de>
Message-ID:

Stefan Behnel wrote:

> A special beauty here is the deepcopy benchmark.

python's deepcopy implementation is horribly slow. I guess I have to do
something about that in cET... I currently recommend the following approach
for mass-production of fresh trees from a given template:

    http://effbot.python-hosting.com/file/stuff/sandbox/elementlib/clone.py

:::

do you have any tag/attribute/text access tests in your benchmark, btw? one
big advantage of cElementTree is that cET builds real Python objects on the
way in; there's (usually) no extra conversion cost when you access an Element
attribute.

From faassen at infrae.com Thu Mar 9 19:16:38 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Thu Mar 9 19:17:00 2006
Subject: [lxml-dev] Better Installer
In-Reply-To: <94E66656-3AE4-4B11-BB0D-52BF97579D42@livingcode.org>
References: <44043ED0.4020505@gkec.informatik.tu-darmstadt.de>
	<3E6EB389-CB8F-4F15-889C-414D30E2D0AA@livingcode.org>
	<440605D0.3050000@gkec.informatik.tu-darmstadt.de>
	<94E66656-3AE4-4B11-BB0D-52BF97579D42@livingcode.org>
Message-ID: <44107106.3070703@infrae.com>

Dethe Elza wrote:
>>> The main feature I'd like to see would be an easy installer that include
>>> lxml's dependencies, maybe using easy_install.
>>> It's a complex project
>>> and the installation is easy to get wrong.
>>
>>
>> Hmm, I'm not quite sure what could be done better here. What you'd have to do
>> for 0.9 is:
>>
>> * install libxml2 and libxslt (which lxml can't do for you)
>
>
> Why not? Other python extensions install their dependencies.

Stefan already answered this, but I'll jump in -- I actually looked for a bit
at setuptools last year in the hope it could tackle this problem, but there
wasn't much that could help there...

So, yeah, we're aware that this can be a problem, but it's not easy to fix
unfortunately. Contributions in that department are of course very welcome.

Regards,

Martijn

From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Mar 10 10:00:54 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri Mar 10 09:50:30 2006
Subject: [lxml-dev] Re: New benchmark results
In-Reply-To:
References: <440FE437.7050506@gkec.informatik.tu-darmstadt.de>
Message-ID: <44114046.2040903@gkec.informatik.tu-darmstadt.de>

Fredrik Lundh wrote:
> Stefan Behnel wrote:
>
>> A special beauty here is the deepcopy benchmark.
>
> python's deepcopy implementation is horribly slow. I guess I have to do
> something about that in cET... I currently recommend the following approach
> for mass-production of fresh trees from a given template:
>
> http://effbot.python-hosting.com/file/stuff/sandbox/elementlib/clone.py

Thanks, that makes the benchmark suite run considerably faster. You should
consider putting something like this into Element.__deepcopy__.

> do you have any tag/attribute/text access tests in your benchmark, btw? one
> big advantage of cElementTree is that cET builds real Python objects on the
> way in; there's (usually) no extra conversion cost when you access an
> Element attribute.

I just added some and attached the result.
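
As a rough illustration of what a tag/text access test with a "best of three, msec/pass" figure could look like, here is a minimal sketch. It assumes the standard library's xml.etree.ElementTree and timeit rather than the real bench.py, whose internals are not shown in this thread; build_tree and best_of are hypothetical helper names, and the repeat/number values are arbitrary.

```python
import timeit
import xml.etree.ElementTree as ET  # stand-in (assumption), not lxml or cET


def build_tree(n_children, text=None):
    """Build a flat tree whose children optionally carry the given text."""
    root = ET.Element("root")
    for _ in range(n_children):
        child = ET.SubElement(root, "child")
        child.text = text
    return root


def bench_tag(root):
    """Read the tag of every child once."""
    for child in root:
        child.tag


def bench_text(root):
    """Read the text of every child once."""
    for child in root:
        child.text


def best_of(func, root, repeat=3, number=100):
    """Time func(root): return the msec/pass of each run and the best run."""
    totals = timeit.repeat(lambda: func(root), repeat=repeat, number=number)
    per_pass = [1000.0 * total / number for total in totals]
    return per_pass, min(per_pass)


tree = build_tree(1000, text="some text")
for bench in (bench_tag, bench_text):
    runs, best = best_of(bench, tree)
    formatted = " ".join("%.4f" % run for run in runs)
    print("%-10s %s msec/pass, best: %.4f" % (bench.__name__, formatted, best))
```

Reporting the minimum of several runs, as the logs in this thread do, reduces the influence of unrelated system load on the figures.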
Feel free to send me some more as a patch against

    http://codespeak.net/svn/lxml/trunk/bench.py

You can configure how they are called by using a decorator:

    def bench_tag(self, root):
        for child in root:
            child.tag

    @with_text(utext=True, text=True, no_text=True)
    def bench_text(self, root):
        for child in root:
            child.text

The first is called on all trees without attributes and text, the second one
is called three times for each tree: without text (-), with ASCII text (S)
and with Unicode text (U). Look at the tree setup in bench.py and the results
in the attached log to see what I mean.

There is also a decorator @with_attributes(True/False)

Stefan
-------------- next part --------------
Preparing test suites and trees ...
Running benchmark on lxe, ET, cET

Setup times for trees in seconds:
lxe:      --     T-     U-     -A     TA     UA
     T1: 0.1907 0.1578 0.1588 0.1592 0.1583 0.1581
     T2: 0.1376 0.1388 0.1383 0.1424 0.1421 0.1425
     T3: 0.0319 0.0308 0.0322 0.0495 0.0511 0.0500
     T4: 0.0007 0.0007 0.0007 0.0012 0.0013 0.0011
ET :      --     T-     U-     -A     TA     UA
     T1: 0.2313 0.2763 0.2166 0.2659 0.2128 0.2509
     T2: 0.2578 0.1883 0.2500 0.1920 0.2361 0.2668
     T3: 0.0512 0.0544 0.0518 0.0573 0.0538 0.0584
     T4: 0.0009 0.0008 0.0007 0.0007 0.0008 0.0008
cET:      --     T-     U-     -A     TA     UA
     T1: 0.0361 0.0348 0.0355 0.0554 0.0351 0.0360
     T2: 0.0143 0.0139 0.0143 0.0149 0.0142 0.0140
     T3: 0.0088 0.0088 0.0085 0.0123 0.0121 0.0170
     T4: 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001

lxe: tag (-- T1 ) 0.1260 0.1212 0.1831 msec/pass, best: 0.1212
ET : tag (-- T1 ) 0.1113 0.1098 0.1132 msec/pass, best: 0.1098
cET: tag (-- T1 ) 0.0406 0.0404 0.0432 msec/pass, best: 0.0404
lxe: tag (-- T2 ) 1.1775 1.2149 1.1726 msec/pass, best: 1.1726
ET : tag (-- T2 ) 1.0539 1.0603 1.0587 msec/pass, best: 1.0539
cET: tag (-- T2 ) 0.3180 0.3406 0.3449 msec/pass, best: 0.3180
lxe: tag (-- T3 ) 0.0353 0.0319 0.0383 msec/pass, best: 0.0319
ET : tag (-- T3 ) 0.0562 0.0675 0.0608 msec/pass, best: 0.0562
cET: tag (-- T3 ) 0.0099 0.0075 0.0077 msec/pass, best: 0.0075
lxe: tag (-- T4 )
0.0665 0.0510 0.0500 msec/pass, best: 0.0500 ET : tag (-- T4 ) 0.0503 0.0478 0.0482 msec/pass, best: 0.0478 cET: tag (-- T4 ) 0.0158 0.0152 0.0142 msec/pass, best: 0.0142 lxe: text (-- T1 ) 0.1227 0.1120 0.1130 msec/pass, best: 0.1120 ET : text (-- T1 ) 0.1086 0.1273 0.1375 msec/pass, best: 0.1086 cET: text (-- T1 ) 0.0403 0.0407 0.0424 msec/pass, best: 0.0403 lxe: text (S- T1 ) 0.1112 0.1299 0.1150 msec/pass, best: 0.1112 ET : text (S- T1 ) 0.1107 0.1101 0.1113 msec/pass, best: 0.1101 cET: text (S- T1 ) 0.0405 0.0409 0.0404 msec/pass, best: 0.0404 lxe: text (U- T1 ) 0.1124 0.1315 0.1174 msec/pass, best: 0.1124 ET : text (U- T1 ) 0.1152 0.1132 0.1127 msec/pass, best: 0.1127 cET: text (U- T1 ) 0.0448 0.0410 0.0401 msec/pass, best: 0.0401 lxe: text (-- T2 ) 1.0198 1.0201 1.0330 msec/pass, best: 1.0198 ET : text (-- T2 ) 1.0598 1.0430 1.0629 msec/pass, best: 1.0430 cET: text (-- T2 ) 0.3317 0.3290 0.3309 msec/pass, best: 0.3290 lxe: text (S- T2 ) 0.9765 0.9936 1.0045 msec/pass, best: 0.9765 ET : text (S- T2 ) 1.0950 1.0390 1.0646 msec/pass, best: 1.0390 cET: text (S- T2 ) 0.3151 0.3374 0.3238 msec/pass, best: 0.3151 lxe: text (U- T2 ) 0.9843 1.0201 1.0384 msec/pass, best: 0.9843 ET : text (U- T2 ) 1.0933 1.1132 1.0576 msec/pass, best: 1.0576 cET: text (U- T2 ) 0.3155 0.3490 0.3386 msec/pass, best: 0.3155 lxe: text (-- T3 ) 0.0318 0.0316 0.0300 msec/pass, best: 0.0300 ET : text (-- T3 ) 0.0568 0.0580 0.0544 msec/pass, best: 0.0544 cET: text (-- T3 ) 0.0095 0.0070 0.0071 msec/pass, best: 0.0070 lxe: text (S- T3 ) 0.0372 0.0339 0.0323 msec/pass, best: 0.0323 ET : text (S- T3 ) 0.0545 0.0548 0.0573 msec/pass, best: 0.0545 cET: text (S- T3 ) 0.0080 0.0065 0.0082 msec/pass, best: 0.0065 lxe: text (U- T3 ) 0.0349 0.0377 0.0322 msec/pass, best: 0.0322 ET : text (U- T3 ) 0.0537 0.0556 0.0575 msec/pass, best: 0.0537 cET: text (U- T3 ) 0.0104 0.0090 0.0090 msec/pass, best: 0.0090 lxe: text (-- T4 ) 0.0408 0.0433 0.0421 msec/pass, best: 0.0408 ET : text (-- T4 ) 0.3062 0.0502 
0.0475 msec/pass, best: 0.0475 cET: text (-- T4 ) 0.0184 0.0152 0.0148 msec/pass, best: 0.0148 lxe: text (S- T4 ) 0.0399 0.0401 0.0416 msec/pass, best: 0.0399 ET : text (S- T4 ) 0.0526 0.0493 0.0490 msec/pass, best: 0.0490 cET: text (S- T4 ) 0.0159 0.0154 0.0152 msec/pass, best: 0.0152 lxe: text (U- T4 ) 0.0422 0.0404 0.0404 msec/pass, best: 0.0404 ET : text (U- T4 ) 0.0499 0.0514 0.0492 msec/pass, best: 0.0492 cET: text (U- T4 ) 0.0151 0.0152 0.0153 msec/pass, best: 0.0151 From skink at evhr.net Fri Mar 10 12:06:42 2006 From: skink at evhr.net (Fabien SCHWOB) Date: Fri Mar 10 12:07:15 2006 Subject: [lxml-dev] Fedora Core 4 and lxml Message-ID: <20060310110642.7E7517EC18@postix.sdv.fr> I'm trying to build lxml on FC4. I've already installed Pyrex and applied the gcc4-small-patch. But when I'm trying to build lxml with 'python setup.py install' or 'make', I get the following error : building 'lxml.etree' extension Traceback (most recent call last): File "setup.py", line 29, in ? cmdclass = {'build_ext': build_pyx} File "/usr/lib/python2.4/distutils/core.py", line 149, in setup dist.run_commands() File "/usr/lib/python2.4/distutils/dist.py", line 946, in run_commands self.run_command(cmd) File "/usr/lib/python2.4/distutils/dist.py", line 966, in run_command cmd_obj.run() File "/usr/lib/python2.4/distutils/command/install.py", line 506, in run self.run_command('build') File "/usr/lib/python2.4/distutils/cmd.py", line 333, in run_command self.distribution.run_command(command) File "/usr/lib/python2.4/distutils/dist.py", line 966, in run_command cmd_obj.run() File "/usr/lib/python2.4/distutils/command/build.py", line 112, in run self.run_command(cmd_name) File "/usr/lib/python2.4/distutils/cmd.py", line 333, in run_command self.distribution.run_command(command) File "/usr/lib/python2.4/distutils/dist.py", line 966, in run_command cmd_obj.run() File "/usr/lib/python2.4/distutils/command/build_ext.py", line 279, in run self.build_extensions() File 
"/usr/lib/python2.4/distutils/command/build_ext.py", line 405, in build_extensions self.build_extension(ext) File "/usr/lib/python2.4/distutils/command/build_ext.py", line 442, in build_extension sources = self.swig_sources(sources, ext) TypeError: swig_sources() takes exactly 2 arguments (3 given) Does someone has an idea on how to solve this problem ? Thanks in advance. -- Fabien From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Mar 10 12:26:05 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Fri Mar 10 12:15:45 2006 Subject: [lxml-dev] Fedora Core 4 and lxml In-Reply-To: <20060310110642.7E7517EC18@postix.sdv.fr> References: <20060310110642.7E7517EC18@postix.sdv.fr> Message-ID: <4411624D.70301@gkec.informatik.tu-darmstadt.de> Fabien SCHWOB wrote: > I'm trying to build lxml on FC4. I've already installed Pyrex and > applied the gcc4-small-patch. But when I'm trying to build lxml with > 'python setup.py install' or 'make', I get the following error : > [...] > TypeError: swig_sources() takes exactly 2 arguments (3 given) http://codespeak.net/pipermail/lxml-dev/2006-March/000954.html Stefan From faassen at infrae.com Fri Mar 10 12:38:21 2006 From: faassen at infrae.com (Martijn Faassen) Date: Fri Mar 10 12:38:12 2006 Subject: [lxml-dev] Fedora Core 4 and lxml In-Reply-To: <4411624D.70301@gkec.informatik.tu-darmstadt.de> References: <20060310110642.7E7517EC18@postix.sdv.fr> <4411624D.70301@gkec.informatik.tu-darmstadt.de> Message-ID: <4411652D.1030803@infrae.com> Stefan Behnel wrote: > Fabien SCHWOB wrote: > >>I'm trying to build lxml on FC4. I've already installed Pyrex and >>applied the gcc4-small-patch. But when I'm trying to build lxml with >>'python setup.py install' or 'make', I get the following error : >>[...] >>TypeError: swig_sources() takes exactly 2 arguments (3 given) > > > http://codespeak.net/pipermail/lxml-dev/2006-March/000954.html How do we make sure people don't run into this one again? 
Get a patch accepted by Pyrex? Actually we should start releasing with the generated .c code included so people won't need Pyrex anymore. That'll be the best approach for at least people using the release. Regards, Martijn From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Mar 10 13:02:39 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Fri Mar 10 13:03:12 2006 Subject: [lxml-dev] Fedora Core 4 and lxml In-Reply-To: <4411652D.1030803@infrae.com> References: <20060310110642.7E7517EC18@postix.sdv.fr> <4411624D.70301@gkec.informatik.tu-darmstadt.de> <4411652D.1030803@infrae.com> Message-ID: <44116ADF.8040705@gkec.informatik.tu-darmstadt.de> Martijn Faassen wrote: > Stefan Behnel wrote: >> Fabien SCHWOB wrote: >> >>> I'm trying to build lxml on FC4. I've already installed Pyrex and >>> applied the gcc4-small-patch. But when I'm trying to build lxml with >>> 'python setup.py install' or 'make', I get the following error : >>> [...] >>> TypeError: swig_sources() takes exactly 2 arguments (3 given) >> >> >> http://codespeak.net/pipermail/lxml-dev/2006-March/000954.html > > How do we make sure people don't run into this one again? Get a patch > accepted by Pyrex? Like that's ever gonna happen ... :) > Actually we should start releasing with the generated .c code included > so people won't need Pyrex anymore. That'll be the best approach for at > least people using the release. etree.c is currently included when you run python setup.py sdist or anything in that line, so it will be in 0.9 and work around all bugs that normal users can encounter from that side. And developers who want to contribute to lxml can well read some documentation first... You should add the bug fix above to the Pyrex section of installation.html on the web page. 
Stefan

From ogrisel at nuxeo.com Fri Mar 10 13:13:03 2006
From: ogrisel at nuxeo.com (Olivier Grisel)
Date: Fri Mar 10 13:13:54 2006
Subject: [lxml-dev] Re: Fedora Core 4 and lxml
In-Reply-To: <44116ADF.8040705@gkec.informatik.tu-darmstadt.de>
References: <20060310110642.7E7517EC18@postix.sdv.fr> <4411624D.70301@gkec.informatik.tu-darmstadt.de> <4411652D.1030803@infrae.com> <44116ADF.8040705@gkec.informatik.tu-darmstadt.de>
Message-ID:

Stefan Behnel wrote:
>> How do we make sure people don't run into this one again? Get a patch
>> accepted by Pyrex?
>
> Like that's ever gonna happen ... :)
>
>> Actually we should start releasing with the generated .c code included
>> so people won't need Pyrex anymore. That'll be the best approach for at
>> least people using the release.
>
> etree.c is currently included when you run python setup.py sdist
> or anything in that line, so it will be in 0.9 and work around all bugs that
> normal users can encounter from that side. And developers who want to
> contribute to lxml can well read some documentation first...
>
> You should add the bug fix above to the Pyrex section of installation.html on
> the web page.

+1 and even host a temporary branch/fork of Pyrex with all the necessary
patches included in the lxml repos for the convenience of lxml hackers.

Once the patches are included in upstream Pyrex we can get rid of the
local branch.

--
Olivier

From skink at evhr.net Fri Mar 10 13:35:24 2006
From: skink at evhr.net (Fabien SCHWOB)
Date: Fri Mar 10 13:35:55 2006
Subject: [lxml-dev] Fedora Core 4 and lxml
In-Reply-To: <4411652D.1030803@infrae.com>
Message-ID: <20060310123524.742117EC17@postix.sdv.fr>

> > I'm trying to build lxml on FC4. I've already installed Pyrex and
> > applied the gcc4-small-patch. But when I'm trying to build lxml
> > with 'python setup.py install' or 'make', I get the following error :
> > [...]
> > TypeError: swig_sources() takes exactly 2 arguments (3 given) > http://codespeak.net/pipermail/lxml-dev/2006-March/000954.html Thanks, this problem is solved, but now I have another one. You can see the log below. The English translation of "erreur: membre gauche de l'affectation invalide" (french) is "error: left member of the assignment is invalid". Thanks ============8<====== LOGS ========8<================ running build_ext building 'lxml.etree' extension gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -m32 -march=i386 -mtune=pentium4 -fasynchronous-unwind-tables -D_GNU_SOURCE -fPIC -fPIC -I/usr/include/python2.4 -c src/lxml/etree.c -o build/temp.linux-i686-2.4/src/lxml/etree.o -w -I/usr/include/libxml2 src/lxml/etree.c: In function ?__pyx_f_5etree__parseDocument?: src/lxml/etree.c:704: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__documentFactory?: src/lxml/etree.c:724: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:732: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:746: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__elementTreeFactory?: src/lxml/etree.c:1851: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__newElementTree?: src/lxml/etree.c:1873: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:1881: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:1904: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:1920: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_8_Element___setslice__?: src/lxml/etree.c:2101: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:2178: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_8_Element___getslice__?: src/lxml/etree.c:3089: erreur: membre gauche de 
l'affectation invalide src/lxml/etree.c:3123: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_8_Element_getchildren?: src/lxml/etree.c:3684: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:3696: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__elementFactory?: src/lxml/etree.c:4163: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:4170: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:4229: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:4242: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:4287: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__commentFactory?: src/lxml/etree.c:4604: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:4610: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:4644: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:4666: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__attribFactory?: src/lxml/etree.c:5676: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:5682: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:5703: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:5725: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_20ElementChildIterator___init__?: src/lxml/etree.c:5750: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:5755: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_20ElementChildIterator___next__?: src/lxml/etree.c:5824: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:5848: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_Element?: src/lxml/etree.c:6059: erreur: membre gauche de l'affectation invalide 
src/lxml/etree.c:6060: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:6096: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:6102: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_Comment?: src/lxml/etree.c:6155: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:6182: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_SubElement?: src/lxml/etree.c:6241: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:6275: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_ElementTree?: src/lxml/etree.c:6332: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:6333: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:6343: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:6352: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:6361: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:6369: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_tostring?: src/lxml/etree.c:6574: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:6608: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_parse?: src/lxml/etree.c:6689: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:6694: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__find_element_class?: src/lxml/etree.c:7519: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:7555: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_4XSLT___init__?: src/lxml/etree.c:7644: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:7649: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_4XSLT___call__?: src/lxml/etree.c:7751: 
erreur: membre gauche de l'affectation invalide src/lxml/etree.c:7752: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:7753: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:7763: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:7769: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:7930: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__xsltResultTreeFactory?: src/lxml/etree.c:8129: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:8136: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_22XPathDocumentEvaluator___init__?: src/lxml/etree.c:8200: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:8201: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:8210: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:8216: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_22XPathDocumentEvaluator__hold?: src/lxml/etree.c:8813: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:8913: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__wrapXPathObject?: src/lxml/etree.c:9268: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:9438: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__createNodeSetResult?: src/lxml/etree.c:9680: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:9724: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__xpathCallback?: src/lxml/etree.c:9879: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:9880: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:9921: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:9945: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function 
?__pyx_f_5etree_9XMLSchema___init__?: src/lxml/etree.c:10286: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:10292: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_getProxy?: src/lxml/etree.c:11059: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__documentOrRaise?: src/lxml/etree.c:11576: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:11581: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:11614: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__documentOf?: src/lxml/etree.c:11705: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree__rootNodeOf?: src/lxml/etree.c:11782: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_f_5etree_changeDocumentBelowHelper?: src/lxml/etree.c:13060: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:13092: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_new_5etree__NodeBase?: src/lxml/etree.c:13559: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:13560: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_clear_5etree__NodeBase?: src/lxml/etree.c:13591: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_new_5etree__ElementTree?: src/lxml/etree.c:13719: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:13720: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_clear_5etree__ElementTree?: src/lxml/etree.c:13746: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:13748: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_new_5etree__Element?: src/lxml/etree.c:13889: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In 
function ?__pyx_tp_new_5etree__Comment?: src/lxml/etree.c:14138: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_new_5etree__Attrib?: src/lxml/etree.c:14325: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_new_5etree__NamespaceRegistry?: src/lxml/etree.c:14641: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_new_5etree__XSLTResultTree?: src/lxml/etree.c:14985: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_clear_5etree__XSLTResultTree?: src/lxml/etree.c:15009: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_new_5etree_XPathDocumentEvaluator?: src/lxml/etree.c:15137: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:15138: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:15139: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_clear_5etree_XPathDocumentEvaluator?: src/lxml/etree.c:15204: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:15206: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_new_5etree_XPathElementEvaluator?: src/lxml/etree.c:15351: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:15352: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_clear_5etree_XPathElementEvaluator?: src/lxml/etree.c:15376: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?__pyx_tp_new_5etree_Parser?: src/lxml/etree.c:15951: erreur: membre gauche de l'affectation invalide src/lxml/etree.c: In function ?initetree?: src/lxml/etree.c:16139: erreur: membre gauche de l'affectation invalide src/lxml/etree.c:16801: erreur: membre gauche de l'affectation invalide error: command 'gcc' failed with exit status 1 From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Mar 10 13:43:25 2006 
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Fri Mar 10 13:43:58 2006
Subject: [lxml-dev] Fedora Core 4 and lxml
In-Reply-To: <20060310123524.742117EC17@postix.sdv.fr>
References: <20060310123524.742117EC17@postix.sdv.fr>
Message-ID: <4411746D.9080204@gkec.informatik.tu-darmstadt.de>

Fabien SCHWOB wrote:
>>> I'm trying to build lxml on FC4. I've already installed Pyrex and
>>> applied the gcc4-small-patch. But when I'm trying to build lxml
>>> with 'python setup.py install' or 'make', I get the following error :
>>> [...]
>>> TypeError: swig_sources() takes exactly 2 arguments (3 given)
>
>> http://codespeak.net/pipermail/lxml-dev/2006-March/000954.html
>
> Thanks, this problem is solved, but now I have another one. You can see
> the log below. The English translation of "erreur: membre gauche de
> l'affectation invalide" (french) is "error: left member of the
> assignment is invalid".

Thanks, I had understood that. :)

Are you sure you applied the two patches from

http://codespeak.net/lxml/installation.html

especially the first one?

Stefan

> ============8<====== LOGS ========8<================
>
> running build_ext
> building 'lxml.etree' extension
> gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -m32 -march=i386 -mtune=pentium4
> -fasynchronous-unwind-tables -D_GNU_SOURCE -fPIC -fPIC
> -I/usr/include/python2.4 -c src/lxml/etree.c -o
> build/temp.linux-i686-2.4/src/lxml/etree.o -w -I/usr/include/libxml2
> src/lxml/etree.c: In function ‘__pyx_f_5etree__parseDocument’:
> src/lxml/etree.c:704: erreur: membre gauche de l'affectation invalide
[...]
From behnel_ml at gkec.informatik.tu-darmstadt.de Fri Mar 10 14:08:10 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Fri Mar 10 14:08:49 2006 Subject: [lxml-dev] Re: Fedora Core 4 and lxml In-Reply-To: References: <20060310110642.7E7517EC18@postix.sdv.fr> <4411624D.70301@gkec.informatik.tu-darmstadt.de> <4411652D.1030803@infrae.com> <44116ADF.8040705@gkec.informatik.tu-darmstadt.de> Message-ID: <44117A3A.1040109@gkec.informatik.tu-darmstadt.de> Olivier Grisel wrote: > host a temporary branch/fork of Pyrex with all the necessary > patches included in the lxml repos for the convenience of lxml hackers. > > Once the patches are included in upstream Pyrex we can get rid of the > local branch. Ok, I think that's a good idea. Since there will be no real development on Pyrex on our side, I simply imported it into https://codespeak.net/svn/lxml/pyrex It's the patched version of 0.9.3.1. It also contains an SRPM and an RPM, so people can download a working version right away. http://codespeak.net/svn/lxml/pyrex/dist/Pyrex-0.9.3.1.tar.gz http://codespeak.net/svn/lxml/pyrex/dist/Pyrex-0.9.3.1-1.src.rpm http://codespeak.net/svn/lxml/pyrex/dist/Pyrex-0.9.3.1-1.noarch.rpm Fabien, please try these. Stefan From faassen at infrae.com Fri Mar 10 14:18:27 2006 From: faassen at infrae.com (Martijn Faassen) Date: Fri Mar 10 14:18:34 2006 Subject: [lxml-dev] Re: Fedora Core 4 and lxml In-Reply-To: <44117A3A.1040109@gkec.informatik.tu-darmstadt.de> References: <20060310110642.7E7517EC18@postix.sdv.fr> <4411624D.70301@gkec.informatik.tu-darmstadt.de> <4411652D.1030803@infrae.com> <44116ADF.8040705@gkec.informatik.tu-darmstadt.de> <44117A3A.1040109@gkec.informatik.tu-darmstadt.de> Message-ID: <44117CA3.6070103@infrae.com> Stefan Behnel wrote: > Olivier Grisel wrote: > >>host a temporary branch/fork of Pyrex with all the necessary >>patches included in the lxml repos for the convenience of lxml hackers. 
>>
>>Once the patches are included in upstream Pyrex we can get rid of the
>>local branch.
>
>
> Ok, I think that's a good idea. Since there will be no real development on
> Pyrex on our side, I simply imported it into
>
> https://codespeak.net/svn/lxml/pyrex

Good idea under the circumstances. Unfortunate too. Perhaps we should
tell them we forked it because they don't fix the problems or accept our
patches, to get them to do something. :)

> It's the patched version of 0.9.3.1. It also contains an SRPM and an RPM, so
> people can download a working version right away.
>
> http://codespeak.net/svn/lxml/pyrex/dist/Pyrex-0.9.3.1.tar.gz
> http://codespeak.net/svn/lxml/pyrex/dist/Pyrex-0.9.3.1-1.src.rpm
> http://codespeak.net/svn/lxml/pyrex/dist/Pyrex-0.9.3.1-1.noarch.rpm

Oh, cool.

Regards,

Martijn

From skink at evhr.net Fri Mar 10 14:37:53 2006
From: skink at evhr.net (Fabien SCHWOB)
Date: Fri Mar 10 14:38:17 2006
Subject: [lxml-dev] Re: Fedora Core 4 and lxml
In-Reply-To: <44117A3A.1040109@gkec.informatik.tu-darmstadt.de>
Message-ID: <20060310133753.B46587EC13@postix.sdv.fr>

> Ok, I think that's a good idea. Since there will be no real
> development on Pyrex on our side, I simply imported it into
>
> https://codespeak.net/svn/lxml/pyrex
>
> It's the patched version of 0.9.3.1. It also contains an SRPM and an
> RPM, so people can download a working version right away.
>
> http://codespeak.net/svn/lxml/pyrex/dist/Pyrex-0.9.3.1.tar.gz
> http://codespeak.net/svn/lxml/pyrex/dist/Pyrex-0.9.3.1-1.src.rpm
> http://codespeak.net/svn/lxml/pyrex/dist/Pyrex-0.9.3.1-1.noarch.rpm
>
> Fabien, please try these.

Thanks, it seems to work! Now I can have fun with DOM and XPath.

Thanks for your help, everybody. And I must say that lxml definitely
rocks: a very good and pythonic API, good performance and support for
a lot of "XML friends" (XPath, etc.).
From skink at evhr.net Mon Mar 13 11:57:12 2006
From: skink at evhr.net (Fabien SCHWOB)
Date: Mon Mar 13 11:57:59 2006
Subject: [lxml-dev] XPath error ?
Message-ID: <20060313105712.170E47EC49@postix.sdv.fr>

Hello,

I've successfully compiled and installed lxml, but now I have XPath
errors. To be precise, the error is that no nodes are returned. I'm
using "//a" on "http://www.w3.org/". It seems to be a namespace error,
but I haven't found how to pass the default namespace to the .xpath()
function.

Here is my code:

==================== 8< =======================
import lxml.etree
import urllib
import sys

def show_skel(element, ident = 0):
    print " "*ident, element.tag
    for node in element:
        show_skel(node, ident+4)

url = "http://www.w3.org"
expr_xpath = "//a"

f = lxml.etree.StringIO("".join(urllib.urlopen(url).readlines()))
doc = lxml.etree.parse(f)
r = doc.xpath(expr_xpath)
#show_skel(doc.getroot())
print len(r)
if (len(r) > 0):
    for node in r:
        print node.tag
==================== 8< =======================

Thanks in advance

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Mar 13 12:18:26 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon Mar 13 12:07:20 2006
Subject: [lxml-dev] XPath error ?
In-Reply-To: <20060313105712.170E47EC49@postix.sdv.fr>
References: <20060313105712.170E47EC49@postix.sdv.fr>
Message-ID: <44155502.2060304@gkec.informatik.tu-darmstadt.de>

Fabien SCHWOB wrote:
> I've successfully compiled and installed lxml, but now I have XPath
> errors. To be precise, the error is that no nodes are returned. I'm
> using "//a" on "http://www.w3.org/". It seems to be a namespace error,
> but I haven't found how to pass the default namespace to the .xpath()
> function.

You mean like described in doc/xpath.txt or doc/api.txt?

http://codespeak.net/lxml/xpath.html
http://codespeak.net/lxml/api.html

The default namespace is None, but you can also use an XPath expression like
"//xhtml:a" if you prefer.
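For illustration, here is a minimal sketch of the prefixed form. It assumes the standard library's ElementTree instead of lxml, so that it runs without lxml installed; its findall() takes a prefix-to-URI mapping in the same spirit as the mapping passed to xpath() elsewhere in this thread. The sample document and the names XHTML_NS, unprefixed and prefixed are made up for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A small stand-in for an XHTML page that declares a default namespace.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><a href="/one">one</a><p><a href="/two">two</a></p></body>'
    '</html>'
)

# An unprefixed name only matches elements in no namespace: nothing is found.
unprefixed = doc.findall(".//a")

# Binding a prefix (any name will do) to the XHTML namespace finds the links.
prefixed = doc.findall(".//xhtml:a", {"xhtml": XHTML_NS})

hrefs = [a.get("href") for a in prefixed]
print(len(unprefixed), hrefs)
```

The prefix used in the expression does not have to match anything in the document itself; only the namespace URI it is bound to matters.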
Stefan

From skink at evhr.net Mon Mar 13 13:09:55 2006
From: skink at evhr.net (Fabien SCHWOB)
Date: Mon Mar 13 13:10:17 2006
Subject: [lxml-dev] XPath error ?
In-Reply-To: <44155502.2060304@gkec.informatik.tu-darmstadt.de>
Message-ID: <20060313120955.5D9357EC0F@postix.sdv.fr>

> You mean like described in doc/xpath.txt or doc/api.txt?
>
> http://codespeak.net/lxml/xpath.html
> http://codespeak.net/lxml/api.html
>
> The default namespace is None, but you can also use an XPath
> expression like
> "//xhtml:a" if you prefer.

I've seen these two pages. But what I would like to do is run an XPath
query against the default namespace (like in <html
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">).
How can I do that?

Thanks

From paul at zope-europe.org Mon Mar 13 13:19:46 2006
From: paul at zope-europe.org (Paul Everitt)
Date: Mon Mar 13 13:20:55 2006
Subject: [lxml-dev] Re: XPath error ?
In-Reply-To: <20060313120955.5D9357EC0F@postix.sdv.fr>
References: <44155502.2060304@gkec.informatik.tu-darmstadt.de> <20060313120955.5D9357EC0F@postix.sdv.fr>
Message-ID: <44156362.3020003@zope-europe.org>

Fabien SCHWOB wrote:
>> You mean like described in doc/xpath.txt or doc/api.txt?
>>
>> http://codespeak.net/lxml/xpath.html
>> http://codespeak.net/lxml/api.html
>>
>> The default namespace is None, but you can also use an XPath
>> expression like
>> "//xhtml:a" if you prefer.
>
> I've seen these two pages. But what I would like to do is run an XPath
> query against the default namespace (like in <html
> xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">).
> How can I do that?

something.xpath("//xhtml:a", {'xhtml':'http://www.w3.org/1999/xhtml'})

--Paul

From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Mar 13 14:23:23 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Mon Mar 13 14:10:49 2006
Subject: [lxml-dev] XPath error ?
In-Reply-To: <20060313120955.5D9357EC0F@postix.sdv.fr>
References: <20060313120955.5D9357EC0F@postix.sdv.fr>
Message-ID: <4415724B.8010401@gkec.informatik.tu-darmstadt.de>

Fabien SCHWOB wrote:
>> You mean like described in doc/xpath.txt or doc/api.txt?
>>
>> http://codespeak.net/lxml/xpath.html
>> http://codespeak.net/lxml/api.html
>>
>> The default namespace is None, but you can also use an XPath
>> expression like
>> "//xhtml:a" if you prefer.
>
> I've seen these two pages. But what I would like to do is run an XPath
> query against the default namespace (like in <html
> xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">).
> How can I do that?

Just as Paul told you. I was actually wrong, you can't register a default
namespace prefix for XPath expressions. Sorry.

If you try to do this:

.>>> root = XML('
')
.>>> root.xpath("//b", {None : "uri:test"})

you will receive a TypeError. That is somewhat unfortunate, but due to libxml2.
I can't see a way to define the default namespace for an XPath expression in
libxml2. This behaviour now also has a test case, since it's unlikely to change
in the future. Thanks for pointing me to it.

Anyway, just go with a non-empty prefix.

Stefan

From skink at evhr.net Mon Mar 13 15:28:37 2006
From: skink at evhr.net (Fabien SCHWOB)
Date: Mon Mar 13 15:29:31 2006
Subject: [lxml-dev] XPath error ?
In-Reply-To: <4415724B.8010401@gkec.informatik.tu-darmstadt.de>
Message-ID: <20060313142837.4E2D47EC0E@postix.sdv.fr>

> Just as Paul told you. I was actually wrong, you can't register a
> default namespace prefix for XPath expressions. Sorry.
>
> If you try to do this:
>
> .>>> root = XML('')
> .>>> root.xpath("//b", {None : "uri:test"})
>
> you will receive a TypeError. That is somewhat unfortunate, but due
> to libxml2. I can't see a way to define the default namespace for an
> XPath expression in libxml2. This behaviour now also has a test case,
> since it's unlikely to change in the future. Thanks for pointing me to
> it.
>
> Anyway, just go with a non-empty prefix.

The problem is that in the application I'm developing, I get the XPath
expressions from the users, and I can't ask them to change their
expressions to add a namespace.

It's sad, but I think that for this project I will use Ruby. I will give
Python a closer look for some personal projects.

Thanks for all the help you gave me, everyone.

From K.Buchcik at 4commerce.de Mon Mar 13 15:37:43 2006
From: K.Buchcik at 4commerce.de (Kasimier Buchcik)
Date: Mon Mar 13 15:45:26 2006
Subject: [lxml-dev] XPath error ?
In-Reply-To: <20060313142837.4E2D47EC0E@postix.sdv.fr>
References: <20060313142837.4E2D47EC0E@postix.sdv.fr>
Message-ID: <1142260663.1327.6.camel@librax>

Hi,

On Mon, 2006-03-13 at 15:28 +0100, Fabien SCHWOB wrote:
> > Just as Paul told you.
I was actually wrong, you can't register a > > default namespace prefix for XPath expressions. Sorry. > > > > If you try to do this: > > > > .>>> root = XML('') > > .>>> root.xpath("//b", {None : "uri:test"}) > > > > you will receive a TypeError. That is somewhat unfortunate, but due > > to libxml2. I can't see a way to define the default namespace for an > > XPath expression in libxml2. This behaviour now also has a test case, > > since it's unlikely to change in the future. Thanks for pointing me at > > it. This behaviour is based on the XPath 1.0 spec. http://www.w3.org/TR/xpath#node-tests : "if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded)" IIRC, then this was changed in XPath 2.0. [...] Regards, Kasimier From faassen at infrae.com Mon Mar 13 17:49:23 2006 From: faassen at infrae.com (Martijn Faassen) Date: Mon Mar 13 17:48:50 2006 Subject: [lxml-dev] XPath error ? In-Reply-To: <20060313142837.4E2D47EC0E@postix.sdv.fr> References: <20060313142837.4E2D47EC0E@postix.sdv.fr> Message-ID: <4415A293.5010409@infrae.com> Fabien SCHWOB wrote: >>Just as Paul told you. I was actually wrong, you can't register a >>default namespace prefix for XPath expressions. Sorry. >> >>If you try to do this: >> >>.>>> root = XML('') >>.>>> root.xpath("//b", {None : "uri:test"}) >> >>you will receive a TypeError. That is somewhat unfortunate, but due >>to libxml2. I can't see a way to define the default namespace for an >>XPath expression in libxml2. This behaviour now also has a test case, >>since it's unlikely to change in the future. Thanks for pointing me at >>it. >>Anyway, just go with a non-empty prefix. > > The problem is that in the application I'm developing, I got the xpath > expression from the users, and I can't them to change their expressions > to add namespace. > > It's sad, but I think that for this project I will use ruby. I will gave > a try more deeply to Python for some personal projects. 
If Ruby's XML library implements XPath 1.0 then you'll have the same issue with Ruby. lxml in this respect follows the XPath standard (and any libxml2-based library will do so unless it takes special measures to break XPath 1.0 compliance). If you really do not want this behavior, you might be able to get away with preprocessing the XML itself to rip off all namespaces. Then the (non-XPath 1.0 compliant) expressions you desire will work. As Kasimier mentioned in his followup, XPath 2.0 may do what you desire. I do not know if libxml2 will implement XPath 2.0 - if so it's certainly not anywhere close, I suspect. Regards, Martijn From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 15 14:15:16 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed Mar 15 13:47:02 2006 Subject: [lxml-dev] New implementation of element.getiterator() Message-ID: <44181364.5000806@gkec.informatik.tu-darmstadt.de> Hi all, element.getiterator() now has a real iterator implementation (as opposed to the previous list builder). It is similar to the already existing child iterator in that it keeps Python references to nodes to benefit from Python garbage collection. There are two new iterator classes: * ElementDepthFirstIterator - iterates over a node and all its subelements in depth first pre-order (i.e. document order) * ElementTagFilter - filters out elements from an iterable that do not match a specific tag The combination of both implements the getiterator() functionality. This is less efficient than a single combined iterator could be, as creating Python objects just to throw them away afterwards does not come for free in lxml. However, I think that both iterators are useful in their own right, so I split them up for now, mainly to avoid code duplication. 
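The split into a depth-first iterator and a separate tag filter described above can be sketched in plain Python. This is only an illustration of the idea, assuming the standard library's xml.etree.ElementTree instead of lxml; the names depth_first and tag_filter are hypothetical and are not the lxml classes mentioned in the message.

```python
import xml.etree.ElementTree as ET


def depth_first(element):
    """Yield element and all its descendants in document (pre-)order."""
    stack = [element]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so that the first child is popped first.
        stack.extend(reversed(list(node)))


def tag_filter(elements, tag):
    """Yield only those elements of an iterable whose tag equals `tag`."""
    for el in elements:
        if el.tag == tag:
            yield el


root = ET.XML("<a><b><c/></b><c/><b/></a>")

# Chaining the two generators gives getiterator(tag)-like behaviour.
all_tags = [el.tag for el in depth_first(root)]
b_elements = list(tag_filter(depth_first(root), "b"))
```

The cost discussed in the message shows up here as well: the filter still receives every element the traversal produces and simply discards the ones that do not match.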
I added the following benchmarks to show the gain compared to the original implementation (see the attached logs): def bench_getiterator(self, root): list(islice(root.getiterator(), 10, 110)) def bench_getiterator_tag(self, root): list(islice(root.getiterator("{b}a"), 3, 10)) def bench_getiterator_tag_all(self, root): list(islice(root.getiterator("{b}a"), 10, 150)) Note that I had to change the tree setup to get sufficiently repetitive tag names for these tests. The second log I attached compares the new implementation to ElementTree and cElementTree. lxml now beats ET in all of the above tests and even cET in most cases. Maybe I should still consider removing the double iterator handicap... Have fun, Stefan -------------- next part -------------- Preparing test suites and trees ... Running benchmark on lxe, ET, cET Setup times for trees in seconds: lxe: -- S- U- -A SA UA T1: 0.1967 0.1532 0.1738 0.1522 0.1516 0.1511 T2: 0.1615 0.1575 0.1600 0.1603 0.1617 0.1599 T3: 0.0308 0.0294 0.0294 0.0476 0.0494 0.0484 T4: 0.0007 0.0007 0.0007 0.0011 0.0011 0.0012 ET : -- S- U- -A SA UA T1: 0.2303 0.2816 0.2180 0.2715 0.2156 0.2566 T2: 0.2992 0.2313 0.2859 0.2315 0.2774 0.3093 T3: 0.0524 0.0548 0.0519 0.0580 0.0536 0.0594 T4: 0.0009 0.0008 0.0012 0.0008 0.0008 0.0008 cET: -- S- U- -A SA UA T1: 0.0361 0.0354 0.0351 0.0350 0.0373 0.0352 T2: 0.0358 0.0362 0.0375 0.0368 0.0360 0.0364 T3: 0.0086 0.0086 0.0085 0.0122 0.0121 0.0152 T4: 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 lxe: getiterator (-- T1 ) 5.4358 5.6172 5.6089 msec/pass, best: 5.4358 ET : getiterator (-- T1 ) 43.8617 43.9253 44.0674 msec/pass, best: 43.8617 cET: getiterator (-- T1 ) 0.3359 0.3485 0.3402 msec/pass, best: 0.3359 lxe: getiterator (-- T2 ) 7.5207 10.2384 9.3765 msec/pass, best: 7.5207 ET : getiterator (-- T2 ) 45.6706 45.9193 46.3493 msec/pass, best: 45.6706 cET: getiterator (-- T2 ) 0.3397 0.3491 0.3451 msec/pass, best: 0.3397 lxe: getiterator (-- T3 ) 0.2269 0.2267 0.2315 msec/pass, best: 0.2267 ET : 
getiterator (-- T3 ) 11.0083 11.2267 11.2683 msec/pass, best: 11.0083 cET: getiterator (-- T3 ) 0.4715 0.3912 0.3913 msec/pass, best: 0.3912 lxe: getiterator (-- T4 ) 4.5090 0.1437 0.1446 msec/pass, best: 0.1437 ET : getiterator (-- T4 ) 0.2379 0.2461 0.2298 msec/pass, best: 0.2298 cET: getiterator (-- T4 ) 0.2123 0.1953 0.1979 msec/pass, best: 0.1953 lxe: getiterator_tag (-- T1 ) 31.7239 32.0591 32.7678 msec/pass, best: 31.7239 ET : getiterator_tag (-- T1 ) 37.6585 36.3009 36.3366 msec/pass, best: 36.3009 cET: getiterator_tag (-- T1 ) 30.9611 31.2982 32.2575 msec/pass, best: 30.9611 lxe: getiterator_tag (-- T2 ) 0.6885 0.7165 0.7245 msec/pass, best: 0.6885 ET : getiterator_tag (-- T2 ) 37.0500 37.3288 37.3845 msec/pass, best: 37.0500 cET: getiterator_tag (-- T2 ) 0.6012 0.6058 0.6023 msec/pass, best: 0.6012 lxe: getiterator_tag (-- T3 ) 6.5275 6.5976 6.5382 msec/pass, best: 6.5275 ET : getiterator_tag (-- T3 ) 9.1281 9.3763 11.5126 msec/pass, best: 9.1281 cET: getiterator_tag (-- T3 ) 7.5623 7.5396 7.4457 msec/pass, best: 7.4457 lxe: getiterator_tag (-- T4 ) 0.1903 0.1859 0.1840 msec/pass, best: 0.1840 ET : getiterator_tag (-- T4 ) 0.1951 0.1996 0.1955 msec/pass, best: 0.1951 cET: getiterator_tag (-- T4 ) 4.3555 0.1929 0.1904 msec/pass, best: 0.1904 lxe: getiterator_tag_all (-- T1 ) 33.6318 34.9976 33.9690 msec/pass, best: 33.6318 ET : getiterator_tag_all (-- T1 ) 36.3718 36.0605 35.9895 msec/pass, best: 35.9895 cET: getiterator_tag_all (-- T1 ) 31.7803 31.8099 32.6252 msec/pass, best: 31.7803 lxe: getiterator_tag_all (-- T2 ) 34.8316 34.5458 36.4856 msec/pass, best: 34.5458 ET : getiterator_tag_all (-- T2 ) 36.9701 38.3700 37.3110 msec/pass, best: 36.9701 cET: getiterator_tag_all (-- T2 ) 34.0031 32.9882 33.7108 msec/pass, best: 32.9882 lxe: getiterator_tag_all (-- T3 ) 6.6296 6.5843 6.6435 msec/pass, best: 6.5843 ET : getiterator_tag_all (-- T3 ) 9.1701 9.3035 9.5170 msec/pass, best: 9.1701 cET: getiterator_tag_all (-- T3 ) 11.8023 7.2707 7.4327 msec/pass, best: 
7.2707 lxe: getiterator_tag_all (-- T4 ) 4.6077 0.1829 0.1837 msec/pass, best: 0.1829 ET : getiterator_tag_all (-- T4 ) 0.1940 0.1847 0.1873 msec/pass, best: 0.1847 cET: getiterator_tag_all (-- T4 ) 0.2014 0.1895 0.1882 msec/pass, best: 0.1882 -------------- next part -------------- Preparing test suites and trees ... Running benchmark on lxe Setup times for trees in seconds: lxe: -- S- U- -A SA UA T1: 0.2061 0.1642 0.1633 0.1646 0.1645 0.1805 T2: 0.1709 0.1703 0.1699 0.1737 0.1734 0.1740 T3: 0.0324 0.0314 0.0320 0.0504 0.0511 0.0510 T4: 0.0008 0.0008 0.0008 0.0012 0.0012 0.0012 lxe: getiterator (-- T1 ) 80.8973 77.7289 77.4245 msec/pass, best: 77.4245 lxe: getiterator (-- T2 ) 84.2623 85.7539 85.2087 msec/pass, best: 84.2623 lxe: getiterator (-- T3 ) 21.5209 17.3502 18.8891 msec/pass, best: 17.3502 lxe: getiterator (-- T4 ) 0.4163 0.4116 0.4997 msec/pass, best: 0.4116 lxe: getiterator_tag (-- T1 ) 79.2238 81.4558 79.4984 msec/pass, best: 79.2238 lxe: getiterator_tag (-- T2 ) 86.4460 86.6350 86.4704 msec/pass, best: 86.4460 lxe: getiterator_tag (-- T3 ) 22.5506 18.3274 18.5131 msec/pass, best: 18.3274 lxe: getiterator_tag (-- T4 ) 4.6912 0.4282 0.4641 msec/pass, best: 0.4282 lxe: getiterator_tag_all (-- T1 ) 85.4131 83.0776 85.7234 msec/pass, best: 83.0776 lxe: getiterator_tag_all (-- T2 ) 84.7853 83.9738 84.1555 msec/pass, best: 83.9738 lxe: getiterator_tag_all (-- T3 ) 22.0715 17.8837 17.7980 msec/pass, best: 17.7980 lxe: getiterator_tag_all (-- T4 ) 0.4290 0.4228 0.4179 msec/pass, best: 0.4179 From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 16 13:32:53 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Thu Mar 16 12:58:24 2006 Subject: [lxml-dev] Better error reporting Message-ID: <44195AF5.7090609@gkec.informatik.tu-darmstadt.de> Hi, since we often have requests regarding better error reporting in lxml, I implemented a little error log. 
lxml now keeps a bounded list of output messages from libxml2/xslt (that normally appear on stdout) and provides access to the log entries through its exceptions. LxmlException objects now have an additional attribute "error_log" that contains the last log entries (up to 20 by default). So, if an exception is raised, whoever catches it can print out the error messages from libxml2/xslt to see where things went wrong. This behaviour can be switched on or off at compile time and defaults to off. If off, the attribute simply stores an empty tuple. I hope that comes close to what was asked for. Stefan From faassen at infrae.com Thu Mar 16 13:15:45 2006 From: faassen at infrae.com (Martijn Faassen) Date: Thu Mar 16 13:16:15 2006 Subject: [lxml-dev] Better error reporting In-Reply-To: <44195AF5.7090609@gkec.informatik.tu-darmstadt.de> References: <44195AF5.7090609@gkec.informatik.tu-darmstadt.de> Message-ID: <441956F1.3070201@infrae.com> Stefan Behnel wrote: > Hi, > > since we often have requests regarding better error reporting in lxml, I > implemented a little error log. lxml now keeps a bounded list of output > messages from libxml2/xslt (that normally appear on stdout) and provides > access to the log entries through its exceptions. > > LxmlException objects now have an additional attribute "error_log" that > contains the last log entries (up to 20 by default). So, if an exception is > raised, whoever catches it can print out the error messages from libxml2/xslt > to see where things went wrong. > > This behaviour can be switched on or off at compile time and defaults to off. > If off, the attribute simply stores an empty tuple. What's the motivation for defaulting this to 'off'? > I hope that comes close to what was asked for. This sounds very useful! Great! How precise is this information? Do we see the last 5 lines for an XML parsing error that occurred earlier in a later XSLT exception?
We also have the case for RelaxNG/Schema reporting where no exception is raised if the XML is not valid according to the schema. Regards, Martijn From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 16 14:55:17 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Thu Mar 16 14:20:49 2006 Subject: [lxml-dev] Better error reporting In-Reply-To: <441956F1.3070201@infrae.com> References: <44195AF5.7090609@gkec.informatik.tu-darmstadt.de> <441956F1.3070201@infrae.com> Message-ID: <44196E45.5020609@gkec.informatik.tu-darmstadt.de> Martijn Faassen wrote: > Stefan Behnel wrote: >> since we often have requests regarding better error reporting in lxml, I >> implemented a little error log. lxml now keeps a bounded list of output >> messages from libxml2/xslt (that normally appear on stdout) and provides >> access to the log entries through its exceptions. >> >> LxmlException objects now have an additional attribute "error_log" that >> contain the last log entries (up to 20 by default). So, if an >> exception is >> raised, whoever catches it can print out the error messages from >> libxml2/xslt >> to see where things went wrong. >> >> This behaviour can be switched on or off at compile time and defaults >> to off. If off, the attribute simply stores an empty tuple. > > What's the motivation for defaulting this to 'off'? I consider this more of a debug helper than something you'd use in production. It doesn't really help you in /handling/ errors as all you get is non computer readable strings with error messages. It /does/ help you when you read through them to see where the bugs in your applications are. It does not help your /program/ to handle errors. Since the log had to be copied at the moment of exception raising, I first thought that exception creation becomes somewhat faster if it's switched off. I changed the error_log attribute to a property now that accesses the *current* log. 
This means that it does no longer guarantee to conserve the exact state of the error log at exception creation time (which should not be a problem if you do not make any libxml calls before catching it), but it also means that there is no longer any performance penalty for creating exceptions. So now, I would also prefer the default to be "on". > How precise is this information? Do we see the last 5 lines for an XML > parsing error that occured earlier in a later XSLT reception? Well, you see the last 20 lines - if you explicitly print them. They are not (currently) shown in the exception message or the traceback. The code cannot really try to find the line that triggered the error, so it's really just 20 lines of messages. You /could/ extend this to incorporate more of the information provided by the libxml2/xslt error calls (like error types, etc.). On XML errors, for example, libxml2 gives you this: int domain : What part of the library raised this er int code : The error code, e.g. an xmlParserError char * message : human-readable informative error messag xmlErrorLevel level : how consequent is the error char * file : the filename int line : the line number if available char * str1 : extra string information char * str2 : extra string information char * str3 : extra string information int int1 : extra number information int int2 : column number of the error or 0 if N/A void * ctxt : the parser context if available void * node : the node in the tree The problem is: the more information you put into the log, the slower the application becomes. Providing the element that triggered the error, for example, would rather be out of scope. Note that you have to convert this information to Python representations in order to store it in the log. > We also have the case for RelaxNG/Schema reporting where no exception is > raised if the XML is not valid according to the schema. I added error_log properties to the RelaxNG and XMLSchema classes. 
That should solve that problem. Stefan From faassen at infrae.com Thu Mar 16 16:12:21 2006 From: faassen at infrae.com (Martijn Faassen) Date: Thu Mar 16 16:12:52 2006 Subject: [lxml-dev] Better error reporting In-Reply-To: <44196E45.5020609@gkec.informatik.tu-darmstadt.de> References: <44195AF5.7090609@gkec.informatik.tu-darmstadt.de> <441956F1.3070201@infrae.com> <44196E45.5020609@gkec.informatik.tu-darmstadt.de> Message-ID: <44198055.6010406@infrae.com> Stefan Behnel wrote: > Martijn Faassen wrote: > >>Stefan Behnel wrote: >> >>>since we often have requests regarding better error reporting in lxml, I >>>implemented a little error log. lxml now keeps a bounded list of output >>>messages from libxml2/xslt (that normally appear on stdout) and provides >>>access to the log entries through its exceptions. >>> >>>LxmlException objects now have an additional attribute "error_log" that >>>contain the last log entries (up to 20 by default). So, if an >>>exception is >>>raised, whoever catches it can print out the error messages from >>>libxml2/xslt >>>to see where things went wrong. >>> >>>This behaviour can be switched on or off at compile time and defaults >>>to off. If off, the attribute simply stores an empty tuple. >> >>What's the motivation for defaulting this to 'off'? > > I consider this more of a debug helper than something you'd use in production. > It doesn't really help you in /handling/ errors as all you get is non computer > readable strings with error messages. It /does/ help you when you read through > them to see where the bugs in your applications are. It does not help your > /program/ to handle errors. It depends on what your program does. If I for instance use lxml in a larger application where I want to allow developers to upload their XSLT stylesheets, I'd like my application to show the users what errors they had in their XSLT stylesheet if they upload the wrong one. 
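The use case above, combined with the bounded log and the "property that reads the current log" design from earlier in this thread, could look roughly like the following. This is a self-contained sketch and does not use lxml: BoundedErrorLog, SketchError, compile_stylesheet and report_upload are all hypothetical names, and the "compiler" is a toy stand-in for a real XSLT compile step.

```python
from collections import deque


class BoundedErrorLog:
    """Hypothetical stand-in for the bounded message log (not lxml's class)."""

    def __init__(self, max_entries=20):
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries = deque(maxlen=max_entries)

    def receive(self, message):
        self._entries.append(message)

    def snapshot(self):
        return tuple(self._entries)


GLOBAL_LOG = BoundedErrorLog()


class SketchError(Exception):
    """Hypothetical exception type exposing the log as a property."""

    @property
    def error_log(self):
        # Reads the *current* log; nothing is copied when the exception
        # is created, so raising stays cheap.
        return GLOBAL_LOG.snapshot()


def compile_stylesheet(text):
    """Toy 'compiler': rejects anything that does not mention xsl:stylesheet."""
    if "xsl:stylesheet" not in text:
        GLOBAL_LOG.receive("error: document is not a stylesheet")
        raise SketchError("stylesheet compilation failed")
    return text


def report_upload(text):
    """Return the lines an application might show to the uploading user."""
    try:
        compile_stylesheet(text)
    except SketchError as exc:
        return ["upload rejected: %s" % exc] + list(exc.error_log)
    return ["upload accepted"]
```

Because the property reads the shared log lazily, the messages shown are only the ones belonging to the failure if nothing else writes to the log between the raise and the report.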
> Since the log had to be copied at the moment of exception raising, I first > thought that exception creation becomes somewhat faster if it's switched off. > I changed the error_log attribute to a property now that accesses the > *current* log. This means that it does no longer guarantee to conserve the > exact state of the error log at exception creation time (which should not be a > problem if you do not make any libxml calls before catching it), but it also > means that there is no longer any performance penalty for creating exceptions. > So now, I would also prefer the default to be "on". Okay good. Another question: How does error logging work in combination with threads? I noticed that the code in lxml that turned off the talkativeness of libxml2 actually only worked for the main thread, and that new threads that use lxml do become talkative again. >>How precise is this information? Do we see the last 5 lines for an XML >>parsing error that occured earlier in a later XSLT reception? > > Well, you see the last 20 lines - if you explicitly print them. They are not > (currently) shown in the exception message or the traceback. The code cannot > really try to find the line that triggered the error, so it's really just 20 > lines of messages. > > You /could/ extend this to incorporate more of the information provided by the > libxml2/xslt error calls (like error types, etc.). On XML errors, for example, > libxml2 gives you this: > > int domain : What part of the library raised this er > int code : The error code, e.g. 
an xmlParserError > char * message : human-readable informative error messag > xmlErrorLevel level : how consequent is the error > char * file : the filename > int line : the line number if available > char * str1 : extra string information > char * str2 : extra string information > char * str3 : extra string information > int int1 : extra number information > int int2 : column number of the error or 0 if N/A > void * ctxt : the parser context if available > void * node : the node in the tree > > The problem is: the more information you put into the log, the slower the > application becomes. Providing the element that triggered the error, for > example, would rather be out of scope. Note that you have to convert this > information to Python representations in order to store it in the log. I'm not too concerned that slowing down exceptions somewhat is going to impact things that badly - these exceptions typically do not occur very often. Since it's lxml's mission to make libxml2 usable by mortal Python programmers with a nice API, I consider it part of our mission to make the error API as nice as possible too, providing as much information as we can, in an easy-to-understand way. That's all future music though. I think this is already a great step forward, I'm just pointing where I'd like to go. >>We also have the case for RelaxNG/Schema reporting where no exception is >>raised if the XML is not valid according to the schema. > > I added error_log properties to the RelaxNG and XMLSchema classes. That should > solve that problem. Another way that might be more consistent is to add new methods that either silently validate or, in case of validation errors, raise an exception.
Regards, Martijn From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 16 19:39:24 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Thu Mar 16 19:04:48 2006 Subject: [lxml-dev] Better error reporting In-Reply-To: <44198055.6010406@infrae.com> References: <44195AF5.7090609@gkec.informatik.tu-darmstadt.de> <441956F1.3070201@infrae.com> <44196E45.5020609@gkec.informatik.tu-darmstadt.de> <44198055.6010406@infrae.com> Message-ID: <4419B0DC.9080400@gkec.informatik.tu-darmstadt.de> Martijn Faassen wrote: > Another question: How does error logging work in combination with > threads? I noticed that the code in lxml that turned off the > talkativeness of libxml2 actually only worked for the main thread, and > that new threads that use lxml do become talkative again. According to the libxml2 docs, that's intentional. Each thread has to configure that for itself. Currently, there isn't that much in lxml anyway that takes care of threads. Everything that's module level will interfere. A way to get around this would be to set an error log in each sensible function. Hmm, I actually think that would be the right way. I'll code this up and see how it turns out. >> libxml2 gives you this: >> >> int domain : What part of the library raised this er >> int code : The error code, e.g. an xmlParserError >> char * message : human-readable informative error messag >> xmlErrorLevel level : how consequent is the error >> char * file : the filename >> int line : the line number if available >> char * str1 : extra string information >> char * str2 : extra string information >> char * str3 : extra string information >> int int1 : extra number information >> int int2 : column number of the error or 0 if N/A >> void * ctxt : the parser context if available >> void * node : the node in the tree >> >> The problem is: the more information you put into the log, the slower the >> application becomes. 
Providing the element that triggered the error, for >> example, would rather be out of scope. Note that you have to convert this >> information to Python representations in order to store it in the log. > > I'm not too concerned that slowing down exceptions somewhat is going to > impact things that badly - these exceptions are typically not occuring > very often. Since it's lxml's mission to make libxml2 usable by mortal > python programmers with a nice API, I consider it part of our mission to > make the error API as nice as possible too, providing as much > information as we can, in an easy to understand way. > > That's all future music though. I think this is already a great step > forward, I'm just pointing where I'd like to go. I also thought a bit more about this. It would be better to store more information and then allow filtering based on domain and error codes. RNG classes should only return RNG errors, for example (although earlier failures may have contributed to the current error...). Maybe use a dedicated log entry class rather than plain strings? >>> We also have the case for RelaxNG/Schema reporting where no exception is >>> raised if the XML is not valid according to the schema. >> >> I added error_log properties to the RelaxNG and XMLSchema classes. >> That should >> solve that problem. > > Another way that might be more consistent is to add new methods that > either silently validate or, in case of validation errors, raise an > exception. Hmmm, I don't know. If that's only for retrieving more precise error information... Maybe a method like "assert" could be meaningful here. 
Stefan From faassen at infrae.com Thu Mar 16 20:00:12 2006 From: faassen at infrae.com (Martijn Faassen) Date: Thu Mar 16 20:00:38 2006 Subject: [lxml-dev] Better error reporting In-Reply-To: <4419B0DC.9080400@gkec.informatik.tu-darmstadt.de> References: <44195AF5.7090609@gkec.informatik.tu-darmstadt.de> <441956F1.3070201@infrae.com> <44196E45.5020609@gkec.informatik.tu-darmstadt.de> <44198055.6010406@infrae.com> <4419B0DC.9080400@gkec.informatik.tu-darmstadt.de> Message-ID: <4419B5BC.9060906@infrae.com> Stefan Behnel wrote: > Martijn Faassen wrote: > >>Another question: How does error logging work in combination with >>threads? I noticed that the code in lxml that turned off the >>talkativeness of libxml2 actually only worked for the main thread, and >>that new threads that use lxml do become talkative again. > > According to the libxml2 docs, that's intentional. Each thread has to > configure that for itself. Currently, there isn't that much in lxml anyway > that takes care of threads. Everything that's module level will interfere. > A way to get around this would be to set an error log in each sensible > function. Hmm, I actually think that would be the right way. I'll code this up > and see how it turns out. Great! It'd be nice if threads worked with lxml, of course. One would like to have it work in a web server... [snip] >>That's all future music though. I think this is already a great step >>forward, I'm just pointing where I'd like to go. > > I also thought a bit more about this. It would be better to store more > information and then allow filtering based on domain and error codes. RNG > classes should only return RNG errors, for example (although earlier failures > may have contributed to the current error...). > > Maybe use a dedicated log entry class rather than plain strings? 
I haven't studied your new code, and the libxml2 error handling code was a maze of twisty passages I last looked at a long time ago, so I'm not sure I can say much that's sensible. :) If I understand you right, that sounds like the right direction though. Store more information and then in the particular exception filter out only the relevant information. You're right in that the earlier failures may have contributed to the current error, though one would expect another exception to be raised first anyway in that case, right? >>>>We also have the case for RelaxNG/Schema reporting where no exception is >>>>raised if the XML is not valid according to the schema. >>> >>>I added error_log properties to the RelaxNG and XMLSchema classes. >>>That should >>>solve that problem. >> >>Another way that might be more consistent is to add new methods that >>either silently validate or, in case of validation errors, raise an >>exception. > > Hmmm, I don't know. If that's only for retrieving more precise error > information... Maybe a method like "assert" could be meaningful here. Yeah, calling it something like 'assert' would make sense (of course that's a reserved word by itself; perhaps 'ensure' would be better as that avoids confusion with the built-in assert and AssertionError). I think it makes a level of sense to write code like: relaxng.ensureValid(doc) ... do stuff with doc ... and then if it turns out your doc wasn't valid, you get an exception. It allows for writing quick and dirty code that ensures a document complies with the schema. Once you're ready for error handling, you add a try and except around it and handle the error.
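The "ensure" idea above can be sketched without lxml. Everything in this example is hypothetical: RequiredTagsSchema is a toy validator (a "document" is just a list of tag names), and ensure_valid and ValidationError only illustrate the proposed pattern of a boolean validate() next to a raising variant; none of this is claimed to be the lxml API.

```python
class ValidationError(Exception):
    """Hypothetical exception carrying the validator's messages."""

    def __init__(self, messages):
        super().__init__("; ".join(messages))
        self.messages = tuple(messages)


class RequiredTagsSchema:
    """Toy 'schema': a document (list of tag names) is valid if it
    contains every required tag."""

    def __init__(self, required):
        self.required = tuple(required)
        self.error_log = ()

    def validate(self, doc):
        """Silent variant: return True or False and fill error_log."""
        missing = [tag for tag in self.required if tag not in doc]
        self.error_log = tuple("missing element: %s" % tag for tag in missing)
        return not missing

    def ensure_valid(self, doc):
        """Raising variant: do nothing if valid, raise otherwise."""
        if not self.validate(doc):
            raise ValidationError(self.error_log)


schema = RequiredTagsSchema(["title", "body"])

# Quick and dirty use: just call it and carry on if nothing is raised.
schema.ensure_valid(["title", "body", "footer"])

# Once error handling is wanted, wrap the same call in try/except.
try:
    schema.ensure_valid(["title"])
except ValidationError as exc:
    problems = exc.messages
```

Keeping validate() as the silent form means existing callers that only want a yes/no answer are unaffected by the raising method.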
Regards, Martijn From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 16 22:26:02 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Thu Mar 16 21:51:23 2006 Subject: [lxml-dev] Better error reporting In-Reply-To: <4419B5BC.9060906@infrae.com> References: <44195AF5.7090609@gkec.informatik.tu-darmstadt.de> <441956F1.3070201@infrae.com> <44196E45.5020609@gkec.informatik.tu-darmstadt.de> <44198055.6010406@infrae.com> <4419B0DC.9080400@gkec.informatik.tu-darmstadt.de> <4419B5BC.9060906@infrae.com> Message-ID: <4419D7EA.7030403@gkec.informatik.tu-darmstadt.de> Martijn Faassen wrote: > Stefan Behnel wrote: >> A way to get around this would be to set an error log in each sensible >> function. Hmm, I actually think that would be the right way. I'll code >> this up >> and see how it turns out. > > Great! It'd be nice if threads worked with lxml, of course. One would > like to have it work in a web server... There is now a branch called error-reporting that implements this. It is partially untested, but at least the global logging and exception handling seems to work. It should potentially also work at a thread level, although the global logging may be broken there. My SVN log entry: large rewrite of the error handling API - use named class attributes for error domain, type and level - _LogEntry class represents xmlError structure - _ErrorLog collects error entries - global log collects all errors, rotates at 100 entries - API functions can be wrapped in log.connect() and log.disconnect() to provide a local error log (exemplified in XMLSchema and RelaxNG - untested) The last line refers to a local log that can be created at object instantiation time (i.e. at the end of RelaxNG.__init__() etc.). Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Mar 20 20:34:32 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Mon Mar 20 20:35:30 2006 Subject: [lxml-dev] Lxml 0.9 is out! 
Message-ID: <441F03C8.20000@gkec.informatik.tu-darmstadt.de> Hello everyone, after almost five months of hacking, lxml 0.9 has finally seen the light of night. :) It has tons of new fancy features and several serious bug fixes. See the ChangeLog for details: http://codespeak.net/lxml/changes-0.9.html and the documentation for examples of the new features: http://codespeak.net/lxml/api.html http://codespeak.net/lxml/sax.html http://codespeak.net/lxml/namespace_extensions.html http://codespeak.net/lxml/extensions.html Installation has just become easier (at least on Linux): http://codespeak.net/lxml/installation.html Give it a try and keep spreading the word! Stefan From ogrisel at nuxeo.com Mon Mar 20 21:12:00 2006 From: ogrisel at nuxeo.com (Olivier Grisel) Date: Mon Mar 20 21:13:10 2006 Subject: [lxml-dev] Re: Lxml 0.9 is out! In-Reply-To: <441F03C8.20000@gkec.informatik.tu-darmstadt.de> References: <441F03C8.20000@gkec.informatik.tu-darmstadt.de> Message-ID: Stefan Behnel wrote: > Hello everyone, > > after almost five months of hacking, lxml 0.9 has finally seen the light of > night. :) Congrats and thank you very much for your and Martijn's hard work on lxml. Your benchmarking results are highly appreciated as well :) I just wanted to report that lxml is now directly easy_installable on my box (thanks to the cheeseshop registry): $ sudo easy_install lxml (you should add a -D option if you want that egg to replace any existing installation of lxml). I have just updated the INSTALL.txt file to make that more obvious. As a consequence, lxml can now get packaged as an egg:: $ python setup.py bdist_egg This can get uploaded to the cheeseshop (provided you are 'faasen') with:: $ python setup.py bdist_egg upload So, what do you think, should lxml be binary-packaged as an egg? For which platform?
-- Olivier From behnel_ml at gkec.informatik.tu-darmstadt.de Mon Mar 20 21:28:43 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Mon Mar 20 21:29:15 2006 Subject: [lxml-dev] Re: Lxml 0.9 is out! In-Reply-To: References: <441F03C8.20000@gkec.informatik.tu-darmstadt.de> Message-ID: <441F107B.1020809@gkec.informatik.tu-darmstadt.de> Olivier Grisel wrote: > I just wanted to report that lxml is now directly easy_installable on > my box (thanks to the cheeseshop registry): > > $ sudo easy_install lxml It's great to hear that, thanks. > I have just updated the INSTALL.txt file to make that more obvious. Good idea, we should update the web page (again). Martijn? > As a consequence, lxml can now get packaged as an egg:: > > $ python setup.py bdist_egg > > This can get uploaded to the cheeseshop (provided you are 'faasen') with:: > > $ python setup.py bdist_egg upload > > So, what do you think, should lxml be binary-packaged as an egg? For > which platform? That's one of the problems we had: which architecture. I can't currently provide x86 binaries, but it would be good to have them for Py2.3 and Py2.4. Eggs would be great for that. Debian is also a good place to put 0.9! :) Mac-OS binaries would be great, too. We had some discussion about Windows binaries, so if anyone could package up libxml2/libxslt and lxml in an easily installable binary package, that would be *very* much appreciated. Thanks for any contributions, Stefan From faassen at infrae.com Tue Mar 21 11:22:27 2006 From: faassen at infrae.com (Martijn Faassen) Date: Tue Mar 21 11:23:19 2006 Subject: [lxml-dev] Re: Lxml 0.9 is out!
In-Reply-To: References: <441F03C8.20000@gkec.informatik.tu-darmstadt.de> Message-ID: <441FD3E3.3050008@infrae.com> Hi there, Olivier Grisel wrote: > As a consequence, lxml can now get packaged as an egg:: > > $ python setup.py bdist_egg > > This can get uploaded to the cheeseshop (provided you are 'faasen') with:: > > $ python setup.py bdist_egg upload > So, what do you think, should lxml be binary-packaged as an egg? For > which platform? Sure! I've just tried what you typed and uploaded an egg. Do you know how to upload eggs contributed by others? For instance, what if a Windows user made an egg and I wanted to upload it? Regards, Martijn From ogrisel at nuxeo.com Tue Mar 21 11:37:20 2006 From: ogrisel at nuxeo.com (Olivier Grisel) Date: Tue Mar 21 11:38:22 2006 Subject: [lxml-dev] Re: Lxml 0.9 is out! In-Reply-To: <441FD3E3.3050008@infrae.com> References: <441F03C8.20000@gkec.informatik.tu-darmstadt.de> <441FD3E3.3050008@infrae.com> Message-ID: <441FD760.7000509@nuxeo.com> Martijn Faassen wrote: > Hi there, > > Olivier Grisel wrote: > >> As a consequence, lxml can now get packaged as an egg:: >> >> $ python setup.py bdist_egg >> >> This can get uploaded to the cheeseshop (provided you are 'faasen') >> with:: >> >> $ python setup.py bdist_egg upload Great, I have just easy_installed your egg on my box and it works: no build step required :) >> So, what do you think, should lxml be binary-packaged as an egg? For >> which platform? > > Sure! I've just tried what you typed and uploaded an egg. > > Do you know how to upload eggs contributed by others? For instance, what > if a Windows user made an egg and I wanted to upload it? I guess this should work:: http://cheeseshop.python.org/pypi?name=lxml&version=0.9&:action=files The weird thing is that apparently, I can upload packages on lxml with my account even if I'm not registered as the maintainer of lxml. I think you should gpg-sign the eggs you upload to be able to better trust binary packages.
-- Olivier From ogrisel at nuxeo.com Tue Mar 21 11:46:36 2006 From: ogrisel at nuxeo.com (Olivier Grisel) Date: Tue Mar 21 11:47:45 2006 Subject: [lxml-dev] Re: Lxml 0.9 is out! In-Reply-To: <441FD3E3.3050008@infrae.com> References: <441F03C8.20000@gkec.informatik.tu-darmstadt.de> <441FD3E3.3050008@infrae.com> Message-ID: Martijn Faassen wrote: > Hi there, > > Olivier Grisel wrote: > >> As a consequence, lxml can now get packaged as an egg:: >> >> $ python setup.py bdist_egg >> >> This can get uploaded to the cheeseshop (provided you are 'faasen') >> with:: >> >> $ python setup.py bdist_egg upload BTW, you can also upload the tar.gz on pypi if you want with:: $ python setup.py sdist upload -- Olivier From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Mar 21 11:55:17 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Tue Mar 21 11:55:52 2006 Subject: [lxml-dev] Re: Lxml 0.9 is out! In-Reply-To: <441FD760.7000509@nuxeo.com> References: <441F03C8.20000@gkec.informatik.tu-darmstadt.de> <441FD3E3.3050008@infrae.com> <441FD760.7000509@nuxeo.com> Message-ID: <441FDB95.8030103@gkec.informatik.tu-darmstadt.de> Olivier Grisel wrote: > I guess this should work:: > > http://cheeseshop.python.org/pypi?name=lxml&version=0.9&:action=files > > The weird thing is that apparently, I can upload packages on lxml with > my account even if I'm not registered as the maintainer of lxml. I think > you should gpg-sign the eggs you upload to be able to better trust > binary packages. I just tried uploading the source tar, but it doesn't appear on the page. So, while anyone seems to be allowed to upload files, only the files sent by the maintainer seem to appear on the page. Signing is a good idea anyway.
Stefan From gracinet at nuxeo.com Tue Mar 21 13:21:45 2006 From: gracinet at nuxeo.com (Georges Racinet) Date: Tue Mar 21 13:22:20 2006 Subject: [lxml-dev] lxml 0.9 on MacOSX Message-ID: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> Hi, I just built lxml 0.9 on my OS X machine and made an egg: total 712 -rw-r--r-- 1 gracinet wheel 363037 Mar 21 12:48 lxml-0.9-py2.4-macosx-10.4-ppc.egg How do you want me to send it ? I didn't really try the package yet, but I ran the tests: $ make test python242 setup.py build_ext -i running build_ext python242 test.py -p -v 230/302 ( 76.2%): Doctest: extensions.txt ---------------------------------------------------------------------- Ran 230 tests in 1.594s OK Is it really normal to run only 230 of them ? Additional info: this is a G5 machine running OS X.4, with a fink (http://fink.sourceforge.net) install on top of the base system $ xslt-config --version 1.1.14 I wonder if the egg could be used on a vanilla OSX machine (dynamic libs?). Here's what I've got from fink: $ ls /sw/lib/libxml2.* /sw/lib/libxml2.2.6.20.dylib /sw/lib/libxml2.dylib /sw/lib/libxml2.2.dylib /sw/lib/libxml2.la /sw/lib/libxml2.a --------- Georges Racinet Nuxeo SAS gracinet@nuxeo.com http://nuxeo.com Tel: +33 (0) 1 40 33 71 73 From faassen at infrae.com Tue Mar 21 13:28:20 2006 From: faassen at infrae.com (Martijn Faassen) Date: Tue Mar 21 13:28:50 2006 Subject: [lxml-dev] lxml 0.9 on MacOSX In-Reply-To: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> Message-ID: <441FF164.5060802@infrae.com> Georges Racinet wrote: > Hi, I just built lxml 0.9 on my OS X machine and made an egg: > total 712 > -rw-r--r-- 1 gracinet wheel 363037 Mar 21 12:48 lxml-0.9-py2.4-macosx-10.4-ppc.egg > > How do you want me to send it ?
I didn't really try the package yet, > but I ran the tests: > > $ make test > python242 setup.py build_ext -i > running build_ext > python242 test.py -p -v > 230/302 ( 76.2%): Doctest: extensions.txt > ---------------------------------------------------------------------- > Ran 230 tests in 1.594s > > OK > > Is it really normal to run only 230 of them ? That's odd, I get 362 out of 434 on my machine. python test.py -p -v 362/434 ( 83.4%): Doctest: extensions.txt ---------------------------------------------------------------------- Ran 362 tests in 0.773s Possibly your test runner is picking up an older lxml installed into your site-packages that's picked up before the one src, and running those tests? Regards, Martijn From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Mar 21 13:33:19 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Tue Mar 21 13:34:00 2006 Subject: [lxml-dev] lxml 0.9 on MacOSX In-Reply-To: <441FF164.5060802@infrae.com> References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> <441FF164.5060802@infrae.com> Message-ID: <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> Martijn Faassen wrote: > Georges Racinet wrote: >> Hi, I just built lxml 0.9 on my OS X machine and made an egg: >> total 712 >> -rw-r--r-- 1 gracinet wheel 363037 Mar 21 12:48 lxml-0.9-py2.4- >> macosx-10.4-ppc.egg >> >> How do you want me to send it ? I didn't really try the package yet, >> but I ran the tests: >> >> $ make test >> python242 setup.py build_ext -i >> running build_ext >> python242 test.py -p -v >> 230/302 ( 76.2%): Doctest: extensions.txt >> >> ---------------------------------------------------------------------- >> Ran 230 tests in 1.594s >> >> OK >> >> Is it really normal to run only 230 of them ? > > That's odd, I get 362 out of 434 on my machine. He doesn't have ElementTree installed. 
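Both explanations offered in this exchange (an older lxml in site-packages shadowing the fresh build, or ElementTree not being installed) can be checked from the interpreter. A minimal sketch, assuming a modern Python rather than the 2.4 used in this thread; `module_origin` is a hypothetical helper name, not part of lxml or its test runner:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None.

    None means the module cannot be found on the current sys.path.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec can raise for broken or half-initialised modules.
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # "elementtree" is the package name of the 2006-era standalone release.
    for name in ("lxml", "elementtree"):
        print(name, "->", module_origin(name) or "not installed")
```

If the path printed for lxml is not the source tree that was just built, the tests may be running against a different copy than intended.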
Stefan From faassen at infrae.com Tue Mar 21 14:17:54 2006 From: faassen at infrae.com (Martijn Faassen) Date: Tue Mar 21 14:18:33 2006 Subject: [lxml-dev] lxml 0.9 on MacOSX In-Reply-To: <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> <441FF164.5060802@infrae.com> <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> Message-ID: <441FFD02.5010800@infrae.com> Stefan Behnel wrote: > Martijn Faassen wrote: [snip] >>> >>>Is it really normal to run only 230 of them ? >> >>That's odd, I get 362 out of 434 on my machine. > > He doesn't have ElementTree installed. *slaps forehead* Heh, forgot all about that! Okay, next question about this egg: I expect it will only work on Mac OS X with fink, not vanilla Mac OS X as these have libxml2 libraries that are too old, I think. Can we do something about that? I guess it needs someone with a lot of insights in eggs and the like, but it doesn't sound hopeful. Regards, Martijn From ogrisel at nuxeo.com Tue Mar 21 14:45:38 2006 From: ogrisel at nuxeo.com (Olivier Grisel) Date: Tue Mar 21 14:46:51 2006 Subject: [lxml-dev] Re: lxml 0.9 on MacOSX In-Reply-To: <441FFD02.5010800@infrae.com> References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> <441FF164.5060802@infrae.com> <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> <441FFD02.5010800@infrae.com> Message-ID: <44200382.80602@nuxeo.com> Martijn Faassen wrote: > Stefan Behnel wrote: >> He doesn't have ElementTree installed. > > *slaps forehead* > > Heh, forgot all about that! > > Okay, next question about this egg: I expect it will only work on Mac OS > X with fink, not vanilla Mac OS X as these have libxml2 libraries that > are too old, I think. Can we do something about that? I guess it needs > someone with a lot of insights in eggs and the like, but it doesn't > sound hopeful. You mean embedding or statically linking the required version of libxml2 and libxslt inside the egg?
That might be a good idea for both MacOSX and Windows. How can we do that with distutils/setuptools? -- Olivier From gracinet at nuxeo.com Tue Mar 21 15:05:01 2006 From: gracinet at nuxeo.com (Georges Racinet) Date: Tue Mar 21 15:05:33 2006 Subject: [lxml-dev] lxml 0.9 on MacOSX In-Reply-To: <441FFD02.5010800@infrae.com> References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> <441FF164.5060802@infrae.com> <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> <441FFD02.5010800@infrae.com> Message-ID: On Mar 21, 2006, at 2:17 PM, Martijn Faassen wrote: > Stefan Behnel wrote: >> Martijn Faassen wrote: > [snip] >>>> >>>> Is it really normal to run only 230 of them ? >>> >>> That's odd, I get 362 out of 434 on my machine. >> He doesn't have ElementTree installed. Thanks, I installed elementtree-1.2.6_20050316-py2.4.egg and the 362 tests ran fine. > Okay, next question about this egg: I expect it will only work on > Mac OS X with fink, not vanilla Mac OS X as these have libxml2 > libraries that are too old, I think. Can we do something about > that? I guess it needs someone with a lot of insights in eggs and > the like, but it doesn't sound hopeful. Well actually the install contains the full path to used libs, so my egg is really for fink users. I got this in tests after moving fink libs away: ImportError: Failure linking new module: /Users/Shared/PythonLibs/lxml-0.9/src/lxml/etree.so: Library not loaded: /sw/lib/libxml2.2.dylib Referenced from: /Users/Shared/PythonLibs/lxml-0.9/src/lxml/etree.so Reason: image not found from http://www.explain.com.au/oss/libxml2xslt.html: > Using Tiger? xmllint and libxml2 version 2.6.16 are built-in! > xsltproc and libxslt version 1.1.11 are also included. > > Using Panther? xmllint and libxml2 version 2.5.4 are built-in! Even > better, 10.3.9 has libxml2-2.6.16 and libxslt-1.1.9 built-in! Is that enough ? There's also this (loc. cit.)
> Binary distributions for Mac OS X of the latest Gnome XML > processing libraries: libxml2 version 2.6.22 and libxslt version > 1.1.15 are now available. --------- Georges Racinet Nuxeo SAS gracinet@nuxeo.com http://nuxeo.com Tel: +33 (0) 1 40 33 71 73 From faassen at infrae.com Tue Mar 21 16:06:05 2006 From: faassen at infrae.com (Martijn Faassen) Date: Tue Mar 21 16:06:32 2006 Subject: [lxml-dev] Re: lxml 0.9 on MacOSX In-Reply-To: <44200382.80602@nuxeo.com> References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> <441FF164.5060802@infrae.com> <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> <441FFD02.5010800@infrae.com> <44200382.80602@nuxeo.com> Message-ID: <4420165D.8010900@infrae.com> Olivier Grisel wrote: > Martijn Faassen wrote: > >> Stefan Behnel wrote: >> >>> He doesn't have ElementTree installed. >> >> >> *slaps forehead* >> >> Heh, forgot all about that! >> >> Okay, next question about this egg: I expect it will only work on Mac >> OS X with fink, not vanilla Mac OS X as these have libxml2 libraries >> that are too old, I think. Can we do something about that? I guess it >> needs someone with a lot of insights in eggs and the like, but it >> doesn't sound hopeful. > > > You mean embedding or statically linking the required version of libxml2 > and libxslt inside the egg? That might be a good idea for both MacOSX > and Windows. How can we do that with distutils/setuptools? That'd be very nice of course! Unfortunately I do not know whether this can be done with setuptools - someone would need to research it.
Regards, Martijn From faassen at infrae.com Tue Mar 21 16:07:50 2006 From: faassen at infrae.com (Martijn Faassen) Date: Tue Mar 21 16:08:14 2006 Subject: [lxml-dev] lxml 0.9 on MacOSX In-Reply-To: References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> <441FF164.5060802@infrae.com> <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> <441FFD02.5010800@infrae.com> Message-ID: <442016C6.7070102@infrae.com> Georges Racinet wrote: >> Using Tiger? xmllint and libxml2 version 2.6.16 are built-in! >> xsltproc and libxslt version 1.1.11 are also included. >> >> Using Panther? xmllint and libxml2 version 2.5.4 are built-in! Even >> better, 10.3.9 has libxml2-2.6.16 and libxslt-1.1.9 built-in! > > Is that enough ? Officially we require libxml2 2.6.16 and libxslt 1.1.12, so only the libxslt might be a problem for Tiger. With Panther, it's way too old. Regards, Martijn From gracinet at nuxeo.com Tue Mar 21 16:33:14 2006 From: gracinet at nuxeo.com (Georges Racinet) Date: Tue Mar 21 16:34:13 2006 Subject: [lxml-dev] lxml 0.9 on MacOSX In-Reply-To: <442016C6.7070102@infrae.com> References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> <441FF164.5060802@infrae.com> <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> <441FFD02.5010800@infrae.com> <442016C6.7070102@infrae.com> Message-ID: On Mar 21, 2006, at 4:07 PM, Martijn Faassen wrote: > Georges Racinet wrote: > >>> Using Tiger? xmllint and libxml2 version 2.6.16 are built-in! >>> xsltproc and libxslt version 1.1.11 are also included. >>> >>> Using Panther? xmllint and libxml2 version 2.5.4 are built-in! >>> Even better, 10.3.9 has libxml2-2.6.16 and libxslt-1.1.9 built-in! >> Is that enough ? > > Officially we require libxml2 2.6.16 and libxslt 1.1.12, so only > the libxslt might be a problem for Tiger. With Panther, it's way > too old. 
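The comparison being made here (system library versions against the stated minimums of libxml2 2.6.16 and libxslt 1.1.12) can be written down as a small check. A minimal sketch, assuming plain dotted version strings; the helper names are hypothetical and the version numbers are the ones quoted in this thread:

```python
# Minimum versions as stated in the thread.
MINIMUM = {"libxml2": "2.6.16", "libxslt": "1.1.12"}


def parse_version(text):
    """Turn a dotted version string such as '2.6.16' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(library, found):
    """True if the found version is at least the required minimum."""
    return parse_version(found) >= parse_version(MINIMUM[library])


if __name__ == "__main__":
    # Versions reported for the stock systems on the page quoted above.
    systems = {
        "Tiger": {"libxml2": "2.6.16", "libxslt": "1.1.11"},
        "Panther": {"libxml2": "2.5.4"},
        "Panther 10.3.9": {"libxml2": "2.6.16", "libxslt": "1.1.9"},
    }
    for system, libs in systems.items():
        for library, version in libs.items():
            verdict = "ok" if meets_minimum(library, version) else "too old"
            print(system, library, version, verdict)
```

Comparing tuples of integers avoids the string-comparison trap where "1.1.9" would sort after "1.1.12".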
I tried with the system's libxslt, and unfortunately got 2 failures in tests: > ====================================================================== > FAIL: Doctest: api.txt > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/local/python242/lib/python2.4/unittest.py", line 260, > in run > testMethod() > File "/Users/Shared/PythonLibs/lxml-0.9/src/doctest.py", line > 2187, in runTest > raise self.failureException(self.format_failure(new.getvalue())) > AssertionError: Failed doctest test for api.txt > File "/Users/Shared/PythonLibs/lxml-0.9/src/lxml/tests/../../../ > doc/api.txt", line 0 > > ---------------------------------------------------------------------- > File "/Users/Shared/PythonLibs/lxml-0.9/src/lxml/tests/../../../doc/ > api.txt", line 299, in api.txt > Failed example: > print log.filter_from_errors() > Expected: > :1:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: Element 'c': > This element is not expected. Expected is ( b ). > Got: > :1:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: Element > 'a' [CT 'AType']: The element content is not valid. > > > ---------------------------------------------------------------------- > Ran 362 tests in 2.393s --------- Georges Racinet Nuxeo SAS gracinet@nuxeo.com http://nuxeo.com Tel: +33 (0) 1 40 33 71 73 From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Mar 21 16:57:06 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Tue Mar 21 16:58:09 2006 Subject: [lxml-dev] lxml 0.9 on MacOSX In-Reply-To: References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> <441FF164.5060802@infrae.com> <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> <441FFD02.5010800@infrae.com> <442016C6.7070102@infrae.com> Message-ID: <44202252.3050007@gkec.informatik.tu-darmstadt.de> Georges Racinet wrote: > On Mar 21, 2006, at 4:07 PM, Martijn Faassen wrote: > >> Georges Racinet wrote: >> >>>> Using Tiger? xmllint and libxml2 version 2.6.16 are built-in! 
>>>> xsltproc and libxslt version 1.1.11 are also included. >>>> >>>> Using Panther? xmllint and libxml2 version 2.5.4 are built-in! Even >>>> better, 10.3.9 has libxml2-2.6.16 and libxslt-1.1.9 built-in! >>> Is that enough ? >> >> Officially we require libxml2 2.6.16 and libxslt 1.1.12, so only the >> libxslt might be a problem for Tiger. With Panther, it's way too old. > > I tried with the system's libxslt, You're on Tiger, I assume? > and unfortunately got 2 failures in tests: > >> ====================================================================== >> FAIL: Doctest: api.txt >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "/usr/local/python242/lib/python2.4/unittest.py", line 260, in run >> testMethod() >> File "/Users/Shared/PythonLibs/lxml-0.9/src/doctest.py", line 2187, >> in runTest >> raise self.failureException(self.format_failure(new.getvalue())) >> AssertionError: Failed doctest test for api.txt >> File >> "/Users/Shared/PythonLibs/lxml-0.9/src/lxml/tests/../../../doc/api.txt", >> line 0 >> >> ---------------------------------------------------------------------- >> File >> "/Users/Shared/PythonLibs/lxml-0.9/src/lxml/tests/../../../doc/api.txt", >> line 299, in api.txt >> Failed example: >> print log.filter_from_errors() >> Expected: >> :1:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: Element 'c': >> This element is not expected. Expected is ( b ). >> Got: >> :1:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: Element 'a' [CT >> 'AType']: The element content is not valid. >> >> >> ---------------------------------------------------------------------- >> Ran 362 tests in 2.393s Hmm, it's unfortunate I used the literal error message in the doctest... That's just one error you got there, but it may mean two things: 1) it works, but the error message is different between different versions. 2) it doesn't work. 
I know, this is not very clean, but I changed the doctest in api.txt (SVN trunk) to not check the error message string returned by libxml2. The resulting error type is actually correct (the validation failed because of an element), but I guess the XML Schema validation code has changed a bit since that version. I attached the patch for the doctest. This solution means that we accept an older version of libxml2 than we normally should, but if it's only the XML Schema code that's outdated, then that may be acceptable. At least, we get a Mac-OS version of lxml that works with the default system libraries! Could you apply the patch and try again? (Note: if you run into problems with Pyrex or get "swig_sources" errors, please read INSTALL.txt) Thank you for the patience, Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: api-doctest.patch Type: text/x-patch Size: 871 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060321/fdeadd23/api-doctest.bin From gracinet at nuxeo.com Tue Mar 21 17:35:12 2006 From: gracinet at nuxeo.com (Georges Racinet) Date: Tue Mar 21 17:36:00 2006 Subject: [lxml-dev] lxml 0.9 on MacOSX In-Reply-To: <44202252.3050007@gkec.informatik.tu-darmstadt.de> References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> <441FF164.5060802@infrae.com> <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> <441FFD02.5010800@infrae.com> <442016C6.7070102@infrae.com> <44202252.3050007@gkec.informatik.tu-darmstadt.de> Message-ID: <966EEEAD-0930-43A2-A45E-52AA9E5579EC@nuxeo.com> On Mar 21, 2006, at 4:57 PM, Stefan Behnel wrote: > > > Georges Racinet wrote: >> On Mar 21, 2006, at 4:07 PM, Martijn Faassen wrote: >> >>> Georges Racinet wrote: >>> >>>>> Using Tiger? xmllint and libxml2 version 2.6.16 are built-in! >>>>> xsltproc and libxslt version 1.1.11 are also included. >>>>> >>>>> Using Panther? xmllint and libxml2 version 2.5.4 are built-in! 
>>>>> Even >>>>> better, 10.3.9 has libxml2-2.6.16 and libxslt-1.1.9 built-in! >>>> Is that enough ? >>> >>> Officially we require libxml2 2.6.16 and libxslt 1.1.12, so only the >>> libxslt might be a problem for Tiger. With Panther, it's way too >>> old. >> >> I tried with the system's libxslt, > > > You're on Tiger, I assume? I am. > Hmm, it's unfortunate I used the literal error message in the > doctest... > > That's just one error you got there, but it may mean two things: > 1) it works, but the error message is different between different > versions. > 2) it doesn't work. > > I know, this is not very clean, but I changed the doctest in > api.txt (SVN > trunk) to not check the error message string returned by libxml2. The > resulting error type is actually correct (the validation failed > because of an > element), but I guess the XML Schema validation code has changed a > bit since > that version. > > I attached the patch for the doctest. This solution means that we > accept an > older version of libxml2 than we normally should, but if it's only > the XML > Schema code that's outdated, then that may be acceptable. At least, > we get a > Mac-OS version of lxml that works with the default system libraries! > > Could you apply the patch and try again? I did, and the 362 of them now pass. Basic manual tests showed no problem either. I laid a new egg, where to put it ? 
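The idea of not checking the literal libxml2 message in a doctest can be illustrated with the standard doctest ELLIPSIS option. The actual patch was scrubbed from the archive, so this is only a sketch of one possible approach and not Stefan's change; `error_line` is a hypothetical stand-in for the error log output, and the code uses modern Python syntax rather than the 2.4 of this thread:

```python
import doctest

PREFIX = ":1:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: "


def error_line(detail):
    """Stand-in for a log line whose detail text varies by libxml2 version."""
    return PREFIX + detail


# Both message variants from the thread match the same expected output,
# because "..." stands for any text once ELLIPSIS is enabled.
EXAMPLES = """
>>> print(error_line("Element 'c': This element is not expected. Expected is ( b )."))
:1:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: ...
>>> print(error_line("Element 'a' [CT 'AType']: The element content is not valid."))
:1:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: ...
"""


def run_examples():
    """Run the examples with ELLIPSIS enabled; returns (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        EXAMPLES, {"error_line": error_line}, "error_message_examples", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False, optionflags=doctest.ELLIPSIS)
    return runner.run(test)


if __name__ == "__main__":
    print(run_examples())
```

The test still pins the error type (SCHEMAV_ELEMENT_CONTENT) while tolerating wording differences between library versions.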
--------- Georges Racinet Nuxeo SAS gracinet@nuxeo.com http://nuxeo.com Tel: +33 (0) 1 40 33 71 73 From robert.kern at gmail.com Tue Mar 21 17:35:59 2006 From: robert.kern at gmail.com (Robert Kern) Date: Tue Mar 21 17:37:30 2006 Subject: [lxml-dev] Re: lxml 0.9 on MacOSX In-Reply-To: <4420165D.8010900@infrae.com> References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> <441FF164.5060802@infrae.com> <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> <441FFD02.5010800@infrae.com> <44200382.80602@nuxeo.com> <4420165D.8010900@infrae.com> Message-ID: Martijn Faassen wrote: > Olivier Grisel wrote: > >> Martijn Faassen wrote: >> >>> Okay, next question about this egg: I expect it will only work on Mac >>> OS X with fink, not vanilla Mac OS X as these have libxml2 libraries >>> that are too old, I think. Can we do something about that? I guess it >>> needs someone with a lot of insights in eggs and the like, but it >>> doesn't sound hopeful. >> >> You mean embedding or statically linking the required version of >> libxml2 and libxslt inside the egg? That might be a good idea for both >> MacOSX and Windows. How can we do that with distutils/setuptools? > > That'd be very nice of course! Unfortunately I do not know whether this > can be done with setuptools - someone would need to research it. The easiest way to do this, by far, is to compile static libraries and make sure you link against those by fiddling with library_dirs. I'm pretty sure that Phillip recently added some capabilities to setuptools to store dynamic libraries in eggs as well, but I don't recall the details. -- Robert Kern robert.kern@gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco From howe at carcass.dhs.org Tue Mar 21 17:39:56 2006 From: howe at carcass.dhs.org (Steve Howe) Date: Tue Mar 21 17:40:44 2006 Subject: [lxml-dev] lxml 0.9 Win32 build Message-ID: <444577111.20060321133956@carcass.dhs.org> Hello all, Here is a contribution of a Win32 lxml 0.9 binary build for Python 2.4: http://carcass.dhs.org/lxml-0.9.win32-py2.4.exe There are *no* libxml/libxslt dlls on purpose. Those who need these libraries, please refer to: http://www.zlatkovic.com/libxml.en.html Thanks Martijn, Stefan and all involved in the development of lxml. -- Best regards, Steve mailto:howe@carcass.dhs.org From faassen at infrae.com Tue Mar 21 18:01:51 2006 From: faassen at infrae.com (Martijn Faassen) Date: Tue Mar 21 18:02:31 2006 Subject: [lxml-dev] lxml 0.9 on MacOSX In-Reply-To: <966EEEAD-0930-43A2-A45E-52AA9E5579EC@nuxeo.com> References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> <441FF164.5060802@infrae.com> <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> <441FFD02.5010800@infrae.com> <442016C6.7070102@infrae.com> <44202252.3050007@gkec.informatik.tu-darmstadt.de> <966EEEAD-0930-43A2-A45E-52AA9E5579EC@nuxeo.com> Message-ID: <4420317F.5080502@infrae.com> Georges Racinet wrote: [snip] > I did, and the 362 of them now pass. Basic manual tests showed no > problem either. > I laid a new egg, where to put it ? As Stefan said in another mail, send it along to me. I'll try to figure out where to put it. 
:) Regards, Martijn From faassen at infrae.com Tue Mar 21 18:06:16 2006 From: faassen at infrae.com (Martijn Faassen) Date: Tue Mar 21 18:06:54 2006 Subject: [lxml-dev] Re: lxml 0.9 on MacOSX In-Reply-To: References: <469CFC84-68CB-490B-9333-276198496891@nuxeo.com> <441FF164.5060802@infrae.com> <441FF28F.4090702@gkec.informatik.tu-darmstadt.de> <441FFD02.5010800@infrae.com> <44200382.80602@nuxeo.com> <4420165D.8010900@infrae.com> Message-ID: <44203288.5040305@infrae.com> Robert Kern wrote: > Martijn Faassen wrote: > >>Olivier Grisel wrote: >> >> >>>Martijn Faassen wrote: >>> >>> >>>>Okay, next question about this egg: I expect it will only work on Mac >>>>OS X with fink, not vanilla Mac OS X as these have libxml2 libraries >>>>that are too old, I think. Can we do something about that? I guess it >>>>needs someone with a lot of insights in eggs and the like, but it >>>>doesn't sound hopeful. >>> >>>You mean embedding or statically linking the required version of >>>libxml2 and libxslt inside the egg? That might be a good idea for both >>>MacOSX and Windows. How can we do that with distutils/setuptools? >> >>That'd be very nice of course! Unfortunately I do not know whether this >>can be done with setuptools - someone would need to research it. > > > The easiest way to do this, by far, is to compile static libraries and make sure > you link against those by fiddling with library_dirs. > > I'm pretty sure that Phillip recently added some capabilities to setuptools to > store dynamic libraries in eggs as well, but I don't recall the details. Cool! If someone can figure it out for us, I'm happy to upload the resultant eggs.
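Robert's suggestion (build static libraries and point library_dirs at them) might look roughly like the following. This is only a sketch under stated assumptions, not lxml's actual setup.py: the prefix path and the helper name are hypothetical, the prefix is assumed to hold only static (.a) builds of libxml2 and libxslt, and the library list is illustrative since the real set depends on how those libraries were built:

```python
import os


def static_extension_options(prefix):
    """Build Extension keyword arguments pointing at a private library prefix.

    Assumption: prefix/lib contains only static archives (libxml2.a,
    libxslt.a). If shared libraries sit in the same directory, the linker
    will typically pick those instead of the static ones.
    """
    return {
        "include_dirs": [
            os.path.join(prefix, "include", "libxml2"),
            os.path.join(prefix, "include"),
        ],
        "library_dirs": [os.path.join(prefix, "lib")],
        # Illustrative list; further dependencies (zlib, for example) may be
        # required depending on how the static libraries were configured.
        "libraries": ["xslt", "xml2"],
    }


if __name__ == "__main__":
    # "/opt/static-xml" is a hypothetical location for the static builds.
    options = static_extension_options("/opt/static-xml")
    for key, value in options.items():
        print(key, "=", value)
    # In a setup.py these would be passed on to the extension definition,
    # e.g. Extension(<module name>, sources=[...], **options).
```

Keeping the options in a plain dictionary makes it easy to switch between the system libraries and a private static prefix from one place.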
:) Regards, Martijn From faassen at infrae.com Tue Mar 21 19:31:42 2006 From: faassen at infrae.com (Martijn Faassen) Date: Tue Mar 21 19:32:10 2006 Subject: [lxml-dev] eggs Message-ID: <4420468E.8000003@infrae.com> Hi there, I've uploaded lxml 0.9 eggs for both Windows (thanks Steve Howe) and Mac OS X (thanks Georges Racinet) to the Python cheeseshop. The source is there now too: http://cheeseshop.python.org/pypi/lxml/0.9 Thanks everybody! Oh, we should update INSTALL.txt to have a link to the cheeseshop as well. Regards, Martijn From ogrisel at nuxeo.com Tue Mar 21 19:58:47 2006 From: ogrisel at nuxeo.com (Olivier Grisel) Date: Tue Mar 21 20:02:15 2006 Subject: [lxml-dev] Re: eggs In-Reply-To: <4420468E.8000003@infrae.com> References: <4420468E.8000003@infrae.com> Message-ID: Martijn Faassen wrote: > Hi there, > > I've uploaded lxml 0.9 eggs for both Windows (thanks Steve Howe) and Mac > OS X (thanks Georges Racinet) to the Python cheeseshop. The source is > there now too: > > http://cheeseshop.python.org/pypi/lxml/0.9 > > Thanks everybody! Oh, we should update INSTALL.txt to have a link to the > cheeseshop as well. I cannot see the windows egg, are you sure you did upload it ? -- Olivier From delza at livingcode.org Tue Mar 21 22:24:57 2006 From: delza at livingcode.org (Dethe Elza) Date: Tue Mar 21 22:25:39 2006 Subject: [lxml-dev] eggs In-Reply-To: <4420468E.8000003@infrae.com> References: <4420468E.8000003@infrae.com> Message-ID: <24d517dd0603211324h3ee5f7e9sc23bb5f75597dee1@mail.gmail.com> That's great news, but I don't see the Windows egg either. --Dethe On 3/21/06, Martijn Faassen wrote: > Hi there, > > I've uploaded lxml 0.9 eggs for both Windows (thanks Steve Howe) and Mac > OS X (thanks Georges Racinet) to the Python cheeseshop. The source is > there now too: > > http://cheeseshop.python.org/pypi/lxml/0.9 > > Thanks everybody! Oh, we should update INSTALL.txt to have a link to the > cheeseshop as well.
> > Regards, > > Martijn > > _______________________________________________ > lxml-dev mailing list > lxml-dev@codespeak.net > http://codespeak.net/mailman/listinfo/lxml-dev > From faassen at infrae.com Tue Mar 21 23:59:25 2006 From: faassen at infrae.com (Martijn Faassen) Date: Wed Mar 22 00:00:23 2006 Subject: [lxml-dev] Re: eggs In-Reply-To: References: <4420468E.8000003@infrae.com> Message-ID: <4420854D.6030409@infrae.com> Olivier Grisel wrote: > Martijn Faassen wrote: >> Hi there, >> >> I've uploaded lxml 0.9 eggs for both Windows (thanks Steve Howe) and >> Mac OS X (thanks Georges Racinet) to the Python cheeseshop. The source >> is there now too: >> >> http://cheeseshop.python.org/pypi/lxml/0.9 >> >> Thanks everybody! Oh, we should update INSTALL.txt to have a link to >> the cheeseshop as well. > > I cannot see the windows egg, are you sure you did upload it ? Heh, I just realized I didn't upload it and came to this mailing list hoping nobody had noticed yet. You're right! I shall look whether I actually have a windows egg and upload it when I find it. :) Meanwhile there's a link on the installation page on our website that links to windows versions. Regards, Martijn From faassen at infrae.com Wed Mar 22 00:03:29 2006 From: faassen at infrae.com (Martijn Faassen) Date: Wed Mar 22 00:04:02 2006 Subject: [lxml-dev] eggs In-Reply-To: <24d517dd0603211324h3ee5f7e9sc23bb5f75597dee1@mail.gmail.com> References: <4420468E.8000003@infrae.com> <24d517dd0603211324h3ee5f7e9sc23bb5f75597dee1@mail.gmail.com> Message-ID: <44208641.5020208@infrae.com> Dethe Elza wrote: > That's great news, but I don't see the Windows egg either. You're right, I was fooling myself in thinking I had a windows egg. Instead we do have a windows build, linked from here: http://codespeak.net/lxml/installation.html It'd be nice if Steve could make my unreality true by donating a Windows egg.
:) Regards, Martijn From howe at carcass.dhs.org Wed Mar 22 02:26:27 2006 From: howe at carcass.dhs.org (Steve Howe) Date: Wed Mar 22 02:27:38 2006 Subject: [lxml-dev] eggs In-Reply-To: <44208641.5020208@infrae.com> References: <4420468E.8000003@infrae.com> <24d517dd0603211324h3ee5f7e9sc23bb5f75597dee1@mail.gmail.com> <44208641.5020208@infrae.com> Message-ID: <17610414400.20060321222627@carcass.dhs.org> Hello Martijn, Tuesday, March 21, 2006, 8:03:29 PM, you wrote: > Dethe Elza wrote: >> That's great news, but I don't see the Windows egg either. > You're right, I was fooling myself in thinking I had a windows egg. > Instead we do have a windows build, linked from here: > http://codespeak.net/lxml/installation.html > It'd be nice if Steve could make my unreality true by donating a Windows > egg. :) Sure, what should I do ? Is there some setuptools script I could base on ? I have no knowledge about it, it would take some time if I had to discover everything myself... -- Best regards, Steve mailto:howe@carcass.dhs.org From howe at carcass.dhs.org Wed Mar 22 08:26:02 2006 From: howe at carcass.dhs.org (Steve Howe) Date: Wed Mar 22 08:27:00 2006 Subject: [lxml-dev] eggs In-Reply-To: <17610414400.20060321222627@carcass.dhs.org> References: <4420468E.8000003@infrae.com> <24d517dd0603211324h3ee5f7e9sc23bb5f75597dee1@mail.gmail.com> <44208641.5020208@infrae.com> <17610414400.20060321222627@carcass.dhs.org> Message-ID: <15510497526.20060322042602@carcass.dhs.org> Hello Steve, Tuesday, March 21, 2006, 10:26:27 PM, you wrote: > Sure, what should I do ? Is there some setuptools script I could base on > ? I have no knowledge about it, it would take some time if I had to > discover everything myself... Nevermind, I found it out. The egg is in here: http://carcass.dhs.org/lxml-0.9-py2.4-win32.egg It is untested; someone please do it. 
-- Best regards, Steve mailto:howe@carcass.dhs.org From faassen at infrae.com Wed Mar 22 10:56:10 2006 From: faassen at infrae.com (Martijn Faassen) Date: Wed Mar 22 10:57:00 2006 Subject: [lxml-dev] eggs In-Reply-To: <15510497526.20060322042602@carcass.dhs.org> References: <4420468E.8000003@infrae.com> <24d517dd0603211324h3ee5f7e9sc23bb5f75597dee1@mail.gmail.com> <44208641.5020208@infrae.com> <17610414400.20060321222627@carcass.dhs.org> <15510497526.20060322042602@carcass.dhs.org> Message-ID: <44211F3A.3080506@infrae.com> Steve Howe wrote: > Hello Steve, > > Tuesday, March 21, 2006, 10:26:27 PM, you wrote: > > >>Sure, what should I do ? Is there some setuptools script I could base on >>? I have no knowledge about it, it would take some time if I had to >>discover everything myself... > > Nevermind, I found it out. The egg is in here: > > http://carcass.dhs.org/lxml-0.9-py2.4-win32.egg > > It is untested; someone please do it. Great! I'll upload it to the cheeseshop so it gets some test coverage. :) Regards, Martijn From faassen at infrae.com Wed Mar 22 10:58:16 2006 From: faassen at infrae.com (Martijn Faassen) Date: Wed Mar 22 10:58:50 2006 Subject: [lxml-dev] eggs In-Reply-To: <44211F3A.3080506@infrae.com> References: <4420468E.8000003@infrae.com> <24d517dd0603211324h3ee5f7e9sc23bb5f75597dee1@mail.gmail.com> <44208641.5020208@infrae.com> <17610414400.20060321222627@carcass.dhs.org> <15510497526.20060322042602@carcass.dhs.org> <44211F3A.3080506@infrae.com> Message-ID: <44211FB8.5090605@infrae.com> Martijn Faassen wrote: > Steve Howe wrote: > >> Hello Steve, >> >> Tuesday, March 21, 2006, 10:26:27 PM, you wrote: >> >> >>> Sure, what should I do ? Is there some setuptools script I could base on >>> ? I have no knowledge about it, it would take some time if I had to >>> discover everything myself... >> >> >> Nevermind, I found it out. The egg is in here: >> >> http://carcass.dhs.org/lxml-0.9-py2.4-win32.egg >> >> It is untested; someone please do it. > > Great! 
> > I'll upload it to the cheeseshop so it gets some test coverage. :) I just uploaded it, so please Windows users test it out. One thing I noticed is that it's significantly smaller than the mac or linux egg for some reason. Perhaps that's just a compiler producing more compact binaries. Regards, Martijn From howe at carcass.dhs.org Wed Mar 22 11:19:23 2006 From: howe at carcass.dhs.org (Steve Howe) Date: Wed Mar 22 11:20:17 2006 Subject: [lxml-dev] eggs In-Reply-To: <44211FB8.5090605@infrae.com> References: <4420468E.8000003@infrae.com> <24d517dd0603211324h3ee5f7e9sc23bb5f75597dee1@mail.gmail.com> <44208641.5020208@infrae.com> <17610414400.20060321222627@carcass.dhs.org> <15510497526.20060322042602@carcass.dhs.org> <44211F3A.3080506@infrae.com> <44211FB8.5090605@infrae.com> Message-ID: <666505837.20060322071923@carcass.dhs.org> Hello Martijn, Wednesday, March 22, 2006, 6:58:16 AM, you wrote: > I just uploaded it, so please Windows users test it out. One thing I > noticed is that it's significantly smaller than the mac or linux egg for > some reason. Perhaps that's just a compiler producing more compact binaries. Probably because it is dynamically linked, while on linux it is statically linked against libxml/libxslt ?.... 
-- Best regards, Steve mailto:howe@carcass.dhs.org From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 22 11:26:57 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed Mar 22 11:27:37 2006 Subject: [lxml-dev] eggs In-Reply-To: <44211FB8.5090605@infrae.com> References: <4420468E.8000003@infrae.com> <24d517dd0603211324h3ee5f7e9sc23bb5f75597dee1@mail.gmail.com> <44208641.5020208@infrae.com> <17610414400.20060321222627@carcass.dhs.org> <15510497526.20060322042602@carcass.dhs.org> <44211F3A.3080506@infrae.com> <44211FB8.5090605@infrae.com> Message-ID: <44212671.3090605@gkec.informatik.tu-darmstadt.de> Martijn Faassen wrote: >> Steve Howe wrote: >>> http://carcass.dhs.org/lxml-0.9-py2.4-win32.egg > > I just uploaded it, so please Windows users test it out. One thing I > noticed is that it's significantly smaller than the mac or linux egg for > some reason. Perhaps that's just a compiler producing more compact > binaries. Both are dynamically linked, but maybe etree.so is stripped in the windows version? I just downloaded the Linux/i686 egg and the etree.so in there looks huge (1MB). When stripped, it jumps down to some 400K. Maybe you can rebuild it (with suitable CFLAGS, possibly -Os), then strip it and run the upload again so that it gets a little smaller. Just run CFLAGS="choose me" python setup.py bdist_egg strip build/*/lxml/etree.so python setup.py bdist_egg upload The last command will not rebuild the extension, just upload it. I just did that for an x86-64 egg. Oh, and: you have to remove the original upload via the web interface before you can replace the file. 
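A minimal Python sketch of the size check discussed above, for anyone who wants to measure what stripping gains on their own build. The helper names are made up for illustration, and it assumes a binutils-style `strip` is on the PATH; it is not part of lxml's build scripts.

```python
import os
import subprocess


def size_reduction(before_bytes, after_bytes):
    """Return the fraction of the original size that was removed."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    return (before_bytes - after_bytes) / before_bytes


def strip_and_report(path):
    """Run the system `strip` tool on `path`; return (before, after) sizes in bytes.

    Assumption: `strip` is installed and `path` points at a shared object
    such as the etree.so produced by the build.
    """
    before = os.path.getsize(path)
    subprocess.check_call(["strip", path])
    after = os.path.getsize(path)
    return before, after
```

Calling `strip_and_report` on the built etree.so and passing the two sizes to `size_reduction` gives the saving as a fraction, which can then be compared against the rough figures quoted in this thread.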
Stefan From gracinet at nuxeo.com Wed Mar 22 11:26:48 2006 From: gracinet at nuxeo.com (Georges Racinet) Date: Wed Mar 22 11:27:38 2006 Subject: [lxml-dev] eggs In-Reply-To: <666505837.20060322071923@carcass.dhs.org> References: <4420468E.8000003@infrae.com> <24d517dd0603211324h3ee5f7e9sc23bb5f75597dee1@mail.gmail.com> <44208641.5020208@infrae.com> <17610414400.20060321222627@carcass.dhs.org> <15510497526.20060322042602@carcass.dhs.org> <44211F3A.3080506@infrae.com> <44211FB8.5090605@infrae.com> <666505837.20060322071923@carcass.dhs.org> Message-ID: <89807BD2-7062-4045-A965-1C755B5999B3@nuxeo.com> On Mar 22, 2006, at 11:19 AM, Steve Howe wrote: > Hello Martijn, > > Wednesday, March 22, 2006, 6:58:16 AM, you wrote: > >> I just uploaded it, so please Windows users test it out. One thing I >> noticed is that it's significantly smaller than the mac or linux >> egg for >> some reason. Perhaps that's just a compiler producing more compact >> binaries. > Probably because it is dynamically linked, while on linux it is > statically linked against libxml/libxslt ?.... But I've seen on MacOSX that these libs are dynamically linked. There shouldn't be any difference in a Linux setup. 
Regards, --------- Georges Racinet Nuxeo SAS gracinet@nuxeo.com http://nuxeo.com Tel: +33 (0) 1 40 33 71 73 From ogrisel at nuxeo.com Wed Mar 22 11:29:12 2006 From: ogrisel at nuxeo.com (Olivier Grisel) Date: Wed Mar 22 11:30:20 2006 Subject: [lxml-dev] Re: eggs In-Reply-To: <666505837.20060322071923@carcass.dhs.org> References: <4420468E.8000003@infrae.com> <24d517dd0603211324h3ee5f7e9sc23bb5f75597dee1@mail.gmail.com> <44208641.5020208@infrae.com> <17610414400.20060321222627@carcass.dhs.org> <15510497526.20060322042602@carcass.dhs.org> <44211F3A.3080506@infrae.com> <44211FB8.5090605@infrae.com> <666505837.20060322071923@carcass.dhs.org> Message-ID: Steve Howe wrote: > Hello Martijn, > > Wednesday, March 22, 2006, 6:58:16 AM, you wrote: > >> I just uploaded it, so please Windows users test it out. One thing I >> noticed is that it's significantly smaller than the mac or linux egg for >> some reason. Perhaps that's just a compiler producing more compact binaries. > Probably because it is dynamically linked, while on linux it is > statically linked against libxml/libxslt ?....
>

AFAIK it's not the case on linux:

ogrisel@groyours:~/Desktop/lxml-0.9-py2.4-linux-i686.egg_FILES/lxml $ ldd etree.so
        linux-gate.so.1 => (0xffffe000)
        libxslt.so.1 => /usr/lib/libxslt.so.1 (0xb7f1b000)
        libxml2.so.2 => /usr/lib/libxml2.so.2 (0xb7e0d000)
        libz.so.1 => /usr/lib/libz.so.1 (0xb7df9000)
        libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0xb7dd7000)
        libpthread.so.0 => /lib/tls/i686/cmov/libpthread.so.0 (0xb7dc5000)
        libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7c96000)
        libdl.so.2 => /lib/tls/i686/cmov/libdl.so.2 (0xb7c92000)
        /lib/ld-linux.so.2 (0x80000000)

By stripping etree.so we can gain more than 50% in size:

$ ls -hl etree.so
-rw-r--r-- 1 ogrisel ogrisel 1014K 2006-03-21 11:17 etree.so
$ strip etree.so
$ ls -hl etree.so
-rw-r--r-- 1 ogrisel ogrisel 452K 2006-03-22 11:27 etree.so

--
Olivier

From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 22 17:38:50 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Wed Mar 22 17:41:59 2006
Subject: [lxml-dev] Callgrind tests
Message-ID: <44217D9A.4090009@gkec.informatik.tu-darmstadt.de>

Hello everyone,

another one for the archives. I did a few tests with Callgrind and KCachegrind (if you don't know kcachegrind, install it, you'll love it), as I was suspecting the XPath wrapper to have become slow due to the global function registries. What I found was:

1) libxml2 performance is heavily bound by malloc calls (not sure if callgrind influences this). The XPath implementation is so incredibly fast that the registration of the /builtin/ XPath functions (xmlXPathRegisterAllFunctions) and the related hash table creation (two xmlHashCreate's per XPath context) were the major bottlenecks in my tests. The overhead added by lxml itself was negligible.

2) string formatting in Python was the other problem. The major bottleneck in tree setup in bench.py was the python function that builds the element names based on loop variables (PyString_Format).
Meaning, the bottleneck was /outside/ the tested code this time. So, the major result is that, for the tested parts, lxml's performance is mainly bound by two factors: Python and libxml2. I guess I can safely assume that the code parts that I checked are pretty much too small an issue to merit any further optimization efforts. Have fun, Stefan From pete.forman at westerngeco.com Wed Mar 22 18:06:17 2006 From: pete.forman at westerngeco.com (Pete Forman) Date: Wed Mar 22 18:08:11 2006 Subject: [lxml-dev] Files missing from lxml 0.9 win32 Message-ID: <6.2.1.2.2.20060322170515.04cdf0a8@westerngeco.com> I downloaded http://carcass.dhs.org/lxml-0.9.win32-py2.4.exe and ran some of its tests. It is missing some files. So far I've individually downloaded test1.rng and test2.rng. test_broken.xml and test_xinclude.xml are next. They seem to be missing from http://cheeseshop.python.org/packages/2.4/l/lxml/lxml-0.9-py2.4-win32.egg as well. The tgz has the files, I might try installing from that. -- Pete Forman -./\.- Disclaimer: This post is originated WesternGeco -./\.- by myself and does not represent pete.forman@westerngeco.com -./\.- opinion of Schlumberger, Baker http://petef.port5.com -./\.- Hughes or their divisions. From faassen at infrae.com Wed Mar 22 18:17:18 2006 From: faassen at infrae.com (Martijn Faassen) Date: Wed Mar 22 18:17:43 2006 Subject: [lxml-dev] Callgrind tests In-Reply-To: <44217D9A.4090009@gkec.informatik.tu-darmstadt.de> References: <44217D9A.4090009@gkec.informatik.tu-darmstadt.de> Message-ID: <4421869E.9060605@infrae.com> Stefan Behnel wrote: [snip] > What I > found was: > > 1) libxml2 performance is heavily bound by malloc calls (not sure if callgrind > influences this). The XPath implementation is so incredibly fast that the > registration of the /builtin/ XPath functions (xmlXPathRegisterAllFunctions) > and the related hash table creation (two xmlHashCreate's per XPath context) > were the major bottlenecks in my tests. 
The overhead added by lxml itself was > negligible. > > 2) string formatting in Python was the other problem. The major bottleneck in > tree setup in bench.py was the python function that builds the element names > based on loop variables (PyString_Format). Meaning, the bottleneck was > /outside/ the tested code this time. > > So, the major result is that, for the tested parts, lxml's performance is > mainly bound by two factors: Python and libxml2. I guess I can safely assume > that the code parts that I checked are pretty much too small an issue to merit > any further optimization efforts. I'm not sure what you mean about string formatting... The bottleneck being outside the tested code means that lxml makes calls to the Python API and that this is costing time, right? So, one possibility to speed up lxml is for it to call the Python API less often. One possibility towards optimization would be getting rid of the need to decode UTF-8 (libxml2) to Python unicode all the time (or to plain Python strings if they're ascii). This could be done by caching the Python unicode/strings somehow. I discussed this a long time ago with Daniel Veillard and he mentioned extending the string dictionary in libxml2 so it could add another payload field which could be our string. If that payload is already there, we can simply return this instead of regenerating it. Regards, Martijn From ogrisel at nuxeo.com Wed Mar 22 18:30:43 2006 From: ogrisel at nuxeo.com (Olivier Grisel) Date: Wed Mar 22 18:31:55 2006 Subject: [lxml-dev] Re: Files missing from lxml 0.9 win32 In-Reply-To: <6.2.1.2.2.20060322170515.04cdf0a8@westerngeco.com> References: <6.2.1.2.2.20060322170515.04cdf0a8@westerngeco.com> Message-ID: Pete Forman wrote: > I downloaded http://carcass.dhs.org/lxml-0.9.win32-py2.4.exe and ran > some of its tests. > > It is missing some files. So far I've individually downloaded test1.rng > and test2.rng. test_broken.xml and test_xinclude.xml are next.
> > They seem to be missing from > http://cheeseshop.python.org/packages/2.4/l/lxml/lxml-0.9-py2.4-win32.egg > as well. The tgz has the files, I might try installing from that. > I think lxml's setup.py lacks some package_data directive: http://docs.python.org/dist/node11.html I can fix that tonight (GMT) in the trunk if nobody does it before. -- Olivier From faassen at infrae.com Wed Mar 22 19:11:58 2006 From: faassen at infrae.com (Martijn Faassen) Date: Wed Mar 22 19:12:27 2006 Subject: [lxml-dev] Re: Files missing from lxml 0.9 win32 In-Reply-To: References: <6.2.1.2.2.20060322170515.04cdf0a8@westerngeco.com> Message-ID: <4421936E.3010907@infrae.com> Olivier Grisel wrote: > Pete Forman wrote: > >> I downloaded http://carcass.dhs.org/lxml-0.9.win32-py2.4.exe and ran >> some of its tests. >> >> It is missing some files. So far I've individually downloaded >> test1.rng and test2.rng. test_broken.xml and test_xinclude.xml are next. >> >> They seem to be missing from >> http://cheeseshop.python.org/packages/2.4/l/lxml/lxml-0.9-py2.4-win32.egg >> as well. The tgz has the files, I might try installing from that. >> > > I think lxml's setup.py lacks some package_data directive: > > http://docs.python.org/dist/node11.html > > I can fix that tonight (GMT) in the trunk if nobody does it before. That would be good, thanks! Unless Stefan has been hacking on the trunk to add features very recently, this can be a 0.9.1. Stefan, if you're going to hack features, perhaps branch off a 0.9 branch.
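For readers unfamiliar with the package_data directive mentioned above, here is a rough sketch of what such a declaration could look like. It is not lxml's actual setup.py: the package layout (`src/lxml/tests`), the glob patterns and the version string are assumptions drawn from the file names mentioned in this thread.

```python
from fnmatch import fnmatch

# Hypothetical keyword arguments for setup(); only the test data file
# names (test1.rng, test_broken.xml, ...) come from the thread itself.
SETUP_KWARGS = dict(
    name="lxml",
    version="0.9.1",  # assumption: the maintenance release under discussion
    package_dir={"": "src"},
    packages=["lxml", "lxml.tests"],
    # Ship the non-Python files that the test suite reads at run time.
    package_data={"lxml.tests": ["*.rng", "*.xml"]},
)


def is_test_data(filename):
    """Return True if `filename` matches one of the declared data patterns."""
    patterns = SETUP_KWARGS["package_data"]["lxml.tests"]
    return any(fnmatch(filename, pattern) for pattern in patterns)


def main():
    """Entry point a real setup.py would call; not invoked in this sketch."""
    from setuptools import setup
    setup(**SETUP_KWARGS)
```

With a declaration like this, files such as test1.rng would be copied into binary distributions alongside the test modules. The alternative is to leave the test suite out of binary packages altogether, in which case no such directive is needed.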
Regards, Martijn From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 22 19:10:52 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed Mar 22 19:13:44 2006 Subject: [lxml-dev] Callgrind tests In-Reply-To: <4421869E.9060605@infrae.com> References: <44217D9A.4090009@gkec.informatik.tu-darmstadt.de> <4421869E.9060605@infrae.com> Message-ID: <4421932C.5050705@gkec.informatik.tu-darmstadt.de> Martijn Faassen wrote: > I'm not sure what you mean about string formatting... The bottleneck > being outside the tested code means that lxml makes calls to the Python > API and that this is costing time, right? No, actually I meant that it is the /calling/ code that becomes the bottleneck, not the code I tested. The API and everything behind it was plenty fast for the test. I know, that's not the best evaluation then (it doesn't really test the code itself), but it shows that the code that I tested is so fast that programs that use it will most likely have their performance problems elsewhere. > So, one possibility to speed up lxml is for it to call the Python API > less often. One possibility towards optimization would be getting rid of > the need to decode UTF-8 (libxml2) to Python unicode all the time (or to > Plain python strings if they're ascii). I didn't test unicode conversion at all. char*->String conversion did not seem to be a problem, I guess that's mainly a strlen, a memcpy and a C-object instantiation. All of those should be pretty fast. > This could be done by caching the Python unicode/strings somehow. I > discussed this a long time ago with Daniel Veillard and he mentioned > extending the string dictionary in libxml2 so it could add another > payload field which could be our string. If that payload is already > there, we can simply return this instead of regenerating it. I already thought about something like that when I went through my optimizations.
When you look at some of the benchmark results, you will see that cElementTree is about 3 times faster for the element.text and element.tag benchmarks. As FL pointed out, that's due to exactly that optimization: build python strings only once. The main problem, however, is that lxml uses properties here. In my benchmarks, that turned out to produce more overhead than the actual conversion afterwards (I tested that by returning a constant string). Apart from that, there is no other point for tweaking the performance left in that part of lxml - I checked. :) Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 22 19:19:52 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed Mar 22 19:20:27 2006 Subject: [lxml-dev] Re: Files missing from lxml 0.9 win32 In-Reply-To: <4421936E.3010907@infrae.com> References: <6.2.1.2.2.20060322170515.04cdf0a8@westerngeco.com> <4421936E.3010907@infrae.com> Message-ID: <44219548.2040003@gkec.informatik.tu-darmstadt.de> Martijn Faassen wrote: > Unless Stefan has been hacking on the trunk to add features very > recently, this can be a 0.9.1. Stefan, if you're going to hack features, > perhaps branch off a 0.9 branch. Right, we wanted to do that anyway, so I just did. There is now a branch called "lxml-0.9.x". It is branched from the current trunk and thus contains the little fixes and doc updates from the last two days. All 0.9 maintenance stuff can go in there. 
Stefan From ogrisel at nuxeo.com Wed Mar 22 19:25:29 2006 From: ogrisel at nuxeo.com (Olivier Grisel) Date: Wed Mar 22 19:26:45 2006 Subject: [lxml-dev] Re: Files missing from lxml 0.9 win32 In-Reply-To: <44219548.2040003@gkec.informatik.tu-darmstadt.de> References: <6.2.1.2.2.20060322170515.04cdf0a8@westerngeco.com> <4421936E.3010907@infrae.com> <44219548.2040003@gkec.informatik.tu-darmstadt.de> Message-ID: Stefan Behnel wrote: > Martijn Faassen wrote: >> Unless Stefan has been hacking on the trunk to add features very >> recently, this can be a 0.9.1. Stefan, if you're going to hack features, >> perhaps branch off a 0.9 branch. > > Right, we wanted to do that anyway, so I just did. There is now a branch > called "lxml-0.9.x". It is branched from the current trunk and thus contains > the little fixes and doc updates from the last two days. > > All 0.9 maintenance stuff can go in there. Ok so I'll double commit my setup.py fixes both in 0.9.x and trunk tonight. -- Olivier From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 22 19:43:25 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed Mar 22 19:44:09 2006 Subject: [lxml-dev] Re: Files missing from lxml 0.9 win32 In-Reply-To: <4421936E.3010907@infrae.com> References: <6.2.1.2.2.20060322170515.04cdf0a8@westerngeco.com> <4421936E.3010907@infrae.com> Message-ID: <44219ACD.3060403@gkec.informatik.tu-darmstadt.de> Martijn Faassen wrote: > Olivier Grisel wrote: >> Pete Forman wrote: >>> So far I've individually downloaded >>> test1.rng and test2.rng. test_broken.xml and test_xinclude.xml are >>> next. >> >> I think lxml's setup.py lacks some package_data directive: >> http://docs.python.org/dist/node11.html >> I can fix that tonight (GMT) in the trunk if nobody does it before. > > That would be good, thanks! Right, I always had them in MANIFEST.in, but actually, they are package data. On the other hand: These files are really part of the test suite.
Maybe the right question here is: why is the test suite part of the eggs at all? Eggs are supposed to be installed into "python/site-packages". No-one needs the test suite there. So I think we should check if we can't remove the test suite from the eggs. Stefan From ogrisel at nuxeo.com Wed Mar 22 23:26:41 2006 From: ogrisel at nuxeo.com (Olivier Grisel) Date: Wed Mar 22 23:28:07 2006 Subject: [lxml-dev] Re: Files missing from lxml 0.9 win32 In-Reply-To: <44219ACD.3060403@gkec.informatik.tu-darmstadt.de> References: <6.2.1.2.2.20060322170515.04cdf0a8@westerngeco.com> <4421936E.3010907@infrae.com> <44219ACD.3060403@gkec.informatik.tu-darmstadt.de> Message-ID: Stefan Behnel wrote: > > Right, I always had them in MANIFEST.in, but actually, they are package data. > > On the other hand: > > These files are really part of the test suite. Maybe the right question here > is: why is the test suite part of the eggs at all? Eggs are supposed to be > installed into "python/site-packages". No-one needs the test suite there. > > So I think we should check if we can't remove the test suite from the eggs. Yes + the fact that 'package_data' is 2.4 specific and would have broken the 2.3 compat. So I removed the tests out of the egg (in trunk and the 0.9.x branch). I use that egg on my system and everything seems to work (362 tests ok). -- Olivier From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 23 07:26:25 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Thu Mar 23 07:27:09 2006 Subject: [lxml-dev] Re: Files missing from lxml 0.9 win32 In-Reply-To: References: <6.2.1.2.2.20060322170515.04cdf0a8@westerngeco.com> <4421936E.3010907@infrae.com> <44219ACD.3060403@gkec.informatik.tu-darmstadt.de> Message-ID: <44223F91.3020405@gkec.informatik.tu-darmstadt.de> Olivier Grisel wrote previously: > Stefan Behnel wrote: >> Right, I always had them in MANIFEST.in, but actually, they are >> package data.
>> >> On the other hand: >> >> These files are really part of the test suite. Maybe the right >> question here >> is: why is the test suite part of the eggs at all? Eggs are supposed >> to be >> installed into "python/site-packages". No-one needs the test suite there. >> >> So I think we should check if we can't remove the test suite from the >> eggs. > > Yes + the fact that 'package_data' is 2.4 specific and would have broken > the 2.3 compat. > > So I removed the tests out of the egg (in trunk and the 0.9.x branch). I > use that egg on my system and everything seems to work (362 tests ok). Thanks. I think if people want to run the test suite, they either 1) have the tar ball anyway since they are building binaries or 2) are interested in getting bugs fixed and can therefore be expected to accept downloading the tar ball separately to run it. I just added a note on that on the install page. BTW: Since this does not have any impact on end-users, I don't think this is already worth a 0.9.1 -- although it may give us the chance to shrink the size of the Linux egg. Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 23 08:08:55 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Thu Mar 23 08:09:42 2006 Subject: [lxml-dev] Callgrind tests In-Reply-To: <321590357.20060322142221@carcass.dhs.org> References: <44217D9A.4090009@gkec.informatik.tu-darmstadt.de> <321590357.20060322142221@carcass.dhs.org> Message-ID: <44224987.9090303@gkec.informatik.tu-darmstadt.de> Hi Steve, Steve Howe wrote: > Wednesday, March 22, 2006, 1:38:50 PM, you wrote: >> 2) string formatting in Python was the other problem. The major bottleneck in >> tree setup in bench.py was the python function that builds the element names >> based on loop variables (PyString_Format). Meaning, the bottleneck was >> /outside/ the tested code this time. 
> > I wonder if running the same tests on cElementTree would point similar > results in what concerns to the Python function calls. Go ahead, try, using KCachegrind is pure fun! :) > Do you have any results (or impressions) on this ? I didn't check, but I don't think it suffers so much from Python performance. As Fredrik said, cElementTree builds Python objects on the way in, so all you should see when /accessing/ data is Python's call overhead rather than any substantial calculations. I think that's totally the right optimization, but it is difficult to do something similar in lxml, since we also get entire trees from the parser. It wouldn't be a good idea to traverse them to build Python objects - we don't even know if they would be used. All we could do is cache Python objects once they were built. The Proxy mechanism would be the right place to keep references to text and tag objects. Also, you could change the current way Python element proxies are deallocated to keep them alive as long as any of them is really used. But that's non-trivial. Anyway, to make me implement that, I would really have to be convinced that it's worth it - and I absolutely don't see enough of a speed-up behind these optimizations to encourage such a huge effort. Especially the text and tag properties are bound by call overhead, not by object creation time. Stefan From faassen at infrae.com Thu Mar 23 11:44:57 2006 From: faassen at infrae.com (Martijn Faassen) Date: Thu Mar 23 11:45:42 2006 Subject: [lxml-dev] Re: Files missing from lxml 0.9 win32 In-Reply-To: <44219548.2040003@gkec.informatik.tu-darmstadt.de> References: <6.2.1.2.2.20060322170515.04cdf0a8@westerngeco.com> <4421936E.3010907@infrae.com> <44219548.2040003@gkec.informatik.tu-darmstadt.de> Message-ID: <44227C29.80407@infrae.com> Stefan Behnel wrote: > Martijn Faassen wrote: > >>Unless Stefan has been hacking on the trunk to add features very >>recently, this can be a 0.9.1.
Stefan, if you're going to hack features, >>perhaps branch off a 0.9 branch. > > > Right, we wanted to do that anyway, so I just did. There is now a branch > called "lxml-0.9.x". It is branched from the current trunk and thus contains > the little fixes and doc updates from the last two days. > > All 0.9 maintenance stuff can go in there. Small comment: a branch name of lxml-0.9 for the 0.9 line would be nicer in my opinion - seems to be the conventional way to do it. Regards, Martijn From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 23 13:03:28 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Thu Mar 23 13:04:12 2006 Subject: [lxml-dev] Callgrind tests In-Reply-To: <4421869E.9060605@infrae.com> References: <44217D9A.4090009@gkec.informatik.tu-darmstadt.de> <4421869E.9060605@infrae.com> Message-ID: <44228E90.9090304@gkec.informatik.tu-darmstadt.de> Hi Martijn, Martijn Faassen wrote: > So, one possibility to speed up lxml is for it to call the Python API > less often. One possibility towards optimization would be getting rid of > the need to decode UTF-8 (libxml2) to Python unicode all the time (or to > Plain python strings if they're ascii). > > This could be done by caching the Python unicode/strings somehow. I > discussed this a long time ago with Daniel Veillard and he mentioned > extending the string dictionary in libxml2 so it could add another > payload field which could be our string. If that payload is already > there, we can simply return this instead of regenerating it. I figured there is one place where implementing caching is cheap: element.tag. So I decided to add a _tag attribute to _Element. It is set to None at initialisation and to the tag value when the property is set. Getting the property then tests for None and returns the value if it was set before. Since we assure at most one proxy element per node, this should not bring in any inconsistencies. 
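The caching scheme just described can be sketched in plain Python. This is only an illustration of the idea, not the Pyrex code in lxml: `FakeNode` and `build_tag` are hypothetical stand-ins for the libxml2 node and for the conversion from a C string to a Python string.

```python
class FakeNode:
    """Stand-in for a C-level tree node; counts how often its name is converted."""

    def __init__(self, name):
        self.name = name
        self.conversions = 0

    def build_tag(self):
        # Represents the comparatively expensive string-building step.
        self.conversions += 1
        return str(self.name)


class Element:
    """Proxy with a cached tag, following the _tag idea from the message."""

    def __init__(self, node):
        self._node = node
        self._tag = None  # None means "not cached yet"

    @property
    def tag(self):
        # Build the Python string only on first access, then reuse it.
        if self._tag is None:
            self._tag = self._node.build_tag()
        return self._tag

    @tag.setter
    def tag(self, value):
        # Setting the property updates both the node and the cache.
        self._node.name = value
        self._tag = value
```

The cache is only safe because there is at most one proxy per node; with two proxies for the same node, setting the tag through one would leave a stale value in the other.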
According to callgrind, the speedup is close to 95% for each subsequent call to element.tag after the first one - as long as the Python reference to the element persists. Obvious drawback: the first access is a little slower now, so accessing the tag names of tons of different objects will suffer. But that's somewhat acceptable - in that case, we really hit the Python string building overhead anyway. It's both in the trunk and 0.9.x (ok, perhaps that isn't really the best name, agreed). Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Mar 28 07:40:01 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Tue Mar 28 07:40:46 2006 Subject: [lxml-dev] HTML parser support Message-ID: <4428CC31.5010309@gkec.informatik.tu-darmstadt.de> Hi, I created a branch "htmlparser" (as opposed to the previous "htmlparse") and used it to rewrite the current parser to support both the XML and HTML parser API of libxml2 (file src/lxml/parser.pxi). Problem: It doesn't work (yet), it crashes. I cut down the problem to find that it is a problem with the deallocation code. Deallocation of HTML trees (or at least "something" in their representation) seems to be different in libxml2 than for XML. The result is a double free of the document or its nodes - once when releasing an element (attemptDeallocation) and again when releasing the document. This is difficult to debug from Python as both usually happen in one step, when the last element is refcounted. And I still haven't found the actual reason for this. However, I found that removing the call to "attemptDeallocation" from _NodeBase.__dealloc__ for HTML trees solves it. So, I'm not sure how to handle this. It may mean that we have to handle object deallocation differently depending on the initial parser - which would be very unfortunate. There may also be an additional tweak to be done at parse time, but I wouldn't know what else to try. (Kasimier?) Anyway, whoever wants to try it, just go ahead.
Maybe someone else finds a twist into getting this to work. For testing, there are a few test cases in test_htmlparser.py. Note that they will crash, so I can't add them to the automated test suite. You have to run them manually: PYTHONPATH=src python src/lxml/tests/test_htmlparser.py I left a few debug prints in the source, so don't wonder where the output comes from. Any input on this is appreciated. Stefan From behnel_ml at gkec.informatik.tu-darmstadt.de Tue Mar 28 08:42:45 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Tue Mar 28 08:43:31 2006 Subject: Heureka! - Re: [lxml-dev] HTML parser support In-Reply-To: <4428CC31.5010309@gkec.informatik.tu-darmstadt.de> References: <4428CC31.5010309@gkec.informatik.tu-darmstadt.de> Message-ID: <4428DAE5.2070503@gkec.informatik.tu-darmstadt.de> Stefan Behnel wrote: > I created a branch "htmlparser" (as opposed to the previous "htmlparse") and > used it to rewrite the current parser to support both the XML and HTML parser > API of libxml2 (file src/lxml/parser.pxi). > > Problem: It doesn't work (yet), it crashes. Correction: It works *now*. :) There was a special case test for the document node in the lxml element deallocation code - for the /XML/ document node. The HTML document node has to be treated equally. http://codespeak.net/svn/lxml/branch/htmlparser/ I integrated the test case now. Any input on this is still appreciated. If it turns out to work well, this will be merged into the trunk to be integrated in 1.0. 
Stefan From d.w.morriss at gmail.com Wed Mar 29 18:03:10 2006 From: d.w.morriss at gmail.com (whit) Date: Wed Mar 29 18:03:57 2006 Subject: [lxml-dev] malloc issues Message-ID: <442AAFBE.8010700@longnow.org> after installing the latest egg, I have been having issues with seg faults, bus error and been get lots of errors like these: > > python(300) malloc: *** Deallocation of a pointer not malloced: > 0x628d30; This could be a double free(), or free() called with the > middle of an allocated block; Try setting environment variable > MallocHelp to see tools to help debug > python(300) malloc: *** error for object 0x629910: double free > python(300) malloc: *** set a breakpoint in szone_error to debug > > below are the style sheet and function that expose the problem. > > I'm using: > > libxslt 1.1.15 > libxml 2.6.22 > lxml 0.9(trunk) > > gcc-4.0 (osx, tiger) > pyrex (svn from codespeak) > > possibly diagnostic and extremely irritating is that I can't back out > to my previous version of lxml. > > -w > > ------------------------------------------------------------------------ > > # ganked from z0pt and sfive > import os > from StringIO import StringIO > > slug = """
> >
> """ > > def xstrip(text): > """ > strip out whitespace > >>> print xstrip(slug) >
> ... > """ > if not text: > return '' > from lxml import etree > xsltfile = os.path.join(os.path.dirname(__file__), 'strip.xsl') > xslt = open(xsltfile) > xslt_doc = etree.parse(xslt) > style = etree.XSLT(xslt_doc) > xslt.close() > doc = etree.fromstring(text) > result = style(doc) > return str(result) > > import unittest > from zope.testing import doctest > optionflags = doctest.REPORT_ONLY_FIRST_FAILURE | doctest.ELLIPSIS > def test_suite(): > > return unittest.TestSuite(( > doctest.DocTestSuite('xml', optionflags=optionflags) > )) > > if __name__=="__main__": > unittest.TextTestRunner().run(test_suite()) > > ------------------------------------------------------------------------ > > xmlns:xsl='http://www.w3.org/1999/XSL/Transform'> > > > > > > > -- | david "whit" morriss | | contact :: http://public.xdi.org/=whit "If you don't know where you are, you don't know anything at all" Dr. Edgar Spencer, Ph.D., 1995 "I like to write code like other ppl like to tune their cars or 10kW hifi equipment..." Christian Heimes, 2004 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: whit.vcf Type: text/x-vcard Size: 181 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060329/494aeb84/whit-0001.vcf From ogrisel at nuxeo.com Wed Mar 29 18:09:42 2006 From: ogrisel at nuxeo.com (Olivier Grisel) Date: Wed Mar 29 18:10:47 2006 Subject: [lxml-dev] Re: malloc issues In-Reply-To: <442AAFBE.8010700@longnow.org> References: <442AAFBE.8010700@longnow.org> Message-ID: whit wrote: > after installing the latest egg, I have been having issues with seg > faults, bus error and been get lots of errors like these: >> >> python(300) malloc: *** Deallocation of a pointer not malloced: >> 0x628d30; This could be a double free(), or free() called with the >> middle of an allocated block; Try setting environment variable >> MallocHelp to see tools to help debug >> python(300) malloc: *** error for object 0x629910: double free >> python(300) malloc: *** set a breakpoint in szone_error to debug >> >> below are the style sheet and function that expose the problem. >> >> I'm using: >> >> libxslt 1.1.15 >> libxml 2.6.22 >> lxml 0.9(trunk) >> >> gcc-4.0 (osx, tiger) >> pyrex (svn from codespeak) Are you using the OSX Tiger egg from pypi or your own egg built from lxml trunk? >> possibly diagnostic and extremely irritating is that I can't back out >> to my previous version of lxml. You can still build lxml from the tarball since you have your own gcc / pyrex. -- Olivier From paul at zope-europe.org Wed Mar 29 18:10:57 2006 From: paul at zope-europe.org (Paul Everitt) Date: Wed Mar 29 18:12:11 2006 Subject: [lxml-dev] Re: Heureka! 
- Re: HTML parser support In-Reply-To: <4428DAE5.2070503@gkec.informatik.tu-darmstadt.de> References: <4428CC31.5010309@gkec.informatik.tu-darmstadt.de> <4428DAE5.2070503@gkec.informatik.tu-darmstadt.de> Message-ID: <442AB191.1050301@zope-europe.org> Stefan Behnel wrote: > Stefan Behnel wrote: >> I created a branch "htmlparser" (as opposed to the previous "htmlparse") and >> used it to rewrite the current parser to support both the XML and HTML parser >> API of libxml2 (file src/lxml/parser.pxi). >> >> Problem: It doesn't work (yet), it crashes. > > Correction: It works *now*. :) > > There was a special case test for the document node in the lxml element > deallocation code - for the /XML/ document node. The HTML document node has to > be treated equally. > > http://codespeak.net/svn/lxml/branch/htmlparser/ > > I integrated the test case now. Any input on this is still appreciated. > > If it turns out to work well, this will be merged into the trunk to be > integrated in 1.0. This is great news! I'll grab the branch and give it a try. Excellent, thanks, Stefan! 
--Paul From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 29 19:03:26 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed Mar 29 19:04:04 2006 Subject: [lxml-dev] malloc issues In-Reply-To: <442AAFBE.8010700@longnow.org> References: <442AAFBE.8010700@longnow.org> Message-ID: <442ABDDE.3010008@gkec.informatik.tu-darmstadt.de> whit wrote: > after installing the latest egg, I have been having issues with seg > faults, bus error and been get lots of errors like these: >> >> python(300) malloc: *** Deallocation of a pointer not malloced: >> 0x628d30; This could be a double free(), or free() called with the >> middle of an allocated block; Try setting environment variable >> MallocHelp to see tools to help debug >> python(300) malloc: *** error for object 0x629910: double free >> python(300) malloc: *** set a breakpoint in szone_error to debug >> >> below are the style sheet and function that expose the problem. >> >> I'm using: >> >> libxslt 1.1.15 >> libxml 2.6.22 >> lxml 0.9(trunk) >> >> gcc-4.0 (osx, tiger) >> pyrex (svn from codespeak) Hi, thanks for reporting this. I cleaned up your test case somewhat and came up with this: ---------------------------- slug = """
""" xslt = """\ """ from lxml import etree xslt_doc = etree.XML(xslt) style = etree.XSLT(xslt_doc) doc = etree.XML(slug) result = style(doc) print 1 del result print 2 ---------------------------- This triggers a double free in _Document.__dealloc__() calling xmlFreeDoc() when result is GCed on "del result". This is exactly the same problem I had with the HTML parser branch. It is triggered because you use "html" as output method, which makes libxslt output HTML trees with HTML document nodes instead of XML document nodes. I've applied the fix to the trunk and the 0.9 branch. Please test it. Stefan From d.w.morriss at gmail.com Wed Mar 29 19:05:46 2006 From: d.w.morriss at gmail.com (whit) Date: Wed Mar 29 19:06:19 2006 Subject: [lxml-dev] Re: malloc issues In-Reply-To: References: <442AAFBE.8010700@longnow.org> Message-ID: <442ABE6A.2050205@longnow.org> I've tried the following permutations:: easy_install lxml easy_install http://codespeak.net/lxml/lxml-0.9.tgz (installs without xslt.py and several other files) untarring and installing http://codespeak.net/lxml/lxml-0.9.tgz (python setup.py install) python setup.py bdist; easy_install dist/....egg building locally doesn't seem to help. same for trunk, and tags of .8 and .9 all get similar errors. -w Olivier Grisel wrote: > whit wrote: >> after installing the latest egg, I have been having issues with seg >> faults, bus error and been get lots of errors like these: >>> >>> python(300) malloc: *** Deallocation of a pointer not malloced: >>> 0x628d30; This could be a double free(), or free() called with the >>> middle of an allocated block; Try setting environment variable >>> MallocHelp to see tools to help debug >>> python(300) malloc: *** error for object 0x629910: double free >>> python(300) malloc: *** set a breakpoint in szone_error to debug >>> >>> below are the style sheet and function that expose the problem. 
>>> >>> I'm using: >>> >>> libxslt 1.1.15 >>> libxml 2.6.22 >>> lxml 0.9(trunk) >>> >>> gcc-4.0 (osx, tiger) >>> pyrex (svn from codespeak) > > Are you using the OSX Tiger egg from pypi or your own egg build from > lxml trunk? > >>> possibly diagnostic and extremely irritating is that I can't back >>> out to my previous version of lxml. > > You can still build lxml from the tarball since you have your own gcc > / pyrex. > -- | david "whit" morriss | | contact :: http://public.xdi.org/=whit "If you don't know where you are, you don't know anything at all" Dr. Edgar Spencer, Ph.D., 1995 "I like to write code like other ppl like to tune their cars or 10kW hifi equipment..." Christian Heimes, 2004 -------------- next part -------------- A non-text attachment was scrubbed... Name: whit.vcf Type: text/x-vcard Size: 181 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060329/873a31cd/whit.vcf From d.w.morriss at gmail.com Wed Mar 29 19:32:57 2006 From: d.w.morriss at gmail.com (whit) Date: Wed Mar 29 19:33:35 2006 Subject: [lxml-dev] malloc issues In-Reply-To: <442ABDDE.3010008@gkec.informatik.tu-darmstadt.de> References: <442AAFBE.8010700@longnow.org> <442ABDDE.3010008@gkec.informatik.tu-darmstadt.de> Message-ID: <442AC4C9.8080608@longnow.org> awesome. that passes on my machine. -w Stefan Behnel wrote: > whit wrote: > >> after installing the latest egg, I have been having issues with seg >> faults, bus error and been get lots of errors like these: >> >>> python(300) malloc: *** Deallocation of a pointer not malloced: >>> 0x628d30; This could be a double free(), or free() called with the >>> middle of an allocated block; Try setting environment variable >>> MallocHelp to see tools to help debug >>> python(300) malloc: *** error for object 0x629910: double free >>> python(300) malloc: *** set a breakpoint in szone_error to debug >>> >>> below are the style sheet and function that expose the problem. 
>>> >>> I'm using: >>> >>> libxslt 1.1.15 >>> libxml 2.6.22 >>> lxml 0.9(trunk) >>> >>> gcc-4.0 (osx, tiger) >>> pyrex (svn from codespeak) >>> > > Hi, thanks for reporting this. I cleaned up your test case somewhat and came > up with this: > > ---------------------------- > slug = """
> >
> """ > > xslt = """\ > > > > > > > > """ > > from lxml import etree > xslt_doc = etree.XML(xslt) > style = etree.XSLT(xslt_doc) > > doc = etree.XML(slug) > result = style(doc) > print 1 > del result > print 2 > ---------------------------- > > This triggers a double free in _Document.__dealloc__() calling xmlFreeDoc() > when result is GCed on "del result". > > This is exactly the same problem I had with the HTML parser branch. It is > triggered because you use "html" as output method, which makes libxslt output > HTML trees with HTML document nodes instead of XML document nodes. I've > applied the fix to trunk and 0.9 branch. Please test it. > > Stefan > > > -- | david "whit" morriss | | contact :: http://public.xdi.org/=whit "If you don't know where you are, you don't know anything at all" Dr. Edgar Spencer, Ph.D., 1995 "I like to write code like other ppl like to tune their cars or 10kW hifi equipment..." Christian Heimes, 2004 -------------- next part -------------- A non-text attachment was scrubbed... Name: whit.vcf Type: text/x-vcard Size: 181 bytes Desc: not available Url : http://codespeak.net/pipermail/lxml-dev/attachments/20060329/21934242/whit.vcf From behnel_ml at gkec.informatik.tu-darmstadt.de Wed Mar 29 20:31:34 2006 From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel) Date: Wed Mar 29 20:32:11 2006 Subject: [lxml-dev] malloc issues In-Reply-To: <442ACF19.70401@longnow.org> References: <442AAFBE.8010700@longnow.org> <442ABDDE.3010008@gkec.informatik.tu-darmstadt.de> <442ACF19.70401@longnow.org> Message-ID: <442AD286.3080604@gkec.informatik.tu-darmstadt.de> whit wrote: > the malloc error returns when I call the my function repeated times in a > doctest. only one warning this time. Sorry, I can't reproduce that. Could you send the code that triggers the warning to the list so that I can check it? Or, even better, could you try to cut it down to a simpler test case that shows the same problems? 
Stefan

From behnel_ml at gkec.informatik.tu-darmstadt.de Thu Mar 30 20:57:09 2006
From: behnel_ml at gkec.informatik.tu-darmstadt.de (Stefan Behnel)
Date: Thu Mar 30 20:55:57 2006
Subject: [lxml-dev] 0.9.1 - bug fix release
Message-ID: <442C2A05.3090508@gkec.informatik.tu-darmstadt.de>

Hello everyone,

I just released 0.9.1, mainly as a bug fix release. Cheeseshop has the source:

http://cheeseshop.python.org/pypi/lxml

Features added:

* lxml.sax.ElementTreeContentHandler checks closing elements and raises SaxError on mismatch
* lxml.sax.ElementTreeContentHandler now supports namespace-less SAX events (startElement, endElement) and defaults to empty attributes (keyword argument)
* zip_safe flag allows setuptools to install lxml as a zipped egg
* Speedup for repeatedly accessing element tag names
* Minor API performance improvements

Bugs fixed:

* Memory deallocation bug: crash when using XSLT output method "html"
* sax.py was handling UTF-8 encoded tag names where it shouldn't
* lxml.tests package will no longer be installed (it is still in the source tarball)

Martijn and I were very happy with the eggs we received for 0.9, so we kindly hope for a similarly overwhelming response to 0.9.1. :)

Have fun,
Stefan

From faassen at infrae.com Fri Mar 31 17:37:14 2006
From: faassen at infrae.com (Martijn Faassen)
Date: Fri Mar 31 17:37:42 2006
Subject: [lxml-dev] 0.9.1 - bug fix release
In-Reply-To: <442C2A05.3090508@gkec.informatik.tu-darmstadt.de>
References: <442C2A05.3090508@gkec.informatik.tu-darmstadt.de>
Message-ID: <442D4CAA.5020104@infrae.com>

Stefan Behnel wrote:
[snip]
> Martijn and I were very happy with the eggs we received for 0.9, so we kindly
> hope for a similarly overwhelming response to 0.9.1. :)

So far so good; I already looked at a nicely overwhelming list of things for 0.9.1 today.

I've just uploaded 32-bit Linux eggs for both Python 2.3 and Python 2.4. I've also updated the website so it has the new release information on it as well.

Regards,

Martijn
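
For readers who want to see what the "checks closing elements" item in the 0.9.1 notes amounts to, here is a rough sketch. It is not lxml's code: the class name TreeBuildingHandler is made up for illustration, it builds a standard-library ElementTree rather than an lxml tree, and it raises the standard xml.sax.SAXException where lxml is said to raise its own SaxError. It only assumes the behaviour described in the notes: namespace-less startElement/endElement events, attributes that default to empty, and an error when a closing tag does not match the open element.

```python
import xml.sax
import xml.etree.ElementTree as ET
from xml.sax.handler import ContentHandler


class TreeBuildingHandler(ContentHandler):
    """Hypothetical stand-in for a tree-building SAX handler (not lxml's)."""

    def __init__(self):
        super().__init__()
        self.root = None   # root element once the first start event arrives
        self._stack = []   # currently open elements, innermost last

    def startElement(self, name, attrs=None):
        # Attributes default to empty, as the release notes describe.
        attributes = dict(attrs) if attrs is not None else {}
        if self._stack:
            element = ET.SubElement(self._stack[-1], name, attributes)
        else:
            element = ET.Element(name, attributes)
            self.root = element
        self._stack.append(element)

    def endElement(self, name):
        # The closing tag must match the innermost open element.
        if not self._stack or self._stack[-1].tag != name:
            raise xml.sax.SAXException("unexpected closing tag: %s" % name)
        self._stack.pop()

    def characters(self, content):
        if not self._stack:
            return  # ignore text outside the root element
        current = self._stack[-1]
        if len(current):
            # Text after a child element belongs to that child's tail.
            last_child = current[-1]
            last_child.tail = (last_child.tail or "") + content
        else:
            current.text = (current.text or "") + content
```

Feeding it startElement("a"), startElement("b"), characters("hi"), endElement("b"), endElement("a") by hand leaves a small tree in handler.root, while startElement("a") followed by endElement("b") raises the exception. The real lxml.sax handler may differ in details such as namespace handling.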