From maxkisselew at googlemail.com Sat Oct 2 00:15:16 2010
From: maxkisselew at googlemail.com (Max Kisselew)
Date: Sat, 2 Oct 2010 00:15:16 +0200
Subject: [lxml-dev] Performance gets bad when parsing xml with namespaces
In-Reply-To: <4CA4B0F9.40407@behnel.de>
References: <4CA4B0F9.40407@behnel.de>
Message-ID:

I have also tried the new version of libxml2, but it wasn't successful
either.

I've uploaded my test program and the xml file to be parsed to:
ftp://ftp.ims.uni-stuttgart.de/pub/outgoing/xpath_perf_problem.tar.gz

I would be very thankful if you could look through it and give me some
hints on how to solve this problem. No matter which approach I choose,
the result is always the same.

Max

2010/9/30 Stefan Behnel :
> Max Kisselew, 24.09.2010 01:51:
>>
>> recently I discovered a problem with lxml/LibXML2.
>> I guess it's likely that the problem comes from Libxml2.
>
> What version are you using? If it's a 2.6.x version, try 2.7 instead.
>
>
>> I'm working on a university project where I use Python and lxml for
>> xml parsing and processing. First there were no namespace definitions
>> in the xml files
>> we used but recently the format has slightly changed and some namespace
>> definitions were added. Here is the xml format as it was in the beginning:
>>
>>
>>
>>
>> IMS, Uni Stuttgart
>>
>>
>> European Medicines Agency
>> EMEA/H/C/471 [...]
>> Wegen
>>
>> European
>> Medicines
>> [...]
>> Wegen
>>
>>
>> [...]
>>
>>
>>
>>
>>
>> And here is the xml with the recently added namespace definitions:
>>
>>
>>
>>
>> IMS, Uni Stuttgart
>>
>>
>> European Medicines Agency
>> EMEA/H/C/471 [...]
>> Wegen
>>
>> European
>> Medicines
>> [...]
>> Wegen
>>
>>
>> [...]
>>
>>
>>
>>
>>
>> I wanted to extract all the content from the ?elements. In the xml
>> file without the namespace definitions that takes just a moment (less
>> than 30 seconds).
>> But when I tried to perform the same on the new file with namespaces, it
>> took much longer, more than 30 minutes (!). The xml file was about 7 MB.
>
> 7 MB is pretty small, so I'm surprised about that difference (although 30
> seconds sounds pretty long already). Did you try to declare all three
> namespaces on the root element using different prefixes, instead of
> redeclaring them without a prefix all over the place?
>
> What code do you use for parsing and searching? There are many ways to do
> the above in lxml.etree, and some of them are much faster than others.
>
> Have a look here:
>
> http://codespeak.net/lxml/performance.html
>
> and especially here:
>
> http://codespeak.net/lxml/performance.html#a-longer-example
>
> Stefan
>

From lists at cheimes.de Sun Oct 3 19:27:50 2010
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 03 Oct 2010 19:27:50 +0200
Subject: [lxml-dev] DLL issue with win32 static builds of lxml
In-Reply-To:
References: <303CBE02A7EFC949B6A71CBF4FC86B752BFC95BD5E@MBX2.EXCHPROD.USA.NET>
Message-ID:

On 30.09.2010 22:47, Sidnei da Silva wrote:
> On Thu, Sep 30, 2010 at 4:26 PM, Scott Smith wrote:
>> Does anyone know what I need to do? What changed after the 2.2.4 lxml
>> builds?
>
> There's actually quite some things that changed. The laptop where I
> built the 2.2.4 version was stolen so I had to do everything from
> scratch. The version of libxml2 was changed as well, and things like
> that.

The issue smells like a problem with embedded manifests and dependency
on the MSVCRT assembly. Are you embedding a manifest in the .pyd files
or libxml2 and libxslt DLLs? During the development of Python 2.6 MvL
has removed all manifests from Python extensions. It solved several
issues related to not finding MSVCRT. Since all Python extensions depend
on pythonXX.dll and the Python DLL depends on MSVCRT, the MSVCRT
assembly is loaded anyway.

Christian

From sidnei.da.silva at canonical.com Mon Oct 4 15:17:44 2010
From: sidnei.da.silva at canonical.com (Sidnei da Silva)
Date: Mon, 4 Oct 2010 10:17:44 -0300
Subject: [lxml-dev] DLL issue with win32 static builds of lxml
In-Reply-To:
References: <303CBE02A7EFC949B6A71CBF4FC86B752BFC95BD5E@MBX2.EXCHPROD.USA.NET>
Message-ID:

On Sun, Oct 3, 2010 at 2:27 PM, Christian Heimes wrote:
> The issue smells like a problem with embedded manifests and dependency
> on the MSVCRT assembly. Are you embedding a manifest in the .pyd files
> or libxml2 and libxslt DLLs? During the development of Python 2.6 MvL
> has removed all manifests from Python extensions. It solved several
> issues related to not finding MSVCRT. Since all Python extensions depend
> on pythonXX.dll and the Python DLL depends on MSVCRT, the MSVCRT
> assembly is loaded anyway.

Now that you mention it, I think I had manually patched my local
installation to disable the manifest embedding like MvL's patch on the
laptop that was stolen, but not on the new one. I'll double check.

-- Sidnei

From agroszer at gmail.com Mon Oct 4 16:35:57 2010
From: agroszer at gmail.com (Adam GROSZER)
Date: Mon, 4 Oct 2010 16:35:57 +0200
Subject: [lxml-dev] DLL issue with win32 static builds of lxml
In-Reply-To:
References: <303CBE02A7EFC949B6A71CBF4FC86B752BFC95BD5E@MBX2.EXCHPROD.USA.NET>
Message-ID: <1856919130.20101004163557@gmail.com>

Hello Sidnei,

Would you mind updating the build instructions on the webpage?

Monday, October 4, 2010, 3:17:44 PM, you wrote:

SdS> On Sun, Oct 3, 2010 at 2:27 PM, Christian Heimes wrote:
>> The issue smells like a problem with embedded manifests and dependency
>> on the MSVCRT assembly. Are you embedding a manifest in the .pyd files
>> or libxml2 and libxslt DLLs? During the development of Python 2.6 MvL
>> has removed all manifests from Python extensions. It solved several
>> issues related to not finding MSVCRT.
>> Since all Python extensions depend
>> on pythonXX.dll and the Python DLL depends on MSVCRT, the MSVCRT
>> assembly is loaded anyway.

SdS> Now that you mention it, I think I had manually patched my local
SdS> installation to disable the manifest embedding like MvL's patch on the
SdS> laptop that was stolen, but not on the new one. I'll double check.

SdS> -- Sidnei

--
Best regards,
 Adam GROSZER mailto:agroszer at gmail.com
--
Quote of the day:
Ceremony and great professing renders friendship as much suspect as it
does religion.
- William Wycherley

From l at lrowe.co.uk Tue Oct 5 12:23:43 2010
From: l at lrowe.co.uk (Laurence Rowe)
Date: Tue, 5 Oct 2010 11:23:43 +0100
Subject: [lxml-dev] lxml-dev Digest, Vol 72, Issue 4
In-Reply-To: <4C976D53.70202@extremepro.gr>
References: <4C976D53.70202@extremepro.gr>
Message-ID:

On 20 September 2010 15:18, Dimitrios Pritsos wrote:
> On 20/09/10 14:46, lxml-dev-request at codespeak.net wrote:
>> Send lxml-dev mailing list submissions to
>>       lxml-dev at codespeak.net
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>       http://codespeak.net/mailman/listinfo/lxml-dev
>> or, via email, send a message with subject or body 'help' to
>>       lxml-dev-request at codespeak.net
>>
>> You can reach the person managing the list at
>>       lxml-dev-owner at codespeak.net
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of lxml-dev digest..."
>>
>>
>> Today's Topics:
>>
>>     1. Re: lxml 2.3 beta 1 released (Pascal)
>>     2. Finding the media-type and method of an XSLT (Laurence Rowe)
>>     3. lxml can't work with html5lib (flya flya)
>>     4. Re: Access to ElementTree for XML schema (Dave Kuhlman)
>>     5. lxml.html.submit_form and unicode values (Eugene Van den Bulke)
>>     6.
>>        Re: lxml.html.submit_form and unicode values (Vojtěch Rylko)
>>     7. Re: lxml.html.submit_form and unicode values
>>        (Eugene Van den Bulke)
>>     8. Re: lxml.html.submit_form and unicode values
>>        (Eugene Van den Bulke)
>>     9. Problem with utf-8 and etree (Vetoshkin Nikita)
>>
>
>
>>    10. Concurrent Programming and lxml but recipe for Newbies like
>>        me - any solution? (Dimitrios Pritsos)
>>
>
> Ok I have a correction to make in 10. I posted earlier: In fact the
> Threading Setup is working Fine STILL the MultiProcessing Set up is NOT.
> One problem that occurred is that Queue.Queue() within a Manager()
> object has some kind of problem transferring Trees, while Queue.Queue()
> works fine with Threads.
>
> Any Idea how the MultiProcessing setup could work ?

Serialise the tree before passing it between processes. Serialising /
parsing with libxml2 is normally very fast.

Laurence

From l at lrowe.co.uk Tue Oct 5 12:44:20 2010
From: l at lrowe.co.uk (Laurence Rowe)
Date: Tue, 5 Oct 2010 11:44:20 +0100
Subject: [lxml-dev] lxml-dev Digest, Vol 72, Issue 4
In-Reply-To: <4CAAFFD9.3020707@extremepro.gr>
References: <4C976D53.70202@extremepro.gr> <4CAAFFD9.3020707@extremepro.gr>
Message-ID:

On 5 October 2010 11:37, Dimitrios Pritsos wrote:
> On 05/10/10 13:23, Laurence Rowe wrote:
>>
>> On 20 September 2010 15:18, Dimitrios Pritsos
>> wrote:
>>
>>>
>>> On 20/09/10 14:46, lxml-dev-request at codespeak.net wrote:
>>>
>>>>
>>>> Send lxml-dev mailing list submissions to
>>>>       lxml-dev at codespeak.net
>>>>
>>>> To subscribe or unsubscribe via the World Wide Web, visit
>>>>       http://codespeak.net/mailman/listinfo/lxml-dev
>>>> or, via email, send a message with subject or body 'help' to
>>>>       lxml-dev-request at codespeak.net
>>>>
>>>> You can reach the person managing the list at
>>>>       lxml-dev-owner at codespeak.net
>>>>
>>>> When replying, please edit your Subject line so it is more specific
>>>> than "Re: Contents of lxml-dev digest..."
>>>>
>>>>
>>>> Today's Topics:
>>>>
>>>>     1. Re: lxml 2.3 beta 1 released (Pascal)
>>>>     2. Finding the media-type and method of an XSLT (Laurence Rowe)
>>>>     3. lxml can't work with html5lib (flya flya)
>>>>     4. Re: Access to ElementTree for XML schema (Dave Kuhlman)
>>>>     5. lxml.html.submit_form and unicode values (Eugene Van den Bulke)
>>>>     6. Re: lxml.html.submit_form and unicode values (Vojtěch Rylko)
>>>>     7. Re: lxml.html.submit_form and unicode values
>>>>        (Eugene Van den Bulke)
>>>>     8. Re: lxml.html.submit_form and unicode values
>>>>        (Eugene Van den Bulke)
>>>>     9. Problem with utf-8 and etree (Vetoshkin Nikita)
>>>>
>>>>
>>>
>>>
>>>>
>>>>    10. Concurrent Programming and lxml but recipe for Newbies like
>>>>        me - any solution? (Dimitrios Pritsos)
>>>>
>>>>
>>>
>>> Ok I have a correction to make in 10. I posted earlier: In fact the
>>> Threading Setup is working Fine STILL the MultiProcessing Set up is NOT.
>>> One problem that occurred is that Queue.Queue() within a Manager()
>>> object has some kind of problem transferring Trees, while Queue.Queue()
>>> works fine with Threads.
>>>
>>> Any Idea how the MultiProcessing setup could work ?
>>>
>>
>> Serialise the tree before passing it between processes. Serialising /
>> parsing with libxml2 is normally very fast.
>>
>> Laurence
>>
>>
>
> Thank you very much Laurence
>
> but what exactly do you mean? to .tostring() it or any other function that I
> am not aware of?

lxml.etree.tostring() should be just fine for most cases. If you have
really large documents then you may want to stream the data over a
socket with ElementTree.write() and use feed parsing at the other end.
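
A minimal sketch of the tostring() approach described above, not code
taken from this thread: the tree is serialised to bytes, the bytes are
handed to a worker process, and the worker parses them back into a tree.
The helper names pack() and count_links(), the sample documents and the
choice of multiprocessing.Pool are assumptions made for illustration. The
fallback import is only there so that the sketch also runs where lxml is
not installed.

```python
import multiprocessing

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib ElementTree offers the same two calls used here.
    import xml.etree.ElementTree as etree


def pack(root):
    """Serialise an element tree to bytes so it can be sent to a worker."""
    return etree.tostring(root)


def count_links(data):
    """Worker: parse the bytes back into a tree and count its <a> elements."""
    root = etree.fromstring(data)
    return len(root.findall(".//a"))


if __name__ == "__main__":
    pages = [
        etree.fromstring("<html><body><a/><a/></body></html>"),
        etree.fromstring("<html><body><a/></body></html>"),
    ]
    # Only plain bytes cross the process boundary, never Element objects.
    payloads = [pack(page) for page in pages]
    with multiprocessing.Pool(2) as pool:
        print(pool.map(count_links, payloads))  # prints [2, 1]
```

The same pack() output could equally be put on a multiprocessing queue;
the point is only that bytes, unlike the tree itself, can be passed
between processes.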

Laurence

From l at lrowe.co.uk Tue Oct 5 15:40:23 2010
From: l at lrowe.co.uk (Laurence Rowe)
Date: Tue, 5 Oct 2010 14:40:23 +0100
Subject: [lxml-dev] lxml-dev Digest, Vol 72, Issue 4
In-Reply-To: <4CAB08F5.4000505@extremepro.gr>
References: <4C976D53.70202@extremepro.gr> <4CAAFFD9.3020707@extremepro.gr> <4CAB08F5.4000505@extremepro.gr>
Message-ID:

On 5 October 2010 12:16, Dimitrios Pritsos wrote:
> On 05/10/10 13:44, Laurence Rowe wrote:
...
>> lxml.etree.tostring() should be just fine for most cases. If you have
>> really large documents then you may want to stream the data over a
>> socket with ElementTree.write() and use feed parsing the other end.
>
> Thank you very much the socket solution it seems more neat solution even if
> my trees are not very big, since most of them are regular webpages...

That really depends on what you're doing with them. In your other post
you use a queue (presumably with multiple consumers to make use of the
concurrency); that's not really going to work with a stream-based
approach.

Also, remember that lxml releases the GIL while libxml2 / libxslt are
doing their work, so you may not need to use multiple processes to
make use of multiple cores in your program (at least if most of the
work is done by lxml rather than your python code).

Laurence

From stefan_ml at behnel.de Tue Oct 5 16:29:24 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 05 Oct 2010 16:29:24 +0200
Subject: [lxml-dev] lxml-dev Digest, Vol 72, Issue 4
In-Reply-To:
References: <4C976D53.70202@extremepro.gr> <4CAAFFD9.3020707@extremepro.gr>
Message-ID: <4CAB3644.9080106@behnel.de>

Laurence Rowe, 05.10.2010 12:44:
> lxml.etree.tostring() should be just fine for most cases. If you have
> really large documents then you may want to stream the data over a
> socket with ElementTree.write() and use feed parsing the other end.

However, note that tostring() is usually much faster than writing to a
socket (or pipe) incrementally using ET.write(), even if you still write
the entire result string to the socket after serialisation. Plus, it allows
you to clean up the in-memory tree *before* writing the bytes sequence to
the socket, so that the receiving side (assuming it's on the same machine)
will have more memory available to parse it back into a tree.

Stefan

From njriley at illinois.edu Wed Oct 6 06:26:12 2010
From: njriley at illinois.edu (Nicholas Riley)
Date: Tue, 5 Oct 2010 23:26:12 -0500
Subject: [lxml-dev] Unexpected lxml.etree.XMLSyntaxError: Attempt to load network entity...
Message-ID:

Hi,

I'm getting "lxml.etree.XMLSyntaxError: Attempt to load network entity "
when I try to use lxml.etree.parse(), with lxml 2.2.8 and 2.3 beta 1 (in
Apple's Python 2.6.1 on Mac OS X 10.6.4 with libxml 2.7.7). I tried
building from Subversion, but that did not work (see the bug I just
filed).

As far as I can tell, this should not happen - I should be able to parse
an XML file from the network with the default lxml configuration. And in
fact, it did not happen when parsing URLs for several days. Then at some
point today, it stopped working. I could not replicate this myself by
simply parsing the same set of URLs outside my program (4 URLs parse
successfully then one fails), but before I go any further trying to
isolate the problem, does anyone have an idea why this is happening?

I can work around it with set_default_parser(XMLParser(no_network=False))
for the moment, so this is not urgent.

Thanks,

--
Nicholas Riley

From shigin at rambler-co.ru Wed Oct 6 09:40:13 2010
From: shigin at rambler-co.ru (Alexander Shigin)
Date: Wed, 06 Oct 2010 11:40:13 +0400
Subject: [lxml-dev] XSLT: Issues encountered when transforming docbook
In-Reply-To: <20100903161533.20e66719@Bidule.intranet.cs>
References: <20100903161533.20e66719@Bidule.intranet.cs>
Message-ID: <1286350813.29843.587028.camel@atlas>

On Fri, 03/09/2010 at 16:15 -0400, Jérôme Carretero wrote:
> Hi all,
> ......
> Everything works fine when using xsltproc.
>
> I tried the following :
> ......
> kw = {
>  "olink.base.uri" : "doc.html",
>  "collect.xref.targets" : "yes",
>  "targets.filename" : "doc.html.db",
>  "target.database.document" : "olinkdb-html.xml",
> }
>
> It seems that the silent failure is a bug, and I may be missing some stuff
> in order to convert the documents properly.
>
> More complete test cases available upon request.

I've checked your code from the git repo. The difference between your
python code and xsltproc is that you do use stringparam with xsltproc but
not with lxml.

It looks like using XSLT.strparam solves your problem. Please look at the
attached patch.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strparam.patch
Type: text/x-patch
Size: 897 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20101006/5c2d87c0/attachment.bin

From steven.vereecken at gmail.com Wed Oct 6 10:47:42 2010
From: steven.vereecken at gmail.com (Steven Vereecken)
Date: Wed, 6 Oct 2010 10:47:42 +0200
Subject: [lxml-dev] Unexpected lxml.etree.XMLSyntaxError: Attempt to load network entity...
In-Reply-To:
References:
Message-ID:

Are the files that are giving you problems now different than before, more
specifically: do they contain a reference to a dtd while they did not
before (or using a different system identifier or something)?

"no_network" (as I understand it) only applies to loading external,
"secondary" documents (like a referenced dtd), and not to the document
itself. But the doctype declaration could trigger this case.

Steven

2010/10/6 Nicholas Riley :
> Hi,
>
> I'm getting "lxml.etree.XMLSyntaxError: Attempt to load network entity "
> when I try to use lxml.etree.parse(), with lxml 2.2.8 and 2.3 beta 1 (in
> Apple's Python 2.6.1 on Mac OS X 10.6.4 with libxml 2.7.7).
> I tried building from Subversion, but that did not work (see the bug I
> just filed).
>
> As far as I can tell, this should not happen - I should be able to parse
> an XML file from the network with the default lxml configuration. And in
> fact, it did not happen when parsing URLs for several days. Then at some
> point today, it stopped working. I could not replicate this myself by
> simply parsing the same set of URLs outside my program (4 URLs parse
> successfully then one fails), but before I go any further trying to
> isolate the problem, does anyone have an idea why this is happening?
>
> I can work around it with set_default_parser(XMLParser(no_network=False))
> for the moment, so this is not urgent.
>
> Thanks,
>
> --
> Nicholas Riley
>

From noah at mahalo.com Wed Oct 6 19:08:51 2010
From: noah at mahalo.com (Silas, Noah)
Date: Wed, 6 Oct 2010 10:08:51 -0700
Subject: [lxml-dev] Unofficial GitHub Mirror
Message-ID:

Hi lxml-developers!

I've set up a github mirror of the lxml svn repo. It updates the branch
'master' to reflect the current state of lxml trunk every 15 minutes.
(Thanks to Mahalo.com for supplying the server performing that sync.)

http://github.com/noah256/lxml

This is an unofficial mirror, and it is not currently tracking any branches
in the lxml svn (if there is demand for this, please let me know).

This mirror is provided simply to enable developers to easily grab and work
with the code; I will not be respecting pull requests to this repo, so you
will still need to use the normal process to get your code into SVN.

Happy Coding!

~Noah Silas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20101006/a079cfa4/attachment.htm

From hop at g.pl Fri Oct 8 21:59:18 2010
From: hop at g.pl (hubert poduszczak)
Date: Fri, 8 Oct 2010 21:59:18 +0200
Subject: [lxml-dev] IOError
Message-ID:

I constantly get this error:

IOError: Error reading file
'http://allegro.pl/special_listing.php?type=new&buy=2': failed to load
external entity "http://allegro.pl/special_listing.php?type=new&buy=2"

when I try to run this program:

import lxml.html

def output():
    parser = lxml.html.HTMLParser()
    tree = lxml.html.parse('http://allegro.pl/special_listing.php?type=new&buy=2')
    result = lxml.html.tostring(tree)
    r = tree.xpath("//div[@id='tabMainBox']//a[@class='alleLink']/span/text()")
    return r

What's wrong with it? A similar program written in PHP works perfectly on
the server, and this program also works on my machine, which has lxml
installed, when run from the console, but it fails to run on the remote
server. Why?

Thanks in advance
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20101008/87164276/attachment.htm

From svetlyak.40wt at gmail.com Sat Oct 9 14:30:53 2010
From: svetlyak.40wt at gmail.com (Alexander Artemenko)
Date: Sat, 9 Oct 2010 16:30:53 +0400
Subject: [lxml-dev] Html cleaner improvement
Message-ID:

Hi all!

Let me suggest an HTML Cleaner improvement. It allows switching between
HTML and XHTML output serialisation.

Please look at the patch in the attachment.

--
Alexander Artemenko (a.k.a. Svetlyak 40wt)
Blog: http://aartemenko.com
Photos: http://svetlyak.ru
Jabber: svetlyak.40wt at gmail.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clean.patch
Type: application/octet-stream
Size: 2213 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20101009/39c0330f/attachment.obj

From stefan_ml at behnel.de Mon Oct 11 18:52:18 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 11 Oct 2010 18:52:18 +0200
Subject: [lxml-dev] Attribute Whitelist in lxml.html.clean
In-Reply-To:
References:
Message-ID: <4CB340C2.7020202@behnel.de>

Richard Zurad, 21.09.2010 21:31:
> In the __call__ method where we do javascript
> sanitization if the constructor was called with javascript=True (or the
> default behavior), there is code that simply skips javascript sanitization
> on attributes if the safe_attrs_only is True since the feedparser whitelist
> does not include event attributes. This leads to a question of what to do if
> the object is instantiated with, for example, an attribute whitelist that
> includes 'onchange'? Would it be safe to assume that if an event attribute
> is passed in as part of the whitelist, we want to allow javascript on that
> attribute and that attribute only? Or should we raise an error because it
> doesn't make sense to pass in an event attribute in the whitelist along with
> javascript=True?

Absolutely not an error case. I think a whitelist naturally takes
precedence over a more generic option that deletes "everything Javascript"
(and other generic options that delete some attributes).

Stefan

From stefan_ml at behnel.de Mon Oct 11 18:59:14 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 11 Oct 2010 18:59:14 +0200
Subject: [lxml-dev] lxml.html.submit_form and unicode values
In-Reply-To:
References:
Message-ID: <4CB34262.6010509@behnel.de>

Eugene Van den Bulke, 15.09.2010 08:08:
> I encountered a unicode problem trying to submit the following form.
>
> action="http://recherche2.assemblee-nationale.fr/resultats_tribun.jsp"
> id="Lien1">
>
>
>
>
>

Hmm, yes, looks like the form handling code doesn't properly encode the
values. That's a bug.

Does anyone know what the correct encoding is for submitting the form? Is
it the original encoding of the page? And: what should happen if the values
cannot be encoded? Maybe an explicit encoding option would take care of
this case.

Stefan

From brecht123 at gmail.com Fri Oct 15 11:14:24 2010
From: brecht123 at gmail.com (Brecht Schoolmeesters)
Date: Fri, 15 Oct 2010 11:14:24 +0200
Subject: [lxml-dev] Bug in Xpath class?
Message-ID: <4CB81B70.3010606@gmail.com>

Hello,

I am using version lxml-2.2.8-py2.6.
I think I have found a bug in the Xpath class. When I use this tree:

1
2
3

and use this to evaluate an xpath expression:

root = etree.XML("123")
find = etree.XPath("//a")
print(find(root)[0].text)

it returns "None". Shouldn't this return "2" ?
When I run "//a/i/text()", it returns the correct values.
The "2" is nowhere to be found, not in the children of the "a" attribute
or anywhere else.

I really need this code to work. Is it possible to provide a solution
for this?

thanks in advance,

Brecht

From stefan_ml at behnel.de Fri Oct 15 11:29:04 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 15 Oct 2010 11:29:04 +0200
Subject: [lxml-dev] Bug in Xpath class?
In-Reply-To: <4CB81B70.3010606@gmail.com>
References: <4CB81B70.3010606@gmail.com>
Message-ID: <4CB81EE0.6020309@behnel.de>

Brecht Schoolmeesters, 15.10.2010 11:14:
>
>
> 1
> 2
> 3
>
>
> [...]
> The "2" is nowhere to be found, not in the children of the "a" attribute
> or anywhere else.

http://codespeak.net/lxml/tutorial.html#elements-contain-text

Stefan

From nicolas at nexedi.com Fri Oct 15 16:35:11 2010
From: nicolas at nexedi.com (Nicolas Delaby)
Date: Fri, 15 Oct 2010 16:35:11 +0200
Subject: [lxml-dev] patch: BeautifulSoup compatibility against trunk
Message-ID: <4CB8669F.2050401@nexedi.com>

Hi,

It seems that the beautifulsoup dev team has modified the namespace and
API of the module.
I'm proposing a patch (against @77987) that addresses this issue.

Regards,
Nicolas

Index: soupparser.py
===================================================================
--- soupparser.py (revision 77987)
+++ soupparser.py (working copy)
@@ -4,8 +4,13 @@
 __all__ = ["fromstring", "parse", "convert_tree"]
 
 from lxml import etree, html
-from BeautifulSoup import \
-    BeautifulSoup, Tag, Comment, ProcessingInstruction, NavigableString
+try:
+    from BeautifulSoup import \
+        BeautifulSoup, Tag, Comment, ProcessingInstruction, NavigableString
+except ImportError:
+    from beautifulsoup import BeautifulSoup
+    from beautifulsoup.builder import HTMLParserTreeBuilder
+    from beautifulsoup.element import Tag, Comment, ProcessingInstruction, NavigableString
 
 
 def fromstring(data, beautifulsoup=None, makeelement=None, **bsargs):
@@ -63,7 +68,8 @@
         makeelement = html.html_parser.makeelement
     if 'convertEntities' not in bsargs:
         bsargs['convertEntities'] = 'html'
-    tree = beautifulsoup(source, **bsargs)
+    builder = HTMLParserTreeBuilder(**bsargs)
+    tree = beautifulsoup(source, builder=builder)
     root = _convert_tree(tree, makeelement)
     # from ET: wrap the document in a html root element, if necessary
     if len(root) == 1 and root[0].tag == "html":

--
Nicolas Delaby
Nexedi: Consulting and Development of Libre / Open Source Software
http://www.nexedi.com/

From stefan_ml at behnel.de Fri Oct 15 17:07:28 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 15 Oct 2010 17:07:28 +0200
Subject: [lxml-dev] patch: BeautifulSoup compatibility against trunk
In-Reply-To: <4CB8669F.2050401@nexedi.com>
References: <4CB8669F.2050401@nexedi.com>
Message-ID: <4CB86E30.3010107@behnel.de>

Nicolas Delaby, 15.10.2010 16:35:
> It seems that the beautifulsoup dev team has modified the namespace and
> API of the module.

In what version of BS is that?

Stefan

From nicolas at nexedi.com Fri Oct 15 17:25:13 2010
From: nicolas at nexedi.com (Nicolas Delaby)
Date: Fri, 15 Oct 2010 17:25:13 +0200
Subject: [lxml-dev] patch: BeautifulSoup compatibility against trunk
In-Reply-To: <4CB86E30.3010107@behnel.de>
References: <4CB8669F.2050401@nexedi.com> <4CB86E30.3010107@behnel.de>
Message-ID: <4CB87259.2030107@nexedi.com>

On 15/10/2010 17:07, Stefan Behnel wrote:
> Nicolas Delaby, 15.10.2010 16:35:
>> It seems that the beautifulsoup dev team has modified the namespace and
>> API of the module.
>
> In what version of BS is that?
>

Hi,
http://www.crummy.com/software/BeautifulSoup/ explains that the source
code is hosted on launchpad.
My patch was made against the trunk branch:
http://bazaar.launchpad.net/~leonardr/beautifulsoup/trunk/files

Sorry if I'm wrong, just discard the patch.

Regards,
Nicolas

--
Nicolas Delaby
Nexedi: Consulting and Development of Libre / Open Source Software
http://www.nexedi.com/

From john at milo.com Sun Oct 17 07:20:53 2010
From: john at milo.com (John Evans)
Date: Sat, 16 Oct 2010 22:20:53 -0700
Subject: [lxml-dev] smart_strings=False not working?
Message-ID:

Hi,

It's my understanding that if you pass the smart_strings=False flag to
tree.xpath it should return regular python str objects but that does not
seem to be the case:

In [1]: from lxml import etree

In [2]: etree.LXML_VERSION
Out[2]: (2, 2, 2, 0)

In [3]: from lxml import html

In [4]: tree = html.fromstring("foo")

In [5]: type(tree.xpath("//b")[0].text_content())
Out[5]:

In [6]: type(tree.xpath("//b", smart_strings=False)[0].text_content())
Out[6]:

Am I missing something or is this a bug?

Thanks,
- John
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://codespeak.net/pipermail/lxml-dev/attachments/20101016/24ab65d9/attachment.htm

From piet at vanoostrum.org Sun Oct 17 16:02:25 2010
From: piet at vanoostrum.org (Piet van Oostrum)
Date: Sun, 17 Oct 2010 10:02:25 -0400
Subject: [lxml-dev] smart_strings=False not working?
In-Reply-To:
References:
Message-ID: <19643.497.646279.615891@cochabamba.vanoostrum.org>

John Evans wrote:

> Hi,
>
> It's my understanding that if you pass the smart_strings=False flag
> to tree.xpath it should return regular python str objects but that
> does not seem to be the case:
>
> In [1]: from lxml import etree
>
> In [2]: etree.LXML_VERSION Out[2]: (2, 2, 2, 0)
>
> In [3]: from lxml import html
>
> In [4]: tree = html.fromstring("foo")
>
> In [5]: type(tree.xpath("//b")[0].text_content()) Out[5]:
> 'lxml.etree._ElementStringResult'>
>
> In [6]: type(tree.xpath("//b",
> smart_strings=False)[0].text_content()) Out[6]:
> 'lxml.etree._ElementStringResult'>
>
> Am I missing something or is this a bug?
>

smart_strings applies only to those results of XPath that are
strings. Your XPath expression returns an Element (HtmlElement to be
precise) and that doesn't have any information about the smart_strings
parameter.

The following works:

>>> r1 = tree.xpath("//b/text()")[0]
>>> type(r1)

>>> r2 = tree.xpath("//b/text()", smart_strings=False)[0]
>>> type(r2)

>>>
--
Piet van Oostrum
Cochabamba. URL: http://pietvanoostrum.com/

From frank at chagford.com Tue Oct 19 09:48:23 2010
From: frank at chagford.com (Frank Millman)
Date: Tue, 19 Oct 2010 09:48:23 +0200
Subject: [lxml-dev] How to create objectify SubElement with text
Message-ID: <20101019081932.179AE36C228@codespeak.net>

Hi all

I can create an objectify SubElement ok, but I cannot see how to add text
to it.

As the docs say, it is immutable, so you cannot add text after it has been
created.

But I cannot see anywhere in the constructor where you can include an
argument for a text string.

If I add 'text=xxx', it creates an attribute called 'text', which is not
what is required.

What am I missing?

Thanks

Frank Millman

From stefan_ml at behnel.de Tue Oct 19 10:34:13 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 19 Oct 2010 10:34:13 +0200
Subject: [lxml-dev] How to create objectify SubElement with text
In-Reply-To: <20101019081932.179AE36C228@codespeak.net>
References: <20101019081932.179AE36C228@codespeak.net>
Message-ID: <4CBD5805.5030301@behnel.de>

Frank Millman, 19.10.2010 09:48:
> I can create an objectify SubElement ok, but I cannot see how to add text to
> it.
>
> As the docs say, it is immutable, so you cannot add text after it has been
> created.
>
> But I cannot see anywhere in the constructor where you can include an
> argument for a text string.
>
> If I add 'text=xxx', it creates an attribute called 'text', which is not
> what is required.
>
> What am I missing?

The API, basically. ;)

    root.child = "5"

creates a "child" subelement with text "5".

Stefan

From frank at chagford.com Tue Oct 19 11:56:24 2010
From: frank at chagford.com (Frank Millman)
Date: Tue, 19 Oct 2010 11:56:24 +0200
Subject: [lxml-dev] How to create objectify SubElement with text
In-Reply-To: <4CBD5805.5030301@behnel.de>
Message-ID: <20101019095647.B26B8282B90@codespeak.net>

Stefan Behnel wrote:
>
> Frank Millman, 19.10.2010 09:48:
> > I can create an objectify SubElement ok, but I cannot see
> > how to add text to it.
> >
> > As the docs say, it is immutable, so you cannot add text
> > after it has been created.
> >
> > But I cannot see anywhere in the constructor where you can
> > include an argument for a text string.
> >
> > If I add 'text=xxx', it creates an attribute called 'text',
> > which is not what is required.
> >
> > What am I missing?
>
> The API, basically. ;)
>
> root.child = "5"
>
> creates a "child" subelement with text "5".
>

Thanks, Stefan. It works, but it does not do what I want.
Firstly, hypothetically, how would you add a SubElement with text *and* with attributes? Secondly, and this is my actual problem, if I convert the result to a string using 'tostring', it adds the 'pytype' namespace information to each sub-element, which I don't really want. The original tree is parsed in from a file, and if I convert that back to a string using 'tostring' before adding any sub-elements, the namespace information does not appear. In this case it is not a serious problem, as I can easily create the required string programmatically, without using lxml. But it would be nice to know if it is possible. Thanks Frank From jholg at gmx.de Tue Oct 19 14:26:20 2010 From: jholg at gmx.de (jholg at gmx.de) Date: Tue, 19 Oct 2010 14:26:20 +0200 Subject: [lxml-dev] How to create objectify SubElement with text In-Reply-To: <20101019095647.B26B8282B90@codespeak.net> References: <20101019095647.B26B8282B90@codespeak.net> Message-ID: <20101019122620.117100@gmx.net> Hi, > Thanks, Stefan. It works, but it does not do what I want. > > Firstly, hypothetically, how would you add a SubElement with text *and* > with attributes?
>>> root = objectify.Element('root')
>>> root.s = objectify.DataElement("5", foo='bar')
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    s = '5' [StringElement]
      * py:pytype = 'str'
      * foo = 'bar'
>>> print etree.tostring(root, pretty_print=True)
5
>>>
> Secondly, and this is my actual problem, if I convert the result to a > string using 'tostring', it adds the 'pytype' namespace information to each > sub-element, which I don't really want. The original tree is parsed in > from a file, and if I convert that back to a string using 'tostring' before > adding any sub-elements, the namespace information does not appear. > > In this case it is not a serious problem, as I can easily create the > required string programmatically, without using lxml. But it would be nice > to know if it is possible.
You can completely deannotate py:pytype information using objectify.deannotate() and clean up namespaces using etree.cleanup_namespaces(). Holger From frank at chagford.com Tue Oct 19 14:44:45 2010 From: frank at chagford.com (Frank Millman) Date: Tue, 19 Oct 2010 14:44:45 +0200 Subject: [lxml-dev] How to create objectify SubElement with text In-Reply-To: <20101019122620.117100@gmx.net> Message-ID: <20101019124508.839E9282B90@codespeak.net> Hi Holger > > Firstly, hypothetically, how would you add a SubElement > > with text *and* with attributes? >
> >>> root = objectify.Element('root')
> >>> root.s = objectify.DataElement("5", foo='bar')
Works perfectly, thanks. > > Secondly, and this is my actual problem, if I convert the result > > to a string using 'tostring', it adds the 'pytype' namespace > > information to each sub-element, which I don't really want. > > > You can completely deannotate py:pytype information using > objectify.deannotate() and clean up namespaces using > etree.cleanup_namespaces(). > Also works perfectly. Many thanks for the information. Frank From lists at beanalby.net Wed Oct 20 17:48:06 2010 From: lists at beanalby.net (Jason Viers) Date: Wed, 20 Oct 2010 11:48:06 -0400 Subject: [lxml-dev] Potential bug in apihelpers.pix _utf8()? Message-ID: <4CBF0F36.10700@beanalby.net> I'm new to lxml, but while investigating a problem I saw the following code in src/lxml/apihelpers.pxi:
-------------------------------------
cdef bytes _utf8(object s):
    cdef int invalid
    if python.PyBytes_CheckExact(s):
        invalid = check_string_utf8(s)
    elif python.PyUnicode_CheckExact(s) or python.PyUnicode_Check(s):
        s = python.PyUnicode_AsUTF8String(s)
        invalid = check_string_utf8(s) == -1
    elif python.PyBytes_Check(s):
        s = bytes(s)
        invalid = check_string_utf8(s)
    else:
        raise TypeError, u"Argument must be string or unicode."
    if invalid:
        raise ValueError, \
            u"All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters"
    return s
-------------------------------------
check_string_utf8() is called 3 times; once its result is compared to -1, the other two times it is stored directly. check_string_utf8()'s docs & code look like it returns -1 for invalid. I'm guessing the other two assignments to invalid should also compare to -1? Jason From stefan_ml at behnel.de Wed Oct 20 19:36:27 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 20 Oct 2010 19:36:27 +0200 Subject: [lxml-dev] Potential bug in apihelpers.pix _utf8()? In-Reply-To: <4CBF0F36.10700@beanalby.net> References: <4CBF0F36.10700@beanalby.net> Message-ID: <4CBF289B.7050601@behnel.de> Jason Viers, 20.10.2010 17:48: > I'm new to lxml, but while investigating a problem I saw the following > code in src/lxml/apihelpers.pxi: >
> -------------------------------------
> cdef bytes _utf8(object s):
>     cdef int invalid
>     if python.PyBytes_CheckExact(s):
>         invalid = check_string_utf8(s)
>     elif python.PyUnicode_CheckExact(s) or python.PyUnicode_Check(s):
>         s = python.PyUnicode_AsUTF8String(s)
>         invalid = check_string_utf8(s) == -1
>     elif python.PyBytes_Check(s):
>         s = bytes(s)
>         invalid = check_string_utf8(s)
>     else:
>         raise TypeError, u"Argument must be string or unicode."
>     if invalid:
>         raise ValueError, \
>             u"All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters"
>     return s
> -------------------------------------
> > check_string_utf8() is called 3 times; once its result is compared > to -1, the other two times it is stored directly. check_string_utf8()'s docs & code > look like it returns -1 for invalid. I'm guessing the other two > assignments to invalid should also compare to -1? Nope, all fine.
Byte strings are invalid when they contain non-ASCII characters (which is what check_string_utf8() tests for) or invalid characters (not allowed by the XML standard, which is what a return value of "-1" means). Unicode strings are invalid only when they contain invalid characters. So any -1 return value must always be rejected, whereas for byte strings all non-zero values make the string invalid. I think a docstring would help in that function, given how important it is. Stefan From lists at beanalby.net Wed Oct 20 23:17:56 2010 From: lists at beanalby.net (Jason Viers) Date: Wed, 20 Oct 2010 17:17:56 -0400 Subject: [lxml-dev] Potential bug in apihelpers.pix _utf8()? In-Reply-To: <4CBF289B.7050601@behnel.de> References: <4CBF0F36.10700@beanalby.net> <4CBF289B.7050601@behnel.de> Message-ID: <4CBF5C84.3030105@beanalby.net> On 10/20/2010 13:36, Stefan Behnel wrote: > So any -1 return value must always be rejected, whereas for byte > strings all non-zero values make the string invalid. Ok, that makes perfect sense. Thanks for the clarification! Jason From agroszer at gmail.com Thu Oct 21 15:31:59 2010 From: agroszer at gmail.com (Adam GROSZER) Date: Thu, 21 Oct 2010 15:31:59 +0200 Subject: [lxml-dev] someone please please upload a py2.5 binary for win32 to pypi, thanks! Message-ID: <63177602.20101021153159@gmail.com> From stefan_ml at behnel.de Thu Oct 21 15:37:52 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 Oct 2010 15:37:52 +0200 Subject: [lxml-dev] someone please please upload a py2.5 binary for win32 to pypi, thanks! In-Reply-To: <63177602.20101021153159@gmail.com> References: <63177602.20101021153159@gmail.com> Message-ID: <4CC04230.7050607@behnel.de> http://pypi.python.org/pypi/lxml/2.2.6 Stefan From agroszer at gmail.com Thu Oct 21 16:57:42 2010 From: agroszer at gmail.com (Adam GROSZER) Date: Thu, 21 Oct 2010 16:57:42 +0200 Subject: [lxml-dev] someone please please upload a py2.5 binary for win32 to pypi, thanks! 
In-Reply-To: <4CC04230.7050607@behnel.de> References: <63177602.20101021153159@gmail.com> <4CC04230.7050607@behnel.de> Message-ID: <839293463.20101021165742@gmail.com> Hello Stefan, Gaah, I missed the version number: 2.2.8. Thursday, October 21, 2010, 3:37:52 PM, you wrote: SB> http://pypi.python.org/pypi/lxml/2.2.6 SB> Stefan -- Best regards, Adam GROSZER mailto:agroszer at gmail.com -- Quote of the day: The man who makes no mistakes does not usually make anything. - Bishop W.C. Magee From crucialfelix at gmail.com Tue Oct 26 15:28:25 2010 From: crucialfelix at gmail.com (felix) Date: Tue, 26 Oct 2010 15:28:25 +0200 Subject: [lxml-dev] Compile failure Message-ID: According to this: http://codespeak.net/lxml/build.html we should avoid installing Cython, but using easy_install to build fails, saying the Cython-generated file is missing. I tried to build from source (svn checkout) and had the same issue:
Last Changed Author: scoder
Last Changed Rev: 78186
Last Changed Date: 2010-10-21 13:48:54 -0400 (Thu, 21 Oct 2010)
So I installed Cython anyway and tried to compile the source. This also fails:
Error converting Pyrex file to C:
------------------------------------------------------------
...
    c_child = _findChildForwards(c_node, 0)
    while c_child is not NULL:
        if c_child.type == tree.XML_ELEMENT_NODE:
            for i in range(c_tag_count):
                if _tagMatchesExactly(c_child, c_ns_tags[2*i], c_ns_tags[2*i+1]):
                    c_next = _findChildForwards(c_child, 0) or _nextElement(c_child)
                                                            ^
------------------------------------------------------------

/home/crucial/tmp/lxml/src/lxml/cleanup.pxi:246:64: Cannot assign type 'int' to 'xmlNode *'
building 'lxml.etree' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include -I/usr/local/include/libxml2 -I/usr/include/python2.6 -c src/lxml/lxml.etree.c -o build/temp.linux-x86_64-2.6/src/lxml/lxml.etree.o -w
src/lxml/lxml.etree.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
*But then I succeeded with the old sudo easy_install lxml*, because now I have Cython. Just FYI: the advice on the build page says not to install Cython, but that was the only way I got it to work. Building from source in that checkout failed. From tomw at ubilix.com Wed Oct 27 11:51:04 2010 From: tomw at ubilix.com (tomw) Date: Wed, 27 Oct 2010 11:51:04 +0200 Subject: [lxml-dev] Python object serialization with lxml Message-ID: <1288173064.2404.13.camel@twlaptop-2> I was thinking about the serialization of Python objects into xml and back. Similar to what can be done with pyxser [1], but using the power of lxml. My first approach after reading the docs would be to use objectify with perhaps some additional Element Classes and the respective ElementClassLookup. Is this the right approach? Has that been done already somewhere (at least I couldn't find anything related so far)? Any idea would be appreciated.
cheers [1] http://coder.cl/category/projects/pyxser/ From nicolas at nexedi.com Wed Oct 27 12:07:53 2010 From: nicolas at nexedi.com (Nicolas Delaby) Date: Wed, 27 Oct 2010 12:07:53 +0200 Subject: [lxml-dev] Python object serialization with lxml In-Reply-To: <1288173064.2404.13.camel@twlaptop-2> References: <1288173064.2404.13.camel@twlaptop-2> Message-ID: <4CC7F9F9.9080102@nexedi.com> On 27/10/2010 11:51, tomw wrote: > I was thinking about the serialization of Python objects into xml and > back. Similar to what can be done with pyxser [1], but using the power > of lxml. My first approach after reading the docs would be to use > objectify with perhaps some additional Element Classes and the > respective ElementClassLookup. Is this the right approach? Has that been > done already somewhere (at least I couldn't find anything related so > far)? Any idea would be appreciated. > > cheers > > [1] http://coder.cl/category/projects/pyxser/ > Hi, I'm maintaining a utility called xml_marshaller, which comes from the PyXML project. It is now based on lxml. Eggs are available on PyPI: http://pypi.python.org/pypi/xml_marshaller/ Regards, Nicolas -- Nicolas Delaby Nexedi: Consulting and Development of Libre / Open Source Software http://www.nexedi.com/ From jwashin at vt.edu Wed Oct 27 12:57:26 2010 From: jwashin at vt.edu (James Washington) Date: Wed, 27 Oct 2010 06:57:26 -0400 Subject: [lxml-dev] Python object serialization with lxml In-Reply-To: <1288173064.2404.13.camel@twlaptop-2> References: <1288173064.2404.13.camel@twlaptop-2> Message-ID: <1288177046.3019.45.camel@zif.hillstreet.home> On Wed, 2010-10-27 at 11:51 +0200, tomw wrote: > I was thinking about the serialization of Python objects into xml and > back. Similar to what can be done with pyxser [1], but using the power > of lxml. My first approach after reading the docs would be to use > objectify with perhaps some additional Element Classes and the > respective ElementClassLookup. Is this the right approach?
Has that been > done already somewhere (at least I couldn't find anything related so > far)? Any idea would be appreciated. Hi, Tom I also wrote one, hidden in the zif.sedna repository on sourceforge. http://zif.svn.sourceforge.net/viewvc/zif/zif.sedna/trunk/src/zif/sedna/persistence/pickle.py?revision=182&view=markup - Jim Washington From saul at ag-projects.com Thu Oct 28 19:46:25 2010 From: saul at ag-projects.com (Saúl Ibarra Corretgé) Date: Thu, 28 Oct 2010 19:46:25 +0200 Subject: [lxml-dev] Python object serialization with lxml In-Reply-To: <1288173064.2404.13.camel@twlaptop-2> References: <1288173064.2404.13.camel@twlaptop-2> Message-ID: <4CC9B6F1.3030604@ag-projects.com> Hi, On 27/10/10 11:51 AM, tomw wrote: > I was thinking about the serialization of Python objects into xml and > back. Similar to what can be done with pyxser [1], but using the power > of lxml. My first approach after reading the docs would be to use > objectify with perhaps some additional Element Classes and the > respective ElementClassLookup. Is this the right approach? Has that been > done already somewhere (at least I couldn't find anything related so > far)? Any idea would be appreciated. > Not long ago I had to do the same and I used the objectify API; it worked just fine. However, there was no Debian package that included a recent fix I needed, so I had to compile it myself. I'm not sure if there are more up-to-date packages on other platforms though. Regards, -- Saúl Ibarra Corretgé
AG Projects From stefan_ml at behnel.de Sat Oct 30 10:49:06 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 30 Oct 2010 10:49:06 +0200 Subject: [lxml-dev] Compile failure In-Reply-To: References: Message-ID: <4CCBDC02.3080305@behnel.de> felix, 26.10.2010 15:28: > According to this: > http://codespeak.net/lxml/build.html > > we should avoid installing Cython > > but using easy_install to build fails saying the cython generated file is > missing I doubt that it's failing because of that. However, you didn't provide the output of the build, so I can't guess what happened that actually made the build fail. > I tried to build from source (svn checkout) and had the same issue > > Last Changed Author: scoder > Last Changed Rev: 78186 > Last Changed Date: 2010-10-21 13:48:54 -0400 (Thu, 21 Oct 2010) > > > so I install cython anyway Well, the page above also says: """ Only if you are interested in building lxml from a Subversion checkout (e.g. to test a bug fix that has not been release yet) or if you want to be an lxml developer, then you do need a working Cython installation. """ so you actually *need* Cython for a build from developer sources. > and try to compile the source, this also fails : > > Error converting Pyrex file to C: > ------------------------------------------------------------ > ... > c_child = _findChildForwards(c_node, 0) > while c_child is not NULL: > if c_child.type == tree.XML_ELEMENT_NODE: > for i in range(c_tag_count): > if _tagMatchesExactly(c_child, c_ns_tags[2*i], > c_ns_tags[2*i+1]): > c_next = _findChildForwards(c_child, 0) or > _nextElement(c_child) > ^ > ------------------------------------------------------------ > > /home/crucial/tmp/lxml/src/lxml/cleanup.pxi:246:64: Cannot assign type 'int' > to 'xmlNode *' Looks like you installed an older version of Cython, likely pre-0.12. The current trunk of lxml requires 0.13. 
The latest build instructions for the SVN trunk are in the SVN trunk as "doc/build.txt", or (not always completely up-to-date) here: http://codespeak.net/lxml/dev/build.html > *but then I succeeded with the old sudo easy_install lxml* > > because now I have Cython Again, I doubt that this is the reason. Stefan
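Stefan's diagnosis above comes down to a version requirement: a build from the SVN trunk needs Cython 0.13 or later. A minimal sketch of a pre-build check follows. The helper names (version_tuple, cython_is_recent_enough, installed_cython_version) are hypothetical and not part of lxml or Cython, and reading the version from Cython.__version__ is an assumption about the installed release; the "0.13" default is simply the figure quoted in the message.

```python
import re


def version_tuple(version):
    """Turn a version string such as '0.13' or '0.12.1' into a tuple of ints.

    Parsing stops at the first dot-separated piece that does not start with
    a digit, so a suffix such as 'rc1' or 'beta0' is ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def cython_is_recent_enough(installed, required="0.13"):
    """Return True if the installed version is at least the required one.

    The default of "0.13" is the requirement quoted in the thread for the
    lxml trunk at the time; adjust it for other checkouts.
    """
    return version_tuple(installed) >= version_tuple(required)


def installed_cython_version():
    """Return the installed Cython version string, or None if unavailable.

    Assumption: the installed Cython exposes a __version__ attribute.
    """
    try:
        import Cython
    except ImportError:
        return None
    return getattr(Cython, "__version__", None)
```

Calling cython_is_recent_enough(installed_cython_version() or "0") before running setup.py would flag an outdated or missing Cython up front, rather than through a compile error like the one reported in this thread.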