From ianb at codespeak.net Fri Jan 2 22:43:48 2009 From: ianb at codespeak.net (ianb at codespeak.net) Date: Fri, 2 Jan 2009 22:43:48 +0100 (CET) Subject: [Lxml-checkins] r60760 - in lxml/trunk: . src/lxml/html src/lxml/html/tests Message-ID: <20090102214348.8A2681683F3@codespeak.net> Author: ianb Date: Fri Jan 2 22:43:46 2009 New Revision: 60760 Modified: lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/html/__init__.py lxml/trunk/src/lxml/html/tests/test_rewritelinks.txt Log: Fix link reading when using CSS like url('link') Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri Jan 2 22:43:46 2009 @@ -2,6 +2,16 @@ lxml changelog ============== +Under Development +================= + +Bugs fixed +---------- + +* ``iter_links`` (and related link-rewriting functions) in + ``lxml.html`` would interpret CSS like ``url("link")`` incorrectly + (treating the quotation marks as part of the link). 
+ 2.2beta1 (2008-12-12) ===================== Modified: lxml/trunk/src/lxml/html/__init__.py ============================================================================== --- lxml/trunk/src/lxml/html/__init__.py (original) +++ lxml/trunk/src/lxml/html/__init__.py Fri Jan 2 22:43:46 2009 @@ -67,7 +67,7 @@ _class_xpath = etree.XPath("descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), concat(' ', $class_name, ' '))]") _id_xpath = etree.XPath("descendant-or-self::*[@id=$id]") _collect_string_content = etree.XPath("string()") -_css_url_re = re.compile(r'url\((.*?)\)', re.I) +_css_url_re = re.compile(r'url\([QUOTE"]?(.*?)[QUOTE"]?\)'.replace('QUOTE', "'"), re.I) _css_import_re = re.compile(r'@import "(.*?)"') _label_xpath = etree.XPath("//label[@for=$id]|//x:label[@for=$id]", namespaces={'x':XHTML_NAMESPACE}) Modified: lxml/trunk/src/lxml/html/tests/test_rewritelinks.txt ============================================================================== --- lxml/trunk/src/lxml/html/tests/test_rewritelinks.txt (original) +++ lxml/trunk/src/lxml/html/tests/test_rewritelinks.txt Fri Jan 2 22:43:46 2009 @@ -45,6 +45,15 @@ body {background-image: url(https://new/image.gif)}; @import "https://new/other-style.css"; + >>> print(rewrite_links(''' + ... ''', relocate_href)) + Those links in style attributes are also rewritten::
From scoder at codespeak.net Tue Jan 6 20:31:34 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 6 Jan 2009 20:31:34 +0100 (CET) Subject: [Lxml-checkins] r60815 - lxml/branch/lxml-2.1/src/lxml Message-ID: <20090106193134.4CA05168527@codespeak.net> Author: scoder Date: Tue Jan 6 20:31:33 2009 New Revision: 60815 Modified: lxml/branch/lxml-2.1/src/lxml/lxml.etree.pyx Log: make try-except import for StringIO/BytesIO more robust Modified: lxml/branch/lxml-2.1/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/branch/lxml-2.1/src/lxml/lxml.etree.pyx (original) +++ lxml/branch/lxml-2.1/src/lxml/lxml.etree.pyx Tue Jan 6 20:31:33 2009 @@ -39,9 +39,11 @@ cdef object BytesIO, StringIO try: - from io import BytesIO, StringIO + from StringIO import StringIO + BytesIO = StringIO except ImportError: - from StringIO import StringIO, StringIO as BytesIO + # Python 3 + from io import BytesIO, StringIO cdef object _elementpath import _elementpath From scoder at codespeak.net Tue Jan 6 20:36:08 2009 From: 
scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 6 Jan 2009 20:36:08 +0100 (CET) Subject: [Lxml-checkins] r60816 - lxml/branch/lxml-2.1 Message-ID: <20090106193608.2070416852A@codespeak.net> Author: scoder Date: Tue Jan 6 20:36:07 2009 New Revision: 60816 Modified: lxml/branch/lxml-2.1/CHANGES.txt Log: changelog Modified: lxml/branch/lxml-2.1/CHANGES.txt ============================================================================== --- lxml/branch/lxml-2.1/CHANGES.txt (original) +++ lxml/branch/lxml-2.1/CHANGES.txt Tue Jan 6 20:36:07 2009 @@ -2,6 +2,15 @@ lxml changelog ============== +Under development +================= + +Bugs fixed +---------- + +* Failing import on systems that have an ``io`` module. + + 2.1.4 (2008-12-12) ================== From scoder at codespeak.net Tue Jan 6 20:37:42 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 6 Jan 2009 20:37:42 +0100 (CET) Subject: [Lxml-checkins] r60817 - in lxml/trunk: . src/lxml Message-ID: <20090106193742.1C1D916852A@codespeak.net> Author: scoder Date: Tue Jan 6 20:37:41 2009 New Revision: 60817 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx Log: r4943 at delle: sbehnel | 2009-01-06 20:35:58 +0100 safer fix for io-import problem Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Tue Jan 6 20:37:41 2009 @@ -31,9 +31,11 @@ cdef object BytesIO, StringIO try: - from io import BytesIO, StringIO + from StringIO import StringIO + BytesIO = StringIO except (ImportError, AttributeError): - from StringIO import StringIO, StringIO as BytesIO + # Python 3 + from io import BytesIO, StringIO cdef object _elementpath import _elementpath From scoder at codespeak.net Tue Jan 6 20:45:04 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 6 Jan 2009 20:45:04 +0100 (CET) Subject: 
[Lxml-checkins] r60818 - lxml/branch/lxml-2.1/src/lxml Message-ID: <20090106194504.CC017168546@codespeak.net> Author: scoder Date: Tue Jan 6 20:45:04 2009 New Revision: 60818 Modified: lxml/branch/lxml-2.1/src/lxml/lxml.etree.pyx Log: use fix from trunk instead for Py2.6 compatibility Modified: lxml/branch/lxml-2.1/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/branch/lxml-2.1/src/lxml/lxml.etree.pyx (original) +++ lxml/branch/lxml-2.1/src/lxml/lxml.etree.pyx Tue Jan 6 20:45:04 2009 @@ -39,11 +39,9 @@ cdef object BytesIO, StringIO try: - from StringIO import StringIO - BytesIO = StringIO -except ImportError: - # Python 3 from io import BytesIO, StringIO +except (ImportError, AttributeError): + from StringIO import StringIO, StringIO as BytesIO cdef object _elementpath import _elementpath From scoder at codespeak.net Tue Jan 6 21:11:22 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 6 Jan 2009 21:11:22 +0100 (CET) Subject: [Lxml-checkins] r60819 - in lxml/branch/lxml-2.1: . doc Message-ID: <20090106201122.41ABE168551@codespeak.net> Author: scoder Date: Tue Jan 6 21:11:16 2009 New Revision: 60819 Modified: lxml/branch/lxml-2.1/CHANGES.txt lxml/branch/lxml-2.1/doc/main.txt lxml/branch/lxml-2.1/version.txt Log: prepare release of 2.1.5 Modified: lxml/branch/lxml-2.1/CHANGES.txt ============================================================================== --- lxml/branch/lxml-2.1/CHANGES.txt (original) +++ lxml/branch/lxml-2.1/CHANGES.txt Tue Jan 6 21:11:16 2009 @@ -2,12 +2,15 @@ lxml changelog ============== -Under development -================= +2.1.5 (2009-01-06) +================== Bugs fixed ---------- +* Potential memory leak on exception handling. This was due to a + problem in Cython, not lxml itself. + * Failing import on systems that have an ``io`` module. 
Modified: lxml/branch/lxml-2.1/doc/main.txt ============================================================================== --- lxml/branch/lxml-2.1/doc/main.txt (original) +++ lxml/branch/lxml-2.1/doc/main.txt Tue Jan 6 21:11:16 2009 @@ -147,8 +147,8 @@ source release. If you can't wait, consider trying a less recent release version first. -The latest version is `lxml 2.1.4`_, released 2008-12-12 -(`changes for 2.1.4`_). `Older versions`_ are listed below. +The latest version is `lxml 2.1.5`_, released 2009-01-06 +(`changes for 2.1.5`_). `Older versions`_ are listed below. Please take a look at the `installation instructions`_! @@ -220,7 +220,9 @@ `2.0 `_ and the `current in-development version `_. -.. _`PDF documentation`: lxmldoc-2.1.4.pdf +.. _`PDF documentation`: lxmldoc-2.1.5.pdf + +* `lxml 2.1.4`_, released 2008-12-12 (`changes for 2.1.4`_) * `lxml 2.1.3`_, released 2008-11-17 (`changes for 2.1.3`_) @@ -312,6 +314,7 @@ * `lxml 0.5`_, released 2005-04-08 +.. _`lxml 2.1.5`: lxml-2.1.5.tgz .. _`lxml 2.1.4`: lxml-2.1.4.tgz .. _`lxml 2.1.3`: lxml-2.1.3.tgz .. _`lxml 2.1.2`: lxml-2.1.2.tgz @@ -358,6 +361,7 @@ .. _`lxml 0.5.1`: lxml-0.5.1.tgz .. _`lxml 0.5`: lxml-0.5.tgz +.. _`changes for 2.1.5`: changes-2.1.5.html .. _`changes for 2.1.4`: changes-2.1.4.html .. _`changes for 2.1.3`: changes-2.1.3.html .. 
_`changes for 2.1.2`: changes-2.1.2.html Modified: lxml/branch/lxml-2.1/version.txt ============================================================================== --- lxml/branch/lxml-2.1/version.txt (original) +++ lxml/branch/lxml-2.1/version.txt Tue Jan 6 21:11:16 2009 @@ -1 +1 @@ -2.1.4 +2.1.5 From scoder at codespeak.net Tue Jan 6 21:13:39 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 6 Jan 2009 21:13:39 +0100 (CET) Subject: [Lxml-checkins] r60821 - lxml/tag/lxml-2.1.5 Message-ID: <20090106201339.0CFFD168551@codespeak.net> Author: scoder Date: Tue Jan 6 21:13:37 2009 New Revision: 60821 Added: lxml/tag/lxml-2.1.5/ - copied from r60820, lxml/branch/lxml-2.1/ Log: tag for 2.1.5
From scoder at codespeak.net Wed Jan 7 20:35:53 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 7 Jan 2009 20:35:53 +0100 (CET) Subject: [Lxml-checkins] r60836 - lxml/branch/lxml-2.1/doc Message-ID: <20090107193553.9F750168562@codespeak.net> Author: scoder Date: Wed Jan 7 20:35:51 2009 New Revision: 60836 Modified: lxml/branch/lxml-2.1/doc/build.txt Log: doc fix: required Cython version Modified: lxml/branch/lxml-2.1/doc/build.txt ============================================================================== --- lxml/branch/lxml-2.1/doc/build.txt (original) +++ lxml/branch/lxml-2.1/doc/build.txt Wed Jan 7 20:35:51 2009 @@ -44,10 +44,10 @@ want to be an lxml developer, then you do need a working Cython installation. You can use EasyInstall_ to install it:: - easy_install Cython==0.9.8 + easy_install Cython==0.10.3 -lxml currently requires Cython 0.9.8, later versions were not -tested. +lxml currently requires Cython 0.10.3, later release versions should +work as well. Subversion
From scoder at codespeak.net Sun Jan 18 22:09:22 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 18 Jan 2009 22:09:22 +0100 (CET) Subject: [Lxml-checkins] r61087 - in lxml/trunk: . 
src/lxml Message-ID: <20090118210922.384CC1684BA@codespeak.net> Author: scoder Date: Sun Jan 18 22:09:20 2009 New Revision: 61087 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx Log: r4947 at delle: sbehnel | 2009-01-06 20:42:50 +0100 reverted last change for compatibility with Py2.6 Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Sun Jan 18 22:09:20 2009 @@ -31,11 +31,9 @@ cdef object BytesIO, StringIO try: - from StringIO import StringIO - BytesIO = StringIO -except (ImportError, AttributeError): - # Python 3 from io import BytesIO, StringIO +except (ImportError, AttributeError): + from StringIO import StringIO, StringIO as BytesIO cdef object _elementpath import _elementpath From scoder at codespeak.net Sun Jan 18 22:09:28 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 18 Jan 2009 22:09:28 +0100 (CET) Subject: [Lxml-checkins] r61088 - lxml/trunk Message-ID: <20090118210928.2FE5A1684D2@codespeak.net> Author: scoder Date: Sun Jan 18 22:09:27 2009 New Revision: 61088 Modified: lxml/trunk/ (props changed) lxml/trunk/versioninfo.py Log: r4948 at delle: sbehnel | 2009-01-06 22:35:28 +0100 support SVN 1.5+ when reading revision numbers Modified: lxml/trunk/versioninfo.py ============================================================================== --- lxml/trunk/versioninfo.py (original) +++ lxml/trunk/versioninfo.py Sun Jan 18 22:09:27 2009 @@ -33,7 +33,7 @@ data = f.read() f.close() - if data.startswith('8'): + if data[:1] in ('8', '9'): # SVN >= 1.4 data = [ d.splitlines() for d in data.split('\n\x0c\n') ] del data[0][0] # get rid of the '8' From scoder at codespeak.net Sun Jan 18 22:09:32 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 18 Jan 2009 22:09:32 +0100 (CET) Subject: [Lxml-checkins] r61089 - in lxml/trunk: . 
doc Message-ID: <20090118210932.D3C741684D2@codespeak.net> Author: scoder Date: Sun Jan 18 22:09:32 2009 New Revision: 61089 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/build.txt Log: r4949 at delle: sbehnel | 2009-01-07 21:35:10 +0100 doc fix: require Cython 0.10.3 Modified: lxml/trunk/doc/build.txt ============================================================================== --- lxml/trunk/doc/build.txt (original) +++ lxml/trunk/doc/build.txt Sun Jan 18 22:09:32 2009 @@ -44,10 +44,10 @@ want to be an lxml developer, then you do need a working Cython installation. You can use EasyInstall_ to install it:: - easy_install Cython==0.10 + easy_install Cython==0.10.3 -lxml currently requires Cython 0.10, later versions were not -necessarily tested but should work as well. +lxml currently requires Cython 0.10.3, later release versions should +work as well. Subversion From scoder at codespeak.net Sun Jan 18 22:09:37 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 18 Jan 2009 22:09:37 +0100 (CET) Subject: [Lxml-checkins] r61090 - lxml/trunk Message-ID: <20090118210937.87E8E1684D6@codespeak.net> Author: scoder Date: Sun Jan 18 22:09:36 2009 New Revision: 61090 Modified: lxml/trunk/ (props changed) lxml/trunk/setup.py Log: r4950 at delle: sbehnel | 2009-01-18 22:07:36 +0100 add trove identifier for HTML Modified: lxml/trunk/setup.py ============================================================================== --- lxml/trunk/setup.py (original) +++ lxml/trunk/setup.py Sun Jan 18 22:09:36 2009 @@ -102,6 +102,7 @@ # 'Programming Language :: Python :: 3.0', 'Programming Language :: C', 'Operating System :: OS Independent', + 'Topic :: Text Processing :: Markup :: HTML', 'Topic :: Text Processing :: Markup :: XML', 'Topic :: Software Development :: Libraries :: Python Modules' ],
From scoder at codespeak.net Sat Jan 24 03:43:03 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 24 Jan 2009 03:43:03 +0100 (CET) Subject: [Lxml-checkins] r61291 - in lxml/trunk: . doc Message-ID: <20090124024303.BD72F1684BB@codespeak.net> Author: scoder Date: Sat Jan 24 03:43:01 2009 New Revision: 61291 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt Log: r4959 at delle: sbehnel | 2009-01-24 03:41:06 +0100 new FAQ entry on mapping XML to a dict-of-dicts Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Sat Jan 24 03:43:01 2009 @@ -22,6 +22,8 @@ 1.5 What is the difference between lxml.etree and lxml.objectify? 1.6 How can I make my application run faster? 1.7 What about that trailing text on serialised Elements? + 1.8 How can I find out if an Element is a comment or PI? + 1.9 How can I map an XML tree into a dict of dicts? 2 Installation 2.1 Which version of libxml2 and libxslt should I use or require? 2.2 Where are the Windows binaries? 
@@ -31,8 +33,9 @@ 3.2 How can I contribute? 4 Bugs 4.1 My application crashes! - 4.2 I think I have found a bug in lxml. What should I do? - 4.3 How do I know a bug is really in lxml and not in libxml2? + 4.2 My application crashes on MacOS-X! + 4.3 I think I have found a bug in lxml. What should I do? + 4.4 How do I know a bug is really in lxml and not in libxml2? 5 Threading 5.1 Can I use threads to concurrently access the lxml API? 5.2 Does my program run faster if I use threads? @@ -295,6 +298,18 @@ True +How can I map an XML tree into a dict of dicts? +----------------------------------------------- + +I'm glad you asked. + +.. sourcecode:: pycon + + def recursive_dict(element): + return element.tag, \ + dict(map(recursive_dict, element)) or element.text + + Installation ============ From scoder at codespeak.net Sat Jan 24 14:01:35 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 24 Jan 2009 14:01:35 +0100 (CET) Subject: [Lxml-checkins] r61300 - in lxml/trunk: . doc Message-ID: <20090124130135.91A2E16850F@codespeak.net> Author: scoder Date: Sat Jan 24 14:01:34 2009 New Revision: 61300 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt Log: r4961 at delle: sbehnel | 2009-01-24 12:58:06 +0100 rst fixes Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Sat Jan 24 14:01:34 2009 @@ -303,7 +303,7 @@ I'm glad you asked. -.. sourcecode:: pycon +.. sourcecode:: python def recursive_dict(element): return element.tag, \ @@ -645,10 +645,12 @@ were moved to other documents. You should be on the safe side when passing trees between threads if you either -a) do not modify these trees and do not move their elements to other trees, or -b) do not terminate threads while the trees they parsed are still in use - (e.g. 
by using a fixed size thread-pool or long-running threads in - processing chains) +- do not modify these trees and do not move their elements to other + trees, or + +- do not terminate threads while the trees they parsed are still in + use (e.g. by using a fixed size thread-pool or long-running threads + in processing chains) Does my program run faster if I use threads? From scoder at codespeak.net Sat Jan 24 14:01:42 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 24 Jan 2009 14:01:42 +0100 (CET) Subject: [Lxml-checkins] r61301 - in lxml/trunk: . doc Message-ID: <20090124130142.5157E168511@codespeak.net> Author: scoder Date: Sat Jan 24 14:01:41 2009 New Revision: 61301 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt lxml/trunk/doc/build.txt Log: r4962 at delle: sbehnel | 2009-01-24 13:47:34 +0100 updated install docs for MacOS and Windows Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Sat Jan 24 14:01:41 2009 @@ -11,6 +11,8 @@ .. _compatibility: compatibility.html .. _ElementTree: http://effbot.org/zone/element-index.htm +.. _`build instructions`: build.html +.. _`MacOS-X` : build.html#building-lxml-on-macos-x .. contents:: .. @@ -346,30 +348,28 @@ .. _`release notes of libxslt`: http://xmlsoft.org/XSLT/news.html -Where are the Windows binaries? -------------------------------- - -Short answer: If you want to contribute a binary build, we are happy to put it -up on the Cheeseshop. - -Long answer: Two of the bigger problems with the Windows system are the lack -of a pre-installed standard compiler and the missing package management. Both -make it non-trivial to build lxml on this platform. We are trying hard to -make lxml as platform-independent as possible and it is regularly tested on -Windows systems. However, we currently cannot provide Windows binary -distributions ourselves. 
- -From time to time, users of different environments kindly contribute binary -builds of lxml, most frequently for Windows or Mac-OS X. We put these on the -Cheeseshop to make it as easy as possible for others to use lxml on their -platform. - -If there is not currently a binary distribution of the most recent lxml -release for your platform available from the Cheeseshop, please look through -the older versions to see if they provide a binary build. This is done by -appending the version number to the cheeseshop URL, e.g.: +Where are the binary builds? +---------------------------- - http://cheeseshop.python.org/pypi/lxml/1.1.2 +Sidnei da Silva regularly contributes Windows binaries for new +releases. This is because two of the major problems of Microsoft +Windows make it non-trivial for users to build lxml on this platform: +the lack of a pre-installed standard compiler and the missing package +management. + +If there is not currently a binary distribution of the most recent +lxml release for this platform available from the Python Package Index +(PyPI), please look through the older versions to see if they provide +a binary build. This is done by appending the version number to the +PyPI URL, e.g.:: + + http://pypi.python.org/pypi/lxml/2.1.5 + +Apart from that, we generally do not provide binary builds of lxml, as +most of the other operating systems out there can build lxml without +problems (with the exception of `MacOS-X`_), and the sheer mass of +variations between platforms makes it futile to provide builds for +everyone. Why do I get errors about missing UCS4 symbols when installing lxml? @@ -386,8 +386,6 @@ compilable on both platform types. See the `build instructions`_ on how to do this. -.. _`build instructions`: build.html - Contributing ============ @@ -499,30 +497,10 @@ My application crashes on MacOS-X! 
---------------------------------- -Since the normal system libraries are pretty much outdated, you likely -have installed newer versions through a package management system like -fink or macports in addition to the system libraries. Chances are -high that your system is confused by the conflicting library versions. - -To work around this, please set the ``DYLD_LIBRARY_PATH`` environment -variable *at runtime* to the directory where you installed the newer -libraries. There are other Python packages that depend on libxml2, so -it is up to you to make sure that *all* packages that dynamically load -libxml2 load the *same* library version. Loading conflicting versions -*will* lead to a crash and has confused a lot of MacOS users already. - -Please understand that if your system uses conflicting library -versions, there is nothing lxml can do about it. It is up to you as a -user to make sure you have a sane execution environment. - -See `bug 197243`_ for more information. - -.. _`bug 197243`: https://bugs.launchpad.net/lxml/+bug/197243 - -If you want a sane, reliable execution environment, especially for -production systems, `using a buildout`_ might be a good idea. - -.. _`using a buildout`: http://comments.gmane.org/gmane.comp.python.lxml.devel/3297?set_lines=100000 +This was a common problem up to lxml 2.1.x. Since lxml 2.2, the only +officially supported way to use it on this platform is through a +static build against freshly downloaded versions of libxml2 and +libxslt. See the build instructions for `MacOS-X`_. I think I have found a bug in lxml. What should I do? Modified: lxml/trunk/doc/build.txt ============================================================================== --- lxml/trunk/doc/build.txt (original) +++ lxml/trunk/doc/build.txt Sat Jan 24 14:01:41 2009 @@ -171,55 +171,46 @@ lxml maintainer. 
-Providing newer library versions on Mac-OS X --------------------------------------------- +Building lxml on MacOS-X +------------------------ Apple regularly ships new system releases with horribly outdated system libraries. This is specifically the case for libxml2 and libxslt, where the system provided versions are too old to build lxml. -While the Unix environment in Mac-OS X makes it relatively easy to +While the Unix environment in MacOS-X makes it relatively easy to install Unix/Linux style package management tools and new software, it actually seems to be hard to get libraries set up for exclusive usage -that Mac-OS X ships in an older version. Alternative distributions +that MacOS-X ships in an older version. Alternative distributions (like macports) install their libraries in addition to the system -libraries, but the compiler and the runtime loader on Mac-OS still -sees the system libraries before the new libraries. This can lead to +libraries, but the compiler and the runtime loader on MacOS still sees +the system libraries before the new libraries. This can lead to undebuggable crashes where the newer library seems to be loaded but the older system library is used. Apple discourages static building against libraries, which would help working around this problem. Apple does not ship static library binaries with its system and several package management systems follow -this decision. Therefore, building static binaries would require -building the dependencies first. You can do this with the `buildout -recipe for lxml`_. - -To make sure the newer libxml2 and libxslt versions (e.g. those -provided by fink or macports) are used at *build time*, you must take -care that the script ``xslt-config`` from the newly installed version -is found when running the build setup. The system libraries also -provide this script, so the new one must come first in the PATH. 
The -best way to make sure the right version is used is by passing the path -to the script as an option to setup.py:: - - python setup.py build --with-xslt-config=/path/to/xslt-config \ - --with-xml2-config=/path/to/xml2-config +this decision. Therefore, building static binaries requires building +the dependencies first. The ``setup.py`` script does this +automatically when you call it like this:: + + python setup.py build --static-deps + +This will download and build the latest versions of libxml2 and +libxslt from the official FTP download site. If you want to use +specific versions, or want to prevent any online access, you can +download both ``tar.gz`` release files yourself, place them into a +subdirectory ``libs`` in the lxml distribution, and call ``setup.py`` +with the desired target versions like this:: + + python setup.py build --static-deps \ + --libxml2-version=2.7.3 \ + --libxslt-version=1.1.24 \ Instead of ``build``, you can use any target, like ``bdist_egg`` if you want to use setuptools to build an installable egg. -Since release 2.0.6, lxml automatically passes the option -``-flat_namespace`` to the C compiler. This was reported to make sure -that the libraries that lxml was built against are also used at -runtime. Without this option, users needed to add all directories -where the newer libraries are installed (i.e. libxml2, libxslt and -libexslt) to the ``DYLD_LIBRARY_PATH`` environment variable when using -lxml (i.e. at runtime). This should no longer be necessary with the -new build setup. - -.. _`buildout recipe for lxml`: http://thread.gmane.org/gmane.comp.python.lxml.devel/3290/focus=3297 - Static linking on Windows ------------------------- From scoder at codespeak.net Sat Jan 24 14:01:47 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 24 Jan 2009 14:01:47 +0100 (CET) Subject: [Lxml-checkins] r61302 - in lxml/trunk: . 
doc Message-ID: <20090124130147.E7DDF168515@codespeak.net> Author: scoder Date: Sat Jan 24 14:01:46 2009 New Revision: 61302 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt lxml/trunk/doc/build.txt Log: r4963 at delle: sbehnel | 2009-01-24 13:58:56 +0100 dox updates Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Sat Jan 24 14:01:46 2009 @@ -414,7 +414,7 @@ Please contact the `mailing list`_ if you need any help. .. _Cython: http://www.cython.org/ -.. _`code that Cython accepts`: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/overview.html +.. _`code that Cython accepts`: http://docs.cython.org/docs/tutorial.html How can I contribute? @@ -428,9 +428,11 @@ and the ReST_ `text files`_ in the ``doc`` directory. We also have a `list of missing features`_ that we would like to -implement but didn't due to lack if time. If you find the time, +implement but didn't due to lack if time. If *you* find the time, patches are very welcome. +.. _ReST: http://docutils.sourceforge.net/rst.html +.. _`text files`: http://codespeak.net/svn/lxml/trunk/doc/ .. _`list of missing features`: http://codespeak.net/svn/lxml/trunk/IDEAS.txt Besides enhancing the code, there are a lot of places where you can help the @@ -465,9 +467,6 @@ you can try to write up a better description and send it to the `mailing list`_. -.. _ReST: http://docutils.sourceforge.net/rst.html -.. _`text files`: http://codespeak.net/svn/lxml/trunk/doc/ - Bugs ==== Modified: lxml/trunk/doc/build.txt ============================================================================== --- lxml/trunk/doc/build.txt (original) +++ lxml/trunk/doc/build.txt Sat Jan 24 14:01:46 2009 @@ -1,11 +1,12 @@ How to build lxml from source ============================= -To build lxml from source, you need libxml2 and libxslt properly installed, -*including the header files*. 
These are likely shipped in separate ``-dev`` -or ``-devel`` packages like ``libxml2-dev``, which you need to install. The -build process also requires setuptools_. The lxml source distribution comes -with a script called ``ez_setup.py`` that can be used to install them. +To build lxml from source, you need libxml2 and libxslt properly +installed, *including the header files*. These are likely shipped in +separate ``-dev`` or ``-devel`` packages like ``libxml2-dev``, which +you must install before trying to build lxml. The build process also +requires setuptools_. The lxml source distribution comes with a +script called ``ez_setup.py`` that can be used to install them. .. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools @@ -149,10 +150,11 @@ .. _`mailing list`: http://codespeak.net/mailman/listinfo/lxml-dev -Contributing an egg -------------------- +Building an egg +--------------- -This is the procedure to make an lxml egg for your platform: +This is the procedure to make an lxml egg for your platform (assuming +that you have setuptools_ installed): * Download the lxml-x.y.tar.gz release. This contains the pregenerated C so that you can be sure you build exactly from the release sources. Unpack @@ -164,11 +166,9 @@ any ``.so`` file you find there. This reduces the size of the egg considerably. -* ``python setup.py bdist_egg upload`` +* ``python setup.py bdist_egg`` -The last 'upload' step only works if you have access to the lxml cheeseshop -entry. If not, you can just make an egg with ``bdist_egg`` and mail it to the -lxml maintainer. +This will put the egg into the ``dist`` directory. Building lxml on MacOS-X From scoder at codespeak.net Sun Jan 25 21:13:15 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 25 Jan 2009 21:13:15 +0100 (CET) Subject: [Lxml-checkins] r61333 - in lxml/trunk: . 
doc Message-ID: <20090125201315.398E8168570@codespeak.net> Author: scoder Date: Sun Jan 25 21:13:13 2009 New Revision: 61333 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/doc/main.txt lxml/trunk/version.txt Log: r4967 at delle: sbehnel | 2009-01-25 21:11:25 +0100 prepare release of 2.2beta2 Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Sun Jan 25 21:13:13 2009 @@ -2,16 +2,34 @@ lxml changelog ============== -Under Development -================= +2.2beta2 (2009-01-25) +===================== Bugs fixed ---------- +* Potential memory leak on exception handling. This was due to a + problem in Cython, not lxml itself. + * ``iter_links`` (and related link-rewriting functions) in ``lxml.html`` would interpret CSS like ``url("link")`` incorrectly (treating the quotation marks as part of the link). +* Failing import on systems that have an ``io`` module. + + +2.1.5 (2009-01-06) +================== + +Bugs fixed +---------- + +* Potential memory leak on exception handling. This was due to a + problem in Cython, not lxml itself. + +* Failing import on systems that have an ``io`` module. + + 2.2beta1 (2008-12-12) ===================== Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Sun Jan 25 21:13:13 2009 @@ -147,8 +147,8 @@ source release. If you can't wait, consider trying a less recent release version first. -The latest version is `lxml 2.2beta1`_, released 2008-12-12 -(`changes for 2.2beta1`_). `Older versions`_ are listed below. +The latest version is `lxml 2.2beta2`_, released 2009-01-25 +(`changes for 2.2beta2`_). `Older versions`_ are listed below. Please take a look at the `installation instructions`_! @@ -220,7 +220,13 @@ `2.0 `_ and the `current in-development version `_. -.. 
_`PDF documentation`: lxmldoc-2.2beta1.pdf +.. _`PDF documentation`: lxmldoc-2.2beta2.pdf + +* `lxml 2.2beta1`_, released 2008-12-12 (`changes for 2.2beta1`_) + +* `lxml 2.2alpha1`_, released 2008-11-23 (`changes for 2.2alpha1`_) + +* `lxml 2.1.5`_, released 2009-01-06 (`changes for 2.1.5`_) * `lxml 2.1.4`_, released 2008-12-12 (`changes for 2.1.4`_) @@ -306,8 +312,10 @@ * `lxml 0.5`_, released 2005-04-08 -.. _`lxml 2.2alpha1`: lxml-2.2alpha1.tgz +.. _`lxml 2.2beta2`: lxml-2.2beta2.tgz .. _`lxml 2.2beta1`: lxml-2.2beta1.tgz +.. _`lxml 2.2alpha1`: lxml-2.2alpha1.tgz +.. _`lxml 2.1.5`: lxml-2.1.5.tgz .. _`lxml 2.1.4`: lxml-2.1.4.tgz .. _`lxml 2.1.3`: lxml-2.1.3.tgz .. _`lxml 2.1.2`: lxml-2.1.2.tgz @@ -350,8 +358,10 @@ .. _`lxml 0.5.1`: lxml-0.5.1.tgz .. _`lxml 0.5`: lxml-0.5.tgz +.. _`changes for 2.2beta2`: changes-2.2beta2.html .. _`changes for 2.2beta1`: changes-2.2beta1.html .. _`changes for 2.2alpha1`: changes-2.2alpha1.html +.. _`changes for 2.1.5`: changes-2.1.5.html .. _`changes for 2.1.4`: changes-2.1.4.html .. _`changes for 2.1.3`: changes-2.1.3.html .. _`changes for 2.1.2`: changes-2.1.2.html Modified: lxml/trunk/version.txt ============================================================================== --- lxml/trunk/version.txt (original) +++ lxml/trunk/version.txt Sun Jan 25 21:13:13 2009 @@ -1 +1 @@ -2.2beta1 +2.2beta2 From scoder at codespeak.net Sun Jan 25 22:25:08 2009 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sun, 25 Jan 2009 22:25:08 +0100 (CET) Subject: [Lxml-checkins] r61334 - in lxml/trunk: . 
src/lxml/html src/lxml/html/tests/hackers-org-data Message-ID: <20090125212508.3B209168533@codespeak.net> Author: scoder Date: Sun Jan 25 22:25:05 2009 New Revision: 61334 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/html/__init__.py lxml/trunk/src/lxml/html/tests/hackers-org-data/style-url-js.data Log: r4969 at delle: sbehnel | 2009-01-25 22:23:19 +0100 fix CSS URL parsing Modified: lxml/trunk/src/lxml/html/__init__.py ============================================================================== --- lxml/trunk/src/lxml/html/__init__.py (original) +++ lxml/trunk/src/lxml/html/__init__.py Sun Jan 25 22:25:05 2009 @@ -67,12 +67,18 @@ _class_xpath = etree.XPath("descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), concat(' ', $class_name, ' '))]") _id_xpath = etree.XPath("descendant-or-self::*[@id=$id]") _collect_string_content = etree.XPath("string()") -_css_url_re = re.compile(r'url\([QUOTE"]?(.*?)[QUOTE"]?\)'.replace('QUOTE', "'"), re.I) +_css_url_re = re.compile(r'url\(('+'["][^"]*["]|'+"['][^']*[']|"+r'[^)]*)\)', re.I) _css_import_re = re.compile(r'@import "(.*?)"') _label_xpath = etree.XPath("//label[@for=$id]|//x:label[@for=$id]", namespaces={'x':XHTML_NAMESPACE}) _archive_re = re.compile(r'[^ ]+') +def _unquote_match(s, pos): + if s[:1] == '"' and s[-1:] == '"' or s[:1] == "'" and s[-1:] == "'": + return s[1:-1], pos+1 + else: + return s,pos + def _transform_result(typ, result): """Convert the result back into the input type. 
""" @@ -342,12 +348,14 @@ yield (el, 'value', el.get('value'), 0) if tag == 'style' and el.text: for match in _css_url_re.finditer(el.text): - yield (el, None, match.group(1), match.start(1)) + url, start = _unquote_match(match.group(1), match.start(1)) + yield (el, None, url, start) for match in _css_import_re.finditer(el.text): yield (el, None, match.group(1), match.start(1)) if 'style' in attribs: for match in _css_url_re.finditer(attribs['style']): - yield (el, 'style', match.group(1), match.start(1)) + url, start = _unquote_match(match.group(1), match.start(1)) + yield (el, 'style', url, start) def rewrite_links(self, link_repl_func, resolve_base_href=True, base_href=None): Modified: lxml/trunk/src/lxml/html/tests/hackers-org-data/style-url-js.data ============================================================================== --- lxml/trunk/src/lxml/html/tests/hackers-org-data/style-url-js.data (original) +++ lxml/trunk/src/lxml/html/tests/hackers-org-data/style-url-js.data Sun Jan 25 22:25:05 2009 @@ -4,5 +4,5 @@
---------- -
+
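For readers following r61334, here is a minimal standalone sketch of the new CSS URL handling. The regular expression and the unquoting logic are copied from the diff above. The names ``css_url_re``, ``unquote_match`` and ``iter_css_urls``, and the sample stylesheet, are hypothetical and chosen for illustration only. This is not the actual ``iter_links`` implementation in ``lxml.html``.

```python
import re

# Pattern taken from the r61334 diff: the group captures a double-quoted,
# a single-quoted, or a bare (unquoted) URL inside url(...).
css_url_re = re.compile(
    r'url\((' + '["][^"]*["]|' + "['][^']*[']|" + r'[^)]*)\)', re.I)


def unquote_match(s, pos):
    """Strip one pair of matching quotes, if present.

    Returns the bare URL and the offset of its first character, so that
    the offset still points at the link text after the opening quote has
    been dropped (same logic as _unquote_match in the diff).
    """
    if (s[:1] == '"' and s[-1:] == '"') or (s[:1] == "'" and s[-1:] == "'"):
        return s[1:-1], pos + 1
    return s, pos


def iter_css_urls(css):
    """Hypothetical helper: yield (url, start_offset) pairs found in css."""
    for match in css_url_re.finditer(css):
        yield unquote_match(match.group(1), match.start(1))


# Sample input (an assumption, not taken from the lxml test data).
css = "a { background: url('img/a.png') } b { background: URL(b.png) }"
results = list(iter_css_urls(css))

for url, start in results:
    # The reported offset indexes the unquoted URL in the original text.
    print(url, css[start:start + len(url)] == url)
```

Returning the adjusted offset matters because link rewriting replaces the text at that position. If the offset still pointed at the opening quote, a rewrite would overwrite the quote instead of the link.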