From scoder at codespeak.net Thu May 1 11:06:21 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 11:06:21 +0200 (CEST)
Subject: [Lxml-checkins] r54297 - in lxml/branch/lxml-2.0: . doc
Message-ID: <20080501090621.EF4A32A01B2@codespeak.net>

Author: scoder
Date: Thu May 1 11:06:19 2008
New Revision: 54297

Modified:
   lxml/branch/lxml-2.0/CHANGES.txt
   lxml/branch/lxml-2.0/doc/main.txt
   lxml/branch/lxml-2.0/version.txt
Log:
prepare release of 2.0.5

Modified: lxml/branch/lxml-2.0/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-2.0/CHANGES.txt (original)
+++ lxml/branch/lxml-2.0/CHANGES.txt Thu May 1 11:06:19 2008
@@ -2,8 +2,8 @@
 lxml changelog
 ==============

-Under development
-=================
+2.0.4 (2008-05-01)
+==================

 Features added
 --------------

Modified: lxml/branch/lxml-2.0/doc/main.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/main.txt (original)
+++ lxml/branch/lxml-2.0/doc/main.txt Thu May 1 11:06:19 2008
@@ -145,8 +145,8 @@
 .. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/
 .. _`this key`: pubkey.asc

-The latest version is `lxml 2.0.4`_, released 2008-04-13
-(`changes for 2.0.4`_). `Older versions`_ are listed below.
+The latest version is `lxml 2.0.5`_, released 2008-04-13
+(`changes for 2.0.5`_). `Older versions`_ are listed below.

 .. _`Older versions`: #old-versions

@@ -206,6 +206,8 @@
 Old Versions
 ------------

+* `lxml 2.0.4`_, released 2008-04-13 (`changes for 2.0.4`_)
+
 * `lxml 2.0.3`_, released 2008-03-26 (`changes for 2.0.3`_)

 * `lxml 2.0.2`_, released 2008-02-22 (`changes for 2.0.2`_)
@@ -264,6 +266,7 @@

 * `lxml 0.5`_, released 2005-04-08

+.. _`lxml 2.0.5`: lxml-2.0.5.tgz
 .. _`lxml 2.0.4`: lxml-2.0.4.tgz
 .. _`lxml 2.0.3`: lxml-2.0.3.tgz
 .. _`lxml 2.0.2`: lxml-2.0.2.tgz
@@ -294,6 +297,7 @@
 .. _`lxml 0.5.1`: lxml-0.5.1.tgz
 .. _`lxml 0.5`: lxml-0.5.tgz

+.. _`changes for 2.0.5`: changes-2.0.5.html
 .. _`changes for 2.0.4`: changes-2.0.4.html
 .. _`changes for 2.0.3`: changes-2.0.3.html
 .. _`changes for 2.0.2`: changes-2.0.2.html

Modified: lxml/branch/lxml-2.0/version.txt
==============================================================================
--- lxml/branch/lxml-2.0/version.txt (original)
+++ lxml/branch/lxml-2.0/version.txt Thu May 1 11:06:19 2008
@@ -1 +1 @@
-2.0.4
+2.0.5

From scoder at codespeak.net Thu May 1 11:16:48 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 11:16:48 +0200 (CEST)
Subject: [Lxml-checkins] r54298 - lxml/branch/lxml-2.0/doc
Message-ID: <20080501091648.643A5169E35@codespeak.net>

Author: scoder
Date: Thu May 1 11:16:48 2008
New Revision: 54298

Added:
   lxml/branch/lxml-2.0/doc/mklatex.py
      - copied unchanged from r54297, lxml/trunk/doc/mklatex.py
Log:
support PDF generation in 2.0.x

From scoder at codespeak.net Thu May 1 11:17:09 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 11:17:09 +0200 (CEST)
Subject: [Lxml-checkins] r54299 - lxml/branch/lxml-2.0/doc
Message-ID: <20080501091709.4B2D3169E37@codespeak.net>

Author: scoder
Date: Thu May 1 11:17:08 2008
New Revision: 54299

Added:
   lxml/branch/lxml-2.0/doc/rest2latex.py
      - copied unchanged from r54298, lxml/trunk/doc/rest2latex.py
Log:
support PDF generation in 2.0.x

From scoder at codespeak.net Thu May 1 11:32:12 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 11:32:12 +0200 (CEST)
Subject: [Lxml-checkins] r54300 - lxml/branch/lxml-2.0/doc/html
Message-ID: <20080501093212.9F1091684E3@codespeak.net>

Author: scoder
Date: Thu May 1 11:32:12 2008
New Revision: 54300

Added:
   lxml/branch/lxml-2.0/doc/html/tagpython-big.png
      - copied unchanged from r54299, lxml/trunk/doc/html/tagpython-big.png
Log:
support PDF generation in 2.0.x

From scoder at codespeak.net Thu May 1 12:01:36 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1
May 2008 12:01:36 +0200 (CEST)
Subject: [Lxml-checkins] r54302 - in lxml/branch/lxml-2.0: . doc
Message-ID: <20080501100136.EF3822A01B2@codespeak.net>

Author: scoder
Date: Thu May 1 12:01:35 2008
New Revision: 54302

Added:
   lxml/branch/lxml-2.0/doc/docstructure.py
      - copied unchanged from r53866, lxml/trunk/doc/docstructure.py
Modified:
   lxml/branch/lxml-2.0/Makefile
   lxml/branch/lxml-2.0/doc/main.txt
   lxml/branch/lxml-2.0/doc/mkhtml.py
Log:
PDF doc fixes from trunk

Modified: lxml/branch/lxml-2.0/Makefile
==============================================================================
--- lxml/branch/lxml-2.0/Makefile (original)
+++ lxml/branch/lxml-2.0/Makefile Thu May 1 12:01:35 2008
@@ -2,6 +2,7 @@
 TESTFLAGS=-p -v
 TESTOPTS=
 SETUPFLAGS=
+LXMLVERSION=`cat version.txt`

 all: inplace

@@ -40,17 +41,40 @@
 ftest_inplace: inplace
 	$(PYTHON) test.py -f $(TESTFLAGS) $(TESTOPTS)

-html: inplace
-	mkdir -p doc/html
-	PYTHONPATH=src $(PYTHON) doc/mkhtml.py doc/html . `cat version.txt`
+apihtml: inplace
 	rm -fr doc/html/api
 	@[ -x "`which epydoc`" ] \
 		&& (cd src && echo "Generating API docs ..." && \
 			PYTHONPATH=. epydoc -v --docformat "restructuredtext en" \
 			-o ../doc/html/api --no-private --exclude='[.]html[.]tests|[.]_' \
-			--name lxml --url http://codespeak.net/lxml/ lxml/) \
+			--exclude-introspect='[.]usedoctest' \
+			--name "lxml API" --url http://codespeak.net/lxml/ lxml/) \
 		|| (echo "not generating epydoc API documentation")

+html: inplace apihtml
+	PYTHONPATH=src $(PYTHON) doc/mkhtml.py doc/html . ${LXMLVERSION}
+
+apipdf: inplace
+	rm -fr doc/pdf
+	mkdir -p doc/pdf
+	@[ -x "`which epydoc`" ] \
+		&& (cd src && echo "Generating API docs ..." && \
+			PYTHONPATH=. epydoc -v --latex --docformat "restructuredtext en" \
+			-o ../doc/pdf --no-private --exclude='([.]html)?[.]tests|[.]_' \
+			--exclude-introspect='html[.]clean|[.]usedoctest' \
+			--name "lxml API" --url http://codespeak.net/lxml/ lxml/) \
+		|| (echo "not generating epydoc API documentation")
+
+pdf: apipdf
+	$(PYTHON) doc/mklatex.py doc/pdf . ${LXMLVERSION}
+	(cd doc/pdf && pdflatex lxmldoc.tex \
+		&& pdflatex lxmldoc.tex \
+		&& pdflatex lxmldoc.tex)
+	@pdfopt doc/pdf/lxmldoc.pdf doc/pdf/lxmldoc-${LXMLVERSION}.pdf
+	@echo "PDF available as doc/pdf/lxmldoc-${LXMLVERSION}.pdf"
+
+# Two pdflatex runs are needed to build the correct Table of contents.
+
 test: test_inplace

 valtest: valgrind_test_inplace
@@ -65,7 +89,12 @@
 	find . \( -name '*.o' -o -name '*.so' -o -name '*.py[cod]' -o -name '*.dll' \) -exec rm -f {} \;
 	rm -rf build

-realclean: clean
+docclean:
+	rm -f doc/html/*.html
+	rm -fr doc/html/api
+	rm -fr doc/pdf
+
+realclean: clean docclean
 	find . -name '*.c' -exec rm -f {} \;
 	rm -f TAGS
 	$(PYTHON) setup.py clean -a

Modified: lxml/branch/lxml-2.0/doc/main.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/main.txt (original)
+++ lxml/branch/lxml-2.0/doc/main.txt Thu May 1 12:01:35 2008
@@ -47,6 +47,10 @@
 Documentation
 -------------

+The complete lxml documentation is available for download as `PDF
+documentation`_. The HTML documentation from this web site is part of
+the normal `source download <#download>`_.
+
 * ElementTree:

   * `ElementTree API`_
@@ -140,32 +144,37 @@
 The source distribution is signed with `this key`_. Binary builds for
 MS Windows usually become available through PyPI a few days after a
 source release. If you can't wait, consider trying a less recent
-version first.
-
-.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/
-.. _`this key`: pubkey.asc
+release version first.

 The latest version is `lxml 2.0.5`_, released 2008-04-13
 (`changes for 2.0.5`_). `Older versions`_ are listed below.

-.. _`Older versions`: #old-versions
-
 Please take a look at the `installation instructions`_!

-.. _`installation instructions`: installation.html
+This complete web site (including the generated API documentation) is
+part of the source distribution, so if you want to download the
+documentation for offline use, take the source archive and copy the
+``doc/html`` directory out of the source tree.

 It's also possible to check out the latest development version of lxml from
 svn directly, using a command like this::

   svn co http://codespeak.net/svn/lxml/trunk lxml

-You can also `browse it through the web`_. Please read `how to build lxml
-from source`_ first. The `latest CHANGES`_ of the developer version are also
-accessible. You can check there if a bug you found has been fixed or a
-feature you want has been implemented in the latest trunk version.
+You can also browse the `Subversion repository`_ through the web, or
+take a look at the `Subversion history`_. Please read `how to build lxml
+from source`_ first. The `latest CHANGES`_ of the developer version
+are also accessible. You can check there if a bug you found has been
+fixed or a feature you want has been implemented in the latest trunk
+version.

-.. _`how to build lxml from source`: build.html
-.. _`browse it through the web`: http://codespeak.net/svn/lxml
+.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/
+.. _`this key`: pubkey.asc
+.. _`Older versions`: #old-versions
+.. _`installation instructions`: installation.html
+ .. _`how to build lxml from source`: build.html
+.. _`Subversion repository`: http://codespeak.net/svn/lxml/
+.. _`Subversion history`: https://codespeak.net/viewvc/lxml/
 .. _`latest CHANGES`: http://codespeak.net/svn/lxml/trunk/CHANGES.txt


@@ -178,7 +187,7 @@

 .. _`mailing list`: http://codespeak.net/mailman/listinfo/lxml-dev
 .. _Gmane: http://blog.gmane.org/gmane.comp.python.lxml.devel
-.. _Google: http://www.google.com/webhp?q=site:codespeak.net/mailman/listinfo/lxml-dev%20
+.. _Google: http://www.google.com/webhp?q=site:codespeak.net%2Fmailman%2Flistinfo%2Flxml-dev+


 Bug tracker
@@ -189,7 +198,7 @@
 unexpected behaviour of lxml is a bug or not, please ask on the
 `mailing list`_ first. Do not forget to search the archive (e.g. with Gmane_)!

-.. _`launchpad bug tracker`: https://launchpad.net/lxml
+.. _`launchpad bug tracker`: https://launchpad.net/lxml/


 License

Modified: lxml/branch/lxml-2.0/doc/mkhtml.py
==============================================================================
--- lxml/branch/lxml-2.0/doc/mkhtml.py (original)
+++ lxml/branch/lxml-2.0/doc/mkhtml.py Thu May 1 12:01:35 2008
@@ -1,21 +1,8 @@
+from docstructure import SITE_STRUCTURE, HREF_MAP, BASENAME_MAP
 from lxml.etree import (parse, fromstring, ElementTree,
                         Element, SubElement, XPath)
 import os, shutil, re, sys, copy, time

-SITE_STRUCTURE = [
-    ('lxml', ('main.txt', 'intro.txt', '../INSTALL.txt', 'lxml2.txt',
-              'performance.txt', 'compatibility.txt', 'FAQ.txt')),
-    ('Developing with lxml', ('tutorial.txt', '@API reference',
-                              'api.txt', 'parsing.txt',
-                              'validation.txt', 'xpathxslt.txt',
-                              'objectify.txt', 'lxmlhtml.txt',
-                              'cssselect.txt', 'elementsoup.txt')),
-    ('Extending lxml', ('resolvers.txt', 'extensions.txt',
-                        'element_classes.txt', 'sax.txt', 'capi.txt')),
-    ('Developing lxml', ('build.txt', 'lxml-source-howto.txt',
-                         '@Release Changelog')),
-    ]
-
 RST2HTML_OPTIONS = " ".join([
     "--no-toc-backlinks",
     "--strip-comments",
@@ -23,15 +10,6 @@
     "--date",
     ])

-HREF_MAP = {
-    "API reference" : "api/index.html"
-}
-
-BASENAME_MAP = {
-    'main' : 'index',
-    'INSTALL' : 'installation',
-}
-
 htmlnsmap = {"h" : "http://www.w3.org/1999/xhtml"}

 find_title = XPath("/h:html/h:head/h:title/text()", namespaces=htmlnsmap)

From scoder at codespeak.net Thu May 1 12:04:49 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:04:49 +0200 (CEST)
Subject:
[Lxml-checkins] r54303 - in lxml/branch/lxml-2.0: . doc
Message-ID: <20080501100449.BD3582A01B2@codespeak.net>

Author: scoder
Date: Thu May 1 12:04:49 2008
New Revision: 54303

Modified:
   lxml/branch/lxml-2.0/MANIFEST.in
   lxml/branch/lxml-2.0/doc/main.txt
Log:
doc fixes

Modified: lxml/branch/lxml-2.0/MANIFEST.in
==============================================================================
--- lxml/branch/lxml-2.0/MANIFEST.in (original)
+++ lxml/branch/lxml-2.0/MANIFEST.in Thu May 1 12:04:49 2008
@@ -10,6 +10,6 @@
 recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd *.xsd *.html *.txt
 recursive-include src/lxml/html/tests *.data *.txt
 recursive-include benchmark *.py
-recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png
+recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython*.png
 recursive-include fake_pyrex *.py
 include doc/mkhtml.py doc/rest2html.py

Modified: lxml/branch/lxml-2.0/doc/main.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/main.txt (original)
+++ lxml/branch/lxml-2.0/doc/main.txt Thu May 1 12:04:49 2008
@@ -172,7 +172,7 @@
 .. _`this key`: pubkey.asc
 .. _`Older versions`: #old-versions
 .. _`installation instructions`: installation.html
- .. _`how to build lxml from source`: build.html
+.. _`how to build lxml from source`: build.html
 .. _`Subversion repository`: http://codespeak.net/svn/lxml/
 .. _`Subversion history`: https://codespeak.net/viewvc/lxml/
 .. _`latest CHANGES`: http://codespeak.net/svn/lxml/trunk/CHANGES.txt
@@ -215,6 +215,8 @@
 Old Versions
 ------------

+.. _`PDF documentation`: lxmldoc-2.0.5.pdf
+
 * `lxml 2.0.4`_, released 2008-04-13 (`changes for 2.0.4`_)

 * `lxml 2.0.3`_, released 2008-03-26 (`changes for 2.0.3`_)

From scoder at codespeak.net Thu May 1 12:05:19 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:05:19 +0200 (CEST)
Subject: [Lxml-checkins] r54304 - lxml/trunk
Message-ID: <20080501100519.91FC12A01B2@codespeak.net>

Author: scoder
Date: Thu May 1 12:05:19 2008
New Revision: 54304

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/MANIFEST.in
Log:
 r4104 at delle: sbehnel | 2008-05-01 12:03:52 +0200
 include tagpython-big.png in source distro

Modified: lxml/trunk/MANIFEST.in
==============================================================================
--- lxml/trunk/MANIFEST.in (original)
+++ lxml/trunk/MANIFEST.in Thu May 1 12:05:19 2008
@@ -11,6 +11,6 @@
 recursive-include src/lxml/html/tests *.data *.txt
 recursive-include samples *.xml
 recursive-include benchmark *.py
-recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png
+recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython*.png
 recursive-include fake_pyrex *.py
 include doc/mkhtml.py doc/rest2html.py

From scoder at codespeak.net Thu May 1 12:10:12 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:10:12 +0200 (CEST)
Subject: [Lxml-checkins] r54305 - lxml/branch/lxml-2.0
Message-ID: <20080501101012.BA74C2A01B2@codespeak.net>

Author: scoder
Date: Thu May 1 12:10:12 2008
New Revision: 54305

Modified:
   lxml/branch/lxml-2.0/CHANGES.txt
Log:
pre-release fix

Modified: lxml/branch/lxml-2.0/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-2.0/CHANGES.txt (original)
+++ lxml/branch/lxml-2.0/CHANGES.txt Thu May 1 12:10:12 2008
@@ -2,7 +2,7 @@
 lxml changelog
 ==============

-2.0.4 (2008-05-01)
+2.0.5 (2008-05-01)
 ==================

 Features added

From scoder at codespeak.net Thu May 1 12:28:12 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:28:12 +0200 (CEST)
Subject: [Lxml-checkins] r54306 - in lxml/trunk: . doc
Message-ID: <20080501102812.9821E168563@codespeak.net>

Author: scoder
Date: Thu May 1 12:28:11 2008
New Revision: 54306

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/CHANGES.txt
   lxml/trunk/doc/main.txt
Log:
 r4112 at delle: sbehnel | 2008-05-01 12:26:43 +0200
 integrated release changes of 2.0.5

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Thu May 1 12:28:11 2008
@@ -26,6 +26,26 @@
   namespace (i.e. they would end up in the wrong namespace).


+2.0.5 (2008-05-01)
+==================
+
+Features added
+--------------
+
+Bugs fixed
+----------
+
+* Resolving to a filename in custom resolvers didn't work.
+
+* lxml did not honour libxslt's second error state "STOPPED", which
+  let some XSLT errors pass silently.
+
+* Memory leak in Schematron with libxml2 >= 2.6.31.
+
+Other changes
+-------------
+
+
 2.1beta1 (2008-04-15)
 =====================

Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt (original)
+++ lxml/trunk/doc/main.txt Thu May 1 12:28:11 2008
@@ -219,6 +219,8 @@

 * `lxml 2.1alpha1`_, released 2008-03-27 (`changes for 2.1alpha1`_)

+* `lxml 2.0.5`_, released 2008-05-01 (`changes for 2.0.5`_)
+
 * `lxml 2.0.4`_, released 2008-04-14 (`changes for 2.0.4`_)

 * `lxml 2.0.3`_, released 2008-03-26 (`changes for 2.0.3`_)
@@ -281,6 +283,7 @@

 .. _`lxml 2.1beta1`: lxml-2.1beta1.tgz
 .. _`lxml 2.1alpha1`: lxml-2.1alpha1.tgz
+.. _`lxml 2.0.5`: lxml-2.0.5.tgz
 .. _`lxml 2.0.4`: lxml-2.0.4.tgz
 .. _`lxml 2.0.3`: lxml-2.0.3.tgz
 .. _`lxml 2.0.2`: lxml-2.0.2.tgz
@@ -313,6 +316,7 @@

 .. _`changes for 2.1beta1`: changes-2.1beta1.html
 .. _`changes for 2.1alpha1`: changes-2.1alpha1.html
+.. _`changes for 2.0.5`: changes-2.0.5.html
 .. _`changes for 2.0.4`: changes-2.0.4.html
 .. _`changes for 2.0.3`: changes-2.0.3.html
 .. _`changes for 2.0.2`: changes-2.0.2.html

From scoder at codespeak.net Fri May 2 09:49:51 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 09:49:51 +0200 (CEST)
Subject: [Lxml-checkins] r54311 - in lxml/trunk: . src/lxml/html src/lxml/html/tests
Message-ID: <20080502074951.5688A498100@codespeak.net>

Author: scoder
Date: Fri May 2 09:49:49 2008
New Revision: 54311

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/CHANGES.txt
   lxml/trunk/src/lxml/html/__init__.py
   lxml/trunk/src/lxml/html/tests/test_basic.txt
Log:
 r4115 at delle: sbehnel | 2008-05-02 09:48:17 +0200
 'parser' keyword in lxml.html parse functions, XHTMLParser class

Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Fri May 2 09:49:49 2008
@@ -8,6 +8,12 @@
 Features added
 --------------

+* All parse functions in lxml.html take a ``parser`` keyword argument.
+
+* lxml.html has a new parser class ``XHTMLParser`` and a module
+  attribute ``xhtml_parser`` that provide XML parsers that are
+  pre-configured for the lxml.html package.
+
 Bugs fixed
 ----------

Modified: lxml/trunk/src/lxml/html/__init__.py
==============================================================================
--- lxml/trunk/src/lxml/html/__init__.py (original)
+++ lxml/trunk/src/lxml/html/__init__.py Fri May 2 09:49:49 2008
@@ -443,14 +443,17 @@
 # parsing
 ################################################################################

-def document_fromstring(html, **kw):
-    value = etree.HTML(html, html_parser, **kw)
+def document_fromstring(html, parser=None, **kw):
+    if parser is None:
+        parser = html_parser
+    value = etree.fromstring(html, parser, **kw)
     if value is None:
         raise etree.ParserError(
             "Document is empty")
     return value

-def fragments_fromstring(html, no_leading_text=False, base_url=None, **kw):
+def fragments_fromstring(html, no_leading_text=False, base_url=None,
+                         parser=None, **kw):
     """
     Parses several HTML elements, returning a list of elements.

@@ -461,11 +464,13 @@
     base_url will set the document's base_url attribute (and the tree's docinfo.URL)
     """
+    if parser is None:
+        parser = html_parser
     # FIXME: check what happens when you give html with a body, head, etc.
     start = html[:20].lstrip().lower()
     if not start.startswith('%s' % (
-            create_parent, html, create_parent), base_url=base_url, **kw)
-    elements = fragments_fromstring(html, no_leading_text=True, base_url=base_url, **kw)
+            create_parent, html, create_parent),
+            parser=parser, base_url=base_url, **kw)
+    elements = fragments_fromstring(html, parser=parser, no_leading_text=True,
+                                    base_url=base_url, **kw)
     if not elements:
         raise etree.ParserError(
             "No elements found")
@@ -512,7 +522,7 @@
         el.tail = None
     return el

-def fromstring(html, base_url=None, **kw):
+def fromstring(html, base_url=None, parser=None, **kw):
     """
     Parse the html, returning a single element/document.

@@ -521,12 +531,14 @@
     base_url will set the document's base_url attribute (and the tree's docinfo.URL)
     """
+    if parser is None:
+        parser = html_parser
     start = html[:10].lstrip().lower()
     if start.startswith('footer
+
+lxml.html has two parsers, one for HTML, one for XHTML:
+
+    >>> from lxml.html import HTMLParser, XHTMLParser
+    >>> html = "Hi!"
+
+    >>> root = document_fromstring(html, parser=HTMLParser())
+    >>> print root.tag
+    html
+
+    >>> root = document_fromstring(html, parser=XHTMLParser())
+    >>> print root.tag
+    html

From scoder at codespeak.net Fri May 2 10:13:48 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 10:13:48 +0200 (CEST)
Subject: [Lxml-checkins] r54313 - in lxml/trunk: . doc
Message-ID: <20080502081348.C58B616855E@codespeak.net>

Author: scoder
Date: Fri May 2 10:13:48 2008
New Revision: 54313

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/doc/build.txt
Log:
 r4117 at delle: sbehnel | 2008-05-02 10:00:01 +0200
 require Cython 0.9.6.14 for lxml 2.1

Modified: lxml/trunk/doc/build.txt
==============================================================================
--- lxml/trunk/doc/build.txt (original)
+++ lxml/trunk/doc/build.txt Fri May 2 10:13:48 2008
@@ -44,10 +44,10 @@
 want to be an lxml developer, then you do need a working Cython
 installation. You can use EasyInstall_ to install it::

-    easy_install Cython==0.9.6.12
+    easy_install Cython==0.9.6.14

-lxml currently requires Cython 0.9.6.12. Any 0.9.6.13 version will not
-work, later versions were not tested.
+lxml currently requires Cython 0.9.6.14, later versions were not
+tested.


 Subversion

From scoder at codespeak.net Fri May 2 10:28:05 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 10:28:05 +0200 (CEST)
Subject: [Lxml-checkins] r54315 - lxml/branch/lxml-2.0/doc
Message-ID: <20080502082805.10C124980FC@codespeak.net>

Author: scoder
Date: Fri May 2 10:28:04 2008
New Revision: 54315

Modified:
   lxml/branch/lxml-2.0/doc/main.txt
Log:
doc fix

Modified: lxml/branch/lxml-2.0/doc/main.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/main.txt (original)
+++ lxml/branch/lxml-2.0/doc/main.txt Fri May 2 10:28:04 2008
@@ -146,7 +146,7 @@
 source release.
If you can't wait, consider trying a less recent
 release version first.

-The latest version is `lxml 2.0.5`_, released 2008-04-13
+The latest version is `lxml 2.0.5`_, released 2008-05-01
 (`changes for 2.0.5`_). `Older versions`_ are listed below.

 Please take a look at the `installation instructions`_!

From scoder at codespeak.net Fri May 2 19:14:58 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:14:58 +0200 (CEST)
Subject: [Lxml-checkins] r54340 - in lxml/trunk: . src/lxml
Message-ID: <20080502171458.378E82A00DB@codespeak.net>

Author: scoder
Date: Fri May 2 19:14:57 2008
New Revision: 54340

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/apihelpers.pxi
   lxml/trunk/src/lxml/lxml.etree.pyx
   lxml/trunk/src/lxml/proxy.pxi
   lxml/trunk/src/lxml/tree.pxd
Log:
 r4119 at delle: sbehnel | 2008-05-02 18:10:34 +0200
 re-assign all node name pointers from the target dictionary when moving an element to a new tree of a different thread

Modified: lxml/trunk/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/trunk/src/lxml/apihelpers.pxi (original)
+++ lxml/trunk/src/lxml/apihelpers.pxi Fri May 2 19:14:57 2008
@@ -716,7 +716,7 @@
     _moveTail(c_next, c_node)
     if not attemptDeallocation(c_node):
         # make namespaces absolute
-        moveNodeToDocument(doc, c_node)
+        moveNodeToDocument(doc, c_node.doc, c_node)
     return 0

 cdef void _moveTail(xmlNode* c_tail, xmlNode* c_target):
@@ -782,6 +782,7 @@
     """
     cdef xmlNode* c_orig_neighbour
     cdef xmlNode* c_next
+    cdef xmlDoc* c_source_doc
     cdef _Element element
     cdef Py_ssize_t seqlength, i, c
     cdef _node_to_node_function next_element
@@ -864,12 +865,13 @@
     for element in elements:
         assert element is not None, "Node must not be None"
         # move element and tail over
+        c_source_doc = element._c_node.doc
         c_next = element._c_node.next
         tree.xmlAddPrevSibling(c_node, element._c_node)
         _moveTail(c_next, element._c_node)

         # integrate element into new document
-        moveNodeToDocument(parent._doc, element._c_node)
+        moveNodeToDocument(parent._doc, c_source_doc, element._c_node)

         # stop at the end of the slice
         if slicelength > 0:
@@ -899,7 +901,9 @@
     """
     cdef xmlNode* c_next
     cdef xmlNode* c_node
+    cdef xmlDoc* c_source_doc
     c_node = child._c_node
+    c_source_doc = c_node.doc
     # store possible text node
     c_next = c_node.next
     # move node itself
@@ -908,7 +912,7 @@
     _moveTail(c_next, c_node)
     # uh oh, elements may be pointing to different doc when
     # parent element has moved; change them too..
-    moveNodeToDocument(parent._doc, c_node)
+    moveNodeToDocument(parent._doc, c_source_doc, c_node)

 cdef int _prependChild(_Element parent, _Element child) except -1:
     """Prepend a new child to a parent element.
@@ -916,7 +920,9 @@
     cdef xmlNode* c_next
     cdef xmlNode* c_child
     cdef xmlNode* c_node
+    cdef xmlDoc* c_source_doc
     c_node = child._c_node
+    c_source_doc = c_node.doc
     # store possible text node
     c_next = c_node.next
     # move node itself
@@ -929,14 +935,16 @@
     _moveTail(c_next, c_node)
     # uh oh, elements may be pointing to different doc when
     # parent element has moved; change them too..
-    moveNodeToDocument(parent._doc, c_node)
+    moveNodeToDocument(parent._doc, c_source_doc, c_node)

 cdef int _appendSibling(_Element element, _Element sibling) except -1:
     """Append a new child to a parent element.
     """
     cdef xmlNode* c_next
     cdef xmlNode* c_node
+    cdef xmlDoc* c_source_doc
     c_node = sibling._c_node
+    c_source_doc = c_node.doc
     # store possible text node
     c_next = c_node.next
     # move node itself
@@ -944,14 +952,16 @@
     _moveTail(c_next, c_node)
     # uh oh, elements may be pointing to different doc when
     # parent element has moved; change them too..
-    moveNodeToDocument(element._doc, c_node)
+    moveNodeToDocument(element._doc, c_source_doc, c_node)

 cdef int _prependSibling(_Element element, _Element sibling) except -1:
     """Append a new child to a parent element.
     """
     cdef xmlNode* c_next
     cdef xmlNode* c_node
+    cdef xmlDoc* c_source_doc
     c_node = sibling._c_node
+    c_source_doc = c_node.doc
     # store possible text node
     c_next = c_node.next
     # move node itself
@@ -959,7 +969,7 @@
     _moveTail(c_next, c_node)
     # uh oh, elements may be pointing to different doc when
     # parent element has moved; change them too..
-    moveNodeToDocument(element._doc, c_node)
+    moveNodeToDocument(element._doc, c_source_doc, c_node)

 cdef inline int isutf8(char* s):
     cdef char c

Modified: lxml/trunk/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.etree.pyx (original)
+++ lxml/trunk/src/lxml/lxml.etree.pyx Fri May 2 19:14:57 2008
@@ -533,6 +533,7 @@
         """
         cdef xmlNode* c_node
         cdef xmlNode* c_next
+        cdef xmlDoc* c_source_doc
         cdef _Element element
         cdef bint left_to_right
         cdef Py_ssize_t slicelength, step
@@ -554,13 +555,14 @@
             c_node = _findChild(self._c_node, x)
             if c_node is NULL:
                 raise IndexError, "list index out of range"
+            c_source_doc = element._c_node.doc
             c_next = element._c_node.next
             _removeText(c_node.next)
             tree.xmlReplaceNode(c_node, element._c_node)
             _moveTail(c_next, element._c_node)
-            moveNodeToDocument(self._doc, element._c_node)
+            moveNodeToDocument(self._doc, c_source_doc, element._c_node)
             if not attemptDeallocation(c_node):
-                moveNodeToDocument(self._doc, c_node)
+                moveNodeToDocument(self._doc, c_node.doc, c_node)

     def __delitem__(self, x):
         """__delitem__(self, x)
@@ -707,14 +709,16 @@
         """
         cdef xmlNode* c_node
         cdef xmlNode* c_next
+        cdef xmlDoc* c_source_doc
         c_node = _findChild(self._c_node, index)
         if c_node is NULL:
             _appendChild(self, element)
             return
+        c_source_doc = c_node.doc
         c_next = element._c_node.next
         tree.xmlAddPrevSibling(c_node, element._c_node)
         _moveTail(c_next, element._c_node)
-        moveNodeToDocument(self._doc, element._c_node)
+        moveNodeToDocument(self._doc, c_source_doc, element._c_node)

     def remove(self, _Element element not None):
         """remove(self, element)
@@ -732,7 +736,7 @@
         tree.xmlUnlinkNode(c_node)
         _moveTail(c_next, c_node)
         # fix namespace declarations
-        moveNodeToDocument(self._doc, c_node)
+        moveNodeToDocument(self._doc, c_node.doc, c_node)

     def replace(self, _Element old_element not None,
                 _Element new_element not None):
@@ -744,18 +748,20 @@
         cdef xmlNode* c_old_next
         cdef xmlNode* c_new_node
         cdef xmlNode* c_new_next
+        cdef xmlDoc* c_source_doc
         c_old_node = old_element._c_node
         if c_old_node.parent is not self._c_node:
             raise ValueError, "Element is not a child of this node."
         c_old_next = c_old_node.next
         c_new_node = new_element._c_node
         c_new_next = c_new_node.next
+        c_source_doc = c_new_next.doc
         tree.xmlReplaceNode(c_old_node, c_new_node)
         _moveTail(c_new_next, c_new_node)
         _moveTail(c_old_next, c_old_node)
-        moveNodeToDocument(self._doc, c_new_node)
+        moveNodeToDocument(self._doc, c_source_doc, c_new_node)
         # fix namespace declarations
-        moveNodeToDocument(self._doc, c_old_node)
+        moveNodeToDocument(self._doc, c_old_node.doc, c_old_node)

     # PROPERTIES
     property tag:

Modified: lxml/trunk/src/lxml/proxy.pxi
==============================================================================
--- lxml/trunk/src/lxml/proxy.pxi (original)
+++ lxml/trunk/src/lxml/proxy.pxi Fri May 2 19:14:57 2008
@@ -276,7 +276,8 @@
         c_nsdef[0] = c_ns_next
     return 0

-cdef int moveNodeToDocument(_Document doc, xmlNode* c_element) except -1:
+cdef int moveNodeToDocument(_Document doc, xmlDoc* c_source_doc,
+                            xmlNode* c_element) except -1:
     """Fix the xmlNs pointers of a node and its subtree that were moved.

     Mainly copied from libxml2's xmlReconciliateNs(). Expects libxml2 doc
@@ -293,7 +294,11 @@
        prefix). If a namespace is unknown, declare a new one on the
        node.

-    3) Set the Document reference to the new Document (if different).
+    3) Reassign the names of tags and attribute from the dict of the
+       target document *iff* it is different from the dict used in the
+       source subtree.
+
+    4) Set the Document reference to the new Document (if different).
        This is done on backtracking to keep the original Document
        alive as long as possible, until all its elements are updated.

@@ -303,16 +308,26 @@
     """
     cdef xmlNode* c_start_node
     cdef xmlNode* c_node
+    cdef char* c_name
     cdef _nscache c_ns_cache
     cdef xmlNs* c_ns
     cdef xmlNs* c_ns_next
     cdef xmlNs* c_nsdef
     cdef xmlNs* c_del_ns_list
     cdef cstd.size_t i
+    cdef tree.xmlDict* c_dict

     if not tree._isElementOrXInclude(c_element):
         return 0

+    # we need to copy the names of tags and attributes iff the element
+    # is based on a different libxml2 tag name dictionary
+    if doc._c_doc.dict is not c_source_doc.dict and \
+           doc._c_doc.dict is not NULL and c_source_doc.dict is not NULL:
+        c_dict = doc._c_doc.dict
+    else:
+        c_dict = NULL
+
     c_start_node = c_element
     c_del_ns_list = NULL

@@ -343,6 +358,13 @@
                         c_element, c_node.ns.href, c_node.ns.prefix)
                     _appendToNsCache(&c_ns_cache, c_node.ns, c_ns)
                 c_node.ns = c_ns
+
+            # 3) re-assign names from the target dict
+            if c_dict is not NULL:
+                c_name = tree.xmlDictLookup(c_dict, c_node.name, -1)
+                if c_name is not NULL:
+                    c_element.name = c_name
+
             if c_node is c_element:
                 # after the element, continue with its attributes
                 c_node = c_element.properties
@@ -358,7 +380,7 @@
         if c_node is NULL:
             # no children => back off and continue with siblings and parents

-            # 3) fix _Document reference (may dealloc the original document!)
+            # 4) fix _Document reference (may dealloc the original document!)
             if c_element._private is not NULL:
                 _updateProxyDocument(c_element, doc)

@@ -376,7 +398,7 @@
             if c_element is NULL or not tree._isElementOrXInclude(c_element):
                 break

-        # 3) fix _Document reference (may dealloc the original document!)
+        # 4) fix _Document reference (may dealloc the original document!)
         if c_element._private is not NULL:
             _updateProxyDocument(c_element, doc)


Modified: lxml/trunk/src/lxml/tree.pxd
==============================================================================
--- lxml/trunk/src/lxml/tree.pxd (original)
+++ lxml/trunk/src/lxml/tree.pxd Fri May 2 19:14:57 2008
@@ -52,12 +52,12 @@
     void xmlHashScan(xmlHashTable* table, xmlHashScanner f, void* data) nogil
     void* xmlHashLookup(xmlHashTable* table, char* name) nogil

-cdef extern from "libxml/tree.h":
-
-    # for some reason need to define this in this section;
+cdef extern from *: # actually "libxml/dict.h"
     # libxml/dict.h appears to be broken to include in C
     ctypedef struct xmlDict
-
+    cdef char* xmlDictLookup(xmlDict* dict, char* name, int len)
+
+cdef extern from "libxml/tree.h":
     ctypedef struct xmlDoc
     ctypedef struct xmlAttr
     ctypedef struct xmlNotationTable

From scoder at codespeak.net Fri May 2 19:15:03 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:15:03 +0200 (CEST)
Subject: [Lxml-checkins] r54341 - in lxml/trunk: .
src/lxml Message-ID: <20080502171503.51C932A00DB@codespeak.net> Author: scoder Date: Fri May 2 19:15:02 2008 New Revision: 54341 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx Log: r4120 at delle: sbehnel | 2008-05-02 18:42:35 +0200 typo Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Fri May 2 19:15:02 2008 @@ -755,7 +755,7 @@ c_old_next = c_old_node.next c_new_node = new_element._c_node c_new_next = c_new_node.next - c_source_doc = c_new_next.doc + c_source_doc = c_new_node.doc tree.xmlReplaceNode(c_old_node, c_new_node) _moveTail(c_new_next, c_new_node) _moveTail(c_old_next, c_old_node) From scoder at codespeak.net Fri May 2 19:15:07 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 2 May 2008 19:15:07 +0200 (CEST) Subject: [Lxml-checkins] r54342 - in lxml/trunk: . src/lxml Message-ID: <20080502171507.8028D39B593@codespeak.net> Author: scoder Date: Fri May 2 19:15:07 2008 New Revision: 54342 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/tree.pxd Log: r4121 at delle: sbehnel | 2008-05-02 19:10:07 +0200 cleanup Modified: lxml/trunk/src/lxml/tree.pxd ============================================================================== --- lxml/trunk/src/lxml/tree.pxd (original) +++ lxml/trunk/src/lxml/tree.pxd Fri May 2 19:15:07 2008 @@ -55,7 +55,7 @@ cdef extern from *: # actually "libxml/dict.h" # libxml/dict.h appears to be broken to include in C ctypedef struct xmlDict - cdef char* xmlDictLookup(xmlDict* dict, char* name, int len) + cdef char* xmlDictLookup(xmlDict* dict, char* name, int len) nogil cdef extern from "libxml/tree.h": ctypedef struct xmlDoc From scoder at codespeak.net Fri May 2 19:15:11 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 2 May 2008 19:15:11 +0200 (CEST) Subject: [Lxml-checkins] r54343 - in 
lxml/trunk: . src/lxml Message-ID: <20080502171511.A0B4E2A00DB@codespeak.net> Author: scoder Date: Fri May 2 19:15:11 2008 New Revision: 54343 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/proxy.pxi Log: r4122 at delle: sbehnel | 2008-05-02 19:11:59 +0200 fix and simplification Modified: lxml/trunk/src/lxml/proxy.pxi ============================================================================== --- lxml/trunk/src/lxml/proxy.pxi (original) +++ lxml/trunk/src/lxml/proxy.pxi Fri May 2 19:15:11 2008 @@ -322,8 +322,7 @@ # we need to copy the names of tags and attributes iff the element # is based on a different libxml2 tag name dictionary - if doc._c_doc.dict is not c_source_doc.dict and \ - doc._c_doc.dict is not NULL and c_source_doc.dict is not NULL: + if doc._c_doc.dict is not c_source_doc.dict: c_dict = doc._c_doc.dict else: c_dict = NULL @@ -362,8 +361,10 @@ # 3) re-assign names from the target dict if c_dict is not NULL: c_name = tree.xmlDictLookup(c_dict, c_node.name, -1) + # c_name can be NULL on memory error, but we don't + # handle that here if c_name is not NULL: - c_element.name = c_name + c_node.name = c_name if c_node is c_element: # after the element, continue with its attributes From scoder at codespeak.net Fri May 2 19:15:15 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 2 May 2008 19:15:15 +0200 (CEST) Subject: [Lxml-checkins] r54344 - lxml/trunk Message-ID: <20080502171515.58D732A00DB@codespeak.net> Author: scoder Date: Fri May 2 19:15:15 2008 New Revision: 54344 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r4123 at delle: sbehnel | 2008-05-02 19:13:17 +0200 changelog Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri May 2 19:15:15 2008 @@ -17,6 +17,10 @@ Bugs fixed ---------- +* Moving a subtree from a document created in one thread into a + document of another 
thread could crash when the rest of the source + document is deleted while the subtree is still in use. + * Passing an nsmap when creating an Element will no longer strip redundantly defined namespace URIs. This prevented the definition of more than one prefix for a namespace on the same Element. From scoder at codespeak.net Fri May 2 19:59:59 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 2 May 2008 19:59:59 +0200 (CEST) Subject: [Lxml-checkins] r54345 - in lxml/trunk: . src/lxml/tests Message-ID: <20080502175959.1362C2A00DB@codespeak.net> Author: scoder Date: Fri May 2 19:59:57 2008 New Revision: 54345 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/tests/test_elementtree.py Log: r4130 at delle: sbehnel | 2008-05-02 19:55:34 +0200 cleanup Modified: lxml/trunk/src/lxml/tests/test_elementtree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_elementtree.py (original) +++ lxml/trunk/src/lxml/tests/test_elementtree.py Fri May 2 19:59:57 2008 @@ -8,7 +8,7 @@ for IO related test cases. """ -import unittest, doctest +import unittest import os, re, tempfile, copy, operator, gc from common_imports import StringIO, etree, ElementTree, cElementTree From scoder at codespeak.net Fri May 2 20:00:03 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 2 May 2008 20:00:03 +0200 (CEST) Subject: [Lxml-checkins] r54346 - in lxml/trunk: . 
src/lxml/tests Message-ID: <20080502180003.C61222A00DB@codespeak.net> Author: scoder Date: Fri May 2 20:00:03 2008 New Revision: 54346 Added: lxml/trunk/src/lxml/tests/test_threading.py Modified: lxml/trunk/ (props changed) Log: r4131 at delle: sbehnel | 2008-05-02 19:58:26 +0200 new test suite for threading tests Added: lxml/trunk/src/lxml/tests/test_threading.py ============================================================================== --- (empty file) +++ lxml/trunk/src/lxml/tests/test_threading.py Fri May 2 20:00:03 2008 @@ -0,0 +1,38 @@ +# -*- coding: utf-8 -*- + +""" +Tests for thread usage in lxml.etree. +""" + +import unittest, threading + +from common_imports import etree, HelperTestCase + +class ThreadingTestCase(HelperTestCase): + """Threading tests""" + etree = etree + + def test_subtree_copy(self): + tostring = self.etree.tostring + XML = self.etree.XML + xml = "" + main_root = XML("") + + def run_thread(): + thread_root = XML(xml) + main_root.append(thread_root[0]) + del thread_root + + thread = threading.Thread(target=run_thread) + thread.start() + thread.join() + + self.assertEquals(xml, tostring(main_root)) + +def test_suite(): + suite = unittest.TestSuite() + suite.addTests([unittest.makeSuite(ThreadingTestCase)]) + return suite + +if __name__ == '__main__': + print 'to test use test.py %s' % __file__ From scoder at codespeak.net Fri May 2 20:12:28 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 2 May 2008 20:12:28 +0200 (CEST) Subject: [Lxml-checkins] r54349 - in lxml/trunk: . 
doc Message-ID: <20080502181228.7C57B169E74@codespeak.net> Author: scoder Date: Fri May 2 20:12:28 2008 New Revision: 54349 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/FAQ.txt Log: r4134 at delle: sbehnel | 2008-05-02 20:10:19 +0200 relieve FAQ on threading from 'big fat warning' Modified: lxml/trunk/doc/FAQ.txt ============================================================================== --- lxml/trunk/doc/FAQ.txt (original) +++ lxml/trunk/doc/FAQ.txt Fri May 2 20:12:28 2008 @@ -565,29 +565,32 @@ Can I use threads to concurrently access the lxml API? ------------------------------------------------------ -Yes, although not carelessly. +Short answer: yes, if you use lxml 2.1 and later. -lxml frees the GIL (Python's global interpreter lock) internally when parsing -from disk and memory, as long as you use either the default parser (which is -replicated for each thread) or create a parser for each thread yourself. lxml -also allows concurrency during validation (RelaxNG and XMLSchema) and XSL -transformation. You can share RelaxNG, XMLSchema and (with restrictions) XSLT -objects between threads. While you can also share parsers between threads, -this will serialize the access to each of them, so it is better to ``copy()`` -parsers or to just use the default parser (which is automatically copied for -each thread). +Since version 1.1, lxml frees the GIL (Python's global interpreter +lock) internally when parsing from disk and memory, as long as you use +either the default parser (which is replicated for each thread) or +create a parser for each thread yourself. lxml also allows +concurrency during validation (RelaxNG and XMLSchema) and XSL +transformation. You can share RelaxNG, XMLSchema and (with +restrictions) XSLT objects between threads. While you can also share +parsers between threads, this will serialize the access to each of +them, so it is better to ``.copy()`` parsers or to just use the +default parser if you do not need any special configuration. 
Due to the way libxslt handles threading, applying a stylesheets is most efficient if it was parsed in the same thread that executes it. One way to achieve this is by caching stylesheets in thread-local storage. -Warning: You should generally avoid modifying trees in other threads than the -one it was generated in. Although this should work in many cases, there are -certain scenarios where the termination of a thread that parsed a tree can -crash the application if subtrees of this tree were moved to other documents. -You should be on the safe side when passing trees between threads if you -either +Warning: Before lxml 2.1, there were issues when moving subtrees +between different threads. If you need code to run with older +versions, you should generally avoid modifying trees in other threads +than the one it was generated in. Although this should work in many +cases, there are certain scenarios where the termination of a thread +that parsed a tree can crash the application if subtrees of this tree +were moved to other documents. You should be on the safe side when +passing trees between threads if you either a) do not modify these trees and do not move their elements to other trees, or b) do not terminate threads while the trees they parsed are still in use From scoder at codespeak.net Fri May 2 20:12:33 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 2 May 2008 20:12:33 +0200 (CEST) Subject: [Lxml-checkins] r54350 - in lxml/trunk: . 
doc Message-ID: <20080502181233.B3E33169E77@codespeak.net> Author: scoder Date: Fri May 2 20:12:33 2008 New Revision: 54350 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/main.txt lxml/trunk/version.txt Log: r4135 at delle: sbehnel | 2008-05-02 20:10:38 +0200 pre-release changes Modified: lxml/trunk/doc/main.txt ============================================================================== --- lxml/trunk/doc/main.txt (original) +++ lxml/trunk/doc/main.txt Fri May 2 20:12:33 2008 @@ -146,8 +146,8 @@ source release. If you can't wait, consider trying a less recent release version first. -The latest version is `lxml 2.1beta1`_, released 2008-04-15 -(`changes for 2.1beta1`_). `Older versions`_ are listed below. +The latest version is `lxml 2.1beta2`_, released 2008-05-02 +(`changes for 2.1beta2`_). `Older versions`_ are listed below. Please take a look at the `installation instructions`_! @@ -215,7 +215,9 @@ Old Versions ------------ -.. _`PDF documentation`: lxmldoc-2.1beta1.pdf +.. _`PDF documentation`: lxmldoc-2.1beta2.pdf + +* `lxml 2.1beta1`_, released 2008-04-15 (`changes for 2.1beta1`_) * `lxml 2.1alpha1`_, released 2008-03-27 (`changes for 2.1alpha1`_) @@ -281,6 +283,7 @@ * `lxml 0.5`_, released 2005-04-08 +.. _`lxml 2.1beta2`: lxml-2.1beta2.tgz .. _`lxml 2.1beta1`: lxml-2.1beta1.tgz .. _`lxml 2.1alpha1`: lxml-2.1alpha1.tgz .. _`lxml 2.0.5`: lxml-2.0.5.tgz @@ -314,6 +317,7 @@ .. _`lxml 0.5.1`: lxml-0.5.1.tgz .. _`lxml 0.5`: lxml-0.5.tgz +.. _`changes for 2.1beta2`: changes-2.1beta2.html .. _`changes for 2.1beta1`: changes-2.1beta1.html .. _`changes for 2.1alpha1`: changes-2.1alpha1.html .. 
_`changes for 2.0.5`: changes-2.0.5.html Modified: lxml/trunk/version.txt ============================================================================== --- lxml/trunk/version.txt (original) +++ lxml/trunk/version.txt Fri May 2 20:12:33 2008 @@ -1 +1 @@ -2.1beta1 +2.1beta2 From scoder at codespeak.net Fri May 2 20:18:01 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 2 May 2008 20:18:01 +0200 (CEST) Subject: [Lxml-checkins] r54351 - lxml/trunk Message-ID: <20080502181801.CBD87169E78@codespeak.net> Author: scoder Date: Fri May 2 20:18:01 2008 New Revision: 54351 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r4138 at delle: sbehnel | 2008-05-02 20:16:32 +0200 pre-release changes Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri May 2 20:18:01 2008 @@ -2,8 +2,8 @@ lxml changelog ============== -Under development -================= +2.1beta2 (2008-05-02) +===================== Features added -------------- From scoder at codespeak.net Fri May 2 21:56:30 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 2 May 2008 21:56:30 +0200 (CEST) Subject: [Lxml-checkins] r54352 - in lxml/trunk: . 
src/lxml/html Message-ID: <20080502195630.2D9A4169E4C@codespeak.net> Author: scoder Date: Fri May 2 21:56:28 2008 New Revision: 54352 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/html/defs.py Log: r4140 at delle: sbehnel | 2008-05-02 21:46:15 +0200 use sets instead of lists in defs.py as most use cases only test for containment Modified: lxml/trunk/src/lxml/html/defs.py ============================================================================== --- lxml/trunk/src/lxml/html/defs.py (original) +++ lxml/trunk/src/lxml/html/defs.py Fri May 2 21:56:28 2008 @@ -4,34 +4,40 @@ # Data taken from http://www.w3.org/TR/html401/index/elements.html -empty_tags = [ +try: + frozenset +except NameError: + from sets import Set as frozenset + + +empty_tags = frozenset([ 'area', 'base', 'basefont', 'br', 'col', 'frame', 'hr', - 'img', 'input', 'isindex', 'link', 'meta', 'param'] + 'img', 'input', 'isindex', 'link', 'meta', 'param']) -deprecated_tags = [ +deprecated_tags = frozenset([ 'applet', 'basefont', 'center', 'dir', 'font', 'isindex', - 'menu', 's', 'strike', 'u'] + 'menu', 's', 'strike', 'u']) # archive actually takes a space-separated list of URIs -link_attrs = [ +link_attrs = frozenset([ 'action', 'archive', 'background', 'cite', 'classid', 'codebase', 'data', 'href', 'longdesc', 'profile', 'src', 'usemap', # Not standard: 'dynsrc', 'lowsrc', - ] + ]) # Not in the HTML 4 spec: # onerror, onresize -event_attrs = [ +event_attrs = frozenset([ 'onblur', 'onchange', 'onclick', 'ondblclick', 'onerror', 'onfocus', 'onkeydown', 'onkeypress', 'onkeyup', 'onload', 'onmousedown', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onreset', 'onresize', 'onselect', 'onsubmit', 'onunload', - ] + ]) -safe_attrs = [ +safe_attrs = frozenset([ 'abbr', 'accept', 'accept-charset', 'accesskey', 'action', 'align', 'alt', 'axis', 'border', 'cellpadding', 'cellspacing', 'char', 'charoff', 'charset', 'checked', 'cite', 'class', 'clear', 'cols', 'colspan', @@ -41,18 +47,18 @@ 
'multiple', 'name', 'nohref', 'noshade', 'nowrap', 'prompt', 'readonly', 'rel', 'rev', 'rows', 'rowspan', 'rules', 'scope', 'selected', 'shape', 'size', 'span', 'src', 'start', 'summary', 'tabindex', 'target', 'title', - 'type', 'usemap', 'valign', 'value', 'vspace', 'width'] + 'type', 'usemap', 'valign', 'value', 'vspace', 'width']) # From http://htmlhelp.com/reference/html40/olist.html -top_level_tags = [ +top_level_tags = frozenset([ 'html', 'head', 'body', 'frameset', - ] + ]) -head_tags = [ +head_tags = frozenset([ 'base', 'isindex', 'link', 'meta', 'script', 'style', 'title', - ] + ]) -general_block_tags = [ +general_block_tags = frozenset([ 'address', 'blockquote', 'center', @@ -70,51 +76,51 @@ 'noscript', 'p', 'pre', - ] + ]) -list_tags = [ +list_tags = frozenset([ 'dir', 'dl', 'dt', 'dd', 'li', 'menu', 'ol', 'ul', - ] + ]) -table_tags = [ +table_tags = frozenset([ 'table', 'caption', 'colgroup', 'col', 'thead', 'tfoot', 'tbody', 'tr', 'td', 'th', - ] + ]) # just this one from # http://www.georgehernandez.com/h/XComputers/HTML/2BlockLevel.htm -block_tags = general_block_tags + list_tags + table_tags + [ +block_tags = general_block_tags | list_tags | table_tags | frozenset([ # Partial form tags 'fieldset', 'form', 'legend', 'optgroup', 'option', - ] + ]) -form_tags = [ +form_tags = frozenset([ 'form', 'button', 'fieldset', 'legend', 'input', 'label', 'select', 'optgroup', 'option', 'textarea', - ] + ]) -special_inline_tags = [ +special_inline_tags = frozenset([ 'a', 'applet', 'basefont', 'bdo', 'br', 'embed', 'font', 'iframe', 'img', 'map', 'area', 'object', 'param', 'q', 'script', 'span', 'sub', 'sup', - ] + ]) -phrase_tags = [ +phrase_tags = frozenset([ 'abbr', 'acronym', 'cite', 'code', 'del', 'dfn', 'em', 'ins', 'kbd', 'samp', 'strong', 'var', - ] + ]) -font_style_tags = [ +font_style_tags = frozenset([ 'b', 'big', 'i', 's', 'small', 'strike', 'tt', 'u', - ] + ]) -frame_tags = [ +frame_tags = frozenset([ 'frameset', 'frame', 'noframes', - ] + ]) # These 
tags aren't standard -nonstandard_tags = ['blink', 'marque'] +nonstandard_tags = frozenset(['blink', 'marque']) -tags = (top_level_tags + head_tags + general_block_tags + list_tags - + table_tags + form_tags + special_inline_tags + phrase_tags - + font_style_tags + nonstandard_tags) +tags = (top_level_tags | head_tags | general_block_tags | list_tags + | table_tags | form_tags | special_inline_tags | phrase_tags + | font_style_tags | nonstandard_tags) From scoder at codespeak.net Fri May 2 21:56:35 2008 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 2 May 2008 21:56:35 +0200 (CEST) Subject: [Lxml-checkins] r54353 - in lxml/trunk: . src/lxml/html Message-ID: <20080502195635.5C6BD2A00DB@codespeak.net> Author: scoder Date: Fri May 2 21:56:34 2008 New Revision: 54353 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/html/__init__.py lxml/trunk/src/lxml/html/clean.py lxml/trunk/src/lxml/html/formfill.py Log: r4141 at delle: sbehnel | 2008-05-02 21:47:32 +0200 support XHTML tags in XPath expressions of lxml.html Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Fri May 2 21:56:34 2008 @@ -2,6 +2,21 @@ lxml changelog ============== +Under development +================= + +Features added +-------------- + +* Most features in lxml.html work for XHTML namespaced tag names. 
+ +Bugs fixed +---------- + +Other changes +------------- + + 2.1beta2 (2008-05-02) ===================== Modified: lxml/trunk/src/lxml/html/__init__.py ============================================================================== --- lxml/trunk/src/lxml/html/__init__.py (original) +++ lxml/trunk/src/lxml/html/__init__.py Fri May 2 21:56:34 2008 @@ -22,16 +22,30 @@ 'find_rel_links', 'find_class', 'make_links_absolute', 'resolve_base_href', 'iterlinks', 'rewrite_links', 'open_in_browser'] -_rel_links_xpath = etree.XPath("descendant-or-self::a[@rel]") +XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml" + +_rel_links_xpath = etree.XPath("descendant-or-self::a[@rel]|descendant-or-self::x:a[@rel]", + namespaces={'x':XHTML_NAMESPACE}) +_options_xpath = etree.XPath("descendant-or-self::option|descendant-or-self::x:option", + namespaces={'x':XHTML_NAMESPACE}) +_forms_xpath = etree.XPath("descendant-or-self::form|descendant-or-self::x:form", + namespaces={'x':XHTML_NAMESPACE}) #_class_xpath = etree.XPath(r"descendant-or-self::*[regexp:match(@class, concat('\b', $class_name, '\b'))]", {'regexp': 'http://exslt.org/regular-expressions'}) _class_xpath = etree.XPath("descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), concat(' ', $class_name, ' '))]") _id_xpath = etree.XPath("descendant-or-self::*[@id=$id]") _collect_string_content = etree.XPath("string()") _css_url_re = re.compile(r'url\((.*?)\)', re.I) _css_import_re = re.compile(r'@import "(.*?)"') -_label_xpath = etree.XPath("//label[@for=$id]") +_label_xpath = etree.XPath("//label[@for=$id]|//x:label[@for=$id]", + namespaces={'x':XHTML_NAMESPACE}) _archive_re = re.compile(r'[^ ]+') +def _nons(tag): + if isinstance(tag, basestring): + if tag[0] == '{' and tag[1:len(XHTML_NAMESPACE)+1] == XHTML_NAMESPACE: + return tag.split('}')[-1] + return tag + class HtmlMixin(object): def base_url(self): @@ -48,7 +62,7 @@ """ Return a list of all the forms """ - return list(self.getiterator('form')) + 
return _forms_xpath(self) forms = property(forms, doc=forms.__doc__) def body(self): @@ -56,7 +70,7 @@ Return the <body> element. Can be called from a child element to get the document's head. """ - return self.xpath('//body')[0] + return self.xpath('//body|//x:body', namespaces={'x':XHTML_NAMESPACE})[0] body = property(body, doc=body.__doc__) def head(self): @@ -64,7 +78,7 @@ Returns the <head> element. Can be called from a child element to get the document's head. """ - return self.xpath('//head')[0] + return self.xpath('//head|//x:head', namespaces={'x':XHTML_NAMESPACE})[0] head = property(head, doc=head.__doc__) def _label__get(self): @@ -85,7 +99,7 @@ raise TypeError( "You cannot set a label for an element (%r) that has no id" % self) - if not label.tag == 'label': + if _nons(label.tag) != 'label': raise TypeError( "You can only assign label to a label element (not %r)" % label) @@ -228,7 +242,7 @@ tag once it has been applied. """ base_href = None - basetags = self.xpath('//base[@href]') + basetags = self.xpath('//base[@href]|//x:base[@href]', namespaces={'x':XHTML_NAMESPACE}) for b in basetags: base_href = b.get('href') b.drop_tree() @@ -249,11 +263,12 @@ link_attrs = defs.link_attrs for el in self.getiterator(): attribs = el.attrib - if el.tag != 'object': + tag = _nons(el.tag) + if tag != 'object': for attrib in link_attrs: if attrib in attribs: yield (el, attrib, attribs[attrib], 0) - elif el.tag == 'object': + elif tag == 'object': codebase = None ## <object> tags have attributes that are relative to ## codebase @@ -272,7 +287,7 @@ if codebase is not None: value = urlparse.urljoin(codebase, value) yield (el, 'archive', value, match.start()) - if el.tag == 'param': + if tag == 'param': valuetype = el.get('valuetype') or '' if valuetype.lower() == 'ref': ## FIXME: while it's fine we *find* this link, @@ -282,7 +297,7 @@ ## doesn't have a valuetype="ref" (which seems to be the norm) ## http://www.w3.org/TR/html401/struct/objects.html#adef-valuetype yield (el, 'value',
el.get('value'), 0) - if el.tag == 'style' and el.text: + if tag == 'style' and el.text: for match in _css_url_re.finditer(el.text): yield (el, None, match.group(1), match.start(1)) for match in _css_import_re.finditer(el.text): @@ -471,8 +486,8 @@ if not start.startswith(' 1: @@ -558,6 +575,8 @@ else: body = None heads = doc.findall('head') + if not heads: + heads = doc.findall('{%s}head' % XHTML_NAMESPACE) if heads: # Well, we have some sort of structure, so lets keep it all head = heads[0] @@ -598,7 +617,7 @@ # FIXME: I could do this with XPath, but would that just be # unnecessarily slow? for el in el.getiterator(): - if el.tag in defs.block_tags: + if _nons(el.tag) in defs.block_tags: return True return False @@ -608,7 +627,7 @@ elif isinstance(el, basestring): return 'string' else: - return el.tag + return _nons(el.tag) ################################################################################ # form handling @@ -655,7 +674,10 @@ return self.get('name') elif self.get('id'): return '#' + self.get('id') - return str(self.body.findall('form').index(self)) + forms = self.body.findall('form') + if not forms: + forms = self.body.findall('{%s}form' % XHTML_NAMESPACE) + return str(forms.index(self)) def form_values(self): """ @@ -667,9 +689,10 @@ name = el.name if not name: continue - if el.tag == 'textarea': + tag = _nons(el.tag) + if tag == 'textarea': results.append((name, el.value)) - elif el.tag == 'select': + elif tag == 'select': value = el.value if el.multiple: for v in value: @@ -677,7 +700,7 @@ elif value is not None: results.append((name, el.value)) else: - assert el.tag == 'input', ( + assert tag == 'input', ( "Unexpected tag: %r" % el) if el.checkable and not el.checked: continue @@ -801,8 +824,8 @@ checkboxes and radio elements are returned individually. """ - _name_xpath = etree.XPath(".//*[@name = $name and (name(.) = 'select' or name(.) = 'input' or name(.) 
= 'textarea')]") - _all_xpath = etree.XPath(".//*[name() = 'select' or name() = 'input' or name() = 'textarea']") + _name_xpath = etree.XPath(".//*[@name = $name and (local-name(.) = 'select' or local-name(.) = 'input' or local-name(.) = 'textarea')]") + _all_xpath = etree.XPath(".//*[local-name() = 'select' or local-name() = 'input' or local-name() = 'textarea']") def __init__(self, form): self.form = form @@ -919,7 +942,7 @@ """ if self.multiple: return MultipleSelectOptions(self) - for el in self.getiterator('option'): + for el in _options_xpath(self): if 'selected' in el.attrib: value = el.get('value') # FIXME: If value is None, what to return?, get_text()? @@ -935,7 +958,7 @@ self.value.update(value) return if value is not None: - for el in self.getiterator('option'): + for el in _options_xpath(self): # FIXME: also if el.get('value') is None? if el.get('value') == value: checked_option = el @@ -943,7 +966,7 @@ else: raise ValueError( "There is no option with the value of %r" % value) - for el in self.getiterator('option'): + for el in _options_xpath(self): if 'selected' in el.attrib: del el.attrib['selected'] if value is not None: @@ -963,7 +986,7 @@ All the possible values this select can have (the ``value`` attribute of all the ``