From scoder at codespeak.net Thu May 1 11:06:21 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 11:06:21 +0200 (CEST)
Subject: [Lxml-checkins] r54297 - in lxml/branch/lxml-2.0: . doc
Message-ID: <20080501090621.EF4A32A01B2@codespeak.net>
Author: scoder
Date: Thu May 1 11:06:19 2008
New Revision: 54297
Modified:
lxml/branch/lxml-2.0/CHANGES.txt
lxml/branch/lxml-2.0/doc/main.txt
lxml/branch/lxml-2.0/version.txt
Log:
prepare release of 2.0.5
Modified: lxml/branch/lxml-2.0/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-2.0/CHANGES.txt (original)
+++ lxml/branch/lxml-2.0/CHANGES.txt Thu May 1 11:06:19 2008
@@ -2,8 +2,8 @@
lxml changelog
==============
-Under development
-=================
+2.0.4 (2008-05-01)
+==================
Features added
--------------
Modified: lxml/branch/lxml-2.0/doc/main.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/main.txt (original)
+++ lxml/branch/lxml-2.0/doc/main.txt Thu May 1 11:06:19 2008
@@ -145,8 +145,8 @@
.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/
.. _`this key`: pubkey.asc
-The latest version is `lxml 2.0.4`_, released 2008-04-13
-(`changes for 2.0.4`_). `Older versions`_ are listed below.
+The latest version is `lxml 2.0.5`_, released 2008-04-13
+(`changes for 2.0.5`_). `Older versions`_ are listed below.
.. _`Older versions`: #old-versions
@@ -206,6 +206,8 @@
Old Versions
------------
+* `lxml 2.0.4`_, released 2008-04-13 (`changes for 2.0.4`_)
+
* `lxml 2.0.3`_, released 2008-03-26 (`changes for 2.0.3`_)
* `lxml 2.0.2`_, released 2008-02-22 (`changes for 2.0.2`_)
@@ -264,6 +266,7 @@
* `lxml 0.5`_, released 2005-04-08
+.. _`lxml 2.0.5`: lxml-2.0.5.tgz
.. _`lxml 2.0.4`: lxml-2.0.4.tgz
.. _`lxml 2.0.3`: lxml-2.0.3.tgz
.. _`lxml 2.0.2`: lxml-2.0.2.tgz
@@ -294,6 +297,7 @@
.. _`lxml 0.5.1`: lxml-0.5.1.tgz
.. _`lxml 0.5`: lxml-0.5.tgz
+.. _`changes for 2.0.5`: changes-2.0.5.html
.. _`changes for 2.0.4`: changes-2.0.4.html
.. _`changes for 2.0.3`: changes-2.0.3.html
.. _`changes for 2.0.2`: changes-2.0.2.html
Modified: lxml/branch/lxml-2.0/version.txt
==============================================================================
--- lxml/branch/lxml-2.0/version.txt (original)
+++ lxml/branch/lxml-2.0/version.txt Thu May 1 11:06:19 2008
@@ -1 +1 @@
-2.0.4
+2.0.5
From scoder at codespeak.net Thu May 1 11:16:48 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 11:16:48 +0200 (CEST)
Subject: [Lxml-checkins] r54298 - lxml/branch/lxml-2.0/doc
Message-ID: <20080501091648.643A5169E35@codespeak.net>
Author: scoder
Date: Thu May 1 11:16:48 2008
New Revision: 54298
Added:
lxml/branch/lxml-2.0/doc/mklatex.py
- copied unchanged from r54297, lxml/trunk/doc/mklatex.py
Log:
support PDF generation in 2.0.x
From scoder at codespeak.net Thu May 1 11:17:09 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 11:17:09 +0200 (CEST)
Subject: [Lxml-checkins] r54299 - lxml/branch/lxml-2.0/doc
Message-ID: <20080501091709.4B2D3169E37@codespeak.net>
Author: scoder
Date: Thu May 1 11:17:08 2008
New Revision: 54299
Added:
lxml/branch/lxml-2.0/doc/rest2latex.py
- copied unchanged from r54298, lxml/trunk/doc/rest2latex.py
Log:
support PDF generation in 2.0.x
From scoder at codespeak.net Thu May 1 11:32:12 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 11:32:12 +0200 (CEST)
Subject: [Lxml-checkins] r54300 - lxml/branch/lxml-2.0/doc/html
Message-ID: <20080501093212.9F1091684E3@codespeak.net>
Author: scoder
Date: Thu May 1 11:32:12 2008
New Revision: 54300
Added:
lxml/branch/lxml-2.0/doc/html/tagpython-big.png
- copied unchanged from r54299, lxml/trunk/doc/html/tagpython-big.png
Log:
support PDF generation in 2.0.x
From scoder at codespeak.net Thu May 1 12:01:36 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:01:36 +0200 (CEST)
Subject: [Lxml-checkins] r54302 - in lxml/branch/lxml-2.0: . doc
Message-ID: <20080501100136.EF3822A01B2@codespeak.net>
Author: scoder
Date: Thu May 1 12:01:35 2008
New Revision: 54302
Added:
lxml/branch/lxml-2.0/doc/docstructure.py
- copied unchanged from r53866, lxml/trunk/doc/docstructure.py
Modified:
lxml/branch/lxml-2.0/Makefile
lxml/branch/lxml-2.0/doc/main.txt
lxml/branch/lxml-2.0/doc/mkhtml.py
Log:
PDF doc fixes from trunk
Modified: lxml/branch/lxml-2.0/Makefile
==============================================================================
--- lxml/branch/lxml-2.0/Makefile (original)
+++ lxml/branch/lxml-2.0/Makefile Thu May 1 12:01:35 2008
@@ -2,6 +2,7 @@
TESTFLAGS=-p -v
TESTOPTS=
SETUPFLAGS=
+LXMLVERSION=`cat version.txt`
all: inplace
@@ -40,17 +41,40 @@
ftest_inplace: inplace
$(PYTHON) test.py -f $(TESTFLAGS) $(TESTOPTS)
-html: inplace
- mkdir -p doc/html
- PYTHONPATH=src $(PYTHON) doc/mkhtml.py doc/html . `cat version.txt`
+apihtml: inplace
rm -fr doc/html/api
@[ -x "`which epydoc`" ] \
&& (cd src && echo "Generating API docs ..." && \
PYTHONPATH=. epydoc -v --docformat "restructuredtext en" \
-o ../doc/html/api --no-private --exclude='[.]html[.]tests|[.]_' \
- --name lxml --url http://codespeak.net/lxml/ lxml/) \
+ --exclude-introspect='[.]usedoctest' \
+ --name "lxml API" --url http://codespeak.net/lxml/ lxml/) \
|| (echo "not generating epydoc API documentation")
+html: inplace apihtml
+ PYTHONPATH=src $(PYTHON) doc/mkhtml.py doc/html . ${LXMLVERSION}
+
+apipdf: inplace
+ rm -fr doc/pdf
+ mkdir -p doc/pdf
+ @[ -x "`which epydoc`" ] \
+ && (cd src && echo "Generating API docs ..." && \
+ PYTHONPATH=. epydoc -v --latex --docformat "restructuredtext en" \
+ -o ../doc/pdf --no-private --exclude='([.]html)?[.]tests|[.]_' \
+ --exclude-introspect='html[.]clean|[.]usedoctest' \
+ --name "lxml API" --url http://codespeak.net/lxml/ lxml/) \
+ || (echo "not generating epydoc API documentation")
+
+pdf: apipdf
+ $(PYTHON) doc/mklatex.py doc/pdf . ${LXMLVERSION}
+ (cd doc/pdf && pdflatex lxmldoc.tex \
+ && pdflatex lxmldoc.tex \
+ && pdflatex lxmldoc.tex)
+ @pdfopt doc/pdf/lxmldoc.pdf doc/pdf/lxmldoc-${LXMLVERSION}.pdf
+ @echo "PDF available as doc/pdf/lxmldoc-${LXMLVERSION}.pdf"
+
+# Two pdflatex runs are needed to build the correct Table of contents.
+
test: test_inplace
valtest: valgrind_test_inplace
@@ -65,7 +89,12 @@
find . \( -name '*.o' -o -name '*.so' -o -name '*.py[cod]' -o -name '*.dll' \) -exec rm -f {} \;
rm -rf build
-realclean: clean
+docclean:
+ rm -f doc/html/*.html
+ rm -fr doc/html/api
+ rm -fr doc/pdf
+
+realclean: clean docclean
find . -name '*.c' -exec rm -f {} \;
rm -f TAGS
$(PYTHON) setup.py clean -a
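The ``apihtml`` and ``apipdf`` targets above differ in their epydoc ``--exclude`` patterns. The Python sketch below checks a few module names against both patterns using the standard ``re`` module. It assumes that epydoc applies the pattern as a regular-expression search over dotted module names. The module names and the ``excluded`` helper are illustrative only.

```python
import re

# Patterns copied from the apihtml and apipdf targets above.
HTML_EXCLUDE = r"[.]html[.]tests|[.]_"
PDF_EXCLUDE = r"([.]html)?[.]tests|[.]_"

def excluded(pattern, module_name):
    """Hypothetical helper: assumes epydoc searches the dotted module name."""
    return re.search(pattern, module_name) is not None

MODULES = [
    "lxml.etree",
    "lxml._elementpath",           # private module, matched by "[.]_"
    "lxml.tests.test_etree",       # only the PDF pattern matches plain ".tests"
    "lxml.html.tests.test_basic",
    "lxml.html.clean",             # handled by --exclude-introspect instead
]

for name in MODULES:
    print(name, excluded(HTML_EXCLUDE, name), excluded(PDF_EXCLUDE, name))
```

Under this assumption, the optional ``([.]html)?`` group is what lets the PDF build also drop the top-level ``lxml.tests`` package.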
Modified: lxml/branch/lxml-2.0/doc/main.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/main.txt (original)
+++ lxml/branch/lxml-2.0/doc/main.txt Thu May 1 12:01:35 2008
@@ -47,6 +47,10 @@
Documentation
-------------
+The complete lxml documentation is available for download as `PDF
+documentation`_. The HTML documentation from this web site is part of
+the normal `source download <#download>`_.
+
* ElementTree:
* `ElementTree API`_
@@ -140,32 +144,37 @@
The source distribution is signed with `this key`_. Binary builds for
MS Windows usually become available through PyPI a few days after a
source release. If you can't wait, consider trying a less recent
-version first.
-
-.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/
-.. _`this key`: pubkey.asc
+release version first.
The latest version is `lxml 2.0.5`_, released 2008-04-13
(`changes for 2.0.5`_). `Older versions`_ are listed below.
-.. _`Older versions`: #old-versions
-
Please take a look at the `installation instructions`_!
-.. _`installation instructions`: installation.html
+This complete web site (including the generated API documentation) is
+part of the source distribution, so if you want to download the
+documentation for offline use, take the source archive and copy the
+``doc/html`` directory out of the source tree.
It's also possible to check out the latest development version of lxml
from svn directly, using a command like this::
svn co http://codespeak.net/svn/lxml/trunk lxml
-You can also `browse it through the web`_. Please read `how to build lxml
-from source`_ first. The `latest CHANGES`_ of the developer version are also
-accessible. You can check there if a bug you found has been fixed or a
-feature you want has been implemented in the latest trunk version.
+You can also browse the `Subversion repository`_ through the web, or
+take a look at the `Subversion history`_. Please read `how to build lxml
+from source`_ first. The `latest CHANGES`_ of the developer version
+are also accessible. You can check there if a bug you found has been
+fixed or a feature you want has been implemented in the latest trunk
+version.
-.. _`how to build lxml from source`: build.html
-.. _`browse it through the web`: http://codespeak.net/svn/lxml
+.. _`lxml at the Python Package Index`: http://pypi.python.org/pypi/lxml/
+.. _`this key`: pubkey.asc
+.. _`Older versions`: #old-versions
+.. _`installation instructions`: installation.html
+ .. _`how to build lxml from source`: build.html
+.. _`Subversion repository`: http://codespeak.net/svn/lxml/
+.. _`Subversion history`: https://codespeak.net/viewvc/lxml/
.. _`latest CHANGES`: http://codespeak.net/svn/lxml/trunk/CHANGES.txt
@@ -178,7 +187,7 @@
.. _`mailing list`: http://codespeak.net/mailman/listinfo/lxml-dev
.. _Gmane: http://blog.gmane.org/gmane.comp.python.lxml.devel
-.. _Google: http://www.google.com/webhp?q=site:codespeak.net/mailman/listinfo/lxml-dev%20
+.. _Google: http://www.google.com/webhp?q=site:codespeak.net%2Fmailman%2Flistinfo%2Flxml-dev+
Bug tracker
@@ -189,7 +198,7 @@
unexpected behaviour of lxml is a bug or not, please ask on the `mailing
list`_ first. Do not forget to search the archive (e.g. with Gmane_)!
-.. _`launchpad bug tracker`: https://launchpad.net/lxml
+.. _`launchpad bug tracker`: https://launchpad.net/lxml/
License
Modified: lxml/branch/lxml-2.0/doc/mkhtml.py
==============================================================================
--- lxml/branch/lxml-2.0/doc/mkhtml.py (original)
+++ lxml/branch/lxml-2.0/doc/mkhtml.py Thu May 1 12:01:35 2008
@@ -1,21 +1,8 @@
+from docstructure import SITE_STRUCTURE, HREF_MAP, BASENAME_MAP
from lxml.etree import (parse, fromstring, ElementTree,
Element, SubElement, XPath)
import os, shutil, re, sys, copy, time
-SITE_STRUCTURE = [
- ('lxml', ('main.txt', 'intro.txt', '../INSTALL.txt', 'lxml2.txt',
- 'performance.txt', 'compatibility.txt', 'FAQ.txt')),
- ('Developing with lxml', ('tutorial.txt', '@API reference',
- 'api.txt', 'parsing.txt',
- 'validation.txt', 'xpathxslt.txt',
- 'objectify.txt', 'lxmlhtml.txt',
- 'cssselect.txt', 'elementsoup.txt')),
- ('Extending lxml', ('resolvers.txt', 'extensions.txt',
- 'element_classes.txt', 'sax.txt', 'capi.txt')),
- ('Developing lxml', ('build.txt', 'lxml-source-howto.txt',
- '@Release Changelog')),
- ]
-
RST2HTML_OPTIONS = " ".join([
"--no-toc-backlinks",
"--strip-comments",
@@ -23,15 +10,6 @@
"--date",
])
-HREF_MAP = {
- "API reference" : "api/index.html"
-}
-
-BASENAME_MAP = {
- 'main' : 'index',
- 'INSTALL' : 'installation',
-}
-
htmlnsmap = {"h" : "http://www.w3.org/1999/xhtml"}
find_title = XPath("/h:html/h:head/h:title/text()", namespaces=htmlnsmap)
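The maps that now live in ``docstructure.py`` turn menu entries into page links: ``BASENAME_MAP`` renames output files and ``HREF_MAP`` resolves ``@``-prefixed menu titles. The sketch below uses an abbreviated copy of that data to show one plausible way to combine them. ``page_href`` is a hypothetical helper, not the code in ``mkhtml.py``. Returning ``None`` for a title that has no ``HREF_MAP`` entry, such as ``Release Changelog``, is an assumption of this sketch.

```python
import os

# Abbreviated copy of the structures moved into docstructure.py.
HREF_MAP = {"API reference": "api/index.html"}
BASENAME_MAP = {"main": "index", "INSTALL": "installation"}
SITE_STRUCTURE = [
    ("lxml", ("main.txt", "intro.txt", "../INSTALL.txt")),
    ("Developing with lxml", ("tutorial.txt", "@API reference", "api.txt")),
    ("Developing lxml", ("build.txt", "@Release Changelog")),
]

def page_href(entry):
    """Return the link target for one SITE_STRUCTURE entry (hypothetical)."""
    if entry.startswith("@"):
        # '@' entries carry a menu title instead of a source file name.
        return HREF_MAP.get(entry[1:])
    # 'main.txt' -> 'main' -> 'index'; '../INSTALL.txt' -> 'INSTALL' -> 'installation'
    basename = os.path.splitext(os.path.basename(entry))[0]
    return BASENAME_MAP.get(basename, basename) + ".html"

for section, entries in SITE_STRUCTURE:
    print(section, [page_href(e) for e in entries])
```

This mapping agrees with the ``installation.html`` link target used in ``main.txt``.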
From scoder at codespeak.net Thu May 1 12:04:49 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:04:49 +0200 (CEST)
Subject: [Lxml-checkins] r54303 - in lxml/branch/lxml-2.0: . doc
Message-ID: <20080501100449.BD3582A01B2@codespeak.net>
Author: scoder
Date: Thu May 1 12:04:49 2008
New Revision: 54303
Modified:
lxml/branch/lxml-2.0/MANIFEST.in
lxml/branch/lxml-2.0/doc/main.txt
Log:
doc fixes
Modified: lxml/branch/lxml-2.0/MANIFEST.in
==============================================================================
--- lxml/branch/lxml-2.0/MANIFEST.in (original)
+++ lxml/branch/lxml-2.0/MANIFEST.in Thu May 1 12:04:49 2008
@@ -10,6 +10,6 @@
recursive-include src/lxml/tests *.rng *.xslt *.xml *.dtd *.xsd *.html *.txt
recursive-include src/lxml/html/tests *.data *.txt
recursive-include benchmark *.py
-recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png
+recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython*.png
recursive-include fake_pyrex *.py
include doc/mkhtml.py doc/rest2html.py
Modified: lxml/branch/lxml-2.0/doc/main.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/main.txt (original)
+++ lxml/branch/lxml-2.0/doc/main.txt Thu May 1 12:04:49 2008
@@ -172,7 +172,7 @@
.. _`this key`: pubkey.asc
.. _`Older versions`: #old-versions
.. _`installation instructions`: installation.html
- .. _`how to build lxml from source`: build.html
+.. _`how to build lxml from source`: build.html
.. _`Subversion repository`: http://codespeak.net/svn/lxml/
.. _`Subversion history`: https://codespeak.net/viewvc/lxml/
.. _`latest CHANGES`: http://codespeak.net/svn/lxml/trunk/CHANGES.txt
@@ -215,6 +215,8 @@
Old Versions
------------
+.. _`PDF documentation`: lxmldoc-2.0.5.pdf
+
* `lxml 2.0.4`_, released 2008-04-13 (`changes for 2.0.4`_)
* `lxml 2.0.3`_, released 2008-03-26 (`changes for 2.0.3`_)
From scoder at codespeak.net Thu May 1 12:05:19 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:05:19 +0200 (CEST)
Subject: [Lxml-checkins] r54304 - lxml/trunk
Message-ID: <20080501100519.91FC12A01B2@codespeak.net>
Author: scoder
Date: Thu May 1 12:05:19 2008
New Revision: 54304
Modified:
lxml/trunk/ (props changed)
lxml/trunk/MANIFEST.in
Log:
r4104 at delle: sbehnel | 2008-05-01 12:03:52 +0200
include tagpython-big.png in source distro
Modified: lxml/trunk/MANIFEST.in
==============================================================================
--- lxml/trunk/MANIFEST.in (original)
+++ lxml/trunk/MANIFEST.in Thu May 1 12:05:19 2008
@@ -11,6 +11,6 @@
recursive-include src/lxml/html/tests *.data *.txt
recursive-include samples *.xml
recursive-include benchmark *.py
-recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython.png
+recursive-include doc *.txt *.html *.css *.xml *.mgp pubkey.asc tagpython*.png
recursive-include fake_pyrex *.py
include doc/mkhtml.py doc/rest2html.py
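``MANIFEST.in`` patterns are shell-style globs. The sketch below assumes that, for plain file names, distutils matches them the same way as Python's ``fnmatch``. Under that assumption, it shows why the old pattern left ``tagpython-big.png`` out of the source distribution while the new one includes it. The file list is illustrative.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "tagpython.png"
NEW_PATTERN = "tagpython*.png"   # '*' also matches the empty string

files = ["tagpython.png", "tagpython-big.png", "pubkey.asc"]

# Case-sensitive matching, independent of the host OS.
print([f for f in files if fnmatchcase(f, OLD_PATTERN)])  # ['tagpython.png']
print([f for f in files if fnmatchcase(f, NEW_PATTERN)])  # ['tagpython.png', 'tagpython-big.png']
```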
From scoder at codespeak.net Thu May 1 12:10:12 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:10:12 +0200 (CEST)
Subject: [Lxml-checkins] r54305 - lxml/branch/lxml-2.0
Message-ID: <20080501101012.BA74C2A01B2@codespeak.net>
Author: scoder
Date: Thu May 1 12:10:12 2008
New Revision: 54305
Modified:
lxml/branch/lxml-2.0/CHANGES.txt
Log:
pre-release fix
Modified: lxml/branch/lxml-2.0/CHANGES.txt
==============================================================================
--- lxml/branch/lxml-2.0/CHANGES.txt (original)
+++ lxml/branch/lxml-2.0/CHANGES.txt Thu May 1 12:10:12 2008
@@ -2,7 +2,7 @@
lxml changelog
==============
-2.0.4 (2008-05-01)
+2.0.5 (2008-05-01)
==================
Features added
From scoder at codespeak.net Thu May 1 12:28:12 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Thu, 1 May 2008 12:28:12 +0200 (CEST)
Subject: [Lxml-checkins] r54306 - in lxml/trunk: . doc
Message-ID: <20080501102812.9821E168563@codespeak.net>
Author: scoder
Date: Thu May 1 12:28:11 2008
New Revision: 54306
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
lxml/trunk/doc/main.txt
Log:
r4112 at delle: sbehnel | 2008-05-01 12:26:43 +0200
integrated release changes of 2.0.5
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Thu May 1 12:28:11 2008
@@ -26,6 +26,26 @@
namespace (i.e. they would end up in the wrong namespace).
+2.0.5 (2008-05-01)
+==================
+
+Features added
+--------------
+
+Bugs fixed
+----------
+
+* Resolving to a filename in custom resolvers didn't work.
+
+* lxml did not honour libxslt's second error state "STOPPED", which
+ let some XSLT errors pass silently.
+
+* Memory leak in Schematron with libxml2 >= 2.6.31.
+
+Other changes
+-------------
+
+
2.1beta1 (2008-04-15)
=====================
Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt (original)
+++ lxml/trunk/doc/main.txt Thu May 1 12:28:11 2008
@@ -219,6 +219,8 @@
* `lxml 2.1alpha1`_, released 2008-03-27 (`changes for 2.1alpha1`_)
+* `lxml 2.0.5`_, released 2008-05-01 (`changes for 2.0.5`_)
+
* `lxml 2.0.4`_, released 2008-04-14 (`changes for 2.0.4`_)
* `lxml 2.0.3`_, released 2008-03-26 (`changes for 2.0.3`_)
@@ -281,6 +283,7 @@
.. _`lxml 2.1beta1`: lxml-2.1beta1.tgz
.. _`lxml 2.1alpha1`: lxml-2.1alpha1.tgz
+.. _`lxml 2.0.5`: lxml-2.0.5.tgz
.. _`lxml 2.0.4`: lxml-2.0.4.tgz
.. _`lxml 2.0.3`: lxml-2.0.3.tgz
.. _`lxml 2.0.2`: lxml-2.0.2.tgz
@@ -313,6 +316,7 @@
.. _`changes for 2.1beta1`: changes-2.1beta1.html
.. _`changes for 2.1alpha1`: changes-2.1alpha1.html
+.. _`changes for 2.0.5`: changes-2.0.5.html
.. _`changes for 2.0.4`: changes-2.0.4.html
.. _`changes for 2.0.3`: changes-2.0.3.html
.. _`changes for 2.0.2`: changes-2.0.2.html
From scoder at codespeak.net Fri May 2 09:49:51 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 09:49:51 +0200 (CEST)
Subject: [Lxml-checkins] r54311 - in lxml/trunk: . src/lxml/html
src/lxml/html/tests
Message-ID: <20080502074951.5688A498100@codespeak.net>
Author: scoder
Date: Fri May 2 09:49:49 2008
New Revision: 54311
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
lxml/trunk/src/lxml/html/__init__.py
lxml/trunk/src/lxml/html/tests/test_basic.txt
Log:
r4115 at delle: sbehnel | 2008-05-02 09:48:17 +0200
'parser' keyword in lxml.html parse functions, XHTMLParser class
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Fri May 2 09:49:49 2008
@@ -8,6 +8,12 @@
Features added
--------------
+* All parse functions in lxml.html take a ``parser`` keyword argument.
+
+* lxml.html has a new parser class ``XHTMLParser`` and a module
+ attribute ``xhtml_parser`` that provide XML parsers that are
+ pre-configured for the lxml.html package.
+
Bugs fixed
----------
Modified: lxml/trunk/src/lxml/html/__init__.py
==============================================================================
--- lxml/trunk/src/lxml/html/__init__.py (original)
+++ lxml/trunk/src/lxml/html/__init__.py Fri May 2 09:49:49 2008
@@ -443,14 +443,17 @@
# parsing
################################################################################
-def document_fromstring(html, **kw):
- value = etree.HTML(html, html_parser, **kw)
+def document_fromstring(html, parser=None, **kw):
+ if parser is None:
+ parser = html_parser
+ value = etree.fromstring(html, parser, **kw)
if value is None:
raise etree.ParserError(
"Document is empty")
return value
-def fragments_fromstring(html, no_leading_text=False, base_url=None, **kw):
+def fragments_fromstring(html, no_leading_text=False, base_url=None,
+ parser=None, **kw):
"""
Parses several HTML elements, returning a list of elements.
@@ -461,11 +464,13 @@
base_url will set the document's base_url attribute (and the tree's docinfo.URL)
"""
+ if parser is None:
+ parser = html_parser
# FIXME: check what happens when you give html with a body, head, etc.
start = html[:20].lstrip().lower()
if not start.startswith('%s%s>' % (
- create_parent, html, create_parent), base_url=base_url, **kw)
- elements = fragments_fromstring(html, no_leading_text=True, base_url=base_url, **kw)
+ create_parent, html, create_parent),
+ parser=parser, base_url=base_url, **kw)
+ elements = fragments_fromstring(html, parser=parser, no_leading_text=True,
+ base_url=base_url, **kw)
if not elements:
raise etree.ParserError(
"No elements found")
@@ -512,7 +522,7 @@
el.tail = None
return el
-def fromstring(html, base_url=None, **kw):
+def fromstring(html, base_url=None, parser=None, **kw):
"""
Parse the html, returning a single element/document.
@@ -521,12 +531,14 @@
base_url will set the document's base_url attribute (and the tree's docinfo.URL)
"""
+ if parser is None:
+ parser = html_parser
start = html[:10].lstrip().lower()
if start.startswith('footer
+
+lxml.html has two parsers, one for HTML, one for XHTML:
+
+ >>> from lxml.html import HTMLParser, XHTMLParser
+ >>> html = "
Hi!
"
+
+ >>> root = document_fromstring(html, parser=HTMLParser())
+ >>> print root.tag
+ html
+
+ >>> root = document_fromstring(html, parser=XHTMLParser())
+ >>> print root.tag
+ html
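The new ``parser`` keyword follows a common pattern: use a default parser unless the caller supplies one. The sketch below reproduces that pattern with the standard library's ``xml.etree.ElementTree`` rather than ``lxml.html``, so it runs without lxml installed. ``document_fromstring``, ``_default_parser`` and ``TagCollector`` here are hypothetical stand-ins. A stdlib ``XMLParser`` is single-use, so this sketch creates the default parser per call. ``lxml.html``, by contrast, keeps a module-level ``html_parser``.

```python
import xml.etree.ElementTree as ET

def _default_parser():
    # Hypothetical default; a stdlib XMLParser cannot be reused after close().
    return ET.XMLParser()

def document_fromstring(text, parser=None):
    """Parse a complete document, falling back to the default parser."""
    if parser is None:
        parser = _default_parser()
    return ET.fromstring(text, parser=parser)

class TagCollector:
    """Parser target that records start tags instead of building a tree."""
    def __init__(self):
        self.tags = []
    def start(self, tag, attrib):
        self.tags.append(tag)
    def end(self, tag):
        pass
    def data(self, data):
        pass
    def close(self):
        return self.tags   # becomes the return value of parser.close()

html = "<html><body><p>Hi!</p></body></html>"
print(document_fromstring(html).tag)  # html
print(document_fromstring(html, parser=ET.XMLParser(target=TagCollector())))
# ['html', 'body', 'p']
```

Passing the parser explicitly lets callers change how the input is handled without changing the function itself. This is the flexibility the ``HTMLParser``/``XHTMLParser`` choice provides in ``lxml.html``.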
From scoder at codespeak.net Fri May 2 10:13:48 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 10:13:48 +0200 (CEST)
Subject: [Lxml-checkins] r54313 - in lxml/trunk: . doc
Message-ID: <20080502081348.C58B616855E@codespeak.net>
Author: scoder
Date: Fri May 2 10:13:48 2008
New Revision: 54313
Modified:
lxml/trunk/ (props changed)
lxml/trunk/doc/build.txt
Log:
r4117 at delle: sbehnel | 2008-05-02 10:00:01 +0200
require Cython 0.9.6.14 for lxml 2.1
Modified: lxml/trunk/doc/build.txt
==============================================================================
--- lxml/trunk/doc/build.txt (original)
+++ lxml/trunk/doc/build.txt Fri May 2 10:13:48 2008
@@ -44,10 +44,10 @@
want to be an lxml developer, then you do need a working Cython
installation. You can use EasyInstall_ to install it::
- easy_install Cython==0.9.6.12
+ easy_install Cython==0.9.6.14
-lxml currently requires Cython 0.9.6.12. Any 0.9.6.13 version will not
-work, later versions were not tested.
+lxml currently requires Cython 0.9.6.14, later versions were not
+tested.
Subversion
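Version pins such as ``0.9.6.14`` should be compared component by component rather than as plain strings. The helper below is a hypothetical sketch and not part of lxml's build. It handles only purely numeric dotted versions and classifies an installed Cython version against the pin documented above.

```python
REQUIRED_CYTHON = "0.9.6.14"  # pin taken from build.txt above

def version_tuple(version):
    """Split a purely numeric dotted version such as '0.9.6.14' into ints."""
    return tuple(int(part) for part in version.split("."))

def cython_status(installed, required=REQUIRED_CYTHON):
    """Classify an installed version as 'supported', 'too old' or 'untested'."""
    have, want = version_tuple(installed), version_tuple(required)
    if have == want:
        return "supported"
    if have < want:
        return "too old"
    return "untested"   # later versions were not tested

print(cython_status("0.9.6.9"))   # too old (a string comparison would say newer)
print(cython_status("0.9.7"))     # untested
```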
From scoder at codespeak.net Fri May 2 10:28:05 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 10:28:05 +0200 (CEST)
Subject: [Lxml-checkins] r54315 - lxml/branch/lxml-2.0/doc
Message-ID: <20080502082805.10C124980FC@codespeak.net>
Author: scoder
Date: Fri May 2 10:28:04 2008
New Revision: 54315
Modified:
lxml/branch/lxml-2.0/doc/main.txt
Log:
doc fix
Modified: lxml/branch/lxml-2.0/doc/main.txt
==============================================================================
--- lxml/branch/lxml-2.0/doc/main.txt (original)
+++ lxml/branch/lxml-2.0/doc/main.txt Fri May 2 10:28:04 2008
@@ -146,7 +146,7 @@
source release. If you can't wait, consider trying a less recent
release version first.
-The latest version is `lxml 2.0.5`_, released 2008-04-13
+The latest version is `lxml 2.0.5`_, released 2008-05-01
(`changes for 2.0.5`_). `Older versions`_ are listed below.
Please take a look at the `installation instructions`_!
From scoder at codespeak.net Fri May 2 19:14:58 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:14:58 +0200 (CEST)
Subject: [Lxml-checkins] r54340 - in lxml/trunk: . src/lxml
Message-ID: <20080502171458.378E82A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 19:14:57 2008
New Revision: 54340
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/apihelpers.pxi
lxml/trunk/src/lxml/lxml.etree.pyx
lxml/trunk/src/lxml/proxy.pxi
lxml/trunk/src/lxml/tree.pxd
Log:
r4119 at delle: sbehnel | 2008-05-02 18:10:34 +0200
re-assign all node name pointers from the target dictionary when moving an element to a new tree of a different thread
Modified: lxml/trunk/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/trunk/src/lxml/apihelpers.pxi (original)
+++ lxml/trunk/src/lxml/apihelpers.pxi Fri May 2 19:14:57 2008
@@ -716,7 +716,7 @@
_moveTail(c_next, c_node)
if not attemptDeallocation(c_node):
# make namespaces absolute
- moveNodeToDocument(doc, c_node)
+ moveNodeToDocument(doc, c_node.doc, c_node)
return 0
cdef void _moveTail(xmlNode* c_tail, xmlNode* c_target):
@@ -782,6 +782,7 @@
"""
cdef xmlNode* c_orig_neighbour
cdef xmlNode* c_next
+ cdef xmlDoc* c_source_doc
cdef _Element element
cdef Py_ssize_t seqlength, i, c
cdef _node_to_node_function next_element
@@ -864,12 +865,13 @@
for element in elements:
assert element is not None, "Node must not be None"
# move element and tail over
+ c_source_doc = element._c_node.doc
c_next = element._c_node.next
tree.xmlAddPrevSibling(c_node, element._c_node)
_moveTail(c_next, element._c_node)
# integrate element into new document
- moveNodeToDocument(parent._doc, element._c_node)
+ moveNodeToDocument(parent._doc, c_source_doc, element._c_node)
# stop at the end of the slice
if slicelength > 0:
@@ -899,7 +901,9 @@
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
+ cdef xmlDoc* c_source_doc
c_node = child._c_node
+ c_source_doc = c_node.doc
# store possible text node
c_next = c_node.next
# move node itself
@@ -908,7 +912,7 @@
_moveTail(c_next, c_node)
# uh oh, elements may be pointing to different doc when
# parent element has moved; change them too..
- moveNodeToDocument(parent._doc, c_node)
+ moveNodeToDocument(parent._doc, c_source_doc, c_node)
cdef int _prependChild(_Element parent, _Element child) except -1:
"""Prepend a new child to a parent element.
@@ -916,7 +920,9 @@
cdef xmlNode* c_next
cdef xmlNode* c_child
cdef xmlNode* c_node
+ cdef xmlDoc* c_source_doc
c_node = child._c_node
+ c_source_doc = c_node.doc
# store possible text node
c_next = c_node.next
# move node itself
@@ -929,14 +935,16 @@
_moveTail(c_next, c_node)
# uh oh, elements may be pointing to different doc when
# parent element has moved; change them too..
- moveNodeToDocument(parent._doc, c_node)
+ moveNodeToDocument(parent._doc, c_source_doc, c_node)
cdef int _appendSibling(_Element element, _Element sibling) except -1:
"""Append a new child to a parent element.
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
+ cdef xmlDoc* c_source_doc
c_node = sibling._c_node
+ c_source_doc = c_node.doc
# store possible text node
c_next = c_node.next
# move node itself
@@ -944,14 +952,16 @@
_moveTail(c_next, c_node)
# uh oh, elements may be pointing to different doc when
# parent element has moved; change them too..
- moveNodeToDocument(element._doc, c_node)
+ moveNodeToDocument(element._doc, c_source_doc, c_node)
cdef int _prependSibling(_Element element, _Element sibling) except -1:
"""Append a new child to a parent element.
"""
cdef xmlNode* c_next
cdef xmlNode* c_node
+ cdef xmlDoc* c_source_doc
c_node = sibling._c_node
+ c_source_doc = c_node.doc
# store possible text node
c_next = c_node.next
# move node itself
@@ -959,7 +969,7 @@
_moveTail(c_next, c_node)
# uh oh, elements may be pointing to different doc when
# parent element has moved; change them too..
- moveNodeToDocument(element._doc, c_node)
+ moveNodeToDocument(element._doc, c_source_doc, c_node)
cdef inline int isutf8(char* s):
cdef char c
Modified: lxml/trunk/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.etree.pyx (original)
+++ lxml/trunk/src/lxml/lxml.etree.pyx Fri May 2 19:14:57 2008
@@ -533,6 +533,7 @@
"""
cdef xmlNode* c_node
cdef xmlNode* c_next
+ cdef xmlDoc* c_source_doc
cdef _Element element
cdef bint left_to_right
cdef Py_ssize_t slicelength, step
@@ -554,13 +555,14 @@
c_node = _findChild(self._c_node, x)
if c_node is NULL:
raise IndexError, "list index out of range"
+ c_source_doc = element._c_node.doc
c_next = element._c_node.next
_removeText(c_node.next)
tree.xmlReplaceNode(c_node, element._c_node)
_moveTail(c_next, element._c_node)
- moveNodeToDocument(self._doc, element._c_node)
+ moveNodeToDocument(self._doc, c_source_doc, element._c_node)
if not attemptDeallocation(c_node):
- moveNodeToDocument(self._doc, c_node)
+ moveNodeToDocument(self._doc, c_node.doc, c_node)
def __delitem__(self, x):
"""__delitem__(self, x)
@@ -707,14 +709,16 @@
"""
cdef xmlNode* c_node
cdef xmlNode* c_next
+ cdef xmlDoc* c_source_doc
c_node = _findChild(self._c_node, index)
if c_node is NULL:
_appendChild(self, element)
return
+ c_source_doc = c_node.doc
c_next = element._c_node.next
tree.xmlAddPrevSibling(c_node, element._c_node)
_moveTail(c_next, element._c_node)
- moveNodeToDocument(self._doc, element._c_node)
+ moveNodeToDocument(self._doc, c_source_doc, element._c_node)
def remove(self, _Element element not None):
"""remove(self, element)
@@ -732,7 +736,7 @@
tree.xmlUnlinkNode(c_node)
_moveTail(c_next, c_node)
# fix namespace declarations
- moveNodeToDocument(self._doc, c_node)
+ moveNodeToDocument(self._doc, c_node.doc, c_node)
def replace(self, _Element old_element not None,
_Element new_element not None):
@@ -744,18 +748,20 @@
cdef xmlNode* c_old_next
cdef xmlNode* c_new_node
cdef xmlNode* c_new_next
+ cdef xmlDoc* c_source_doc
c_old_node = old_element._c_node
if c_old_node.parent is not self._c_node:
raise ValueError, "Element is not a child of this node."
c_old_next = c_old_node.next
c_new_node = new_element._c_node
c_new_next = c_new_node.next
+ c_source_doc = c_new_next.doc
tree.xmlReplaceNode(c_old_node, c_new_node)
_moveTail(c_new_next, c_new_node)
_moveTail(c_old_next, c_old_node)
- moveNodeToDocument(self._doc, c_new_node)
+ moveNodeToDocument(self._doc, c_source_doc, c_new_node)
# fix namespace declarations
- moveNodeToDocument(self._doc, c_old_node)
+ moveNodeToDocument(self._doc, c_old_node.doc, c_old_node)
# PROPERTIES
property tag:
Modified: lxml/trunk/src/lxml/proxy.pxi
==============================================================================
--- lxml/trunk/src/lxml/proxy.pxi (original)
+++ lxml/trunk/src/lxml/proxy.pxi Fri May 2 19:14:57 2008
@@ -276,7 +276,8 @@
c_nsdef[0] = c_ns_next
return 0
-cdef int moveNodeToDocument(_Document doc, xmlNode* c_element) except -1:
+cdef int moveNodeToDocument(_Document doc, xmlDoc* c_source_doc,
+ xmlNode* c_element) except -1:
"""Fix the xmlNs pointers of a node and its subtree that were moved.
Mainly copied from libxml2's xmlReconciliateNs(). Expects libxml2 doc
@@ -293,7 +294,11 @@
prefix). If a namespace is unknown, declare a new one on the
node.
- 3) Set the Document reference to the new Document (if different).
+ 3) Reassign the names of tags and attribute from the dict of the
+ target document *iff* it is different from the dict used in the
+ source subtree.
+
+ 4) Set the Document reference to the new Document (if different).
This is done on backtracking to keep the original Document
alive as long as possible, until all its elements are updated.
@@ -303,16 +308,26 @@
"""
cdef xmlNode* c_start_node
cdef xmlNode* c_node
+ cdef char* c_name
cdef _nscache c_ns_cache
cdef xmlNs* c_ns
cdef xmlNs* c_ns_next
cdef xmlNs* c_nsdef
cdef xmlNs* c_del_ns_list
cdef cstd.size_t i
+ cdef tree.xmlDict* c_dict
if not tree._isElementOrXInclude(c_element):
return 0
+ # we need to copy the names of tags and attributes iff the element
+ # is based on a different libxml2 tag name dictionary
+ if doc._c_doc.dict is not c_source_doc.dict and \
+ doc._c_doc.dict is not NULL and c_source_doc.dict is not NULL:
+ c_dict = doc._c_doc.dict
+ else:
+ c_dict = NULL
+
c_start_node = c_element
c_del_ns_list = NULL
@@ -343,6 +358,13 @@
c_element, c_node.ns.href, c_node.ns.prefix)
_appendToNsCache(&c_ns_cache, c_node.ns, c_ns)
c_node.ns = c_ns
+
+ # 3) re-assign names from the target dict
+ if c_dict is not NULL:
+ c_name = tree.xmlDictLookup(c_dict, c_node.name, -1)
+ if c_name is not NULL:
+ c_element.name = c_name
+
if c_node is c_element:
# after the element, continue with its attributes
c_node = c_element.properties
@@ -358,7 +380,7 @@
if c_node is NULL:
# no children => back off and continue with siblings and parents
- # 3) fix _Document reference (may dealloc the original document!)
+ # 4) fix _Document reference (may dealloc the original document!)
if c_element._private is not NULL:
_updateProxyDocument(c_element, doc)
@@ -376,7 +398,7 @@
if c_element is NULL or not tree._isElementOrXInclude(c_element):
break
- # 3) fix _Document reference (may dealloc the original document!)
+ # 4) fix _Document reference (may dealloc the original document!)
if c_element._private is not NULL:
_updateProxyDocument(c_element, doc)
Modified: lxml/trunk/src/lxml/tree.pxd
==============================================================================
--- lxml/trunk/src/lxml/tree.pxd (original)
+++ lxml/trunk/src/lxml/tree.pxd Fri May 2 19:14:57 2008
@@ -52,12 +52,12 @@
void xmlHashScan(xmlHashTable* table, xmlHashScanner f, void* data) nogil
void* xmlHashLookup(xmlHashTable* table, char* name) nogil
-cdef extern from "libxml/tree.h":
-
- # for some reason need to define this in this section;
+cdef extern from *: # actually "libxml/dict.h"
# libxml/dict.h appears to be broken to include in C
ctypedef struct xmlDict
-
+ cdef char* xmlDictLookup(xmlDict* dict, char* name, int len)
+
+cdef extern from "libxml/tree.h":
ctypedef struct xmlDoc
ctypedef struct xmlAttr
ctypedef struct xmlNotationTable
From scoder at codespeak.net Fri May 2 19:15:03 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:15:03 +0200 (CEST)
Subject: [Lxml-checkins] r54341 - in lxml/trunk: . src/lxml
Message-ID: <20080502171503.51C932A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 19:15:02 2008
New Revision: 54341
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/lxml.etree.pyx
Log:
r4120 at delle: sbehnel | 2008-05-02 18:42:35 +0200
typo
Modified: lxml/trunk/src/lxml/lxml.etree.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.etree.pyx (original)
+++ lxml/trunk/src/lxml/lxml.etree.pyx Fri May 2 19:15:02 2008
@@ -755,7 +755,7 @@
c_old_next = c_old_node.next
c_new_node = new_element._c_node
c_new_next = c_new_node.next
- c_source_doc = c_new_next.doc
+ c_source_doc = c_new_node.doc
tree.xmlReplaceNode(c_old_node, c_new_node)
_moveTail(c_new_next, c_new_node)
_moveTail(c_old_next, c_old_node)
From scoder at codespeak.net Fri May 2 19:15:07 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:15:07 +0200 (CEST)
Subject: [Lxml-checkins] r54342 - in lxml/trunk: . src/lxml
Message-ID: <20080502171507.8028D39B593@codespeak.net>
Author: scoder
Date: Fri May 2 19:15:07 2008
New Revision: 54342
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/tree.pxd
Log:
r4121 at delle: sbehnel | 2008-05-02 19:10:07 +0200
cleanup
Modified: lxml/trunk/src/lxml/tree.pxd
==============================================================================
--- lxml/trunk/src/lxml/tree.pxd (original)
+++ lxml/trunk/src/lxml/tree.pxd Fri May 2 19:15:07 2008
@@ -55,7 +55,7 @@
cdef extern from *: # actually "libxml/dict.h"
# libxml/dict.h appears to be broken to include in C
ctypedef struct xmlDict
- cdef char* xmlDictLookup(xmlDict* dict, char* name, int len)
+ cdef char* xmlDictLookup(xmlDict* dict, char* name, int len) nogil
cdef extern from "libxml/tree.h":
ctypedef struct xmlDoc
From scoder at codespeak.net Fri May 2 19:15:11 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:15:11 +0200 (CEST)
Subject: [Lxml-checkins] r54343 - in lxml/trunk: . src/lxml
Message-ID: <20080502171511.A0B4E2A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 19:15:11 2008
New Revision: 54343
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/proxy.pxi
Log:
r4122 at delle: sbehnel | 2008-05-02 19:11:59 +0200
fix and simplification
Modified: lxml/trunk/src/lxml/proxy.pxi
==============================================================================
--- lxml/trunk/src/lxml/proxy.pxi (original)
+++ lxml/trunk/src/lxml/proxy.pxi Fri May 2 19:15:11 2008
@@ -322,8 +322,7 @@
# we need to copy the names of tags and attributes iff the element
# is based on a different libxml2 tag name dictionary
- if doc._c_doc.dict is not c_source_doc.dict and \
- doc._c_doc.dict is not NULL and c_source_doc.dict is not NULL:
+ if doc._c_doc.dict is not c_source_doc.dict:
c_dict = doc._c_doc.dict
else:
c_dict = NULL
@@ -362,8 +361,10 @@
# 3) re-assign names from the target dict
if c_dict is not NULL:
c_name = tree.xmlDictLookup(c_dict, c_node.name, -1)
+ # c_name can be NULL on memory error, but we don't
+ # handle that here
if c_name is not NULL:
- c_element.name = c_name
+ c_node.name = c_name
if c_node is c_element:
# after the element, continue with its attributes
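Moving a subtree between documents whose libxml2 name dictionaries differ means every tag and attribute name in the subtree has to be looked up again in the target dictionary. Otherwise the names can still point into the source document's dictionary after that dictionary is freed. Below is a minimal pure-Python model of that step, not lxml or libxml2 code. `NameDict`, `InternedName`, `Node`, `reassign_names` and `iter_names` are all hypothetical names for illustration. `NameDict.lookup()` only loosely mirrors `xmlDictLookup(dict, name, -1)`. Like the patch above, the model compares the two dictionaries once for the whole subtree and skips the work when they are the same.

```python
class InternedName:
    """A name owned by one NameDict (stands in for a dict-owned C string)."""
    __slots__ = ("owner", "text")

    def __init__(self, owner, text):
        self.owner = owner
        self.text = text


class NameDict:
    """Hypothetical stand-in for a libxml2 xmlDict."""

    def __init__(self):
        self._names = {}

    def lookup(self, text):
        # Return the existing entry for text, or add one owned by this dict.
        name = self._names.get(text)
        if name is None:
            name = self._names[text] = InternedName(self, text)
        return name


class Node:
    """Element with a tag name, attribute names and child elements."""

    def __init__(self, name_dict, tag, attributes=(), children=()):
        self.name = name_dict.lookup(tag)
        self.attributes = [name_dict.lookup(a) for a in attributes]
        self.children = list(children)


def reassign_names(root, source_dict, target_dict):
    """Re-intern all tag and attribute names of a moved subtree."""
    if source_dict is target_dict:
        return  # same dict: every name is already valid in the target
    stack = [root]
    while stack:
        node = stack.pop()
        node.name = target_dict.lookup(node.name.text)
        node.attributes = [target_dict.lookup(a.text) for a in node.attributes]
        stack.extend(node.children)


def iter_names(root):
    """Yield every tag and attribute name in the subtree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node.name
        yield from node.attributes
        stack.extend(node.children)


# Usage: a subtree built with one thread's dict is moved into another document.
thread_dict, main_dict = NameDict(), NameDict()
subtree = Node(thread_dict, "item", ["id"], [Node(thread_dict, "child")])
reassign_names(subtree, thread_dict, main_dict)
```

After the call, no name in the subtree refers to `thread_dict`, so dropping it in the model cannot invalidate the moved nodes.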
From scoder at codespeak.net Fri May 2 19:15:15 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:15:15 +0200 (CEST)
Subject: [Lxml-checkins] r54344 - lxml/trunk
Message-ID: <20080502171515.58D732A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 19:15:15 2008
New Revision: 54344
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
Log:
r4123 at delle: sbehnel | 2008-05-02 19:13:17 +0200
changelog
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Fri May 2 19:15:15 2008
@@ -17,6 +17,10 @@
Bugs fixed
----------
+* Moving a subtree from a document created in one thread into a
+ document of another thread could crash when the rest of the source
+ document is deleted while the subtree is still in use.
+
* Passing an nsmap when creating an Element will no longer strip
redundantly defined namespace URIs. This prevented the definition
of more than one prefix for a namespace on the same Element.
From scoder at codespeak.net Fri May 2 19:59:59 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 19:59:59 +0200 (CEST)
Subject: [Lxml-checkins] r54345 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080502175959.1362C2A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 19:59:57 2008
New Revision: 54345
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/tests/test_elementtree.py
Log:
r4130 at delle: sbehnel | 2008-05-02 19:55:34 +0200
cleanup
Modified: lxml/trunk/src/lxml/tests/test_elementtree.py
==============================================================================
--- lxml/trunk/src/lxml/tests/test_elementtree.py (original)
+++ lxml/trunk/src/lxml/tests/test_elementtree.py Fri May 2 19:59:57 2008
@@ -8,7 +8,7 @@
for IO related test cases.
"""
-import unittest, doctest
+import unittest
import os, re, tempfile, copy, operator, gc
from common_imports import StringIO, etree, ElementTree, cElementTree
From scoder at codespeak.net Fri May 2 20:00:03 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 20:00:03 +0200 (CEST)
Subject: [Lxml-checkins] r54346 - in lxml/trunk: . src/lxml/tests
Message-ID: <20080502180003.C61222A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 20:00:03 2008
New Revision: 54346
Added:
lxml/trunk/src/lxml/tests/test_threading.py
Modified:
lxml/trunk/ (props changed)
Log:
r4131 at delle: sbehnel | 2008-05-02 19:58:26 +0200
new test suite for threading tests
Added: lxml/trunk/src/lxml/tests/test_threading.py
==============================================================================
--- (empty file)
+++ lxml/trunk/src/lxml/tests/test_threading.py Fri May 2 20:00:03 2008
@@ -0,0 +1,38 @@
+# -*- coding: utf-8 -*-
+
+"""
+Tests for thread usage in lxml.etree.
+"""
+
+import unittest, threading
+
+from common_imports import etree, HelperTestCase
+
+class ThreadingTestCase(HelperTestCase):
+ """Threading tests"""
+ etree = etree
+
+ def test_subtree_copy(self):
+ tostring = self.etree.tostring
+ XML = self.etree.XML
+ xml = "<root><threadtag/></root>"
+ main_root = XML("<root/>")
+
+ def run_thread():
+ thread_root = XML(xml)
+ main_root.append(thread_root[0])
+ del thread_root
+
+ thread = threading.Thread(target=run_thread)
+ thread.start()
+ thread.join()
+
+ self.assertEquals(xml, tostring(main_root))
+
+def test_suite():
+ suite = unittest.TestSuite()
+ suite.addTests([unittest.makeSuite(ThreadingTestCase)])
+ return suite
+
+if __name__ == '__main__':
+ print 'to test use test.py %s' % __file__
From scoder at codespeak.net Fri May 2 20:12:28 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 20:12:28 +0200 (CEST)
Subject: [Lxml-checkins] r54349 - in lxml/trunk: . doc
Message-ID: <20080502181228.7C57B169E74@codespeak.net>
Author: scoder
Date: Fri May 2 20:12:28 2008
New Revision: 54349
Modified:
lxml/trunk/ (props changed)
lxml/trunk/doc/FAQ.txt
Log:
r4134 at delle: sbehnel | 2008-05-02 20:10:19 +0200
relieve FAQ on threading from 'big fat warning'
Modified: lxml/trunk/doc/FAQ.txt
==============================================================================
--- lxml/trunk/doc/FAQ.txt (original)
+++ lxml/trunk/doc/FAQ.txt Fri May 2 20:12:28 2008
@@ -565,29 +565,32 @@
Can I use threads to concurrently access the lxml API?
------------------------------------------------------
-Yes, although not carelessly.
+Short answer: yes, if you use lxml 2.1 and later.
-lxml frees the GIL (Python's global interpreter lock) internally when parsing
-from disk and memory, as long as you use either the default parser (which is
-replicated for each thread) or create a parser for each thread yourself. lxml
-also allows concurrency during validation (RelaxNG and XMLSchema) and XSL
-transformation. You can share RelaxNG, XMLSchema and (with restrictions) XSLT
-objects between threads. While you can also share parsers between threads,
-this will serialize the access to each of them, so it is better to ``copy()``
-parsers or to just use the default parser (which is automatically copied for
-each thread).
+Since version 1.1, lxml frees the GIL (Python's global interpreter
+lock) internally when parsing from disk and memory, as long as you use
+either the default parser (which is replicated for each thread) or
+create a parser for each thread yourself. lxml also allows
+concurrency during validation (RelaxNG and XMLSchema) and XSL
+transformation. You can share RelaxNG, XMLSchema and (with
+restrictions) XSLT objects between threads. While you can also share
+parsers between threads, this will serialize the access to each of
+them, so it is better to ``.copy()`` parsers or to just use the
+default parser if you do not need any special configuration.
Due to the way libxslt handles threading, applying a stylesheets is
most efficient if it was parsed in the same thread that executes it.
One way to achieve this is by caching stylesheets in thread-local
storage.
-Warning: You should generally avoid modifying trees in other threads than the
-one it was generated in. Although this should work in many cases, there are
-certain scenarios where the termination of a thread that parsed a tree can
-crash the application if subtrees of this tree were moved to other documents.
-You should be on the safe side when passing trees between threads if you
-either
+Warning: Before lxml 2.1, there were issues when moving subtrees
+between different threads. If you need code to run with older
+versions, you should generally avoid modifying trees in other threads
+than the one it was generated in. Although this should work in many
+cases, there are certain scenarios where the termination of a thread
+that parsed a tree can crash the application if subtrees of this tree
+were moved to other documents. You should be on the safe side when
+passing trees between threads if you either
a) do not modify these trees and do not move their elements to other trees, or
b) do not terminate threads while the trees they parsed are still in use
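The FAQ recommends parsing a stylesheet in the same thread that applies it, and suggests caching stylesheets in thread-local storage. Below is a minimal sketch of such a cache using only `threading.local`. `ThreadLocalCache` is a hypothetical helper, not part of lxml. With lxml, the factory might be something like `lambda path: etree.XSLT(etree.parse(path))`. That pairing is an assumption for illustration and is not executed here.

```python
import threading


class ThreadLocalCache:
    """Per-thread cache: each thread builds and keeps its own objects."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self, key):
        cache = getattr(self._local, "cache", None)
        if cache is None:
            # First use in this thread: create this thread's private dict.
            cache = self._local.cache = {}
        if key not in cache:
            # The object is built in the calling thread, so it is never
            # shared with (or created by) another thread.
            cache[key] = self._factory(key)
        return cache[key]
```

The same pattern works for per-thread parsers, for example a factory that returns `parser.copy()`, as the FAQ text suggests.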
From scoder at codespeak.net Fri May 2 20:12:33 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 20:12:33 +0200 (CEST)
Subject: [Lxml-checkins] r54350 - in lxml/trunk: . doc
Message-ID: <20080502181233.B3E33169E77@codespeak.net>
Author: scoder
Date: Fri May 2 20:12:33 2008
New Revision: 54350
Modified:
lxml/trunk/ (props changed)
lxml/trunk/doc/main.txt
lxml/trunk/version.txt
Log:
r4135 at delle: sbehnel | 2008-05-02 20:10:38 +0200
pre-release changes
Modified: lxml/trunk/doc/main.txt
==============================================================================
--- lxml/trunk/doc/main.txt (original)
+++ lxml/trunk/doc/main.txt Fri May 2 20:12:33 2008
@@ -146,8 +146,8 @@
source release. If you can't wait, consider trying a less recent
release version first.
-The latest version is `lxml 2.1beta1`_, released 2008-04-15
-(`changes for 2.1beta1`_). `Older versions`_ are listed below.
+The latest version is `lxml 2.1beta2`_, released 2008-05-02
+(`changes for 2.1beta2`_). `Older versions`_ are listed below.
Please take a look at the `installation instructions`_!
@@ -215,7 +215,9 @@
Old Versions
------------
-.. _`PDF documentation`: lxmldoc-2.1beta1.pdf
+.. _`PDF documentation`: lxmldoc-2.1beta2.pdf
+
+* `lxml 2.1beta1`_, released 2008-04-15 (`changes for 2.1beta1`_)
* `lxml 2.1alpha1`_, released 2008-03-27 (`changes for 2.1alpha1`_)
@@ -281,6 +283,7 @@
* `lxml 0.5`_, released 2005-04-08
+.. _`lxml 2.1beta2`: lxml-2.1beta2.tgz
.. _`lxml 2.1beta1`: lxml-2.1beta1.tgz
.. _`lxml 2.1alpha1`: lxml-2.1alpha1.tgz
.. _`lxml 2.0.5`: lxml-2.0.5.tgz
@@ -314,6 +317,7 @@
.. _`lxml 0.5.1`: lxml-0.5.1.tgz
.. _`lxml 0.5`: lxml-0.5.tgz
+.. _`changes for 2.1beta2`: changes-2.1beta2.html
.. _`changes for 2.1beta1`: changes-2.1beta1.html
.. _`changes for 2.1alpha1`: changes-2.1alpha1.html
.. _`changes for 2.0.5`: changes-2.0.5.html
Modified: lxml/trunk/version.txt
==============================================================================
--- lxml/trunk/version.txt (original)
+++ lxml/trunk/version.txt Fri May 2 20:12:33 2008
@@ -1 +1 @@
-2.1beta1
+2.1beta2
From scoder at codespeak.net Fri May 2 20:18:01 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 20:18:01 +0200 (CEST)
Subject: [Lxml-checkins] r54351 - lxml/trunk
Message-ID: <20080502181801.CBD87169E78@codespeak.net>
Author: scoder
Date: Fri May 2 20:18:01 2008
New Revision: 54351
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
Log:
r4138 at delle: sbehnel | 2008-05-02 20:16:32 +0200
pre-release changes
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Fri May 2 20:18:01 2008
@@ -2,8 +2,8 @@
lxml changelog
==============
-Under development
-=================
+2.1beta2 (2008-05-02)
+=====================
Features added
--------------
From scoder at codespeak.net Fri May 2 21:56:30 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 21:56:30 +0200 (CEST)
Subject: [Lxml-checkins] r54352 - in lxml/trunk: . src/lxml/html
Message-ID: <20080502195630.2D9A4169E4C@codespeak.net>
Author: scoder
Date: Fri May 2 21:56:28 2008
New Revision: 54352
Modified:
lxml/trunk/ (props changed)
lxml/trunk/src/lxml/html/defs.py
Log:
r4140 at delle: sbehnel | 2008-05-02 21:46:15 +0200
use sets instead of lists in defs.py as most use cases only test for containment
Modified: lxml/trunk/src/lxml/html/defs.py
==============================================================================
--- lxml/trunk/src/lxml/html/defs.py (original)
+++ lxml/trunk/src/lxml/html/defs.py Fri May 2 21:56:28 2008
@@ -4,34 +4,40 @@
# Data taken from http://www.w3.org/TR/html401/index/elements.html
-empty_tags = [
+try:
+ frozenset
+except NameError:
+ from sets import Set as frozenset
+
+
+empty_tags = frozenset([
'area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
- 'img', 'input', 'isindex', 'link', 'meta', 'param']
+ 'img', 'input', 'isindex', 'link', 'meta', 'param'])
-deprecated_tags = [
+deprecated_tags = frozenset([
'applet', 'basefont', 'center', 'dir', 'font', 'isindex',
- 'menu', 's', 'strike', 'u']
+ 'menu', 's', 'strike', 'u'])
# archive actually takes a space-separated list of URIs
-link_attrs = [
+link_attrs = frozenset([
'action', 'archive', 'background', 'cite', 'classid',
'codebase', 'data', 'href', 'longdesc', 'profile', 'src',
'usemap',
# Not standard:
'dynsrc', 'lowsrc',
- ]
+ ])
# Not in the HTML 4 spec:
# onerror, onresize
-event_attrs = [
+event_attrs = frozenset([
'onblur', 'onchange', 'onclick', 'ondblclick', 'onerror',
'onfocus', 'onkeydown', 'onkeypress', 'onkeyup', 'onload',
'onmousedown', 'onmousemove', 'onmouseout', 'onmouseover',
'onmouseup', 'onreset', 'onresize', 'onselect', 'onsubmit',
'onunload',
- ]
+ ])
-safe_attrs = [
+safe_attrs = frozenset([
'abbr', 'accept', 'accept-charset', 'accesskey', 'action', 'align',
'alt', 'axis', 'border', 'cellpadding', 'cellspacing', 'char', 'charoff',
'charset', 'checked', 'cite', 'class', 'clear', 'cols', 'colspan',
@@ -41,18 +47,18 @@
'multiple', 'name', 'nohref', 'noshade', 'nowrap', 'prompt', 'readonly',
'rel', 'rev', 'rows', 'rowspan', 'rules', 'scope', 'selected', 'shape',
'size', 'span', 'src', 'start', 'summary', 'tabindex', 'target', 'title',
- 'type', 'usemap', 'valign', 'value', 'vspace', 'width']
+ 'type', 'usemap', 'valign', 'value', 'vspace', 'width'])
# From http://htmlhelp.com/reference/html40/olist.html
-top_level_tags = [
+top_level_tags = frozenset([
'html', 'head', 'body', 'frameset',
- ]
+ ])
-head_tags = [
+head_tags = frozenset([
'base', 'isindex', 'link', 'meta', 'script', 'style', 'title',
- ]
+ ])
-general_block_tags = [
+general_block_tags = frozenset([
'address',
'blockquote',
'center',
@@ -70,51 +76,51 @@
'noscript',
'p',
'pre',
- ]
+ ])
-list_tags = [
+list_tags = frozenset([
'dir', 'dl', 'dt', 'dd', 'li', 'menu', 'ol', 'ul',
- ]
+ ])
-table_tags = [
+table_tags = frozenset([
'table', 'caption', 'colgroup', 'col',
'thead', 'tfoot', 'tbody', 'tr', 'td', 'th',
- ]
+ ])
# just this one from
# http://www.georgehernandez.com/h/XComputers/HTML/2BlockLevel.htm
-block_tags = general_block_tags + list_tags + table_tags + [
+block_tags = general_block_tags | list_tags | table_tags | frozenset([
# Partial form tags
'fieldset', 'form', 'legend', 'optgroup', 'option',
- ]
+ ])
-form_tags = [
+form_tags = frozenset([
'form', 'button', 'fieldset', 'legend', 'input', 'label',
'select', 'optgroup', 'option', 'textarea',
- ]
+ ])
-special_inline_tags = [
+special_inline_tags = frozenset([
'a', 'applet', 'basefont', 'bdo', 'br', 'embed', 'font', 'iframe',
'img', 'map', 'area', 'object', 'param', 'q', 'script',
'span', 'sub', 'sup',
- ]
+ ])
-phrase_tags = [
+phrase_tags = frozenset([
'abbr', 'acronym', 'cite', 'code', 'del', 'dfn', 'em',
'ins', 'kbd', 'samp', 'strong', 'var',
- ]
+ ])
-font_style_tags = [
+font_style_tags = frozenset([
'b', 'big', 'i', 's', 'small', 'strike', 'tt', 'u',
- ]
+ ])
-frame_tags = [
+frame_tags = frozenset([
'frameset', 'frame', 'noframes',
- ]
+ ])
# These tags aren't standard
-nonstandard_tags = ['blink', 'marque']
+nonstandard_tags = frozenset(['blink', 'marque'])
-tags = (top_level_tags + head_tags + general_block_tags + list_tags
- + table_tags + form_tags + special_inline_tags + phrase_tags
- + font_style_tags + nonstandard_tags)
+tags = (top_level_tags | head_tags | general_block_tags | list_tags
+ | table_tags | form_tags | special_inline_tags | phrase_tags
+ | font_style_tags | nonstandard_tags)
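This commit swaps lists for `frozenset`s because the tag tables are mostly used for membership tests. It also means `+` has to become `|`, since sets do not support concatenation. The sketch below uses abridged copies of the tables from the diff to show both points in Python 3. On Python 3 the `sets.Set` fallback in the patch is unnecessary: `sets` only existed in Python 2, and `frozenset` is a builtin.

```python
# Abridged subsets of the tables in defs.py above.
empty_tags = frozenset(['area', 'base', 'br', 'hr', 'img'])
deprecated_tags = frozenset(['center', 'dir', 'font', 'menu'])
list_tags = frozenset(['dir', 'dl', 'dt', 'dd', 'li', 'menu', 'ol', 'ul'])
table_tags = frozenset(['table', 'tr', 'td', 'th'])

# Union replaces list concatenation; tags present in both operands
# ('dir', 'menu') appear only once in the result.
block_tags = list_tags | table_tags | frozenset(['fieldset', 'form'])
overlap = deprecated_tags | list_tags


def is_empty_tag(tag):
    # Hash lookup (average O(1)) instead of a linear scan of a list.
    return tag in empty_tags


try:
    list_tags + table_tags
    concatenation_fails = False
except TypeError:
    concatenation_fails = True  # frozenset does not define '+'
```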
From scoder at codespeak.net Fri May 2 21:56:35 2008
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Fri, 2 May 2008 21:56:35 +0200 (CEST)
Subject: [Lxml-checkins] r54353 - in lxml/trunk: . src/lxml/html
Message-ID: <20080502195635.5C6BD2A00DB@codespeak.net>
Author: scoder
Date: Fri May 2 21:56:34 2008
New Revision: 54353
Modified:
lxml/trunk/ (props changed)
lxml/trunk/CHANGES.txt
lxml/trunk/src/lxml/html/__init__.py
lxml/trunk/src/lxml/html/clean.py
lxml/trunk/src/lxml/html/formfill.py
Log:
r4141 at delle: sbehnel | 2008-05-02 21:47:32 +0200
support XHTML tags in XPath expressions of lxml.html
Modified: lxml/trunk/CHANGES.txt
==============================================================================
--- lxml/trunk/CHANGES.txt (original)
+++ lxml/trunk/CHANGES.txt Fri May 2 21:56:34 2008
@@ -2,6 +2,21 @@
lxml changelog
==============
+Under development
+=================
+
+Features added
+--------------
+
+* Most features in lxml.html work for XHTML namespaced tag names.
+
+Bugs fixed
+----------
+
+Other changes
+-------------
+
+
2.1beta2 (2008-05-02)
=====================
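The XHTML support described in this changelog entry rests on treating `{http://www.w3.org/1999/xhtml}label` and plain `label` as the same tag. The patch does this with XPath unions such as `//label|//x:label` and with a `_nons()` helper. Below is a stand-alone Python 3 sketch of the same idea using only the standard library. `strip_xhtml_ns` and `find_labels` are hypothetical names. Unlike the patch's `_nons()`, this version also checks for the closing `}`, so a namespace that merely starts with the XHTML URI is left alone. Non-string tags pass through unchanged, as in the original.

```python
import xml.etree.ElementTree as ET

XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"
_XHTML_PREFIX = "{" + XHTML_NAMESPACE + "}"


def strip_xhtml_ns(tag):
    """Return the local name of an XHTML-namespaced tag, else tag unchanged."""
    if isinstance(tag, str) and tag.startswith(_XHTML_PREFIX):
        return tag[len(_XHTML_PREFIX):]
    return tag


def find_labels(root):
    # Matches <label> with or without the XHTML namespace, similar in
    # spirit to the "//label|//x:label" XPath union in the patch.
    return [el for el in root.iter() if strip_xhtml_ns(el.tag) == "label"]


plain = ET.fromstring('<html><body><label for="a"/></body></html>')
xhtml = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><label for="a"/></body></html>')
```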
Modified: lxml/trunk/src/lxml/html/__init__.py
==============================================================================
--- lxml/trunk/src/lxml/html/__init__.py (original)
+++ lxml/trunk/src/lxml/html/__init__.py Fri May 2 21:56:34 2008
@@ -22,16 +22,30 @@
'find_rel_links', 'find_class', 'make_links_absolute',
'resolve_base_href', 'iterlinks', 'rewrite_links', 'open_in_browser']
-_rel_links_xpath = etree.XPath("descendant-or-self::a[@rel]")
+XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"
+
+_rel_links_xpath = etree.XPath("descendant-or-self::a[@rel]|descendant-or-self::x:a[@rel]",
+ namespaces={'x':XHTML_NAMESPACE})
+_options_xpath = etree.XPath("descendant-or-self::option|descendant-or-self::x:option",
+ namespaces={'x':XHTML_NAMESPACE})
+_forms_xpath = etree.XPath("descendant-or-self::form|descendant-or-self::x:form",
+ namespaces={'x':XHTML_NAMESPACE})
#_class_xpath = etree.XPath(r"descendant-or-self::*[regexp:match(@class, concat('\b', $class_name, '\b'))]", {'regexp': 'http://exslt.org/regular-expressions'})
_class_xpath = etree.XPath("descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), concat(' ', $class_name, ' '))]")
_id_xpath = etree.XPath("descendant-or-self::*[@id=$id]")
_collect_string_content = etree.XPath("string()")
_css_url_re = re.compile(r'url\((.*?)\)', re.I)
_css_import_re = re.compile(r'@import "(.*?)"')
-_label_xpath = etree.XPath("//label[@for=$id]")
+_label_xpath = etree.XPath("//label[@for=$id]|//x:label[@for=$id]",
+ namespaces={'x':XHTML_NAMESPACE})
_archive_re = re.compile(r'[^ ]+')
+def _nons(tag):
+ if isinstance(tag, basestring):
+ if tag[0] == '{' and tag[1:len(XHTML_NAMESPACE)+1] == XHTML_NAMESPACE:
+ return tag.split('}')[-1]
+ return tag
+
class HtmlMixin(object):
def base_url(self):
@@ -48,7 +62,7 @@
"""
Return a list of all the forms
"""
- return list(self.getiterator('form'))
+ return _forms_xpath(self)
forms = property(forms, doc=forms.__doc__)
def body(self):
@@ -56,7 +70,7 @@
Return the <body> element. Can be called from a child element
to get the document's head.
"""
- return self.xpath('//body')[0]
+ return self.xpath('//body|//x:body', namespaces={'x':XHTML_NAMESPACE})[0]
body = property(body, doc=body.__doc__)
def head(self):
@@ -64,7 +78,7 @@
Returns the <head> element. Can be called from a child
element to get the document's head.
"""
- return self.xpath('//head')[0]
+ return self.xpath('//head|//x:head', namespaces={'x':XHTML_NAMESPACE})[0]
head = property(head, doc=head.__doc__)
def _label__get(self):
@@ -85,7 +99,7 @@
raise TypeError(
"You cannot set a label for an element (%r) that has no id"
% self)
- if not label.tag == 'label':
+ if _nons(label.tag) != 'label':
raise TypeError(
"You can only assign label to a label element (not %r)"
% label)
@@ -228,7 +242,7 @@
tag once it has been applied.
"""
base_href = None
- basetags = self.xpath('//base[@href]')
+ basetags = self.xpath('//base[@href]|//x:base[@href]', namespaces={'x':XHTML_NAMESPACE})
for b in basetags:
base_href = b.get('href')
b.drop_tree()
@@ -249,11 +263,12 @@
link_attrs = defs.link_attrs
for el in self.getiterator():
attribs = el.attrib
- if el.tag != 'object':
+ tag = _nons(el.tag)
+ if tag != 'object':
for attrib in link_attrs:
if attrib in attribs:
yield (el, attrib, attribs[attrib], 0)
- elif el.tag == 'object':
+ elif tag == 'object':
codebase = None
##