From scoder at codespeak.net Tue Mar 2 10:30:30 2010
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 2 Mar 2010 10:30:30 +0100 (CET)
Subject: [Lxml-checkins] r71612 - in lxml/trunk: . doc
Message-ID: <20100302093030.91C1D282BD4@codespeak.net>

Author: scoder
Date: Tue Mar 2 10:30:27 2010
New Revision: 71612

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/doc/performance.txt
Log:
 r5489 at lenny: sbehnel | 2010-03-02 08:49:20 +0100
 updated performance numbers


Modified: lxml/trunk/doc/performance.txt
==============================================================================
--- lxml/trunk/doc/performance.txt (original)
+++ lxml/trunk/doc/performance.txt Tue Mar 2 10:30:27 2010
@@ -86,15 +86,15 @@
 a specific part of the API yourself, please consider sending it to the
 lxml mailing list.
 
-The timings cited below compare lxml 2.2 (with libxml2 2.7.3) to the
-February 2009 SVN versions of ElementTree (1.3alpha2) and cElementTree
-(1.0.6). They were run single-threaded on a 1.8GHz Intel Core Duo
-machine under Ubuntu Linux 8.10 (Intrepid). The C libraries were
+The timings cited below compare lxml 2.3 (with libxml2 2.7.6) to the
+latest developer versions of ElementTree (1.3beta2) and cElementTree
+(1.0.6a3). They were run single-threaded on a 2.5GHz 64bit Intel Core
+Duo machine under Ubuntu Linux 9.10 (Karmic). The C libraries were
 compiled with the same platform specific optimisation flags. The
-Python interpreter (2.6.1) was manually compiled for the platform.
-Note that many of the following ElementTree timings are therefore
-better then what a normal Python installation with the standard
-library (c)ElementTree modules would yield.
+Python interpreter (2.6.4) was also manually compiled for the
+platform. Note that many of the following ElementTree timings are
+therefore better then what a normal Python installation with the
+standard library (c)ElementTree modules would yield.
 
 .. _`bench_etree.py`: http://codespeak.net/svn/lxml/trunk/benchmark/bench_etree.py
 .. _`bench_xpath.py`: http://codespeak.net/svn/lxml/trunk/benchmark/bench_xpath.py
@@ -135,105 +135,144 @@
 1.2 (which is part of the standard library since Python 2.5), lxml is
 still more than 7 times as fast as the much improved ElementTree 1.3::
 
-  lxe: tostring_utf16 (SATR T1)   22.4042 msec/pass
-  cET: tostring_utf16 (SATR T1)  184.5090 msec/pass
-  ET : tostring_utf16 (SATR T1)  182.4350 msec/pass
-
-  lxe: tostring_utf16 (UATR T1)   23.1769 msec/pass
-  cET: tostring_utf16 (UATR T1)  188.6780 msec/pass
-  ET : tostring_utf16 (UATR T1)  186.7781 msec/pass
-
-  lxe: tostring_utf16 (S-TR T2)   21.8501 msec/pass
-  cET: tostring_utf16 (S-TR T2)  200.0139 msec/pass
-  ET : tostring_utf16 (S-TR T2)  190.8720 msec/pass
-
-  lxe: tostring_utf8 (S-TR T2)   17.1690 msec/pass
-  cET: tostring_utf8 (S-TR T2)  192.3709 msec/pass
-  ET : tostring_utf8 (S-TR T2)  189.7140 msec/pass
-
-  lxe: tostring_utf8 (U-TR T3)    4.9832 msec/pass
-  cET: tostring_utf8 (U-TR T3)   60.2911 msec/pass
-  ET : tostring_utf8 (U-TR T3)   57.8101 msec/pass
-
-The same applies to plain text serialisation. Note that cElementTree
-does not currently support this, as it is a new feature in ET 1.3 and
-lxml.etree 2.0::
-
-  lxe: tostring_text_ascii (S-TR T1)    4.3709 msec/pass
-  ET : tostring_text_ascii (S-TR T1)   83.9939 msec/pass
-
-  lxe: tostring_text_ascii (S-TR T3)    1.3590 msec/pass
-  ET : tostring_text_ascii (S-TR T3)   26.6340 msec/pass
-
-  lxe: tostring_text_utf16 (S-TR T1)    6.2978 msec/pass
-  ET : tostring_text_utf16 (S-TR T1)   84.7399 msec/pass
-
-  lxe: tostring_text_utf16 (U-TR T1)    7.7510 msec/pass
-  ET : tostring_text_utf16 (U-TR T1)   79.9279 msec/pass
+  lxe: tostring_utf16 (S-TR T1)    9.8219 msec/pass
+  cET: tostring_utf16 (S-TR T1)   88.7740 msec/pass
+  ET : tostring_utf16 (S-TR T1)   99.6690 msec/pass
+
+  lxe: tostring_utf16 (UATR T1)   10.3750 msec/pass
+  cET: tostring_utf16 (UATR T1)   90.7581 msec/pass
+  ET : tostring_utf16 (UATR T1)  102.3569 msec/pass
+
+  lxe: tostring_utf16 (S-TR T2)   10.2711 msec/pass
+  cET: tostring_utf16 (S-TR T2)   93.5340 msec/pass
+  ET : tostring_utf16 (S-TR T2)  105.8500 msec/pass
+
+  lxe: tostring_utf8 (S-TR T2)    7.1261 msec/pass
+  cET: tostring_utf8 (S-TR T2)   93.4091 msec/pass
+  ET : tostring_utf8 (S-TR T2)  105.5419 msec/pass
+
+  lxe: tostring_utf8 (U-TR T3)    1.4591 msec/pass
+  cET: tostring_utf8 (U-TR T3)   29.6180 msec/pass
+  ET : tostring_utf8 (U-TR T3)   31.9080 msec/pass
+
+The same applies to plain text serialisation. Note that the
+cElementTree version in the standard library does not currently
+support this, as it is a new feature in ET 1.3 and lxml.etree 2.0::
+
+  lxe: tostring_text_ascii (S-TR T1)    1.9400 msec/pass
+  cET: tostring_text_ascii (S-TR T1)   41.6231 msec/pass
+  ET : tostring_text_ascii (S-TR T1)   52.7501 msec/pass
+
+  lxe: tostring_text_ascii (S-TR T3)    0.5331 msec/pass
+  cET: tostring_text_ascii (S-TR T3)   12.9712 msec/pass
+  ET : tostring_text_ascii (S-TR T3)   15.3620 msec/pass
+
+  lxe: tostring_text_utf16 (S-TR T1)    3.2430 msec/pass
+  cET: tostring_text_utf16 (S-TR T1)   41.9259 msec/pass
+  ET : tostring_text_utf16 (S-TR T1)   53.4091 msec/pass
+
+  lxe: tostring_text_utf16 (U-TR T1)    3.6838 msec/pass
+  cET: tostring_text_utf16 (U-TR T1)   38.7859 msec/pass
+  ET : tostring_text_utf16 (U-TR T1)   50.8440 msec/pass
 
 Unlike ElementTree, the ``tostring()`` function in lxml also supports
 serialisation to a Python unicode string object::
 
-  lxe: tostring_text_unicode (S-TR T1)    4.6940 msec/pass
-  lxe: tostring_text_unicode (U-TR T1)    6.3069 msec/pass
-  lxe: tostring_text_unicode (S-TR T3)    1.3652 msec/pass
-  lxe: tostring_text_unicode (U-TR T3)    2.0702 msec/pass
-
-For parsing, on the other hand, the advantage is clearly with
-cElementTree. The (c)ET libraries use a very thin layer on top of the
-expat parser, which is known to be extremely fast::
-
-  lxe: parse_stringIO (SAXR T1)   50.0100 msec/pass
-  cET: parse_stringIO (SAXR T1)   19.3238 msec/pass
-  ET : parse_stringIO (SAXR T1)  318.2330 msec/pass
-
-  lxe: parse_stringIO (S-XR T3)    6.1851 msec/pass
-  cET: parse_stringIO (S-XR T3)    5.7080 msec/pass
-  ET : parse_stringIO (S-XR T3)   83.5931 msec/pass
-
-  lxe: parse_stringIO (UAXR T3)   34.4319 msec/pass
-  cET: parse_stringIO (UAXR T3)   28.8520 msec/pass
-  ET : parse_stringIO (UAXR T3)  164.5968 msec/pass
-
-While about as fast for smaller documents, the expat parser allows cET
-to be up to 2 times faster than lxml on plain parser performance for
-large input documents. Similar timings can be observed for the
-``iterparse()`` function::
-
-  lxe: iterparse_stringIO (SAXR T1)   57.8308 msec/pass
-  cET: iterparse_stringIO (SAXR T1)   23.8140 msec/pass
-  ET : iterparse_stringIO (SAXR T1)  349.5209 msec/pass
-
-  lxe: iterparse_stringIO (UAXR T3)   37.2162 msec/pass
-  cET: iterparse_stringIO (UAXR T3)   30.2329 msec/pass
-  ET : iterparse_stringIO (UAXR T3)  171.4060 msec/pass
+  lxe: tostring_text_unicode (S-TR T1)    2.4869 msec/pass
+  lxe: tostring_text_unicode (U-TR T1)    3.0370 msec/pass
+  lxe: tostring_text_unicode (S-TR T3)    0.6518 msec/pass
+  lxe: tostring_text_unicode (U-TR T3)    0.7300 msec/pass
+
+For parsing, lxml.etree and cElementTree compete for the medal.
+Depending on the input, either of the two can be faster. The (c)ET
+libraries use a very thin layer on top of the expat parser, which is
+known to be very fast. Here are some timings from the benchmarking
+suite::
+
+  lxe: parse_stringIO (SAXR T1)   19.9990 msec/pass
+  cET: parse_stringIO (SAXR T1)    8.4970 msec/pass
+  ET : parse_stringIO (SAXR T1)  183.9781 msec/pass
+
+  lxe: parse_stringIO (S-XR T3)    2.0790 msec/pass
+  cET: parse_stringIO (S-XR T3)    2.7430 msec/pass
+  ET : parse_stringIO (S-XR T3)   47.4229 msec/pass
+
+  lxe: parse_stringIO (UAXR T3)   11.1630 msec/pass
+  cET: parse_stringIO (UAXR T3)   15.0940 msec/pass
+  ET : parse_stringIO (UAXR T3)   92.6890 msec/pass
+
+And another couple of timings `from a benchmark`_ that Fredrik Lundh
+`used to promote cElementTree`_, comparing a number of different
+parsers. First, parsing a 280KB XML file containing Shakespeare's
+Hamlet::
+
+  lxml.etree.parse done in 0.005 seconds
+  cElementTree.parse done in 0.012 seconds
+  elementtree.ElementTree.parse done in 0.136 seconds
+  elementtree.XMLTreeBuilder: 6636 nodes read in 0.243 seconds
+  elementtree.SimpleXMLTreeBuilder: 6636 nodes read in 0.314 seconds
+  elementtree.SgmlopXMLTreeBuilder: 6636 nodes read in 0.104 seconds
+  minidom tree read in 0.137 seconds
+
+And a 3.4MB XML file containing the Old Testament::
+
+  lxml.etree.parse done in 0.031 seconds
+  cElementTree.parse done in 0.039 seconds
+  elementtree.ElementTree.parse done in 0.537 seconds
+  elementtree.XMLTreeBuilder: 25317 nodes read in 0.577 seconds
+  elementtree.SimpleXMLTreeBuilder: 25317 nodes read in 1.265 seconds
+  elementtree.SgmlopXMLTreeBuilder: 25317 nodes read in 0.331 seconds
+  minidom tree read in 0.643 seconds
+
+.. _`from a benchmark`: http://svn.effbot.org/public/elementtree-1.3/benchmark.py
+.. _`used to promote cElementTree`: http://effbot.org/zone/celementtree.htm#benchmarks
+
+For plain parser performance, lxml.etree and cElementTree tend to stay
+rather close to each other, usually within a factor of two, with
+winners well distributed over both sides. Similar timings can be
+observed for the ``iterparse()`` function::
+
+  lxe: iterparse_stringIO (SAXR T1)   24.8621 msec/pass
+  cET: iterparse_stringIO (SAXR T1)   17.3280 msec/pass
+  ET : iterparse_stringIO (SAXR T1)  199.1270 msec/pass
+
+  lxe: iterparse_stringIO (UAXR T3)   12.3630 msec/pass
+  cET: iterparse_stringIO (UAXR T3)   17.5190 msec/pass
+  ET : iterparse_stringIO (UAXR T3)   95.8610 msec/pass
 
 However, if you benchmark the complete round-trip of a serialise-parse
 cycle, the numbers will look similar to these::
 
-  lxe: write_utf8_parse_stringIO (S-TR T1)   60.2388 msec/pass
-  cET: write_utf8_parse_stringIO (S-TR T1)  314.9750 msec/pass
-  ET : write_utf8_parse_stringIO (S-TR T1)  616.4260 msec/pass
-
-  lxe: write_utf8_parse_stringIO (UATR T2)   71.7540 msec/pass
-  cET: write_utf8_parse_stringIO (UATR T2)  364.4099 msec/pass
-  ET : write_utf8_parse_stringIO (UATR T2)  684.5109 msec/pass
-
-  lxe: write_utf8_parse_stringIO (S-TR T3)   10.7441 msec/pass
-  cET: write_utf8_parse_stringIO (S-TR T3)  103.3869 msec/pass
-  ET : write_utf8_parse_stringIO (S-TR T3)  179.5921 msec/pass
-
-  lxe: write_utf8_parse_stringIO (SATR T4)    1.1981 msec/pass
-  cET: write_utf8_parse_stringIO (SATR T4)    7.0901 msec/pass
-  ET : write_utf8_parse_stringIO (SATR T4)   10.4899 msec/pass
+  lxe: write_utf8_parse_stringIO (S-TR T1)   27.5791 msec/pass
+  cET: write_utf8_parse_stringIO (S-TR T1)  158.9060 msec/pass
+  ET : write_utf8_parse_stringIO (S-TR T1)  347.8320 msec/pass
+
+  lxe: write_utf8_parse_stringIO (UATR T2)   34.4141 msec/pass
+  cET: write_utf8_parse_stringIO (UATR T2)  187.7041 msec/pass
+  ET : write_utf8_parse_stringIO (UATR T2)  388.9449 msec/pass
+
+  lxe: write_utf8_parse_stringIO (S-TR T3)    3.7861 msec/pass
+  cET: write_utf8_parse_stringIO (S-TR T3)   52.4600 msec/pass
+  ET : write_utf8_parse_stringIO (S-TR T3)  101.4550 msec/pass
+
+  lxe: write_utf8_parse_stringIO (SATR T4)    0.5522 msec/pass
+  cET: write_utf8_parse_stringIO (SATR T4)    3.8941 msec/pass
+  ET : write_utf8_parse_stringIO (SATR T4)    5.9431 msec/pass
 
 For applications that require a high parser throughput of large files,
-and that do little to no serialization, cET is the best choice. Also
-for iterparse applications that extract small amounts of data or
-aggregate information from large XML data sets that do not fit into
-memory. If it comes to round-trip performance, however, lxml tends to
-be multiple times faster in total. So, whenever the input documents
-are not considerably larger than the output, lxml is the clear winner.
+and that do little to no serialization, both cET and lxml.etree are a
+good choice. The cET library is particularly fast for iterparse
+applications that extract small amounts of data or aggregate
+information from large XML data sets that do not fit into memory. If
+it comes to round-trip performance, however, lxml is multiple times
+faster in total. So, whenever the input documents are not
+considerably larger than the output, lxml is the clear winner.
+
+Again, note that the cET/ET timings are not based on the standard
+library versions in Python 2.6, but on wastly improved, unreleased
+developer versions. Especially the serialiser in the standard library
+modules is several times slower than the benchmarked one, and at least
+20 times slower than the one in lxml.etree.
 
 Regarding HTML parsing, Ian Bicking has done some `benchmarking on
 lxml's HTML parser`_, comparing it to a number of other famous HTML
@@ -279,24 +318,24 @@
 restructuring. This can be seen from the tree setup times of the
 benchmark (given in seconds)::
 
-  lxe:     --     S-     U-     -A     SA     UA
-  T1:  0.0502 0.0572 0.0613 0.0494 0.0575 0.0615
-  T2:  0.0602 0.0691 0.0747 0.0651 0.0745 0.0796
-  T3:  0.0145 0.0157 0.0176 0.0392 0.0411 0.0415
-  T4:  0.0003 0.0003 0.0003 0.0008 0.0008 0.0008
-  cET:     --     S-     U-     -A     SA     UA
-  T1:  0.0092 0.0094 0.0094 0.0094 0.0096 0.0093
-  T2:  0.0152 0.0151 0.0152 0.0156 0.0154 0.0154
-  T3:  0.0079 0.0080 0.0079 0.0106 0.0107 0.0134
-  T4:  0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
-  ET :     --     S-     U-     -A     SA     UA
-  T1:  0.1017 0.1715 0.1962 0.1080 0.2470 0.1049
-  T2:  0.3130 0.3324 0.1130 0.3897 0.1158 0.4246
-  T3:  0.0341 0.0323 0.0338 0.0358 0.3965 0.0359
-  T4:  0.0006 0.0005 0.0006 0.0006 0.0007 0.0006
+  lxe:     --     S-     U-     -A     SA     UA
+  T1:  0.0407 0.0470 0.0506 0.0396 0.0464 0.0504
+  T2:  0.0480 0.0557 0.0584 0.0520 0.0608 0.0627
+  T3:  0.0118 0.0132 0.0136 0.0319 0.0322 0.0319
+  T4:  0.0002 0.0002 0.0002 0.0006 0.0006 0.0006
+  cET:     --     S-     U-     -A     SA     UA
+  T1:  0.0045 0.0043 0.0043 0.0045 0.0043 0.0043
+  T2:  0.0068 0.0069 0.0066 0.0078 0.0070 0.0069
+  T3:  0.0040 0.0040 0.0040 0.0050 0.0052 0.0067
+  T4:  0.0000 0.0000 0.0000 0.0001 0.0001 0.0001
+  ET :     --     S-     U-     -A     SA     UA
+  T1:  0.0479 0.1051 0.1279 0.0487 0.1597 0.0484
+  T2:  0.1995 0.0553 0.2297 0.2550 0.0550 0.2881
+  T3:  0.0177 0.0169 0.0174 0.0185 0.2895 0.0189
+  T4:  0.0003 0.0002 0.0003 0.0003 0.0014 0.0003
 
-While lxml is still a lot faster than ET in most cases, cET can be up
-to five times faster than lxml here. One of the reasons is that lxml
+While lxml is still a lot faster than ET in most cases, cET can be
+several times faster than lxml here. One of the reasons is that lxml
 must encode incoming string data and tag names into UTF-8, and
 additionally discard the created Python elements after their use, when
 they are no longer referenced. ET and cET represent the tree itself
@@ -311,36 +350,36 @@
 create a shallow copy of their list of children, lxml has to create a
 Python object for each child and collect them in a list::
 
-  lxe: root_list_children (--TR T1)    0.0148 msec/pass
-  cET: root_list_children (--TR T1)    0.0050 msec/pass
-  ET : root_list_children (--TR T1)    0.0219 msec/pass
-
-  lxe: root_list_children (--TR T2)    0.1719 msec/pass
-  cET: root_list_children (--TR T2)    0.0260 msec/pass
-  ET : root_list_children (--TR T2)    0.3390 msec/pass
+  lxe: root_list_children (--TR T1)    0.0079 msec/pass
+  cET: root_list_children (--TR T1)    0.0029 msec/pass
+  ET : root_list_children (--TR T1)    0.0100 msec/pass
+
+  lxe: root_list_children (--TR T2)    0.0849 msec/pass
+  cET: root_list_children (--TR T2)    0.0110 msec/pass
+  ET : root_list_children (--TR T2)    0.1481 msec/pass
 
 This handicap is also visible when accessing single children::
 
-  lxe: first_child (--TR T2)    0.1879 msec/pass
-  cET: first_child (--TR T2)    0.1760 msec/pass
-  ET : first_child (--TR T2)    0.8099 msec/pass
-
-  lxe: last_child (--TR T1)    0.1910 msec/pass
-  cET: last_child (--TR T1)    0.1872 msec/pass
-  ET : last_child (--TR T1)    0.8099 msec/pass
+  lxe: first_child (--TR T2)    0.0699 msec/pass
+  cET: first_child (--TR T2)    0.0608 msec/pass
+  ET : first_child (--TR T2)    0.3419 msec/pass
+
+  lxe: last_child (--TR T1)    0.0710 msec/pass
+  cET: last_child (--TR T1)    0.0648 msec/pass
+  ET : last_child (--TR T1)    0.3309 msec/pass
 
 ... unless you also add the time to find a child index in a bigger
 list. ET and cET use Python lists here, which are based on arrays.
 The data structure used by libxml2 is a linked tree, and thus, a linked
 list of children::
 
-  lxe: middle_child (--TR T1)    0.2189 msec/pass
-  cET: middle_child (--TR T1)    0.1779 msec/pass
-  ET : middle_child (--TR T1)    0.8030 msec/pass
-
-  lxe: middle_child (--TR T2)    2.4071 msec/pass
-  cET: middle_child (--TR T2)    0.1781 msec/pass
-  ET : middle_child (--TR T2)    0.8039 msec/pass
+  lxe: middle_child (--TR T1)    0.0989 msec/pass
+  cET: middle_child (--TR T1)    0.0598 msec/pass
+  ET : middle_child (--TR T1)    0.3390 msec/pass
+
+  lxe: middle_child (--TR T2)    2.7599 msec/pass
+  cET: middle_child (--TR T2)    0.0620 msec/pass
+  ET : middle_child (--TR T2)    0.3610 msec/pass
 
 
 Element creation
@@ -350,21 +389,21 @@
 in. This results in a major performance difference for creating
 independent Elements that end up in independently created documents::
 
-  lxe: create_elements (--TC T2)    2.1949 msec/pass
-  cET: create_elements (--TC T2)    0.1941 msec/pass
-  ET : create_elements (--TC T2)    1.2760 msec/pass
+  lxe: create_elements (--TC T2)    1.1640 msec/pass
+  cET: create_elements (--TC T2)    0.0808 msec/pass
+  ET : create_elements (--TC T2)    0.5801 msec/pass
 
 Therefore, it is always preferable to create Elements for the document
 they are supposed to end up in, either as SubElements of an Element or
 using the explicit ``Element.makeelement()`` call::
 
-  lxe: makeelement (--TC T2)    1.8370 msec/pass
-  cET: makeelement (--TC T2)    0.3200 msec/pass
-  ET : makeelement (--TC T2)    1.5380 msec/pass
-
-  lxe: create_subelements (--TC T2)    1.6761 msec/pass
-  cET: create_subelements (--TC T2)    0.2329 msec/pass
-  ET : create_subelements (--TC T2)    3.0999 msec/pass
+  lxe: makeelement (--TC T2)    1.2751 msec/pass
+  cET: makeelement (--TC T2)    0.1469 msec/pass
+  ET : makeelement (--TC T2)    0.7451 msec/pass
+
+  lxe: create_subelements (--TC T2)    1.1470 msec/pass
+  cET: create_subelements (--TC T2)    0.1080 msec/pass
+  ET : create_subelements (--TC T2)    1.4369 msec/pass
 
 So, if the main performance bottleneck of an application is creating large
 XML trees in memory through calls to Element and SubElement, cET is the best
@@ -381,13 +420,13 @@
 The following benchmark appends all root children of the second tree to
 the root of the first tree::
 
-  lxe: append_from_document (--TR T1,T2)    3.4299 msec/pass
-  cET: append_from_document (--TR T1,T2)    0.2639 msec/pass
-  ET : append_from_document (--TR T1,T2)    1.1489 msec/pass
-
-  lxe: append_from_document (--TR T3,T4)    0.0429 msec/pass
-  cET: append_from_document (--TR T3,T4)    0.0169 msec/pass
-  ET : append_from_document (--TR T3,T4)    0.0780 msec/pass
+  lxe: append_from_document (--TR T1,T2)    2.0740 msec/pass
+  cET: append_from_document (--TR T1,T2)    0.1271 msec/pass
+  ET : append_from_document (--TR T1,T2)    0.4020 msec/pass
+
+  lxe: append_from_document (--TR T3,T4)    0.0229 msec/pass
+  cET: append_from_document (--TR T3,T4)    0.0088 msec/pass
+  ET : append_from_document (--TR T3,T4)    0.0291 msec/pass
 
 Although these are fairly small numbers compared to parsing, this easily
 shows the different performance classes for lxml and (c)ET. Where the latter do not
@@ -398,22 +437,22 @@
 This difference is not always as visible, but applies to most parts of
 the API, like inserting newly created elements::
 
-  lxe: insert_from_document (--TR T1,T2)    6.1119 msec/pass
-  cET: insert_from_document (--TR T1,T2)    0.4129 msec/pass
-  ET : insert_from_document (--TR T1,T2)    1.4160 msec/pass
+  lxe: insert_from_document (--TR T1,T2)    7.2598 msec/pass
+  cET: insert_from_document (--TR T1,T2)    0.1578 msec/pass
+  ET : insert_from_document (--TR T1,T2)    0.5150 msec/pass
 
 or replacing the child slice by a newly created element::
 
-  lxe: replace_children_element (--TC T1)    0.1769 msec/pass
-  cET: replace_children_element (--TC T1)    0.0250 msec/pass
-  ET : replace_children_element (--TC T1)    0.1538 msec/pass
+  lxe: replace_children_element (--TC T1)    0.1149 msec/pass
+  cET: replace_children_element (--TC T1)    0.0110 msec/pass
+  ET : replace_children_element (--TC T1)    0.0558 msec/pass
 
 as opposed to replacing the slice with an existing element from the
 same document::
 
-  lxe: replace_children (--TC T1)    0.0169 msec/pass
-  cET: replace_children (--TC T1)    0.0119 msec/pass
-  ET : replace_children (--TC T1)    0.0758 msec/pass
+  lxe: replace_children (--TC T1)    0.0091 msec/pass
+  cET: replace_children (--TC T1)    0.0060 msec/pass
+  ET : replace_children (--TC T1)    0.0188 msec/pass
 
 While these numbers are too small to provide a major performance
 impact in practice, you should keep this difference in mind when you
@@ -425,17 +464,17 @@
 
 Deep copying a tree is fast in lxml::
 
-  lxe: deepcopy_all (--TR T1)   10.0670 msec/pass
-  cET: deepcopy_all (--TR T1)  115.8700 msec/pass
-  ET : deepcopy_all (--TR T1)  866.8201 msec/pass
-
-  lxe: deepcopy_all (-ATR T2)   12.4321 msec/pass
-  cET: deepcopy_all (-ATR T2)  130.1000 msec/pass
-  ET : deepcopy_all (-ATR T2)  901.1638 msec/pass
-
-  lxe: deepcopy_all (S-TR T3)    2.6951 msec/pass
-  cET: deepcopy_all (S-TR T3)   28.9950 msec/pass
-  ET : deepcopy_all (S-TR T3)  218.7109 msec/pass
+  lxe: deepcopy_all (--TR T1)    5.0900 msec/pass
+  cET: deepcopy_all (--TR T1)   57.9181 msec/pass
+  ET : deepcopy_all (--TR T1)  499.1000 msec/pass
+
+  lxe: deepcopy_all (-ATR T2)    6.3980 msec/pass
+  cET: deepcopy_all (-ATR T2)   65.6390 msec/pass
+  ET : deepcopy_all (-ATR T2)  526.5379 msec/pass
+
+  lxe: deepcopy_all (S-TR T3)    1.4491 msec/pass
+  cET: deepcopy_all (S-TR T3)   14.7018 msec/pass
+  ET : deepcopy_all (S-TR T3)  123.5120 msec/pass
 
 So, for example, if you have a database-like scenario where you parse in a
 large tree and then search and copy independent subtrees from it for further
@@ -450,39 +489,39 @@
 especially if few elements are of interest or the target element tag name
 is known, lxml is a good choice::
 
-  lxe: getiterator_all (--TR T1)    4.7209 msec/pass
-  cET: getiterator_all (--TR T1)   45.8400 msec/pass
-  ET : getiterator_all (--TR T1)   22.9480 msec/pass
-
-  lxe: getiterator_islice (--TR T2)    0.0398 msec/pass
-  cET: getiterator_islice (--TR T2)    0.3798 msec/pass
-  ET : getiterator_islice (--TR T2)    0.1900 msec/pass
-
-  lxe: getiterator_tag (--TR T2)    0.0160 msec/pass
-  cET: getiterator_tag (--TR T2)    0.8149 msec/pass
-  ET : getiterator_tag (--TR T2)    0.3560 msec/pass
-
-  lxe: getiterator_tag_all (--TR T2)    0.6580 msec/pass
-  cET: getiterator_tag_all (--TR T2)   46.3769 msec/pass
-  ET : getiterator_tag_all (--TR T2)   20.3989 msec/pass
+  lxe: getiterator_all (--TR T1)    1.6890 msec/pass
+  cET: getiterator_all (--TR T1)   23.8621 msec/pass
+  ET : getiterator_all (--TR T1)   11.1070 msec/pass
+
+  lxe: getiterator_islice (--TR T2)    0.0188 msec/pass
+  cET: getiterator_islice (--TR T2)    0.1841 msec/pass
+  ET : getiterator_islice (--TR T2)   11.7059 msec/pass
+
+  lxe: getiterator_tag (--TR T2)    0.0119 msec/pass
+  cET: getiterator_tag (--TR T2)    0.3560 msec/pass
+  ET : getiterator_tag (--TR T2)   10.6668 msec/pass
+
+  lxe: getiterator_tag_all (--TR T2)    0.2429 msec/pass
+  cET: getiterator_tag_all (--TR T2)   20.3710 msec/pass
+  ET : getiterator_tag_all (--TR T2)   10.6280 msec/pass
 
 This translates directly into similar timings for ``Element.findall()``::
 
-  lxe: findall (--TR T2)    6.7198 msec/pass
-  cET: findall (--TR T2)   51.2750 msec/pass
-  ET : findall (--TR T2)   26.9110 msec/pass
-
-  lxe: findall (--TR T3)    1.4520 msec/pass
-  cET: findall (--TR T3)   14.2760 msec/pass
-  ET : findall (--TR T3)    8.4310 msec/pass
-
-  lxe: findall_tag (--TR T2)    0.7401 msec/pass
-  cET: findall_tag (--TR T2)   46.5961 msec/pass
-  ET : findall_tag (--TR T2)   20.3760 msec/pass
-
-  lxe: findall_tag (--TR T3)    0.3331 msec/pass
-  cET: findall_tag (--TR T3)   11.5960 msec/pass
-  ET : findall_tag (--TR T3)    5.4510 msec/pass
+  lxe: findall (--TR T2)    2.4588 msec/pass
+  cET: findall (--TR T2)   24.1358 msec/pass
+  ET : findall (--TR T2)   13.0949 msec/pass
+
+  lxe: findall (--TR T3)    0.5939 msec/pass
+  cET: findall (--TR T3)    6.9802 msec/pass
+  ET : findall (--TR T3)    3.8991 msec/pass
+
+  lxe: findall_tag (--TR T2)    0.2789 msec/pass
+  cET: findall_tag (--TR T2)   20.5719 msec/pass
+  ET : findall_tag (--TR T2)   10.8678 msec/pass
+
+  lxe: findall_tag (--TR T3)    0.1638 msec/pass
+  cET: findall_tag (--TR T3)    5.0790 msec/pass
+  ET : findall_tag (--TR T3)    2.5120 msec/pass
 
 Note that all three libraries currently use the same Python
 implementation for ``.findall()``, except for their native tree
@@ -499,38 +538,38 @@
 of the lxml API you use. The most straight forward way is to call the
 ``xpath()`` method on an Element or ElementTree::
 
-  lxe: xpath_method (--TC T1)    1.5750 msec/pass
-  lxe: xpath_method (--TC T2)   20.9570 msec/pass
-  lxe: xpath_method (--TC T3)    0.1199 msec/pass
-  lxe: xpath_method (--TC T4)    1.0121 msec/pass
+  lxe: xpath_method (--TC T1)    0.7598 msec/pass
+  lxe: xpath_method (--TC T2)   12.6798 msec/pass
+  lxe: xpath_method (--TC T3)    0.0758 msec/pass
+  lxe: xpath_method (--TC T4)    0.6182 msec/pass
 
 This is well suited for testing and when the XPath expressions are
 as diverse as the trees they are called on. However, if you have a single XPath
 expression that you want to apply to a larger number of different
 elements, the ``XPath`` class is the most efficient way to do it::
 
-  lxe: xpath_class (--TC T1)    0.6301 msec/pass
-  lxe: xpath_class (--TC T2)    2.6128 msec/pass
-  lxe: xpath_class (--TC T3)    0.0498 msec/pass
-  lxe: xpath_class (--TC T4)    0.1400 msec/pass
+  lxe: xpath_class (--TC T1)    0.2189 msec/pass
+  lxe: xpath_class (--TC T2)    1.4110 msec/pass
+  lxe: xpath_class (--TC T3)    0.0319 msec/pass
+  lxe: xpath_class (--TC T4)    0.0880 msec/pass
 
 Note that this still allows you to use variables in the expression, so you
 can parse it once and then adapt it through variables at call time. In
 other cases, where you have a fixed Element or ElementTree and want to run
 different expressions on it, you should consider the ``XPathEvaluator``::
 
-  lxe: xpath_element (--TR T1)    0.2739 msec/pass
-  lxe: xpath_element (--TR T2)   10.8800 msec/pass
-  lxe: xpath_element (--TR T3)    0.0660 msec/pass
-  lxe: xpath_element (--TR T4)    0.2739 msec/pass
+  lxe: xpath_element (--TR T1)    0.1669 msec/pass
+  lxe: xpath_element (--TR T2)    6.9060 msec/pass
+  lxe: xpath_element (--TR T3)    0.0451 msec/pass
+  lxe: xpath_element (--TR T4)    0.1681 msec/pass
 
 While it looks slightly slower, creating an XPath object for each of the
 expressions generates a much higher overhead here::
 
-  lxe: xpath_class_repeat (--TC T1)    1.5399 msec/pass
-  lxe: xpath_class_repeat (--TC T2)   20.5159 msec/pass
-  lxe: xpath_class_repeat (--TC T3)    0.1178 msec/pass
-  lxe: xpath_class_repeat (--TC T4)    0.9880 msec/pass
+  lxe: xpath_class_repeat (--TC T1)    0.7451 msec/pass
+  lxe: xpath_class_repeat (--TC T2)   12.2290 msec/pass
+  lxe: xpath_class_repeat (--TC T3)    0.0730 msec/pass
+  lxe: xpath_class_repeat (--TC T4)    0.5970 msec/pass
 
 
 A longer example
@@ -697,21 +736,21 @@
 tree. It avoids step-by-step Python element instantiations along the
 path, which can substantially improve the access time::
 
-  lxe: attribute (--TR T1)    6.9990 msec/pass
-  lxe: attribute (--TR T2)   29.2060 msec/pass
-  lxe: attribute (--TR T4)    6.9048 msec/pass
-
-  lxe: objectpath (--TR T1)    3.5410 msec/pass
-  lxe: objectpath (--TR T2)   24.9801 msec/pass
-  lxe: objectpath (--TR T4)    3.5069 msec/pass
-
-  lxe: attributes_deep (--TR T1)   16.9580 msec/pass
-  lxe: attributes_deep (--TR T2)   39.8140 msec/pass
-  lxe: attributes_deep (--TR T4)   16.9699 msec/pass
-
-  lxe: objectpath_deep (--TR T1)    9.4180 msec/pass
-  lxe: objectpath_deep (--TR T2)   31.7512 msec/pass
-  lxe: objectpath_deep (--TR T4)    9.4421 msec/pass
+  lxe: attribute (--TR T1)    4.8928 msec/pass
+  lxe: attribute (--TR T2)   25.5480 msec/pass
+  lxe: attribute (--TR T4)    4.6349 msec/pass
+
+  lxe: objectpath (--TR T1)    1.4842 msec/pass
+  lxe: objectpath (--TR T2)   21.1990 msec/pass
+  lxe: objectpath (--TR T4)    1.4892 msec/pass
+
+  lxe: attributes_deep (--TR T1)   11.9710 msec/pass
+  lxe: attributes_deep (--TR T2)   32.4290 msec/pass
+  lxe: attributes_deep (--TR T4)   11.4839 msec/pass
+
+  lxe: objectpath_deep (--TR T1)    4.8139 msec/pass
+  lxe: objectpath_deep (--TR T2)   24.6511 msec/pass
+  lxe: objectpath_deep (--TR T4)    4.7588 msec/pass
 
 Note, however, that parsing ObjectPath expressions is not for free either,
 so this is most effective for frequently accessing the same element.
@@ -741,17 +780,17 @@
 subtrees and elements) to cache, you can trade memory usage against access
 speed::
 
-  lxe: attribute_cached (--TR T1)    5.1420 msec/pass
-  lxe: attribute_cached (--TR T2)   27.0739 msec/pass
-  lxe: attribute_cached (--TR T4)    5.1429 msec/pass
-
-  lxe: attributes_deep_cached (--TR T1)    7.0908 msec/pass
-  lxe: attributes_deep_cached (--TR T2)   29.5591 msec/pass
-  lxe: attributes_deep_cached (--TR T4)    7.1721 msec/pass
-
-  lxe: objectpath_deep_cached (--TR T1)    2.2731 msec/pass
-  lxe: objectpath_deep_cached (--TR T2)   23.1631 msec/pass
-  lxe: objectpath_deep_cached (--TR T4)    2.3179 msec/pass
+  lxe: attribute_cached (--TR T1)    3.8228 msec/pass
+  lxe: attribute_cached (--TR T2)   23.7138 msec/pass
+  lxe: attribute_cached (--TR T4)    3.5269 msec/pass
+
+  lxe: attributes_deep_cached (--TR T1)    4.6771 msec/pass
+  lxe: attributes_deep_cached (--TR T2)   24.8699 msec/pass
+  lxe: attributes_deep_cached (--TR T4)    4.3321 msec/pass
+
+  lxe: objectpath_deep_cached (--TR T1)    1.1430 msec/pass
+  lxe: objectpath_deep_cached (--TR T2)   19.7470 msec/pass
+  lxe: objectpath_deep_cached (--TR T4)    1.1740 msec/pass
 
 Things to note: you cannot currently use ``weakref.WeakKeyDictionary`` objects
 for this as lxml's element objects do not support weak references (which are

From scoder at codespeak.net Tue Mar 2 10:30:40 2010
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 2 Mar 2010 10:30:40 +0100 (CET)
Subject: [Lxml-checkins] r71613 - in lxml/trunk: . doc
Message-ID: <20100302093040.64210282BD4@codespeak.net>

Author: scoder
Date: Tue Mar 2 10:30:36 2010
New Revision: 71613

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/doc/FAQ.txt
Log:
 r5490 at lenny: sbehnel | 2010-03-02 08:50:35 +0100
 FAQ fixes


Modified: lxml/trunk/doc/FAQ.txt
==============================================================================
--- lxml/trunk/doc/FAQ.txt (original)
+++ lxml/trunk/doc/FAQ.txt Tue Mar 2 10:30:36 2010
@@ -161,9 +161,10 @@
 * xml:base
 
 Support for XML Schema is currently not 100% complete in libxml2, but
-is definitely very close to compliance. Schematron is supported,
-although not necessarily complete. libxml2 also supports loading
-documents through HTTP and FTP.
+is definitely very close to compliance. Schematron is supported in
+two ways, the best being the original ISO Schematron reference
+implementation via XSLT. libxml2 also supports loading documents
+through HTTP and FTP.
 
 
 Who uses lxml?
@@ -174,8 +175,7 @@
 facilitate some kind of document management. Many people who deploy
 Zope_ or Plone_ use it together with lxml. Therefore, it is hard to get
 an idea of who uses it, and the following list of 'users and
-projects we know of' is definitely not a complete list of lxml's
-users.
+projects we know of' is very far from a complete list of lxml's users.
 
 Also note that the compatibility to the ElementTree library does not
 require projects to set a hard dependency on lxml - as long as they do

From scoder at codespeak.net Tue Mar 2 10:30:44 2010
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 2 Mar 2010 10:30:44 +0100 (CET)
Subject: [Lxml-checkins] r71614 - lxml/trunk
Message-ID: <20100302093044.6C83F282BD8@codespeak.net>

Author: scoder
Date: Tue Mar 2 10:30:42 2010
New Revision: 71614

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/setup.py
Log:
 r5491 at lenny: sbehnel | 2010-03-02 10:30:15 +0100
 lxml 2.3 will have official support for Py3.1.2 and later


Modified: lxml/trunk/setup.py
==============================================================================
--- lxml/trunk/setup.py (original)
+++ lxml/trunk/setup.py Tue Mar 2 10:30:42 2010
@@ -105,6 +105,8 @@
     'Programming Language :: Python :: 2.6',
     'Programming Language :: Python :: 3',
     'Programming Language :: Python :: 3.0',
+    'Programming Language :: Python :: 3.1',
+    'Programming Language :: Python :: 3.2',
     'Programming Language :: C',
     'Operating System :: OS Independent',
     'Topic :: Text Processing :: Markup :: HTML',

From scoder at codespeak.net Tue Mar 2 13:53:37 2010
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 2 Mar 2010 13:53:37 +0100 (CET)
Subject: [Lxml-checkins] r71621 - in lxml/trunk: . src/lxml
Message-ID: <20100302125337.109A451057@codespeak.net>

Author: scoder
Date: Tue Mar 2 13:53:36 2010
New Revision: 71621

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/cleanup.pxi
Log:
 r5495 at lenny: sbehnel | 2010-03-02 11:12:46 +0100
 code cleanup


Modified: lxml/trunk/src/lxml/cleanup.pxi
==============================================================================
--- lxml/trunk/src/lxml/cleanup.pxi (original)
+++ lxml/trunk/src/lxml/cleanup.pxi Tue Mar 2 13:53:36 2010
@@ -121,7 +121,7 @@
     if strip_pis:
         _removeSiblings(element._c_node, tree.XML_PI_NODE, with_tail)
 
-    # tag names are passes as C pointers as this allows us to take
+    # tag names are passed as C pointers as this allows us to take
     # them from the doc dict and do pointer comparisons
     c_ns_tags = cstd.malloc(sizeof(char*) * len(ns_tags) * 2 + 2)
     if c_ns_tags is NULL:
@@ -152,8 +152,8 @@
     while c_child is not NULL:
         c_next = _nextElement(c_child)
         if c_child.type == tree.XML_ELEMENT_NODE:
-            for i in range(c_tag_count):
-                if _tagMatchesExactly(c_child, c_ns_tags[2*i], c_ns_tags[2*i+1]):
+            for i in range(0, c_tag_count*2, 2):
+                if _tagMatchesExactly(c_child, c_ns_tags[i], c_ns_tags[i+1]):
                     if not with_tail:
                         tree.xmlUnlinkNode(c_child)
                     _removeNode(doc, c_child)

From scoder at codespeak.net Tue Mar 2 13:53:46 2010
From: scoder at codespeak.net (scoder at codespeak.net)
Date: Tue, 2 Mar 2010 13:53:46 +0100 (CET)
Subject: [Lxml-checkins] r71622 - in lxml/trunk: . src/lxml src/lxml/html src/lxml/tests
Message-ID: <20100302125346.3C57551057@codespeak.net>

Author: scoder
Date: Tue Mar 2 13:53:43 2010
New Revision: 71622

Modified:
   lxml/trunk/ (props changed)
   lxml/trunk/src/lxml/apihelpers.pxi
   lxml/trunk/src/lxml/html/__init__.py
   lxml/trunk/src/lxml/lxml.objectify.pyx
   lxml/trunk/src/lxml/parser.pxi
   lxml/trunk/src/lxml/tests/test_elementtree.py
   lxml/trunk/src/lxml/tests/test_etree.py
   lxml/trunk/src/lxml/xpath.pxi
Log:
 r5496 at lenny: sbehnel | 2010-03-02 11:26:46 +0100
 Py3 fixes


Modified: lxml/trunk/src/lxml/apihelpers.pxi
==============================================================================
--- lxml/trunk/src/lxml/apihelpers.pxi (original)
+++ lxml/trunk/src/lxml/apihelpers.pxi Tue Mar 2 13:53:43 2010
@@ -672,14 +672,14 @@
             if python.IS_PYTHON3:
                 return u''
             else:
-                return ''
+                return b''
         else:
             return None
     if scount == 1:
         return funicode(c_text)
 
     # the rest is not performance critical anymore
-    result = ''
+    result = b''
     while c_node is not NULL:
         result = result + c_node.content
         c_node = _textNodeOrSkip(c_node.next)

Modified: lxml/trunk/src/lxml/html/__init__.py
==============================================================================
--- lxml/trunk/src/lxml/html/__init__.py (original)
+++ lxml/trunk/src/lxml/html/__init__.py Tue Mar 2 13:53:43 2010
@@ -1007,7 +1007,7 @@
             serialisation_method = 'html'
         for el in self:
             # it's rare that we actually get here, so let's not use ''.join()
-            content += etree.tostring(el, method=serialisation_method)
+            content += etree.tostring(el, method=serialisation_method, encoding=unicode)
         return content
     def _value__set(self, value):
         del self[:]

Modified: lxml/trunk/src/lxml/lxml.objectify.pyx
==============================================================================
--- lxml/trunk/src/lxml/lxml.objectify.pyx (original)
+++ lxml/trunk/src/lxml/lxml.objectify.pyx Tue Mar 2 13:53:43 2010
@@ -1688,16 +1688,16 @@
         c_ns = cetree.findOrBuildNodeNsPrefix(
             doc, c_node, _XML_SCHEMA_NS, 'xsd')
         if
c_ns is not NULL: - if ':' in typename_utf8: - prefix, name = typename_utf8.split(':', 1) + if b':' in typename_utf8: + prefix, name = typename_utf8.split(b':', 1) if c_ns.prefix is NULL or c_ns.prefix[0] == c'\0': typename_utf8 = name elif cstd.strcmp(_cstr(prefix), c_ns.prefix) != 0: prefix = c_ns.prefix - typename_utf8 = prefix + ':' + name + typename_utf8 = prefix + b':' + name elif c_ns.prefix is not NULL or c_ns.prefix[0] != c'\0': prefix = c_ns.prefix - typename_utf8 = prefix + ':' + typename_utf8 + typename_utf8 = prefix + b':' + typename_utf8 c_ns = cetree.findOrBuildNodeNsPrefix( doc, c_node, _XML_SCHEMA_INSTANCE_NS, 'xsi') tree.xmlSetNsProp(c_node, c_ns, "type", _cstr(typename_utf8)) Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Tue Mar 2 13:53:43 2010 @@ -274,7 +274,7 @@ url = _encodeFilename(url) self._c_url = _cstr(url) self._url = url - self._bytes = '' + self._bytes = b'' self._bytes_read = 0 cdef xmlparser.xmlParserInputBuffer* _createParserInputBuffer(self): Modified: lxml/trunk/src/lxml/tests/test_elementtree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_elementtree.py (original) +++ lxml/trunk/src/lxml/tests/test_elementtree.py Tue Mar 2 13:53:43 2010 @@ -443,7 +443,7 @@ XML = self.etree.XML root = XML(_bytes('')) - keys = root.attrib.keys() + keys = list(root.attrib.keys()) keys.sort() self.assertEquals(['alpha', 'beta', 'gamma'], keys) @@ -451,7 +451,7 @@ XML = self.etree.XML root = XML(_bytes('')) - keys = root.keys() + keys = list(root.keys()) keys.sort() self.assertEquals(['alpha', 'beta', 'gamma'], keys) @@ -469,7 +469,7 @@ XML = self.etree.XML root = XML(_bytes('')) - keys = root.keys() + keys = list(root.keys()) keys.sort() self.assertEquals(['bar', '{http://ns.codespeak.net/test}baz'], keys) @@ -478,7 +478,7 @@ XML 
= self.etree.XML root = XML(_bytes('')) - values = root.attrib.values() + values = list(root.attrib.values()) values.sort() self.assertEquals(['Alpha', 'Beta', 'Gamma'], values) @@ -486,7 +486,7 @@ XML = self.etree.XML root = XML(_bytes('')) - values = root.attrib.values() + values = list(root.attrib.values()) values.sort() self.assertEquals( ['Bar', 'Baz'], values) @@ -2596,7 +2596,7 @@ self.assertEquals('test', root[0].get('{%s}a' % ns_href)) xml2 = tostring(root) - self.assertTrue(':a=' in xml2, xml2) + self.assertTrue(_bytes(':a=') in xml2, xml2) root2 = fromstring(xml2) self.assertEquals('test', root[0].get('{%s}a' % ns_href)) @@ -2614,7 +2614,7 @@ root[0].set('{%s}a' % ns_href, 'TEST') xml2 = tostring(root) - self.assertTrue(':a=' in xml2, xml2) + self.assertTrue(_bytes(':a=') in xml2, xml2) root2 = fromstring(xml2) self.assertEquals('TEST', root[0].get('{%s}a' % ns_href)) Modified: lxml/trunk/src/lxml/tests/test_etree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_etree.py (original) +++ lxml/trunk/src/lxml/tests/test_etree.py Tue Mar 2 13:53:43 2010 @@ -389,13 +389,13 @@ root = XML(xml) self.etree.strip_tags(root, 'a') - self.assertEquals(re.sub(_bytes(']*>'), '', xml).replace('
', '

'), + self.assertEquals(re.sub(_bytes(']*>'), _bytes(''), xml).replace(_bytes('
'), _bytes('

')), self._writeElement(root)) root = XML(xml) self.etree.strip_tags(root, 'a', 'br') - self.assertEquals(re.sub(_bytes(']*>'), '', - re.sub(_bytes(']*>'), '', xml)), + self.assertEquals(re.sub(_bytes(']*>'), _bytes(''), + re.sub(_bytes(']*>'), _bytes(''), xml)), self._writeElement(root)) def test_strip_tags_ns(self): @@ -2447,7 +2447,7 @@ xml = _bytes('\n') tree = etree.parse(BytesIO(xml)) - self.assertEquals(xml.replace('', doctype_string), + self.assertEquals(xml.replace(_bytes(''), doctype_string), etree.tostring(tree, doctype=doctype_string)) def test_xml_base(self): Modified: lxml/trunk/src/lxml/xpath.pxi ============================================================================== --- lxml/trunk/src/lxml/xpath.pxi (original) +++ lxml/trunk/src/lxml/xpath.pxi Tue Mar 2 13:53:43 2010 @@ -477,8 +477,8 @@ cdef object _replace_strings cdef object _find_namespaces -_replace_strings = re.compile('("[^"]*")|(\'[^\']*\')').sub -_find_namespaces = re.compile('({[^}]+})').findall +_replace_strings = re.compile(b'("[^"]*")|(\'[^\']*\')').sub +_find_namespaces = re.compile(b'({[^}]+})').findall cdef class ETXPath(XPath): u"""ETXPath(self, path, extensions=None, regexp=True, smart_strings=True) @@ -502,7 +502,7 @@ cdef list namespace_defs = [] cdef int i path_utf = _utf8(path) - stripped_path = _replace_strings('', path_utf) # remove string literals + stripped_path = _replace_strings(b'', path_utf) # remove string literals i = 1 for namespace_def in _find_namespaces(stripped_path): if namespace_def not in namespace_defs: @@ -515,7 +515,7 @@ namespaces[ python.PyUnicode_FromEncodedObject(prefix, 'UTF-8', 'strict') ] = namespace - prefix_str = prefix + ':' + prefix_str = prefix + b':' # FIXME: this also replaces {namespaces} within strings! 
path_utf = path_utf.replace(namespace_def, prefix_str) path = python.PyUnicode_FromEncodedObject(path_utf, 'UTF-8', 'strict') From scoder at codespeak.net Tue Mar 2 13:53:48 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 2 Mar 2010 13:53:48 +0100 (CET) Subject: [Lxml-checkins] r71623 - in lxml/trunk: . src/lxml Message-ID: <20100302125348.E88605105A@codespeak.net> Author: scoder Date: Tue Mar 2 13:53:47 2010 New Revision: 71623 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/xslt.pxi Log: r5497 at lenny: sbehnel | 2010-03-02 13:32:37 +0100 support UTF-8 encoded output in 'unicode(xslt_result)' when no encoding was specified Modified: lxml/trunk/src/lxml/xslt.pxi ============================================================================== --- lxml/trunk/src/lxml/xslt.pxi (original) +++ lxml/trunk/src/lxml/xslt.pxi Tue Mar 2 13:53:47 2010 @@ -707,10 +707,11 @@ if s is NULL: return u'' encoding = self._xslt._c_style.encoding - if encoding is NULL: - encoding = 'ascii' try: - result = python.PyUnicode_Decode(s, l, encoding, 'strict') + if encoding is NULL: + result = s[:l].decode('UTF-8') + else: + result = python.PyUnicode_Decode(s, l, encoding, 'strict') finally: tree.xmlFree(s) return _stripEncodingDeclaration(result) From scoder at codespeak.net Tue Mar 2 13:53:52 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 2 Mar 2010 13:53:52 +0100 (CET) Subject: [Lxml-checkins] r71624 - lxml/trunk Message-ID: <20100302125352.EB89251057@codespeak.net> Author: scoder Date: Tue Mar 2 13:53:50 2010 New Revision: 71624 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt Log: r5498 at lenny: sbehnel | 2010-03-02 13:32:51 +0100 changelog Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Tue Mar 2 13:53:50 2010 @@ -100,6 +100,8 @@ Other changes ------------- +* Official support for Python 
3.1.2 and later. + * Static MS Windows builds can now download their dependencies themselves. From scoder at codespeak.net Tue Mar 2 13:53:56 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 2 Mar 2010 13:53:56 +0100 (CET) Subject: [Lxml-checkins] r71625 - in lxml/trunk: . doc Message-ID: <20100302125356.08F0851057@codespeak.net> Author: scoder Date: Tue Mar 2 13:53:54 2010 New Revision: 71625 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/xpathxslt.txt Log: r5499 at lenny: sbehnel | 2010-03-02 13:53:29 +0100 clarified section on XPath namespaces and prefixes Modified: lxml/trunk/doc/xpathxslt.txt ============================================================================== --- lxml/trunk/doc/xpathxslt.txt (original) +++ lxml/trunk/doc/xpathxslt.txt Tue Mar 2 13:53:54 2010 @@ -9,12 +9,13 @@ .. 1 XPath 1.1 The ``xpath()`` method - 1.2 XPath return values - 1.3 Generating XPath expressions - 1.4 The ``XPath`` class - 1.5 The ``XPathEvaluator`` classes - 1.6 ``ETXPath`` - 1.7 Error handling + 1.2 Namespaces and prefixes + 1.3 XPath return values + 1.4 Generating XPath expressions + 1.5 The ``XPath`` class + 1.6 The ``XPathEvaluator`` classes + 1.7 ``ETXPath`` + 1.8 Error handling 2 XSLT 2.1 XSLT result objects 2.2 Stylesheet parameters @@ -22,7 +23,6 @@ 2.4 Dealing with stylesheet complexity 2.5 Profiling - The usual setup procedure: .. sourcecode:: pycon @@ -120,9 +120,14 @@ >>> print(root.xpath("$text", text = "Hello World!")) Hello World! -Optionally, you can provide a ``namespaces`` keyword argument, which should be -a dictionary mapping the namespace prefixes used in the XPath expression to -namespace URIs: + +Namespaces and prefixes +----------------------- + +If your XPath expression uses namespace prefixes, you must define them +in a prefix mapping. To this end, pass a dictionary to the +``namespaces`` keyword argument that maps the namespace prefixes used +in the XPath expression to namespace URIs: .. 
sourcecode:: pycon @@ -144,8 +149,18 @@ >>> r[0].text 'Text' -There is also an optional ``extensions`` argument which is used to define -`custom extension functions`_ in Python that are local to this evaluation. +The prefixes you choose here are not linked to the prefixes used +inside the XML document. The document may use whatever prefixes it +likes, including the empty prefix, without breaking the above code. + +Note that XPath does not have a notion of a default namespace. The +empty prefix is therefore undefined for XPath and cannot be used in +namespace prefix mappings. + +There is also an optional ``extensions`` argument which is used to +define `custom extension functions`_ in Python that are local to this +evaluation. The namespace prefixes that they use in the XPath +expression must also be defined in the namespace prefix mapping. XPath return values From scoder at codespeak.net Tue Mar 2 13:57:20 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 2 Mar 2010 13:57:20 +0100 (CET) Subject: [Lxml-checkins] r71626 - in lxml/trunk: . doc Message-ID: <20100302125720.EE86151057@codespeak.net> Author: scoder Date: Tue Mar 2 13:57:10 2010 New Revision: 71626 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/xpathxslt.txt Log: r5505 at lenny: sbehnel | 2010-03-02 13:57:07 +0100 doc fix Modified: lxml/trunk/doc/xpathxslt.txt ============================================================================== --- lxml/trunk/doc/xpathxslt.txt (original) +++ lxml/trunk/doc/xpathxslt.txt Tue Mar 2 13:57:10 2010 @@ -150,7 +150,7 @@ 'Text' The prefixes you choose here are not linked to the prefixes used -inside the XML document. The document may use whatever prefixes it +inside the XML document. The document may define whatever prefixes it likes, including the empty prefix, without breaking the above code. Note that XPath does not have a notion of a default namespace. 
The From scoder at codespeak.net Tue Mar 2 13:59:21 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 2 Mar 2010 13:59:21 +0100 (CET) Subject: [Lxml-checkins] r71628 - in lxml/branch/lxml-2.2: . doc Message-ID: <20100302125921.31DEE51057@codespeak.net> Author: scoder Date: Tue Mar 2 13:59:19 2010 New Revision: 71628 Modified: lxml/branch/lxml-2.2/ (props changed) lxml/branch/lxml-2.2/INSTALL.txt (props changed) lxml/branch/lxml-2.2/doc/xpathxslt.txt Log: trunk merge: XPath doc update Modified: lxml/branch/lxml-2.2/doc/xpathxslt.txt ============================================================================== --- lxml/branch/lxml-2.2/doc/xpathxslt.txt (original) +++ lxml/branch/lxml-2.2/doc/xpathxslt.txt Tue Mar 2 13:59:19 2010 @@ -9,12 +9,13 @@ .. 1 XPath 1.1 The ``xpath()`` method - 1.2 XPath return values - 1.3 Generating XPath expressions - 1.4 The ``XPath`` class - 1.5 The ``XPathEvaluator`` classes - 1.6 ``ETXPath`` - 1.7 Error handling + 1.2 Namespaces and prefixes + 1.3 XPath return values + 1.4 Generating XPath expressions + 1.5 The ``XPath`` class + 1.6 The ``XPathEvaluator`` classes + 1.7 ``ETXPath`` + 1.8 Error handling 2 XSLT 2.1 XSLT result objects 2.2 Stylesheet parameters @@ -22,7 +23,6 @@ 2.4 Dealing with stylesheet complexity 2.5 Profiling - The usual setup procedure: .. sourcecode:: pycon @@ -120,9 +120,14 @@ >>> print(root.xpath("$text", text = "Hello World!")) Hello World! -Optionally, you can provide a ``namespaces`` keyword argument, which should be -a dictionary mapping the namespace prefixes used in the XPath expression to -namespace URIs: + +Namespaces and prefixes +----------------------- + +If your XPath expression uses namespace prefixes, you must define them +in a prefix mapping. To this end, pass a dictionary to the +``namespaces`` keyword argument that maps the namespace prefixes used +in the XPath expression to namespace URIs: .. 
sourcecode:: pycon @@ -144,8 +149,18 @@ >>> r[0].text 'Text' -There is also an optional ``extensions`` argument which is used to define -`custom extension functions`_ in Python that are local to this evaluation. +The prefixes you choose here are not linked to the prefixes used +inside the XML document. The document may define whatever prefixes it +likes, including the empty prefix, without breaking the above code. + +Note that XPath does not have a notion of a default namespace. The +empty prefix is therefore undefined for XPath and cannot be used in +namespace prefix mappings. + +There is also an optional ``extensions`` argument which is used to +define `custom extension functions`_ in Python that are local to this +evaluation. The namespace prefixes that they use in the XPath +expression must also be defined in the namespace prefix mapping. XPath return values From scoder at codespeak.net Tue Mar 2 17:15:11 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 2 Mar 2010 17:15:11 +0100 (CET) Subject: [Lxml-checkins] r71647 - in lxml/branch/lxml-2.2: . doc Message-ID: <20100302161511.CF1AE282BD4@codespeak.net> Author: scoder Date: Tue Mar 2 17:15:10 2010 New Revision: 71647 Modified: lxml/branch/lxml-2.2/CHANGES.txt lxml/branch/lxml-2.2/doc/main.txt lxml/branch/lxml-2.2/version.txt Log: prepare release of lxml 2.2.6 Modified: lxml/branch/lxml-2.2/CHANGES.txt ============================================================================== --- lxml/branch/lxml-2.2/CHANGES.txt (original) +++ lxml/branch/lxml-2.2/CHANGES.txt Tue Mar 2 17:15:10 2010 @@ -2,6 +2,15 @@ lxml changelog ============== +2.2.6 (2010-03-02) +================== + +Bugs fixed +---------- + +* Fixed several Python 3 regressions by building with Cython 0.11.3. 
+ + 2.2.5 (2010-02-28) ================== Modified: lxml/branch/lxml-2.2/doc/main.txt ============================================================================== --- lxml/branch/lxml-2.2/doc/main.txt (original) +++ lxml/branch/lxml-2.2/doc/main.txt Tue Mar 2 17:15:10 2010 @@ -147,8 +147,8 @@ source release. If you can't wait, consider trying a less recent release version first. -The latest version is `lxml 2.2.5`_, released 2010-02-28 -(`changes for 2.2.5`_). `Older versions`_ are listed below. +The latest version is `lxml 2.2.6`_, released 2010-03-02 +(`changes for 2.2.6`_). `Older versions`_ are listed below. Please take a look at the `installation instructions`_! @@ -221,7 +221,9 @@ `_ and the `current in-development version `_. -.. _`PDF documentation`: lxmldoc-2.2.5.pdf +.. _`PDF documentation`: lxmldoc-2.2.6.pdf + +* `lxml 2.2.5`_, released 2010-02-28 (`changes for 2.2.5`_) * `lxml 2.2.4`_, released 2009-11-11 (`changes for 2.2.4`_) @@ -329,6 +331,7 @@ * `lxml 0.5`_, released 2005-04-08 +.. _`lxml 2.2.6`: lxml-2.2.6.tgz .. _`lxml 2.2.5`: lxml-2.2.5.tgz .. _`lxml 2.2.4`: lxml-2.2.4.tgz .. _`lxml 2.2.3`: lxml-2.2.3.tgz @@ -383,8 +386,9 @@ .. _`lxml 0.5.1`: lxml-0.5.1.tgz .. _`lxml 0.5`: lxml-0.5.tgz -.. _`changes for 2.2.4`: changes-2.2.4.html +.. _`changes for 2.2.6`: changes-2.2.6.html .. _`changes for 2.2.5`: changes-2.2.5.html +.. _`changes for 2.2.4`: changes-2.2.4.html .. _`changes for 2.2.3`: changes-2.2.3.html .. _`changes for 2.2.2`: changes-2.2.2.html .. _`changes for 2.2.1`: changes-2.2.1.html Modified: lxml/branch/lxml-2.2/version.txt ============================================================================== --- lxml/branch/lxml-2.2/version.txt (original) +++ lxml/branch/lxml-2.2/version.txt Tue Mar 2 17:15:10 2010 @@ -1 +1 @@ -2.2.5 +2.2.6 From scoder at codespeak.net Tue Mar 2 22:34:01 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 2 Mar 2010 22:34:01 +0100 (CET) Subject: [Lxml-checkins] r71670 - in lxml/trunk: . 
src/lxml/html Message-ID: <20100302213401.F3059282BD4@codespeak.net> Author: scoder Date: Tue Mar 2 22:34:00 2010 New Revision: 71670 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/html/_html5builder.py Log: r5507 at lenny: sbehnel | 2010-03-02 22:33:42 +0100 fix html5lib TreeBuilder subclass for newer library versions Modified: lxml/trunk/src/lxml/html/_html5builder.py ============================================================================== --- lxml/trunk/src/lxml/html/_html5builder.py (original) +++ lxml/trunk/src/lxml/html/_html5builder.py Tue Mar 2 22:34:00 2010 @@ -32,12 +32,12 @@ commentClass = None fragmentClass = Document - def __init__(self): + def __init__(self, *args, **kwargs): html_builder = etree_builders.getETreeModule(html, fullTree=False) etree_builder = etree_builders.getETreeModule(etree, fullTree=False) self.elementClass = html_builder.Element self.commentClass = etree_builder.Comment - _base.TreeBuilder.__init__(self) + _base.TreeBuilder.__init__(self, *args, **kwargs) def reset(self): _base.TreeBuilder.reset(self) From scoder at codespeak.net Tue Mar 2 22:34:07 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 2 Mar 2010 22:34:07 +0100 (CET) Subject: [Lxml-checkins] r71671 - in lxml/trunk: . 
src/lxml Message-ID: <20100302213407.53193282BD4@codespeak.net> Author: scoder Date: Tue Mar 2 22:34:06 2010 New Revision: 71671 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/cleanup.pxi Log: r5508 at lenny: sbehnel | 2010-03-02 22:33:53 +0100 Py3 fix Modified: lxml/trunk/src/lxml/cleanup.pxi ============================================================================== --- lxml/trunk/src/lxml/cleanup.pxi (original) +++ lxml/trunk/src/lxml/cleanup.pxi Tue Mar 2 22:34:06 2010 @@ -295,7 +295,7 @@ else: ns_tags.append(_getNsTag(tag)) - return [ (ns, tag if tag != '*' else None) + return [ (ns, tag if tag != b'*' else None) for ns, tag in _sortedTagList(ns_tags) ] cdef Py_ssize_t _mapTagsToCharArray(xmlDoc* c_doc, list ns_tags, From scoder at codespeak.net Tue Mar 2 22:36:45 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 2 Mar 2010 22:36:45 +0100 (CET) Subject: [Lxml-checkins] r71672 - in lxml/branch/lxml-2.2: . src/lxml/html Message-ID: <20100302213645.1811F282BD4@codespeak.net> Author: scoder Date: Tue Mar 2 22:36:43 2010 New Revision: 71672 Modified: lxml/branch/lxml-2.2/ (props changed) lxml/branch/lxml-2.2/INSTALL.txt (props changed) lxml/branch/lxml-2.2/src/lxml/html/_html5builder.py Log: trunk merge: fix html5lib parser integration Modified: lxml/branch/lxml-2.2/src/lxml/html/_html5builder.py ============================================================================== --- lxml/branch/lxml-2.2/src/lxml/html/_html5builder.py (original) +++ lxml/branch/lxml-2.2/src/lxml/html/_html5builder.py Tue Mar 2 22:36:43 2010 @@ -32,12 +32,12 @@ commentClass = None fragmentClass = Document - def __init__(self): + def __init__(self, *args, **kwargs): html_builder = etree_builders.getETreeModule(html, fullTree=False) etree_builder = etree_builders.getETreeModule(etree, fullTree=False) self.elementClass = html_builder.Element self.commentClass = etree_builder.Comment - _base.TreeBuilder.__init__(self) + _base.TreeBuilder.__init__(self, 
*args, **kwargs) def reset(self): _base.TreeBuilder.reset(self) From scoder at codespeak.net Tue Mar 2 22:37:25 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Tue, 2 Mar 2010 22:37:25 +0100 (CET) Subject: [Lxml-checkins] r71673 - lxml/tag/lxml-2.2.6 Message-ID: <20100302213725.D0F4D282BD4@codespeak.net> Author: scoder Date: Tue Mar 2 22:37:24 2010 New Revision: 71673 Added: lxml/tag/lxml-2.2.6/ - copied from r71647, lxml/branch/lxml-2.2/ Log: new tag for lxml 2.2.6 From scoder at codespeak.net Sat Mar 13 15:06:03 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 13 Mar 2010 15:06:03 +0100 (CET) Subject: [Lxml-checkins] r72199 - in lxml/trunk: . src/lxml Message-ID: <20100313140603.786DE282BF2@codespeak.net> Author: scoder Date: Sat Mar 13 15:06:01 2010 New Revision: 72199 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.objectify.pyx Log: r5513 at lenny: sbehnel | 2010-03-13 14:53:09 +0100 code cleanup Modified: lxml/trunk/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.objectify.pyx (original) +++ lxml/trunk/src/lxml/lxml.objectify.pyx Sat Mar 13 15:06:01 2010 @@ -60,7 +60,7 @@ cdef object TREE_PYTYPE_NAME TREE_PYTYPE_NAME = u"TREE" -cdef _unicodeAndUtf8(s): +cdef tuple _unicodeAndUtf8(s): return (s, python.PyUnicode_AsUTF8String(s)) def set_pytype_attribute_tag(attribute_tag=None): From scoder at codespeak.net Sat Mar 13 15:06:09 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 13 Mar 2010 15:06:09 +0100 (CET) Subject: [Lxml-checkins] r72200 - lxml/trunk Message-ID: <20100313140609.17633282BF2@codespeak.net> Author: scoder Date: Sat Mar 13 15:06:07 2010 New Revision: 72200 Modified: lxml/trunk/ (props changed) lxml/trunk/buildlibxml.py lxml/trunk/setupinfo.py Log: r5514 at lenny: sbehnel | 2010-03-13 15:05:54 +0100 build lib dependencies in parallel if supported Modified: lxml/trunk/buildlibxml.py 
============================================================================== --- lxml/trunk/buildlibxml.py (original) +++ lxml/trunk/buildlibxml.py Sat Mar 13 15:06:07 2010 @@ -7,8 +7,19 @@ except ImportError: from urllib.parse import urlsplit from urllib.request import urlretrieve - - + +multi_make_options = [] +try: + from multiprocessing import cpu_count +except ImportError: + pass +else: + cpus = cpu_count() + if cpus > 1: + if cpus > 5: + cpus = 5 + multi_make_options = ['-j%d' % (cpus+1)] + # use pre-built libraries on Windows @@ -229,18 +240,27 @@ if not os.path.exists(dir): os.makedirs(dir) -def cmmi(configure_cmd, build_dir, **call_setup): +def cmmi(configure_cmd, build_dir, multicore=None, **call_setup): print('Starting build in %s' % build_dir) call_subprocess(configure_cmd, cwd=build_dir, **call_setup) + if not multicore: + make_jobs = multi_make_options + elif int(multicore) > 1: + make_jobs = ['-j%s' % multicore] + else: + make_jobs = [] call_subprocess( - ['make'], cwd=build_dir, **call_setup) + ['make'] + make_jobs, + cwd=build_dir, **call_setup) call_subprocess( - ['make', 'install'], cwd=build_dir, **call_setup) + ['make'] + make_jobs + ['install'], + cwd=build_dir, **call_setup) def build_libxml2xslt(download_dir, build_dir, static_include_dirs, static_library_dirs, static_cflags, static_binaries, - libxml2_version=None, libxslt_version=None, libiconv_version=None): + libxml2_version=None, libxslt_version=None, libiconv_version=None, + multicore=None): safe_mkdir(download_dir) safe_mkdir(build_dir) libiconv_dir = unpack_tarball(download_libiconv(download_dir, libiconv_version), build_dir) @@ -278,13 +298,13 @@ ] # build libiconv - cmmi(configure_cmd, libiconv_dir, **call_setup) + cmmi(configure_cmd, libiconv_dir, multicore, **call_setup) # build libxml2 libxml2_configure_cmd = configure_cmd + [ '--without-python', '--with-iconv=%s' % prefix] - cmmi(libxml2_configure_cmd, libxml2_dir, **call_setup) + cmmi(libxml2_configure_cmd, libxml2_dir, 
multicore, **call_setup) # build libxslt libxslt_configure_cmd = configure_cmd + [ @@ -295,7 +315,7 @@ libxslt_configure_cmd += [ '--without-crypto', ] - cmmi(libxslt_configure_cmd, libxslt_dir, **call_setup) + cmmi(libxslt_configure_cmd, libxslt_dir, multicore, **call_setup) # collect build setup for lxml xslt_config = os.path.join(prefix, 'bin', 'xslt-config') Modified: lxml/trunk/setupinfo.py ============================================================================== --- lxml/trunk/setupinfo.py (original) +++ lxml/trunk/setupinfo.py Sat Mar 13 15:06:07 2010 @@ -52,7 +52,8 @@ static_cflags, static_binaries, libiconv_version=OPTION_LIBICONV_VERSION, libxml2_version=OPTION_LIBXML2_VERSION, - libxslt_version=OPTION_LIBXSLT_VERSION) + libxslt_version=OPTION_LIBXSLT_VERSION, + multicore=OPTION_MULTICORE) if CYTHON_INSTALLED: source_extension = ".pyx" @@ -345,3 +346,4 @@ OPTION_LIBXML2_VERSION = option_value('libxml2-version') OPTION_LIBXSLT_VERSION = option_value('libxslt-version') OPTION_LIBICONV_VERSION = option_value('libiconv-version') +OPTION_MULTICORE = option_value('multicore') From scoder at codespeak.net Mon Mar 15 13:12:13 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 15 Mar 2010 13:12:13 +0100 (CET) Subject: [Lxml-checkins] r72236 - lxml/trunk Message-ID: <20100315121213.976F1282B9C@codespeak.net> Author: scoder Date: Mon Mar 15 13:12:10 2010 New Revision: 72236 Modified: lxml/trunk/ (props changed) lxml/trunk/buildlibxml.py Log: r5519 at lenny: sbehnel | 2010-03-15 13:12:00 +0100 disable doc building during libxml2 install Modified: lxml/trunk/buildlibxml.py ============================================================================== --- lxml/trunk/buildlibxml.py (original) +++ lxml/trunk/buildlibxml.py Mon Mar 15 13:12:10 2010 @@ -304,6 +304,8 @@ libxml2_configure_cmd = configure_cmd + [ '--without-python', '--with-iconv=%s' % prefix] + if libxml2_version and tuple(map(tryint, libxml2_version.split('.'))) >= (2,7,3): + 
libxml2_configure_cmd.append('--enable-rebuild-docs=no') cmmi(libxml2_configure_cmd, libxml2_dir, multicore, **call_setup) # build libxslt From scoder at codespeak.net Mon Mar 15 13:50:17 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Mon, 15 Mar 2010 13:50:17 +0100 (CET) Subject: [Lxml-checkins] r72240 - lxml/trunk Message-ID: <20100315125017.BD9B5282B9C@codespeak.net> Author: scoder Date: Mon Mar 15 13:50:15 2010 New Revision: 72240 Modified: lxml/trunk/ (props changed) lxml/trunk/buildlibxml.py Log: r5521 at lenny: sbehnel | 2010-03-15 13:50:10 +0100 safety fix Modified: lxml/trunk/buildlibxml.py ============================================================================== --- lxml/trunk/buildlibxml.py (original) +++ lxml/trunk/buildlibxml.py Mon Mar 15 13:50:15 2010 @@ -304,8 +304,11 @@ libxml2_configure_cmd = configure_cmd + [ '--without-python', '--with-iconv=%s' % prefix] - if libxml2_version and tuple(map(tryint, libxml2_version.split('.'))) >= (2,7,3): - libxml2_configure_cmd.append('--enable-rebuild-docs=no') + try: + if libxml2_version and tuple(map(tryint, libxml2_version.split('.'))) >= (2,7,3): + libxml2_configure_cmd.append('--enable-rebuild-docs=no') + except Exception: + pass # this isn't required, so ignore any errors cmmi(libxml2_configure_cmd, libxml2_dir, multicore, **call_setup) # build libxslt From scoder at codespeak.net Thu Mar 18 08:45:54 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 18 Mar 2010 08:45:54 +0100 (CET) Subject: [Lxml-checkins] r72351 - lxml/trunk Message-ID: <20100318074554.1B2A8282BDE@codespeak.net> Author: scoder Date: Thu Mar 18 08:45:52 2010 New Revision: 72351 Modified: lxml/trunk/ (props changed) lxml/trunk/INSTALL.txt Log: r5523 at lenny: sbehnel | 2010-03-18 08:45:46 +0100 remove installation instructions for ActivePython's pypm Modified: lxml/trunk/INSTALL.txt ============================================================================== --- lxml/trunk/INSTALL.txt 
(original) +++ lxml/trunk/INSTALL.txt Thu Mar 18 08:45:52 2010 @@ -8,10 +8,9 @@ .. 1 Requirements 2 Installation - 3 Installation in ActivePython - 4 Building lxml from sources - 5 MS Windows - 6 MacOS-X + 3 Building lxml from sources + 4 MS Windows + 5 MacOS-X Requirements @@ -71,30 +70,6 @@ .. _PyPI: http://cheeseshop.python.org/pypi/lxml -Installation in ActivePython ----------------------------- - -ActiveState_ provides ready-made lxml builds for different platforms -in its `package repository`_ for the PyPM_ package manager. PyPM is -similar to apt-get in that there is a repository of automaticaly -pre-built packages for Windows, Mac and Linux. - -To install lxml in ActivePython, type the following on one of these -operating systems:: - - $ pypm install lxml - -To test the installation, try:: - - $ python -c "import lxml; print lxml.__file__" - -This should show you the directory where the package was installed. - -.. _ActiveState: http://www.activestate.com/ -.. _PyPM: http://docs.activestate.com/activepython/2.6/pypm.html -.. _`package repository`: http://pypm.activestate.com/ - - Building lxml from sources -------------------------- From scoder at codespeak.net Thu Mar 18 08:47:08 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 18 Mar 2010 08:47:08 +0100 (CET) Subject: [Lxml-checkins] r72352 - lxml/branch/lxml-2.2 Message-ID: <20100318074708.F056B282BDE@codespeak.net> Author: scoder Date: Thu Mar 18 08:47:07 2010 New Revision: 72352 Modified: lxml/branch/lxml-2.2/ (props changed) lxml/branch/lxml-2.2/INSTALL.txt (contents, props changed) Log: trunk merge: docs Modified: lxml/branch/lxml-2.2/INSTALL.txt ============================================================================== --- lxml/branch/lxml-2.2/INSTALL.txt (original) +++ lxml/branch/lxml-2.2/INSTALL.txt Thu Mar 18 08:47:07 2010 @@ -8,10 +8,9 @@ .. 
1 Requirements 2 Installation - 3 Installation in ActivePython - 4 Building lxml from sources - 5 MS Windows - 6 MacOS-X + 3 Building lxml from sources + 4 MS Windows + 5 MacOS-X Requirements @@ -71,30 +70,6 @@ .. _PyPI: http://cheeseshop.python.org/pypi/lxml -Installation in ActivePython ----------------------------- - -ActiveState_ provides ready-made lxml builds for different platforms -in its `package repository`_ for the PyPM_ package manager. PyPM is -similar to apt-get in that there is a repository of automaticaly -pre-built packages for Windows, Mac and Linux. - -To install lxml in ActivePython, type the following on one of these -operating systems:: - - $ pypm install lxml - -To test the installation, try:: - - $ python -c "import lxml; print lxml.__file__" - -This should show you the directory where the package was installed. - -.. _ActiveState: http://www.activestate.com/ -.. _PyPM: http://docs.activestate.com/activepython/2.6/pypm.html -.. _`package repository`: http://pypm.activestate.com/ - - Building lxml from sources -------------------------- From scoder at codespeak.net Sat Mar 20 10:45:59 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Sat, 20 Mar 2010 10:45:59 +0100 (CET) Subject: [Lxml-checkins] r72437 - in lxml/trunk: . 
src/lxml/tests Message-ID: <20100320094559.19E1C282BD4@codespeak.net> Author: scoder Date: Sat Mar 20 10:45:55 2010 New Revision: 72437 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/tests/test_elementtree.py Log: r5526 at lenny: sbehnel | 2010-03-20 10:45:45 +0100 ET 1.3 test fix Modified: lxml/trunk/src/lxml/tests/test_elementtree.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_elementtree.py (original) +++ lxml/trunk/src/lxml/tests/test_elementtree.py Sat Mar 20 10:45:55 2010 @@ -3426,9 +3426,6 @@ class Target(object): pass - parser = self.etree.XMLParser() - self.assertEquals(None, parser.target) - target = Target() parser = self.etree.XMLParser(target=target) From scoder at codespeak.net Thu Mar 25 14:15:27 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 25 Mar 2010 14:15:27 +0100 (CET) Subject: [Lxml-checkins] r72800 - lxml/trunk Message-ID: <20100325131527.52B31282BE5@codespeak.net> Author: scoder Date: Thu Mar 25 14:15:25 2010 New Revision: 72800 Modified: lxml/trunk/ (props changed) lxml/trunk/buildlibxml.py Log: r5528 at lenny: sbehnel | 2010-03-25 14:15:20 +0100 Py3 build fix Modified: lxml/trunk/buildlibxml.py ============================================================================== --- lxml/trunk/buildlibxml.py (original) +++ lxml/trunk/buildlibxml.py Thu Mar 25 14:15:25 2010 @@ -126,7 +126,7 @@ match = version_re.search(fn) if match: version_string = match.group(1) - versions.append((map(tryint, version_string.split('.')), + versions.append((tuple(map(tryint, version_string.split('.'))), version_string)) if versions: versions.sort() @@ -274,7 +274,7 @@ if sys.platform in ('darwin',): import platform # We compile Universal if we are on a machine > 10.3 - major_version, minor_version = map(int, platform.mac_ver()[0].split('.')[:2]) + major_version, minor_version = tuple(map(int, platform.mac_ver()[0].split('.')[:2])) if major_version > 7: env = 
os.environ.copy() if minor_version < 6: From scoder at codespeak.net Thu Mar 25 14:17:14 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Thu, 25 Mar 2010 14:17:14 +0100 (CET) Subject: [Lxml-checkins] r72801 - lxml/trunk Message-ID: <20100325131714.CC8DF282BE5@codespeak.net> Author: scoder Date: Thu Mar 25 14:17:13 2010 New Revision: 72801 Modified: lxml/trunk/ (props changed) lxml/trunk/buildlibxml.py Log: r5530 at lenny: sbehnel | 2010-03-25 14:17:06 +0100 Py3 build fix Modified: lxml/trunk/buildlibxml.py ============================================================================== --- lxml/trunk/buildlibxml.py (original) +++ lxml/trunk/buildlibxml.py Thu Mar 25 14:17:13 2010 @@ -5,7 +5,7 @@ from urlparse import urlsplit, urljoin from urllib import urlretrieve except ImportError: - from urllib.parse import urlsplit + from urllib.parse import urlsplit, urljoin from urllib.request import urlretrieve multi_make_options = [] From scoder at codespeak.net Fri Mar 26 06:45:25 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 26 Mar 2010 06:45:25 +0100 (CET) Subject: [Lxml-checkins] r72883 - in lxml/trunk: . doc src/lxml Message-ID: <20100326054525.628E3282BF0@codespeak.net> Author: scoder Date: Fri Mar 26 06:45:19 2010 New Revision: 72883 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/tutorial.txt lxml/trunk/src/lxml/apihelpers.pxi Log: r5532 at lenny: sbehnel | 2010-03-26 06:45:12 +0100 doc updates Modified: lxml/trunk/doc/tutorial.txt ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/trunk/doc/tutorial.txt Fri Mar 26 06:45:19 2010 @@ -1013,13 +1013,41 @@ Hello World +lxml.etree allows you to look up the current namespaces defined for a +node through the ``.nsmap`` property: + +.. 
sourcecode:: pycon + + >>> xhtml.nsmap + {None: 'http://www.w3.org/1999/xhtml'} + +Note, however, that this includes all prefixes known in the context of +an Element, not only those that it defines itself. + +.. sourcecode:: pycon + + >>> root = etree.Element('root', nsmap={'a': 'http://a.b/c'}) + >>> child = etree.SubElement(root, 'child', + ... nsmap={'b': 'http://b.c/d'}) + >>> len(root.nsmap) + 1 + >>> len(child.nsmap) + 2 + >>> child.nsmap['a'] + 'http://a.b/c' + >>> child.nsmap['b'] + 'http://b.c/d' + +Therefore, modifying the returned dict cannot have any meaningful +impact on the Element. Any changes to it are ignored. + Namespaces on attributes work alike, but since version 2.3, lxml.etree will make sure that the attribute uses a prefixed namespace -declaration. This is because unprefixed attribute names are not -considered being in a namespace by the XML namespace specification -(`section 6.2`_), so they may end up loosing their namespace on a -serialise-parse roundtrip, even if they appear in a namespaced -element. +declaration if one was defined. This is because unprefixed attribute +names are not considered being in a namespace by the XML namespace +specification (`section 6.2`_), so they may end up loosing their +namespace on a serialise-parse roundtrip, even if they appear in a +namespaced element. .. sourcecode:: pycon Modified: lxml/trunk/src/lxml/apihelpers.pxi ============================================================================== --- lxml/trunk/src/lxml/apihelpers.pxi (original) +++ lxml/trunk/src/lxml/apihelpers.pxi Fri Mar 26 06:45:19 2010 @@ -298,8 +298,8 @@ xmlNode* node cdef int _removeUnusedNamespaceDeclarations(xmlNode* c_element) except -1: - u"""Remove any namespace declarations from a subtree that do not used - by any of its elements (or attributes). + u"""Remove any namespace declarations from a subtree that are not used by + any of its elements (or attributes). 
""" cdef _ns_node_ref* c_ns_list cdef _ns_node_ref* c_nsref_ptr From scoder at codespeak.net Fri Mar 26 20:35:57 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 26 Mar 2010 20:35:57 +0100 (CET) Subject: [Lxml-checkins] r72924 - in lxml/trunk: . src/lxml Message-ID: <20100326193557.07100282B9C@codespeak.net> Author: scoder Date: Fri Mar 26 20:35:56 2010 New Revision: 72924 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/parser.pxi Log: r5534 at lenny: sbehnel | 2010-03-26 08:11:00 +0100 parser exception refactoring to better match ET 1.3 Modified: lxml/trunk/src/lxml/parser.pxi ============================================================================== --- lxml/trunk/src/lxml/parser.pxi (original) +++ lxml/trunk/src/lxml/parser.pxi Fri Mar 26 20:35:56 2010 @@ -15,22 +15,22 @@ For compatibility with ElementTree 1.3 and later. """ - pass - -class XMLSyntaxError(ParseError): - u"""Syntax error while parsing an XML document. - """ def __init__(self, message, code, line, column): if python.PY_VERSION_HEX >= 0x02050000: # Python >= 2.5 uses new style class exceptions - super(_XMLSyntaxError, self).__init__(message) + super(_ParseError, self).__init__(message) else: - ParseError.__init__(self, message) + _XMLSyntaxError.__init__(self, message) self.position = (line, column) self.code = code -cdef object _XMLSyntaxError -_XMLSyntaxError = XMLSyntaxError +class XMLSyntaxError(ParseError): + u"""Syntax error while parsing an XML document. + """ + pass + +cdef object _XMLSyntaxError = XMLSyntaxError +cdef object _ParseError = ParseError class ParserError(LxmlError): u"""Internal lxml parser error. From scoder at codespeak.net Fri Mar 26 20:36:01 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 26 Mar 2010 20:36:01 +0100 (CET) Subject: [Lxml-checkins] r72925 - in lxml/trunk: . 
src/lxml Message-ID: <20100326193601.14B78282BF3@codespeak.net> Author: scoder Date: Fri Mar 26 20:35:59 2010 New Revision: 72925 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx Log: r5535 at lenny: sbehnel | 2010-03-26 08:11:36 +0100 doc fix Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Fri Mar 26 20:35:59 2010 @@ -1212,10 +1212,13 @@ Iterate over the following or preceding siblings of this element. - The direction is determined by the 'preceding' keyword which defaults - to False, i.e. forward iteration over the following siblings. The - generated elements can be restricted to a specific tag name with the - 'tag' keyword. + The direction is determined by the 'preceding' keyword which + defaults to False, i.e. forward iteration over the following + siblings. When True, the iterator yields the preceding + siblings in reverse document order, i.e. starting right before + the current element and going left. The generated elements + can be restricted to a specific tag name with the 'tag' + keyword. """ return SiblingsIterator(self, tag, preceding=preceding) @@ -1845,7 +1848,7 @@ u"""iterfind(self, path) Iterates over all elements matching the ElementPath expression. - Same as getroot().finditer(path). + Same as getroot().iterfind(path). """ self._assertHasRoot() root = self.getroot() From scoder at codespeak.net Fri Mar 26 20:36:04 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 26 Mar 2010 20:36:04 +0100 (CET) Subject: [Lxml-checkins] r72926 - in lxml/trunk: . 
src/lxml Message-ID: <20100326193604.342A1282B9C@codespeak.net> Author: scoder Date: Fri Mar 26 20:36:02 2010 New Revision: 72926 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/_elementpath.py Log: r5536 at lenny: sbehnel | 2010-03-26 08:12:15 +0100 major ElementPath update to match ET 1.3 (as of Py3.2) Modified: lxml/trunk/src/lxml/_elementpath.py ============================================================================== --- lxml/trunk/src/lxml/_elementpath.py (original) +++ lxml/trunk/src/lxml/_elementpath.py Fri Mar 26 20:36:02 2010 @@ -1,6 +1,6 @@ # # ElementTree -# $Id: ElementPath.py 3276 2007-09-12 06:52:30Z fredrik $ +# $Id: ElementPath.py 3375 2008-02-13 08:05:08Z fredrik $ # # limited xpath support for element trees # @@ -9,8 +9,12 @@ # 2003-05-28 fl added support for // etc # 2003-08-27 fl fixed parsing of periods in element names # 2007-09-10 fl new selection engine +# 2007-09-12 fl fixed parent selector +# 2007-09-13 fl added iterfind; changed findall to return a list +# 2007-11-30 fl added namespaces support +# 2009-10-30 fl added child element value filter # -# Copyright (c) 2003-2007 by Fredrik Lundh. All rights reserved. +# Copyright (c) 2003-2009 by Fredrik Lundh. All rights reserved. 
# # fredrik at pythonware.com # http://www.pythonware.com @@ -18,7 +22,7 @@ # -------------------------------------------------------------------- # The ElementTree toolkit is # -# Copyright (c) 1999-2007 by Fredrik Lundh +# Copyright (c) 1999-2009 by Fredrik Lundh # # By obtaining, using, and/or copying this software and/or its # associated documentation, you agree that you have read, understood, @@ -51,7 +55,7 @@ import re -xpath_tokenizer = re.compile( +xpath_tokenizer_re = re.compile( "(" "'[^']*'|\"[^\"]*\"|" "::|" @@ -59,15 +63,30 @@ "\.\.|" "\(\)|" "[/.*:\[\]\(\)@=])|" - "((?:\{[^}]+\})?[^/:\[\]\(\)@=\s]+)|" + "((?:\{[^}]+\})?[^/\[\]\(\)@=\s]+)|" "\s+" - ).findall + ) -def prepare_tag(next, token): +def xpath_tokenizer(pattern, namespaces=None): + for token in xpath_tokenizer_re.findall(pattern): + tag = token[1] + if tag and tag[0] != "{" and ":" in tag: + try: + prefix, uri = tag.split(":", 1) + if not namespaces: + raise KeyError + yield token[0], "{%s}%s" % (namespaces[prefix], uri) + except KeyError: + raise SyntaxError("prefix %r not found in prefix map" % prefix) + else: + yield token + + +def prepare_child(next, token): tag = token[1] def select(result): for elem in result: - for e in elem.iterchildren(tag=tag): + for e in elem.iterchildren(tag): yield e return select @@ -78,26 +97,26 @@ yield e return select -def prepare_dot(next, token): +def prepare_self(next, token): def select(result): return result return select -def prepare_iter(next, token): +def prepare_descendant(next, token): token = next() if token[0] == "*": tag = "*" elif not token[0]: tag = token[1] else: - raise SyntaxError + raise SyntaxError("invalid descendant") def select(result): for elem in result: - for e in elem.iterdescendants(tag=tag): + for e in elem.iterdescendants(tag): yield e return select -def prepare_dot_dot(next, token): +def prepare_parent(next, token): def select(result): for elem in result: parent = elem.getparent() @@ -106,52 +125,93 @@ return select def 
prepare_predicate(next, token): - # this one should probably be refactored... - token = next() - if token[0] == "@": - # attribute - token = next() - if token[0]: - raise SyntaxError("invalid attribute predicate") - key = token[1] + # FIXME: replace with real parser!!! refs: + # http://effbot.org/zone/simple-iterator-parser.htm + # http://javascript.crockford.com/tdop/tdop.html + signature = [] + predicate = [] + while 1: token = next() if token[0] == "]": - def select(result): - for elem in result: - if elem.get(key) is not None: + break + if token[0] and token[0][:1] in "'\"": + token = "'", token[0][1:-1] + signature.append(token[0] or "-") + predicate.append(token[1]) + signature = "".join(signature) + # use signature to determine predicate type + if signature == "@-": + # [@attribute] predicate + key = predicate[1] + def select(result): + for elem in result: + if elem.get(key) is not None: + yield elem + return select + if signature == "@-='": + # [@attribute='value'] + key = predicate[1] + value = predicate[-1] + def select(result): + for elem in result: + if elem.get(key) == value: + yield elem + return select + if signature == "-" and not re.match("\d+$", predicate[0]): + # [tag] + tag = predicate[0] + def select(result): + for elem in result: + for _ in elem.iterchildren(tag): + yield elem + break + return select + if signature == "-='" and not re.match("\d+$", predicate[0]): + # [tag='value'] + tag = predicate[0] + value = predicate[-1] + def select(result): + for elem in result: + for e in elem.iterchildren(tag): + if "".join(e.itertext()) == value: yield elem - elif token[0] == "=": - value = next()[0] - if value[:1] == "'" or value[:1] == '"': - value = value[1:-1] + break + return select + if signature == "-" or signature == "-()" or signature == "-()-": + # [index] or [last()] or [last()-index] + if signature == "-": + index = int(predicate[0]) - 1 + else: + if predicate[0] != "last": + raise SyntaxError("unsupported function") + if signature == 
"-()-": + try: + index = int(predicate[2]) - 1 + except ValueError: + raise SyntaxError("unsupported expression") else: - raise SyntaxError("invalid comparison target") - token = next() - def select(result): - for elem in result: - if elem.get(key) == value: - yield elem - if token[0] != "]": - raise SyntaxError("invalid attribute predicate") - elif not token[0]: - tag = token[1] - token = next() - if token[0] != "]": - raise SyntaxError("invalid node predicate") + index = -1 def select(result): for elem in result: - if find(elem, tag) is not None: - yield elem - else: - raise SyntaxError("invalid predicate") - return select + parent = elem.getparent() + if parent is None: + continue + try: + # FIXME: what if the selector is "*" ? + elems = list(parent.iterchildren(elem.tag)) + if elems[index] is elem: + yield elem + except IndexError: + pass + return select + raise SyntaxError("invalid predicate") ops = { - "": prepare_tag, + "": prepare_child, "*": prepare_star, - ".": prepare_dot, - "..": prepare_dot_dot, - "//": prepare_iter, + ".": prepare_self, + "..": prepare_parent, + "//": prepare_descendant, "[": prepare_predicate, } @@ -159,8 +219,10 @@ # -------------------------------------------------------------------- -def _build_path_iterator(path): +def _build_path_iterator(path, namespaces): # compile selector pattern + if path[-1:] == "/": + path = path + "*" # implicit all (FIXME: keep this?) 
try: return _cache[path] except KeyError: @@ -170,13 +232,12 @@ if path[:1] == "/": raise SyntaxError("cannot use absolute path on element") - stream = iter(xpath_tokenizer(path)) + stream = iter(xpath_tokenizer(path, namespaces)) try: _next = stream.next except AttributeError: # Python 3 - def _next(): - return next(stream) + _next = stream.__next__ token = _next() selector = [] while 1: @@ -196,9 +257,8 @@ ## # Iterate over the matching nodes -def iterfind(elem, path): - # execute selector pattern - selector = _build_path_iterator(path) +def iterfind(elem, path, namespaces=None): + selector = _build_path_iterator(path, namespaces) result = iter((elem,)) for select in selector: result = select(result) @@ -207,8 +267,8 @@ ## # Find first matching object. -def find(elem, path): - it = iterfind(elem, path) +def find(elem, path, namespaces=None): + it = iterfind(elem, path, namespaces) try: try: _next = it.next @@ -222,14 +282,14 @@ ## # Find all matching objects. -def findall(elem, path): - return list(iterfind(elem, path)) +def findall(elem, path, namespaces=None): + return list(iterfind(elem, path, namespaces)) ## # Find text for first matching object. -def findtext(elem, path, default=None): - el = find(elem, path) +def findtext(elem, path, default=None, namespaces=None): + el = find(elem, path, namespaces) if el is None: return default else: From scoder at codespeak.net Fri Mar 26 20:36:06 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 26 Mar 2010 20:36:06 +0100 (CET) Subject: [Lxml-checkins] r72927 - in lxml/trunk: . 
src/lxml Message-ID: <20100326193606.D64AA282BF5@codespeak.net> Author: scoder Date: Fri Mar 26 20:36:05 2010 New Revision: 72927 Modified: lxml/trunk/ (props changed) lxml/trunk/src/lxml/lxml.etree.pyx Log: r5537 at lenny: sbehnel | 2010-03-26 08:23:50 +0100 accept prefix-namespace mapping in ElementPath operations Modified: lxml/trunk/src/lxml/lxml.etree.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.etree.pyx (original) +++ lxml/trunk/src/lxml/lxml.etree.pyx Fri Mar 26 20:36:05 2010 @@ -1323,41 +1323,57 @@ return _makeElement(_tag, NULL, self._doc, None, None, None, attrib, nsmap, _extra) - def find(self, path): - u"""find(self, path) + def find(self, path, namespaces=None): + u"""find(self, path, namespaces=None) Finds the first matching subelement, by tag name or path. + + The optional ``namespaces`` argument accepts a + prefix-to-namespace mapping that allows the usage of XPath + prefixes in the path expression. """ if isinstance(path, QName): path = (path).text - return _elementpath.find(self, path) + return _elementpath.find(self, path, namespaces) - def findtext(self, path, default=None): - u"""findtext(self, path, default=None) + def findtext(self, path, default=None, namespaces=None): + u"""findtext(self, path, default=None, namespaces=None) Finds text for the first matching subelement, by tag name or path. + + The optional ``namespaces`` argument accepts a + prefix-to-namespace mapping that allows the usage of XPath + prefixes in the path expression. """ if isinstance(path, QName): path = (path).text - return _elementpath.findtext(self, path, default) + return _elementpath.findtext(self, path, default, namespaces) - def findall(self, path): - u"""findall(self, path) + def findall(self, path, namespaces=None): + u"""findall(self, path, namespaces=None) Finds all matching subelements, by tag name or path. 
+ + The optional ``namespaces`` argument accepts a + prefix-to-namespace mapping that allows the usage of XPath + prefixes in the path expression. """ if isinstance(path, QName): path = (path).text - return _elementpath.findall(self, path) + return _elementpath.findall(self, path, namespaces) - def iterfind(self, path): - u"""iterfind(self, path) + def iterfind(self, path, namespaces=None): + u"""iterfind(self, path, namespaces=None) Iterates over all matching subelements, by tag name or path. + + The optional ``namespaces`` argument accepts a + prefix-to-namespace mapping that allows the usage of XPath + prefixes in the path expression. """ if isinstance(path, QName): path = (path).text - return _elementpath.iterfind(self, path) + return _elementpath.iterfind(self, path, namespaces) def xpath(self, _path, *, namespaces=None, extensions=None, smart_strings=True, **_variables): @@ -1796,11 +1812,15 @@ return () return root.iter(tag) - def find(self, path): - u"""find(self, path) + def find(self, path, namespaces=None): + u"""find(self, path, namespaces=None) Finds the first toplevel element with given tag. Same as ``tree.getroot().find(path)``. + + The optional ``namespaces`` argument accepts a + prefix-to-namespace mapping that allows the usage of XPath + prefixes in the path expression. """ self._assertHasRoot() root = self.getroot() @@ -1810,13 +1830,17 @@ path = u"." + path elif start == b"/": path = b"." + path - return root.find(path) + return root.find(path, namespaces) - def findtext(self, path, default=None): - u"""findtext(self, path, default=None) + def findtext(self, path, default=None, namespaces=None): + u"""findtext(self, path, default=None, namespaces=None) Finds the text for the first element matching the ElementPath expression. Same as getroot().findtext(path) + + The optional ``namespaces`` argument accepts a + prefix-to-namespace mapping that allows the usage of XPath + prefixes in the path expression. 
""" self._assertHasRoot() root = self.getroot() @@ -1826,13 +1850,17 @@ path = u"." + path elif start == b"/": path = b"." + path - return root.findtext(path, default) + return root.findtext(path, default, namespaces) - def findall(self, path): - u"""findall(self, path) + def findall(self, path, namespaces=None): + u"""findall(self, path, namespaces=None) Finds all elements matching the ElementPath expression. Same as getroot().findall(path). + + The optional ``namespaces`` argument accepts a + prefix-to-namespace mapping that allows the usage of XPath + prefixes in the path expression. """ self._assertHasRoot() root = self.getroot() @@ -1842,13 +1870,17 @@ path = u"." + path elif start == b"/": path = b"." + path - return root.findall(path) + return root.findall(path, namespaces) - def iterfind(self, path): - u"""iterfind(self, path) + def iterfind(self, path, namespaces=None): + u"""iterfind(self, path, namespaces=None) Iterates over all elements matching the ElementPath expression. Same as getroot().iterfind(path). + + The optional ``namespaces`` argument accepts a + prefix-to-namespace mapping that allows the usage of XPath + prefixes in the path expression. """ self._assertHasRoot() root = self.getroot() @@ -1858,7 +1890,7 @@ path = u"." + path elif start == b"/": path = b"." + path - return root.iterfind(path) + return root.iterfind(path, namespaces) def xpath(self, _path, *, namespaces=None, extensions=None, smart_strings=True, **_variables): From scoder at codespeak.net Fri Mar 26 20:36:09 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Fri, 26 Mar 2010 20:36:09 +0100 (CET) Subject: [Lxml-checkins] r72928 - in lxml/trunk: . 
doc Message-ID: <20100326193609.EF1EB282B9C@codespeak.net> Author: scoder Date: Fri Mar 26 20:36:08 2010 New Revision: 72928 Modified: lxml/trunk/ (props changed) lxml/trunk/doc/tutorial.txt Log: r5538 at lenny: sbehnel | 2010-03-26 20:35:50 +0100 doc update Modified: lxml/trunk/doc/tutorial.txt ============================================================================== --- lxml/trunk/doc/tutorial.txt (original) +++ lxml/trunk/doc/tutorial.txt Fri Mar 26 20:36:08 2010 @@ -990,12 +990,22 @@ .. _`namespace prefixes`: http://www.w3.org/TR/xml-names/#ns-qualnames -As you can see, prefixes only become important when you serialise the result. -However, the above code becomes somewhat verbose due to the lengthy namespace -names. And retyping or copying a string over and over again is error prone. -It is therefore common practice to store a namespace URI in a global variable. -To adapt the namespace prefixes for serialisation, you can also pass a mapping -to the Element factory, e.g. to define the default namespace: +The notation that ElementTree uses was originally brought up by `James +Clark`_. It has the major advantage of providing a universally +qualified name for a tag, regardless of any prefixes that may or may +not have been used or defined in a document. By moving the +indirection of prefixes out of the way, it makes namespace aware code +much clearer and safer. + +.. _`James Clark`: http://www.jclark.com/xml/xmlns.htm + +As you can see from the example, prefixes only become important when +you serialise the result. However, the above code looks somewhat +verbose due to the lengthy namespace names. And retyping or copying a +string over and over again is error prone. It is therefore common +practice to store a namespace URI in a global variable. To adapt the +namespace prefixes for serialisation, you can also pass a mapping to +the Element factory function, e.g. to define the default namespace: .. 
sourcecode:: pycon @@ -1043,11 +1053,11 @@ Namespaces on attributes work alike, but since version 2.3, lxml.etree will make sure that the attribute uses a prefixed namespace -declaration if one was defined. This is because unprefixed attribute -names are not considered being in a namespace by the XML namespace -specification (`section 6.2`_), so they may end up loosing their -namespace on a serialise-parse roundtrip, even if they appear in a -namespaced element. +declaration. This is because unprefixed attribute names are not +considered being in a namespace by the XML namespace specification +(`section 6.2`_), so they may end up loosing their namespace on a +serialise-parse roundtrip, even if they appear in a namespaced +element. .. sourcecode:: pycon From scoder at codespeak.net Wed Mar 31 12:52:28 2010 From: scoder at codespeak.net (scoder at codespeak.net) Date: Wed, 31 Mar 2010 12:52:28 +0200 (CEST) Subject: [Lxml-checkins] r73205 - in lxml/trunk: . src/lxml src/lxml/tests Message-ID: <20100331105228.7DFC7282BD8@codespeak.net> Author: scoder Date: Wed Mar 31 12:52:25 2010 New Revision: 73205 Modified: lxml/trunk/ (props changed) lxml/trunk/CHANGES.txt lxml/trunk/src/lxml/lxml.objectify.pyx lxml/trunk/src/lxml/tests/test_objectify.py Log: r5544 at lenny: sbehnel | 2010-03-31 12:52:16 +0200 make ObjectifiedDataElements hashable Modified: lxml/trunk/CHANGES.txt ============================================================================== --- lxml/trunk/CHANGES.txt (original) +++ lxml/trunk/CHANGES.txt Wed Mar 31 12:52:25 2010 @@ -58,6 +58,8 @@ Bugs fixed ---------- +* ObjectifiedDataElements in lxml.objectify were not hashable. + * Crash in XPath evaluation when reading smart strings from a document other than the original context document. 
Modified: lxml/trunk/src/lxml/lxml.objectify.pyx ============================================================================== --- lxml/trunk/src/lxml/lxml.objectify.pyx (original) +++ lxml/trunk/src/lxml/lxml.objectify.pyx Wed Mar 31 12:52:25 2010 @@ -699,6 +699,9 @@ def __richcmp__(self, other, int op): return _richcmpPyvals(self, other, op) + def __hash__(self): + return hash(_parseNumber(self)) + def __add__(self, other): return _numericValueOf(self) + _numericValueOf(other) @@ -795,6 +798,9 @@ def __richcmp__(self, other, int op): return _richcmpPyvals(self, other, op) + def __hash__(self): + return hash(textOf(self._c_node) or u'') + def __add__(self, other): text = _strValueOf(self) other = _strValueOf(other) @@ -845,6 +851,9 @@ else: return python.PyObject_RichCompare(self, None, op) + def __hash__(self): + return hash(None) + property pyval: def __get__(self): return None @@ -864,6 +873,9 @@ def __richcmp__(self, other, int op): return _richcmpPyvals(self, other, op) + def __hash__(self): + return hash(__parseBool(textOf(self._c_node))) + def __str__(self): return unicode(__parseBool(textOf(self._c_node))) Modified: lxml/trunk/src/lxml/tests/test_objectify.py ============================================================================== --- lxml/trunk/src/lxml/tests/test_objectify.py (original) +++ lxml/trunk/src/lxml/tests/test_objectify.py Wed Mar 31 12:52:25 2010 @@ -744,6 +744,7 @@ self.assertFalse(isinstance(root.none, objectify.NoneElement)) self.assertFalse(isinstance(root.none[0], objectify.NoneElement)) self.assert_(isinstance(root.none[1], objectify.NoneElement)) + self.assertEquals(hash(root.none[1]), hash(None)) self.assertEquals(root.none[1], None) self.assertFalse(root.none[1]) @@ -763,6 +764,7 @@ self.assertEquals(True + root.bool, True + root.bool) self.assertEquals(root.bool * root.bool, True * True) self.assertEquals(int(root.bool), int(True)) + self.assertEquals(hash(root.bool), hash(True)) self.assertEquals(complex(root.bool), 
complex(True)) self.assert_(isinstance(root.bool, objectify.BoolElement)) @@ -772,6 +774,7 @@ self.assertEquals(False + root.bool, False + root.bool) self.assertEquals(root.bool * root.bool, False * False) self.assertEquals(int(root.bool), int(False)) + self.assertEquals(hash(root.bool), hash(False)) self.assertEquals(complex(root.bool), complex(False)) self.assert_(isinstance(root.bool, objectify.BoolElement)) @@ -848,6 +851,11 @@ val = 5 self.assertRaises(TypeError, el.__mod__, val) + def test_type_str_hash(self): + v = "1" + el = objectify.DataElement(v) + self.assertEquals(hash(el), hash("1")) + def test_type_str_as_int(self): v = "1" el = objectify.DataElement(v) @@ -957,6 +965,10 @@ self.assert_(isinstance(value, objectify.IntElement)) self.assertEquals(value, 5) + def test_data_element_int_hash(self): + value = objectify.DataElement(123) + self.assertEquals(hash(value), hash(123)) + def test_type_float(self): Element = self.Element SubElement = self.etree.SubElement @@ -969,6 +981,10 @@ self.assert_(isinstance(value, objectify.FloatElement)) self.assertEquals(value, 5.5) + def test_data_element_float_hash(self): + value = objectify.DataElement(5.5) + self.assertEquals(hash(value), hash(5.5)) + def test_data_element_xsitypes(self): for xsi, objclass in xsitype2objclass.items(): # 1 is a valid value for all ObjectifiedDataElement classes
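The r73205 change gives each data element class a `__hash__` that hashes the parsed Python value, for example `hash(_parseNumber(self))` for numbers and `hash(textOf(self._c_node) or u'')` for strings. The new tests check that `hash(DataElement(123)) == hash(123)`. The rule behind this is that objects which compare equal must hash equal. Below is a minimal pure-Python sketch of that rule. `NumberElement` is a hypothetical stand-in and not part of the lxml API. It also illustrates a related Python 3 behaviour: a class that defines `__eq__` without `__hash__` gets `__hash__ = None` and is therefore unhashable.

```python
class NumberElement:
    """Hypothetical stand-in for an element whose text holds a number."""

    def __init__(self, text):
        self.text = text

    @property
    def pyval(self):
        # Parse the text as int if possible, otherwise as float.
        try:
            return int(self.text)
        except ValueError:
            return float(self.text)

    def __eq__(self, other):
        # Compare by parsed value, so NumberElement("5") == 5.
        if isinstance(other, NumberElement):
            other = other.pyval
        return self.pyval == other

    def __hash__(self):
        # Equal objects must hash equally, so hash the parsed value too.
        return hash(self.pyval)


class EqOnly:
    """Defines __eq__ but not __hash__: Python 3 makes it unhashable."""

    def __eq__(self, other):
        return True


el = NumberElement("123")
print(hash(el) == hash(123))     # True
print({123: "found"}[el])        # found
try:
    hash(EqOnly())
except TypeError:
    print("EqOnly is unhashable")
```

Because the hash matches the plain value, an element can serve as a lookup key wherever the corresponding Python value is stored. This is the kind of use the `CHANGES.txt` entry ("ObjectifiedDataElements in lxml.objectify were not hashable") refers to.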