[Lxml-checkins] r43961 - lxml/branch/html/src/lxml/html/tests
ianb at codespeak.net
ianb at codespeak.net
Fri Jun 1 07:09:38 CEST 2007
Author: ianb
Date: Fri Jun 1 07:09:38 2007
New Revision: 43961
Modified:
lxml/branch/html/src/lxml/html/tests/test_basic.txt
Log:
added some more tests for basic functionality
Modified: lxml/branch/html/src/lxml/html/tests/test_basic.txt
==============================================================================
--- lxml/branch/html/src/lxml/html/tests/test_basic.txt (original)
+++ lxml/branch/html/src/lxml/html/tests/test_basic.txt Fri Jun 1 07:09:38 2007
@@ -1,6 +1,6 @@
lxml.html adds a find_class method to elements::
- >>> from lxml.html import HTML, tostring
+ >>> from lxml.html import HTML, tostring, parse_element
>>> from lxml.html.clean import clean, clean_html
>>> from lxml.html import usedoctest
>>> h = HTML('''
@@ -25,7 +25,7 @@
['P1', 'P2']
Also added is a get_rel_links, which you can use to search for links
-like ``<a rel="$something">``:
+like ``<a rel="$something">``::
>>> h = HTML('''
... <a href="1">test 1</a>
@@ -37,4 +37,46 @@
>>> print [e.attrib['href'] for e in h.find_rel_links('nofollow')]
[]
+Another method is ``get_element_by_id`` that does what it says::
+ >>> print tostring(HTML('''
+ ... <div>
+ ... <span id="test">stuff</span>
+ ... </div>''').get_element_by_id('test'))
+ <span id="test">stuff</span>
+
+Or to get the content of an element without the tags, use text_content()::
+
+ >>> el = parse_element('''
+ ... <div>This is <a href="foo">a <b>bold</b> link</a></div>''')
+ >>> el.text_content()
+ 'This is a bold link'
+
+Or drop both tags (leaving content) or the entire element, like::
+
+ >>> doc = HTML('''
+ ... <html>
+ ... <body>
+ ... <div id="body">
+ ... This is a <a href="foo" id="link">test</a> of stuff.
+ ... </div>
+ ... <div>footer</div>
+ ... </body>
+ ... </html>''')
+ >>> doc.get_element_by_id('link').drop_tag()
+ >>> print tostring(doc)
+ <html>
+ <body>
+ <div id="body">
+ This is a test of stuff.
+ </div>
+ <div>footer</div>
+ </body>
+ </html>
+ >>> doc.get_element_by_id('body').drop_element()
+ >>> print tostring(doc)
+ <html>
+ <body>
+ <div>footer</div>
+ </body>
+ </html>
More information about the lxml-checkins
mailing list