[Lxml-checkins] r43961 - lxml/branch/html/src/lxml/html/tests

ianb at codespeak.net ianb at codespeak.net
Fri Jun 1 07:09:38 CEST 2007


Author: ianb
Date: Fri Jun  1 07:09:38 2007
New Revision: 43961

Modified:
   lxml/branch/html/src/lxml/html/tests/test_basic.txt
Log:
added some more tests for basic functionality

Modified: lxml/branch/html/src/lxml/html/tests/test_basic.txt
==============================================================================
--- lxml/branch/html/src/lxml/html/tests/test_basic.txt	(original)
+++ lxml/branch/html/src/lxml/html/tests/test_basic.txt	Fri Jun  1 07:09:38 2007
@@ -1,6 +1,6 @@
 lxml.html adds a find_class method to elements::
 
-    >>> from lxml.html import HTML, tostring
+    >>> from lxml.html import HTML, tostring, parse_element
     >>> from lxml.html.clean import clean, clean_html
     >>> from lxml.html import usedoctest
     >>> h = HTML('''
@@ -25,7 +25,7 @@
     ['P1', 'P2']
 
 Also added is a get_rel_links, which you can use to search for links
-like ``<a rel="$something">``:
+like ``<a rel="$something">``::
 
     >>> h = HTML('''
     ... <a href="1">test 1</a>
@@ -37,4 +37,46 @@
     >>> print [e.attrib['href'] for e in h.find_rel_links('nofollow')]
     []
 
+Another method is ``get_element_by_id`` that does what it says::
 
+    >>> print tostring(HTML('''
+    ... <div>
+    ...  <span id="test">stuff</span>
+    ... </div>''').get_element_by_id('test'))
+    <span id="test">stuff</span>
+
+Or to get the content of an element without the tags, use text_content()::
+
+    >>> el = parse_element('''
+    ... <div>This is <a href="foo">a <b>bold</b> link</a></div>''')
+    >>> el.text_content()
+    'This is a bold link'
+
+Or drop both tags (leaving content) or the entire element, like::
+
+    >>> doc = HTML('''
+    ... <html>
+    ...  <body>
+    ...   <div id="body">
+    ...    This is a <a href="foo" id="link">test</a> of stuff.
+    ...   </div>
+    ...   <div>footer</div>
+    ...  </body>
+    ... </html>''')
+    >>> doc.get_element_by_id('link').drop_tag()
+    >>> print tostring(doc)
+    <html>
+     <body>
+      <div id="body">
+       This is a test of stuff.
+      </div>
+      <div>footer</div>
+     </body>
+    </html>
+    >>> doc.get_element_by_id('body').drop_element()
+    >>> print tostring(doc)
+    <html>
+     <body>
+      <div>footer</div>
+     </body>
+    </html>


More information about the lxml-checkins mailing list