[lxml-dev] html branch
Ian Bicking
ianb at colorstudy.com
Tue May 29 17:37:04 CEST 2007
I've started a branch with lxml.html, in
http://codespeak.net/svn/lxml/branch/html
It currently includes:
lxml.doctestcompare: XML/HTML doctests
lxml.usedoctest: enable the XML doctest comparison from within a doctest
lxml.html.usedoctest: enable the same comparison, using the HTML parser
lxml.html:
* lxml.html.HtmlMixin, defining on each element:
- remove_element: element removes itself from a tree
- remove_tag: element removes itself but not its children from a tree
- find_rel_links: find <a rel="?">
- find_class: find <* class="?">
* HTML: parser
* parse_elements: parse fragment, return list of elements
* parse_element: parse fragment, return single element
* Element: apparently a highly broken element factory (segfaults?!)
* tostring: HTML serialization
lxml.defs: lists of HTML tags (e.g., block_tags)
lxml.clean: clean Javascript and other problem code from HTML
lxml.rewritelinks: change the links in a document
lxml.htmldiff: make human-readable diffs and blame reports
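
To make the difference between remove_element and remove_tag concrete, here is a minimal sketch. It uses only the stdlib xml.etree.ElementTree and is not the code in the branch. The free functions below (and the helpers _parent_of and _append_text) are hypothetical stand-ins for the mixin methods, and the handling of surrounding text is an assumption about the intended behaviour.

```python
import xml.etree.ElementTree as ET


def _parent_of(root, el):
    """Find el's parent by scanning the tree (ElementTree has no parent pointers)."""
    for parent in root.iter():
        if el in list(parent):
            return parent
    raise ValueError("element has no parent under root")


def _append_text(parent, index, text):
    """Attach text just before position `index` inside parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def remove_element(root, el):
    """Remove el and everything inside it; keep the text that followed it (assumed)."""
    parent = _parent_of(root, el)
    index = list(parent).index(el)
    _append_text(parent, index, el.tail)
    parent.remove(el)


def remove_tag(root, el):
    """Remove only el's tag; its text and children are merged into the parent."""
    parent = _parent_of(root, el)
    index = list(parent).index(el)
    children = list(el)
    _append_text(parent, index, el.text)
    for offset, child in enumerate(children):
        parent.insert(index + offset, child)
    # el now sits right after the children that were moved up
    _append_text(parent, index + len(children), el.tail)
    parent.remove(el)


def demo(func):
    root = ET.fromstring("<p>Hello <b>big <i>bad</i> world</b>!</p>")
    func(root, root.find("b"))
    return ET.tostring(root, encoding="unicode")
```

With this sketch, demo(remove_tag) keeps the <i> child and the text of <b>, while demo(remove_element) drops the whole <b> subtree.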
The usedoctest modules are based on a really horrible hack. It seems to
work, except that for some reason lxml/html/tests/test_clean.txt is
sometimes run without the doctest change applied. The other doctests
don't show this problem, and when you run the test explicitly (e.g.,
python test.py test_clean) it runs fine. So something weird is going on
with the test runner, I guess.
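
For contrast with the hack, the conventional way to change how doctest compares output is to pass a checker to the runner explicitly. A minimal sketch of that, using only the stdlib: this is not lxml.doctestcompare, and XMLOutputChecker and _normalize are hypothetical names. It treats two outputs as equal when they parse to the same XML tree, ignoring attribute order and surrounding whitespace.

```python
import doctest
import xml.etree.ElementTree as ET


def _normalize(el):
    """Reduce an element to a comparable structure, ignoring surrounding whitespace."""
    return (
        el.tag,
        sorted(el.attrib.items()),
        (el.text or "").strip(),
        (el.tail or "").strip(),
        [_normalize(child) for child in el],
    )


class XMLOutputChecker(doctest.OutputChecker):
    """Compare expected and actual doctest output as XML when both parse."""

    def check_output(self, want, got, optionflags):
        try:
            want_tree = ET.fromstring(want.strip())
            got_tree = ET.fromstring(got.strip())
        except ET.ParseError:
            # Not XML on one side: fall back to the normal string comparison.
            return super().check_output(want, got, optionflags)
        return _normalize(want_tree) == _normalize(got_tree)


# Explicit installation: the runner is handed the checker up front,
# rather than having it switched on from inside the running doctest.
runner = doctest.DocTestRunner(checker=XMLOutputChecker())
```

Passing the checker this way requires control over how the runner is built, which is presumably what enabling it from inside the doctest is meant to avoid.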
--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
| Write code, do good | http://topp.openplans.org/careers