[lxml-dev] Some HTML target processing issues

Stefan Behnel stefan_ml at behnel.de
Fri Aug 8 14:01:37 CEST 2008


Hi,

please keep the list involved.

Max Ivanov wrote:
> Then how can I make HTMLParser tolerate unknown tags?

You can't change the parser. It already parses with the "recover" option,
so it tries to keep going as long as possible. The problem here is that
when you use a target parser, it currently raises an exception at the end
if errors occurred during the parsing. It *might* be better to disable
that based on the recover option, but I'll have to look into that.
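
For readers unfamiliar with the term, here is a minimal sketch of what a
"target parser" is. It uses the standard library's xml.etree.ElementTree so
that it runs without lxml installed; lxml's parsers (including
etree.HTMLParser) accept a target= argument with the same
start/end/data/close interface. The EventCollector class is a hypothetical
name made up for this illustration, and the sketch does not reproduce the
lxml-specific error raised at the end of parsing that is discussed above.

```python
from xml.etree import ElementTree as ET


class EventCollector:
    """Hypothetical target object: records parser events instead of
    building an element tree."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        # Called for each opening tag.
        self.events.append(("start", tag))

    def end(self, tag):
        # Called for each closing tag.
        self.events.append(("end", tag))

    def data(self, text):
        # Called for text content.
        self.events.append(("data", text))

    def close(self):
        # Called once parsing is finished; the return value becomes
        # the result of the parse.
        return self.events


parser = ET.XMLParser(target=EventCollector())
parser.feed("<root><p>hi</p></root>")
events = parser.close()
print(events)
```

Whatever the target's close() method returns is handed back to the caller,
so no tree is built unless the target builds one itself.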


>> Can you come up with a patch with a couple of simple test cases for
>> src/lxml/tests/test_htmlparser.py that show the three problems you
>> describe?
>> That usually makes them easier (read: faster) to fix. There are some
>> target parser test cases in test_etree.py and test_elementtree.py that
>> you can look at for inspiration.
>
> Thanks, I'll try to write tests, but I've never done it before. It looks
> quite clear, but I have no idea how to run the tests themselves.

It's pretty easy. Each test has a method in the test case class that will
be called by the test runner. Reading a few of the existing test methods
should get you going. There is a script "test.py" in the root directory
that you can call to run the tests ("make test" does that, for example).
It will walk through the directory hierarchy and collect all test classes
it finds into a unit test suite (based on the unittest module), and then
run them. Try "python test.py -vv" to get some verbose output.
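
As a rough illustration of the mechanism described above, the following
sketch shows a unittest test case class, how its test methods are collected
into a suite, and how the suite is run. This is not lxml's actual test.py;
the class and method names are hypothetical, and the collection step here
loads a single class rather than walking a directory hierarchy.

```python
import unittest


class ExampleTestCase(unittest.TestCase):
    """Hypothetical test case; each method whose name starts with
    'test' is picked up and called by the test runner."""

    def test_upper(self):
        self.assertEqual("html".upper(), "HTML")

    def test_split(self):
        self.assertEqual("a b".split(), ["a", "b"])


# Collect all test methods of the class into a suite.
loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(ExampleTestCase)

# verbosity=2 prints one line per test method.
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

New test methods added to an existing test case class are collected the same
way, without any extra registration step.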

Stefan


