[lxml-dev] lhtml
Ian Bicking
ianb at colorstudy.com
Fri May 25 18:11:12 CEST 2007
Stefan Behnel wrote:
> Hi Ian,
>
> Ian Bicking wrote:
>> Ian Bicking wrote:
>>> I really want to take all our HTML-related routines and put them into a
>>> proper package
>> And maybe a bit of advice -- we could just do this as a set of functions
>> (what we currently have), or potentially explore objectify and add the
>> routines as methods. E.g., el.find_by_class('classname')
>
> You're not using objectify as a base, are you? I mean, HTML is mainly about
> text, so objectify will not help you much.

I'm not using it now, no. But if I used objectify as a base, it would
be to add methods like .html_serialize() to elements, or any number of
other handy methods. At least "handy" for dealing with the mixed
content that HTML has, which is relatively uncommon in other XML.
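
For concreteness, here is a minimal sketch of the function style, where every routine takes "el" as its first argument. The name find_by_class comes from the example earlier in this thread; its body and the sample markup are assumptions for illustration, not code from the attached tarball. It uses the standard library's ElementTree so it runs anywhere; lxml.etree provides the same iter() and get() calls.

```python
import xml.etree.ElementTree as ET  # lxml.etree offers the same iter()/get() calls


def find_by_class(el, class_name):
    """Return el and any descendants whose class attribute lists class_name."""
    return [
        child for child in el.iter()
        # The class attribute is a space-separated list, so split before testing.
        if class_name in child.get("class", "").split()
    ]


# Hypothetical sample document, only for demonstration.
doc = ET.fromstring(
    '<div><p class="note big">a</p><p class="big">b</p><span>c</span></div>'
)
matches = find_by_class(doc, "note")
print([m.text for m in matches])
```

Turning this into a method would only move "el" into "self"; the body stays the same, which is why the two styles feel interchangeable.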
>> This feels like a cleaner API, but I'm worried that it will mean
>> problems when mixing non-objectify-HTML with other elements, or that
>> there will be problems with threads, memory overhead, or other issues.
>> I don't really mind functions, which is why I am unsure; OTOH, almost
>> every function has a first argument of "el", which makes them seem like
>> methods.
>
> What about implementing the HTML namespace in a couple of Element subclasses
> and add the methods where they are appropriate? That sounds like a nice API to me.

The HTML() parser doesn't actually use namespaces. Well, maybe it does
if you give it XHTML, or maybe you really have to use XML() to get that.
It's never come up because I don't deal with any XHTML sites (because
there are almost no XHTML sites ;).
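
A quick way to check the namespace question is to feed the same XHTML string to both parsers and compare the root tags. This is a sketch against a current lxml release, not the 2007 one; the sample string is made up for the test.

```python
from lxml import etree

# Hypothetical XHTML snippet carrying the XHTML namespace declaration.
XHTML = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>hi</p></body></html>'

html_root = etree.HTML(XHTML)  # parsed with the HTML parser
xml_root = etree.XML(XHTML)    # parsed with the XML parser

print(html_root.tag)  # html
print(xml_root.tag)   # {http://www.w3.org/1999/xhtml}html
```

So the HTML parser yields plain tag names even for XHTML input, and only the XML parser produces namespaced tags.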
I'm not entirely clear on how namespaces fit in. Most of the methods
would apply to all HTML elements, but without a namespace HTML 4
elements aren't easy to distinguish from any other elements.
> Any chance you could post your code somewhere so that I could take a look at
> what you're really contributing here?

Sure; I started collecting a few of the routines from various libraries
yesterday. There's still stuff in Deliverance and htmldiff that I
haven't integrated. I haven't copied over any tests and there may be
broken imports in many of the modules, but it should give you a vague
idea of scope. (I'm actually looking for a home for htmldiff, so it's
possible it could also go in this library; it's at
https://svn.openplans.org/svn/opencore/trunk/opencore/nui/wiki/htmldiff2.py
and
https://svn.openplans.org/svn/opencore/trunk/opencore/nui/wiki/test_htmldiff2.txt)
Anyway, it's not too big so I'll just attach the stuff I have collected.
--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
| Write code, do good | http://topp.openplans.org/careers
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lhtml.tar.gz
Type: application/x-gzip
Size: 5480 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070525/9e5fcd8c/attachment.bin