[lxml-dev] pyquery
Ian Bicking
ianb at colorstudy.com
Mon Dec 1 20:32:06 CET 2008
Olivier Lauzanne wrote:
> Hello,
>
> First thanks for lxml it's great.
> But I miss an interface on top of it. Something like jquery
> <http://jquery.com> or hpricot <http://code.whytheluckystiff.net/hpricot/>.
> Is there any work in progress to go toward something like that in python ?
>
> Missing a jquery like API in python, I started reproducing the jquery
> API in python by using lxml and released it a few days ago : pyquery
> <http://pypi.python.org/pypi/pyquery>
Some of this overlaps with what lxml.html already does, and some of the
rest would be appropriate to add there. jQuery is a bit unusual in a
Python context, because it only deals with sets of elements. But it's
not unreasonable.
Some things in jQuery are a result of Javascript, where the equivalent
in Python would use a different syntax. For instance:
    >>> p.attr("id")
    'hello'
    >>> p.attr("id", "plop")
    []
Would more typically be:
    >>> p.attrib['id']
    'hello'
    >>> p.attrib['id'] = 'plop'
Javascript just doesn't have anything like __getitem__/__setitem__, and
doesn't really have getters and setters (at least on many browsers), so
it has to use functions to get and set values. Also note that you don't
allow things like p.attr('id', None), which should be valid (probably
meaning an attribute deletion).
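To make the deletion case concrete, here is a minimal sketch of a function-style accessor layered on the attrib mapping, where passing None removes the attribute. The attr() helper and the _UNSET sentinel are hypothetical names, not pyquery's or lxml's API, and the standard library's xml.etree.ElementTree stands in for lxml on the assumption that its attrib mapping behaves the same way for these operations.

```python
import xml.etree.ElementTree as ET

# Sentinel so that "no value passed" can be told apart from an explicit None.
_UNSET = object()


def attr(element, name, value=_UNSET):
    """Hypothetical jQuery-style accessor built on the attrib mapping."""
    if value is _UNSET:
        # One argument: act as a getter (None if the attribute is absent).
        return element.attrib.get(name)
    if value is None:
        # Explicit None: treat it as a request to delete the attribute.
        element.attrib.pop(name, None)
    else:
        # Any other value: act as a setter.
        element.attrib[name] = value
    return element


p = ET.fromstring('<p id="hello">text</p>')
first = attr(p, "id")          # read the original value
attr(p, "id", "plop")          # overwrite it
second = p.attrib["id"]        # read it back through the mapping
attr(p, "id", None)            # delete it
deleted = "id" not in p.attrib
```

The sentinel is the design choice that matters here: with a plain value=None default, a getter call and a deletion request would be indistinguishable.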
Of course if you have CSS patches to CSSSelect (e.g., for :first --
though I thought that worked?) it would be good to have them in lxml
directly. Or if there are patches to make it easier to subclass
CSSSelector, that'd be fine too -- there's a number of useful extensions
to selectors in jQuery (e.g., input:checkbox), but it'd be nice to keep
CSSSelect itself more strictly CSS 3. The $() constructor is also
overloaded to do a lot more than selection, but that's kind of out of
style for Python -- alternate class methods would be preferable.
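As a rough illustration of the "alternate class methods" idea, the sketch below gives each way of building an element set its own named constructor instead of overloading one entry point. ElementSet, from_string and from_path are hypothetical names chosen for this example, and xml.etree.ElementTree with its limited path syntax stands in for lxml and real CSS selection.

```python
import xml.etree.ElementTree as ET


class ElementSet:
    """Hypothetical wrapper around a list of elements."""

    def __init__(self, elements):
        # The plain constructor only wraps elements it is given.
        self.elements = list(elements)

    @classmethod
    def from_string(cls, markup):
        # Alternate constructor: parse markup and wrap the root element.
        return cls([ET.fromstring(markup)])

    @classmethod
    def from_path(cls, root, path):
        # Alternate constructor: select descendants of root with an
        # ElementTree path expression (a stand-in for a CSS selector).
        return cls(root.findall(path))

    def __len__(self):
        return len(self.elements)


doc = ElementSet.from_string('<div><p id="a"/><p id="b"/></div>')
paragraphs = ElementSet.from_path(doc.elements[0], ".//p")
ids = [el.get("id") for el in paragraphs.elements]
```

Each call site then says what kind of input it is passing, rather than leaving the constructor to guess from the argument's type.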
You also seem to be using lxml.etree in places where lxml.html would
definitely be better. E.g., for setting .html:
    children = lxml.html.fragments_fromstring(html)
    if children and isinstance(children[0], basestring):
        parent.text = children.pop(0)
    else:
        parent.text = None
    parent[:] = children
Also, to get the HTML contents, use
(parent.text or '') + ''.join(tostring(el) for el in parent).
I'm sure there are several other things.
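The getter expression works because serializing a child also emits its tail text, so the parent's leading text plus each serialized child reproduces the full inner markup. A small sketch of that, assuming the standard library's xml.etree.ElementTree as a stand-in (lxml's tostring also includes the tail by default); inner_markup is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def inner_markup(parent):
    """Return the parent's leading text followed by each serialized child."""
    # tostring() on a child includes the child's tail text, so text that
    # follows a child element is not lost.
    return (parent.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in parent
    )


div = ET.fromstring("<div>Hello <b>bold</b> and <i>italic</i> text</div>")
result = inner_markup(div)

empty = ET.fromstring("<div/>")
empty_result = inner_markup(empty)  # no text and no children
```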
--
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org