[lxml-dev] pyquery

Ian Bicking ianb at colorstudy.com
Mon Dec 1 20:32:06 CET 2008


Olivier Lauzanne wrote:
> Hello,
> 
> First thanks for lxml it's great.
> But I miss an interface on top of it. Something like jquery 
> <http://jquery.com> or hpricot <http://code.whytheluckystiff.net/hpricot/>.
> Is there any work in progress to go toward something like that in python ?
> 
> Missing a jquery like API in python, I started reproducing the jquery 
> API in python by using lxml and released it a few days ago : pyquery 
> <http://pypi.python.org/pypi/pyquery>

Some of this overlaps with what lxml.html already does, and some of the 
rest would be appropriate to add there.  jQuery is a bit unusual in a 
Python context, because it only deals with sets of elements.  But it's 
not unreasonable.

Some things in jQuery are a result of Javascript, where the equivalent 
in Python would use a different syntax.  For instance:

   >>> p.attr("id")
   'hello'
   >>> p.attr("id", "plop")
   []

Would more typically be:

   >>> p.attrib['id']
   'hello'
   >>> p.attrib['id'] = 'plop'

Javascript just doesn't have anything like __getitem__/__setitem__, and 
doesn't really have getters and setters (at least on many browsers), so 
it has to use functions to get and set values.  Also note that you don't 
allow things like p.attr('id', None), which should be valid (probably 
meaning an attribute deletion).
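
As a rough sketch of how a function-style accessor could still support 
deletion, assuming a sentinel default to tell "no value passed" apart 
from None.  The names attr and _MISSING are made up for illustration and 
are not pyquery's API; the example uses the standard library's 
xml.etree.ElementTree so it runs without lxml, whose elements offer 
.attrib in the same way:

```python
import xml.etree.ElementTree as ET

# Sentinel: lets attr() distinguish "no value passed" from an explicit None.
_MISSING = object()


def attr(el, name, value=_MISSING):
    """Hypothetical jQuery-style accessor layered over el.attrib."""
    if value is _MISSING:
        return el.attrib.get(name)   # getter: None if the attribute is absent
    if value is None:
        el.attrib.pop(name, None)    # explicit None means "delete"
    else:
        el.attrib[name] = value      # setter
    return el


p = ET.fromstring('<p id="hello">text</p>')
current = attr(p, 'id')              # read
attr(p, 'id', 'plop')                # write
renamed = p.attrib['id']
attr(p, 'id', None)                  # delete
```

The plain mapping form (p.attrib['id'], del p.attrib['id']) does the 
same work without any wrapper, which is the more usual Python spelling.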

Of course if you have CSS patches to CSSSelect (e.g., for :first -- 
though I thought that worked?) it would be good to have them in lxml 
directly.  Or if there are patches to make it easier to subclass 
CSSSelector, that'd be fine too -- there are a number of useful 
extensions to selectors in jQuery (e.g., input:checkbox), but it'd be 
nice to keep CSSSelect itself more strictly CSS 3.  The $() constructor 
is also overloaded to do a lot more than selection, but that's not 
really Python style -- alternate class methods would be preferable.
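
To illustrate what I mean by alternate class methods, here is a minimal 
sketch.  ElementSet, from_string and from_elements are hypothetical 
names, not anything in pyquery or lxml, and find() uses ElementTree 
paths from the standard library rather than CSS selectors:

```python
import xml.etree.ElementTree as ET


class ElementSet(list):
    """Hypothetical stand-in for a pyquery-like set of elements."""

    @classmethod
    def from_string(cls, markup):
        # Parse markup and wrap its root element.
        return cls([ET.fromstring(markup)])

    @classmethod
    def from_elements(cls, elements):
        # Wrap elements that already exist.
        return cls(elements)

    def find(self, path):
        # Collect matches under every element in the set, in document order.
        return type(self)(m for el in self for m in el.findall(path))


doc = ElementSet.from_string('<div><p id="a"/><p id="b"/></div>')
paragraphs = doc.find('p')
ids = [p.get('id') for p in paragraphs]
```

Each constructor does one thing, so the caller never has to guess how a 
given argument will be interpreted.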

You also seem to be using lxml.etree in places where lxml.html would 
definitely be better.  E.g., for setting .html:

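import lxml.html

# fragments_fromstring() returns a list of elements; leading text, if
# any, comes back as a plain string in the first slot.
children = lxml.html.fragments_fromstring(html)
if children and isinstance(children[0], str):  # basestring on Python 2
    parent.text = children.pop(0)
else:
    parent.text = None
parent[:] = children  # replace all existing child elements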

Also, to get the HTML contents, use (parent.text or '') + 
''.join(tostring(el) for el in parent), with tostring taken from 
lxml.html; it includes each element's tail text by default.  I'm sure 
there are several other things.


-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

