[lxml-dev] pyquery
Olivier Lauzanne
olauzanne at gmail.com
Tue Dec 2 14:23:30 CET 2008
On Mon, Dec 1, 2008 at 8:32 PM, Ian Bicking <ianb at colorstudy.com> wrote:
> Olivier Lauzanne wrote:
>
>> Hello,
>>
>> First thanks for lxml it's great.
>> But I miss an interface on top of it. Something like jquery
>> <http://jquery.com> or hpricot
>> <http://code.whytheluckystiff.net/hpricot/>.
>> Is there any work in progress to go toward something like that in python ?
>>
>> Missing a jquery like API in python, I started reproducing the jquery API
>> in python by using lxml and released it a few days ago: pyquery
>> <http://pypi.python.org/pypi/pyquery>
>>
>
> Some of this overlaps with what lxml.html already does, and some would
> already be appropriate there. jQuery is a bit unusual in a Python context,
> because it only deals with sets of elements. But it's not unreasonable.
>
In lxml.html there seems to be very specific code for each html tag. I
think the css query approach is more powerful and simpler, and it can
provide a similar enough api: instead of doing p.inputs you just do
p('input').
Dealing with sets of elements is something that I came to love about jquery,
and I don't think it's actually unpythonic in any way. It's just a different
approach, much like indexing a string gives you a string back and not a
character.
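
To make the comparison concrete, here is a minimal sketch, not pyquery's actual code. It uses the standard library's xml.etree.ElementTree (whose element API lxml.etree mirrors) so that it runs without lxml. The ElementSet class is hypothetical, and its tag-name lookup is only a stand-in for real css selection.

```python
import xml.etree.ElementTree as ET

# Indexing a string gives back a string, not a separate character type.
assert isinstance("hello"[0], str)


class ElementSet(list):
    """Hypothetical wrapper: a list of elements that can be queried again."""

    def __call__(self, tag):
        # Tag-name lookup only; a real implementation would take a css selector.
        found = []
        for el in self:
            found.extend(el.iter(tag))
        return ElementSet(found)


form = ET.fromstring(
    '<form><p><input name="a"/></p><input name="b"/><select name="c"/></form>'
)
p = ElementSet([form])
inputs = p("input")  # instead of a tag-specific p.inputs
names = [el.get("name") for el in inputs]
```

Querying a set returns another set, so the result can be queried again in the same way.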
>
> Some things in jQuery are a result of Javascript, where the equivalent in
> Python would use a different syntax. For instance:
>
> >>> p.attr("id")
> 'hello'
> >>> p.attr("id", "plop")
> []
>
> Would more typically be:
>
> >>> p.attrib['id']
> 'hello'
> >>> p.attrib['id'] = 'plop'
>
> Javascript just doesn't have anything like __getitem__/__setitem__, and
> doesn't really have getters and setters (at least on many browsers) so it
> also has to use functions to get and set values. Also note you don't allow
> things like p.attr('id', None), which should be valid (probably meaning an
> attribute deletion).
>
attr('id', None) doesn't work, but it doesn't work in jquery either; there
is actually a method called removeAttr for that purpose.
You're right, jquery isn't always perfectly pythonic: it doesn't use
setters, and its method names use camelCase, which isn't pythonic and
which I don't like. But it is object oriented (very much so) and allows
"chained" method application, calling method after method on the same
object, which you can't do with a python setter. Also jquery lacks a
method to access the full html string of a tag (you can only access
innerHTML), which sucks.
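
A minimal sketch of what I mean by calling method after method, again on the standard library's xml.etree.ElementTree rather than lxml. The Wrapped class and its method names are hypothetical and are not pyquery's implementation. An assignment such as p.attrib['id'] = 'plop' is a statement, so nothing can follow it on the same line, whereas a method that returns the wrapper can be followed by another call.

```python
import xml.etree.ElementTree as ET


class Wrapped:
    """Hypothetical jquery-style wrapper around a single element."""

    def __init__(self, element):
        self.element = element

    def attr(self, name, value=None):
        if value is None:
            return self.element.get(name)  # getter form
        self.element.set(name, value)      # setter form
        return self                        # returning self allows chaining

    def remove_attr(self, name):
        # Separate method for deletion, mirroring jquery's removeAttr.
        self.element.attrib.pop(name, None)
        return self

    def outer_html(self):
        # Full markup of the tag, not just its inner content.
        return ET.tostring(self.element, encoding="unicode")


p = Wrapped(ET.fromstring('<p id="hello" class="x">text</p>'))
result = p.attr("id", "plop").remove_attr("class").outer_html()
```

Because value=None selects the getter form here, attr('id', None) cannot mean deletion in this sketch either, which is why deletion has its own method.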
On the other hand it has the advantage of being a simple, well known,
widely used and documented API, so it felt like it would already be good
to replicate it. Reproducing the jquery API also makes it trivial to move
a piece of functionality in a web application from server to client, or
client to server. If people start using it and there is a consensus that
it should be changed, that can always be done then. I'm open to it if you
have a vision of a better API, but it would have to be significantly
better to compensate for not using a well known one.
>
> Of course if you have CSS patches to CSSSelect (e.g., for :first -- though
> I thought that worked?) it would be good to have them in lxml directly. Or
> if there are patches to make it easier to subclass CSSSelector, that'd be
> fine too -- there's a number of useful extensions to selectors in jQuery
> (e.g., input:checkbox), but it'd be nice to keep CSSSelect itself more
> strictly CSS 3. The $() constructor is also overloaded to do a lot more
> than selection, but that's kind of out of style for Python -- alternate
> class methods would be preferable.
>
I don't have patches yet, but I have seen where they can be done. I was
planning on monkey-patching; I fully agree that CSSSelect should remain
standards compliant. I'll check if I can do something cleaner than
monkey-patching.
>
> You also seem to be using lxml.etree in places where lxml.html would
> definitely be better. E.g., for setting .html:
>
> children = lxml.html.fragments_fromstring(html)
> if children and isinstance(children[0], basestring):
>     parent.text = children.pop(0)
> else:
>     parent.text = None
> parent[:] = children
>
> Also to get the HTML contents, (parent.text or '')+''.join(tostring(el) for
> el in parent). I'm sure there's several other things.
>
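For reference, the getter expression quoted above can be sketched as follows. This is an illustration on the standard library's xml.etree.ElementTree, not lxml itself, and inner_html is a hypothetical helper name. It relies on tostring() serializing each child together with its tail text.

```python
import xml.etree.ElementTree as ET


def inner_html(parent):
    """Leading text of the parent plus every child serialized with its tail."""
    return (parent.text or "") + "".join(
        ET.tostring(el, encoding="unicode") for el in parent
    )


div = ET.fromstring("<div>Hello <b>big</b> world <i>again</i>!</div>")
html = inner_html(div)
```
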
Thanks for the info, I'll look into it. pyquery was the occasion for me to
learn lxml, so I may have overlooked some more things.
Also, jquery hacks are a common practice when working on complex
applications: when you can't understand the logic of the application (or
just don't want to modify it), you hack the modification into another layer
on top of it. This layer can be javascript, but I think it's the same kind
of idea that is used in deliverance. I would like to have a wsgi
application where I could do some quick hacks like that on the server side,
maybe in deliverance or in its own wsgi middleware. What do you think?
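
A rough sketch of the middleware idea, using only the wsgi calling convention and no particular framework. The names html_hack_middleware, demo_app and retitle are hypothetical, and the byte replacement stands in for a jquery-style edit of the parsed document.

```python
def html_hack_middleware(app, transform):
    """Wrap a wsgi app and pass each html response body through transform()."""

    def wrapped(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the status and headers until the body has been rewritten.
            captured["status"] = status
            captured["headers"] = headers
            # Legacy write() callable: data written through it is dropped,
            # which is a known limitation of this sketch.
            return lambda data: None

        result = app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        content_type = dict((k.lower(), v) for k, v in headers).get(
            "content-type", ""
        )
        if content_type.startswith("text/html"):
            body = transform(body)

        # The body length may have changed, so recompute Content-Length.
        headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]

    return wrapped


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body><h1>Old title</h1></body></html>"]


def retitle(body):
    # Stand-in for a jquery-style modification of the page.
    return body.replace(b"Old title", b"New title")


app = html_hack_middleware(demo_app, retitle)
```

The wrapped application is left untouched; the hack lives entirely in the transform function, which is the kind of separate layer I have in mind.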
Thanks for your answer,
Olivier Lauzanne