[lxml-dev] pyquery

Olivier Lauzanne olauzanne at gmail.com
Fri Dec 5 16:27:26 CET 2008


I released pyquery 0.2 with a much more complete API.
http://pypi.python.org/pypi/pyquery

On Wed, Dec 3, 2008 at 7:40 PM, Ian Bicking <ianb at colorstudy.com> wrote:

> Olivier Lauzanne wrote:
>
>>
>>
>> On Mon, Dec 1, 2008 at 8:32 PM, Ian Bicking <ianb at colorstudy.com> wrote:
>>
>>    Olivier Lauzanne wrote:
>>
>>        Hello,
>>
>>        First, thanks for lxml; it's great.
>>        But I miss an interface on top of it, something like jQuery
>>        <http://jquery.com> or hpricot
>>        <http://code.whytheluckystiff.net/hpricot/>.
>>
>>        Is there any work in progress toward something like that
>>        in Python?
>>
>>        Since Python lacks a jQuery-like API, I started reproducing
>>        the jQuery API in Python using lxml and released it a few days
>>        ago: pyquery <http://pypi.python.org/pypi/pyquery>
>>
>>
>>    Some of this overlaps with what lxml.html already does, and some
>>    would already be appropriate there.  jQuery is a bit unusual in a
>>    Python context, because it only deals with sets of elements.  But
>>    it's not unreasonable.
>>
>>
>> In lxml.html, it seems there is very specific code for each HTML tag. I
>> think the CSS query approach is more powerful and simpler, and it can
>> provide a similar enough API: instead of p.inputs you just do
>> p('input').
>>
>
> In most cases there's something distinct about those attributes.  For
> instance, p.inputs gives you special form fields.  Of course
> p.cssselect('input,select,textarea') also works (and if you don't mind a
> honking long XPath query you could do that too).
>
>> Dealing with sets of elements is something that I came to love about
>> jQuery, and I don't think it's actually unpythonic in any way. It's just a
>> different approach, just as indexing a string gives you back a string
>> and not a character.
>>
>
> Well... it is unpythonic in that sets and items are treated differently in
> Python (except the oddball case of strings, as you mention).  It's more a
> question of whether it is justifiably unpythonic... and I'm not disputing
> that it can be.
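Olivier's point about sets can be made concrete with a small wrapper. This is a minimal sketch, not pyquery's implementation: Query is a hypothetical class, and because the standard library has no CSS selector engine, ElementTree's ElementPath expressions (e.g. './/li') stand in for CSS selectors. Selecting and indexing both return another Query, the way indexing a str returns a str.

```python
import xml.etree.ElementTree as ET


class Query:
    """Hypothetical jQuery-style wrapper: operations return another set."""

    def __init__(self, elements):
        self.elements = list(elements)

    def __call__(self, path):
        # Collect matches of an ElementPath expression (e.g. './/li') under
        # every element in the set; this stands in for a CSS selector.
        found = []
        for el in self.elements:
            found.extend(el.findall(path))
        return Query(found)

    def __len__(self):
        return len(self.elements)

    def __getitem__(self, index):
        # Like str indexing: an item of the set is itself a set (of one element).
        if isinstance(index, slice):
            return Query(self.elements[index])
        return Query([self.elements[index]])

    def text(self):
        # Combined text of all matched elements and their descendants.
        return ''.join(''.join(el.itertext()) for el in self.elements)


doc = Query([ET.fromstring('<ul id="menu"><li>a</li><li>b</li><li>c</li></ul>')])
items = doc('.//li')
print(len(items))       # 3
print(items.text())     # abc
print(items[1].text())  # b
```

The trade-off Ian raises is visible here: there is no way to get a bare element back from indexing, only a one-element set.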
>
>>    Some things in jQuery are a result of Javascript, where the
>>    equivalent in Python would use a different syntax.  For instance:
>>
>>     >>> p.attr("id")
>>     'hello'
>>     >>> p.attr("id", "plop")
>>     []
>>
>>    Would more typically be:
>>
>>     >>> p.attrib['id']
>>     'hello'
>>     >>> p.attrib['id'] = 'plop'
>>
>>    Javascript just doesn't have anything like __getitem__/__setitem__,
>>    and doesn't really have getters and setters (at least on many
>>    browsers) so it also has to use functions to get and set values.
>>    Also note you don't allow things like p.attr('id', None), which
>>    should be valid (probably meaning an attribute deletion).
>>
>>
>> attr('id', None) doesn't work, but it doesn't work in jQuery either;
>> jQuery has a separate method, removeAttr, for that purpose.
>>
>
> Well, it would be easy to make it work, just don't use None as your
> sentinel.
>

It works in the 0.2 version that I just released.
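Ian's suggestion can be sketched with a private sentinel object as the "no value given" default, which leaves None free to mean "delete this attribute". This is a minimal illustration on the standard library's ElementTree, not the code released in pyquery 0.2; attr and _MISSING are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Private sentinel (hypothetical name): unlike None, no caller passes it by
# accident, so None is free to mean "delete the attribute".
_MISSING = object()


def attr(element, name, value=_MISSING):
    """Hypothetical helper: get, set, or (with None) delete an attribute."""
    if value is _MISSING:
        return element.get(name)          # getter: attr(el, 'id')
    if value is None:
        element.attrib.pop(name, None)    # deletion: attr(el, 'id', None)
    else:
        element.set(name, value)          # setter: attr(el, 'id', 'plop')
    return element


p = ET.fromstring('<p id="hello">text</p>')
print(attr(p, 'id'))       # hello
attr(p, 'id', 'plop')
print(attr(p, 'id'))       # plop
attr(p, 'id', None)
print('id' in p.attrib)    # False
```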


>
>
>> You're right, jQuery isn't always perfectly Pythonic: it doesn't use
>> setters, and its method names use camelCase, which isn't Pythonic
>> and which I don't like. But it is object-oriented (very much so) and allows
>> chained method calls, calling method after method on the same object,
>> which you can't do with a Python setter. Also, jQuery lacks a method to
>> get the full HTML string of a tag (you can only access innerHTML),
>> which sucks.
>>
>
> There's a very small (4-line?) outerHtml plugin for jquery, BTW.
>

Cool.
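For the outerHTML point, one detail matters in Python: serializing an element also emits its tail text (the text after its closing tag). A minimal sketch using the standard library's ElementTree; outer_html is a hypothetical helper, not part of jQuery or pyquery. lxml's tostring() offers a with_tail=False argument for the same purpose.

```python
import copy
import xml.etree.ElementTree as ET


def outer_html(el):
    # ElementTree's tostring() writes el.tail after the closing tag,
    # so serialize a shallow copy whose tail has been cleared instead.
    clone = copy.copy(el)
    clone.tail = None
    return ET.tostring(clone, encoding='unicode')


root = ET.fromstring('<div><p class="x">Hi <b>there</b></p> after</div>')
p = root[0]
print(ET.tostring(p, encoding='unicode'))  # <p class="x">Hi <b>there</b></p> after
print(outer_html(p))                       # <p class="x">Hi <b>there</b></p>
```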


>
>
>> On the other hand, it has the advantage of being a simple, well-known,
>> widely used and documented API, so replicating it seemed worthwhile in
>> itself. Reproducing the jQuery API also makes it trivial to move
>> functionality in a web application from server to client, or from client
>> to server. If people started using it and there was a consensus that it
>> should change, that could always be done later. I'm open to a better API
>> if you have one in mind, but it would have to be significantly better to
>> make up for not using a well-known API.
>>
>
> I think there are arguably places where setters and getters are just
> simpler and look nicer.  I guess I see the jQuery technique for these
> specifically as a way of turning a deficiency in Javascript (lack of getters
> and setters) into an advantage (chaining)... but I'm not sure it's enough of
> an advantage to make it worth it.
>
> For instance, el.html and el.html = '...' seems nicer to me than el.html()
> and el.html('...'), and all you lose is the ability to do something like
> el.html('...').attr('foo', 'bar'), and that doesn't seem like such a big
> thing.
>

You're right. But I still think that compatibility with a known API is
valuable.
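The trade-off Ian describes can be shown side by side. A minimal sketch, assuming a hypothetical Node wrapper over a standard-library ElementTree element: html is a Python property (get and set), while set_html() and attr() return self so calls can be chained. The setter parses fragments as XML, so it only accepts well-formed markup; a real implementation would use an HTML parser such as lxml.html.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical wrapper contrasting a property with jQuery-style chaining."""

    def __init__(self, element):
        self.element = element

    @property
    def html(self):
        # Pythonic getter: el.html returns the inner markup.
        # Each child's tostring() output already includes its tail text.
        el = self.element
        return (el.text or '') + ''.join(
            ET.tostring(child, encoding='unicode') for child in el)

    @html.setter
    def html(self, markup):
        # Pythonic setter: el.html = '...'. Parses the fragment inside a
        # dummy wrapper, so it assumes well-formed XML, not loose HTML.
        wrapper = ET.fromstring('<wrapper>' + markup + '</wrapper>')
        self.element.text = wrapper.text
        self.element[:] = list(wrapper)

    def set_html(self, markup):
        # jQuery-style setter: returns self so calls can be chained.
        self.html = markup
        return self

    def attr(self, name, value):
        self.element.set(name, value)
        return self


node = Node(ET.fromstring('<p id="hello">old</p>'))
node.html = 'new <b>text</b>!'                 # property style
print(node.html)                               # new <b>text</b>!
node.set_html('chained').attr('foo', 'bar')    # chaining style
print(node.html, node.element.get('foo'))      # chained bar
```

Both styles can coexist on one object; the property reads more naturally, and the method form is only needed when chaining.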


>
> Also there are two APIs: jQuery and lxml.  There's some advantage to reusing
> the lxml APIs as well, I think, so that for instance el.attrib and
> el.get().attrib are the same.  (I'm not sure you actually implemented
> .get()?)
>

No, this get() is not implemented yet. It seems to exist in jQuery only for
backward compatibility: http://docs.jquery.com/Core/get


>
> It might be good, or it might be sloppy, to actually support both APIs to
> the degree they don't overlap (e.g., .attr vs. .attrib).
>

Gael Pasgrimaud <http://www.bitbucket.org/gawel> started contributing to
pyquery (and he contributed a lot!), and he created a more Pythonic API for
the attributes alongside the jQuery one.
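Supporting both APIs where they don't overlap might look like the sketch below. It rests on assumptions and says nothing about how Gael's actual API is designed: Query is a hypothetical class, .attrib mirrors lxml's el.attrib by returning the first element's attribute dict, and attr() follows jQuery by reading from the first element, writing to every element, and returning the set for chaining.

```python
import xml.etree.ElementTree as ET


class Query:
    """Hypothetical set wrapper offering both attribute APIs."""

    def __init__(self, elements):
        self.elements = list(elements)

    # lxml style: q.attrib['id'] behaves like el.attrib['id'] on the first
    # element (raises IndexError on an empty set).
    @property
    def attrib(self):
        return self.elements[0].attrib

    # jQuery style: attr(name) reads from the first element;
    # attr(name, value) writes to every element and returns self.
    def attr(self, name, *value):
        if not value:
            return self.elements[0].get(name) if self.elements else None
        for el in self.elements:
            el.set(name, value[0])
        return self


root = ET.fromstring('<div><a id="x"/><a/></div>')
links = Query(root.findall('a'))
links.attr('class', 'ext').attr('rel', 'nofollow')  # chained, applied to both
print(links.attrib['id'])                  # x
print([a.get('class') for a in root])      # ['ext', 'ext']
```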


>
>>    Of course if you have CSS patches to CSSSelect (e.g., for :first --
>>    though I thought that worked?) it would be good to have them in lxml
>>    directly.  Or if there are patches to make it easier to subclass
>>    CSSSelector, that'd be fine too -- there's a number of useful
>>    extensions to selectors in jQuery (e.g., input:checkbox), but it'd
>>    be nice to keep CSSSelect itself more strictly CSS 3.  The $()
>>    constructor is also overloaded to do a lot more than selection, but
>>    that's kind of out of style for Python -- alternate class methods
>>    would be preferable.
>>
>>
>> I don't have patches yet, but I have seen where they could go. I was
>> planning on monkey-patching; I fully agree that CSSSelect should remain
>> standards-compliant. I'll check whether I can do something cleaner than
>> monkey-patching.
>>
>
> Probably some of the functions would have to turn into methods of a class,
> and then you'd subclass that to add custom selectors and XPath translations
> of those selectors.
>

I haven't had time for it yet, but I'll look into it.
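Ian's idea of turning functions into methods, so that a subclass can add selectors and their XPath translations, could be structured roughly as follows. This is a sketch with hypothetical class and method names, not lxml's CSSSelector internals, and it only handles a bare "tag:pseudo" selector: each pseudo-class maps to a method returning an XPath condition, the base class stays CSS 3, and the jQuery-only :checkbox lives in a subclass.

```python
class Translator:
    """Hypothetical CSS-to-XPath translator, one method per pseudo-class."""

    def pseudo_to_xpath(self, pseudo):
        # Dispatch ':first-child' to xpath_pseudo_first_child, and so on.
        method = getattr(self, 'xpath_pseudo_' + pseudo.replace('-', '_'), None)
        if method is None:
            raise ValueError('unsupported pseudo-class :%s' % pseudo)
        return method()

    def translate(self, tag, pseudo=None):
        # Only "tag" or "tag:pseudo"; a real translator parses full selectors.
        xpath = 'descendant-or-self::' + tag
        if pseudo:
            xpath += '[%s]' % self.pseudo_to_xpath(pseudo)
        return xpath

    # Standard CSS 3 pseudo-classes stay in the base class.
    def xpath_pseudo_first_child(self):
        return 'count(preceding-sibling::*) = 0'


class JQueryTranslator(Translator):
    """jQuery-only extensions live in a subclass."""

    def xpath_pseudo_checkbox(self):
        return "@type = 'checkbox'"


print(JQueryTranslator().translate('input', 'checkbox'))
# descendant-or-self::input[@type = 'checkbox']
print(Translator().translate('li', 'first-child'))
# descendant-or-self::li[count(preceding-sibling::*) = 0]
```

Asking the base Translator for :checkbox raises ValueError, which keeps the standards-only behaviour explicit.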


>
>
>>    You also seem to be using lxml.etree in places where lxml.html would
>>    definitely be better.  E.g., for setting .html:
>>
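>>    import lxml.html
>>
>>    def set_inner_html(parent, html):  # function name is illustrative
>>        # Leading text, if any, comes back as a plain string.
>>        children = lxml.html.fragments_fromstring(html)
>>        if children and isinstance(children[0], str):  # basestring on Python 2
>>            parent.text = children.pop(0)
>>        else:
>>            parent.text = None
>>        parent[:] = children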
>>
>>    Also, to get the HTML contents: (parent.text or '') +
>>    ''.join(tostring(el, encoding='unicode') for el in parent).  I'm sure
>>    there are several other things.
>>
>>
>> Thanks for the info; I'll look into it. pyquery was how I learned lxml,
>> so I may have overlooked some more things.
>>
>> Also, jQuery hacks are common practice when working on complex
>> applications: you can't understand the logic of the application (or just
>> don't want to modify it), so you hack the modification into another layer
>> on top of the application. That layer can be Javascript, but I think it's
>> roughly the same idea that Deliverance uses. I would like to have a WSGI
>> application where I could do quick hacks like that on the server side,
>> maybe in Deliverance or in its own WSGI middleware. What do you think?
>>
>
> Yeah, that could be possible -- people have asked for the ability to do
> arbitrary code-based transitions in Deliverance -- for the reasons you
> describe, like not wanting to touch the underlying application -- and this
> would probably be a very comfortable technique for people, especially if
> they are more front-end oriented.  For instance, people have asked for the
> ability to do something that I guess would be expressed as
> doc('ul#menu li').prepend('&gt;'), when they want some kind of text
> separators in a list.
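The server-side hack layer discussed here could take the form of WSGI middleware that rewrites HTML responses. A minimal sketch, assuming UTF-8 bodies and a buffered (non-streaming) response; html_transform_middleware is a hypothetical name, and the plain string replacement below only stands in for a real DOM edit such as the prepend call Ian mentions.

```python
def html_transform_middleware(app, transform):
    """Hypothetical WSGI middleware: pass text/html bodies through transform(str) -> str."""

    def wrapper(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold back the real start_response until the body is rewritten.
            captured['status'], captured['headers'] = status, headers
            return lambda data: None  # legacy write() callable, ignored here

        result = app(environ, capture)
        try:
            body = b''.join(result)  # buffer the whole response
        finally:
            if hasattr(result, 'close'):
                result.close()

        status, headers = captured['status'], captured['headers']
        content_type = dict((k.lower(), v) for k, v in headers).get('content-type', '')
        if content_type.startswith('text/html'):
            body = transform(body.decode('utf-8')).encode('utf-8')  # assumes UTF-8
            headers = [(k, v) for k, v in headers if k.lower() != 'content-length']
            headers.append(('Content-Length', str(len(body))))
        start_response(status, headers)
        return [body]

    return wrapper


def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html; charset=utf-8')])
    return [b'<ul id="menu"><li>Home</li><li>About</li></ul>']


def add_separators(html):
    # Stand-in for a DOM-level edit: naive string replacement, for illustration only.
    return html.replace('<li>', '<li>&gt;')


wrapped = html_transform_middleware(app, add_separators)
responses = []
body = b''.join(wrapped({}, lambda status, headers: responses.append((status, headers))))
print(body.decode())  # <ul id="menu"><li>&gt;Home</li><li>&gt;About</li></ul>
```

Buffering keeps the sketch simple at the cost of streaming; Content-Length is recomputed because the rewrite changes the body size.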
>

Gael also created an api for getting urls from wsgi applications so I think
pyquery is getting really close to something that is actually usable :)

-
Olivier Lauzanne