[lxml-dev] pyquery
Olivier Lauzanne
olauzanne at gmail.com
Fri Dec 5 16:27:26 CET 2008
I released pyquery 0.2 with a much more complete API.
http://pypi.python.org/pypi/pyquery
On Wed, Dec 3, 2008 at 7:40 PM, Ian Bicking <ianb at colorstudy.com> wrote:
> Olivier Lauzanne wrote:
>
>>
>>
>> On Mon, Dec 1, 2008 at 8:32 PM, Ian Bicking <ianb at colorstudy.com> wrote:
>>
>> Olivier Lauzanne wrote:
>>
>> Hello,
>>
>> First, thanks for lxml, it's great.
>> But I miss an interface on top of it. Something like jquery
>> <http://jquery.com> or hpricot
>> <http://code.whytheluckystiff.net/hpricot/>.
>>
>> Is there any work in progress toward something like that
>> in Python?
>>
>> Missing a jquery like API in python, I started reproducing the
>> jquery API in python by using lxml and released it a few days
>> ago: pyquery <http://pypi.python.org/pypi/pyquery>
>>
>>
>> Some of this overlaps with what lxml.html already does, and some
>> would already be appropriate there. jQuery is a bit unusual in a
>> Python context, because it only deals with sets of elements. But
>> it's not unreasonable.
>>
>>
>> In lxml.html, it seems there is very specific code for each html tag. I
>> think the css query approach is more powerful and simpler. And it can
>> provide a similar enough api. Instead of doing p.inputs you just do
>> p('input').
>>
>
> In most cases there's something distinct about those attributes. For
> instance p.inputs gives you special form fields. Of course
> p.cssselect('input,select,textarea') also works (and if you don't mind a
> honking long XPath query you could do that too).
>
>> Dealing with sets of elements is something that I came to love about
>> jquery. And I don't think it's actually unpythonic in any way. It's just a
>> different approach. It's just like getting an element of a string gives you
>> a string back and not a character.
>>
>
> Well... it is unpythonic in that sets and items are treated differently in
> Python (except the oddball case of strings, as you mention). It's more a
> question of whether it is justifiably unpythonic... and I'm not disputing
> that it can be.
>
>> Some things in jQuery are a result of Javascript, where the
>> equivalent in Python would use a different syntax. For instance:
>>
>> >>> p.attr("id")
>> 'hello'
>> >>> p.attr("id", "plop")
>> []
>>
>> Would more typically be:
>>
>> >>> p.attrib['id']
>> 'hello'
>> >>> p.attrib['id'] = 'plop'
>>
>> Javascript just doesn't have anything like __getitem__/__setitem__,
>> and doesn't really have getters and setters (at least on many
>> browsers) so it also has to use functions to get and set values.
>> Also note you don't allow things like p.attr('id', None), which
>> should be valid (probably meaning an attribute deletion).
>>
>>
>> attr('id', None) doesn't work, but it doesn't work in jquery either; there
>> is actually a method called removeAttr for that purpose.
>>
>
> Well, it would be easy to make it work, just don't use None as your
> sentinel.
>
It works in the 0.2 version that I just released.
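
A minimal sketch of the sentinel idea discussed above, not pyquery's actual code. It uses the standard library's xml.etree.ElementTree instead of lxml so it runs anywhere, and the names `Query` and `_NO_VALUE` are hypothetical. A private sentinel object marks "no value passed", which frees None to mean "delete the attribute":

```python
import xml.etree.ElementTree as ET

# Hypothetical sentinel: distinguishes "no value argument" from an explicit None.
_NO_VALUE = object()


class Query(list):
    """Toy stand-in for a pyquery-like set of elements (an assumption, not the real class)."""

    def attr(self, name, value=_NO_VALUE):
        if value is _NO_VALUE:
            # Getter form: return the attribute of the first element, if any.
            return self[0].get(name) if self else None
        for el in self:
            if value is None:
                # An explicit None means "remove the attribute".
                el.attrib.pop(name, None)
            else:
                el.set(name, value)
        return self  # return the set itself so calls can be chained


p = Query([ET.fromstring('<p id="hello"/>')])
print(p.attr("id"))   # read the attribute
p.attr("id", "plop")  # set it
p.attr("id", None)    # delete it
```

With this layout, attr('id') and attr('id', None) are no longer ambiguous, because the default argument is an object that no caller can pass by accident.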
>
>
>> You're right, jquery isn't always perfectly pythonic, it doesn't use
>> setters, and method names use camelCase, which isn't pythonic
>> and which I don't like. But it is object oriented (very much so) and allows
>> "streamed" method application, calling method over method over method on the
>> same object, which you can't do if you use a python setter. Also jquery
>> misses a method to access the full html string of a tag (you can only access
>> innerHtml) which sucks.
>>
>
> There's a very small (4-line?) outerHtml plugin for jquery, BTW.
>
Cool.
>
>
>> On the other hand, it has the advantage of being a simple, well-known,
>> used and documented API. So it felt like it would already be good to
>> replicate it. Also reproducing the jquery API has the advantage of making it
>> trivial to move a functionality in a web application from server to client,
>> or client to server. And then if people started using it and if there was a
>> consensus that it should be changed it could always be done then. But I'm
>> open enough if you have a vision of a better API, but it would have to be a
>> significantly better API to compensate for the fact of not using a well
>> known API.
>>
>
> I think there are arguably places where setters and getters are just
> simpler and look nicer. I guess I see the jQuery technique for these
> specifically as a way of turning a deficiency in Javascript (lack of getters
> and setters) into an advantage (chaining)... but I'm not sure it's enough of
> an advantage to make it worth it.
>
> For instance, el.html and el.html = '...' seems nicer to me than el.html()
> and el.html('...'), and all you lose is the ability to do something like
> el.html('...').attr('foo', 'bar'), and that doesn't seem like such a big
> thing.
>
You're right. But I still think that being compatible with a known API is
good.
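
To make the trade-off concrete, here is a small sketch of the two styles side by side. It is an illustration only: the class names `MethodStyle` and `PropertyStyle` are hypothetical, it wraps a standard-library ElementTree element rather than an lxml one, and it uses `text` in place of `html` to stay short.

```python
import xml.etree.ElementTree as ET

_MISSING = object()  # hypothetical sentinel for "no argument given"


class MethodStyle:
    """jQuery-like wrapper: one method both gets and sets, and setters chain."""

    def __init__(self, el):
        self.el = el

    def text(self, value=_MISSING):
        if value is _MISSING:
            return self.el.text  # getter form: text()
        self.el.text = value     # setter form: text('...')
        return self              # returning self is what enables chaining

    def attr(self, name, value):
        self.el.set(name, value)
        return self


class PropertyStyle:
    """Python-like wrapper: plain attribute access, no chaining on assignment."""

    def __init__(self, el):
        self.el = el

    @property
    def text(self):
        return self.el.text

    @text.setter
    def text(self, value):
        self.el.text = value


m = MethodStyle(ET.Element("p"))
m.text("hi").attr("foo", "bar")  # one chained expression

q = PropertyStyle(ET.Element("p"))
q.text = "hi"                    # assignment is a statement, so it cannot chain
q.el.set("foo", "bar")
```

Both wrappers end up with the same element state; the difference is only whether the setter can appear in the middle of an expression.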
>
> Also there's two APIs: jQuery and lxml. There's some advantage to reusing
> the lxml APIs as well, I think, so that for instance el.attrib and
> el.get().attrib are the same. (I'm not sure you actually implemented
> .get()?)
>
No, get() is not implemented yet. It seems that it's in jQuery only for
backward compatibility: http://docs.jquery.com/Core/get
>
> It might be good, or it might be sloppy, to actually support both APIs to
> the degree they don't overlap (e.g., .attr vs. .attrib).
>
Gael Pasgrimaud <http://www.bitbucket.org/gawel> started contributing to
pyquery (and he has contributed a lot!) and he created a more pythonic API
for the attributes alongside the jQuery one.
>
>> Of course if you have CSS patches to CSSSelect (e.g., for :first --
>> though I thought that worked?) it would be good to have them in lxml
>> directly. Or if there are patches to make it easier to subclass
>> CSSSelector, that'd be fine too -- there's a number of useful
>> extensions to selectors in jQuery (e.g., input:checkbox), but it'd
>> be nice to keep CSSSelect itself more strictly CSS 3. The $()
>> constructor is also overloaded to do a lot more than selection, but
>> that's kind of out of style for Python -- alternate class methods
>> would be preferable.
>>
>>
>> I don't have patches yet, but I have seen where they can be made. I was
>> planning on monkey-patching. I fully agree that CSSSelect should remain
>> standards-compliant. I'll check if I can do something cleaner than
>> monkey-patching.
>>
>
> Probably some of the functions would have to turn into methods of a class,
> and then you'd subclass that to add custom selectors and XPath translations
> of those selectors.
>
I haven't had time for it yet, but I'll look into it.
>
>
>> You also seem to be using lxml.etree in places where lxml.html would
>> definitely be better. E.g., for setting .html:
>>
>>     children = lxml.html.fragments_fromstring(html)
>>     if children and isinstance(children[0], basestring):
>>         parent.text = children.pop(0)
>>     else:
>>         parent.text = None
>>     parent[:] = children
>>
>> Also to get the HTML contents, (parent.text or
>> '')+''.join(tostring(el) for el in parent). I'm sure there's
>> several other things.
>>
>>
>> Thanks for the info, I'll look into it. pyquery was the occasion for me to
>> learn lxml so I may have overlooked some more things.
>>
>> Also, jquery hacks are a common practice when working on complex
>> applications: you can't understand the logic of the application (or just
>> don't want to modify it), so you hack the modification into another layer
>> on top of the application. This layer can be JavaScript, but I think it's
>> much the same idea that is used in Deliverance. I would like to have a
>> WSGI application where I could do quick hacks like that on the server side,
>> maybe in Deliverance or in its own WSGI middleware. What do you think?
>>
>
> Yeah, that could be possible -- people have asked for the ability to do
> arbitrary code-based transitions in Deliverance -- for the reasons you
> describe, like not wanting to touch the underlying application -- and this
> would probably be a very comfortable technique for people, especially if
> they are more front-end oriented. Like people have asked for the ability to
> do something that I guess would be expressed like doc('ul#menu
> li').prepend('>), when they want some kind of text separators in a list.
>
Gael also created an API for getting URLs from WSGI applications, so I think
pyquery is getting really close to something that is actually usable :)
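
As a rough illustration of the "text separators in a list" request quoted above, the sketch below prepends a separator to each menu item. It is not pyquery code: it uses the standard library's ElementTree, the helper `prepend_text` is hypothetical, and the "| " separator and the sample markup are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def prepend_text(elements, text):
    """Put `text` in front of each element's existing leading text."""
    for el in elements:
        el.text = text + (el.text or "")


# Assumed sample markup standing in for a page with a ul#menu list.
doc = ET.fromstring('<div><ul id="menu"><li>Home</li><li>Docs</li></ul></div>')

# Rough ElementTree equivalent of the CSS selector 'ul#menu li'.
items = doc.findall(".//ul[@id='menu']/li")
prepend_text(items, "| ")  # "| " is an assumed separator

print(ET.tostring(doc, encoding="unicode"))
```

In a WSGI middleware the same transformation would be applied to the response body after parsing it, leaving the underlying application untouched.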
-
Olivier Lauzanne