[lxml-dev] pyquery
Ian Bicking
ianb at colorstudy.com
Wed Dec 3 19:40:42 CET 2008
Olivier Lauzanne wrote:
>
>
> On Mon, Dec 1, 2008 at 8:32 PM, Ian Bicking <ianb at colorstudy.com
> <mailto:ianb at colorstudy.com>> wrote:
>
> Olivier Lauzanne wrote:
>
> Hello,
>
> First thanks for lxml it's great.
> But I miss an interface on top of it. Something like jquery
> <http://jquery.com> or hpricot
> <http://code.whytheluckystiff.net/hpricot/>.
>
> Is there any work in progress to go toward something like that
> in python ?
>
> Missing a jquery like API in python, I started reproducing the
> jquery API in python by using lxml and released it a few days
> ago : pyquery <http://pypi.python.org/pypi/pyquery>
>
>
> Some of this overlaps with what lxml.html already does, and some
> would already be appropriate there. jQuery is a bit unusual in a
> Python context, because it only deals with sets of elements. But
> it's not unreasonable.
>
>
> In lxml.html, it seems there is very specific code for each html tag. I
> think the css query approach is more powerful and simpler. And it can
> provide a similar enough api. Instead of doing p.inputs you just do
> p('input').
In most cases there's something distinct about those attributes. For
instance p.inputs gives you special form fields. Of course
p.cssselect('input,select,textarea') also works (and if you don't mind a
honking long XPath query you could do that too).
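For a rough idea of what a group selector like 'input,select,textarea' collects: the union of three tag names, in document order. This is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for lxml; the form_fields helper is hypothetical, and it does not reproduce the special form-field objects that p.inputs gives you in lxml.html.

```python
import xml.etree.ElementTree as ET

FORM_TAGS = {"input", "select", "textarea"}


def form_fields(root):
    """Return input/select/textarea elements in document order.

    Roughly what the CSS group selector 'input,select,textarea' matches;
    iter() walks the tree depth-first, which is document order.
    """
    return [el for el in root.iter() if el.tag in FORM_TAGS]


doc = ET.fromstring(
    "<form>"
    "<input name='q'/>"
    "<p><textarea name='body'></textarea></p>"
    "<select name='lang'><option>en</option></select>"
    "</form>"
)
print([el.get("name") for el in form_fields(doc)])  # ['q', 'body', 'lang']
```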
> Dealing with sets of elements is something that I came to love about
> jquery. And I don't think it's actually unpythonic in any way. It's just
> a different approach. It's just like how indexing a string gives you
> a string back and not a character.
Well... it is unpythonic in that sets and items are treated differently
in Python (except the oddball case of strings, as you mention). It's
more a question of whether it is justifiably unpythonic... and I'm not
disputing that it can be.
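To make the string analogy concrete, here is a minimal sketch of a set-of-elements wrapper where indexing returns a one-item set rather than the item itself, the way "abc"[0] is still a str. ElementSet is a hypothetical class, and plain strings stand in for elements; this is exactly the behaviour that differs from an ordinary Python list.

```python
class ElementSet:
    """Hypothetical jQuery-style wrapper around a list of elements."""

    def __init__(self, elements):
        self._elements = list(elements)

    def __len__(self):
        return len(self._elements)

    def __iter__(self):
        return iter(self._elements)

    def __getitem__(self, index):
        # Both s[0] and s[1:3] return an ElementSet, never a bare element.
        if isinstance(index, slice):
            return ElementSet(self._elements[index])
        return ElementSet([self._elements[index]])

    def __repr__(self):
        return f"ElementSet({self._elements!r})"


s = ElementSet(["a", "b", "c"])
print(s[0], s[1:], "abc"[0])  # ElementSet(['a']) ElementSet(['b', 'c']) a
```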
> Some things in jQuery are a result of Javascript, where the
> equivalent in Python would use a different syntax. For instance:
>
> >>> p.attr("id")
> 'hello'
> >>> p.attr("id", "plop")
> []
>
> Would more typically be:
>
> >>> p.attrib['id']
> 'hello'
> >>> p.attrib['id'] = 'plop'
>
> Javascript just doesn't have anything like __getitem__/__setitem__,
> and doesn't really have getters and setters (at least on many
> browsers) so it also has to use functions to get and set values.
> Also note you don't allow things like p.attr('id', None), which
> should be valid (probably meaning an attribute deletion).
>
>
> attr('id', None) doesn't work, but it doesn't work in jquery either;
> there is actually a method called removeAttr for that purpose.
Well, it would be easy to make it work, just don't use None as your
sentinel.
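A minimal sketch of the private-sentinel idea, so that None is free to mean "delete the attribute". The attr function is a hypothetical helper working on a single xml.etree.ElementTree element (standing in for an lxml element), not pyquery's or jQuery's actual implementation.

```python
import xml.etree.ElementTree as ET

_MISSING = object()  # private sentinel: "no value argument was passed"


def attr(el, name, value=_MISSING):
    """jQuery-style attr() on one element (hypothetical helper).

    attr(el, 'id')        -> get (None if absent)
    attr(el, 'id', 'x')   -> set
    attr(el, 'id', None)  -> delete, since None is no longer the sentinel
    """
    if value is _MISSING:
        return el.get(name)
    if value is None:
        el.attrib.pop(name, None)  # deleting a missing attribute is a no-op
    else:
        el.set(name, value)
    return el  # returned so that calls could be chained


p = ET.fromstring('<p id="hello"/>')
print(attr(p, "id"))     # hello
attr(p, "id", "plop")
print(attr(p, "id"))     # plop
attr(p, "id", None)
print("id" in p.attrib)  # False
```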
> You're right, jquery isn't always perfectly pythonic: it doesn't use
> setters, and method names use camelCase, which isn't pythonic and
> which I don't like. But it is object oriented (very much so) and
> allows "streamed" method application, calling method after method
> on the same object, which you can't do if you use a python
> setter. Also jquery lacks a method to access the full html string of a
> tag (you can only access innerHTML), which sucks.
There's a very small (4-line?) outerHtml plugin for jquery, BTW.
> On the other hand, it has the advantage of being a simple, well-known,
> widely used and documented API, so it felt like replicating it would
> already be worthwhile. Reproducing the jquery API also has the advantage
> of making it trivial to move functionality in a web application from
> server to client, or client to server. If people started using it and
> there was a consensus that it should be changed, that could always be
> done then. That said, I'm open to a better API if you have one in mind,
> but it would have to be significantly better to compensate for not
> using a well-known API.
I think there are arguably places where setters and getters are just
simpler and look nicer. I guess I see the jQuery technique for these
specifically as a way of turning a deficiency in Javascript (lack of
getters and setters) into an advantage (chaining)... but I'm not sure
it's enough of an advantage to make it worth it.
For instance, el.html and el.html = '...' seem nicer to me than
el.html() and el.html('...'), and all you lose is the ability to do
something like el.html('...').attr('foo', 'bar'), and that doesn't seem
like such a big thing.
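The trade-off side by side, as a minimal sketch: two hypothetical toy classes that store HTML as a plain string (no parsing). One uses a Python property, the other jQuery-style methods that return self. Both end in the same state; the property version just can't do it in one chained expression.

```python
_MISSING = object()  # marks "no argument given"


class PropNode:
    """Getter/setter style: node.html and node.html = '...'."""

    def __init__(self):
        self._html = ""
        self.attrs = {}

    @property
    def html(self):
        return self._html

    @html.setter
    def html(self, value):
        self._html = value


class ChainNode:
    """jQuery style: html() reads, html(value) writes and returns self."""

    def __init__(self):
        self._html = ""
        self.attrs = {}

    def html(self, value=_MISSING):
        if value is _MISSING:
            return self._html
        self._html = value
        return self  # returning self is what makes chaining possible

    def attr(self, name, value):
        self.attrs[name] = value
        return self


a = PropNode()
a.html = "<b>hi</b>"    # an assignment is a statement: nothing to chain onto
a.attrs["foo"] = "bar"

b = ChainNode().html("<b>hi</b>").attr("foo", "bar")  # one expression
print(a.html == b.html(), a.attrs == b.attrs)  # True True
```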
Also there are two APIs: jQuery and lxml. There's some advantage to
reusing the lxml APIs as well, I think, so that for instance el.attrib
and el.get().attrib are the same. (I'm not sure you actually
implemented .get()?)
It might be good, or it might be sloppy, to actually support both APIs
to the degree they don't overlap (e.g., .attr vs. .attrib).
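Supporting both vocabularies might look roughly like this. It is a minimal sketch with a hypothetical Query class over xml.etree.ElementTree elements (a stand-in for lxml): .attrib follows the lxml convention and exposes the first element's attributes, while .attr() follows jQuery and reads from the first element but writes to all of them.

```python
import xml.etree.ElementTree as ET


class Query:
    """Hypothetical wrapper offering both vocabularies where they don't clash."""

    def __init__(self, elements):
        self.elements = list(elements)

    @property
    def attrib(self):
        # lxml/ElementTree style: the attribute mapping of the first element.
        return self.elements[0].attrib

    def attr(self, name, *value):
        # jQuery style: read from the first element, write to every element.
        # *value distinguishes "no value passed" from an explicit None.
        if not value:
            return self.elements[0].get(name)
        for el in self.elements:
            el.set(name, value[0])
        return self


root = ET.fromstring('<ul><li id="a"/><li id="b"/></ul>')
q = Query(root.findall("li"))
q.attr("class", "item")
print(q.attr("id"), q.attrib["class"])   # a item
print([li.get("class") for li in root])  # ['item', 'item']
```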
> Of course if you have CSS patches to CSSSelect (e.g., for :first --
> though I thought that worked?) it would be good to have them in lxml
> directly. Or if there are patches to make it easier to subclass
> CSSSelector, that'd be fine too -- there are a number of useful
> extensions to selectors in jQuery (e.g., input:checkbox), but it'd
> be nice to keep CSSSelect itself more strictly CSS 3. The $()
> constructor is also overloaded to do a lot more than selection, but
> that's kind of out of style for Python -- alternate class methods
> would be preferable.
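What "alternate class methods" could mean in practice: one explicit constructor per kind of input, instead of a single $()-style constructor that guesses from its argument. This is a minimal sketch; Doc, fromstring and fromelements are hypothetical names, and ElementTree stands in for lxml.

```python
import xml.etree.ElementTree as ET


class Doc:
    """Hypothetical: explicit alternate constructors instead of an
    overloaded constructor that dispatches on argument type."""

    def __init__(self, elements):
        self.elements = list(elements)

    @classmethod
    def fromstring(cls, markup):
        # Parse markup; the caller says up front that this is a string of HTML.
        return cls([ET.fromstring(markup)])

    @classmethod
    def fromelements(cls, *elements):
        # Wrap elements that already exist.
        return cls(elements)

    def __len__(self):
        return len(self.elements)


d1 = Doc.fromstring("<p>hi</p>")
d2 = Doc.fromelements(ET.Element("a"), ET.Element("b"))
print(len(d1), len(d2))  # 1 2
```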
>
>
> I don't have patches yet, but I have seen where they could go. I was
> planning on monkey-patching; I completely agree that CSSSelect should
> remain standards-compliant. I'll check if I can do something cleaner than
> monkey-patching.
Probably some of the functions would have to turn into methods of a
class, and then you'd subclass that to add custom selectors and XPath
translations of those selectors.
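A toy version of that refactoring: each pseudo-class becomes a method looked up by name, and jQuery-only selectors live in a subclass, so the base class stays strictly CSS 3. Everything here is hypothetical and heavily simplified (the "parser" only handles a single tag:pseudo selector); it illustrates the subclassing idea, not lxml's actual cssselect code.

```python
class Translator:
    """Hypothetical mini translator: 'tag:pseudo' -> XPath string."""

    def translate(self, selector):
        tag, _, pseudo = selector.partition(":")
        xpath = "descendant-or-self::" + (tag or "*")
        if pseudo:
            # Dispatch by name, so subclasses add selectors by adding methods.
            handler = getattr(self, "pseudo_" + pseudo.replace("-", "_"), None)
            if handler is None:
                raise ValueError(f"unsupported pseudo-class :{pseudo}")
            xpath += "[" + handler() + "]"
        return xpath

    def pseudo_first_child(self):
        # Standard CSS 3 :first-child: no preceding element siblings.
        return "count(preceding-sibling::*) = 0"


class JQueryTranslator(Translator):
    """jQuery-only extensions, kept out of the standards-compliant base."""

    def pseudo_checkbox(self):
        return "@type = 'checkbox'"


print(JQueryTranslator().translate("input:checkbox"))
# descendant-or-self::input[@type = 'checkbox']
```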
> You also seem to be using lxml.etree in places where lxml.html would
> definitely be better. E.g., for setting .html:
>
> import lxml.html
>
> children = lxml.html.fragments_fromstring(html)
> if children and isinstance(children[0], basestring):
>     parent.text = children.pop(0)
> else:
>     parent.text = None
> parent[:] = children
>
> Also, to get the HTML contents:
>
> (parent.text or '') + ''.join(tostring(el) for el in parent)
>
> I'm sure there are several other things.
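The same two idioms (set the leading text plus children, then read back text plus serialized children) can be tried with the standard library alone. This is a minimal sketch with hypothetical helper names, using xml.etree.ElementTree as a stand-in for lxml.html. Since ElementTree has no fragments_fromstring, the markup is parsed inside a throwaway wrapper element and must be well-formed XML; encoding="unicode" makes tostring return str on Python 3.

```python
import xml.etree.ElementTree as ET


def set_inner_html(parent, markup):
    """Replace parent's contents with parsed markup (hypothetical helper)."""
    wrapper = ET.fromstring("<wrapper>" + markup + "</wrapper>")
    parent.text = wrapper.text  # leading text, or None
    parent[:] = list(wrapper)   # children keep their .tail text


def get_inner_html(parent):
    # tostring() of each child includes its tail, so the text between
    # children comes along without extra bookkeeping.
    return (parent.text or "") + "".join(
        ET.tostring(el, encoding="unicode") for el in parent
    )


p = ET.fromstring("<p>old</p>")
set_inner_html(p, "Hello <b>big</b> world")
print(get_inner_html(p))  # Hello <b>big</b> world
```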
>
>
> Thanks for the info, I'll look into it. pyquery was the occasion for me
> to learn lxml so I may have overlooked some more things.
>
> Also, jquery hacks are common practice when working on complex
> applications: you can't understand the logic of the application (or just
> don't want to modify it), so you hack the modification into another
> layer on top of the application. This layer can be javascript, but I
> think it's much the same idea that Deliverance uses. I would
> like to have a wsgi application where I could do quick hacks like
> that on the server side, maybe in Deliverance or in its own wsgi middleware.
> What do you think?
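A server-side hook like that could sit in plain WSGI middleware. This is a minimal sketch, not tied to Deliverance or any existing package: rewrite_html and transform are hypothetical names, the whole response is buffered, and the body is assumed to be UTF-8.

```python
def rewrite_html(app, transform):
    """Hypothetical WSGI middleware: run text/html bodies through
    transform(str) -> str and pass every other response through as-is."""

    def middleware(environ, start_response):
        captured = {}
        chunks = []  # also collects data sent via the legacy write() callable

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            captured["exc_info"] = exc_info
            return chunks.append

        result = app(environ, capture)
        try:
            for chunk in result:
                chunks.append(chunk)
        finally:
            if hasattr(result, "close"):
                result.close()
        body = b"".join(chunks)

        headers = captured["headers"]
        ctype = next((v for k, v in headers if k.lower() == "content-type"), "")
        if ctype.startswith("text/html"):
            # Assumption: the upstream page is UTF-8 encoded.
            body = transform(body.decode("utf-8")).encode("utf-8")
            headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
            headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers, captured["exc_info"])
        return [body]

    return middleware


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>hi</p>"]


app = rewrite_html(demo_app, lambda html: html.replace("hi", "hello"))
```

Buffering is the price of parsing the page: a streamed response stops streaming once it passes through this layer.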
Yeah, that could be possible. People have asked for the ability to do
arbitrary code-based transformations in Deliverance, for the reasons you
describe, like not wanting to touch the underlying application, and
this would probably be a very comfortable technique for people,
especially if they are more front-end oriented. For instance, people have asked
for the ability to do something that I guess would be expressed like
doc('ul#menu li').prepend('>), when they want some kind of text
separators in a list.
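For comparison, the same kind of separator hack written by hand against xml.etree.ElementTree (a stand-in for lxml). This is a minimal sketch: prepend_text is a hypothetical helper that handles plain text only, and the " | " separator is an arbitrary choice for illustration.

```python
import xml.etree.ElementTree as ET


def prepend_text(elements, text):
    """Rough equivalent of jQuery's .prepend(text) for plain text:
    put text in front of each element's existing content."""
    for el in elements:
        el.text = text + (el.text or "")


doc = ET.fromstring(
    '<div><ul id="menu"><li>Home</li><li><a href="/x">Docs</a></li></ul></div>'
)
# ElementTree's limited XPath supports [@attr='value'] predicates.
items = doc.findall(".//ul[@id='menu']/li")
prepend_text(items, " | ")
print(ET.tostring(doc, encoding="unicode"))
# <div><ul id="menu"><li> | Home</li><li> | <a href="/x">Docs</a></li></ul></div>
```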
--
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org