[lxml-dev] lxml.html and forms
Stefan Behnel
stefan_ml at behnel.de
Mon Jul 16 10:19:49 CEST 2007
Hi Ian,
Ian Bicking wrote:
> I feel a little bad adding a bunch of stuff to lxml.html when it's
> supposed to get all stable.
And you should! You're lucky that there will be more than one 2.0alpha version. :)
No, seriously. We do this for fun, right? And adding cool stuff from time to
time is a pretty good way to keep up the motivation.
> So with my last commit you can do things like:
>
> from lxml.html import parse, open_in_browser
> url = 'http://tripsweb.rtachicago.com/'
> page = parse(url)
> page.make_links_absolute(url)
> form = page.forms[0]
> form.inputs['Orig'].value = '1535 W Leland'
> form.inputs['Dest'].value = '847 W Bertrand'
> res = form.submit()
> res_page = parse(res)
> res_page.make_links_absolute(res_page.geturl())
> open_in_browser(res_page)
Sounds like you should put something like that into the docs. (hint, hint)
> It's kind of like Mechanize, only of course better. There's some things
> I still haven't figured out. Some data structures are convenient, but
> maybe have some non-obvious aspects. Like form.inputs, which doesn't
> always return elements (for things like checkboxes it can return
> something that is more like a logical element).
Have I ever encouraged you to look at objectify? It has special data Elements
that behave like normal Python data types, but are actually Elements in the tree.
Something similar could apply here, you could use a string-like Element for
"input" and a boolean-like Element for "checkbox". Hmmm, and radio buttons
could be lists?
Although a boolean-like Element always has the disadvantage that bool() would
behave differently for it than for an in-tree element (where it tests whether
the element has children).
> Also, I'd like to merge
> in most of the functionality of lxml.html.formfill (except for
> error-filling), so where form.form_values() currently returns a list of
> all the values as they'd be if submitted, I'd like to make it settable.
> And maybe even have form.form_values return something that would
> modify inputs in-place, like form.form_values['Orig'] = '1535 W Leland'
> mean the equivalent of form.inputs['Orig'].value = '1535 W Leland'.
Hmmm, I already stumbled over the name "form_values" when it actually behaves
more like "form_items". This looks like it should be a dictionary-like class,
but it's actually more like a hash bag, as parameters can repeat. Those don't
seem to have an intuitive mapping to Python idioms, at least not when the most
common use case with unique keys is supposed to be convenient.
Although, you could actually return a subclass of "list" in form_values that
also supports __getitem__ and __setitem__ with string keys. Then, at least, it
would be consistent for reading *and* writing. That sounds nicely polymorphic
and is sufficiently close to a dict to be helpful in the most common case, but
stays mainly a list for the general case. You could then call it "inputitems"
to let it match with "inputs" and dicts.
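A minimal sketch of such a list subclass, to make the idea concrete. The class
name "InputItems" and the choice to read and overwrite the first matching pair
(and to append when the name is missing) are assumptions for illustration, not
anything that exists in lxml.html:

```python
class InputItems(list):
    """Hypothetical list of (name, value) pairs with dict-style access by name."""

    def __getitem__(self, key):
        if isinstance(key, str):
            # String key: return the value of the first pair with that name.
            for name, value in self:
                if name == key:
                    return value
            raise KeyError(key)
        # Integer or slice: plain list behaviour.
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        if isinstance(key, str):
            # String key: overwrite the first matching pair, else append one.
            for index, (name, _) in enumerate(self):
                if name == key:
                    super().__setitem__(index, (name, value))
                    return
            self.append((key, value))
        else:
            super().__setitem__(key, value)


items = InputItems([("Orig", ""), ("Dest", ""), ("opt", "a"), ("opt", "b")])
items["Orig"] = "1535 W Leland"   # dict-like write for the common case
first_pair = items[0]             # still an ordinary list underneath
```

Repeated names such as "opt" stay visible when iterating over the list, which
is the general case; the string-key access only covers the unique-key shortcut.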
> Another option question is actual form submission. Right now it uses
> urllib. But I like httplib2, for instance, and I'd like it to be
> possible to use that.
What about a module global setting? You would most likely not want to use both.
Alternatively, you could provide a simple interface that takes a URL and a
list of name-value pairs and opens it. Then implement it for both libraries
and provide an optional keyword argument in submit() that takes a callable
with that signature (or maybe an instance of a dedicated abstract
superclass, if you want to make the interface visible).
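Roughly what I have in mind, as a sketch only. The names "submit",
"open_http" and "open_with_urllib" are hypothetical, "submit" is written as a
free function rather than the real form method, and sending the values as a
POST body is just an assumption to keep the example short:

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def open_with_urllib(url, values):
    """Default opener: POST the name-value pairs using the standard library."""
    body = urlencode(values).encode("ascii")
    return urlopen(url, body)


def submit(action_url, values, open_http=open_with_urllib):
    """Send the form values through whichever opener the caller passes in."""
    return open_http(action_url, list(values))


# Any callable with the same (url, values) signature can be plugged in,
# for example a stand-in that records the request instead of sending it.
recorded = []

def recording_opener(url, values):
    recorded.append((url, values))
    return "fake response"

result = submit("http://example.com/form",
                [("Orig", "1535 W Leland"), ("Dest", "847 W Bertrand")],
                open_http=recording_opener)
```

An httplib2-based opener would then be one more function with that signature,
and nothing in submit() needs to know which library is behind it.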
> Also, I'm wondering about how to keep track of the URL when a page is
> parsed. Stefan mentioned if you use parse(url) it would keep track of
> that... where? I'd like it to be possible to keep the URL around for
> any kind of parsing, e.g., with document_fromstring(html, url=X).
You can pass a "base_url" keyword arg to HTML(). If you want to read the
original URL, wrap a document in an ElementTree and read its "docinfo.URL"
property.
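For example, something along these lines should work (the markup and the
example.com URL are placeholders, not taken from Ian's code):

```python
from lxml import etree

html = "<html><body><a href='next.html'>next</a></body></html>"
page_url = "http://example.com/start.html"  # placeholder URL

# Parse from a string, but tell the parser where the document came from.
root = etree.HTML(html, base_url=page_url)

# Wrap the root in an ElementTree to get at the document-level information.
tree = etree.ElementTree(root)
remembered_url = tree.docinfo.URL
```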
Stefan