[lxml-dev] lxml.html and forms
Stefan Behnel
stefan_ml at behnel.de
Mon Jul 16 23:05:07 CEST 2007
Ian Bicking wrote:
> Stefan Behnel wrote:
> And really what form_values gives is intended for urllib.urlencode, and
> maybe can just be left that way. The order doesn't matter as much to
> the Python side, as it's just intrinsic in the way the page is laid out.
> That is, you can't (usually) "make item 4 be (name, value)", because
> item 4 already has a name, and the value might be constrained anyway.
> You could say, possibly, "make the second text input with name X have
> value Y", but that's relatively uncommon in forms and still more
> constrained than a general dictionary interface. I.e., you can't invent
> new names, you can't change the order of the fields, and constrained
> fields like checkboxes stay constrained. So maybe keep form_values, and
> use something else entirely that is more dict-like for this more dynamic
> get/set structure. Something a bit like form.inputs, but maybe fully
> embrace the wrapperness of it.
Makes sense to me.
> That thing would be more strictly dict-like, and every key would map to
> some structure that represents the entirety of what represents that key
> in the form. So a single text input would map to a string.
Sure.
> A single
> checkbox to a boolean (kind of... it's a little fuzzy; it kind of maps
> to None/the-value-of-the-checkbox, but I could allow a true/false setter
> as well).
Hmm, except for an empty string value, Python's idea of a truth value would
match that. And as you said, changing the form structure is not really
intended, so you'd normally not change the value string but rather the
"checked" property. So, assigning a truth value would simply change that,
whereas a string value could still change the value property. The return value
would then be the string value or None.
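
As a rough sketch of those semantics, written for Python 3: the classes `Checkbox` and `Inputs` below are hypothetical stand-ins, not lxml.html API, and having a string assignment also check the box is an assumption.

```python
class Checkbox:
    """Hypothetical stand-in for a checkbox <input>; not lxml API."""

    def __init__(self, value, checked=False):
        self.value = value      # the "value" attribute
        self.checked = checked  # the "checked" attribute


class Inputs:
    """Hypothetical dict-like view over named checkboxes."""

    def __init__(self, **checkboxes):
        self._boxes = checkboxes

    def __getitem__(self, name):
        box = self._boxes[name]
        # Reading gives the value string if checked, otherwise None.
        return box.value if box.checked else None

    def __setitem__(self, name, new):
        box = self._boxes[name]
        if isinstance(new, str):
            # Assumption: a string replaces the value and checks the box.
            box.value = new
            box.checked = True
        else:
            # Anything else only toggles "checked" by its truth value.
            box.checked = bool(new)


inputs = Inputs(agree=Checkbox("yes"))
inputs["agree"] = True
print(inputs["agree"])   # the value string while checked
inputs["agree"] = False
print(inputs["agree"])   # None while unchecked
```
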
For the special case of an empty string, you could return a string subclass
that evaluates to the bool value True. Not sure if I like this, though, sounds
like too much magic - and you never know where values end up in application
code... Maybe it's a rare enough corner case to accept this, though. Or isn't
there a Unicode character like "zero width space" or something like that, that
we could return instead?
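
To make the corner case concrete, here is a minimal sketch in Python 3 (where the hook is `__bool__`; Python 2 calls it `__nonzero__`). The class name `CheckedValue` is made up for illustration.

```python
class CheckedValue(str):
    """Hypothetical str subclass: always truthy, even when empty."""

    def __bool__(self):
        return True


value = CheckedValue("")
print(bool(value), value == "", len(value))

# The magic is lost as soon as the value passes through a str operation,
# because str methods return plain str objects.
leaked = value + ""
print(type(leaked) is str, bool(leaked))

# The alternative: U+200B ZERO WIDTH SPACE is a non-empty string and hence
# truthy without a subclass, but it no longer compares equal to "".
zwsp = "\u200b"
print(bool(zwsp), zwsp == "")
```

The second print shows why this may be too much magic: application code that concatenates or copies the value gets an ordinary empty string back, which is false again.
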
> Multi-select to a set, etc. Radio buttons would map to a
> single value, but I'd also want to give some access to the possible set
> of values (since unlike a text box there is a constrained set of
> possible values).
Ok, so, how would you set them?
    >>> form.inputs["my_radio_name"] = "new_value"
Like this? This would then deselect all other radio buttons with the name
"my_radio_name" and only select the one with the "new_value" value. If we
adopt this, reading the property should definitely return the selected value
as a single string:
    >>> form.inputs["my_radio_name"]
    'new_value'
Maybe we could return a subclass with an "element" property that returns the
Element that carries that value?
    >>> form.inputs["my_radio_name"].element
    <Element 'radio' at ...>
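
A minimal Python 3 sketch of how that could fit together. `RadioValue` and `RadioGroup` are hypothetical names, the radio buttons are plain dicts rather than real Elements, and rejecting values outside the constrained set is an assumption.

```python
class RadioValue(str):
    """Hypothetical str subclass remembering which element carries it."""

    def __new__(cls, value, element):
        self = super().__new__(cls, value)
        self.element = element
        return self


class RadioGroup:
    """Hypothetical group of radio inputs that share one name."""

    def __init__(self, radios):
        # Each radio is a plain dict here: {"value": ..., "checked": ...}
        self.radios = radios

    @property
    def value_options(self):
        # The constrained set of values this group can take.
        return [r["value"] for r in self.radios]

    def get(self):
        # Return the selected value (with its element), or None.
        for r in self.radios:
            if r["checked"]:
                return RadioValue(r["value"], r)
        return None

    def set(self, value):
        # Assumption: values outside the constrained set are rejected.
        if value not in self.value_options:
            raise ValueError("no radio button with value %r" % (value,))
        # Select the matching button and deselect all others.
        for r in self.radios:
            r["checked"] = (r["value"] == value)


group = RadioGroup([
    {"value": "old_value", "checked": True},
    {"value": "new_value", "checked": False},
])
group.set("new_value")
selected = group.get()
print(selected, selected.element["checked"], group.value_options)
```
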
> Right now you get that with
> form.inputs['radio_name'].value_options, but that won't work with a
> flatter dictionary.
Why not? I actually like that.
> Maybe there'd generally be a
> form_values.options('field_name'), which would be None for
> unconstrained, and a set for constrained fields.
Sounds too generic for a simple case. You shouldn't forget that you can't
really fill a form without knowing what is a radio button and what is a
checkbox, so there is not much to gain by providing a generic API.
    hasattr(el, "value_options")
is also easy to write and reads better than
    el.value_options is None
>>> Another open question is actual form submission. Right now it uses
>>> urllib. But I like httplib2, for instance, and I'd like it to be
>>> possible to use that.
>>
>> Alternatively, you could provide a simple interface that takes a URL
>> and a list of name-value pairs and opens it.
>
> That's what I was thinking of. I don't like module global settings at
> all. Passing it in to submit seems fine. I was thinking about using a
> class variable too, if you wanted to subclass the elements, or just set
> it manually on a particular instance. Maybe it would be attached to the
> tree object? E.g.:
>
> foo = parse(blah)
> foo.getroottree().urlfetch = my_url_fetch
That wouldn't work, as ElementTrees (and Elements) are not kept alive by the
tree, so you can't store state in them.
> I was also thinking about whether I should return a new parsed page, or
> just a file-like, or what. Or a file-like object that has a method to
> get the page, perhaps; e.g., new_page = form.submit().document(). I
> don't think the url fetching function would need to do any of this, it
> would just have a very minimal interface and the submit method would
> wrap it up in whatever seems most convenient.
You can't return a parsed tree as the server reply can be anything from XML to
weird binary. I think a file-like serves most purposes. Maybe an additional
"parse()" method would work here, but I don't think it's necessary.
    >>> reply_tree = parse(form.submit())
works just fine, is intuitive and avoids overhead.
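
A rough Python 3 sketch of passing the fetcher into submit and getting a file-like back. The names `submit`, `open_http`, `default_open` and `fake_open`, and the `(method, url, values)` signature, are assumptions for illustration only.

```python
import io
from urllib.parse import urlencode
from urllib.request import urlopen


def default_open(method, url, values):
    """Default fetcher based on urllib; returns a file-like response."""
    data = urlencode(values)
    if method.upper() == "GET":
        sep = "&" if "?" in url else "?"
        return urlopen(url + sep + data)
    return urlopen(url, data.encode("ascii"))


def submit(method, action, values, open_http=default_open):
    """Hypothetical submit(): hand the form data to any fetcher."""
    return open_http(method, action, values)


def fake_open(method, url, values):
    # Stand-in fetcher so the sketch runs without network access.
    body = "%s %s %s" % (method, url, urlencode(values))
    return io.BytesIO(body.encode("ascii"))


reply = submit("POST", "http://example.invalid/login",
               [("user", "ian")], open_http=fake_open)
print(reply.read())
```

A function built on httplib2 could be passed the same way, as long as it accepts the name-value pairs and returns something file-like that can be handed to parse().
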
> OK, I guess that keyword argument should be available in all the parsing
> functions.
"string" parsing functions. Sure.
> Maybe I should add a property to elements too, that fetches
> that information from the tree. And possibly something in parse that
> uses fp.geturl() if it is available.
etree already does that internally:
    cdef _getFilenameForFile(source):
        """Given a Python File or Gzip object, give filename back.

        Returns None if not a file object.
        """
        # file instances have a name attribute
        if hasattr(source, 'name'):
            return source.name
        # gzip file instances have a filename attribute
        if hasattr(source, 'filename'):
            return source.filename
        # urllib2
        if hasattr(source, 'geturl'):
            return source.geturl()
        return None
Stefan