[lxml-dev] lxml.html and forms

Ian Bicking ianb at colorstudy.com
Mon Jul 16 23:33:05 CEST 2007


Stefan Behnel wrote:
>> A single
>> checkbox to a boolean (kind of... it's a little fuzzy; it kind of maps
>> to None/the-value-of-the-checkbox, but I could allow a true/false setter
>> as well).
> 
> Hmm, except for an empty string value, Python's idea of a truth value would
> match that. And as you said, changing the form structure is not really
> intended, so you'd normally not change the value string but rather the
> "checked" property. So, assigning a truth value would simply change that,
> whereas a string value could still change the value property. The return value
> would then be the string value or None.
> 
> For the special case of an empty string, you could return a string subclass
> that evaluates to the bool value True. Not sure if I like this, though, sounds
> like too much magic - and you never know where values end up in application
> code... Maybe it's a rare enough corner case to accept this, though. Or isn't
> there a Unicode character like "zero width space" or something like that, that
> we could return instead?

The empty string is definitely a corner case, as many server-side 
languages would treat that as false already.

Maybe it could just be returned as True in that case.  This could break 
code that expects a string, but it's such a strange case anyway that I 
don't mind too much.  Or I could return a subclass of str that is 
true, which is also very weird, but again it's very much a corner case 
so maybe it's not that big a deal.  If you don't give a value to a 
checkbox it defaults to "on" anyway, so only an explicit value="" causes 
this.
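
A minimal sketch of the "str subclass that is true" option, in current 
Python.  The names TrueStr and checkbox_value are hypothetical, made up 
for illustration; they are not part of lxml.html.

```python
class TrueStr(str):
    """Hypothetical str subclass that is truthy even when it is empty."""

    def __bool__(self):
        return True


def checkbox_value(checked, value_attr=None):
    """Map one checkbox to None (unchecked) or its value string (checked).

    value_attr is the raw value="" attribute, or None if it is absent.
    """
    if not checked:
        return None
    if value_attr is None:
        # A checkbox without a value attribute submits "on".
        return "on"
    if value_attr == "":
        # The corner case: an explicit value="" that is still checked.
        return TrueStr("")
    return value_attr
```

With this, `if checkbox_value(True, ""):` takes the true branch, while 
the result still compares equal to "".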

>> Multi-select to a set, etc. Radio buttons would map to a
>> single value, but I'd also want to give some access to the possible set
>> of values (since unlike a text box there is a constrained set of
>> possible values).
> 
> Ok, so, how would you set them?
> 
>    >>> form.inputs["my_radio_name"] = "new_value"
> 
> Like this? This would then deselect all other radio buttons with the name
> "my_radio_name" and only select the one with the "new_value" value. If we
> adopt this, reading the property should definitely return the selected value
> as a single string:
> 
>    >>> form.inputs["my_radio_name"]
>    'new_value'

Yes, right now it works like:

   form.inputs['my_radio_name'].value = 'new_value'

Where form.inputs['my_radio_name'] is an instance of a list subclass that 
contains all the radio input elements and also allows this group setting.  
If it's a group of checkboxes, it's:

   form.inputs['my_checkbox_name'].value.add('value1')

Which checks the checkbox with the value 'value1'.  You can also assign 
to value, which clears the set and assigns values from the iterator you 
give.  So basically I could take what I have now, and just always 
get/set .value to create a flatish dictionary.  And if you assign 
directly to the dictionary, it would clear the current values and then 
update with the values you give, just like the set works.
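
The group behaviour described above can be sketched roughly as follows.  
This is a stand-alone illustration under assumptions, not the actual 
lxml.html code: Input is a hypothetical stand-in for an input element, 
and the class names are only chosen to match the description.

```python
from collections.abc import MutableSet


class Input:
    """Hypothetical stand-in for an <input> element."""

    def __init__(self, value, checked=False):
        self.value = value
        self.checked = checked


class RadioGroup(list):
    """All radio inputs with one name; .value is the checked one's value."""

    @property
    def value(self):
        for el in self:
            if el.checked:
                return el.value
        return None

    @value.setter
    def value(self, new_value):
        if new_value is not None and new_value not in [el.value for el in self]:
            raise ValueError("no radio button with value %r" % (new_value,))
        # Check the matching button and uncheck all the others.
        for el in self:
            el.checked = (el.value == new_value)


class CheckboxValues(MutableSet):
    """Set-like view of the values of the checked boxes in a group."""

    def __init__(self, group):
        self.group = group

    def __contains__(self, value):
        return any(el.checked and el.value == value for el in self.group)

    def __iter__(self):
        return (el.value for el in self.group if el.checked)

    def __len__(self):
        return sum(1 for el in self.group if el.checked)

    def add(self, value):
        for el in self.group:
            if el.value == value:
                el.checked = True
                return
        raise KeyError("no checkbox with value %r" % (value,))

    def discard(self, value):
        for el in self.group:
            if el.value == value:
                el.checked = False


class CheckboxGroup(list):
    """All checkboxes with one name; .value is a live set of checked values."""

    @property
    def value(self):
        return CheckboxValues(self)

    @value.setter
    def value(self, new_values):
        new_values = list(new_values)  # consume the iterator before clearing
        view = self.value
        view.clear()
        for v in new_values:
            view.add(v)
```

Here group.value.add('value1') checks one box, and assigning to 
group.value clears the set first and then checks the values given.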

Whether this should replace or augment .inputs, I'm not sure.  I think 
augment, since .inputs gives you access to all the elements, which 
sometimes you will want.

> Maybe we could return a subclass with an "element" property that returns the
> Element that carries that value?
> 
>    >>> form.inputs["my_radio_name"].element
>    <Element 'radio' at ...>

Then we have something stringish that isn't quite a string.  And when 
you do an assignment, you get back something that's different from what 
you assigned.  It all feels too magic to me.  I think we can just have two 
accessors, one that gives you elements (like the current form.inputs) 
and one that gives you values only.
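
One way the two accessors could fit together is a values-only mapping 
layered over the element mapping.  This is a sketch under assumptions: 
FieldValues and Field are hypothetical names, and the only thing it 
relies on is that each entry of the element mapping has a .value 
attribute.

```python
from collections.abc import Mapping


class Field:
    """Hypothetical stand-in for whatever the element accessor returns."""

    def __init__(self, value):
        self.value = value


class FieldValues(Mapping):
    """Flat name -> value view over a name -> element mapping."""

    def __init__(self, inputs):
        self.inputs = inputs

    def __getitem__(self, name):
        return self.inputs[name].value

    def __setitem__(self, name, value):
        # Delegate to the element (or group), which knows how to apply it.
        self.inputs[name].value = value

    def __iter__(self):
        return iter(self.inputs)

    def __len__(self):
        return len(self.inputs)


inputs = {"name": Field("Ian"), "my_radio_name": Field("old_value")}
values = FieldValues(inputs)
values["my_radio_name"] = "new_value"
```

What you read back from values is exactly the plain value you assigned, 
and inputs still gives access to the elements themselves.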

>> Right now you get that with
>> form.inputs['radio_name'].value_options, but that won't work with a
>> flatter dictionary.
> 
> Why not? I actually like that.

You'd also have to augment the string-like object, since 
form.inputs['radio_name'] would be the value of the currently checked 
radio button.

>> Maybe there'd generally be a
>> form_values.options('field_name'), which would be None for
>> unconstrained, and a set for constrained fields.
> 
> Sounds too generic for a simple case. You shouldn't forget that you can't
> really fill a form without knowing what is a radio button and what is a
> checkbox, so there is not much to gain by providing a generic API.
> 
>    hasattr(el, "value_options")
> 
> is also easy to write and reads better than
> 
>    el.value_options is None

Yes, most of the time you'll be filling out forms that you expect to 
have very particular fields.  But it's useful generally.  With a flat 
dictionary it's hard to get access to per-field information, so there 
has to be some other means of access.

Anyway, currently value_options is only set on those elements and 
objects where it makes sense.
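
The two styles discussed here are easy to bridge.  A small sketch, 
assuming value_options exists only on constrained fields; TextField, 
RadioField and options() are hypothetical names used for illustration.

```python
class TextField:
    """Unconstrained field: deliberately has no value_options attribute."""

    def __init__(self, value=""):
        self.value = value


class RadioField:
    """Constrained field: value_options lists the permitted values."""

    def __init__(self, value_options, value=None):
        self.value_options = list(value_options)
        self.value = value


def options(field):
    """Return None for an unconstrained field, else a set of its options."""
    opts = getattr(field, "value_options", None)
    return None if opts is None else set(opts)
```

Code that knows its form can keep using hasattr(el, "value_options"), 
and generic code can call options(el) without caring about the type.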

>>>> Another option question is actual form submission.  Right now it uses
>>>> urllib.  But I like httplib2, for instance, and I'd like it to be
>>>> possible to use that.
>>> Alternatively, you could provide a simple interface that takes a URL
>>> and a list of name-value pairs and opens it.
>> That's what I was thinking of.  I don't like module global settings at
>> all.  Passing it in to submit seems fine.  I was thinking about using a
>> class variable too, if you wanted to subclass the elements, or just set
>> it manually on a particular instance.  Maybe it would be attached to the
>> tree object?  E.g.:
>>
>>   foo = parse(blah)
>>   foo.getroottree().urlfetch = my_url_fetch
> 
> That wouldn't work, as ElementTrees (and Elements) are not kept alive by the
> tree, so you can't store state in them.

Hrm... that's too bad.  I'd like to keep some kind of local information 
around, ideally inherited as you go from page to page.  I really hate 
global settings.

>> I was also thinking about whether I should return a new parsed page, or
>> just a file-like, or what.  Or a file-like object that has a method to
>> get the page, perhaps; e.g., new_page = form.submit().document().  I
>> don't think the url fetching function would need to do any of this, it
>> would just have a very minimal interface and the submit method would
>> wrap it up in whatever seems most convenient.
> 
> You can't return a parsed tree as the server reply can be anything from XML to
> weird binary. I think a file-like serves most purposes. Maybe an additional
> "parse()" method would work here, but I don't think it's necessary.
> 
>    >>> reply_tree = parse(form.submit())
> 
> works just fine, is intuitive and avoids overhead.

Yeah, you are probably right.  The etree parse method works just fine 
right now, especially if it already picks up the url.
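
A rough sketch of the "pass it in to submit" idea with a file-like 
reply, written against the current Python standard library.  The 
function names and the (method, url, values) signature are assumptions 
made for this example, not a settled interface.

```python
import io
from urllib.parse import urlencode
from urllib.request import urlopen


def default_open_http(method, url, values):
    """Default fetcher: plain urllib, returns a file-like response."""
    data = urlencode(values)
    if method == "GET":
        separator = "&" if "?" in url else "?"
        return urlopen(url + separator + data)
    return urlopen(url, data.encode("ascii"))


def submit(action, method, values, open_http=None):
    """Send name/value pairs to action and return the file-like reply.

    open_http is any callable taking (method, url, values); passing it in
    avoids a module-global setting.
    """
    if open_http is None:
        open_http = default_open_http
    return open_http(method.upper(), action, list(values))


# A fake fetcher shows the plug-in point without touching the network.
calls = []


def fake_open_http(method, url, values):
    calls.append((method, url, values))
    return io.BytesIO(b"<html><body>ok</body></html>")


reply = submit("http://example.invalid/post", "post",
               [("q", "lxml")], open_http=fake_open_http)
body = reply.read()
```

Since the reply is just a file-like object, it can be handed to a parser 
when it is HTML or XML and read as raw bytes otherwise.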



-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org
             : Write code, do good : http://topp.openplans.org/careers

