[lxml-dev] lxml forms and cookies...

John J Lee jjl at pobox.com
Tue Dec 23 13:50:47 CET 2008


On Mon, 22 Dec 2008, Douglas Mayle wrote:

> Hey everyone, I've been trying to use the lxml forms with client
> cookies to handle html logins.  Using cookielib, I'm able to manually
> login to a page using either urllib or urllib2 and have it work.  The
> moment I try to use lxml.html.submit_form(), however, it fails.  I've
> tried sniffing the packets, and it turns out that lxml is never
> sending cookies, which makes me guess that lxml is using neither
> urllib nor urllib2.  How do I use cookes with lxml?

lxml.html.submit_form() has an open_http parameter:

import urllib
import urllib2
import urlparse
import lxml.html

def url_with_query(url, values):
     parts = urlparse.urlparse(url)
     rest, (query, frag) = parts[:-2], parts[-2:]
     return urlparse.urlunparse(rest + (urllib.urlencode(values), None))

def make_open_http():
     opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
     opener.addheaders = []  # pretend we're a human -- don't do this
     def open_http(method, url, values={}):
         if method == "POST":
             return opener.open(url, urllib.urlencode(values))
         else:
             return opener.open(url_with_query(url, values))
     return open_http

open_http = make_open_http()
tree = lxml.html.fromstring(open_http("GET", "http://python.org").read())
form = tree.forms[0]
form.fields["q"] = "lxml"
submit_values = {"submit": form.fields["submit"]}
response = lxml.html.submit_form(form,
                                  extra_values=submit_values,
                                  open_http=open_http)
html = response.read()
doc = lxml.html.fromstring(html)
lxml.html.open_in_browser(doc)


John



More information about the lxml-dev mailing list