[lxml-dev] Forms, Cookies, Headers, and Time

Douglas Mayle douglas at openplans.org
Thu Apr 23 17:24:31 CEST 2009


Ahh, randomly enough, the thread you link to is the one I started.
After browsing through the lxml code, it turned out that there was no
need to pass an open_http parameter: the default method does almost
exactly the same thing as the code sample given in that thread, so
monkey patching the library (the standard way to add cookie support)
already works. Unfortunately, I also found out that passing a URL
directly to lxml causes it to use libxml2's native downloading support,
which has no support for cookies. As such, you have to handle all of
the downloading of content yourself (except when taking advantage of
lxml forms).
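
A rough sketch of "download it yourself, then hand the bytes to lxml",
assuming a modern Python 3 where the standard library modules are
urllib.request and http.cookiejar. The helper names (make_opener, fetch,
parse) and the User-Agent string are made up for illustration; only
lxml.html.fromstring is lxml's own.

```python
import http.cookiejar
import urllib.request


def make_opener(user_agent="example-fetcher/0.1"):
    """Build an opener that remembers cookies across requests.

    The name and the default User-Agent are hypothetical.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Headers listed here are sent with every request made via this opener.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch(opener, url):
    """Download the body ourselves instead of passing the URL to lxml."""
    with opener.open(url) as response:
        return response.read()


def parse(body, base_url):
    """Parse already-downloaded bytes with lxml.

    base_url is recorded on the document so relative links can be resolved.
    """
    import lxml.html  # third-party; imported lazily so fetch() works alone
    return lxml.html.fromstring(body, base_url=base_url)
```

Something like parse(fetch(opener, url), url) then gives you an lxml
tree, while any cookies the server set (a login session, say) stay in
the jar and are sent back on the next fetch() through the same opener.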

As for waiting 2-3 seconds between requests, you can just put sleeps
(e.g. time.sleep) into your code, or find some sort of bandwidth
throttling package.
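
For the "just put sleeps in" route, a minimal sketch might look like the
following. The function names (polite_delay, crawl) and the fetch_page
callable are placeholders of mine, not part of lxml or any throttling
package; the 2-3 second range is taken from the question.

```python
import random
import time


def polite_delay(min_seconds=2.0, max_seconds=3.0, sleep=time.sleep):
    """Pause for a random interval and return how long we waited."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def crawl(urls, fetch_page, delay=polite_delay):
    """Fetch each URL in turn, pausing between consecutive requests.

    fetch_page is any callable taking a URL (hypothetical here).
    """
    results = []
    for index, url in enumerate(urls):
        if index:  # no need to wait before the very first request
            delay()
        results.append(fetch_page(url))
    return results
```

Passing the sleep function in as a parameter is only there so the delay
can be swapped out (for testing, or for a smarter throttle later).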

Douglas Mayle

On Apr 23, 2009, at 2:41 AM, Akafubu Kibombo wrote:

> I am trying to write a script which fetches a url, logs into the  
> site, then fetches particular items from the page, and goes to the  
> next page, fetching the same type of files on the new page until  
> there are no new pages to fetch from. So I need form and cookie
> handling, as well as manipulating the headers. What do I need to  
> use? I found this thread, but I don't understand it:
> http://codespeak.net/pipermail/lxml-dev/2008-December/004272.html
>
> Also, I don't want to overwhelm the server with so many requests. Is
> there a "wait 2-3 seconds before fetching the next element" type
> function?
>
> Thank you so, so much.
>
> -A.F.