[lxml-dev] Forms, Cookies, Headers, and Time

Douglas Mayle douglas at openplans.org
Thu Apr 23 17:24:22 CEST 2009


I wrote a tool to sync Safari Books downloads that does much the same
thing you're describing. I ran into the various issues that come up
with form and cookie handling when using lxml, and wrote an article
about them here:
http://douglas.mayle.org/2009/03/05/syncing-safari-downloads-intro-screen-scraping/
I spent some time making sure the code was clean and well documented,
so it should help you get started. The example is here:
http://projects.mayle.org/hg/safarisync/file/23cfad04ce3a/safarisync/safarisync/safarisync.py
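
For a rough idea of the general shape, here is a minimal sketch. It is
not the safarisync code itself. It assumes Python 3 with lxml.html and
the standard library, and the field names and the User-Agent string are
placeholders to replace with whatever the real site uses. For the
"wait 2 - 3 seconds" part, time.sleep() from the standard library is
enough.

```python
import random
import time
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

import lxml.html

# One opener for the whole session, so cookies set at login are sent
# back automatically on every later request.
cookie_jar = CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar))
# Extra headers go on the opener; this User-Agent is a placeholder.
opener.addheaders = [("User-Agent", "example-fetcher/0.1")]


def open_http(method, url, values):
    """Callback for lxml.html.submit_form that goes through `opener`."""
    data = urllib.parse.urlencode(list(values))
    if method.upper() == "GET":
        separator = "&" if "?" in url else "?"
        return opener.open(url + separator + data)
    return opener.open(url, data.encode("utf-8"))


def fill_login_form(page, username, password):
    """Fill in the first form on the page and return it.

    The field names "username" and "password" are assumptions; check
    the <input> names in the real login form.
    """
    form = page.forms[0]
    form.fields["username"] = username
    form.fields["password"] = password
    return form


def log_in(login_url, username, password):
    """Fetch the login page, fill in its form, and submit it."""
    page = lxml.html.parse(opener.open(login_url)).getroot()
    page.make_links_absolute(login_url)
    form = fill_login_form(page, username, password)
    return lxml.html.submit_form(form, open_http=open_http)


def polite_pause(low=2.0, high=3.0):
    """Sleep for a random time between `low` and `high` seconds."""
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay
```

After log_in(), fetch each further page with opener.open(url) so the
session cookies go along, and call polite_pause() between requests.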

Douglas Mayle

On Apr 23, 2009, at 2:41 AM, Akafubu Kibombo wrote:

> I am trying to write a script which fetches a url, logs into the  
> site, then fetches particular items from the page, and goes to the  
> next page, fetching the same type of files on the new page until  
> there are no new pages to fetch from. So I need form and cookie
> handling, as well as manipulating the headers. What do I need to
> use? I found this thread, but I don't understand it:
> http://codespeak.net/pipermail/lxml-dev/2008-December/004272.html
>
> Also, I don't want to wipe out the server with so many requests. Is
> there a "wait 2 - 3 seconds before fetching the next element" type
> function?
>
> Thank you so, so much.
>
> -A.F.
