[lxml-dev] Forms, Cookies, Headers, and Time
Douglas Mayle
douglas at openplans.org
Thu Apr 23 17:24:22 CEST 2009
I wrote a tool to sync Safari Books downloads that does something
similar to what you're describing. I ran into the various issues that
come up with form and cookie handling when using lxml, and wrote an
article about it here:
http://douglas.mayle.org/2009/03/05/syncing-safari-downloads-intro-screen-scraping/
I spent some time making sure the code was clean and well documented,
so it should help you get started. The example is here:
http://projects.mayle.org/hg/safarisync/file/23cfad04ce3a/safarisync/safarisync/safarisync.py
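
For orientation, the general pattern looks roughly like the sketch
below. It is not taken from safarisync. It assumes Python 3 with lxml
installed, and the helper names (make_opener, fill_form, submit), the
User-Agent string, and any field names you pass in are made up for
illustration. A cookie-aware opener keeps the session alive between
requests, and lxml.html reads the login form and fills in its fields.

```python
import http.cookiejar
import urllib.parse
import urllib.request

import lxml.html


def make_opener(user_agent="example-fetcher/0.1"):
    """Build an opener that remembers cookies and sends a custom header.

    The User-Agent value is a placeholder, not something a site requires.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    # addheaders is sent with every request made through this opener.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fill_form(page_html, base_url, values, form_index=0):
    """Return (method, action_url, fields) for one form on the page.

    `values` maps field names to the values to fill in. Hidden fields
    already present in the form (tokens and the like) are kept.
    """
    doc = lxml.html.fromstring(page_html, base_url=base_url)
    form = doc.forms[form_index]
    for name, value in values.items():
        form.fields[name] = value
    # form.action is resolved against base_url by lxml.
    return form.method, form.action, dict(form.form_values())


def submit(opener, method, action_url, fields):
    """Send the filled-in form through the cookie-aware opener."""
    data = urllib.parse.urlencode(fields)
    if method == "POST":
        return opener.open(action_url, data.encode("utf-8"))
    separator = "&" if "?" in action_url else "?"
    return opener.open(action_url + separator + data)
```

Typical use would be to fetch the login page with opener.open(),
pass its HTML to fill_form() with your credentials, and then call
submit(). Every later opener.open() call sends the session cookies
back automatically.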
Douglas Mayle
On Apr 23, 2009, at 2:41 AM, Akafubu Kibombo wrote:
> I am trying to write a script which fetches a url, logs into the
> site, then fetches particular items from the page, and goes to the
> next page, fetching the same type of files on the new page until
> there are no new pages to fetch from. So I need form and cookie
> handling, as well as a way to manipulate the headers. What do I need
> to use? I found this thread, but I don't understand it:
> http://codespeak.net/pipermail/lxml-dev/2008-December/004272.html
>
> Also, I don't want to overwhelm the server with so many requests. Is
> there a "wait 2 - 3 seconds before fetching the next element" type
> function?
>
> Thank you so, so much.
>
> -A.F.
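
On the question of waiting between requests: time.sleep from the
standard library is enough. The sketch below is only one way to
arrange it. The names crawl, fetch, and find_next are hypothetical:
fetch(url) is whatever function downloads a page, and
find_next(page) returns the URL of the next page or None when there
is none. A random pause of 2 to 3 seconds is taken before every
request except the first.

```python
import random
import time


def crawl(start_url, fetch, find_next,
          min_delay=2.0, max_delay=3.0, sleep=time.sleep):
    """Follow "next page" links, pausing between consecutive requests.

    fetch and find_next are supplied by the caller (hypothetical names).
    The sleep parameter exists so the pause can be replaced in tests.
    """
    pages = []
    seen = set()
    url = start_url
    # Stop when there is no next page or when a page links back to
    # one that was already visited.
    while url is not None and url not in seen:
        if pages:
            # Wait a random 2 to 3 seconds (by default) before the
            # next request so the server is not hammered.
            sleep(random.uniform(min_delay, max_delay))
        seen.add(url)
        page = fetch(url)
        pages.append(page)
        url = find_next(page)
    return pages
```

The random interval is a design choice rather than a requirement; a
fixed time.sleep(2) between requests would work as well.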