[lxml-dev] Forms, Cookies, Headers, and Time
Douglas Mayle
douglas at openplans.org
Thu Apr 23 17:24:31 CEST 2009
Ahh, randomly enough, the thread you link to is the one I started.
After browsing through the lxml code, it turned out that there is no
need to pass an open_http parameter: the default method does almost
exactly the same thing as the code sample given in that thread, so
monkey patching the library (the standard way to add cookie support)
already works.
Unfortunately, I found out that passing a URL directly to lxml causes
it to use libxml2's native downloading support, which has no support
for cookies. As such, you have to download the content yourself and
hand the result to lxml (except when taking advantage of lxml forms).
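To make that concrete, here is a minimal sketch of doing the download yourself with a cookie jar and then handing the bytes to lxml. It assumes Python 3 and the standard library's urllib.request and http.cookiejar; the helper names build_cookie_opener, use_jar_for_plain_urlopen and fetch_document are made up for this example and are not part of lxml.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies via the jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def use_jar_for_plain_urlopen(opener):
    """Make plain urllib.request.urlopen() calls go through this opener too.

    Assumption: code that falls back to urlopen() (which, per the message
    above, includes lxml's default form submission) then shares the cookies.
    """
    urllib.request.install_opener(opener)


def fetch_document(opener, url):
    """Download url ourselves, then let lxml parse the bytes we got back."""
    import lxml.html  # imported here so the opener helpers work without lxml

    with opener.open(url) as response:
        body = response.read()
    # base_url lets lxml resolve relative links and form actions later on.
    return lxml.html.fromstring(body, base_url=url)
```

You would call build_cookie_opener() once, log in through that opener, and then use fetch_document(opener, url) for every page, so that the session cookie set at login is sent with each later request.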
As to waiting 2-3 seconds before requests, you can just put sleeps
(e.g. time.sleep()) between the requests in your code, or find some
sort of bandwidth throttling package...
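The sleep approach could look roughly like the sketch below. The function name polite_fetch_all and its parameters are invented for this example; fetch is whatever callable you use to download a single URL, and the sleep argument is only there so the pause can be swapped out when testing.

```python
import random
import time


def polite_fetch_all(urls, fetch, min_delay=2.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing 2-3 seconds between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

Picking a random delay in the 2-3 second range rather than a fixed one is just a design choice here; a constant time.sleep(2.5) would serve the same purpose.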
Douglas Mayle
On Apr 23, 2009, at 2:41 AM, Akafubu Kibombo wrote:
> I am trying to write a script which fetches a url, logs into the
> site, then fetches particular items from the page, and goes to the
> next page, fetching the same type of files on the new page until
> there are no new pages to fetch from. So I need form and cookie
> handling, as well as manipulating the headers. What do I need to
> use? I found this thread, but I don't understand it:
> http://codespeak.net/pipermail/lxml-dev/2008-December/004272.html
>
> Also, I don't want to wipe out the server with so many requests, is
> there a "wait 2 - 3 seconds before fetching the next element" type
> function?..
>
> Thank you so, so much.
>
> -A.F.