[lxml-dev] About lxml status

Lee Brown lee.brown at elecdev.com
Fri Dec 15 00:09:36 CET 2006


Greetings!

The Apache web server has several MPMs (Multi-Processing Modules) available to
it (unless you're running the Win32 MPM, in which case that's the one you're
stuck with).  Basically, though, the web server can spawn either processes or
threads to handle incoming requests.

In the Win32 MPM, each VHOST (virtual web site) runs as a separate OS process,
and each request that a VHOST receives is handled entirely by a thread within
that process.  Each thread invokes a chain of request handlers (code modules
that handle specific tasks such as authentication, authorization, content
delivery, output filtering, and so forth) that are instantiated for that thread
and discarded at the end of the request.

Requests may arrive simultaneously, and request threads are by nature very
short-lived.  If a VHOST gets 32 simultaneous requests, 32 threads are created,
and within a second or two all 32 have finished and terminated.  (By default,
the Win32 MPM can have a maximum of 250 concurrent threads.)

What Mod Python does is allow you to specify a Python function that handles a
specific task or tasks in the chain in lieu of Apache's standard handlers.  Mod
Python's default behavior is to create a Python interpreter for each VHOST, and
this interpreter is responsible for executing the various handler functions in a
thread-safe way for each request.  (I have NO idea how it does it, nor is my
state of confusion likely to change even if someone explains it to me.)  The
source code containing the function is imported as a module at interpreter
startup in the normal Python way; that is, executable code in the module outside
the handler function definition(s) is executed on import, and anything it
defines is global to the handler function(s).

So, naively, I wrote some global code to pre-load and pre-compile all of my XSLT
templates into a dictionary at startup.  Then, within the handler function
definition I look up the correct template in the dictionary and use it to
transform the parsed XML source object.  This worked just fine as long as one
and only one thread was being executed at any given time.  Simultaneous requests
would either bomb out with a threading-related error or just hang until the
server ran out of available threads and crashed.  Apparently, Mod Python can
dole out handler functions in a thread-safe way, but any global objects you
create at import time are not so lucky.  Nor does there seem to be any way to
share an object from one thread with another thread.
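The import-time pattern described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the actual handler: `FakeTemplate`, `compile_template`, `TEMPLATES`, `LOCKS` and `handle_request` are hypothetical names, the stand-in template does no real XSLT work (a real module would build lxml XSLT objects instead), and the per-template `threading.Lock` is a standard Python technique that nobody in this thread proposed. The lock keeps the shared dictionary but guarantees that no template is used by two threads at the same moment; whether that alone would cure the hangs seen under Mod Python is untested.

```python
import threading
import time


class FakeTemplate:
    """Hypothetical stand-in for a compiled XSLT template."""

    def __init__(self, name):
        self.name = name
        self.active = 0       # threads currently inside transform()
        self.max_active = 0   # highest concurrency ever observed

    def transform(self, doc):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        time.sleep(0.001)     # pretend the transformation takes a moment
        self.active -= 1
        return f"{self.name}({doc})"


def compile_template(name):
    # Hypothetical: a real handler would parse and compile an XSLT file here.
    return FakeTemplate(name)


# Module-level code: runs once, when the handler module is imported.
TEMPLATES = {name: compile_template(name) for name in ("page", "feed")}
LOCKS = {name: threading.Lock() for name in TEMPLATES}


def handle_request(template_name, doc):
    # Hold the template's lock while using it, so concurrent requests
    # take turns with a template instead of sharing it simultaneously.
    with LOCKS[template_name]:
        return TEMPLATES[template_name].transform(doc)
```

The trade-off is that requests needing the same template are handled one at a time; the template pool in Ian Bicking's reply below instead compiles extra copies when all existing ones are busy.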

One way around this may be to pass a copy of the template dictionary to the
handler function; that is, pass a literal copy instead of an object reference.
This would eliminate the time overhead of recompiling templates for each request
at the expense of possibly having a lot of copies in memory at one time.  But
since my server always seems to have plenty of free memory, I'll give it a try.
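Note that Python never copies an argument implicitly: passing the dictionary to the handler passes a reference to the same object, and even a shallow copy (`copy.copy()` or `dict(...)`) shares the template objects inside it. Only `copy.deepcopy()` produces independent templates. The sketch below uses a hypothetical `Template` class; whether lxml's compiled XSLT objects support deep copying, and whether that is cheaper than recompiling, is an assumption to check rather than something established here.

```python
import copy


class Template:
    """Hypothetical stand-in for a compiled XSLT template."""

    def __init__(self, filename):
        self.filename = filename


TEMPLATES = {"page": Template("page.xsl")}


def handler(templates):
    # Python passes a reference: this is the caller's dictionary itself.
    return templates


shared = handler(TEMPLATES)
shallow = copy.copy(TEMPLATES)    # new dict, same Template objects inside
deep = copy.deepcopy(TEMPLATES)   # new dict holding new Template objects

print(shared is TEMPLATES)                    # True: nothing was copied
print(shallow["page"] is TEMPLATES["page"])   # True: the template is shared
print(deep["page"] is TEMPLATES["page"])      # False: an independent copy
```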

-----Original Message-----
From: Ian Bicking [mailto:ianb at colorstudy.com] 
Sent: Thursday, December 14, 2006 4:50 PM
To: Lee Brown
Cc: 'Hans-Jürgen Hay'; lxml-dev at codespeak.net
Subject: Re: [lxml-dev] About lxml status

Lee Brown wrote:
> Thanks for the warning, but I've already run headfirst into that 
> problem.  My Apache server is running the Win32 MPM, where every 
> request is a new thread, so there aren't any tricks I can play with 
> the PythonInterpreter directive.  (None that help, anyway.)
> 
> However, I did some benchmark tests and found that I can serve about 
> 32 requests per second even with the overhead of recompiling the XSLT 
> template for each request.  This is adequate for my needs, though 
> a very busy website might have trouble.

I'm not exactly clear on how threads, mod_python and all that work, but I
imagine you could use a pool of templates.  You'd do something like:

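# template_pool is a module-level list shared by all threads, and
# compile_template() is your own function that builds a new template.
template_pool = []

try:
    tmpl = template_pool.pop()       # reuse an idle template if there is one
except IndexError:
    tmpl = compile_template()        # pool is empty: compile a fresh one

# then to return the template to the pool:

template_pool.append(tmpl)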


This is assuming that it's okay to move templates between threads, but not use
them concurrently between threads.  Or if they have to be used in the thread
they were created in, you can use:

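import threading
template_cache = threading.local()   # attributes are private to each thread

try:
    tmpl = template_cache.template
except AttributeError:
    # first use in this thread: compile once and keep it for this thread
    tmpl = template_cache.template = compile_template()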


That assumes threads are long-lived; otherwise this won't change anything
either.
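The caveat about thread lifetime can be demonstrated with a minimal sketch, again with a hypothetical counting `compile_template` in place of a real XSLT compile. With a small pool of reused worker threads, each thread compiles once; with a brand-new thread per request (the model Lee Brown describes for the Win32 MPM), every request compiles.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

template_cache = threading.local()   # each thread sees its own attributes
compile_count = 0
_count_lock = threading.Lock()


def compile_template():
    # Hypothetical stand-in for the expensive XSLT compilation.
    global compile_count
    with _count_lock:
        compile_count += 1
    return object()


def get_template():
    try:
        return template_cache.template
    except AttributeError:
        # First use in this thread: compile and remember it for this thread only.
        template_cache.template = compile_template()
        return template_cache.template


def handle_request(_):
    return get_template()


# Long-lived threads: two reusable workers serve ten requests.
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(handle_request, range(10)))
pooled_compiles = compile_count

# Short-lived threads: a new thread for every request.
compile_count = 0
for i in range(10):
    t = threading.Thread(target=handle_request, args=(i,))
    t.start()
    t.join()
fresh_compiles = compile_count

print(pooled_compiles <= 2)   # True
print(fresh_compiles)         # 10
```

Which case applies depends on whether the server really creates a fresh thread per request or reuses a fixed set of worker threads, so that is worth confirming before relying on a thread-local cache.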


-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org


