[lxml-dev] lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64

Stefan Behnel stefan_ml at behnel.de
Fri Dec 28 09:28:55 CET 2007


Hi,

Dmitri Fedoruk wrote:
> I've upgraded my code to be lxml 2.0-compatible.

Cool. Hope it wasn't too hard.


>> Entity 'hellip' not defined
> Parsing of the incoming data fails when I have html entities in it.
> 
> Literally I have this code:
> 
> xmlParser = etree.XMLParser( no_network = False, resolve_entities = False )
> storedDoc = etree.parse( StringIO.StringIO(reply['data']), xmlParser )
> 
> I tried setting resolve_entities = True, but that did not help either.
> The point is that all entities are defined in the files included by the
> DTD file, and I do not want to validate the data at runtime - I
> have strict time limitations.

You can load the DTD without triggering validation by passing "load_dtd =
True". I never tested the performance impact, though.
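
A minimal sketch of what I mean, not tested against your setup. The file
names (entities.dtd, reply.xml) and the helper function are made up for
illustration; your real DTD will be larger and pull in more files.

```python
import os
import tempfile

from lxml import etree


def parse_with_dtd_entities():
    """Parse a document whose entities are declared in an external DTD."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical DTD that declares only the entity used below.
        with open(os.path.join(tmp, "entities.dtd"), "w", encoding="utf-8") as f:
            f.write('<!ENTITY hellip "&#8230;">\n')

        # Hypothetical document that points at the DTD via a relative SYSTEM id.
        xml_path = os.path.join(tmp, "reply.xml")
        with open(xml_path, "w", encoding="utf-8") as f:
            f.write('<!DOCTYPE reply SYSTEM "entities.dtd">\n'
                    '<reply>to be continued&hellip;</reply>\n')

        # load_dtd=True makes the parser read the DTD so it learns the entities.
        # dtd_validation is not passed, so it stays at its default (off).
        parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
        return etree.parse(xml_path, parser).getroot().text


text = parse_with_dtd_entities()
```

Parsing from a file path lets the relative DTD reference resolve next to
the document. If you parse from a string or StringIO instead, the parser
needs some other way to locate the DTD (an absolute system id, for example).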

The XML parser needs to read the DTD to learn about the entities (that's how
it works). If you are dealing with HTML, you can also try the HTMLParser() -
it's not only good for fixing HTML, it also knows a lot of HTML specifics.
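
Roughly like this, assuming your data really is HTML-ish. The helper name
and the sample input are made up for illustration:

```python
from lxml import etree


def parse_html_fragment(data):
    """Parse HTML-like input with the HTML parser instead of the XML parser."""
    parser = etree.HTMLParser()
    return etree.fromstring(data, parser)


# The HTML parser knows the standard HTML entities, so no DTD is needed here.
root = parse_html_fragment("<p>to be continued&hellip;</p>")
paragraph_text = root.findtext(".//p")
```

Note that the HTML parser wraps the content in html and body elements, so
the resulting tree is not the same shape as what the XML parser would give
you for the same input.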


> As I have already said, this happens with only a few specific
> stylesheets. Could this be a data/stylesheet problem?

Not sure what you mean here. Can you figure out what is different in the
stylesheets that fail? Something like "only they call document() to read from
other XML files" or "only they (or all of them) use stylesheet-local data" or
"they were created at a different place in the code".

As a quick fix, did you try changing the mod_python config as proposed in the FAQ?

http://codespeak.net/lxml/dev/FAQ.html#my-program-crashes-when-run-with-mod-python-pyro-zope-plone

Again, no idea about the performance impact here.

Stefan

