[lxml-dev] how to hack (and where)

Stefan Behnel stefan_ml at behnel.de
Thu Jul 19 10:24:31 CEST 2007


jholg at gmx.de wrote:
>>> The name check should go directly into _createElement,
>> No, _createElement() is only a tiny wrapper around the element node 
>> creation in libxml2. No Python exceptions allowed there.
> 
> Just out of curiosity: Is this by policy, or would it really cause 
> problems? Because I tried just that and didn't see any problems. But of 
> course I didn't test any tricky stuff or threading or you know what.

Functions like _createElement are meant to be as simple as they are: they just
create a plain xmlNode. They basically give that step a more explicit name and
make sure we always call the same thing, so if we ever really need to change
something here, we have one place to do so. But they are internal functions,
not API-level functions. Error checking must be done at the API level, before
entering the internals.

For example, _makeElement is the main function for creating an Element proxy
at the API level. It's also a public C function that can safely be used in
external modules (like objectify). It does all the error checking and figures
out what you meant, and it can happily throw an exception if you provided
rubbish, as it will only be used from API functions.
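
To make the split concrete, here is a rough sketch in plain Python. It is not
lxml's actual Cython code: the names _create_node and make_element, the dict
standing in for an xmlNode, and the simplified name check are all made up for
illustration. The point is only that the internal helper trusts its input,
while the API-level function validates first and is the one allowed to raise.

```python
import re

# Hypothetical, simplified name check; real XML name rules are broader.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def _create_node(tag):
    """Internal helper: builds a plain node and trusts its input completely."""
    # A dict stands in for the low-level node; no validation, no exceptions.
    return {"tag": tag, "children": []}


def make_element(tag):
    """API-level function: validates user input, then calls the helper."""
    if not isinstance(tag, str):
        raise TypeError("tag name must be a string")
    if not _NAME_RE.fullmatch(tag):
        raise ValueError("invalid tag name: %r" % (tag,))
    # From here on, internal code can rely on 'tag' being well formed.
    return _create_node(tag)
```

With this layout, a name check would go into make_element (or a helper it
calls), and _create_node stays a thin wrapper with a single call site to
change if the node creation itself ever needs to change.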

Another good example is _getNsTag, which is the main API-level helper for
splitting up something that came from the user into a UTF-8 encoded namespace
*string* (or None) and a tag name *string*. It throws an exception if that
fails and guarantees to return objects of the expected types. That really
helps internally, because all internal code can just rely on that.
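
As an illustration of that kind of helper, here is a minimal Python sketch. It
assumes the user passes tags in the "{namespace}name" notation, and the
function name get_ns_tag and its exact error handling are hypothetical, not
the real _getNsTag. It either raises or returns a pair of the promised types.

```python
def get_ns_tag(tag):
    """Split '{namespace}name' into (namespace or None, name) as UTF-8 bytes."""
    if isinstance(tag, bytes):
        tag = tag.decode("utf-8")  # raises on input that is not valid UTF-8
    if not isinstance(tag, str):
        raise TypeError("tag must be str or bytes")

    ns = None
    name = tag
    if tag.startswith("{"):
        ns, sep, name = tag[1:].partition("}")
        if not sep:
            raise ValueError("invalid tag name: missing '}'")
    if not name:
        raise ValueError("empty tag name")

    # Sketch choice: an empty namespace is reported as None.
    ns_utf = ns.encode("utf-8") if ns else None
    return ns_utf, name.encode("utf-8")
```

Callers can then unpack the result without any further type checks, which is
exactly what makes such a helper useful at the start of an API function.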

There's also public-api.pxi that wraps some of the half-public C functions and
adds some additional error checking in some cases to make them publicly usable.

Maybe a good way to detect an API-level C function is to check whether a) it
already throws an exception, b) it returns things like _Element or other
API-level objects, or c) it is often used at the beginning of API functions.
Admittedly, it's not always 100% clear from the code (_createElement is a bad
example as it's directly used in SubElement), but those are good rules of thumb.

Does that make the difference clear?

Stefan

