[lxml-dev] lxml.objectify.deannotate refuses to clean nil nodes

Stefan Behnel stefan_ml at behnel.de
Wed Jun 3 20:14:14 CEST 2009


Robert Pierce wrote:
> In case it isn't obvious, I'm not an XML guru and haven't been using
> lxml for long, but truly IMHO:
> 
> I stipulate the importance of nil (or null) in schema definitions, as
> well as in attaching types to the in memory representation of the
> tree.  But from the standpoint of text representation, <foo
> xsi:nil='true'/> doesn't seem to carry any additional information over
> <foo/>.

I think it makes more sense to let an empty leaf element represent an empty
string than to represent it as None. It's a matter of use cases, obviously.


> My use case is passing XML through SQS

"SQS" is an ambiguous abbreviation.


> which has an upper bound of
> about 6kB (after http headers are accounted for).

That sounds like a rather odd restriction. Doesn't it at least support
compression?
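
To get a rough feel for how much compression could buy under a size cap like that, here is a minimal sketch using only the standard library. It assumes the transport accepts text only, which is why the compressed bytes are base64-encoded; that assumption, the sample document and the helper names pack() and unpack() are all made up for illustration and do not come from any real service API.

```python
import base64
import zlib


def pack(xml_bytes):
    # Compress first, then base64-encode so the payload stays plain ASCII.
    return base64.b64encode(zlib.compress(xml_bytes, 9))


def unpack(payload):
    # Reverse of pack(): base64-decode, then decompress.
    return zlib.decompress(base64.b64decode(payload))


# Hypothetical sample: many empty elements that all carry xsi:nil="true".
XSI = 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
items = "".join('<item%d xsi:nil="true"/>' % i for i in range(200))
document = ("<root %s>%s</root>" % (XSI, items)).encode("utf-8")

packed = pack(document)
print(len(document), len(packed))
```

Repetitive markup such as a repeated xsi:nil attribute tends to compress well, so the annotations may cost much less on the wire than their textual size suggests.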


> I think it is impossible to retain input intent once a tree is parsed
> into memory.  Really, in the absence of a schema I shouldn't be able
> to tell the difference between your input and
> 
> root = objectify.fromstring('<root><x/></root>')
> 
> or
> root  = objectify.fromstring('<root/>')
> root.x = None

Well, it /is/ different, though.

	>>> root = objectify.fromstring('<root><x/></root>')
	>>> str(root.x)
	''
	>>> root = objectify.fromstring('<root/>')
	>>> root.x = None
	>>> str(root.x)
	'None'
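
The same difference shows up in the serialised form, because assigning None marks the element with xsi:nil. A small sketch of that, assuming lxml is installed; is_nil() is a helper invented here for illustration, not part of lxml.

```python
from lxml import etree, objectify

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def is_nil(element):
    # True if the element carries xsi:nil="true".
    return element.get("{%s}nil" % XSI_NS) == "true"


# Case 1: an empty leaf element that came from the parser.
parsed = objectify.fromstring("<root><x/></root>")

# Case 2: a leaf that was created by assigning None.
assigned = objectify.fromstring("<root/>")
assigned.x = None

parsed_xml = etree.tostring(parsed)
assigned_xml = etree.tostring(assigned)

print(is_nil(parsed.x), parsed_xml)
print(is_nil(assigned.x), assigned_xml)
```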


> You can only ask for consistency on output.

No, lxml.objectify is a Python-object-like in-memory tree. Serialisation is
only a way out, and validation is only a way to check what leaves the code
that processed the tree. All the rest is about making it easy to use as a
tree structure. That's what the annotations are there for. Whether or not
you keep the necessary information across a serialise-parse cycle is up to
you (or should be, so an option to remove everything is just fine).
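
As a sketch of what such a "remove everything" option looks like in use: the snippet below assumes an lxml release in which objectify.deannotate() accepts the xsi_nil and cleanup_namespaces keyword arguments. If your version does not have them, treat this as an illustration of the idea only.

```python
from lxml import etree, objectify

root = objectify.fromstring("<root/>")
root.x = None  # marks <x> with xsi:nil="true"
annotated = etree.tostring(root)

# Assumption: this lxml version supports xsi_nil and cleanup_namespaces.
# xsi_nil=True also strips the xsi:nil attributes, which are kept by default.
objectify.deannotate(root, xsi_nil=True, cleanup_namespaces=True)
cleaned = etree.tostring(root)

# After a serialise-parse cycle the None-ness is gone: the empty leaf
# now reads back as an empty string.
reparsed = objectify.fromstring(cleaned)

print(annotated)
print(cleaned)
print(repr(str(reparsed.x)))
```

Note that this trades the smaller output for exactly the information loss discussed above, so it only makes sense when the receiving side does not need to tell None from an empty string.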

Stefan


