[lxml-dev] lxml.objectify.deannotate refuses to clean nil nodes

Stefan Behnel stefan_ml at behnel.de
Thu Jun 4 15:34:18 CEST 2009


jholg at gmx.de wrote:
>> jholg at gmx.de wrote:
>> > A compromise may be to add another keyword arg "nil" to deannotate()
>> > to allow for xsi:nil removal if needed (defaults to False, of course :)
>>
>> I think that should be done, yes. A "nil=False" keyword would nicely
>> solve this. And disabling it by default makes sense for two reasons:
>> backwards compatibility and the fact that xsi:nil may be used in
>> existing documents.
>>
>> Is a plain "nil" enough or should we use "xsi_nil"?
>
> I think xsi_nil is clearer.

Thought so, too.
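
To make the opt-in behaviour concrete, here is a rough sketch of how an
"xsi_nil=False" keyword could gate the removal. It is written against the
stdlib xml.etree.ElementTree, not lxml, and deannotate_sketch() is a
hypothetical stand-in rather than the objectify implementation; the pytype
namespace URI is my assumption of what objectify uses.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Assumed namespace of objectify's pytype annotation.
PYTYPE_NS = "http://codespeak.net/lxml/objectify/pytype"


def deannotate_sketch(root, pytype=True, xsi=True, xsi_nil=False):
    """Hypothetical stand-in for deannotate(); xsi:nil is kept by default."""
    doomed = []
    if pytype:
        doomed.append("{%s}pytype" % PYTYPE_NS)
    if xsi:
        doomed.append("{%s}type" % XSI_NS)
    if xsi_nil:
        # Only touched on explicit request, for backwards compatibility.
        doomed.append("{%s}nil" % XSI_NS)
    for element in root.iter():
        for name in doomed:
            element.attrib.pop(name, None)
```

With the default arguments an existing xsi:nil="true" survives; passing
xsi_nil=True removes it along with the type annotations.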


> What if we add a general deannotation function that lets you strip a tree
> of arbitrary attributes? Something like
>
>     def remove_attributes(element_or_tree, *attrs):
>         ...
>
> which takes either ns-qualified strings or (ns, attrname) tuples and
> removes these attributes wherever found. objectify.deannotate() would then
> be a special case of this and share the implementation.
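
A minimal sketch of what such a function might look like, using the stdlib
xml.etree.ElementTree so that it runs without lxml. I am assuming that
"ns-qualified strings" means Clark notation ("{namespace}name"); the body is
an illustration of the idea, not a proposed implementation.

```python
import xml.etree.ElementTree as ET


def remove_attributes(element_or_tree, *attrs):
    """Remove the given attributes from every element in the tree.

    Each entry in attrs is either a "{ns}name" string (assumed Clark
    notation) or an (ns, attrname) tuple.
    """
    # Accept both an ElementTree and a plain Element.
    if hasattr(element_or_tree, "getroot"):
        root = element_or_tree.getroot()
    else:
        root = element_or_tree

    # Normalise everything to "{ns}name" keys as used in Element.attrib.
    names = set()
    for attr in attrs:
        if isinstance(attr, tuple):
            ns, name = attr
            names.add("{%s}%s" % (ns, name) if ns else name)
        else:
            names.add(attr)

    for element in root.iter():
        # intersection() builds a new set, so deleting while looping is safe.
        for name in names.intersection(element.attrib):
            del element.attrib[name]
```

deannotate() would then reduce to a call with the pytype and xsi attribute
names filled in.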

That sounds like functionality that belongs in lxml.etree, although it's
partly available in lxml.html already. What about adding some more, then?

- strip_attributes(tree, *attribute_names)
  remove all named attributes from a tree

- strip_elements(tree, *element_names)
  remove all named elements from a tree, including their subtrees
  (alt: "strip_subtrees")

- strip_tags(tree, *element_names)
  remove all named elements from a tree, merging their children and text
  content into their parents
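
To pin down the intended semantics of the three functions, here is a rough
sketch on top of the stdlib xml.etree.ElementTree (which has no
getparent(), hence the parent-driven loops). The function names follow the
proposal above; the bodies, the _append_text() helper and the choice to drop
an element's tail text in strip_elements() are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET


def strip_attributes(root, *attribute_names):
    """Remove all named attributes from every element below root."""
    for element in root.iter():
        for name in attribute_names:
            element.attrib.pop(name, None)


def strip_elements(root, *element_names):
    """Remove all named elements together with their subtrees.

    Sketch simplification: the tail text of a removed element goes too.
    The root element itself is never removed.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in element_names:
                parent.remove(child)


def _append_text(parent, index, text):
    """Hypothetical helper: attach text just before child position index."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def strip_tags(root, *element_names):
    """Remove named elements, merging children and text into the parent."""
    # Reversed document order: descendants are handled before ancestors,
    # so nested matches are already flattened when the parent is visited.
    for parent in reversed(list(root.iter())):
        index = 0
        while index < len(parent):
            child = parent[index]
            if child.tag not in element_names:
                index += 1
                continue
            grandchildren = list(child)
            del parent[index]
            _append_text(parent, index, child.text)
            for offset, grandchild in enumerate(grandchildren):
                parent.insert(index + offset, grandchild)
            _append_text(parent, index + len(grandchildren), child.tail)
```

For example, strip_tags() on "<p>Hello <b>big <i>wide</i></b> world</p>"
with "b" leaves the <i> element in place as a direct child of <p>, with
"Hello big " in front of it and " world" behind it.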

Since lxml.html provides a drop_tag() Element method, I considered
drop_tags() for the last one, but thought that "strip_*" might be slightly
better for consistency here. Alternatively, we might use "drop_*" for
everything, but "strip" is a common thing in Python, while "drop" isn't.
Plus, there are "drop_*()" /methods/ in lxml.html, which make sense on an
Element and do not traverse into subtrees. "strip" makes no sense in that
context.

I also vote for functions instead of methods here since they work on
complete (sub-)trees rather than a single Element object. A function makes
this clearer.

Comments?

Stefan


