[lxml-dev] lxml.objectify.deannotate refuses to clean nil nodes

John Lovell jlovell at nwesd.org
Thu Jun 4 17:30:12 CEST 2009


My comments would be: brilliant, useful, wonderful!

However, should the last one read...
strip_tags(tree, *tag_names)

John W. Lovell
Web Applications Engineer
Northwest Educational Service District
1601 R Avenue
Anacortes, WA 98221
(360) 299-4086
jlovell at nwesd.org
 
www.nwesd.org
Together We Can ...


-----Original Message-----
From: lxml-dev-bounces at codespeak.net [mailto:lxml-dev-bounces at codespeak.net] On Behalf Of Stefan Behnel
Sent: Thursday, June 04, 2009 6:34 AM
To: jholg at gmx.de
Cc: lxml-dev at codespeak.net
Subject: Re: [lxml-dev] lxml.objectify.deannotate refuses to clean nil nodes

jholg at gmx.de wrote:
>> jholg at gmx.de wrote:
>> > A compromise may be to add another keyword arg "nil" to
>> > deannotate() to allow for xsi:nil removal if needed (defaults to
>> > False, of course :)
>>
>> I think that should be done, yes. A "nil=False" keyword would nicely 
>> solve this. And disabling it by default makes sense for two reasons: 
>> backwards compatibility and the fact that xsi:nil may be used in 
>> existing documents.
>>
>> Is a plain "nil" enough or should we use "xsi_nil"?
>
> I think xsi_nil is clearer.

Thought so, too.


> What if we add a general deannotation function that lets you strip a
> tree of arbitrary attributes? Something like
>
> def remove_attributes(element_or_tree, *attrs):
>     ...
>
> which takes either ns-qualified strings or (ns, attrname) tuples and 
> removes these attributes wherever found. objectify.deannotate() would 
> then be a special case of this and share the implementation.
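
For illustration, a rough sketch of how such a function could behave, with deannotate() layered on top of it and the proposed "xsi_nil" keyword defaulting to False. It uses the standard library's xml.etree.ElementTree rather than lxml, so everything below (the helper _qualify(), the function bodies, the exact keyword set, the pytype namespace constant) is an assumption made for the example, not lxml's implementation.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Assumed to match the namespace objectify uses for its pytype annotation.
PYTYPE_NS = "http://codespeak.net/lxml/objectify/pytype"


def _qualify(attr):
    """Hypothetical helper: accept '{ns}name' strings or (ns, name) tuples."""
    if isinstance(attr, tuple):
        ns, name = attr
        return "{%s}%s" % (ns, name) if ns else name
    return attr


def remove_attributes(element_or_tree, *attrs):
    """Remove the named attributes wherever they occur in the tree."""
    names = {_qualify(a) for a in attrs}
    # ElementTree objects expose getroot(); plain Elements do not.
    if hasattr(element_or_tree, "getroot"):
        root = element_or_tree.getroot()
    else:
        root = element_or_tree
    for el in root.iter():  # iter() includes the root element itself
        for name in names:
            el.attrib.pop(name, None)


def deannotate(element_or_tree, pytype=True, xsi=True, xsi_nil=False):
    """Special case of remove_attributes(); xsi:nil is kept by default."""
    attrs = []
    if pytype:
        attrs.append((PYTYPE_NS, "pytype"))
    if xsi:
        attrs.append((XSI_NS, "type"))
    if xsi_nil:
        attrs.append((XSI_NS, "nil"))
    if attrs:
        remove_attributes(element_or_tree, *attrs)


root = ET.fromstring(
    '<root xmlns:xsi="%s"><a xsi:nil="true" xsi:type="xsd:string"/></root>'
    % XSI_NS
)
deannotate(root)                # removes xsi:type, leaves xsi:nil alone
deannotate(root, xsi_nil=True)  # now xsi:nil is removed as well
```

Keeping xsi_nil off by default preserves the existing behaviour for documents that carry a meaningful xsi:nil.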

That sounds like functionality that belongs in lxml.etree, although it's partly available in lxml.html already. What about adding some more, then?

- strip_attributes(tree, *attribute_names)
  remove all named attributes from a tree

- strip_elements(tree, *element_names)
  remove all named elements from a tree, including their subtrees
  (alt: "strip_subtrees")

- strip_tags(tree, *element_names)
  remove all named elements from a tree, merging their children and text content into their parents
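
To make the difference between the last two concrete, here is a minimal sketch of the intended semantics. It is written against the standard library's xml.etree.ElementTree instead of lxml, and the bodies, the helper names (_root, _append_text, _process), and the decisions to drop the tail text in strip_elements() and to leave a matching root element in place are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def _root(element_or_tree):
    """Hypothetical helper: accept an ElementTree or a plain Element."""
    if hasattr(element_or_tree, "getroot"):
        return element_or_tree.getroot()
    return element_or_tree


def strip_elements(tree, *element_names):
    """Remove the named elements together with their subtrees."""
    names = set(element_names)
    # Snapshot the traversal so removals do not disturb the iteration.
    for parent in list(_root(tree).iter()):
        for child in list(parent):
            if child.tag in names:
                parent.remove(child)  # the child's tail text goes with it


def _append_text(parent, index, text):
    """Attach text in front of position `index` among parent's children."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def strip_tags(tree, *element_names):
    """Remove the named elements but keep their children and text."""
    names = set(element_names)

    def _process(parent):
        # Post-order: clean each subtree before looking at this level.
        for child in list(parent):
            _process(child)
        i = 0
        while i < len(parent):
            child = parent[i]
            if child.tag not in names:
                i += 1
                continue
            grandchildren = list(child)
            parent.remove(child)
            _append_text(parent, i, child.text)
            for offset, grandchild in enumerate(grandchildren):
                parent.insert(i + offset, grandchild)
            _append_text(parent, i + len(grandchildren), child.tail)
            i += len(grandchildren)  # already cleaned by the recursion

    _process(_root(tree))  # the root itself has no parent and is kept


doc = ET.fromstring("<p>Hello <b>bold <i>it</i> x</b> world<br/>end</p>")
strip_tags(doc, "b")      # <i> and the text of <b> move up into <p>
strip_elements(doc, "i")  # <i> disappears along with its text and tail
```

The sketch also shows why functions fit better than methods here: both operate on everything below the given node rather than on a single Element.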

Since lxml.html provides a drop_tag() Element method, I considered drop_tags() for the last one, but thought that "strip_*" might be slightly better for consistency here. Alternatively, we might use "drop_*" for everything, but "strip" is a common term in Python, while "drop" isn't. Plus, there are "drop_*()" /methods/ in lxml.html, which make sense on an Element and do not traverse into subtrees. "strip" makes no sense in that context.

I also vote for functions instead of methods here since they work on complete (sub-)trees rather than a single Element object. A function makes this clearer.

Comments?

Stefan


