[lxml-dev] lxml.objectify.deannotate refuses to clean nil nodes
John Lovell
jlovell at nwesd.org
Thu Jun 4 17:30:12 CEST 2009
My comments would be: brilliant, useful, wonderful!
However, should the last one read:

    strip_tags(tree, *tag_names)
John W. Lovell
Web Applications Engineer
Northwest Educational Service District
1601 R Avenue
Anacortes, WA 98221
(360) 299-4086
jlovell at nwesd.org
www.nwesd.org
Together We Can ...
-----Original Message-----
From: lxml-dev-bounces at codespeak.net [mailto:lxml-dev-bounces at codespeak.net] On Behalf Of Stefan Behnel
Sent: Thursday, June 04, 2009 6:34 AM
To: jholg at gmx.de
Cc: lxml-dev at codespeak.net
Subject: Re: [lxml-dev] lxml.objectify.deannotate refuses to clean nil nodes
jholg at gmx.de wrote:
>> jholg at gmx.de wrote:
>> > A compromise may be to add another keyword arg "nil" to deannotate()
>> > to allow for xsi:nil removal if needed (defaults to False, of course :)
>>
>> I think that should be done, yes. A "nil=False" keyword would nicely
>> solve this. And disabling it by default makes sense for two reasons:
>> backwards compatibility and the fact that xsi:nil may be used in
>> existing documents.
>>
>> Is a plain "nil" enough or should we use "xsi_nil"?
>
> I think xsi_nil is clearer.
Thought so, too.
> What if we add a general deannotation function that lets you strip a
> tree of arbitrary attributes? Something like
>
>     def remove_attributes(element_or_tree, *attrs):
>         ...
>
> which takes either ns-qualified strings or (ns, attrname) tuples and
> removes these attributes wherever found. objectify.deannotate() would
> then be a special case of this and share the implementation.
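
The idea quoted above could be sketched roughly as follows. This is only an illustration built on the standard library's xml.etree.ElementTree, not lxml's implementation. The helper _qualify(), the keyword names on this deannotate() and the pytype namespace constant are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Assumed namespace of objectify's pytype annotation (illustrative only).
PY_NS = "http://codespeak.net/lxml/objectify/pytype"


def _qualify(attr):
    """Turn an (ns, attrname) tuple into '{ns}attrname'; pass strings through."""
    if isinstance(attr, tuple):
        ns, name = attr
        return "{%s}%s" % (ns, name) if ns else name
    return attr


def remove_attributes(element_or_tree, *attrs):
    """Remove the named attributes wherever they occur in the tree."""
    names = {_qualify(a) for a in attrs}
    # Element.iter() and ElementTree.iter() both include the root element.
    for el in element_or_tree.iter():
        for name in names:
            el.attrib.pop(name, None)


def deannotate(element_or_tree, pytype=True, xsi=True, xsi_nil=False):
    """Hypothetical special case: choose the attribute names, then delegate."""
    attrs = []
    if pytype:
        attrs.append((PY_NS, "pytype"))
    if xsi:
        attrs.append((XSI_NS, "type"))
    if xsi_nil:
        attrs.append((XSI_NS, "nil"))
    remove_attributes(element_or_tree, *attrs)


root = ET.fromstring(
    '<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<a xsi:type="xsd:int">1</a><b xsi:nil="true"/></root>'
)
deannotate(root)                # xsi:nil is left alone by default
deannotate(root, xsi_nil=True)  # now xsi:nil is removed as well
```

Keeping xsi_nil=False as the default matches the backwards-compatibility argument made earlier in the thread.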
That sounds like functionality that belongs in lxml.etree, although it's partly available in lxml.html already. What about adding some more, then?
- strip_attributes(tree, *attribute_names)
    remove all named attributes from a tree

- strip_elements(tree, *element_names)
    remove all named elements from a tree, including their subtrees
    (alt: "strip_subtrees")

- strip_tags(tree, *element_names)
    remove all named elements from a tree, merging their children and
    text content into their parents
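
To make the difference between the last two proposals concrete, here is a minimal sketch of both. It uses only the standard library's xml.etree.ElementTree and is not lxml's code. The helper names are invented for the example, and keeping the tail text of a removed element is a choice made in this sketch; the proposal above does not specify it. The root element itself is never removed, since it has no parent to merge into.

```python
import xml.etree.ElementTree as ET


def _root(tree):
    """Accept either an ElementTree or an Element."""
    return tree.getroot() if isinstance(tree, ET.ElementTree) else tree


def _append_text(parent, index, text):
    """Attach text just before position `index` among parent's children."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def _strip_elements_below(parent, names):
    index = 0
    while index < len(parent):
        child = parent[index]
        if child.tag in names:
            # Drop the whole subtree, but keep the text that followed it.
            _append_text(parent, index, child.tail)
            del parent[index]
        else:
            _strip_elements_below(child, names)
            index += 1


def _strip_tags_below(parent, names):
    index = 0
    while index < len(parent):
        child = parent[index]
        if child.tag in names:
            grandchildren = list(child)
            _append_text(parent, index, child.text)
            del parent[index]
            for offset, grandchild in enumerate(grandchildren):
                parent.insert(index + offset, grandchild)
            _append_text(parent, index + len(grandchildren), child.tail)
            # Do not advance: the promoted children are checked next.
        else:
            _strip_tags_below(child, names)
            index += 1


def strip_elements(tree, *element_names):
    """Remove the named elements together with their subtrees."""
    _strip_elements_below(_root(tree), set(element_names))


def strip_tags(tree, *element_names):
    """Remove the named elements, merging children and text into the parent."""
    _strip_tags_below(_root(tree), set(element_names))


doc = ET.fromstring(
    "<p>Hello <b>bold <i>and italic</i></b> world<script>x()</script>!</p>"
)
strip_tags(doc, "b")
print(ET.tostring(doc, encoding="unicode"))
# <p>Hello bold <i>and italic</i> world<script>x()</script>!</p>
strip_elements(doc, "script")
print(ET.tostring(doc, encoding="unicode"))
# <p>Hello bold <i>and italic</i> world!</p>
```

Both are written as module-level functions that walk a complete (sub-)tree, in line with the preference for functions over Element methods.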
Since lxml.html provides a drop_tag() Element method, I considered drop_tags() for the last one, but thought that "strip_*" might be slightly better for consistency here. Alternatively, we might use "drop_*" for everything, but "strip" is a common thing in Python, while "drop" isn't.
Plus, there are "drop_*()" /methods/ in lxml.html, which make sense on an Element and do not traverse into subtrees. "strip" makes no sense in that context.
I also vote for functions instead of methods here since they work on complete (sub-)trees rather than a single Element object. A function makes this clearer.
Comments?
Stefan