[lxml-dev] Proposal: Automatic unique ID generation for each tag or persistent user data for Element
Stefan Behnel
stefan_ml at behnel.de
Fri Jul 18 16:15:53 CEST 2008
Hi,
please keep the list involved and avoid top-posting.
Ivan Begtin wrote:
> 2008/7/18 Stefan Behnel:
>> Dr R. Sanderson wrote:
>>> tree = node.getroottree()
>>> elems = node.xpath('//p')
>>> info = {}
>>> for e in elems:
>>>     eid = abs(hash(tree.getpath(e)))
>>>     # processing here...
>>>
>>> Then later I can check if the data came from the same element or not by
>>> comparing the eids.
>> That's a nice way of doing it. If you additionally want to make the ID stick
>> with the Element to avoid recalculation or to make it survive a tree split,
>> you can store the hash value as xml:id of the node, as in
>>
>> e.set("{http://www.w3.org/XML/1998/namespace}id",
>> md5sum(tree.getpath(e)))
>>
>> Just remember to remove all xml:id attributes and run cleanup_namespaces()
>> in lxml 2.1 before you serialise it to plain HTML.
>
> It solved about 99% of the problem, but we still have HTML comments which
> have attributes. Rarely, but sometimes they are needed too.
Could you explain what an html comment with an attribute is?
> And one more question: is it possible to add an ID without changing any
> original data? Or to print a tag via etree.tostring(...) without the data
> which I added manually?
Repeating myself:
>> Just remember to remove all xml:id attributes and run cleanup_namespaces()
>> in lxml 2.1 before you serialise it to plain HTML.
Here's the code:
    for el in root.iter():
        try:
            del el.attrib["{http://www.w3.org/XML/1998/namespace}id"]
        except KeyError:
            pass
    etree.cleanup_namespaces(root)
    print(etree.tostring(root, method="html"))
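For reference, a self-contained sketch of the whole round trip: tag each element with an ID derived from its path, then strip the IDs again before serialising. It assumes hashlib.md5 as a stand-in for the md5sum() placeholder quoted above. The helper names tag_with_ids and serialise_without_ids and the sample markup are made up for illustration and are not part of lxml.

```python
import hashlib

from lxml import etree

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def tag_with_ids(root):
    """Store an md5 digest of each element's path as its xml:id.

    Hypothetical helper, not part of lxml.
    """
    tree = root.getroottree()
    # tag=etree.Element restricts iteration to real elements,
    # skipping comments and processing instructions.
    for el in root.iter(tag=etree.Element):
        path = tree.getpath(el)
        el.set(XML_ID, hashlib.md5(path.encode("utf-8")).hexdigest())


def serialise_without_ids(root):
    """Drop every xml:id, clean up namespaces, and serialise as HTML.

    Hypothetical helper, not part of lxml.
    """
    for el in root.iter(tag=etree.Element):
        el.attrib.pop(XML_ID, None)  # no error if the attribute is absent
    etree.cleanup_namespaces(root)
    return etree.tostring(root, method="html")


# Sample markup, made up for this example.
root = etree.fromstring("<html><body><p>one</p><p>two</p></body></html>")
tag_with_ids(root)
ids = [p.get(XML_ID) for p in root.iter("p")]
html = serialise_without_ids(root)
```

Note that the stripping step modifies the tree in place. If the IDs are still needed after serialising, run it on a copy.deepcopy() of the root instead.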
Stefan