[lxml-dev] Proposal: Automatic unique ID generation for each tag or persistent user data for Element

Stefan Behnel stefan_ml at behnel.de
Fri Jul 18 16:15:53 CEST 2008


Hi,

please keep the list involved and avoid top-posting.

Ivan Begtin wrote:
> 2008/7/18 Stefan Behnel:
>> Dr R. Sanderson wrote:
>>> tree = node.getroottree()
>>> elems = node.xpath('//p')
>>> info = {}
>>> for e in elems:
>>>    eid = abs(hash(tree.getpath(e)))
>>>    # processing here...
>>>
>>> Then later I can check if the data came from the same element or not by
>>> comparing the eids.
>> That's a nice way of doing it. If you additionally want to make the ID stick
>> with the Element to avoid recalculation or to make it survive a tree split,
>> you can store the hash value as xml:id of the node, as in
>>
>>  e.set("{http://www.w3.org/XML/1998/namespace}id", md5sum(tree.getpath(e)))
>>
>> Just remember to remove all xml:id attributes and run cleanup_namespaces()
>> in lxml 2.1 before you serialise it to plain HTML.
>
> That solved about 99% of the problem, but we still have HTML comments which
> have attributes. They are rare, but sometimes they are needed too.

Could you explain what an html comment with an attribute is?


> And one more question: is it possible to add an ID without changing any of
> the original data? Or to print a tag via etree.tostring(...) without the data
> that I added manually?

Repeating myself:

>> Just remember to remove all xml:id attributes and run cleanup_namespaces()
>> in lxml 2.1 before you serialise it to plain HTML.

Here's the code:

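    # Strip the temporary xml:id attributes before serialising.
    for el in root.iter():
        try:
            del el.attrib["{http://www.w3.org/XML/1998/namespace}id"]
        except KeyError:
            pass  # this node never had an xml:id
    # Drop namespace declarations that are no longer used (lxml 2.1).
    etree.cleanup_namespaces(root)
    print(etree.tostring(root, method="html"))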
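
For completeness, here is a rough sketch of the whole round trip: tag each
element with an ID derived from its path, then strip the IDs again before
serialising. The helper names tag_elements() and strip_ids(), the "id" prefix
(an xml:id value may not start with a digit) and the sample document are
illustrative assumptions only, and hashlib.md5 stands in for the md5sum()
placeholder quoted above. It assumes a reasonably recent lxml.

```python
import hashlib

from lxml import etree

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def tag_elements(root):
    """Store a digest of each element's path as its xml:id (hypothetical helper)."""
    tree = root.getroottree()
    # Passing etree.Element as the tag filter skips comments and PIs.
    for el in root.iter(etree.Element):
        digest = hashlib.md5(tree.getpath(el).encode("utf-8")).hexdigest()
        # Prefix with letters so the value does not start with a digit.
        el.set(XML_ID, "id" + digest)


def strip_ids(root):
    """Remove the xml:id attributes again (hypothetical helper)."""
    for el in root.iter(etree.Element):
        el.attrib.pop(XML_ID, None)
    etree.cleanup_namespaces(root)


# Sample input, made up for this sketch.
root = etree.fromstring(
    "<html><body><p>one</p><!-- note --><p>two</p></body></html>"
)

tag_elements(root)
ids = [p.get(XML_ID) for p in root.iter("p")]

strip_ids(root)
output = etree.tostring(root, method="html", encoding="unicode")
print(output)
```

The IDs only stay meaningful as long as you keep them on the elements: if the
tree is restructured and you recompute them, getpath() gives different paths
and therefore different hashes.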

Stefan

