[lxml-dev] Ignoring unknown namespaces in XML while validating
micxer
micxer at micxer.de
Mon Jul 16 22:03:16 CEST 2007
Hi,
Stefan Behnel wrote:
> Hi,
>
> first of all: please don't reply to a post from a different thread when you
> want to start a new one. Mail readers will sort the e-mail into the wrong
> thread, which confuses people.
>
Sorry about that. I thought I removed everything from the old post but I
forgot about the headers. And sorry for the late reply. I just found
your message in the Junk folder.
>
> micxer wrote:
>> I'm using lxml primarily for validation of XML documents and requests of
>> UPnP devices. Since many vendors are going to make their devices DLNA
>> compliant, some additional XML elements appear in the XML docs. I would
>> have to pay for the DLNA specs, so I have no choice but to delete
>> these elements in advance and validate the XML afterwards. Is there an
>> easy way to do this with lxml? Am I missing something?
>
> Not sure what your problem is exactly. Are these "additional elements" in a
> specific namespace? That would make it easy to remove them:
>
> # list() takes a snapshot, so removing elements cannot disturb the iteration
> for el in list(root.getiterator("{http://the/namespace}*")):
>     parent = el.getparent()
>     if parent is not None:  # not the root element
>         parent.remove(el)
>
> Or are they in namespaces other than the main one?
>
> MAIN_NS = "{http://the/namespace}"
> # list() takes a snapshot, so removing elements cannot disturb the iteration
> for el in list(root.getiterator("*")):
>     if not el.tag.startswith(MAIN_NS):
>         parent = el.getparent()
>         if parent is not None:  # not the root element
>             parent.remove(el)
>
> Similarly, if you have a set of tag names that must be kept or removed, you
> can iterate over all elements and check the tag names against the set.
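
The tag-set variant mentioned above could look roughly like the sketch below. The namespace `urn:example:main`, the tag names in `KEEP`, and the helper `strip_unlisted` are all hypothetical and exist only for illustration. `iter()` is used here as the newer spelling of `getiterator()`.

```python
from lxml import etree

# Hypothetical namespace and tag names, chosen only for illustration.
NS = "{urn:example:main}"
KEEP = {NS + "root", NS + "item", NS + "title"}


def strip_unlisted(root, keep):
    """Remove every element whose tag is not in `keep`; the root is never removed."""
    # Snapshot the iterator first so that removals do not disturb the traversal.
    for el in list(root.iter("*")):
        parent = el.getparent()
        if parent is not None and el.tag not in keep:
            parent.remove(el)
    return root


doc = etree.fromstring(
    '<root xmlns="urn:example:main">'
    '<item><title>A</title><vendorExtra>x</vendorExtra></item>'
    '</root>'
)
strip_unlisted(doc, KEEP)

# Local names of the elements that survived the filter.
remaining = [etree.QName(el).localname for el in doc.iter("*")]
```

Note that removing an element in lxml also removes its tail text, so mixed-content documents may need extra care.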
>
That's exactly the problem I have. I had already thought about this manual
approach, but I assumed there must be an easier way, such as telling
the parser to ignore any unknown tag or any tag that's not listed in the
schema.
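
For the archive, here is a minimal sketch of the "strip first, validate afterwards" workflow discussed in this thread. It assumes validation is done with `etree.XMLSchema`. The namespaces, the tiny schema, and the helper `validate_ignoring_foreign` are made up for illustration and are not taken from the UPnP or DLNA specs.

```python
from lxml import etree

MAIN_NS = "urn:example:main"  # hypothetical main namespace

# A deliberately tiny schema: <device> contains exactly one <name>.
XSD = f"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="{MAIN_NS}" elementFormDefault="qualified">
  <xs:element name="device">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
schema = etree.XMLSchema(etree.fromstring(XSD))


def validate_ignoring_foreign(root, schema, main_ns):
    """Drop elements outside `main_ns` (in place), then validate what is left."""
    prefix = "{%s}" % main_ns
    # Snapshot the iterator first so that removals do not disturb the traversal.
    for el in list(root.iter("*")):
        parent = el.getparent()
        if parent is not None and not el.tag.startswith(prefix):
            parent.remove(el)
    return schema.validate(root)


doc = etree.fromstring(
    f'<device xmlns="{MAIN_NS}" xmlns:v="urn:example:vendor">'
    '<name>Media Server</name><v:extra>1</v:extra>'
    '</device>'
)
valid_before = schema.validate(doc)  # the vendor element is still present
valid_after = validate_ignoring_foreign(doc, schema, MAIN_NS)
```

The helper modifies the tree in place, so work on a copy (for example via `copy.deepcopy`) if the original document is still needed.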
>
> Does that solve your problem?
>
Absolutely, thanks :-)
>
> Stefan
Michael