[lxml-dev] Ignoring unknown namespaces in XML while validating

micxer micxer at micxer.de
Mon Jul 16 22:03:16 CEST 2007


Hi,

Stefan Behnel wrote:
> Hi,
> 
> first of all: please don't reply to a post from a different thread when you
> want to start a new one. Mail readers will sort the e-mail into the wrong
> thread, which confuses people.
> 
Sorry about that. I thought I had removed everything from the old post,
but I forgot about the headers. And sorry for the late reply; I just
found your message in the Junk folder.
>
> micxer wrote:
>> I'm using lxml primarily for validation of XML documents and requests of 
>> UPnP devices. Since many vendors are going to make their devices DLNA 
>> compliant, some additional XML elements appear in the XML docs. I would 
>> have to pay for the DLNA specs, so I have no other choice than to delete
>> these elements in advance and validate the XML afterwards. Is there an
>> easy way to do this with lxml? Am I missing something?
> 
> Not sure what your problem is exactly. Are these "additional elements" in a
> specific namespace? That would make it easy to remove them:
> 
>   # Snapshot the matches first, since the loop modifies the tree.
>   for el in list(root.iter("{http://the/namespace}*")):
>       parent = el.getparent()
>       if parent is not None: # not the root element
>           parent.remove(el)
> 
> Or are they in other namespaces than the main one?
> 
>   MAIN_NS = "{http://the/namespace}"
>   # Snapshot all elements first, since the loop modifies the tree.
>   for el in list(root.iter("*")):
>       if not el.tag.startswith(MAIN_NS):
>           parent = el.getparent()
>           if parent is not None: # not the root element
>               parent.remove(el)
> 
> Similarly, if you have a set of tag names that must be kept or removed, you
> can iterate over all elements and check the tag names against the set.
> 
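The tag-name check suggested above could be sketched roughly like this. The
names KEEP and keep_only, and the urn:example:* namespaces, are made up for
illustration; the set would hold whatever fully qualified tags you want to
retain.

```python
from lxml import etree

# Hypothetical whitelist of fully qualified tag names to keep.
KEEP = {
    "{urn:example:main}device",
    "{urn:example:main}friendlyName",
}

def keep_only(root, allowed):
    """Remove every element whose tag is not in `allowed` (subtree included)."""
    # Copy the iterator into a list so removals do not disturb the walk.
    for el in list(root.iter("*")):
        if el.tag in allowed:
            continue
        parent = el.getparent()
        if parent is not None:  # never remove the root element
            parent.remove(el)
    return root

root = etree.fromstring(
    b'<device xmlns="urn:example:main" xmlns:x="urn:example:vendor">'
    b'<friendlyName>n</friendlyName><x:extra/><unknownTag/></device>'
)
keep_only(root, KEEP)
remaining = [child.tag for child in root]
```
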
That's exactly the problem I have. I had already thought about this
manual approach, but I assumed there must be an easier way, such as
telling the parser to ignore any unknown tag or any tag that's not
listed in the schema.
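
For the record, the whole strip-then-validate workflow might look roughly
like the sketch below. It does not make the parser ignore anything; it just
removes foreign elements after parsing and validates what is left. The
namespace urn:example:main, the tiny inline schema, the sample document and
the helper name strip_foreign_elements are all placeholders, not taken from
the UPnP or DLNA specs.

```python
from lxml import etree

MAIN_NS = "urn:example:main"  # placeholder for the namespace the schema covers

# Minimal stand-in schema: a <device> holding exactly one <friendlyName>.
SCHEMA = etree.XMLSchema(etree.fromstring(b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:main"
           elementFormDefault="qualified">
  <xs:element name="device">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="friendlyName" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""))

def strip_foreign_elements(root, main_ns=MAIN_NS):
    """Remove every element outside main_ns, together with its subtree."""
    prefix = "{%s}" % main_ns
    # Copy the iterator into a list so removals do not disturb the walk.
    for el in list(root.iter("*")):
        if el.tag.startswith(prefix):
            continue
        parent = el.getparent()
        if parent is not None:  # never remove the root element
            parent.remove(el)
    return root

DOC = b"""<device xmlns="urn:example:main" xmlns:x="urn:example:vendor">
  <friendlyName>Media Server</friendlyName>
  <x:extra>vendor extension</x:extra>
</device>"""

root = etree.fromstring(DOC)
valid_before = SCHEMA.validate(root)   # vendor element still present
strip_foreign_elements(root)
valid_after = SCHEMA.validate(root)    # only main-namespace elements remain
```

Note that elements without any namespace are removed as well, and that
parent.remove() also discards the tail text of the removed element, which is
usually just whitespace in this kind of document.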
>
> Does that solve your problem?
> 
Absolutely, thanks :-)
>
> Stefan

Michael

