[lxml-dev] Ignoring unknown namespaces in XML while validating

Stefan Behnel stefan_ml at behnel.de
Wed Jul 11 14:30:17 CEST 2007


Hi,

First of all: please don't reply to a post from a different thread when you
want to start a new one. Mail readers will sort your e-mail into the wrong
thread, which confuses people.

micxer wrote:
> I'm using lxml primarily for validation of XML documents and requests of 
> UPnP devices. Since many vendors are going to make their devices DLNA 
> compliant, some additional XML elements appear in the XML docs. I would 
> have to pay for the DLNA specs so I have no other choice than deleting 
> these elements in advance and validate the XML afterwards. Is there an 
> easy way to do this with lxml? Am I missing something?

Not sure what your problem is exactly. Are these "additional elements" in a
specific namespace? That would make it easy to remove them:

  # list() takes a snapshot so that the tree isn't modified during iteration
  for el in list(root.iter("{http://the/namespace}*")):
      parent = el.getparent()
      if parent is not None: # not the root element
          parent.remove(el)

Or are they in namespaces other than the main one?

  MAIN_NS = "{http://the/namespace}"
  # list() takes a snapshot so that the tree isn't modified during iteration
  for el in list(root.iter("*")):
      if not el.tag.startswith(MAIN_NS):
          parent = el.getparent()
          if parent is not None: # not the root element
              parent.remove(el)
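
To show how this fits together with validation, here is a minimal,
self-contained sketch. The namespaces "urn:example:main" and
"urn:example:vendor", the tiny schema and the sample document are all made up
for illustration; they only stand in for the real UPnP schema and device
description.

```python
from lxml import etree

# Hypothetical main namespace; replace it with the one your schema targets.
MAIN_NS = "{urn:example:main}"

# A deliberately tiny schema: <device> must contain exactly one <name>.
schema = etree.XMLSchema(etree.fromstring(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'targetNamespace="urn:example:main" elementFormDefault="qualified">'
    '<xs:element name="device">'
    '<xs:complexType><xs:sequence>'
    '<xs:element name="name" type="xs:string"/>'
    '</xs:sequence></xs:complexType>'
    '</xs:element>'
    '</xs:schema>'
))

# Sample document with one extra element from a (hypothetical) vendor namespace.
doc = etree.fromstring(
    '<device xmlns="urn:example:main" xmlns:x="urn:example:vendor">'
    '<name>tv</name><x:extra>1</x:extra>'
    '</device>'
)

valid_before = schema.validate(doc)

# Drop every non-root element that is not in the main namespace.
# list() takes a snapshot so the tree is not modified while iterating.
for el in list(doc.iter("*")):
    if not el.tag.startswith(MAIN_NS):
        parent = el.getparent()
        if parent is not None:  # not the root element
            parent.remove(el)

valid_after = schema.validate(doc)
```

With this sample input, validation fails before the foreign element is
removed and succeeds afterwards.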

Similarly, if you have a set of tag names that must be kept or removed, you
can iterate over all elements and check the tag names against the set.
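
A rough sketch of the tag-set variant, assuming you know the fully qualified
names of the elements you want to drop. The namespace
"http://example.com/vendor", the tag names and the helper remove_tags() are
invented for this example.

```python
from lxml import etree

# Hypothetical vendor namespace and tag names, for illustration only.
VENDOR_NS = "{http://example.com/vendor}"
REMOVE_TAGS = {VENDOR_NS + "extra", VENDOR_NS + "hint"}


def remove_tags(root, tags):
    """Remove every non-root element whose tag is in `tags`; return the count."""
    removed = 0
    # list() takes a snapshot so the tree is not modified while iterating.
    for el in list(root.iter("*")):
        if el.tag in tags:
            parent = el.getparent()
            if parent is not None:  # never remove the root element
                parent.remove(el)
                removed += 1
    return removed


doc = etree.fromstring(
    '<root xmlns:v="http://example.com/vendor">'
    '<keep>a</keep><v:extra>b</v:extra>'
    '<keep><v:hint/>c</keep>'
    '</root>'
)
count = remove_tags(doc, REMOVE_TAGS)
remaining = [el.tag for el in doc.iter("*")]
```

To keep a set of tags instead, invert the test to "el.tag not in tags". Keep
in mind that remove() takes the element's tail text with it, so the "c" that
follows <v:hint/> in the sample is dropped as well.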

Does that solve your problem?

Stefan

