[lxml-dev] parsing and serializing XML fragments
Stefan Behnel
stefan_ml at behnel.de
Thu Jun 25 09:14:19 CEST 2009
Hi,
Hervé Cauwelier wrote:
> Hi, I'm trying to load fragments of XML to inject them in an existing
> document tree.
>
> They look like this:
>
> <table:table table:name="%s" table:style-name="%s"/>
Just curious: why do you create a document in which you can do string
replacements?
> Converting the fragment to the "{uri}name" syntax is not an option since
> I must remain agnostic to the XML parser.
That can't be parsed by lxml's parser either.
> I would expect the XML() function to take an "nsmap" argument, like the
> xpath() method on elements, or parts of the API for subclassing elements.
No, it's a namespace-aware XML parser, so it will reject documents that are
not namespace well-formed.
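A minimal sketch of that behaviour, assuming a placeholder namespace URI ("urn:example:table") and made-up attribute values in place of the "%s" slots. The parses() helper is hypothetical, and the import falls back to the standard library's ElementTree, which also rejects unbound prefixes:

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: without lxml, the stdlib parser rejects unbound prefixes too.
    import xml.etree.ElementTree as etree

# Fragment from the question, with the "%s" slots filled with sample values.
fragment = '<table:table table:name="t1" table:style-name="s1"/>'


def parses(text):
    """Return True if the parser accepts `text`, False on a syntax error."""
    try:
        etree.fromstring(text)
    except SyntaxError:
        # lxml's XMLSyntaxError and ElementTree's ParseError both derive
        # from the built-in SyntaxError.
        return False
    return True


# The bare fragment uses the "table" prefix without declaring it.
bare_ok = parses(fragment)

# The same fragment with the prefix declared (placeholder URI).
declared = fragment.replace(
    "<table:table", '<table:table xmlns:table="urn:example:table"', 1
)
declared_ok = parses(declared)
```

Under these assumptions, only the variant that declares the prefix is accepted.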
> For now I have another template, a complete document with namespace
> declaration, and I inject my fragment using string formatting. Lxml will
> parse it and I extract the first child element.
You can use the feed parser instead and do:

    from lxml import etree

    parser = etree.XMLParser(...)
    parser.feed('<?xml ...?><root xmlns:...>')  # wrapper root declaring the prefixes
    parser.feed(the_fragment)
    parser.feed('</root>')
    fragment = parser.close()[0]                # first child of the wrapper root
Feed parsers are reusable after a call to close(), BTW.
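As a filled-in sketch of the feed-parser approach: the namespace URI and the parse_fragment() helper below are assumptions made for illustration, not something from the original thread. The XML declaration is left out, and the import falls back to the standard library's ElementTree, whose XMLParser offers the same feed()/close() calls:

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib XMLParser has the same feed()/close() interface.
    import xml.etree.ElementTree as etree

TABLE_NS = "urn:example:table"  # placeholder URI, not taken from the thread


def parse_fragment(fragment):
    """Parse a prefixed fragment by feeding it inside a declaring wrapper root."""
    parser = etree.XMLParser()
    # Open a wrapper element that declares the prefix used by the fragment.
    parser.feed('<root xmlns:table="%s">' % TABLE_NS)
    parser.feed(fragment)
    parser.feed("</root>")
    root = parser.close()  # close() returns the wrapper root element
    return root[0]         # the fragment is its first child


table = parse_fragment('<table:table table:name="t1" table:style-name="s1"/>')
```

The returned element carries fully qualified names, so attributes are looked up as table.get("{urn:example:table}name").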
> I have looked at custom elements and other resolving methods but lxml
> was raising a namespace error before my "print"'s show up.
Element proxies are created /after/ parsing, so this won't help.
> Another issue is to save the element back to its snippet form, for unit
> test validation. Lxml will produce a valid document with namespace
> declaration. Either how to serialize without namespace declaration or
> how to remove it while keeping prefixes?
I put a lot of work into preventing lxml from serialising broken documents,
sorry. I also doubt that lxml's doctest support can help you here, as it
also requires parsing.
But you can insert your element into a new root Element that defines all
used namespaces (either fixed or collected at runtime), serialise that, and
strip the root element's start and end tags from the serialised byte string.
I do a bit of string mangling in lxml's own doctests to make them work in
both Py2 and Py3, so it's not that hard to add these things. You can write a
little wrapper class around lxml.etree and override tostring() and parse()
to fit your needs (kudos to Fredrik for making them functions, BTW).
Here's an example:
http://codespeak.net/svn/lxml/trunk/doc/api.txt
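The wrap, serialise and strip idea described above could look roughly like this. It is only a sketch: NSMAP uses a placeholder URI, and parse_fragment() and fragment_tostring() are hypothetical helper names. It needs lxml itself, since it relies on the nsmap argument of etree.Element():

```python
from lxml import etree

# Placeholder prefix-to-URI mapping; the real URIs are not given in the thread.
NSMAP = {"table": "urn:example:table"}


def parse_fragment(fragment, nsmap=NSMAP):
    """Parse a prefixed fragment inside a temporary root that declares nsmap."""
    decls = " ".join('xmlns:%s="%s"' % item for item in nsmap.items())
    parser = etree.XMLParser()
    parser.feed("<root %s>" % decls)
    parser.feed(fragment)
    parser.feed("</root>")
    return parser.close()[0]


def fragment_tostring(element, nsmap=NSMAP):
    """Serialise element without namespace declarations, keeping its prefixes."""
    # The temporary root carries the declarations so the child does not have to.
    root = etree.Element("root", nsmap=nsmap)
    root.append(element)  # note: this moves the element out of its old parent
    data = etree.tostring(root)
    # Strip the root's start tag and end tag from the serialised bytes.
    start = data.index(b">") + 1
    end = data.rindex(b"</root>")
    return data[start:end]


snippet = '<table:table table:name="t1" table:style-name="s1"/>'
result = fragment_tostring(parse_fragment(snippet))
```

With the sample snippet this should round-trip to the same bytes. The stripping assumes the root start tag contains no ">" inside an attribute value, which holds for ordinary namespace URIs.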
Stefan