[lxml-dev] generative building of xml?
kris
kris at cs.ucsb.edu
Thu May 8 20:03:46 CEST 2008
On Thu, 2008-05-08 at 09:22 +0200, Stefan Behnel wrote:
> Hi,
> Probably, although lxml is not designed for pipelined XML processing (any
> better than SAX, that is).
>
> It also depends on what your XML looks like. If it's from a database, it's
> probably something simple like
>
> <root>
> <row>
> <column>...</column>
> ...
> </row>
> ...
> </root>
>
> That shouldn't cause too many problems; you can use the (SAX-like) target
> parser to copy it into a simple Python container class, use that inside your
> program, merge all of those objects into a single stream at some point and
> then generate a new XML stream from that.
>
>
> > Here's the setup. I've got several databases
> > generating XML content (which can be quite large). I really want
> > to be able to process the database records progressively,
> > generating XML and sending it out on its own stream.
> >
> > An aggregator/filter (elsewhere) will read the streams,
> > parse them, process similar members, and generate
> > a new stream based on the combined streams.
> >
> > DB1 DB2 DB3 Core database
> > XML XML XML XML generation
> > WS WS WS delivery over a stream using generator
>
> A generator? Interesting. Why not just a file-like object?
I was thinking of a generator because I am feeding this
to a stream that works with/on generators.
The databases return the results of top-k queries as XML files. Each DB keeps
generating its best hits as a stream; the aggregator sorts them and
sends them to the client. I would like to propagate the
query all the way to the component databases, using
generators to minimize the work each one does.
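A rough sketch of the aggregation step, assuming each database yields
(score, name) pairs already sorted best-first. The names db_stream and
top_k are made up for illustration, and the key/reverse arguments of
heapq.merge need Python 3.5 or later. Because the merge is lazy, each
source is pulled only as far as the client actually consumes:

```python
import heapq
from itertools import islice

def db_stream(name, scores, log):
    """Hypothetical stand-in for one database: yields hits best-first."""
    for score in scores:
        log.append((name, score))  # record how far this source was pulled
        yield (score, name)

def top_k(streams, k):
    """Lazily merge best-first streams and return an iterator over the k best hits."""
    merged = heapq.merge(*streams, key=lambda hit: hit[0], reverse=True)
    return islice(merged, k)

log = []
streams = [
    db_stream("DB1", [0.9, 0.5, 0.1], log),
    db_stream("DB2", [0.8, 0.7, 0.2], log),
    db_stream("DB3", [0.6, 0.4, 0.3], log),
]
best = list(top_k(streams, 3))
```

Here log ends up shorter than the nine hits available, since the
generators are never asked for results nobody reads.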
> If the interface is a generator (yielding strings, I assume), then you will
> have to use the feed parser interface to copy the data into the parser,
> otherwise, you can just use one thread per DB connection and have it read and
> parse the data for you.
>
>
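A minimal sketch of that combination, assuming the simple
root/row/column layout from above and a generator that yields string
chunks. RowCollector, parse_chunks and chunks are illustrative names,
not part of lxml. The import falls back to the standard library, whose
XMLParser offers the same feed()/close() and target interface:

```python
try:
    from lxml import etree
except ImportError:  # stdlib parser has the same feed/target interface
    import xml.etree.ElementTree as etree

class RowCollector:
    """Parser target that copies each <row> into a plain list of strings."""
    def __init__(self):
        self.rows = []
        self._row = None
        self._text = None

    def start(self, tag, attrib):
        if tag == "row":
            self._row = []
        elif tag == "column":
            self._text = []

    def data(self, text):
        # Text may arrive in several pieces, so collect and join later.
        if self._text is not None:
            self._text.append(text)

    def end(self, tag):
        if tag == "column":
            self._row.append("".join(self._text))
            self._text = None
        elif tag == "row":
            self.rows.append(self._row)
            self._row = None

    def close(self):
        return self.rows

def parse_chunks(chunks):
    """Feed string chunks from a generator into a target parser."""
    parser = etree.XMLParser(target=RowCollector())
    for chunk in chunks:
        parser.feed(chunk)
    return parser.close()  # returns whatever the target's close() returns

def chunks():
    # Chunk boundaries deliberately fall in the middle of a tag.
    yield "<root><row><column>a</col"
    yield "umn><column>b</column></row>"
    yield "<row><column>c</column></row></root>"

rows = parse_chunks(chunks())
```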
> > 2. Given the above generator, is there any such
> > thing as a generator version etree.tostring?
>
> Nothing keeps you from yielding "<root>", followed by the serialised stream
> entries (call tostring() on each separately), followed by a "</root>".
Unfortunately it is a tree structure. I would like to visit the tree
in something like this:
yield '<root>'
yield ' <child attr0="a" attr1="b">'
yield ' <child ... '
...
yield ' </child>'
yield ' </child>'
yield ' <child attr0="c" attr1="d">'
...
yield '</root>'
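One way this might be done is a small recursive generator over an
already-built tree. This is only a sketch: iter_serialize is a made-up
helper, not an lxml function, and it assumes a tree of plain elements
(no comments, processing instructions or namespaces), ignores tail
text, and adds indentation of its own:

```python
from xml.sax.saxutils import escape, quoteattr

try:
    from lxml import etree
except ImportError:  # stdlib elements expose the same tag/attrib/text API
    import xml.etree.ElementTree as etree

def iter_serialize(elem, depth=0):
    """Yield the serialisation of elem piece by piece, depth first."""
    indent = "  " * depth
    attrs = "".join(
        " %s=%s" % (name, quoteattr(value))
        for name, value in elem.attrib.items()
    )
    yield "%s<%s%s>" % (indent, elem.tag, attrs)
    if elem.text and elem.text.strip():
        yield indent + "  " + escape(elem.text.strip())
    for child in elem:
        # Delegate to the child so nothing is buffered at this level.
        yield from iter_serialize(child, depth + 1)
    yield "%s</%s>" % (indent, elem.tag)

root = etree.fromstring(
    '<root><child attr0="a" attr1="b"><child/></child>'
    '<child attr0="c" attr1="d"/></root>'
)
pieces = list(iter_serialize(root))
```

Each yielded piece is one opening tag, text run or closing tag, so the
consumer can write them to the stream as they arrive.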
> Stefan
--
Kristian Kvilekval
kris at cs.ucsb.edu http://www.cs.ucsb.edu/~kris w:805-636-1599 h:504-9756