[lxml-dev] generative building of xml?

kris kris at cs.ucsb.edu
Thu May 8 20:03:46 CEST 2008


On Thu, 2008-05-08 at 09:22 +0200, Stefan Behnel wrote:
> Hi,

> Probably, although lxml is not designed for pipelined XML processing (any
> better than SAX, that is).
> 
> It also depends on what your XML looks like. If it's from a database, it's
> probably something simple like
> 
>   <root>
>     <row>
>       <column>...</column>
>       ...
>     </row>
>     ...
>   </root>
> 
> That shouldn't cause too many problems, you can use the (SAX-like) target
> parser to copy it into a simple Python container class, use that inside your
> program, merge all of those objects into a single stream at some point and
> then generate a new XML stream from that.
> 
> 
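
A minimal sketch of the target-parser idea described above, assuming the
<root>/<row>/<column> layout shown. It uses the standard library's
xml.etree.ElementTree so it runs without extra dependencies; lxml.etree
accepts a parser target with the same start/end/data/close methods. The
names Row, RowCollector and parse_rows are hypothetical, made up for
illustration.

```python
import xml.etree.ElementTree as etree


class Row:
    """Hypothetical plain container for one <row> element."""

    def __init__(self):
        self.columns = []


class RowCollector:
    """Parser target: receives SAX-like events and builds Row objects."""

    def __init__(self):
        self.rows = []
        self._row = None    # Row currently being filled, if any
        self._text = None   # text pieces of the current <column>, if any

    def start(self, tag, attrib):
        if tag == "row":
            self._row = Row()
        elif tag == "column":
            self._text = []

    def data(self, data):
        # Text may arrive in several pieces, so collect and join later.
        if self._text is not None:
            self._text.append(data)

    def end(self, tag):
        if tag == "column":
            self._row.columns.append("".join(self._text))
            self._text = None
        elif tag == "row":
            self.rows.append(self._row)
            self._row = None

    def close(self):
        # Whatever close() returns here is what parser.close() returns.
        return self.rows


def parse_rows(xml_text):
    parser = etree.XMLParser(target=RowCollector())
    parser.feed(xml_text)
    return parser.close()
```

No element tree is built here; only the Row objects are kept, which is
the point when the documents are large.
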
> > Here's the setup.  I've got several databases generating XML content
> > (which can be quite large). I really want to process the database
> > records progressively, generating XML and sending it out on its own
> > stream.
> > 
> > An aggregator/filter (elsewhere) will read the streams, parse them,
> > process similar members, and generate a new stream based on the
> > combined streams.
> > 
> > DB1    DB2   DB3   Core database
> > XML    XML   XML   XML generation
> >  WS     WS   WS     delivery over a stream using generator
> 
> A generator? Interesting. Why not just a file-like object?

I was thinking of a generator because I am feeding this
to a stream that works with/on generators.
The databases return the results of top-k queries as XML files. Each DB
keeps generating its best hits as a stream; the aggregator sorts them and
sends them to the client. I would like to propagate the
query all the way to the component databases using
generators, to minimize the work each one does.



> If the interface is a generator (yielding strings, I assume), then you will
> have to use the feed parser interface to copy the data into the parser.
> Otherwise, you can just use one thread per DB connection and have it read
> and parse the data for you.
> 
> 
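
A rough sketch of feeding a generator of strings into a parser chunk by
chunk and handing out each row as soon as it is complete. This uses the
standard library's XMLPullParser, which is fed through feed() in the same
way as the feed parser interface; that choice, the <row>/<column> layout
and the name iter_rows are my assumptions, not necessarily what was meant
above.

```python
import xml.etree.ElementTree as etree


def iter_rows(chunks):
    """Yield the column texts of each <row> as soon as it has been parsed.

    `chunks` is any iterable (for example a generator) of XML text pieces;
    the pieces may split the document at arbitrary positions.
    """
    parser = etree.XMLPullParser(events=("end",))

    def drain():
        # Hand out every row that has been completed so far.
        for _event, elem in parser.read_events():
            if elem.tag == "row":
                yield [col.text for col in elem]
                elem.clear()  # drop the children so memory use stays flat

    for chunk in chunks:
        parser.feed(chunk)
        yield from drain()

    parser.close()
    yield from drain()  # pick up anything the parser held back until close()
```

Usage would be something like `for columns in iter_rows(db_stream): ...`,
where db_stream stands for whatever generator a database connection
provides.
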
> > 2.  Given the above generator, is there any such
> >     thing as a generator version of etree.tostring?
> 
> Nothing keeps you from yielding "<root>", followed by the serialised stream
> entries (call tostring() on each separately), followed by a "</root>".

Unfortunately it is a tree structure. I would like to walk the tree
with something like:

yield "<root>"
yield '  <child attr0="a" attr1="b" >  '
yield '      <child ... '
...
yield '      </child>'
yield '  </child>'
yield '  <child attr0="c" attr1="d" >  '
...
yield '</root>'
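
To make that concrete, here is a minimal sketch of such a generator that
walks an already-built element tree recursively. It is not an existing
etree function; iter_serialize is a hypothetical name. It assumes plain
elements only: namespaces, comments, processing instructions and tail
text are not handled. It uses the standard library's ElementTree, and
only calls that lxml.etree elements also provide (items(), .tag, .text,
len(), iteration).

```python
import xml.etree.ElementTree as etree
from xml.sax.saxutils import escape, quoteattr


def iter_serialize(elem, indent=""):
    """Yield the serialised pieces of `elem` one line at a time.

    Assumptions (hypothetical helper): no namespaces, comments or
    processing instructions; tail text is ignored.
    """
    attrs = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in elem.items())
    # Whitespace-only text is treated as formatting and dropped.
    text = escape(elem.text) if elem.text and elem.text.strip() else ""

    if len(elem) == 0:
        # Leaf element: emit it in one piece.
        if text:
            yield "%s<%s%s>%s</%s>" % (indent, elem.tag, attrs, text, elem.tag)
        else:
            yield "%s<%s%s/>" % (indent, elem.tag, attrs)
        return

    yield "%s<%s%s>%s" % (indent, elem.tag, attrs, text)
    for child in elem:
        # Recurse lazily: nothing below is serialised until it is requested.
        yield from iter_serialize(child, indent + "  ")
    yield "%s</%s>" % (indent, elem.tag)
```

A consumer can stop iterating after the first few pieces and the rest of
the tree is never serialised, which is what I am after for the top-k case.
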


> Stefan
-- 
Kristian Kvilekval
kris at cs.ucsb.edu  http://www.cs.ucsb.edu/~kris w:805-636-1599 h:504-9756


