[lxml-dev] Beginner question
Andreas Tille
tillea at rki.de
Thu Oct 11 09:56:24 CEST 2007
Hi,
I'm sorry to start with a beginner question. Yesterday I stumbled over
lxml and I think it is a really great tool that is exactly what I have always
wanted, but I'm afraid I need a kick start. I am trying to parse some XML files
that are used as a transport medium between different databases. We use a
self-defined XSD schema. The XML file looks like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<envelope xmlns="http://www3.rki.de/ns/agi/ibs/2007/T06/report" xmlns:ct="http://www3.rki.de/ns/rki/base/ct/2007/T03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www3.rki.de/ns/agi/ibs/2007/T06/report http://www3.rki.de/ns/agi/ibs/2007/T06/RKI_IBS_SVD2RKI.xsd">
  <preamble>
    <tracking sequence="1" timestamp="2007-10-10T02:10:11">
      <source idSource="NRZ Berlin">
        <ct:software name="psql2xml" version="0.1"/>
      </source>
      <target idTarget="RKI"/>
    </tracking>
  </preamble>
  <content>
    <virologics elements="9">
      <virologic notifyWeek="2" period="2001/2002">
        <properties elements="2">
          <ct:property name="ageYear" value="21"/>
          <ct:property name="ageMonth" value="o"/>
        </properties>
        <sender idSender="603" site="15362" class="o"/>
        <patient birthYear="1980" birthMonth="o" sex="w"/>
        <illness from="2002-01-04">
          <vaccination>
            <status response="n"/>
          </vaccination>
          <therapy>
            <status response="x"/>
          </therapy>
          <contact response="n"/>
        </illness>
        <symptoms>
          <acute response="y"/>
          <fever>
            <status response="y"/>
          </fever>
          <cough response="n"/>
          <pain response="n"/>
        </symptoms>
        <complications>
          <pneumonia response="n"/>
          <hospitalisation response="x"/>
        </complications>
        <report idLab="02-0750">
          <material from="2002-01-05" receipt="2002-01-09" name="R"/>
          <results date="2002-01-09">
            <result pathogen="InvA" value="o" interpretation="ni" method="PCR"/>
            <result pathogen="InvB" value="o" interpretation="ni" method="PCR"/>
          </results>
        </report>
      </virologic>
      <virologic notifyWeek="2" period="2001/2002">
        ...
With this code, which I adapted from the tutorial,
    from lxml import etree

    # events must be a tuple; ("start") without the comma is just a string
    for event, elem in etree.iterparse(infile, events=("start",)):
        if event == "start":
            print "start:", etree.tostring(elem, pretty_print=True)
            print "--->", elem.tag
I got something like:
...
start: <source idSource="NRZ Berlin">
<ct:software name="psql2xml" version="0.1"/>
</source>
---> {http://www3.rki.de/ns/agi/ibs/2007/T06/report}source
start: <ct:software name="psql2xml" version="0.1"/>
---> {http://www3.rki.de/ns/rki/base/ct/2007/T03}software
start: <target idTarget="RKI"/>
---> {http://www3.rki.de/ns/agi/ibs/2007/T06/report}target
...
So on the one hand I get the elements as a whole, including their
children, but I have no idea how to finally access attribute values
like idSource="NRZ Berlin", nor do I have an idea how to get rid of
the default namespace that is prepended to the tags. I would rather
access the tag called "source" (without the default namespace) or
"ct:software" using the namespace prefix.
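
To make the goal concrete, here is a minimal sketch of the kind of access
meant above. It uses only calls shared by lxml.etree and the standard
library's ElementTree (iterparse, elem.tag, elem.attrib). The cut-down
SAMPLE document, the PREFIXES table, and the helpers short_name and collect
are made up for illustration and are not part of lxml.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:  # the calls used below also exist in the standard library
    import xml.etree.ElementTree as etree

REPORT_NS = "http://www3.rki.de/ns/agi/ibs/2007/T06/report"
CT_NS = "http://www3.rki.de/ns/rki/base/ct/2007/T03"
# Hypothetical lookup table: namespace URI -> prefix to show ("" = no prefix).
PREFIXES = {REPORT_NS: "", CT_NS: "ct"}

# Cut-down sample in the same shape as the envelope above (illustrative only).
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<envelope xmlns="http://www3.rki.de/ns/agi/ibs/2007/T06/report"
          xmlns:ct="http://www3.rki.de/ns/rki/base/ct/2007/T03">
  <preamble>
    <tracking sequence="1" timestamp="2007-10-10T02:10:11">
      <source idSource="NRZ Berlin">
        <ct:software name="psql2xml" version="0.1"/>
      </source>
      <target idTarget="RKI"/>
    </tracking>
  </preamble>
</envelope>"""


def short_name(tag):
    """Turn '{uri}local' into 'local' or 'prefix:local'."""
    if not tag.startswith("{"):
        return tag
    uri, local = tag[1:].split("}", 1)
    prefix = PREFIXES.get(uri, "")
    return prefix + ":" + local if prefix else local


def collect(source):
    """Return (short tag name, attribute dict) for every element start."""
    found = []
    for event, elem in etree.iterparse(source, events=("start",)):
        # elem.attrib behaves like a dict; elem.get("idSource") reads one value.
        found.append((short_name(elem.tag), dict(elem.attrib)))
    return found


elements = collect(BytesIO(SAMPLE))
for name, attrs in elements:
    print(name, attrs)
```

The namespace declarations themselves are not reported as attributes, so
the attribute dict of the envelope element stays empty in this sample.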
I also found the very interesting objectify module at
http://codespeak.net/lxml/objectify.html
but I have no idea how to use it for parsing, because
the page seems to describe only creating objects (or did I miss something?).
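
For comparison, a minimal sketch of reading an existing document with
lxml.objectify rather than building one. It assumes that
objectify.fromstring (or objectify.parse for a file) is the right entry
point for this use case. The cut-down SAMPLE is illustrative only.

```python
from lxml import objectify

CT_NS = "http://www3.rki.de/ns/rki/base/ct/2007/T03"

# Cut-down sample in the same shape as the envelope above (illustrative only).
SAMPLE = b"""<envelope xmlns="http://www3.rki.de/ns/agi/ibs/2007/T06/report"
          xmlns:ct="http://www3.rki.de/ns/rki/base/ct/2007/T03">
  <preamble>
    <tracking sequence="1" timestamp="2007-10-10T02:10:11">
      <source idSource="NRZ Berlin">
        <ct:software name="psql2xml" version="0.1"/>
      </source>
      <target idTarget="RKI"/>
    </tracking>
  </preamble>
</envelope>"""

# For a file, objectify.parse(infile).getroot() would give the root instead.
root = objectify.fromstring(SAMPLE)

# Children in the same namespace as their parent are plain attributes.
tracking = root.preamble.tracking
id_source = tracking.source.get("idSource")
id_target = tracking.target.get("idTarget")

# A child in a different namespace needs its fully qualified name.
software = getattr(tracking.source, "{%s}software" % CT_NS)

print(id_source, id_target, software.get("name"), software.get("version"))
```

Attribute values read with get() come back as plain strings, so numeric
fields such as sequence would still need an explicit int() conversion.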
Sorry for my ignorance in case things should be obvious from reading
the docs.
Kind regards
Andreas.
--
http://fam-tille.de