[lxml-dev] Beginner question

Andreas Tille tillea at rki.de
Thu Oct 11 09:56:24 CEST 2007


Hi,

I'm sorry to start with a beginner question.  Yesterday I stumbled over
lxml and I think it is a really great tool that is exactly what I have always
wanted, but I'm afraid I need a kick start.  I am trying to parse some XML files
that are used as a transport medium between different databases.  We use a
self-defined XSD schema.  The XML file looks like this:


<?xml version="1.0" encoding="ISO-8859-1"?>
  <envelope xmlns="http://www3.rki.de/ns/agi/ibs/2007/T06/report" xmlns:ct="http://www3.rki.de/ns/rki/base/ct/2007/T03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www3.rki.de/ns/agi/ibs/2007/T06/report http://www3.rki.de/ns/agi/ibs/2007/T06/RKI_IBS_SVD2RKI.xsd">
    <preamble>
      <tracking sequence="1" timestamp="2007-10-10T02:10:11">
        <source idSource="NRZ Berlin">
          <ct:software name="psql2xml" version="0.1"/>
        </source>
        <target idTarget="RKI"/>
      </tracking>
    </preamble>
    <content>
     <virologics elements="9">
      <virologic notifyWeek="2" period="2001/2002">
        <properties elements="2">
          <ct:property name="ageYear" value="21"/>
          <ct:property name="ageMonth" value="o"/>
        </properties>
        <sender idSender="603" site="15362" class="o"/>
        <patient birthYear="1980" birthMonth="o" sex="w"/>
        <illness from="2002-01-04">
          <vaccination>
            <status response="n"/>
          </vaccination>
          <therapy>
            <status response="x"/>
          </therapy>
          <contact response="n"/>
        </illness>
        <symptoms>
          <acute response="y"/>
          <fever>
            <status response="y"/>
          </fever>
          <cough response="n"/>
          <pain response="n"/>
        </symptoms>
        <complications>
          <pneumonia response="n"/>
          <hospitalisation response="x"/>
        </complications>
        <report idLab="02-0750">
          <material from="2002-01-05" receipt="2002-01-09" name="R"/>
          <results date="2002-01-09">
            <result pathogen="InvA" value="o" interpretation="ni" method="PCR"/>
            <result pathogen="InvB" value="o" interpretation="ni" method="PCR"/>
          </results>
        </report>
      </virologic>
      <virologic notifyWeek="2" period="2001/2002">
        ...

With the code that I adopted from the tutorial

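from lxml import etree

# events must be a tuple; ("start") without the comma is just the string "start"
for event, elem in etree.iterparse(infile, events=("start",)):
    if event == "start":
        print "start:", etree.tostring(elem, pretty_print=True)
        print "--->", elem.tag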

I got something like:

...
start: <source idSource="NRZ Berlin">
          <ct:software name="psql2xml" version="0.1"/>
        </source>

---> {http://www3.rki.de/ns/agi/ibs/2007/T06/report}source
start: <ct:software name="psql2xml" version="0.1"/>

---> {http://www3.rki.de/ns/rki/base/ct/2007/T03}software
start <target idTarget="RKI"/>


---> {http://www3.rki.de/ns/agi/ibs/2007/T06/report}source
...

So I get the elements as a whole, including their children, but I have no
idea how to access attribute values such as idSource="NRZ Berlin", nor how
to get rid of the default namespace that is prepended to the tags.  I would
rather access the tag as "source" (without the default namespace) or as
"ct:software", using the namespace prefix.
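
A minimal sketch of the kind of access described above, assuming the
standard lxml element API (elem.get / elem.attrib for attributes,
etree.QName for splitting off the namespace, elem.prefix for the declared
prefix).  The names SAMPLE, short_name and collect are hypothetical helpers
introduced only for illustration, and SAMPLE is a shortened copy of the
preamble shown earlier.  It listens for "end" events so that each element is
complete when it is inspected.

```python
from io import BytesIO
from lxml import etree

# Shortened copy of the preamble from the example file (illustration only).
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<envelope xmlns="http://www3.rki.de/ns/agi/ibs/2007/T06/report"
          xmlns:ct="http://www3.rki.de/ns/rki/base/ct/2007/T03">
  <preamble>
    <tracking sequence="1" timestamp="2007-10-10T02:10:11">
      <source idSource="NRZ Berlin">
        <ct:software name="psql2xml" version="0.1"/>
      </source>
      <target idTarget="RKI"/>
    </tracking>
  </preamble>
</envelope>"""


def short_name(elem):
    """Return 'prefix:name' for prefixed elements, plain 'name' otherwise."""
    local = etree.QName(elem.tag).localname  # strips the '{namespace}' part
    if elem.prefix:  # None for elements in the default namespace
        return "%s:%s" % (elem.prefix, local)
    return local


def collect(xml_bytes):
    """Collect (short tag name, attribute dict) for every element."""
    rows = []
    for _event, elem in etree.iterparse(BytesIO(xml_bytes), events=("end",)):
        # elem.attrib behaves like a dict; elem.get("idSource") reads one value.
        rows.append((short_name(elem), dict(elem.attrib)))
    return rows


rows = collect(SAMPLE)
for name, attrs in rows:
    print(name, attrs)
```

A single attribute can be read directly with elem.get("idSource"), which
returns None if the attribute is absent.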

I also found the very interesting objectify module at
    http://codespeak.net/lxml/objectify.html
but I have no idea how to use it for parsing, because the page only seems to
describe creating objects (or did I miss something?).
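
For comparison, a hedged sketch of parsing with objectify rather than
creating objects, assuming objectify.fromstring (objectify.parse would be
the counterpart for files).  SAMPLE is again a shortened, hypothetical copy
of the preamble.  Child lookup by attribute name uses the namespace of the
parent, so an element from another namespace is fetched here through
getattr with the fully qualified tag.

```python
from lxml import objectify

CT_NS = "http://www3.rki.de/ns/rki/base/ct/2007/T03"

# Shortened copy of the preamble from the example file (illustration only).
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<envelope xmlns="http://www3.rki.de/ns/agi/ibs/2007/T06/report"
          xmlns:ct="http://www3.rki.de/ns/rki/base/ct/2007/T03">
  <preamble>
    <tracking sequence="1" timestamp="2007-10-10T02:10:11">
      <source idSource="NRZ Berlin">
        <ct:software name="psql2xml" version="0.1"/>
      </source>
      <target idTarget="RKI"/>
    </tracking>
  </preamble>
</envelope>"""

root = objectify.fromstring(SAMPLE)

# Children in the parent's (default) namespace are reachable as attributes.
source = root.preamble.tracking.source
id_source = source.get("idSource")

# Children in a different namespace need the qualified '{namespace}tag' name.
software = getattr(source, "{%s}software" % CT_NS)

print(id_source, software.get("name"), software.get("version"))
```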

Sorry for my ignorance in case things should be obvious from reading
the docs.

Kind regards

           Andreas.

-- 
http://fam-tille.de

