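Dear Stefan,<br><br>I did read your other post, but passing the file name directly to the parser didn't work for me. Here is what I tried:<br><br>from lxml import etree<br>
outfile = open("output_test001.txt", "w")<br>
class EchoTarget():<br>
&nbsp;&nbsp;&nbsp;&nbsp;def start(self, tag, attrib):<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if tag.endswith("xternalPage"):<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;line = attrib["about"]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if line != "":<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;outfile.write(line + "\n")<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print(line)<br>
&nbsp;&nbsp;&nbsp;&nbsp;def close(self):<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return "closed!"<br>
parser = etree.XMLParser(target=EchoTarget())<br>
result = etree.XML("content.example.xml", parser)<br><br>This gives the following error:<br>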
<br>Traceback (most recent call last):<br>
&nbsp;&nbsp;File "extract_links_dmoz005.py", line 15, in &lt;module&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;result = etree.XML("content.example.xml", parser)<br>
&nbsp;&nbsp;File "lxml.etree.pyx", line 2358, in lxml.etree.XML<br>
&nbsp;&nbsp;File "parser.pxi", line 1354, in lxml.etree._parseMemoryDocument<br>
&nbsp;&nbsp;File "parser.pxi", line 1243, in lxml.etree._parseDoc<br>
&nbsp;&nbsp;File "parser.pxi", line 795, in lxml.etree._BaseParser._parseDoc<br>
&nbsp;&nbsp;File "parsertarget.pxi", line 130, in lxml.etree._TargetParserContext._handleParseResultDoc<br>
&nbsp;&nbsp;File "parser.pxi", line 478, in lxml.etree._raiseParseError<br>
lxml.etree.XMLSyntaxError: Start tag expected, '&lt;' not found, line 1, column 1<br>
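A minimal sketch of what the traceback appears to indicate, assuming current lxml behaviour: `etree.XML()` parses the string it is given as XML markup and never opens a file, so the file name `"content.example.xml"` is itself treated as a document whose first character is `c` rather than an opening angle bracket. That matches the "Start tag expected" error at line 1, column 1. The helper names `parse_text` and `error_for_file_name` are hypothetical, for illustration only.

```python
from lxml import etree


def parse_text(xml_text):
    """Parse a string that *contains* XML markup (what etree.XML() expects)."""
    return etree.XML(xml_text)


def error_for_file_name(name):
    """Hand a file name to etree.XML() and return the resulting syntax error.

    etree.XML() never opens a file; it parses `name` itself as markup.
    Returns None if, unexpectedly, the string parses.
    """
    try:
        etree.XML(name)
    except etree.XMLSyntaxError as exc:
        return exc
    return None


if __name__ == "__main__":
    root = parse_text('<RDF><ExternalPage about="http://example.com/"/></RDF>')
    print(root[0].get("about"))  # prints http://example.com/
    # "content.example.xml" starts with "c", not an opening bracket,
    # so libxml2 rejects it immediately.
    print(error_for_file_name("content.example.xml"))
```

To read from disk, the file name has to go to a function that opens files, such as `etree.parse()`, or the file's contents have to be read first and the resulting string passed to `etree.XML()`.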
<br><br>I have been reading the docs, but I'm new to processing XML in Python, so I don't find them all that easy to understand. I think I'm improving, though :) Thanks for your patience.<br><br>Best,<br><br>Sam<br>
<br><br><div class="gmail_quote">2008/5/24 Stefan Behnel &lt;<a href="mailto:stefan_ml@behnel.de">stefan_ml@behnel.de</a>&gt;:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi,<br>
<br>
did you read my other post?<br>
<div class="Ih2E3d"><br>
Sam Kuper wrote:<br>
> result = etree.XML(infile.read(), parser)<br>
<br>
</div>make that<br>
<br>
result = etree.parse("thefile.xml", parser)<br>
<br>
and consider reading the parser docs on the web page.<br>
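A minimal sketch of that suggestion applied to the target class from the script above, assuming a DMOZ-style file in which each `ExternalPage` element carries its URL in an `about` attribute. The names `LinkCollector` and `extract_links`, and the sample data, are hypothetical. When a parser has a target, lxml does not build a tree; whatever the target's `close()` method returns becomes the result of the parse call, so here `etree.parse()` returns the list of collected URLs.

```python
import os
import tempfile

from lxml import etree


class LinkCollector(object):
    """Hypothetical parser target collecting the "about" URL of *xternalPage elements.

    Only start() and close() are defined; lxml calls only the target
    methods that exist, so end() and data() can be left out.
    """

    def __init__(self):
        self.links = []

    def start(self, tag, attrib):
        # Same suffix test as the original script; tag may include a
        # namespace prefix such as "{...}ExternalPage".
        if tag.endswith("xternalPage"):
            url = attrib.get("about", "")
            if url:
                self.links.append(url)

    def close(self):
        # The return value of close() becomes the result of etree.parse().
        return self.links


def extract_links(path):
    """Parse the file at `path` with a fresh LinkCollector and return its URLs."""
    parser = etree.XMLParser(target=LinkCollector())
    # parse() opens and reads the named file; XML() would parse `path` as markup.
    return etree.parse(path, parser)


if __name__ == "__main__":
    # Hypothetical sample standing in for content.example.xml.
    sample = (b'<RDF><Topic/>'
              b'<ExternalPage about="http://example.com/a"/>'
              b'<ExternalPage about="http://example.com/b"/></RDF>')
    with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as f:
        f.write(sample)
        path = f.name
    try:
        for url in extract_links(path):
            print(url)
    finally:
        os.remove(path)
```

Collecting the URLs in a list, rather than writing to a module-level file handle as the original script does, is a design choice that keeps the target self-contained; creating a fresh target per call means results do not accumulate across files.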
<font color="#888888"><br>
Stefan<br>
</font></blockquote></div><br><br clear="all"><br>-- <br><a href="http://five.sentenc.es">http://five.sentenc.es</a> | <a href="http://tinyurl.com/3x9se4">http://tinyurl.com/3x9se4</a><br>--<br>Mr Sam Pablo Kuper BSc MRI<br>
Research Assistant<br>Darwin Correspondence Project<br>Cambridge University Library<br>West Road<br>Cambridge CB3 9DR<br><a href="mailto:spk30@cam.ac.uk">spk30@cam.ac.uk</a><br>Office: +44 (0)1223 333008<br>Mobile: +44 (0) 7971858176<br>
<a href="http://www.darwinproject.ac.uk">www.darwinproject.ac.uk</a>