[lxml-dev] parse unable to load url

Mary Lei lei at ipac.caltech.edu
Tue May 19 23:44:19 CEST 2009


I copied the sample code below from the lxml documentation:

#! /stage/irsa-sw-dev/cm/env_1.4_mnt/python/bin/python

import urllib
import urllib2
import urlparse
import os
import popen2
import StringIO
import sys, getopt
import re
import string

version = sys.version_info

print "python version: ", version

print os.environ['PYTHONPATH']

## test with lxml
from lxml import etree

from lxml.html import fromstring, tostring, parse, submit_form
#page = parse('http://tinyurl.com').getroot()
page = parse('/home/lei/python-stuff/tmp_CoRoT_exo_index.html').getroot()
print page

page = parse('http://tinyurl.com').getroot()

Parsing the local file works, but parsing the URL raises an error:
dodo:lei > test_lxml.py
python version:  (2, 5, 1, 'final', 0)
.:/home/lei/python-stuff/BeautifulSoup-3.1.0.1:./lib:/home/lei/lxml-2.2/src
<Element html at 3f9a40>
Traceback (most recent call last):
   File "test_lxml.py", line 27, in <module>
     page = parse('http://tinyurl.com').getroot()
   File "/home/lei/lxml-2.2/src/lxml/html/__init__.py", line 661, in parse
     return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
   File "lxml.etree.pyx", line 2693, in lxml.etree.parse (src/lxml/lxml.etree.c:52591)
   File "parser.pxi", line 1478, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:75665)
   File "parser.pxi", line 1507, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:75993)
   File "parser.pxi", line 1407, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:75002)
   File "parser.pxi", line 965, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:72023)
   File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:67830)
   File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:68877)
   File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:68093)
IOError: Error reading file 'http://tinyurl.com': failed to load HTTP resource
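
One possible workaround is to let Python's own URL library fetch the page and hand the open response to lxml as a file-like object, since the traceback shows the failure happening inside lxml's own URL loading (_parseDocumentFromURL). This is only a sketch and not a diagnosis: the cause of the error above is not known from this output. The names looks_like_http_url, parse_html and opener are hypothetical helpers invented for the example; lxml.html.parse and its base_url argument are the ones visible in the traceback.

```python
try:
    from urllib2 import urlopen          # Python 2, as used by the script above
except ImportError:
    from urllib.request import urlopen   # Python 3


def looks_like_http_url(source):
    """Return True if source starts with http:// or https:// (hypothetical helper)."""
    return source.startswith(("http://", "https://"))


def parse_html(source, opener=urlopen):
    """Parse a local file or an HTTP(S) URL with lxml.html (hypothetical helper).

    For URLs the download is done by `opener` instead of by lxml itself,
    and the response object is passed to lxml as a file-like object.
    """
    # Imported here so the URL check above can be used without lxml installed.
    from lxml.html import parse

    if looks_like_http_url(source):
        response = opener(source)
        try:
            # base_url keeps relative links resolvable against the original URL.
            return parse(response, base_url=source)
        finally:
            response.close()
    # Local paths go straight to lxml, which already works in the report above.
    return parse(source)
```

With this in place, the failing line would become page = parse_html('http://tinyurl.com').getroot(). If the download still fails, the exception then comes from urlopen rather than from lxml, which may say more about the underlying network problem.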


-- 
Mary Lei

Software Testing
IPAC-NExScl

Rm: KS-233
MS: 220-6
Phone: 395-1998


