[lxml-dev] parse unable to load url
Mary Lei
lei at ipac.caltech.edu
Tue May 19 23:44:19 CEST 2009
I copied sample code from the lxml documentation into this test script:
#! /stage/irsa-sw-dev/cm/env_1.4_mnt/python/bin/python
import urllib
import urllib2
import urlparse
import os
import popen2
import StringIO
import sys, getopt
import re
import string
version = sys.version_info
print "python version: ", version
print os.environ['PYTHONPATH']
## test with lxml
from lxml import etree
from lxml.html import fromstring, tostring, parse, submit_form
# parsing a local file works:
page = parse('/home/lei/python-stuff/tmp_CoRoT_exo_index.html').getroot()
print page
# parsing a URL fails with the IOError shown below:
page = parse('http://tinyurl.com').getroot()
Parsing the local file works, but parsing the URL fails with this error:
dodo:lei > test_lxml.py
python version: (2, 5, 1, 'final', 0)
.:/home/lei/python-stuff/BeautifulSoup-3.1.0.1:./lib:/home/lei/lxml-2.2/src
<Element html at 3f9a40>
Traceback (most recent call last):
  File "test_lxml.py", line 27, in <module>
    page = parse('http://tinyurl.com').getroot()
  File "/home/lei/lxml-2.2/src/lxml/html/__init__.py", line 661, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File "lxml.etree.pyx", line 2693, in lxml.etree.parse (src/lxml/lxml.etree.c:52591)
  File "parser.pxi", line 1478, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:75665)
  File "parser.pxi", line 1507, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:75993)
  File "parser.pxi", line 1407, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:75002)
  File "parser.pxi", line 965, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:72023)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:67830)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:68877)
  File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:68093)
IOError: Error reading file 'http://tinyurl.com': failed to load HTTP resource
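A possible workaround, offered as a sketch rather than a diagnosis: the traceback shows lxml handing the URL string to libxml2's own loader (`_parseDocumentFromURL`). Assuming the problem lies in that loader and not in network access from this machine, the page can instead be downloaded with Python's urllib and the open response passed to `parse()`, which also accepts file-like objects. `parse_url` is a hypothetical helper name; the code targets Python 2.6+ and 3.x (on the 2.5 interpreter used here, the `with` statement needs `from __future__ import with_statement`).

```python
# Sketch: fetch the page with Python's own URL handling and give lxml an
# open file-like object instead of a URL string.
from contextlib import closing

try:  # Python 3
    from urllib.request import urlopen
except ImportError:  # Python 2.6/2.7 (urllib2 is also where it lives on 2.5)
    from urllib2 import urlopen

from lxml.html import parse


def parse_url(url):
    """Hypothetical helper: download `url` with urllib and parse it as HTML.

    lxml.html.parse() reads from any object with a read() method, so the
    download is done by urllib rather than by libxml2. base_url is passed
    so relative links in the page can still be resolved against `url`.
    """
    with closing(urlopen(url)) as response:
        # parse() consumes the whole response before the with-block closes it
        return parse(response, base_url=url).getroot()
```

Usage would then be `page = parse_url('http://tinyurl.com')` in place of the failing `parse(...)` line in the script above. If this also fails, the error will come from urllib instead of libxml2, which at least separates a network problem from a parser problem.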
--
Mary Lei
Software Testing
IPAC-NExScl
Rm: KS-233
MS: 220-6
Phone: 395-1998