[lxml-dev] lxml XSD Schema validation problem

Richard Smith richard at indigo3.net
Wed Nov 5 18:01:54 CET 2008


Hi guys,

I'm getting an error using the lxml objectification API and XSD Validation.

I'm using lxml and the library versions shipped with a standard hardy
install.

Here's the boilerplate version info from the FAQ:

lxml.etree:        (2, 1, 2, 0)
libxml used:       (2, 6, 31)
libxml compiled:   (2, 6, 31)
libxslt used:      (1, 1, 22)
libxslt compiled:  (1, 1, 22)

The code I'm testing with is as follows:


############################ >8 ############################
#!/usr/bin/python

from lxml import objectify
from lxml import etree

SCHEMA_FILE = 'nvdcve.xsd'
DATA_FILE = 'nvdcve-recent.xml'

schema_file = open(SCHEMA_FILE)
data_file = open(DATA_FILE)

schema = etree.XMLSchema(file=schema_file)
parser = objectify.makeparser(schema=schema)

tree = objectify.parse(data_file, parser)
############################ 8< ############################

The XSD I'm validating against is the one NIST provides for CVE entries, and
the XML document is the NVD "recent" feed:

http://nvd.nist.gov/schema/nvdcve.xsd
http://nvd.nist.gov/download/nvdcve-recent.xml

The error I'm getting from lxml is:

lxml.etree.XMLSyntaxError: Element '{http://nvd.nist.gov/feeds/cve/1.2}ref',
attribute 'url': [facet 'pattern'] The value
'http://www.securitytracker.com/id?1021107' is not accepted by the pattern
'(news|(ht|f)tp(s)?)://.+'.
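
One way to narrow this down is to take the NVD schema and feed out of the picture and test only the pattern facet. The sketch below is a hypothetical minimal reproduction, not part of the NIST schema: the inline XSD, the un-namespaced `ref` element and the `xs:string` base type are assumptions made for illustration, and the real `nvdcve.xsd` may declare the `url` attribute differently. It validates after parsing, so the outcome is returned as a boolean and any messages land in `schema.error_log` instead of being raised as an exception.

```python
from lxml import etree

# Hypothetical minimal schema (an assumption, not copied from nvdcve.xsd):
# a single 'ref' element whose 'url' attribute carries the pattern facet
# quoted in the error message. The xs:string base type is also an assumption.
XSD = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="urlType">
    <xs:restriction base="xs:string">
      <xs:pattern value="(news|(ht|f)tp(s)?)://.+"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="ref">
    <xs:complexType>
      <xs:attribute name="url" type="urlType"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# The value that the validator rejected in the report above.
DOC = b'<ref url="http://www.securitytracker.com/id?1021107"/>'

schema = etree.XMLSchema(etree.fromstring(XSD))
doc = etree.fromstring(DOC)

# validate() returns True or False; details are collected in error_log.
is_valid = schema.validate(doc)
print("valid:", is_valid)
for entry in schema.error_log:
    print(entry.line, entry.message)
```

If this small case is rejected as well, the pattern handling in the underlying libxml2 is the more likely suspect; if it passes, the difference probably lies in how the full schema declares the attribute.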

The problem is that in every regex implementation I can think of (Python,
Perl, egrep, etc.) the pattern matches this value. The data also validates
without problems in my win32 XML editors (LiquidXML and OxygenXML).

$ egrep "(news|(ht|f)tp(s)?)://.+" nvdcve-recent.xml | wc -l
497

pcretest comes back fine too:

$ pcretest
PCRE version 7.4 2007-09-21

  re> #(news|(ht|f)tp(s)?)://.+#
data> http://www.securitytracker.com/id?1021107
 0: http://www.securitytracker.com/id?1021107
 1: http
 2: ht

Is this a libxml2 problem, a mistake in my code or, dare I say, an evil bug?

Thanks

-- 
Richard

