[lxml-dev] XML files starting with BOM

Stefan Behnel stefan_ml at behnel.de
Mon Sep 17 17:00:51 CEST 2007


Gilles Lenfant wrote:
> First, many thanks for lxml; it's the easiest XML lib for Python.

:)


> lxml doesn't like XML files starting with a BOM (see
> http://www.w3.org/TR/2000/REC-xml-20001006#sec-guessing-no-ext-info).
> 
> Microsoft Office 2007 documents use such notation in their inner XML files,
> and I need to skip all chars from the file until I get a "<" before
> passing the stream to lxml. Fortunately, the files are UTF-8.

Is this only with UTF-8 BOMs?
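For the UTF-8 case, the workaround Gilles describes (skipping everything before the first "<") can be narrowed so that only a leading UTF-8 BOM (the bytes EF BB BF) is removed and nothing else is silently discarded. The sketch below is a minimal illustration under that assumption; the helper names `strip_utf8_bom` and `parse_xml_bytes` are hypothetical, and it falls back to the standard library's ElementTree when lxml is not installed so that it runs anywhere.

```python
import codecs

# Prefer lxml if available; otherwise use the stdlib parser (same fromstring() call).
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: drop one leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def parse_xml_bytes(data: bytes):
    """Hypothetical helper: parse raw XML bytes after removing a leading UTF-8 BOM."""
    return etree.fromstring(strip_utf8_bom(data))


if __name__ == "__main__":
    raw = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><doc><p>hi</p></doc>'
    root = parse_xml_bytes(raw)
    print(root.tag)
```

Reading the file in binary mode (`open(path, "rb")`) and passing the bytes to `parse_xml_bytes` keeps the data undecoded, so any XML declaration in the document still agrees with the bytes the parser sees.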


> Is it a bug or a feature?

Parsing BOM-prefixed XML data should work. Could you give some more detail
here? For example, could you post some short example code that shows how you
parse XML data with a BOM and that fails on your machine?
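A minimal test case of the kind asked for here might look like the sketch below. It assumes the data is handed to the parser as raw bytes (here via `io.BytesIO`) so that the parser itself can detect the BOM; the function name `parse_with_bom` and the sample document are illustrative only. As above, it falls back to the stdlib ElementTree if lxml is not installed.

```python
import codecs
import io

# Prefer lxml if available; otherwise use the stdlib parser (same parse() call).
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# A UTF-8 BOM followed by an ordinary document, as in the inner XML files described above.
DATA = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><root><child/></root>'


def parse_with_bom():
    """Illustrative repro: parse the BOM-prefixed bytes directly, without stripping anything."""
    tree = etree.parse(io.BytesIO(DATA))
    return tree.getroot()


if __name__ == "__main__":
    root = parse_with_bom()
    print(root.tag, [child.tag for child in root])
```

If this runs cleanly but the real code fails, the difference is likely in how the file is opened or read before it reaches the parser, which is the detail worth posting.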

Stefan

