[lxml-dev] XML files starting with BOM
Stefan Behnel
stefan_ml at behnel.de
Mon Sep 17 17:00:51 CEST 2007
Gilles Lenfant wrote:
> First, many thanks for lxml; it's the easiest XML lib for Python.
:)
> lxml doesn't like XML files starting with a BOM (see
> http://www.w3.org/TR/2000/REC-xml-20001006#sec-guessing-no-ext-info).
>
> MS Office 2007 documents use a BOM in their inner XML files, so I
> need to skip all characters in the file up to the first "<" before
> passing the stream to lxml. Fortunately, the files are UTF-8.
Is this only with UTF-8 BOMs?
> Is it a bug or a feature ?
Parsing BOM-ed XML data should work. Could you give some more detail here,
such as some short example code that shows how you are parsing XML data
with a BOM and that fails on your machine?
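
For illustration, a report could be as small as the sketch below. It is
only an assumption of what such a script might look like: the sample bytes
and the helper name strip_utf8_bom are made up, and the standard library's
xml.etree.ElementTree stands in for lxml.etree so that the snippet runs
without lxml installed. The helper shows a narrower workaround than
skipping ahead to the first "<": it removes a leading UTF-8 BOM and
nothing else.

```python
import codecs
import xml.etree.ElementTree as ET  # stand-in parser; lxml.etree.fromstring also takes bytes


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample: a tiny UTF-8 document prefixed with a BOM,
# similar in shape to the inner XML files described above.
sample = codecs.BOM_UTF8 + (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<root><item>x</item></root>"
)

# Parse the bytes after removing the BOM and show the root element name.
root = ET.fromstring(strip_utf8_bom(sample))
print(root.tag)
```

Passing `sample` to the parser directly, without the helper, is the case
that should work and is the one worth reporting if it fails.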
Stefan