[lxml-dev] XML files starting with BOM

Gilles Lenfant gilles.lenfant at gmail.com
Mon Sep 17 16:41:40 CEST 2007


Hi from an lxml newbie,

First, many thanks for lxml; it's the easiest XML lib for Python.

lxml doesn't like XML files starting with a BOM (see
http://www.w3.org/TR/2000/REC-xml-20001006#sec-guessing-no-ext-info).

M$ Office 2007 documents start their inner XML files with a BOM, so I
need to skip all chars from the file until I get a "<" before passing
the stream to lxml. Fortunately, the files are UTF-8.
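
A minimal sketch of that workaround, assuming the input really is UTF-8.
It strips only a leading UTF-8 BOM instead of scanning for the first "<".
The name strip_utf8_bom and the sample bytes are hypothetical, not part
of lxml:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample standing in for an inner XML file of an Office document.
raw = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><root/>'
clean = strip_utf8_bom(raw)
# The cleaned bytes could then be handed to the parser,
# e.g. lxml.etree.fromstring(clean).
```

Checking for the exact three BOM bytes leaves files without a BOM
untouched.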

Is it a bug or a feature?
-- 
Gilles Lenfant
gilles.lenfant at gmail.com


