<div dir="ltr">I have a sample XML file which contains <text>&#135;&#135; .... </text> with 8,000,000 (eight million) repetitions of '&#135;'.<br><br>A test program that loads it and then writes it back out:<br>
<br>import sys<br>#import cElementTree as ET<br>from lxml import etree as ET<br>f = open(sys.argv[1])<br>et = ET.ElementTree(file=f)<br>et.write('ooo')<br><br>When it is run with cElementTree, it completes successfully in about 1 minute.<br>
When it is run with lxml, it does not complete even after 12 hours, and the process stays at 100% CPU the whole time.<br>Further testing showed that it reaches the 'write' statement quite fast and gets stuck there.<br><br>
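To reproduce this, a file of the shape described above can be generated with a short script. This is only a sketch: the function name write_sample, its parameters, and the chunk size are illustrative choices rather than part of the original test, and it assumes Python 3.<br>

```python
def write_sample(path, repetitions=8000000, entity="&#135;"):
    """Write <text>...</text> whose content is `entity` repeated `repetitions` times."""
    chunk_size = 10000  # arbitrary; keeps individual writes reasonably small
    full_chunks, remainder = divmod(repetitions, chunk_size)
    chunk = entity * chunk_size
    with open(path, "w", encoding="ascii") as out:
        out.write("<text>")
        for _ in range(full_chunks):
            out.write(chunk)
        out.write(entity * remainder)
        out.write("</text>")


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python make_sample.py sample.xml
    write_sample(sys.argv[1])
```

Passing a smaller count, for example write_sample('small.xml', 5000), gives a short file for comparing the two libraries on small input.<br><br>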
Is this a bug, or is lxml just dead slow relative to cElementTree for this operation?<br><br>Notes:<br>1) There is nothing special about '&#135;'; it is just a simple sample with the same character repeating. The original problem showed up with a long file of various entity refs (some encoding of binary data).<br>
2) With shorter files (thousands of characters), cElementTree and lxml seemed to run at similar speeds.<br><br>TIA<br>Moshe<br><br></div>