[lxml-dev] .text_content() should leave spaces. Tests included
Max Ivanov
ivanov.maxim at gmail.com
Tue Aug 26 11:19:08 CEST 2008
Here is how I've implemented an analog of text_content(). I know
nothing about XPath, so maybe some of these checks could be done with an
XPath expression instead. It's just a rough sketch to show the idea, with
no deep testing at all, but the results are OK for me.

inlinetags = [ <tags list from
    http://htmlhelp.com/reference/html40/inline.html> ]  # except <br>

# Iterate over a snapshot, since drop_tag() changes the tree inside the loop.
for el in list(doc.iter()):
    # Pad the text and tail of block-level (non-inline) elements with a space.
    if el.text and (el.tag not in inlinetags):
        el.text = ' ' + el.text
    if el.tail and (el.tag not in inlinetags):
        el.tail += ' '
    if el.tag == 'br':
        # Turn <br> into a newline, keeping whatever text follows it.
        if not (el.tail or '').startswith('\n'):
            el.tail = '\n' + (el.tail or '')
        el.drop_tag()
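
For comparison, here is a minimal sketch of the same idea that does not modify the tree: it walks .text and .tail and collects the pieces instead. It is only an illustration under some assumptions, not a proposed patch. It uses the standard library's xml.etree.ElementTree (which shares the .text/.tail model with lxml) so it runs without lxml installed; INLINE_TAGS is a small hypothetical subset of the inline list, and spaced_text() and normalize() are made-up helper names. Unlike the loop above, it puts a space on both sides of every non-inline element.

```python
import xml.etree.ElementTree as ET

# Assumption: an illustrative subset of HTML 4 inline tags, not the full
# list referred to in the post. <br> is handled separately below.
INLINE_TAGS = {"a", "b", "code", "em", "i", "span", "strong", "sub", "sup"}


def spaced_text(el):
    """Collect the text under el, adding a space around non-inline
    elements and a newline for each <br>. The tree is left unchanged."""
    parts = []

    def walk(node):
        if node.tag == "br":
            parts.append("\n")
            return
        block = node.tag not in INLINE_TAGS
        if block:
            parts.append(" ")
        if node.text:
            parts.append(node.text)
        for child in node:
            walk(child)
            # A child's tail belongs to the parent's text flow.
            if child.tail:
                parts.append(child.tail)
        if block:
            parts.append(" ")

    walk(el)
    return "".join(parts)


def normalize(text):
    """Collapse whitespace within each line, keeping the <br> line breaks."""
    lines = (" ".join(line.split()) for line in text.split("\n"))
    return "\n".join(lines).strip()


doc = ET.fromstring(
    "<div><p>Hello <b>bold</b> world</p><p>second<br/>line</p></div>"
)
print(normalize(spaced_text(doc)))
```

On this sample, plainly joining the text nodes gives "Hello bold worldsecondline", while the sketch prints "Hello bold world second" and "line" on two lines.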