[lxml-dev] .text_content() should leave spaces. Tests included
Max Ivanov
ivanov.maxim at gmail.com
Sat Aug 23 09:57:19 CEST 2008
>> So according to description it transforms
>> "<span>element1</span><span>element2</span>" to "element1element2".
>> Notice the lack of space between contents of two elements.
>
> Exactly as in the HTML source, I would say. Given your specific example, I
> don't think a browser would display it any different.
>
Maybe the <span> examples are not suitable here, but .text_content() on
"<html><head><title>test</title></head><body><h1>page
title</h1></body></html>" returning "testpage title" instead of "test
page title" is definitely wrong. Imagine what would happen with a
<table> with multiple td's and tr's: it would be turned into one big
word without spaces. Do you think that is correct? The easiest way
would be to put spaces between the contents of any two tags and keep
all other symbols between tags.
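
To make the complaint concrete, here is a minimal sketch of the current
concatenating behaviour next to the space-joining idea. It only relies on
itertext(), which exists both in lxml and in the standard library's
ElementTree, so the import falls back to the stdlib if lxml is missing.
The helper names plain_concat and text_with_spaces are hypothetical and
not part of lxml.

```python
try:
    from lxml import etree  # preferred, if installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same itertext() API

DOC = ("<html><head><title>test</title></head>"
       "<body><h1>page title</h1></body></html>")


def plain_concat(root):
    # Concatenate every text node in document order with nothing in between.
    # For this document it gives the string reported for .text_content().
    return "".join(root.itertext())


def text_with_spaces(root):
    # Hypothetical helper: strip each text node, drop the empty ones,
    # and join what is left with a single space.
    chunks = (t.strip() for t in root.itertext())
    return " ".join(c for c in chunks if c)


root = etree.fromstring(DOC)
print(plain_concat(root))      # testpage title
print(text_with_spaces(root))  # test page title
```

Note that joining with spaces would also separate adjacent inline
elements, so the two <span>s from the earlier example would come out as
"element1 element2". That is the trade-off raised in the quoted reply.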
> Feel free to provide a patch.
.text_content() is an alias for XPath("string()"). But I didn't find any
description of just the plain string() function; everything I found is
about "string text()", which according to Wikipedia returns the text
content of elements only one level lower. So I don't understand how all
that works =)
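
For reference, assuming XPath 1.0 semantics: string() with no argument
returns the string-value of the context node, which for an element is the
concatenation of all descendant text nodes in document order, while
text() selects only the text nodes that are direct children. A rough
pure-Python sketch of the two, using the standard library's ElementTree
(the function names string_value and child_text_nodes are hypothetical):

```python
import xml.etree.ElementTree as ET


def string_value(el):
    # Mimics XPath string() on an element: all descendant text,
    # concatenated in document order.
    parts = [el.text or ""]
    for child in el:
        parts.append(string_value(child))
        parts.append(child.tail or "")
    return "".join(parts)


def child_text_nodes(el):
    # Mimics XPath text(): only the text nodes that are direct children
    # of el (its leading text plus the tails of its child elements).
    nodes = [el.text] + [child.tail for child in el]
    return [t for t in nodes if t]


root = ET.fromstring("<p>a<b>b<i>c</i></b>d</p>")
print(string_value(root))      # abcd
print(child_text_nodes(root))  # ['a', 'd']
```

With lxml, root.xpath("string()") and root.xpath("text()") on the same
document should give the same two results.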