[lxml-dev] I get CDATA inside a parsed HTML <script> element and cannot retrieve its text

Alexander Kozlovsky alexander.kozlovsky at gmail.com
Fri Nov 3 22:35:47 CET 2006


Hello all!

I'm very new to lxml, and I think I may have found a bug.

AFAIK, lxml does not expose a direct interface to CDATA sections.
But when I use the etree.HTML function, I get the content of <script>
as a CDATA section!

    >>> from lxml import etree
    >>> html = etree.HTML('<script> alert("Hello!"); </script>')
    >>> etree.tostring(html)
    '<html><head><script><![CDATA[ alert("Hello!"); ]]></script></head></html>'

The problem is that I cannot retrieve the content of the <script> tag,
because lxml does not expose it either as child nodes or as text:

    >>> script = html.find('.//script')
    >>> len(script)
    0
    >>> print script.text
    None

EXPECTED:
    >>> print script.text
    alert("Hello!");
    
Is this really a bug, or am I misunderstanding something?
    

-- 
Best regards,
 Alexander                mailto:alexander.kozlovsky at gmail.com


