[lxml-dev] extracting .text strings systematically in unicode

Stefan Behnel stefan_ml at behnel.de
Tue Dec 9 19:23:39 CET 2008



Stefan Behnel wrote:
> John Lovell wrote:
>> The first one is the one that raises an exception for non-strings?
> 
>   Python 2.6.1 (r261:67515, Dec  7 2008, 21:12:01)
>   [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
>   Type "help", "copyright", "credits" or "license" for more information.
>   >>> u""+1
>   Traceback (most recent call last):
>     File "<stdin>", line 1, in <module>
>   TypeError: coercing to Unicode: need string or buffer, int found

Or to present something more lxml related (session edited for readability):

  Python 2.6.1 (r261:67515, Dec  7 2008, 21:12:01)
  [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import lxml.etree as et
  >>> root = et.fromstring("<a><!--test--></a>")

  >>> root.tag
  'a'
  >>> unicode(root.tag)
  u'a'
  >>> u""+root.tag
  u'a'

  >>> root[0].tag
  <built-in function Comment>
  >>> unicode(root[0].tag)
  u'<built-in function Comment>'
  >>> u""+root[0].tag
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: coercing to Unicode: need string or buffer, \
                                 builtin_function_or_method found
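
The session above suggests a simple guard when collecting text: a real element has a string tag, while a comment has a function as its tag, so a type check on .tag separates the two. What follows is a minimal sketch of that idea, not part of the original session. It rests on a few assumptions: it is written for Python 3 (where str is the unicode type), it uses the standard library's xml.etree.ElementTree as a stand-in for lxml.etree, the tree is built by hand because the stdlib parser drops comments by default, and the helper name element_texts is hypothetical.

```python
import xml.etree.ElementTree as ET  # assumption: stand-in for lxml.etree


def element_texts(root):
    """Collect .text from real elements only (hypothetical helper)."""
    texts = []
    for el in root.iter():
        # Real elements have a string tag. Comments and processing
        # instructions have a factory function as their tag instead.
        if not isinstance(el.tag, str):
            continue
        if el.text is not None:
            texts.append(el.text)
    return texts


# Build <a>hello<!--test--><b>world</b></a> by hand.
root = ET.Element("a")
root.text = "hello"
root.append(ET.Comment("test"))
child = ET.SubElement(root, "b")
child.text = "world"

result = element_texts(root)
print(result)
```

The isinstance check fails quietly by skipping the node, whereas the concatenation shown in the session fails loudly with a TypeError. Which behaviour is preferable depends on whether a non-element node in the tree should count as an error.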

Stefan


