[lxml-dev] cssselect and cssutils

Stefan Behnel stefan_ml at behnel.de
Mon Jan 7 20:56:11 CET 2008


Hi Ian,

Ian Bicking wrote:
> element.text is just a unicode string.

or a plain string.


> Maybe we could have a method
> like element.text_range(0, 1) that returns a subclass of unicode that
> also happens to know something about its location.

I prefer having the XPath string results be something like that. I think
that's the only case where you can 'spuriously' end up with a text value and
might want to know where it came from.
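Here is a minimal sketch of such location-aware text results. It is written for Python 3, where `str` is the only text type, and uses the standard library's `xml.etree.ElementTree` in place of lxml. The names `SmartText`, `getparent()` and `iter_text_results()` are hypothetical and chosen for illustration. Each result is an ordinary string that also remembers the element whose `.text` or `.tail` it came from.

```python
import xml.etree.ElementTree as ET


class SmartText(str):
    """A str that remembers where in the tree it was found (hypothetical)."""

    def __new__(cls, value, owner, is_tail):
        self = super().__new__(cls, value)
        self._owner = owner      # element whose .text or .tail holds this value
        self.is_tail = is_tail   # True if the value was taken from owner.tail
        return self

    def getparent(self):
        # The element this text belongs to (for tail text: the element it trails).
        return self._owner


def iter_text_results(el):
    """Yield the text and tail values below el as SmartText, in document order."""
    if el.text:
        yield SmartText(el.text, el, is_tail=False)
    for child in el:
        yield from iter_text_results(child)
        if child.tail:
            yield SmartText(child.tail, child, is_tail=True)


root = ET.fromstring("<p>Hello <b>big</b> world</p>")
for t in iter_text_results(root):
    print(repr(str(t)), t.getparent().tag, t.is_tail)
```

Each result is still a `str`, so code that ignores the extra attributes keeps working unchanged.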


> class ElementText(unicode):

Maybe we should still keep up the str/unicode duality here. Although that will
be history with Python 3, it isn't now, and it is an integral part of the
current lxml API.
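One way to keep both result types without duplicating the location logic is to use a small mixin plus one subclass per built-in string type. The sketch below assumes Python 3, where the Python 2 `str`/`unicode` pair corresponds roughly to `bytes`/`str`. All class and function names are hypothetical.

```python
class _OriginMixin:
    """Location metadata shared by both string flavours (hypothetical mixin)."""

    def __new__(cls, value, owner, is_tail):
        # Delegates to bytes.__new__ or str.__new__, depending on the subclass.
        self = super().__new__(cls, value)
        self._owner = owner
        self.is_tail = is_tail
        return self

    def getparent(self):
        return self._owner


class SmartBytes(_OriginMixin, bytes):
    """Byte-string result (the old plain `str` side)."""


class SmartStr(_OriginMixin, str):
    """Text result (the old `unicode` side)."""


def make_text_result(value, owner, is_tail=False):
    """Preserve the caller's string type: bytes in, SmartBytes out; str in, SmartStr out."""
    cls = SmartBytes if isinstance(value, bytes) else SmartStr
    return cls(value, owner, is_tail)


print(type(make_text_result(b"abc", None)).__name__)  # SmartBytes
print(type(make_text_result("abc", None)).__name__)   # SmartStr
```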


>     def __new__(cls, text, is_tail, range, parent):
>         self = unicode.__new__(cls, text)
>         self.is_tail = is_tail

Right, 'is_tail' should be in.


>         self.range = range

'range' would be the substring indices? I would prefer calculating as much as
possible on demand. Remember, most people will not use this object in any
other way than a plain string. That's why I'm so hesitant about instantiating
an Element object along the way.
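Here is a minimal sketch of the "on demand" approach. It assumes that only the start offset is stored when the result is created. The end index follows from the string's own length, so `range` can be a property that is computed only when someone reads it, and no extra object is built up front. `TextSlice` and `source_text()` are hypothetical names, and ElementTree stands in for lxml.

```python
import xml.etree.ElementTree as ET


class TextSlice(str):
    """A substring that stores only its start offset (hypothetical sketch)."""

    def __new__(cls, value, owner, is_tail, start):
        self = super().__new__(cls, value)
        self._owner = owner
        self.is_tail = is_tail
        self._start = start
        return self

    @property
    def range(self):
        # Derived on demand: the end index is implied by len(self).
        return (self._start, self._start + len(self))

    def source_text(self):
        # Look up the full .text/.tail value only when it is actually needed.
        return self._owner.tail if self.is_tail else self._owner.text


el = ET.fromstring("<p>Hello world</p>")
piece = TextSlice(el.text[6:11], el, False, 6)
print(piece, piece.range)  # world (6, 11)
```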


>     def enclose_in_tag(self, el):
>         """
>         Enclose this text range in an element, like::
> 
>             span = Element('span')
>             el.text_range(0, 1).enclose_in_tag(span)
>         """

Hmm, I'll have to think about that one. Not sure what the exact semantics
should be.
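Here is one possible semantics, sketched with ElementTree and restricted to a range inside the owner's `.text`. The text before the range stays where it is, the selected substring becomes the new element's text, and the remainder becomes the new element's tail. The sketch assumes the new element starts out empty, because its text and tail are overwritten. A range inside a `.tail` would instead need the new element inserted as the owner's next sibling; that case is left out here. `enclose_text_range()` is a hypothetical helper, not an lxml API.

```python
import xml.etree.ElementTree as ET


def enclose_text_range(owner, start, end, new_el):
    """Move owner.text[start:end] into new_el, inserted as owner's first child."""
    text = owner.text or ""
    if not 0 <= start <= end <= len(text):
        raise IndexError("range lies outside owner.text")
    before, inside, after = text[:start], text[start:end], text[end:]
    owner.text = before or None    # text preceding the range stays on owner
    new_el.text = inside or None   # the selected substring moves into new_el
    new_el.tail = after or None    # the rest follows new_el as its tail
    owner.insert(0, new_el)        # keeps order relative to existing children
    return new_el


p = ET.fromstring("<p>Hello world<i>!</i></p>")
enclose_text_range(p, 0, 5, ET.Element("span"))
print(ET.tostring(p, encoding="unicode"))  # <p><span>Hello</span> world<i>!</i></p>
```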


> Upon further thought, maybe subclassing unicode isn't the right thing --
> perhaps it should really just wrap a string.

No, it must be a 'real' string to avoid having to check for another special
case in the API (and likely in other places that we do not control).
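The difference is easy to see in practice. Much of Python's own string machinery accepts only actual `str` instances, so it rejects a wrapper where a subclass passes. `SubclassedText` and `WrappedText` are hypothetical names used only for this comparison (Python 3).

```python
class SubclassedText(str):
    """Is a str, so it works anywhere a str is expected."""


class WrappedText:
    """Holds a str but is not one (the wrapper alternative)."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value


sub = SubclassedText("abc")
wrapped = WrappedText("abc")

print(isinstance(sub, str), isinstance(wrapped, str))  # True False
print("-".join([sub, "x"]))                            # abc-x
try:
    "-".join([wrapped, "x"])  # str.join requires real str items
except TypeError as exc:
    print("join failed:", exc)
```

Every caller would need an explicit `str(...)` conversion to accept the wrapper, which is the special case that subclassing avoids.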


> It's all a bit tricky ;)

I know. :)

Stefan

