[lxml-dev] cssselect and cssutils
Stefan Behnel
stefan_ml at behnel.de
Mon Jan 7 20:56:11 CET 2008
Hi Ian,
Ian Bicking wrote:
> element.text is just a unicode string.
or a plain string.
> Maybe we could have a method
> like element.text_range(0, 1) that returns a subclass of unicode that
> also happens to know something about its location.
I prefer having the XPath string results be something like that. I think
that's the only case where you can 'spuriously' end up with a text value and
might want to know where it came from.
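To make the idea concrete, here is a minimal pure-Python sketch of a string subclass that remembers where it came from. It uses Python 3 and the standard library's xml.etree.ElementTree rather than lxml. The class LocatedText, its is_tail attribute, its getparent() method and the located_texts() helper are all hypothetical names chosen for illustration. The only point is that the result still behaves as an ordinary string.

```python
import xml.etree.ElementTree as ET


class LocatedText(str):
    """A str subclass that remembers which element its text belongs to.

    Hypothetical sketch: the names is_tail and getparent() are
    illustrative assumptions, not a confirmed lxml API.
    """

    def __new__(cls, text, parent, is_tail):
        self = super().__new__(cls, text)
        self._parent = parent
        self.is_tail = is_tail
        return self

    def getparent(self):
        # For .text this is the owning element; for .tail it is the
        # element whose tail this is (i.e. where the data is stored).
        return self._parent


def located_texts(el):
    """Yield each non-empty .text/.tail below el, in document order."""
    if el.text:
        yield LocatedText(el.text, el, is_tail=False)
    for child in el:
        yield from located_texts(child)
        if child.tail:
            yield LocatedText(child.tail, child, is_tail=True)


root = ET.fromstring("<p>Hello <b>big</b> world</p>")
texts = list(located_texts(root))
print(texts)                                         # ['Hello ', 'big', ' world']
print(texts[2].is_tail, texts[2].getparent().tag)    # True b
```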
> class ElementText(unicode):
Maybe we should still keep up the str/unicode duality here. Although that will
be history with Python 3, it isn't now, and it is an integral part of the
current lxml API.
>     def __new__(cls, text, is_tail, range, parent):
>         self = unicode.__new__(cls, text)
>         self.is_tail = is_tail
Right, 'is_tail' should be in.
>         self.range = range
'range' would be the substring indices? I would prefer calculating as much as
possible on demand. Remember, most people will not use this object in any
other way than a plain string. That's why I'm so hesitant about instantiating
an Element object along the way.
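One way to read "calculate as much as possible on demand" is sketched below. It assumes the string can keep a cheap handle to its origin. Here a hypothetical zero-argument callable, resolve_parent, stands in for whatever the library keeps internally. The expensive object is built only the first time getparent() is called and is then cached, so code that treats the value as a plain string never pays for it.

```python
_UNSET = object()  # sentinel so that a resolver returning None is still cached


class LazyParentText(str):
    """str subclass that defers building its parent object until asked.

    Assumption for illustration: resolve_parent is a cheap zero-argument
    callable, and calling it is the expensive step (e.g. creating an
    Element proxy).
    """

    def __new__(cls, text, resolve_parent, is_tail=False):
        self = super().__new__(cls, text)
        self._resolve_parent = resolve_parent
        self._parent = _UNSET
        self.is_tail = is_tail
        return self

    def getparent(self):
        # Build the parent only on first request, then reuse it.
        if self._parent is _UNSET:
            self._parent = self._resolve_parent()
        return self._parent


calls = []


def expensive_lookup():
    # Hypothetical stand-in for instantiating an Element object.
    calls.append(1)
    return "<element proxy>"


s = LazyParentText("some text", expensive_lookup)
print(s.upper(), len(calls))   # SOME TEXT 0  (plain string use, no lookup)
s.getparent()
s.getparent()
print(len(calls))              # 1  (resolved exactly once)
```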
>     def enclose_in_tag(self, el):
>         """
>         Enclose this text range in an element, like::
>
>             span = Element('span')
>             el.text_range(0, 1).enclose_in_tag(span)
>         """
Hmm, I'll have to think about that one. Not sure what the exact semantics
should be.
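The following sketches one possible reading of those semantics using the standard library's xml.etree.ElementTree. The characters in the range move into the new element, and whatever followed the range becomes that element's tail. As a result, the character data is unchanged apart from the added tags. The helper name enclose_text_range() is hypothetical, and the helper only covers ranges inside an element's .text. A range inside a .tail would instead have to insert the wrapper into the tail owner's parent.

```python
import xml.etree.ElementTree as ET


def enclose_text_range(el, start, end, wrapper):
    """Move el.text[start:end] into wrapper and make wrapper el's first child.

    Hypothetical helper: handles el.text only (not a tail) and assumes
    wrapper is a fresh element without text, tail or children.
    """
    text = el.text or ""
    if not 0 <= start <= end <= len(text):
        raise ValueError("range lies outside el.text")
    el.text = text[:start]
    wrapper.text = text[start:end]
    # What followed the range now follows the wrapper, before any
    # children that el already had.
    wrapper.tail = text[end:]
    el.insert(0, wrapper)
    return wrapper


p = ET.fromstring("<p>Hello world</p>")
enclose_text_range(p, 0, 5, ET.Element("span"))
print(ET.tostring(p, encoding="unicode"))   # <p><span>Hello</span> world</p>
```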
> Upon further thought, maybe subclassing unicode isn't the right thing --
> perhaps it should really just wrap a string.
No, it must be a 'real' string to avoid having to check for another special
case in the API (and likely in other places that we do not control).
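A wrapper would leak into other people's code, as the short illustration below shows. Code that type-checks for a string accepts a str subclass but rejects a wrapper, and str.join() behaves the same way. WrappedText, SubclassedText and third_party() are illustrative names only. The example uses Python 3, where str is the only text type.

```python
class WrappedText:
    """A wrapper that merely holds a string (the rejected alternative)."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


class SubclassedText(str):
    """A real str subclass (the approach argued for here)."""


def third_party(value):
    # Typical code outside our control: it insists on a real string.
    if not isinstance(value, str):
        raise TypeError("expected a string")
    return value.strip().lower()


print(third_party(SubclassedText("  Hi ")))   # hi
try:
    third_party(WrappedText("  Hi "))
except TypeError as exc:
    print("wrapper rejected:", exc)           # wrapper rejected: expected a string
```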
> It's all a bit tricky ;)
I know. :)
Stefan