[lxml-dev] cssselect and cssutils
Stefan Behnel
stefan_ml at behnel.de
Mon Jan 7 19:30:54 CET 2008
Hi,
Höke, Christof wrote:
>> From: Ian Bicking [mailto:ianb at colorstudy.com]
>> ::first-letter is hard
>> because it doesn't match any object in lxml. If it returned a string
>> like "A" it would be very much out of context (e.g., no parent pointer),
>> and it would be hard to do anything useful with it. To make it useful I
>> think it would require some new stringish object that also looked nodeish
>> (e.g., had a .getparent() method).
I considered that a while ago, as it would also be interesting for XPath in
general.
However, currently, we use fast Python string creation functions to serve the
API level. At the time I concluded that changing that to the instantiation of a
custom string object would almost certainly slow things down and complicate
them, just to serve a rather special use case. Although I might want to
take another look at that today...
> What came to my mind was the DOM range spec stuff, but it is not really
> finished, is it? I was reading about it some years (!) ago I think in the
> Javascript Definitive Guide but I think it never went anywhere really.
I had a discussion about that last summer, which started over at the XML-SIG list.
http://permalink.gmane.org/gmane.comp.python.lxml.devel/2763
That was the first time I heard about DOM ranges and when I dug into that a
little deeper, I almost ran away screaming. IMVHO, that's an insane and
horribly complicated spec.
> :first-letter should actually be element.text[0] I guess (which would be a
> string in lxml currently?)
Yes.
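To make the context problem concrete, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml so that it runs on its own (an assumption; for this case lxml.etree behaves the same way). The helper name first_letter is hypothetical and not part of any library.

```python
import xml.etree.ElementTree as ET


def first_letter(element):
    """Return the first character of element.text, or None if there is no text."""
    text = element.text
    return text[0] if text else None


root = ET.fromstring("<p>A paragraph</p>")
letter = first_letter(root)

# The result is an ordinary string: nothing on it leads back to <p>.
print(type(letter).__name__, repr(letter))
print(hasattr(letter, "getparent"))
```

The returned value is a plain string, so any caller that wants to know where the letter came from has to carry the element around separately.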
> I don't really know the lxml API but would it
> be possible to define a subtype for element.text for this case? But you are
> right, a more general approach would certainly be better.
You could define a special string (and unicode) subtype for the result of an
XPath expression, which is determined independently of the Python API level. The
freedom is right there. However, it would mean you have to search and (in the
worst case) instantiate the parent Element to make sure it won't go away while
the string result exists. That's some overhead compared to a simple string
creation.
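As a rough illustration of that idea, the following pure-Python sketch shows a string subtype that keeps a reference to its parent element. It is not how lxml does anything internally: the class name ParentedString and the helper text_of are hypothetical, xml.etree.ElementTree stands in for lxml so the example is self-contained, and it is written for Python 3, where there is only one string type to subclass.

```python
import xml.etree.ElementTree as ET


class ParentedString(str):
    """Hypothetical str subtype that remembers the element it was read from."""

    def __new__(cls, value, parent=None):
        self = super().__new__(cls, value)
        # The strong reference keeps the parent element alive for as long
        # as the string result exists.
        self._parent = parent
        return self

    def getparent(self):
        """Return the element this string was taken from (or None)."""
        return self._parent


def text_of(element):
    """Wrap element.text so the result stays connected to its element."""
    return ParentedString(element.text or "", element)


root = ET.fromstring("<root><p>A paragraph</p></root>")
p = root.find("p")
result = text_of(p)

print(result, result.getparent().tag)
# Slicing falls back to the base type, so the parent link is lost here.
print(type(result[0]).__name__)
```

The extra object creation and the parent lookup are exactly the overhead mentioned above. Note also that ordinary string operations such as slicing return plain strings again, so something like result[0] would not carry the parent along without further work.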
As I said, I might reconsider that, but I'm not very confident.
Stefan