[lxml-dev] CSS and lxml
Stefan Behnel
stefan_ml at behnel.de
Mon Sep 17 15:07:21 CEST 2007
Hi,
Frederik Elwert wrote:
> I am currently looking into the possibilities to work with CSS in lxml
> or Python in general. I hope it's not too off-topic, but since it's
> related to lxml, I thought I might post it here.
Sure.
> The specific use case I have in mind would take an element and look for
> all applying CSS rules. Let's say we have a style element:
>
> <style type="text/css">
> p {
> font-size: 16pt;
> }
> p.strong {
> font-weight: bold;
> }
> </style>
>
> and in the tree, there are two elements:
>
> <p style="color:red;">Some text</p>
> <p class="strong">Some other text</p>
>
> Then I'd like to get something like
>
> >>> el1.getstyle()
> {'font-size': '16pt', 'color': 'red'}
>
> for the first element and
>
> >>> el2.getstyle()
> {'font-size': '16pt', 'font-weight': 'bold'}
> for the second one.
>
> I know that this is currently not possible. The only true CSS library
> for Python that I found was cssutils <http://cthedot.de/cssutils/>.
> They have quite sophisticated support for CSS parsing, but I think the
> library itself is quite DOM-centric and so it's not very pythonic /
> doesn't fit well to lxml. But more important, it has no real XML
> bindings. So it's possible to query stylesheets to get properties that
> match a selector:
>
> >>> stylesheet.props('p.strong')
> {'font-size': '16pt', 'font-weight': 'bold'}
>
> but not to query true elements to get the applying properties.
>
> On the other hand, lxml now has cssselect, which works the other way
> around: It takes a selector and returns all the elements that match that
> selector.
>
> >>> sel = CSSSelector('p.strong')
> >>> [e.text for e in sel(tree)]
> ['Some other text']
>
> So I just wanted to ask if somebody already had thought about this, or
> if somebody has any ideas in which direction to head to solve this
> problem.
>
> Maybe one could write a module, that combines cssutils and
> lxml.cssselect to match css style properties and actual elements. But
> maybe a completely different approach would be needed.
There are a couple of things you have to do here. First, you have to parse
the CSS, which only cssutils currently does. Then you have to find out which
of the rules apply to an element, which AFAICT is not currently supported at
all. You could do a brute force test: take all selectors that you find in all
CSS stylesheets in the document or in external references, and match them
against the element in question. That would be quite some overhead on every
lookup, though. On the other hand, if style lookup is more frequent than
document parsing, you can build an inverse index: run through all CSS
selectors, find the elements they match, and store the style content for each
of those elements, thus aggregating the style properties per element.
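
A rough sketch of such an inverse index, just to illustrate the idea. Everything
in it is an assumption on my side: the names build_style_index,
parse_declarations and simple_match are made up, the rules are expected as
already-parsed (selector, declarations) pairs, and the stand-in matcher only
understands "tag" and "tag.class" so that the example runs on the standard
library alone. Later rules simply overwrite earlier ones; specificity and
inline "style" attributes are ignored here.

```python
import xml.etree.ElementTree as ET


def parse_declarations(text):
    """Turn 'color: red; font-size: 16pt' into a dict of properties."""
    props = {}
    for decl in text.split(";"):
        name, sep, value = decl.partition(":")
        if sep and name.strip():
            props[name.strip()] = value.strip()
    return props


def simple_match(selector, root):
    """Hypothetical stand-in matcher: only 'tag' and 'tag.class'."""
    tag, _, cls = selector.partition(".")
    return [el for el in root.iter(tag or "*")
            if not cls or cls in el.get("class", "").split()]


def build_style_index(rules, root, match=simple_match):
    """Map each matched element to the merged properties of its rules.

    rules is a list of (selector, declarations) pairs in stylesheet order.
    """
    index = {}
    for selector, declarations in rules:
        props = parse_declarations(declarations)
        for el in match(selector, root):
            # Later rules overwrite earlier ones (no specificity handling).
            index.setdefault(el, {}).update(props)
    return index


root = ET.fromstring(
    '<body><p style="color:red;">Some text</p>'
    '<p class="strong">Some other text</p></body>')
rules = [("p", "font-size: 16pt"), ("p.strong", "font-weight: bold")]
index = build_style_index(rules, root)
el1, el2 = root
print(index[el1])
print(index[el2])
```

With lxml, the match argument could be something like
lambda selector, root: CSSSelector(selector)(root), and the rules could come
out of a real CSS parser such as cssutils.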
You could maybe implement a "cssannotate(stylesheet, tree)" function, which
would map a stylesheet onto a tree by setting (or extending) the "style"
attribute on each matching element. That would come pretty close to what
you are looking for.
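
To make that a bit more concrete, here is a minimal sketch of what such a
function might look like. It is not an existing lxml API. I am assuming that
the stylesheet arrives as a list of (selector, declarations) pairs, that an
existing inline "style" attribute should win over stylesheet rules, and that
selector specificity can be ignored. The helpers parse_declarations and
simple_match are hypothetical, and the matcher only handles "tag" and
"tag.class" so the example runs without lxml installed.

```python
import xml.etree.ElementTree as ET


def parse_declarations(text):
    """Turn 'color: red; font-size: 16pt' into a dict of properties."""
    props = {}
    for decl in text.split(";"):
        name, sep, value = decl.partition(":")
        if sep and name.strip():
            props[name.strip()] = value.strip()
    return props


def simple_match(selector, root):
    """Hypothetical stand-in matcher: only 'tag' and 'tag.class'."""
    tag, _, cls = selector.partition(".")
    return [el for el in root.iter(tag or "*")
            if not cls or cls in el.get("class", "").split()]


def cssannotate(stylesheet, tree, match=simple_match):
    """Write the stylesheet's properties into 'style' attributes in place."""
    collected = {}
    for selector, declarations in stylesheet:
        props = parse_declarations(declarations)
        for el in match(selector, tree):
            collected.setdefault(el, {}).update(props)
    for el, props in collected.items():
        # Assumption: declarations already present inline take precedence.
        props.update(parse_declarations(el.get("style", "")))
        el.set("style", "; ".join(f"{k}: {v}" for k, v in props.items()))


tree = ET.fromstring(
    '<body><p style="color:red;">Some text</p>'
    '<p class="strong">Some other text</p></body>')
stylesheet = [("p", "font-size: 16pt"), ("p.strong", "font-weight: bold")]
cssannotate(stylesheet, tree)
el1, el2 = tree
print(el1.get("style"))
print(el2.get("style"))
```

After the call, reading an element's style is just a matter of parsing its
"style" attribute, which is close to the getstyle() idea from the original
question.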
Stefan