[Kss-devel] BeautifulSoup usage in kss.core
Balazs Ree
ree at ree.hu
Thu Mar 20 09:58:30 CET 2008
On Thu, 20 Mar 2008 08:56:47 +0100, Wichert Akkerman wrote:
> I became aware that kss.core uses BeautifulSoup. Looking at the code it
> is used to transform any HTML and XML parameters added to a KSS command.
> I'm not sure I like that behaviour: if I pass in XML I would expect that
> to be output as-is without any changes. I'm fine with an exception
> being raised if my XML is invalid, but it should never be transformed.
> Likewise for HTML: BeautifulSoup may make changes to my html that I do
> not want. I feel that KSS should trust people using its API to pass in
> valid data. If people put garbage in you will always get garbage going
> out as well, even if you use BeautifulSoup in an attempt to clean things
> up.
It seems you are of the same opinion as Jeroen. He raised this question
in connection with the kss.base rewrite. As I just responded to his post,
let me quote my full reply on this list as well.
Besides what is in my reply, there were several other reasons why we
ended up with the current solution. It is the result of long work, and it
seems to work reasonably well in practice, with no complaints so far.
As you can see below I am not opposing the improvement of the currently
applied method but I insist on dealing with it in a separate step from
the kss.base porting.
To further explore the details of the problem, it would also be nice if
you could provide some concrete examples to show where the current
approach fails for you, possibly in the form of working code so we can
test it with different browsers with and without the sanitization.
Best wishes,
-----------------------------------------------------
On Fri, 14 Mar 2008 21:30:27 +0100, Jeroen Vloothuis wrote:
>> Balazs Ree wrote:
>>
>>> So the question that is raised: I offered some reasons above why I
>>> find this functionality important and to be kept. Do we have any
>>> reason to drop this functionality in the next release (based on
>>> kss.base)?
>>>
> According to you two things are handled by BeautifulSoup. One is
> validation and the other is automagically "fixing" data. I am against
> anything which tries to fix my HTML without my intervention, as you
> might have guessed from me calling it automagic. This could lead to all
> sorts of weird problems which would take an app developer some serious
> time to figure out. I guess if they were anything like me they would be
> very displeased to find out that KSS was doing more than just sending
> the data over the wire.
>
> The other thing you mentioned is checking. Checking implies that we know
> what is correct. It does seem a bit pretentious though to know what is
> valid. If I send some partial paragraph (<p> without an end) because a
> wysiwyg editor stored it like that should KSS explode on me? Even though
> for the application I am developing it would be fine? How do we handle
> new / upcoming tags or entities? Do we know in advance how a tag for a
> new HTML 6 (speculation) tag must be closed?
I hear your opinion, but the reasons you offer above do not convince me.
Note that in the end this html content will be inserted into the DOM by
the browser. All browsers will check and sanitize this content anyway, in
a way that we cannot control. So in my view the "exact control" of html
you pursue above is difficult to interpret.
By doing the checking and sanitization on the server side, as we do
currently, we have the advantage that
- we get a (possibly more detailed) server side error message for
syntactically wrong content, instead of a client side error,
- and we can sanitize the html in a well defined, controlled way instead
of letting each browser version do it differently, in undefined and
uncontrollable ways.
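To illustrate the first point, here is a minimal sketch of a server side
check that fails early with a readable message. It is not what kss.core
does: it uses Python's standard xml.etree.ElementTree instead of
BeautifulSoup, it only tests strict XML well-formedness (so html such as
an unclosed <br> or an &nbsp; entity would be rejected), and the names
MarkupError and check_fragment are made up for this example.

```python
import xml.etree.ElementTree as ET


class MarkupError(ValueError):
    """Hypothetical error type for this sketch; not part of kss.core."""


def check_fragment(fragment):
    """Return the fragment unchanged if it is well-formed XML.

    The fragment is wrapped in a dummy root element so that plain text
    or several sibling elements are accepted. Nothing is rewritten:
    the function either returns its input or raises MarkupError.
    """
    wrapped = "<root>%s</root>" % fragment
    try:
        ET.fromstring(wrapped)
    except ET.ParseError as exc:
        # The parser message carries a line and column, counted in the
        # wrapped string (so the column on line 1 is shifted by the
        # length of "<root>").
        raise MarkupError("malformed markup: %s" % exc) from exc
    return fragment
```

With this, check_fragment("<p>ok</p>") returns its input, while
check_fragment("<p>unclosed") raises MarkupError on the server before
anything is sent to the browser.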
I am not saying we are currently doing this in the best possible way, but
this is how it works now.
> This also brings me to question the actual necessity of this validation.
> Plone and most other web tools / frameworks seem to be perfectly fine
> with not sanitizing or checking the generated HTML. Why would KSS be
> different?
In my opinion, Zope page templates (zpt) do validity checking and some
sanitization. KSS is currently doing this as well. Whether they _could_
be fine without it is a different question. This brings me to the other
issue, which is related to the procedural question of porting kss.core to
kss.base.
We are always open to fixes and improvements to the code. However, no
one, including you, has so far suggested changing the current way of html
sanitization, which is unchanged from the initial version of kss. This
indicates that the current handling works correctly. Changing it may
still be a valid question in the future, even if it opens other issues.
But this should not happen _together_ with the current porting effort;
instead it should follow later in a separate step. It is an important
principle that the porting (that means changing the entire server side
stack of kss) should be unobservable to all applications that use kss.
> Godefroid Chapelle wrote:
>> Definitely no reason. I agree that html sanitizing and unicode nicety
>> need to be kept.
>>
>> And that they should go to kss.base.
However, as I think the current argument will not lead to a resolution, I
propose the following:
1. The sanitization currently done by BeautifulSoup should be kept
exactly as it is now.
2. It should go to kss.base, as it belongs there.
3. We should implement a switch in kss.base that can be used to disable
this server side checking. The default value should be the current
behaviour.
4. If developers like Jeroen indeed feel the importance of not having the
sanitization, they can make the effort to change the switch manually
after installing kss. They also become aware that they need to deal with
all possible problems caused by this.
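Purely to illustrate point 3, and not as a proposal for the actual
mechanism, such a switch could look roughly like the sketch below. All
names here (CHECK_MARKUP, prepare_html_param, _check) are hypothetical,
and the check itself is a standard library stand-in for the
BeautifulSoup based step, testing XML well-formedness only.

```python
import xml.etree.ElementTree as ET

# Hypothetical global switch. True keeps the current behaviour
# (server side checking is on); a developer would set it to False
# to opt out.
CHECK_MARKUP = True


def _check(markup):
    """Stand-in for the BeautifulSoup step: raise ET.ParseError if the
    fragment is not well-formed, otherwise return it unchanged."""
    ET.fromstring("<root>%s</root>" % markup)
    return markup


def prepare_html_param(markup, check=None):
    """Return the markup to be sent to the client.

    check=None defers to the global CHECK_MARKUP switch; passing
    True or False overrides the switch for a single call.
    """
    if check is None:
        check = CHECK_MARKUP
    if not check:
        # Opted out: the data goes over the wire as-is.
        return markup
    return _check(markup)
```

By default prepare_html_param("<p>unclosed") raises an error, while
prepare_html_param("<p>unclosed", check=False) returns the string
untouched and leaves any repair to the browser.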
The advantage of this strategy is that instead of arguing about it, we
give developers the freedom to choose. We can also see whether the
majority of developers actually toggle the switch, and base our future
handling of the issue on their experience.
I would not like to go into the details here of _how_ we should implement
this switch. We need to think about this in a separate thread, if the
proposal is acceptable.
--
Balazs Ree