[kupu-dev] nbsp tags disappear, breakin' paragraphs
bernhard
g.bernhard at akbild.ac.at
Mon Feb 12 10:11:35 CET 2007
Dear Duncan!
Now I _do_ owe you one...
Kupu is now dealing with 'nbsp' entities exactly as it is supposed to.
I'd suggest making
    this.escapeEntities = function(xml) {
        xml = xml.replace('\xa0', '&nbsp;');
        return xml;
    };
the default behaviour; indexing should work perfectly well with it. If
there is some service such as 'fleurop' or 'www.bierher.at' on your side
of the screen, I am really willing to send you a 'surprise'...
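
One caveat on the replace call above, assuming it is given a plain string
pattern as quoted: JavaScript's String.replace then only replaces the first
match, so a document with several non-breaking spaces would keep all but
the first as raw #xA0 characters. A minimal sketch using a global regular
expression instead (escapeNbsp is a made-up name for illustration, not a
Kupu function):

```javascript
// Sketch: escape every non-breaking space (U+00A0) as the named entity.
// A string pattern would stop after the first match, so a regular
// expression with the "g" flag is used to cover all occurrences.
function escapeNbsp(xml) {
    return xml.replace(/\xa0/g, '&nbsp;');
}

var sample = 'a\xa0b\xa0c';
console.log(escapeNbsp(sample)); // a&nbsp;b&nbsp;c
```

The same one-line change should drop into this.escapeEntities, but I have
only tried it standalone, not inside Kupu.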
Who said Mondays are bad?
Very Best Regards,
Gogo.
<g.bernhard at akbild.ac.at>
On 12.02.2007, at 09:21, Duncan Booth wrote:
> bernhard <g.bernhard at akbild.ac.at> wrote:
>
>> Entering &nbsp; in html view and switching back to 'normal' kupu view
>> I found no &nbsp; any more; I looked into the DOM using Firebug - no
>> &nbsp;
>
> It doesn't appear as &nbsp; in the DOM. It will appear as character
> #xA0 which, since it displays as a space, is quite hard to see. I found
> I had to do 'copy html' from Firebug and paste into an editor set to
> display hex codes for non-ascii characters before I could verify that
> the non-break space was still there.
>
> There is a trivial 'fix' which should get you up and running: edit
> common/kupueditor.js, find this.escapeEntities, remove the 'return xml;'
> line and uncomment the 4 line return statement. That will entitise
> everything.
>
>     this.escapeEntities = function(xml) {
>         // XXX: temporarily disabled
>         return xml;
>         // Escape non-ascii characters as entities.
>         // return xml.replace(/[^\r\n -\177]/g,
>         //     function(c) {
>         //         return '&#'+c.charCodeAt(0)+';';
>         //     });
>     };
>
> Unfortunately, doing that will break text indexing. An alternative
> 'fix' would be:
>
>     this.escapeEntities = function(xml) {
>         // XXX: temporarily disabled
>         xml = xml.replace('\xa0', '&nbsp;');
>         return xml;
>     };
>
> which just escapes the non break space.
>
>>
>> If you want to have a serious i18n aware catalog you will have to use
>> TextIndexNG - and TextIndexNG knows how to deal with html entities
>> (hopefully); I would like to know if it is hard to locate the code
>> where the entities are dropped - is it zope, plone or kupu? As long
>> as it is not Python we definitively can handle :-P People are unable
>> to edit the image modules i put in otherwise.
>>
> Kupu doesn't entitise what it saves because Plone (either currently or
> in the past, but I think it is still a problem) doesn't handle entities
> properly. When you save a document Plone uses PortalTransforms to
> convert the html to plain text before passing the plain text to
> TextIndexNG. The transform looks like (Plone 2.5.2 version):
>
> from Products.PortalTransforms.libtransforms.retransform import retransform
>
> class html_to_text(retransform):
>     inputs = ('text/html',)
>     output = 'text/plain'
>
> def register():
>     # XXX convert entites with htmlentitydefs.name2codepoint ?
>     return html_to_text("html_to_text",
>                         ('<script [^>]>.*</script>(?im)', ' '),
>                         ('<style [^>]>.*</style>(?im)', ' '),
>                         ('<head [^>]>.*</head>(?im)', ' '),
>                         ('(?im)</?(font|em|i|strong|b)(?=\W)[^>]*>', ''),
>                         ('<[^>]*>(?i)(?m)', ' '),
>                         )
>
> Note the XXX comment which has been there for donkey's years. So what
> happens is that x&eacute;y (or x&#xe9;y which is what kupu used to
> convert it to) is left unchanged and TextIndexNG indexes separate words
> x, eacute, y (or x, xe9, y).
>
> I guess maybe I should attempt to do the conversion for some things
> like nbsp and leave accented letters unchanged.
>
> _______________________________________________
> kupu-dev mailing list
> kupu-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/kupu-dev
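
To make the indexing problem quoted above concrete, here is a rough sketch
of why an entity that survives the html-to-text step ends up as separate
words. Both functions are hypothetical stand-ins: stripTags only mimics the
last rule of the html_to_text transform, and tokenize is an assumed
splitter on non-alphanumeric characters, not TextIndexNG's real one.

```javascript
// Hypothetical stand-in for the final html_to_text rule:
// every tag is replaced by a space, entities are left untouched.
function stripTags(html) {
    return html.replace(/<[^>]*>/g, ' ');
}

// Assumed tokenizer: split on anything that is not an ASCII letter or
// digit. This only approximates what a text index might do.
function tokenize(text) {
    return text.split(/[^A-Za-z0-9]+/).filter(function (word) {
        return word.length > 0;
    });
}

var named = tokenize(stripTags('<p>x&eacute;y</p>'));
var numeric = tokenize(stripTags('<p>x&#xe9;y</p>'));
console.log(named.join(', '));   // x, eacute, y
console.log(numeric.join(', ')); // x, xe9, y
```

Under these assumptions the '&' and ';' act as word breaks, which is why
one word with an accented letter turns into three index entries.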