[kupu-dev] nbsp tags disappear, breakin' paragraphs -> supplemental
bernhard
g.bernhard at akbild.ac.at
Mon Feb 12 10:42:30 CET 2007
Hello again!
supplemental:
xml.replace with a string pattern replaces only the *first* occurrence of the match...
this.escapeEntities = function(xml) {
    xml = xml.split('\xa0').join('&nbsp;');
    return xml;
};
is maybe better; now it rocks.
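
To illustrate the difference, here is a small standalone sketch (not Kupu code; the sample string and the helper names replaceFirst, replaceAllSplit and replaceAllRegex are made up for the demonstration). With a string pattern, replace stops after the first match, while split/join or a regex with the g flag handle every match:

```javascript
// Sample input containing two non-breaking spaces (U+00A0).
var sample = 'a\xa0b\xa0c';

// String pattern: only the first match is replaced.
function replaceFirst(xml) {
    return xml.replace('\xa0', '&nbsp;');
}

// split/join: every match is replaced.
function replaceAllSplit(xml) {
    return xml.split('\xa0').join('&nbsp;');
}

// Regex with the g flag: also replaces every match.
function replaceAllRegex(xml) {
    return xml.replace(/\xa0/g, '&nbsp;');
}

console.log(replaceFirst(sample) === 'a&nbsp;b\xa0c');       // true: second nbsp untouched
console.log(replaceAllSplit(sample) === 'a&nbsp;b&nbsp;c');  // true
console.log(replaceAllRegex(sample) === 'a&nbsp;b&nbsp;c');  // true
```

The regex form avoids building an intermediate array, but either variant should do for escapeEntities.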
Gogo.
On 12.02.2007, at 10:11, bernhard wrote:
> Dear Duncan!
>
> Now I _do_ owe you one...
> Kupu now deals with 'nbsp' entities exactly as it is supposed to.
> I'd suggest making
>
> this.escapeEntities = function(xml) {
>     xml = xml.replace('\xa0', '&nbsp;');
>     return xml;
> };
>
> the default behaviour; indexing should work perfectly well with it.
> If there is some service such as 'fleurop' or 'www.bierher.at' on
> your side of the screen, I am really willing to send you a
> 'surprise'...
>
> Who said Mondays are bad?
>
> Very Best Regards,
> Gogo.
> <g.bernhard at akbild.ac.at>
>
>
> On 12.02.2007, at 09:21, Duncan Booth wrote:
>
>> bernhard <g.bernhard at akbild.ac.at> wrote:
>>
>>> Entering &nbsp; in html view and switching back to 'normal' kupu
>>> view, I found no &nbsp; any more; I looked into the DOM using
>>> Firebug - no &nbsp;.
>>>
>>
>> It doesn't appear as &nbsp; in the DOM. It will appear as character
>> #xA0 which, since it displays as a space, is quite hard to see. I
>> found I had to do 'copy html' from Firebug and paste into an editor
>> set to display hex codes for non-ASCII characters before I could
>> verify that the non-break space was still there.
>>
>> There is a trivial 'fix' which should get you up and running: edit
>> common/kupueditor.js, find this.escapeEntities, remove the 'return
>> xml;' line and uncomment the 4-line return statement. That will
>> entitise everything.
>>
>> this.escapeEntities = function(xml) {
>>     // XXX: temporarily disabled
>>     return xml;
>>     // Escape non-ascii characters as entities.
>>     // return xml.replace(/[^\r\n -\177]/g,
>>     //     function(c) {
>>     //         return '&#'+c.charCodeAt(0)+';';
>>     //     });
>> };
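
For reference, a standalone sketch of what the uncommented version would do. The function name escapeNonAscii is made up here, and \x7f stands in for the octal \177 used in the original. Every character outside CR, LF and the range space..DEL becomes a decimal numeric character reference:

```javascript
// Hypothetical standalone version of the commented-out return statement.
function escapeNonAscii(xml) {
    // Keep CR, LF and the range space..DEL (0x20-0x7f); turn everything
    // else into a decimal numeric character reference.
    return xml.replace(/[^\r\n -\x7f]/g, function (c) {
        return '&#' + c.charCodeAt(0) + ';';
    });
}

console.log(escapeNonAscii('x\xe9y'));  // x&#233;y
console.log(escapeNonAscii('a\xa0b'));  // a&#160;b
```

Note that this also rewrites accented letters such as é, not just the non-breaking space.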
>>
>> Unfortunately, doing that will break text indexing. An alternative
>> 'fix' would be:
>>
>> this.escapeEntities = function(xml) {
>>     // XXX: temporarily disabled
>>     xml = xml.replace('\xa0', '&nbsp;');
>>     return xml;
>> };
>>
>> which just escapes the non-breaking space.
>>
>>>
>>> If you want to have a serious i18n-aware catalog you will have to
>>> use TextIndexNG - and TextIndexNG knows how to deal with html
>>> entities (hopefully); I would like to know if it is hard to locate
>>> the code where the entities are dropped - is it Zope, Plone or
>>> Kupu? As long as it is not Python we can definitely handle it :-P
>>> Otherwise people are unable to edit the image modules I put in.
>>>
>> Kupu doesn't entitise what it saves because Plone (either currently
>> or in the past, but I think it is still a problem) doesn't handle
>> entities properly. When you save a document, Plone uses
>> PortalTransforms to convert the html to plain text before passing
>> the plain text to TextIndexNG. The transform looks like this (Plone
>> 2.5.2 version):
>>
>> from Products.PortalTransforms.libtransforms.retransform import retransform
>>
>> class html_to_text(retransform):
>>     inputs = ('text/html',)
>>     output = 'text/plain'
>>
>> def register():
>>     # XXX convert entites with htmlentitydefs.name2codepoint ?
>>     return html_to_text("html_to_text",
>>                         ('<script [^>]>.*</script>(?im)', ' '),
>>                         ('<style [^>]>.*</style>(?im)', ' '),
>>                         ('<head [^>]>.*</head>(?im)', ' '),
>>                         ('(?im)</?(font|em|i|strong|b)(?=\W)[^>]*>', ''),
>>                         ('<[^>]*>(?i)(?m)', ' '),
>>                         )
>>
>> Note the XXX comment, which has been there for donkey's years. So
>> what happens is that x&eacute;y (or x&#xe9;y, which is what kupu
>> used to convert it to) is left unchanged and TextIndexNG indexes the
>> separate words x, eacute, y (or x, xe9, y).
>>
>> I guess maybe I should attempt to do the conversion for some things
>> like nbsp and leave accented letters unchanged.
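
A rough model of the effect described above, written in JavaScript only for illustration. The names htmlToText and words are made up, the tag stripping mimics just the last pattern of the transform, and the word splitting is an assumption, not the actual TextIndexNG splitter. It shows how an unconverted entity ends up as a word of its own:

```javascript
// Simplified stand-in for the html-to-text transform: replace every
// tag with a space and leave entities untouched.
function htmlToText(html) {
    return html.replace(/<[^>]*>/g, ' ');
}

// Assumed word splitting: any run of ASCII letters or digits is a word.
function words(text) {
    return text.match(/[A-Za-z0-9]+/g) || [];
}

console.log(words(htmlToText('<p>x&eacute;y</p>')));  // [ 'x', 'eacute', 'y' ]
console.log(words(htmlToText('<p>x&#xe9;y</p>')));    // [ 'x', 'xe9', 'y' ]
console.log(words(htmlToText('<p>a&nbsp;b</p>')));    // [ 'a', 'nbsp', 'b' ]
```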
>>
>> _______________________________________________
>> kupu-dev mailing list
>> kupu-dev at codespeak.net
>> http://codespeak.net/mailman/listinfo/kupu-dev
>