[kupu-dev] Kupu says links are bad when they aren't

sisi sisi at foei.org
Thu Apr 12 14:33:16 CEST 2007


Hi all,

We're trying to migrate our content from a flat HTML site into a Plone 
site, and while using Kupu's relative-path-to-UID tool we've come up 
against some strange problems.

Our Plone site setup is:
Zope 2.9.6-final, Plone 2.5.2, Python 2.4.4, Linux, Kupu 1.4 (svn, trunk 
- Revision: 41990)

All files have been dropped in using both FTP and WebDAV (separate 
experiments).

The link checker in Kupu erroneously flags some relative, site-internal 
links as bad. The flagged links have some things in common:

. they involve GIFs (and sometimes JPEGs, we think)

. the link is often a combination of anchor and img, like:
    <a href="..."><img src="../../images/something.gif"></a>

. after opening the page in Kupu and saving it without making any 
changes, these links become:
    <a><img src="../../images/something.gif"></a> (the href attribute 
has silently been removed)
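
To make the pattern above concrete, here is a rough sketch of how the 
relative href and src values on a page could be checked independently of 
Kupu, against a plain list of paths known to exist. It is standalone 
Python run outside Zope, not Kupu's own checker; the names 
LinkCollector, is_relative, resolve and check_page are made up for 
illustration, and the rule for what counts as "relative" is an 
assumption.

```python
from html.parser import HTMLParser
import posixpath


class LinkCollector(HTMLParser):
    """Collect (tag, attribute, value) for every href and src in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append((tag, name, value))


def is_relative(url):
    # Assumption: anything with a scheme, a leading slash, a mailto: prefix,
    # or that is only a fragment is not treated as a relative path.
    return not (url.startswith(("/", "#", "mailto:")) or "://" in url)


def resolve(page_path, url):
    """Resolve a relative URL against the folder that contains page_path."""
    path = url.split("#", 1)[0].split("?", 1)[0]
    folder = posixpath.dirname(page_path)
    return posixpath.normpath(posixpath.join(folder, path))


def check_page(page_path, html, known_paths):
    """Return (url, resolved_path, exists) for each relative link in html."""
    parser = LinkCollector()
    parser.feed(html)
    results = []
    for _tag, _attr, url in parser.links:
        if is_relative(url):
            target = resolve(page_path, url)
            results.append((url, target, target in known_paths))
    return results
```

Feeding it a page path, the page source and the set of paths from the 
original flat site gives a second opinion on which of the links Kupu 
flags really point at nothing.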

So Kupu flags these links as bad even though the targets exist and the 
paths sit in an ordinary href or src attribute. Yet as soon as we have 
saved a page in Kupu and rerun the Kupu links script, the links on that 
page get UID'ed and are no longer marked as bad. It seems that one small 
element that Kupu does not like on a page causes it to ignore all the 
relative paths on that page.
Since we have close to 4500 pages, we cannot open and save every page in 
our migrated content and then rerun the links script. At least, we'd 
rather not!

Other file types (nothing specific) are also involved, but not as 
frequently. Furthermore, because Kupu believes these links are bad, it 
will not convert them to UIDs.

One point of note: for some unknown reason, the content type registry 
was empty when we first loaded the files. This peculiarity has been 
noted by several people on the net, and is easily fixed by uninstalling 
and reinstalling the ATContentTypes product. After fixing this, the 
problem went away for many of the PDFs, for instance, but it still 
persists (even after repopulating the folders via FTP or WebDAV) for 
many GIFs (and still for some JPEGs, PDFs and normal href links).

If anyone has any ideas about what might be causing this problem, please 
send them on. We think there are 3900 pages with bad links on them, and 
each page has several bad links. One thing that is throwing us off the 
scent is that some pages have a mixture of resolved UIDs and relative 
paths after running the scripts.
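
As a rough way of listing the pages that end up in that mixed state, 
something like the sketch below could be run over exported page source. 
It is standalone Python, not part of Kupu; it assumes that converted 
links contain the marker "resolveuid/", and the names classify and 
mixed_pages are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: links that were converted to UIDs contain this marker.
UID_MARKER = "resolveuid/"


class UrlCollector(HTMLParser):
    """Collect every href and src value found in a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def classify(html):
    """Count UID-style, relative and other links in one page."""
    parser = UrlCollector()
    parser.feed(html)
    counts = {"uid": 0, "relative": 0, "other": 0}
    for url in parser.urls:
        if UID_MARKER in url:
            counts["uid"] += 1
        elif "://" in url or url.startswith(("/", "#", "mailto:")):
            counts["other"] += 1
        else:
            counts["relative"] += 1
    return counts


def mixed_pages(pages):
    """Return the ids of pages holding both UID links and relative paths.

    pages maps a page id to that page's HTML source.
    """
    mixed = []
    for page_id, html in pages.items():
        counts = classify(html)
        if counts["uid"] and counts["relative"]:
            mixed.append(page_id)
    return sorted(mixed)
```

Comparing the relative links left on the mixed pages with those on pages 
that converted cleanly might show what the unconverted ones have in 
common.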

One question we have is:
Where is the code that identifies links for checking? Is it in Kupu 
itself, or does Kupu call a function somewhere in Plone? We'd like to 
look at the code and make some progress that way, but we can't find it, 
even using all our ninja powers of grep and find :-)

Cheers,
sisi

-- 
# sisi nutt # extranet coordinator
# Friends of the Earth International
# PO Box 19199 # 1000 GD Amsterdam # The Netherlands
# Tel 31 20 6221369  # Fax 31 20 6392181  # http://www.foei.org
# email sisi at foei.org # skype foei_sisi

