
Inconsistent results of image caption search function and problem with international characters

Posted: 17 May 2015, 23:03
by bekateen
EDIT: Later in this thread, a problem with the display of international characters is identified as an issue separate from the one originally raised here (inconsistent image search results), so I have changed the thread title to include both problems, to help anyone following up on this in the future.

Hi Jools,

I was using the image caption search tool, and I discovered that the search overlooks some results. I ran the following tests, and here are the results I obtained:
  1. Search for "Curua":
    Results are L414 and L271. L414 has an image captioned "Rio Curua." L271 has an image captioned "Rio Curua Una." This makes sense.
  2. Search for "Rio Curua":
    Results are the same: L414 and L271. This also makes sense.
  3. and 4. Search for "Curua-Una" or search for "Rio Curua-Una":
    Results are L271 and Pseudancistrus sp. `Rio Curua-Una`. Both of these spp. have images with the caption "Rio Curua-Una." It makes sense that L414 disappeared from this search, because the L414 caption doesn't include the word "Una." But why didn't P. sp. `Rio Curua-Una` come up in the first two searches?
Pseudancistrus sp. `Rio Curua-Una` should have appeared as a result in all four searches. The cause should not be a spelling issue, because I'm the person who added the captions for these images and I used copy/paste; the two captions should be spelled exactly the same (i.e., no differences in spacing or font character, etc.). I'm not sure why the search leaves out some results that should be hits, but I thought you should know about the glitch.

Complicating this is the way the caption search handles international characters. In some cases, the search tool finds words with á and other international characters in them, even if the user types the English character (a). For example, if you run an image caption search for "Rio Tapajos" using only English characters, you get results for both "Rio Tapajos" and "Rio Tapajós" (with the o accented). But in other cases, as in the case with all the Curua and Curua-Una results, these international characters are completely overlooked.

Cheers, Eric

EDIT: I just uploaded an image of C113 in the T-position. Before choosing an image caption, I used the image caption search for "T-position" to see how the term has been used on previous photos; I obtained two results: Corydoras paleatus and Otocinclus cocama. So I added the image of C113 with the caption "Spawning T-position." Now when I rerun the image caption search, O. cocama has disappeared from the search results. That's not good. (But I can still find it if I search only for "position" rather than "T-position".)

Re: Inconsistent results of image caption search function

Posted: 11 Jun 2015, 20:50
by Jools
So, to answer the first question, "Rio Curua Una" is not the same as "Rio Curua-Una". The hyphen is not an extended (international) character, so is not substituted for anything else.

As to T-position, I think it's the difference between "T - position" and "T-position". By all means, edit the captions to ensure consistent results.
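The thread never shows the site's search code, so the following is only a minimal Python sketch under stated assumptions: captions are cleaned by trimming leading and trailing spaces (the clean-up Jools describes later in the thread), and the search is a case-insensitive substring match. `clean_caption` and `caption_search` are hypothetical names. Under those assumptions, "T - position" and "Rio Curua Una" never match a query containing a hyphen, while the case of the "T" makes no difference.

```python
def clean_caption(caption: str) -> str:
    # Trim-only clean-up: leading/trailing spaces go, interior spacing is untouched
    return caption.strip()


def caption_search(term: str, captions: list[str]) -> list[str]:
    # Hypothetical case-insensitive substring search; hyphens and spaces
    # are compared literally, so "T - position" != "T-position"
    needle = term.casefold()
    cleaned = [clean_caption(c) for c in captions]
    return [c for c in cleaned if needle in c.casefold()]


captions = [
    "  Spawning T-position",
    "Spawning T - position",
    "Rio Curua Una",
    "Rio Curua-Una",
]

print(caption_search("T-position", captions))  # ['Spawning T-position']
print(caption_search("t-position", captions))  # ['Spawning T-position']
print(caption_search("Curua-Una", captions))   # ['Rio Curua-Una']
```

Searching for "position" alone still finds both spellings, which fits what Eric saw with O. cocama.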

Cheers,

Jools

Re: Inconsistent results of image caption search function

Posted: 12 Jun 2015, 01:03
by bekateen
Dang, I just spent 40 minutes typing a careful response after testing and retesting various variations of the caption text and spelling using the caption search tool, and then I accidentally closed the window without posting or saving the entry. ~X(

Anyway, in short, I didn't mean to suggest that the hyphen was an international character. I meant that in some instances the caption search tool finds caption words with international characters (e.g., search for the caption "Rio Tapajos" and you will obtain results including "Rio Tapajós") but in other cases it doesn't find captions with international characters (e.g., search for the caption "Curua-Una" but you won't find "Curuá-Una").

And in my search for T-position, there were no spaces in the phrase "T-position" in any of the species' captions. As I mentioned before, the search term "T-position" initially found O. cocama, and then, WITHOUT ALTERING the cocama entry, the search term "T-position" stopped finding cocama after I added a photo of C113 with the same caption. The only typographical difference I observed in the caption of the cocama image was that the "T" in "T-position" was written as a lower-case "t" (I have since changed the "t" to "T"). But that shouldn't have mattered, since if you now search for either "T-position" or "t-position", you obtain the same results, and both include captions with "T-position". Is it possible that this glitch is some kind of issue related to old entries needing to be opened and resaved before they work properly? (sort of like some of the old malfunctioning

Re: Inconsistent results of image caption search function

Posted: 12 Jun 2015, 06:36
by Jools
Hi Eric,

I hate it when that happens. Sometimes I just bash out the text, submit it and then edit it into shape. That does have the disadvantage that people sometimes read it before it's complete.

There is a lot going on here, so let me split out what I think.

T-position did have a space in it, and the tests I did last night showed (to me) that this was the cause. The space has been removed, and it now works fine. I can't explain why it stopped working after you added C113. However, issues like this need to be reproducible before I can figure them out. I think this one is probably a data error and not a bug.

The image caption admin page does clean up a submitted (or edited) caption, but only by removing leading or trailing spaces. I don't think it does anything more sophisticated akin to the

Re: Inconsistent results of image caption search function

Posted: 12 Jun 2015, 06:42
by Jools
However, there is an issue with international characters. I am recording for now that when I search for "Curuá-Una", the search string changes to "Curu??-Una" and I get no results. Let's call this a search containing extended characters.
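One common way to get exactly two question marks per accented letter is for the UTF-8 bytes of the text to be misread as single-byte Latin-1 characters and then forced into ASCII, where each unrepresentable character becomes "?". The thread doesn't say which layer of the site does this, so this Python sketch (with a hypothetical `mangle` function) only reproduces the symptom, not the site's actual code path.

```python
def mangle(text: str) -> str:
    # Assumption: UTF-8 bytes are misread as Latin-1 (one char per byte),
    # then encoded to ASCII with "?" replacing anything non-ASCII
    misread = text.encode("utf-8").decode("latin-1")
    return misread.encode("ascii", errors="replace").decode("ascii")


print(mangle("Curuá-Una"))    # Curu??-Una
print(mangle("Rio Tapajós"))  # Rio Tapaj??s
```

Because "á" and "ó" are each two bytes in UTF-8, each becomes "??", and the original letter cannot be recovered from the result.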

When I search for "Tapajos", I get results with image captions that contain "Tapajos" and "Tapajós" but nowhere near all the images containing "Tapajós" (verified by searching for "Tapaj"). Let's call this a search with results expected to contain extended characters.

The issue will be to do with how these are stored and this can indeed be altered by admin edits of the caption. I will look into that now.
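Why the "Tapajos" search finds only some of the "Tapajós" captions can be sketched with accent-insensitive matching over captions stored in mixed forms. This Python example assumes the search folds accents (many database collations do something similar, but the thread doesn't say what the site uses) and that some captions are stored as an HTML entity or as "??"; the entity form is an assumption suggested by later posts. `fold` and `accent_insensitive_search` are hypothetical names.

```python
import unicodedata


def fold(text: str) -> str:
    # Decompose (ó -> o + combining accent), drop combining marks, ignore case
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_insensitive_search(term: str, captions: list[str]) -> list[str]:
    needle = fold(term)
    return [c for c in captions if needle in fold(c)]


stored = [
    "Rio Tapajos",
    "Rio Tapajós",         # stored as a real character
    "Rio Tapaj&oacute;s",  # assumed entity-encoded form
    "Rio Tapaj??s",        # form left by the old caption-edit bug
]

print(accent_insensitive_search("Tapajos", stored))  # ['Rio Tapajos', 'Rio Tapajós']
print(accent_insensitive_search("Tapaj", stored))    # all four captions
```

Only captions holding the real accented letter fold to "tapajos"; the entity and "??" forms are found only by a shorter query such as "Tapaj", which matches Jools's check.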

Jools

Re: Inconsistent results of image caption search function

Posted: 12 Jun 2015, 06:49
by Jools
This is a real mess! Image captions with extended characters can be stored on the database in one of three ways. It is important to understand that how they are stored and how they are displayed can be different.

1) As Rio Tapajós which displays as Rio Tapajós.
2) As Rio Tapaj??s which displays as Rio Tapaj??s
3) As Rio Tapajós which displays as Rio Tapajós. This will be as a result of image editing.

1) Happens when (I think, not checked) the image is entered for the first time via add image.
2) Happened with old bug in edit image caption
3) Happens when (I think, not checked) the image caption is edited.

Clearly 2 is incorrect, but I don't know for sure which out of 1 or 3 is correct - looking into that now.
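Jools's point is that forms (1) and (3) display identically but are stored differently. Assuming, as the later mention of "&oacute" and of decoding HTML entities suggests but does not confirm, that one of them holds an HTML entity such as "&oacute;", a hypothetical check for which form a stored caption is in might look like this Python sketch. `classify` and `repair` are illustrative names, not site code.

```python
import html
import re

# Named or numeric HTML character references, e.g. &oacute; or &#243;
ENTITY = re.compile(r"&(#\d+|#x[0-9a-fA-F]+|[A-Za-z]+);")


def classify(stored: str) -> str:
    if "??" in stored:
        return "corrupted"        # form 2: the original letter is lost
    if ENTITY.search(stored):
        return "entity-encoded"   # assumed form: e.g. &oacute;
    return "plain text"           # real characters stored directly


def repair(stored: str) -> str:
    # Entities decode back to real characters; "??" cannot be recovered
    if classify(stored) == "corrupted":
        raise ValueError(f"cannot recover original text from {stored!r}")
    return html.unescape(stored)


print(classify("Rio Tapaj??s"))        # corrupted
print(repair("Rio Tapaj&oacute;s"))    # Rio Tapajós
```

Entity-encoded captions can be repaired automatically, while "??" captions have to be retyped by hand.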

Jools

Re: Inconsistent results of image caption search function

Posted: 12 Jun 2015, 07:10
by bekateen
Thanks Jools, that makes a lot of sense. And by the way, I too have observed the conversion of "á" to "??" when I've tried to type the "á" character in new image captions, and I've had the same problem with the "ó" character as well.

I also had an experience once where I used the "You can add, move, edit or re-order 22 images below" link on a CLOG page of one species in order to modify a caption on one image which was written in only English letters, and when I finished and hit the "Do it!" button to accept the change, this action simultaneously (without any action on my part) converted a different caption belonging to a different image on that species' CLOG page from a word with an international character (it was either an á or ó, I can't recall which) to the "??" representation. I was not able to convert it back, so I changed it to the English character (a or o). Definitely, there is some quirk in how the caption entry function and the caption search function are handling these characters... (in addition to the differences you and I observed regarding the incomplete search results for the Curua and Tapaj searches).
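Eric's report is consistent with a page-level form that resubmits every caption, not only the edited one, through a lossy save path. Neither assumption is confirmed in the thread; this Python sketch (hypothetical `lossy_save` and `save_page`, with made-up image ids and captions) just shows how such a design would corrupt a caption nobody touched.

```python
def lossy_save(text: str) -> str:
    # Assumed lossy path: UTF-8 bytes misread as Latin-1, then forced to ASCII,
    # so each non-ASCII byte is replaced with "?"
    misread = text.encode("utf-8").decode("latin-1")
    return misread.encode("ascii", errors="replace").decode("ascii")


def save_page(captions: dict[int, str], edits: dict[int, str]) -> dict[int, str]:
    # Hypothetical handler: the whole page is resubmitted, so every caption,
    # edited or not, goes through lossy_save
    merged = {**captions, **edits}
    return {image_id: lossy_save(text) for image_id, text in merged.items()}


page = {1: "Rio Xingu", 2: "Rio Tapajós"}
print(save_page(page, {1: "Rio Xingu, adult"}))
# {1: 'Rio Xingu, adult', 2: 'Rio Tapaj??s'}
```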

Cheers, Eric

Re: Inconsistent results of image caption search function

Posted: 12 Jun 2015, 07:24
by bekateen
Jools wrote: This is a real mess! Image captions with extended characters can be stored on the database in one of three ways. It is important to understand that how they are stored and how they are displayed can be different.

1) As Rio Tapajós which displays as Rio Tapajós.
2) As Rio Tapaj??s which displays as Rio Tapaj??s
3) As Rio Tapajós which displays as Rio Tapajós. This will be as a result of image editing.

1) Happens when (I think, not checked) the image is entered for the first time via add image.
2) Happened with old bug in edit image caption
3) Happens when (I think, not checked) the image caption is edited.

Clearly 2 is incorrect, but I don't know for sure which out of 1 or 3 is correct - looking into that now.
Honestly, based on what I've experienced so far, I don't think either (1) or (3) is going to be the correct explanation, or at least a complete one; read on:

When I have tried to create new captions for new images on CLOG pages, I have tried two methods to get international characters to appear:
  1. Before typing a new caption, I open another browser window and search up a caption for a preexisting image which has the same international character; I copy that international character and then paste that into the new caption for my new image. When I hit the Do it! button, the international character is replaced with the ?? string.
  2. Before typing a new caption, I open another browser window and Google search how to write international letters in HTML code; I find a webpage that displays the proper code (e.g., for ó the code is "&oacute") and then I type that code in place of the desired letter in the new caption. When I hit the Do it! button, the caption displays the HTML code (&oacute) instead of the desired international character.
After creating my new captions, I always go back and check my work. When I observe the erroneous strings (?? or &oacute), I go back and edit them trying whichever of the above techniques I didn't use the first time. And in every case, no matter which I tried first and which I tried second during the correction, I still get the same erroneous strings (?? or &oacute). And so, in both of these cases, the only way I can find to "fix" the text is to go back and edit the caption again, this time replacing the bad string (?? or &oacute) with the proper English character (in this example, "o")... and then I give up trying to get the international character.

I was wondering if perhaps the people who have successfully created captions with international characters were using computers set up to use international characters instead of plain English characters. That was just a guess, and it's probably wrong, but I couldn't figure out why the caption entry function wasn't accepting my international characters.

Cheers, Eric

Re: Inconsistent results of image caption search function

Posted: 12 Jun 2015, 07:25
by Jools
For my own benefit (because I need to go to work): best practice is "decode HTML entities on INSERT and encode them on output". Decoding on insert is fine. Encoding on output, however, I need to think about. I must already be doing this with the Cat-eLog pages' data, though, so the first step is to check what I do there.
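Jools's rule of thumb can be sketched in Python with the standard `html` module: `html.unescape` on insert so the database holds real characters, and `html.escape` on output so only HTML-special characters are encoded. This assumes the database column and the page both use UTF-8; `on_insert` and `on_output` are hypothetical names, and the `.strip()` mirrors the trim-only clean-up mentioned earlier in the thread.

```python
import html


def on_insert(raw_caption: str) -> str:
    # Decode entities so the database stores real characters (&oacute; -> ó)
    return html.unescape(raw_caption).strip()


def on_output(stored_caption: str) -> str:
    # Encode only &, <, >, and quotes; accented letters pass through unchanged
    return html.escape(stored_caption)


stored = on_insert("Rio Tapaj&oacute;s ")
print(stored)                                  # Rio Tapajós
print(on_output(stored))                       # Rio Tapajós
print(on_output(on_insert("Males & female")))  # Males &amp; female
```

Decoding on insert also covers Eric's second method: a typed entity is stored as the real letter instead of being escaped and displayed literally.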

Jools

Re: Inconsistent results of image caption search function

Posted: 12 Jun 2015, 07:32
by bekateen
P.S. In recent history, the only new images uploaded, which have caught my eye because they had captions with functioning international characters, have been uploaded by @Racoll.

e.g., http://www.planetcatfish.com/common/ima ... e_id=19558

Perhaps Racoll is the key. ;-)

Enjoy work today.

Re: Inconsistent results of image caption search function and problem with international characters

Posted: 04 May 2023, 18:56
by Jools
I fixed this a while back. If you find any "dodgy text" where international characters should be present (e.g. Rio Tapaj??s), then either

a) Let me know, post what it is here, OR
b) Use the admin features to edit them - they all should work correctly now.

Cheers,

Jools