
Occurrence data: better to have it and be (slightly) wrong?

Posted: 15 Dec 2015, 08:28
by Silurus
Given the spate of occurrence data (map) issues we seem to be having, I decided to focus on the first family (Akysidae) in the Cat-eLog, and found tons of problems for more than half the species there (some of which have been partially corrected). It occurred to me that the occurrence data marks the middle of the chosen river drainage, which is almost always the wrong point (if the point of the map is to mark localities where the species is known to occur). I realize that this would be the only option in the absence of coordinate data (which is true for material collected before the advent of GPS).
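Why a marker at the "middle" of a drainage so rarely sits on the water can be shown with a toy example. This is a minimal sketch, assuming "middle" means the plain average of a river's vertices (we don't know exactly how the site computes it) and using made-up coordinates for a river with a single U-shaped bend. Distances are planar, in degrees, which is only a rough approximation.

```python
import math

# Hypothetical river course as (x, y) = (lon, lat) vertices: one U-shaped bend.
RIVER = [(0.0, 0.0), (0.0, 2.0), (1.0, 3.0), (2.0, 2.0), (2.0, 0.0)]


def vertex_mean(points):
    """Average of the vertices: one naive way to define the 'middle' of a river."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamped to the segment's end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_river(p, river):
    """Shortest planar distance from p to any segment of the river."""
    return min(point_segment_distance(p, a, b) for a, b in zip(river, river[1:]))


if __name__ == "__main__":
    middle = vertex_mean(RIVER)
    print(middle, distance_to_river(middle, RIVER))
```

In this toy case the "middle" lands a full degree (roughly 111 km near the equator) from the nearest stretch of river, which is the kind of error Silurus describes.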

So, the question is whether or not it is better to have (slightly) wrong locality markers using the occurrence data option, or to not have them at all (since people get the impression that the markers represent collection localities and these are wrong 99.9% of the time).

And the use of country-level markers should be avoided. For instance, using the "Indonesian waters" option puts a marker somewhere in the southeastern tip of Sulawesi, where no known native freshwater catfish is found.

Re: Occurrence data: better to have it and be (slightly) wrong?

Posted: 16 Dec 2015, 06:13
by bekateen
This is a great question/thread to start. Since I have been reporting a lot of the recent locality discrepancies, I'll chime in with my thoughts. They will be presented in a somewhat random order (except #1), but hopefully they stimulate discussion.

Before I begin, let me answer your question: my feeling/opinion on the question is "Yes." I think it's better to provide locality data, knowing that in some cases it will be at best inadequate and at worst misleading. Why do I say that?
    1. First and foremost, for many of the fish which enter the pet trade as undescribed or unidentified, collection locality is sometimes all the information we have available to us in order to "anchor" the animals for future reference - Two fish which look similar and are from nearby waterways may belong to the same species or they may not be. Obviously, this is particularly germane to all the loricariids and corys (L #, LDA #, C #, and CW #). How else are we to respond to people who say, "I bought this fish the other day. The importer said it was from the ____ river. What is it?"

      There have been a number of corys I've had or investigated which are known only as C or CW numbers, and they look very much like described species, but they are from the wrong location (e.g., and ). And in most cases, our CLOG databases don't have that locality info (I've been working to update the CLOG pages with relevant collection locality info (when available) to make the CLOG pages for these # species more helpful). Without locality info, NOTHING (literally, nothing) exists to define these C/CW/L/LDA # organisms because they are not yet described, or attributed to previously described spp.
      • What do you do with species that are distributed over relatively wide areas or multiple waterways? Initially, you may post only one occurrence on the map, and over time you may add more occurrences to show the range of the species. But until then, you have only one pin on the map. Where are you going to place that pin? Ideally at the type locality, if known with Lat/Long data. If not, then where? In the suspected center of the range? That could be grossly inaccurate with regard to where in the range the fish actually exist (e.g., wrong microhabitat), even if in the correct waterway.
        • When somebody finds a particular fish in one segment of a river, and a different fish upstream or downstream from that locality along the same river, of course it would be nice if there were two distinct entries in the database of localities; but who can say with certainty, for EVERY pair of similar fish, whether either of them is also present slightly upstream or downstream from its original discovery site? Ideally there are two entries in the database, but by creating them we are at the same time artificially suggesting that the OTHER site is outside the range of each fish, when in fact it might not be.
          • Related to my previous point, I am very supportive of adding more narrowly prescribed entries into the database; I've been doing some of that myself lately, especially with regard to harvesting collection info from older papers. But as you point out, before the days of GPS, these localities were often general and/or vague. Case in point: I was recently updating the CLOG entries for several spp. described by Fowler in 1943 (Proceedings of the Academy of Natural Sciences of Philadelphia v. 95, pp 246, Figs. 23-25); his collection info was simply "Río Orteguasa, Florencia, Caquetá, Colombia." Well, there is already an entry for the Río Orteguasa in our database, but the map pin for this river is NOT placed on the actual river. I infer that this river's database entry was first created for , the only other species listed in the Río Orteguasa and NOT described by Fowler (1943). But even that species' Lat/Long coordinate data don't match the map marker location for the same river. My inclination would be to edit the Lat/Long data to place the map marker right in the area of Florencia where the Orteguasa begins, even if that's "incorrect," because we have no better info to go on; but I haven't changed it yet because I haven't had time to investigate why the map marker was originally placed where it is now.
          So absolutely, let's keep the locality data, and let's keep improving its accuracy at every opportunity. Personally, I have come to rely on access to locality data, knowing that it is sometimes inaccurate. But it gives me a starting point.

          A related issue that comes up in this context, one which I've mentioned indirectly before in the context of other threads (e.g., a very indirect reference to it is here: Enhancement idea: Create a page that inventories images/videos of locations and habitats), is how do we know what areas are included in a particular occurrence data point on our maps?
          Silurus wrote:And the use of country-level markers should be avoided. For instance, using the "Indonesian waters" ...
          Not only is this inaccurate, it is very vague. It would be helpful to me, and perhaps to others, if there were some kind of map which shows the geographic areas covered by the names/localities in the database. For example, how much water or land is included within the term "Indonesian waters" or "Upper Amazon"?
            Okay, there - that should give people lots of fodder to discuss.

            Cheers, Eric

            P.S., This conversation is not really complete without considering the parallel question, "What is a Species Anyway?" Myself, I am a strong believer in the species concept, and in the utility of the biological species concept specifically. I recognize its limits, but among all related concepts it is the one which provides testable hypotheses for examining moments of speciation. I suppose some would argue with me on this. Most criticisms I've read of it are not about the definition itself, but (unfortunately) about instances where humans apply the definition improperly. Some critics like to talk about populations as the only relevant "evolutionary unit," and I see and value the validity of this, but species are often larger than populations (but can also be just as small as a single population)... And thus the conundrum of lumpers and splitters. Why do I bring this up? Because locality data for a single population is often easier to define than locality data for an entire "species," if we consider the species to be made up of multiple populations, especially when those populations are distributed over wide areas as I mentioned above.

            Cheers, Eric

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 16 Dec 2015, 07:56
            by Bas Pels
            Obviously a species concept is rather important if one is to make a distribution map. Consider fish from two populations: if they are one species, they show up on the same map; if not, they don't.

            Personally, I think species is something we humans invented to help us understand the world around us. But working with populations is not easy either. Assuming a species such as Pterygoplichthys gibbiceps is found along the whole Amazon river, does that make it one single population? And does that mean it does not matter whether a fish comes from Peru or from Santarém in Brazil, that the care is the same? Regardless of the fact that the Peruvian Amazon is a few degrees colder than at Santarém?

            For aquarium practice, this does not help.

            Back to the original questions from Silurus

            I think having something is better than having nothing, and it is just as it is in science: we don't know what will be discovered tomorrow, but this is what we have now.

            And we are willing to change everything if necessary.

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 16 Dec 2015, 11:35
            by smitty
            Great, interesting points. To be honest, I am at a loss for words, but thanks for giving me a point of view to think about.

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 16 Dec 2015, 12:02
            by racoll
            I think I mentioned this before, but I think the best approach is to use museum specimen data from GBIF (Global Biodiversity Information Facility).

            Sure, it's not perfect, but it goes some way to solving a lot of the problems of the ambiguity in assigning a single dot to the distribution of an entire species. Here, each marker is a verifiable museum collection record, i.e. it represents a fish that was collected in the field with lat/long data, and whose identity can be re-evaluated.

            The main downside is that it doesn't have any info on undescribed species (well, it does, but you don't know if they are the same ones as in the CLOG), so these will have to be assigned a location manually, but this is what needs to be done anyway.

            Here are all the records:
            [Attached image: akysis.png]

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 16 Dec 2015, 15:31
            by TwoTankAmin
            I think there is another way to deal with this, if it is practical for the site to implement. It seems to me as if the issue boils down to knowing exact locations vs. different levels of generalizing. The less precise the location information, the less useful it will be. I see the solution here as having a simple, easy-to-understand method for delineating a few potential grades of accuracy. A simple letter, number or color code should suffice. For example:

            Location information is ranked from 1 highest to 4 lowest according to its degree of accuracy:
            1 = This fish is found only in this location.
            2 = This fish is found in this location as well as others (perhaps list the other known locations if they cannot be pinned on the map).
            3 = This fish is found in this river only, or in multiple rivers (perhaps list the other rivers if they cannot be pinned on the map).
            4 = This fish is found in this country, or in multiple countries (perhaps list the other countries if they cannot be pinned on the map).

            I am just thinking out loud here and there is likely a more effective system. I think the key is to work out a good basic system to quantify the exactness/reliability of the location data. This would allow for both the most exact and the most general information to be used since the reader will know exactly how accurate/precise the information is.
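TTA's ranking is simple enough to store as one small integer per occurrence. A minimal sketch, assuming a hypothetical `LocalityInfo` record (not the real CLOG schema) and one possible reading of the four grades: coordinates with no other known sites = 1, coordinates plus other sites = 2, river only = 3, country only = 4.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Accuracy(IntEnum):
    """TwoTankAmin's proposed ranking: 1 = most precise, 4 = least precise."""
    ONLY_THIS_LOCATION = 1
    THIS_AND_OTHER_LOCATIONS = 2
    RIVER_LEVEL = 3
    COUNTRY_LEVEL = 4


@dataclass
class LocalityInfo:
    """Hypothetical summary of what is known about one occurrence record."""
    has_coordinates: bool
    other_known_locations: int = 0
    river: Optional[str] = None
    country: Optional[str] = None


def grade(info: LocalityInfo) -> Accuracy:
    """Return the most precise grade that the available data supports."""
    if info.has_coordinates:
        if info.other_known_locations == 0:
            return Accuracy.ONLY_THIS_LOCATION
        return Accuracy.THIS_AND_OTHER_LOCATIONS
    if info.river is not None:
        return Accuracy.RIVER_LEVEL
    if info.country is not None:
        return Accuracy.COUNTRY_LEVEL
    raise ValueError("no usable locality information")


if __name__ == "__main__":
    print(grade(LocalityInfo(has_coordinates=False, river="Batang Hari")))
```

Because `Accuracy` is an `IntEnum`, the grade can be stored as a plain number in the database and still be compared or sorted ("show only grade 1 and 2 records").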

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 16 Dec 2015, 16:23
            by bekateen
            Bas Pels wrote:... Assuming a species such as Pterygoplichthys gibbiceps is found along the whole Amazon river, does that make it one single population? And does that mean it does not matter whether a fish comes from Peru or from Santarém in Brazil, that the care is the same? Regardless of the fact that the Peruvian Amazon is a few degrees colder than at Santarém?

            For aquarium practice, this does not help.
            Indeed. Since the biological definition of a “population” (and “species,” for that matter) has at its core the concept of gene flow among members, this is a very relevant point in terms of how much we worry about or focus on factors such as how locality data is applied to aquarium practice. Without knowledge of the extent of gene flow among localities, it seems appropriate to be very mindful of obtaining "exacting" locality data when possible in order to adapt aquarium conditions (temp, water, etc.) as closely as possible to the preferences of the fish. But when a species is widespread (like gibbiceps), it probably indicates that these animals ARE very adaptable genetically to diverse water conditions - gibbies are a great example of this, as seen by their successful introduction to so many other places around the world (even though it’s not good they have become established almost world-wide); therefore we glean that they can succeed in a variety of aquarium conditions.
            Bas Pels wrote: Back to the original questions from Silurus

            I think having something is better than having nothing, and it is just as it is in science: we don't know what will be discovered tomorrow, but this is what we have now.

            And we are willing to change everything if necessary.
            Exactly.
            racoll wrote:I think I mentioned this before, but I think the best approach is to use museum specimen data from GBIF (Global Biodiversity Information Facility).
            Thanks racoll for bringing this up. I recall you mentioning GBIF in the thread I referenced above (in my first post here). In that thread, I responded by stating that your idea probably wouldn’t work because (it was my impression that) PlanetCatfish doesn’t tend to rely heavily on outside sources to populate content because PC wouldn’t have control of the content then.

            Since I made that comment, I’ve learned that in fact PC relies heavily on external databases to populate content on CLOG pages (I suspect you already knew that :-)). So if PC could somehow harvest the GBIF data for localities and use that info to populate our CLOG maps/occurrence data, I wholeheartedly agree that this would be an improvement over the system we have now for mapping occurrences.

            I’m not sure how the GBIF database works, but I can see one shortcoming with this idea: That is, currently the PC database is set up so that people can identify a locality and then use it to “find other species” from the same area. I know that our current system is fraught with inaccuracies and generalizations, but conceptually it is a beneficial feature to be able to identify other species which happen to come from the same locality. So just as Bas Pels stated above, “I think having something is better than having nothing,... And we are willing to change everything if necessary,” I think the same principle would apply in this case too.
            racoll wrote:The main downside is that it doesn't have any info on undescribed species (well, it does, but you don't know if they are the same ones as in the CLOG), so these will have to be assigned a location manually, but this is what needs to be done anyway. (emphasis added)
            Yes. Just as we currently do with other CLOG database fields now, we would need to leave open the option for us to edit the locality/occurrence data on our site to make up for these shortcomings.
            TwoTankAmin wrote:I think there is another way to deal with this, if it is practical for the site to implement. It seems to me as if the issue boils down to knowing exact locations vs. different levels of generalizing. The less precise the location information, the less useful it will be. I see the solution here as having a simple, easy-to-understand method for delineating a few potential grades of accuracy. A simple letter, number or color code should suffice. For example:

            Location information is ranked from 1 highest to 4 lowest according to its degree of accuracy:
            1 = This fish is found only in this location.
            2 = This fish is found in this location as well as others (perhaps list the other known locations if they cannot be pinned on the map).
            3 = This fish is found in this river only, or in multiple rivers (perhaps list the other rivers if they cannot be pinned on the map).
            4 = This fish is found in this country, or in multiple countries (perhaps list the other countries if they cannot be pinned on the map).

            I am just thinking out loud here and there is likely a more effective system. I think the key is to work out a good basic system to quantify the exactness/reliability of the location data. This would allow for both the most exact and the most general information to be used since the reader will know exactly how accurate/precise the information is.
            To some extent, what you are describing is already built into the current system. On many CLOG pages, multiple occurrences are listed and pinned on the maps; therefore, the more itemized localities you see, the broader the distribution. And generalized localities (e.g., "Amazon basin") or localities which have no pins can be listed in the text field called "Distribution."

            But, to pursue your ideas further, perhaps this could be implemented visually, rather than textually. Currently on the CLOG maps, occurrence data are shown as pins on the maps. Original “type locality” data is typically shown as a yellow/gold pin with a star inside (personally, I appreciate the star as helping to set apart type localities from other localities). Additional occurrences are shown with different colors. I have not yet been able to decipher whether or not these other colors (green, grey, yellow without a star, etc.) have specific meanings. But if they don’t, then maybe these colors could be coded to reflect your ideas: assign a specific pin color to datapoints that are highly exact, and other colors to occurrences that are increasingly inexact.

            For example, new occurrence data are currently added to the database with attributes like “size” (tiny, small, med, large, etc.) for rivers. If there were a data field for “accuracy” to fill in when a new occurrence is introduced for a species, then these data could be coded by color.

            TTA, this doesn’t exactly accomplish what you’re suggesting, but it might help us think about how to code your ideas.
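Eric's colour-coding idea amounts to a lookup from an accuracy grade to a pin style, with type localities keeping their current gold-with-star look. A minimal sketch; the palette is purely hypothetical, since the meaning of today's CLOG pin colours isn't documented in this thread.

```python
from typing import Dict

# Hypothetical palette keyed by TTA's grades (1 = most precise ... 4 = least precise).
PIN_COLOURS: Dict[int, str] = {1: "green", 2: "yellow", 3: "orange", 4: "grey"}


def pin_style(accuracy: int, is_type_locality: bool = False) -> Dict[str, object]:
    """Return marker styling for one occurrence."""
    if is_type_locality:
        # Keep the existing convention: type localities get a gold pin with a star.
        return {"colour": "gold", "star": True}
    if accuracy not in PIN_COLOURS:
        raise ValueError(f"accuracy must be one of {sorted(PIN_COLOURS)}, got {accuracy}")
    return {"colour": PIN_COLOURS[accuracy], "star": False}


if __name__ == "__main__":
    print(pin_style(3))
    print(pin_style(1, is_type_locality=True))
```

Keeping the palette in one dictionary means the colours can be changed later without touching the stored accuracy values.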

            Cheers, Eric

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 16 Dec 2015, 18:12
            by racoll
            bekateen wrote:I can see one shortcoming with this idea: That is, currently the PC database is set up so that people can identify a locality and then use it to “find other species” from the same area. ... conceptually it is a beneficial feature to be able to identify other species which happen to come from the same locality.
            Agreed. I was thinking actually of having both the PC data and the GBIF data on the same map, just with different colour or shape icons.
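One way to put PC and GBIF data on the same map is to emit a single GeoJSON layer in which every point carries its data source, leaving the front end to choose an icon per source. The Google Maps JavaScript API's data layer can load GeoJSON, but whether that fits the site's current map code is an assumption. The labels and coordinates below are hypothetical.

```python
import json
from typing import Iterable, Tuple

Point = Tuple[float, float, str]  # (lat, lon, label)


def feature(lat: float, lon: float, label: str, source: str) -> dict:
    # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"label": label, "source": source},
    }


def combined_layer(pc_points: Iterable[Point], gbif_points: Iterable[Point]) -> dict:
    """One FeatureCollection; the map can pick a colour or shape per 'source'."""
    features = [feature(lat, lon, label, "pc") for lat, lon, label in pc_points]
    features += [feature(lat, lon, label, "gbif") for lat, lon, label in gbif_points]
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    layer = combined_layer([(1.0, 103.5, "Hypothetical PC marker")],
                           [(0.5, 110.0, "Hypothetical GBIF record")])
    print(json.dumps(layer, indent=2))
```

Tagging the source on each feature, rather than keeping two separate layers, lets a species page show either data set alone or both together from the same file.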

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 17 Dec 2015, 01:40
            by Silurus
            I would think that if you wanted to indicate the river drainage the fish came from, it would be more accurate to highlight it (i.e. mark the drainage with different color) rather than to place a marker. Or at the very least, place a series of markers that map out the river's main course (although that may have the effect of making it even more confusing than it already is).
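Silurus's fallback, a series of markers tracing the river's main course, can be stored as a single line feature instead of many separate pins. A minimal sketch, assuming the course is supplied as hand-traced, ordered (lat, lon) points (hypothetical input). Whether such a line can be drawn well on the site's maps is a separate question, raised by Jools further down the thread.

```python
import json
from typing import List, Tuple


def river_course(name: str, vertices: List[Tuple[float, float]]) -> dict:
    """Turn an ordered list of (lat, lon) points into a GeoJSON LineString feature."""
    if len(vertices) < 2:
        raise ValueError("a river course needs at least two points")
    return {
        "type": "Feature",
        # GeoJSON wants [longitude, latitude] pairs, in order along the river.
        "geometry": {"type": "LineString",
                     "coordinates": [[lon, lat] for lat, lon in vertices]},
        "properties": {"name": name, "kind": "main course"},
    }


if __name__ == "__main__":
    # Hypothetical, made-up vertices for illustration only.
    print(json.dumps(river_course("Example river", [(0.0, 1.0), (0.5, 1.5), (1.0, 1.7)])))
```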

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 17 Dec 2015, 01:58
            by bekateen
            Silurus wrote:I would think that if you wanted to indicate the river drainage the fish came from, it would be more accurate to highlight it (i.e. mark the drainage with different color) rather than to place a marker.
            Thank you, Silurus. That is an excellent way of describing what I had in mind when I said this above:
            bekateen wrote:It would be helpful to me, and perhaps to others, if there were some kind of map which shows the geographic areas covered by the names/localities in the database. For example, how much water or land is included within the term, "Indonesian waters?" or "Upper Amazon?"
            ... except that my initial concept wasn't so well developed: I was envisioning a delimited (encircled) geographic area intended to represent a locality name.

            Cheers, Eric

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 17 Dec 2015, 04:26
            by bekateen
            One of the issues to consider as we discuss this is, how do we map multiple localities along the same river (e.g., for different species) without multiplying the name of the river in the database? What I'm thinking is that we might organize occurrence info in two ways:
            1. Entire waterways might be highlighted (as per Silurus' suggestion) or bounded (as I imagined, but I like Silurus' idea better). If a particular fish is known to exist in a waterway, then we can add that as a highlighted occurrence to the map.
            2. If somebody has specific locality data (ideally, GPS or Lat/Long data; alternatively, a carefully worded description, e.g., "in the Rio X, 1 mile downstream from the town of Y"), then that occurrence could be added to the map without it being linked to the waterway's map highlight.
            This way, if two fish are both from the same river, each can have the river marked/highlighted on their occurrence map, and each could also have independent points pinned to the map.
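The two-part scheme above can be modelled by giving each species a list of whole waterways (to highlight) and a separate list of point localities whose coordinates never overwrite the waterway's own marker. A minimal sketch with hypothetical class names and made-up coordinates; it is not the actual CLOG schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PointOccurrence:
    """A specific collection site, independent of any waterway's own marker."""
    lat: float
    lon: float
    description: str = ""
    waterway: Optional[str] = None  # informational link only


@dataclass
class SpeciesOccurrences:
    species: str
    waterways: List[str] = field(default_factory=list)  # whole waterways to highlight
    points: List[PointOccurrence] = field(default_factory=list)

    def shared_waterways(self, other: "SpeciesOccurrences") -> List[str]:
        """Waterways highlighted for both species (a basis for 'find other species')."""
        return sorted(set(self.waterways) & set(other.waterways))


if __name__ == "__main__":
    # Hypothetical species and coordinates, for illustration only.
    a = SpeciesOccurrences("Species A", ["Meta", "Orinoco"],
                           [PointOccurrence(4.0, -73.0, "site 1", "Meta")])
    b = SpeciesOccurrences("Species B", ["Meta"], [PointOccurrence(4.1, -72.5)])
    print(a.shared_waterways(b))
```

Both species share the "Meta" highlight, yet each keeps its own pins, so neither pin implies that the Meta as a whole sits at that one spot.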

            As the system is organized now, every occurrence locality in the database has to have a name assigned to it (e.g., Meta, Nile, Orinoco, whatever). And if any pin is going to appear on the map to represent this body of water, then each occurrence has to have Lat/Long data assigned to it. But obviously, most rivers, and even many lakes, are going to be so large that they extend over vast areas with different Lat/Long coordinates. So just as Silurus was saying that "Indonesian waters" is so vague that it is essentially useless, we could make these large areas more useful in the future if specific locality pins can be added without suggesting that the waterway name is only found in that one spot.

            Effectively what we would be doing is allowing part of our occurrence data to be logged and mapped in the same way (I think) that GBIF data are - a bunch of points, as Racoll was suggesting.

            Just today I was reading a paper about . This wasn't in our Cat-eLog, so I added the species and then I wanted to add a locality occurrence. I came to discover that not only were the creeks where the fish live absent from the database, some of the rivers leading up to these creeks were absent too. So I had to add all of them in as parents to the creeks! Here is what I ended up with:

            Orinoco, Middle Orinoco, Meta, Metica, Guayuriba, Negro (Colombia), Blanco (Colombia), La Caja.

            Along the way, I ended up randomly assigning Lat/Long data to the three intermediate rivers (Guayuriba, Negro, and Blanco). These coordinates do rest upon their respective waterways, but they hold no inherent significance regarding any fishes that might live along these waterways.
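With parent links in place, searches can run through the hierarchy rather than depending on where each intermediate river's single marker happens to sit. A minimal sketch that reads Eric's list as one linear chain from La Caja up to the Orinoco (an assumption; the real tributary structure may branch), with hypothetical species names.

```python
from typing import Dict, Iterable, List

# Child -> parent links for the chain listed above, read as a single linear chain.
PARENT: Dict[str, str] = {
    "Middle Orinoco": "Orinoco",
    "Meta": "Middle Orinoco",
    "Metica": "Meta",
    "Guayuriba": "Metica",
    "Negro (Colombia)": "Guayuriba",
    "Blanco (Colombia)": "Negro (Colombia)",
    "La Caja": "Blanco (Colombia)",
}


def ancestors(water: str) -> List[str]:
    """All enclosing bodies of water, nearest first."""
    chain = []
    while water in PARENT:
        water = PARENT[water]
        chain.append(water)
    return chain


def species_in(water: str, occurrences: Dict[str, Iterable[str]]) -> List[str]:
    """Species recorded in `water` itself or in anything that drains into it."""
    return sorted(
        sp for sp, waters in occurrences.items()
        if any(w == water or water in ancestors(w) for w in waters)
    )


if __name__ == "__main__":
    records = {"Species A": ["La Caja"], "Species B": ["Orinoco"]}
    print(ancestors("La Caja"))
    print(species_in("Meta", records))
```

A fish recorded only from La Caja then still turns up in a search for the Meta, even though the coordinates given to the intermediate rivers carry no meaning of their own.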

            Considering Silurus' and Racoll's suggestions, it would be helpful to be able to map waterways and also map multiple specific localities without attributing the locality coordinates to the waterway.

            Cheers, Eric

            P.S., @Racoll, Is there a way to search within GBIF for a location according to the proper name of a waterway? When I try to narrow searches by location, all I seem to find is a tool for drawing polygons on the map. Thanks, Eric

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 17 Dec 2015, 08:11
            by Jools
            There is a lot here to comment on. So, let me start at the top. Also want to include @MatsP in the discussion.

            The recent spate of incorrect occurrences has been due to typographical errors in published info. As new errors appear, we manually fix them and try to program around them happening again.

            A marker with a star in it is a specific location, usually a type locality.
            A marker without a star in it is the middle point of a body of water.
            I agree that wide and unspecific areas (e.g. Indonesian waters) are undesirable.

            The point of the maps is more to show the average aquarist where in the world a species comes from, rather than to be absolutely correct in all cases. It is also good to show where species live. As anyone who's sat in a biodiverse river will tell you, a map is a far cry from what's actually "on the ground".

            When I am sitting at a computer I will check recent updates to see what corrections are being made and think and write more on this.

            Jools

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 17 Dec 2015, 08:15
            by Jools
            Also, technically, drawing a line on a river is difficult within the google maps environment.

            Different icons for different data sources are possible, and perhaps we could pull a GBIF layer onto the species map. It would not work that well on the majority of subfamily or family maps.

            Jools

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 19 Dec 2015, 10:45
            by MatsP
            I don't know what to add here. The data is entered largely by me (not counting the "type locality" markers). It is my fault if it's wrong, and usually it's because I have no "better knowledge". I'm not sure if everyone commenting here has access to the database for updating localities, but if you don't and think you want to, let Jools know. I sure could do with help on that account, as we currently have 188 described species with no occurrence data in Siluriformes, and 3371 species entries with no occurrence data if we take the entire database (non-Siluriformes, both described and undescribed species).

            And in particular, Silurus, if you have better information on where fish are from, or a better marker location for something, I'm pretty sure nobody will object if you make it better. [Yes, this is the equivalent of the "patches welcome" answer in an open-source community when someone points out that "you could do this better".]

            My aim has been that "every fish should appear on the map", rather than having two fish out of a genus of thirty marked, even if the markers aren't exactly where the fish are found. To have a more precise locality, we'd have to split the current bodies of water into even smaller sections. And I have no problem with someone adding a little creek that flows into a larger river as the occurrence. But I often only have the data in FishBase, CoF, CLOFFSCA or similar to go by, and the precision of that is often less than perfect.

            If there is a (simple) way to import data from elsewhere to display where the fish comes from, then that's great. Even if it's not complete. But we still need a way to do the "find fish from the same place" or "search for species from X", which I don't believe the GBIF will give [directly at least]
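One possible way to keep "find fish from the same place" working with imported points is to attach each point to the nearest existing named locality, provided one lies within some distance threshold. A minimal sketch using the haversine great-circle formula; the locality names, coordinates and the 50 km threshold are all hypothetical. Because current markers sit at the middle of each body of water, "nearest marker" is not guaranteed to mean "same drainage", so matches would still need checking by hand.

```python
import math
from typing import Dict, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_locality(point: Tuple[float, float],
                     localities: Dict[str, Tuple[float, float]],
                     max_km: float = 50.0) -> Optional[str]:
    """Name of the closest existing locality marker, or None if none is within max_km."""
    best_name, best_d = None, max_km
    for name, coords in localities.items():
        d = haversine_km(point, coords)
        if d <= best_d:
            best_name, best_d = name, d
    return best_name


if __name__ == "__main__":
    # Hypothetical locality markers.
    markers = {"Locality A": (0.0, 0.0), "Locality B": (0.0, 2.0)}
    print(nearest_locality((0.0, 0.1), markers))
```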

            Sorry if this comes across as defensive, but I have spent a lot of hours on actually creating this, and I don't think many of the database entries (5552 occurrences, ~1200 bodies of water, a few handfuls of countries) are by anyone else.

            --
            Mats

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 19 Dec 2015, 12:38
            by Jools
            So, I think this is more a discussion of where we could go next than any kind of commentary on the huge amount of work that's been done to date.

            Correct me if I am wrong, but Google are not able to label bodies of water (see here), so the idea of having a body of water highlighted will have to remain an idea until Google support it. A plus point is that, because of the way the data is structured, adding this kind of feature once Google can do it will be a slog, but relatively easy.

            Google do have a community-driven map augmenting facility called Map Maker, but it seems like too much work to map all the bodies of water in the world just because Google can't get it done!

            Jools

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 19 Dec 2015, 12:42
            by Jools
            I would like to bring the discussion back to a specific example. I can see @Silurus made some changes to the map, but the audit messages don't show what they were (note to self: improve the auditing messages).

            Can you give me one example of a species you changed and from what to what?

            Cheers,

            Jools


            PS For the record, here is what the audit log shows:

            Occurrence data changes for Bagarius yarrelli
            Value 372 wasn't updated as no change was made on previous screen
            Value 5 wasn't updated as no change was made on previous screen
            Value 299 wasn't updated as no change was made on previous screen
            Value 364 wasn't updated as no change was made on previous screen
            Updated value 369
            Inserted value 428
            Inserted value 310
            Inserted value 995
            Inserted value 437
            Inserted value 309
            

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 19 Dec 2015, 12:46
            by Jools
            Silurus wrote:And the use of country-level markers should be avoided. For instance, using the "Indonesian waters" option puts a marker somewhere in the southeastern tip of Sulawesi, where no known native freshwater catfish is found.
            On this specific point, that's a matter of changing the lat/long co-ordinates for the body of water record to, presumably, the middle of the island.

            I note this general marker is used for some 27 species. One example being . I can see that the distribution text is "Asia: Singapore, Peninsular Malaysia and Indonesia", so I think this has just been taken and represented in graphical form. I think this is one of those where more data wasn't available or considered.

            GBIF, in this instance, shows us more occurrences, but none in Indonesia. It does look like we could use GBIF data or maps somehow.

            Jools

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 19 Dec 2015, 13:21
            by Silurus
            Can you give me one example of a species you changed and from what to what?
            Using as an example:

            Updated value 369
            This changed "Indonesian waters" to "Batang Hari"

            Code:

            Inserted value 428
            Inserted value 310
            Inserted value 995
            Inserted value 437
            Inserted value 309
            
            Inserted Kapuas, Pahang, Mahakam, Musi, Barito in that order.

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 19 Dec 2015, 13:28
            by racoll
            Jools wrote:It does look like could use GBIF data or maps somehow.
            If you did, would you want to query their database on-the-fly using their API, or just download and clean up their data and then query it directly from the PC servers?

            http://www.gbif.org/developer/summary
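To make the "download and clean up" option concrete, here is a minimal Python sketch that pages through GBIF's occurrence search for one taxon and keeps only georeferenced records, so the results could be stored on the PC servers and queried locally. It assumes the `https://api.gbif.org/v1/occurrence/search` endpoint with `taxonKey`, `hasCoordinate`, `limit` and `offset` parameters, and a JSON response carrying `results` and `endOfRecords`; check those against the developer documentation linked above. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed GBIF occurrence search endpoint (see the GBIF developer docs).
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_search_url(taxon_key, limit=300, offset=0):
    """Build a search URL for georeferenced occurrences of one GBIF taxon."""
    params = {
        "taxonKey": taxon_key,
        "hasCoordinate": "true",
        "limit": limit,
        "offset": offset,
    }
    return GBIF_SEARCH + "?" + urllib.parse.urlencode(params)


def extract_points(page):
    """Pull (occurrence key, lat, lon) tuples out of one decoded JSON page,
    skipping any record that lacks coordinates."""
    points = []
    for rec in page.get("results", []):
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")
        if lat is not None and lon is not None:
            points.append((rec.get("key"), lat, lon))
    return points


def download_all(taxon_key, page_size=300):
    """Fetch every page until GBIF reports endOfRecords (or returns nothing)."""
    points, offset = [], 0
    while True:
        url = build_search_url(taxon_key, page_size, offset)
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        points.extend(extract_points(page))
        if page.get("endOfRecords", True) or not page.get("results"):
            return points
        offset += page_size
```

Run as a batch job (say, nightly) and store the tuples, rather than calling GBIF each time a map is drawn; GBIF's documented paging limits still apply for species with very many records.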

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 20 Dec 2015, 05:56
            by bekateen
            Lots to comment on here:
            Jools wrote:The recent spate of incorrect occurrences have been due to errors in typography used in published info. As new errors appear we manually fix them and try to program around their happening again.
            Thank you, Jools, for mentioning this. I meant to state it in my first post and forgot to do so. It is relevant because the underlying instigation for these corrective forum posts was a small number of errors in the database rather than ambiguities in the organization or structure of the database. Therefore...
            MatsP wrote:I don't know what to add here. The data is entered largely by me (not counting the "type locality" markers). It is my fault if it's wrong, and usually it's because I have no "better knowledge".... Sorry if this comes across as defensive, but I have spent a lot of hours on actually creating this, and I don't think many of the database entries (5552 occurrences, ~1200 bodies of water, a few handfuls of countries) are by anyone else.
            Mats, nothing personal. Jools was correct in another of his posts: IMHO, this thread and the issues raised are not an indictment of past work, but more a discussion about how the locality/occurrence database can be enhanced in the future. :-BD
            Jools wrote:A marker with a star in it is a specific location usually a type locality.
            A marker without a star in it is the middle point of a body of water.
            Thanks again. That is about what I expected, although I didn't know the markers without stars were "midpoints" along bodies of water. When I've created new entries in the database (and the number is starting to add up), I've placed the markers (i.e., selected representative Lat/Long coordinates) in one of two ways. If my source describes a position (e.g., a paper states that fish were collected in streams of River X located a certain # of km east of city ABC), I look at maps and select the Lat/Long coordinates that come as close as I can discern to that described place. If the locality is simply "River X near city ABC," I select Lat/Long coordinates near the city. When the original source doesn't provide exact Lat/Long data, I've been using this website (http://www.mapcoordinates.net) to extract the coordinates of the locality I'm looking up on the map. I'll be more careful in the future to pick coordinates near the midpoint of the water body when I have nothing more to go on. "My bad." :ymblushing:
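When a source only says "N km east of city ABC", a rough point can be computed instead of eyeballed. This is a minimal Python sketch of the standard great-circle "destination point" formula, assuming a spherical Earth of mean radius 6371 km; the function name is hypothetical. It gives a straight-line offset, so it won't follow the river's course.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def destination_point(lat_deg, lon_deg, bearing_deg, distance_km):
    """Return (lat, lon) reached by travelling distance_km from a start point
    along an initial compass bearing (0 = north, 90 = east) on a great circle."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians

    lat2 = math.asin(math.sin(lat1) * math.cos(delta)
                     + math.cos(lat1) * math.sin(delta) * math.cos(theta))
    lon2 = lon1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(lat1),
                             math.cos(delta) - math.sin(lat1) * math.sin(lat2))
    # Normalise longitude into the range [-180, 180).
    lon2_deg = (math.degrees(lon2) + 540.0) % 360.0 - 180.0
    return round(math.degrees(lat2), 6), round(lon2_deg, 6)
```

For "25 km east of city ABC", call `destination_point(city_lat, city_lon, 90, 25)` with the city's coordinates, then nudge the result onto the nearest stretch of River X by eye.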
            MatsP wrote:My aim has been that "every fish should appear on the map", rather than having two fish out of a genus of thirty being marked. Even if the markers aren't exactly where the fish is being found.
            MatsP wrote:If there is a (simple) way to import data from elsewhere to display where the fish comes from, then that's great. Even if it's not complete. But we still need a way to do the "find fish from the same place" or "search for species from X", which I don't believe the GBIF will give [directly at least]
            Both of the points Mats makes are very important to me, and both are somewhat key to my ideas of how we (ha-ha; I mean you, since I can't write code. Sorry :d :-BD ) can enhance the admin functions. So here are some ideas/thoughts:

            Currently, each species can have up to five distribution/occurrence/locality elements on its CLOG page:
            1. Type locality (if available) will appear textually at the top of the CLOG page
            2. The type locality will also appear as a starred pin on the interactive map IF the type locality data includes Lat/Long coordinates
            3. The Distribution field can include a textual description of the species' general localities; this may or may not reiterate the type locality, and more generally it can include additional textual descriptions of where the species is found. This field does not have any representation on the interactive map.
            4. The interactive map will show pins for all occurrences recorded manually by Mats and any other admins who can add this info; these pins are in addition to the type locality, which may or may not be pinned (see #2 above).
            5. For every pinned occurrence on the map (with its own lat/long coordinates), a textual description (a hierarchical tree) will appear in the Distribution field above the map. IMPORTANTLY, it is this textual record which people can click on to "find fish from the same place" or "search for species from X."
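The hierarchical tree in element 5 only needs a parent link on each body of water. Here is a Python sketch that assumes a hypothetical in-memory mapping from a water body's name to its parent's name (the real Cat-eLog tables aren't shown in this thread). The " > " separator and the sample hierarchy are illustrative only.

```python
def lineage(name, parent_of):
    """Return the chain of bodies of water from the top-level drainage down to
    `name`. parent_of maps name -> parent name (None for a top-level entry)."""
    chain, seen = [], set()
    current = name
    while current is not None:
        if current in seen:  # guard against an accidental parent-link cycle
            raise ValueError(f"cycle in parent links at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current)
    return list(reversed(chain))


def distribution_line(name, parent_of, sep=" > "):
    """Render the lineage as one line of the Distribution tree."""
    return sep.join(lineage(name, parent_of))
```

With `{"Paru": "Lower Amazon", "Lower Amazon": "Amazon", "Amazon": None}`, `distribution_line("Paru", ...)` yields `Amazon > Lower Amazon > Paru`; each name in that line is what a "find fish from the same place" link would point at.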
            Okay, so given all this, here we go:
            MatsP wrote:To have a more precise locality we'd have to ... split the current bodies of water into even smaller sections.
            Mats, you mentioned the idea of subdividing bodies of water. I think this is a good idea when a body of water can be subdivided into smaller segments, each with its own distinct textual name. But I don't think it is a good idea if it means taking a single river (e.g., the Amazon River) and dividing it up every 2-3 km along the waterway when all of the segments are still known by the single name of the waterway. Why is this problematic? If the waterway was originally entered with a specific Lat/Long coordinate corresponding to a specific collection locality for one species, then that pin is more important than a mere "midway" marker along the river: it is the actual location of the fish. Now imagine that tomorrow I want to add an occurrence for a different species along the same waterway, but this fish is not from the same Lat/Long coordinates. I am faced with two choices. Either
            • use the preexisting entry for the waterway and accept a pin in an incorrect location, EVEN THOUGH I have a more correct set of Lat/Long coordinates at my disposal, or
            • create a new entry in the occurrence database with the same waterway name but new Lat/Long coordinates. That might work for me today, but the next time somebody wants to add an occurrence for some other (third) species from the same waterway, they will have to figure out which of these two entries to use, or create a third entry if they have new Lat/Long coordinates.
            IMO, having potentially multiple Lat/Long entries with the same "name" will be confusing to admins and users in general. So this is my suggestion or question:

            Can we add an additional admin function for occurrences that allows the following? Admins could add new entries of Lat/Long coordinates WITHOUT providing a specific name for the entry; in lieu of a name for the site, we would select the corresponding (preexisting) waterway as its "parent." That way, the map point will be accurate for that species, but we won't be proliferating multiple occurrence entries with the same waterway name in the database. When a user views the CLOG page for that species, the textual Distribution field will list only the "parent" waterway, while the map shows the pin at the new Lat/Long location. The user can still search for fish from the parent waterway, and multiple species can be located at different positions along the same waterway. Of course, the drawback is that species from the same waterway, but not necessarily from the same part of it, will be listed together in searches; but this weakness has been discussed before (Re: Enhancement idea: Create a page that inventories images/videos of locations and habitats), and I think the general consensus then was that something was better than nothing (much like the issue at hand in this thread!).
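Eric's proposal amounts to a second kind of record: an unnamed coordinate that points at an existing named body of water. A minimal Python sketch of that idea follows; the class and field names are hypothetical and don't reflect the site's actual schema. Map pins come from each point's own coordinates, while the Distribution text and "find fish from the same place" searches use only the parent's name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BodyOfWater:
    name: str
    lat: float  # midpoint marker for the whole body of water
    lon: float


@dataclass(frozen=True)
class LocalityPoint:
    parent: BodyOfWater  # the named water body this point lies on
    lat: float  # actual collection coordinates
    lon: float
    description: Optional[str] = None  # free text, never a search link


def map_pins(points):
    """Pins use each point's own coordinates, not the parent's midpoint."""
    return [(p.lat, p.lon) for p in points]


def distribution_names(points):
    """The Distribution text lists each parent once, in first-seen order."""
    names = []
    for p in points:
        if p.parent.name not in names:
            names.append(p.parent.name)
    return names


def species_at(water_name, records):
    """'Find fish from the same place': every species with a point on the
    named water body. records is a list of (species, LocalityPoint) pairs."""
    return sorted({sp for sp, p in records if p.parent.name == water_name})
```

Keeping the optional `description` on the point lets narrative notes such as creek names travel with the pin without becoming searchable names of their own.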

            Now, this suggestion only applies when there is no better name for the particular subsection of the waterway. Obviously, when the new section can be named, we just make a new entry, as the system currently works. Mats, you also mentioned that we can even add small creeks. Well, that's what I did for . It was collected from several small creeks along the Rio Blanco, which is near the end of the Rio Negro basin watershed. (Oddly, Google Maps calls this waterway "Blanco," but other web maps (e.g., mapquest.com) consider this same segment part of the larger Negro; that's a whole different issue. Since this site uses Google Maps, I used Blanco rather than Negro when I set up the parent waterway entry in the database.)

            I named that entry after one of the creeks along the Blanco, but there were others. So is there a better solution here? There is a "Description" field beside every occurrence; for the creeks, I wrote "Rio Negro basin watershed creeks" in it. Admins can add this additional information to occurrence entries, but none of it appears on the CLOG page. For situations where we have unique Lat/Long coordinates along already-named waterways, could we display this "Description" text at the end of the nesting, after the relevant parent, even if it isn't a link to "find other fish here"? Or would you be willing to accept new occurrence entries with narrative names rather than just proper names (e.g., typing "Rio Negro basin watershed creeks" in the "Name" field instead of the "Description" field)?

            I know that's a lot to chew on. Thanks for your indulgence.

            Cheers, Eric

            P.S. This suggestion is in addition to Racoll's and Jools' talk of overlaying GBIF data points on maps. I think that would be marvelous.

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 20 Dec 2015, 16:07
            by MatsP
            Ok, I'm skipping over MOST of the discussion, and commenting on a few of Eric's comments...
            If the waterway was originally entered with a specific Lat/Long coordinate corresponding to a specific collection locality for one species, then that pin is more important than just "midway" along the river - it is the actual location of the fish. Now imagine that tomorrow I want to add an occurrence for a different species along the same waterway, but this fish is not from the same Lat/Long coordinates.
            This should not be the way that coordinates are used for bodies of water. The marker should be in the middle of a body of water, not where the first species found in that river was caught.
            Mats, you mentioned the idea of subdividing bodies of water. I think this is a good idea when a body of water can be subdivided into smaller segments, each with its own distinct textual name.
            The idea here was rather to have specific named regions, e.g. "Cachoeira do Belo Monte" as part of the Lower Xingu. Ideally we'd then also have "Lower Xingu above Cachoeira do Belo Monte", etc., as well as Lower Xingu itself to represent fishes that occur everywhere in the river.

            However, the point is still that we aren't intending to record capture localities precisely, but to give a general, rough idea of the distribution of the fish.

            It wouldn't be very hard, in my view, to add another database table where we could record the actual locations (lat/long) where fish have been captured; it would take a little bit of programming to add this feature.
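Here is a rough idea of what that extra table might look like, sketched with Python's built-in `sqlite3` module so it runs standalone. The table and column names are hypothetical; the site's real database engine and schema aren't described in this thread. Each capture location stores its own lat/long plus a `source` column, so imported and hand-entered points can be told apart, and a `hidden` flag lets an admin suppress a point without deleting it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE capture_location (
    id       INTEGER PRIMARY KEY,
    species  TEXT NOT NULL,
    lat      REAL NOT NULL CHECK (lat BETWEEN -90 AND 90),
    lon      REAL NOT NULL CHECK (lon BETWEEN -180 AND 180),
    source   TEXT NOT NULL,             -- e.g. 'gbif' or 'manual'
    hidden   INTEGER NOT NULL DEFAULT 0 -- 1 = suppressed by an admin
);
CREATE INDEX idx_capture_species ON capture_location(species);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_capture(conn, species, lat, lon, source):
    """Insert one capture location; the CHECKs reject impossible coordinates."""
    conn.execute(
        "INSERT INTO capture_location (species, lat, lon, source) "
        "VALUES (?, ?, ?, ?)",
        (species, lat, lon, source),
    )


def visible_captures(conn, species):
    """Return (lat, lon, source) rows for a species, skipping hidden ones."""
    return conn.execute(
        "SELECT lat, lon, source FROM capture_location "
        "WHERE species = ? AND hidden = 0 ORDER BY id",
        (species,),
    ).fetchall()
```

Keeping this separate from the bodies-of-water table means the midpoint markers stay untouched while precise points accumulate alongside them.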

            The real work is adding the data. I think we can do it automatically, although I couldn't quite figure out how to get the occurrences for a particular species from the GBIF database above, and of course automatically importing data has its own problems. Take this example: http://www.gbif.org/species/5961451 (Ancistrus dolichopterus). It looks fine, except for the ONE point way south of all the others, which I'm pretty sure isn't a correct identification.
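For points like that lone southern record, an automatic pre-screen could flag candidates for a human to check before anything is hidden. This Python sketch uses one possible heuristic (my own choice, not something proposed in the thread): measure each point's great-circle distance from the median latitude/longitude of all the species' points, and flag any point farther than `factor` times the median of those distances. The thresholds are arbitrary assumptions. Because a long river can legitimately produce far-flung points, the sketch only flags and never deletes.

```python
import math
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def flag_outliers(points, factor=5.0, min_km=50.0):
    """Return indices of points unusually far from the median location.

    points: list of (lat, lon). A point is flagged when its distance from the
    median centre exceeds both factor * (median distance) and min_km.
    """
    if len(points) < 3:
        return []  # too few points to judge
    centre_lat = median(p[0] for p in points)
    centre_lon = median(p[1] for p in points)  # breaks down across the 180° meridian
    dists = [haversine_km(lat, lon, centre_lat, centre_lon) for lat, lon in points]
    cutoff = max(factor * median(dists), min_km)
    return [i for i, d in enumerate(dists) if d > cutoff]
```

Flagged indices could simply be shown to an admin next to the map, leaving the final hide/keep decision to a person.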

            Whether we do this automatically or manually, we probably want some sort of manual override so that we can "hide" or "delete" data that we think is incorrect. And I think we should have a script that fetches and parses the data from GBIF rather than fetching on demand, as this will allow us to control the data better; most importantly, it lets us add our own data without having to wait for someone to add it to the GBIF dataset first.
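The workflow described here (fetch and parse GBIF data ahead of time, let admins hide bad points, and add our own points without waiting on GBIF) reduces to a single merge step. This is a Python sketch with hypothetical record shapes: imported records are keyed by their GBIF occurrence key, admins keep a set of keys to hide, and locally entered points are always appended. The GBIF field names are assumed to be those returned by occurrence search.

```python
from typing import Iterable, List, Set, Tuple

Point = Tuple[str, float, float, str]  # (record id, lat, lon, source)


def merge_points(gbif_records: Iterable[dict],
                 hidden_keys: Set[int],
                 local_points: Iterable[Tuple[float, float]]) -> List[Point]:
    """Combine pre-fetched GBIF records with admin-entered points.

    gbif_records: dicts with 'key', 'decimalLatitude', 'decimalLongitude'.
    hidden_keys: GBIF occurrence keys an admin has marked as wrong.
    local_points: our own (lat, lon) pairs, always kept.
    """
    merged: List[Point] = []
    for rec in gbif_records:
        key = rec.get("key")
        lat, lon = rec.get("decimalLatitude"), rec.get("decimalLongitude")
        if key in hidden_keys or lat is None or lon is None:
            continue  # suppressed by an admin, or not georeferenced
        merged.append((f"gbif:{key}", lat, lon, "gbif"))
    for i, (lat, lon) in enumerate(local_points, start=1):
        merged.append((f"local:{i}", lat, lon, "local"))
    return merged
```

Because the hide list is keyed by occurrence key, re-running the import later doesn't resurrect a point an admin has already rejected.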

            It's trivial to mark these with a different marker than the "big dot" and "star" that we have so far - we probably should have a foot, ehm, legend to explain what the different symbols actually represent.

            --
            Mats

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 20 Dec 2015, 21:17
            by bekateen
            MatsP wrote:This should not be the way that coordinates are used for bodies of water. The marker should be in the middle of a body of water, not where the first species found in that river was caught...

            However, the point is still that we aren't intending to record capture localities precisely, but to give a general, rough idea of the distribution of the fish.
            Yes, and this is part of my motivation. As I said before, I was unaware of the logic underlying the placement of pins along waterways, and I will certainly follow the "middle of the body of water" principle in the future. I don't want to confound bodies of water with actual specific localities. But I'd also like to add specific localities to better reflect the ranges of species. Overlaying GBIF data certainly goes most of the way in this regard, but when admins here know of additional specific localities, it would be nice to be able to add them as pins without confounding them with the pins for general bodies of water.

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 21 Dec 2015, 00:31
            by MatsP
            Yes, so I would absolutely make those pins different in some way - maybe a + instead of * or dot in the marker - perhaps even a special one for "our own data".
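Distinct pins per data source and the legend mentioned earlier could both be driven from one table so they can't drift apart. In this Python sketch the dot and star follow the existing markers, and the `+` for imported points is Mats's tentative suggestion. The `#` for our own data, the legend wording, and all names are placeholders, not decisions.

```python
from enum import Enum


class MarkerKind(Enum):
    # value = (symbol drawn in the pin, legend text)
    BODY_OF_WATER = ("•", "Midpoint of a body of water")
    TYPE_LOCALITY = ("*", "Specific location, usually the type locality")
    IMPORTED = ("+", "Collection record imported from GBIF")
    OWN_DATA = ("#", "Collection record entered by site admins")  # placeholder

    @property
    def symbol(self):
        return self.value[0]

    @property
    def label(self):
        return self.value[1]


def legend_lines(kinds_on_map):
    """Legend entries only for the marker kinds actually shown, in enum order."""
    shown = set(kinds_on_map)
    return [f"{k.symbol}  {k.label}" for k in MarkerKind if k in shown]
```

Generating the legend only from the kinds present on a given map keeps it short on species pages that have nothing but midpoint markers.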

            And I think importing those markers is better than asking GBIF for each one, if nothing else because I think it'll be much faster doing it our way; it's slow enough as it is when you view the map of Loricariidae or something.

            --
            Mats

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 27 Dec 2015, 09:14
            by bekateen
            Perhaps this is slightly tangential to the OP, but it is still relevant to naming bodies of water on more precise scales: if one name applies to two separate bodies of water in different geographic areas, and the two bodies of water do not connect, what (in your opinion) is the best way to name them?

            Using a specific example, today I added the Rio Paru in Venezuela as a new body of water (with the Ventuari as its parent). But "Paru" already exists in the Bodies of Water database, in reference to the Paru in Brazil (which has the Lower Amazon River as its parent). AFAIK and as far as I could trace on maps, these two Parus do not connect with each other and are not related, so I named the new entry "Paru (Venezuela)." Is that okay? Would you prefer that I name it simply "Paru?" Or is there a third option to use?

            On one hand, the new entry is already identified as being in Venezuela because of the "Political Area" designation; so perhaps adding the word "Venezuela" to the name of the body of water is redundant and thus unnecessary. However, it seems to me that if I were simply to name it "Paru," that would create potential confusion for future admins, since there would be multiple entries with the exact same name. Yes, with careful attention to the "Parent" and "Political Area" of each entry, mistakes can be avoided; but it would seem to create just one more opportunity for mistakes. So what is your preference?

            Thanks, Eric

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 27 Dec 2015, 19:05
            by MatsP
            Paru (Venezuela) or Paru (Ventuari) would be the pattern used previously.
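The pattern of adding the country or the parent in parentheses when a name is already taken is easy to apply consistently when creating entries. This is a Python sketch under the assumption that existing names are available as a set. The function name and the fallback order (political area first, then parent, then both) are my own illustration.

```python
def unique_water_name(name, political_area, parent, existing):
    """Pick a display name for a new body of water that doesn't collide with
    `existing` (a set of names already in the database)."""
    candidates = [
        name,
        f"{name} ({political_area})",
        f"{name} ({parent})",
        f"{name} ({parent}, {political_area})",
    ]
    for candidate in candidates:
        if candidate not in existing:
            return candidate
    raise ValueError(f"no unique name available for {name!r}")
```

For the case above, `unique_water_name("Paru", "Venezuela", "Ventuari", {"Paru"})` returns `Paru (Venezuela)`. The existing Brazilian "Paru" keeps its plain name, so nothing already linked to it changes.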

            --
            Mats

            Re: Occurrence data: better to have it and be (slightly) wrong?

            Posted: 27 Dec 2015, 19:38
            by bekateen
            So Paru (Venezuela) is okay. Then I'll leave its name as is. Thanks.

            Cheers, Eric