[Nhcoll-l] verbatimLabel DarwinCore Field Addition

Derek Sikes dssikes at alaska.edu
Tue Apr 20 14:18:24 EDT 2021


Doug,

Excellent points. Regarding #2, this is why our loan form states:

NOTE: Data on the labels may not be correct/complete; The most accurate
data are available via
Arctos http://arctos.database.museum (or spreadsheet by request).

I wonder, though, how many people actually read loan forms. I always offer
to check the draft manuscript of anyone publishing with our specimens, and I
often find errors the borrower made while transcribing our labels (which
many still do, even though the specimens are already databased and the data
are easily available).

Regarding verbatim label data: I'm more in favor of preserving it than you
are (though I do a poor job of actually preserving it, beyond relatively
rare photos of labels). I think the risk of people being confused when the
verbatim data differ from the parsed data is not as great as you fear, and
it will diminish further over time as people become more used to digitized
specimen data.

I'm an outlier, I'm sure, in thinking of the labels on specimens as little
more than 'worst-case scenario insurance against the loss of our digital
data'. For most of our specimens, the database record is far more complete
than what's on the labels (it includes photos of habitat, trap methods,
links to publications that used the specimen, links to DNA sequences,
remarks about the condition of the specimen, identification remarks, links
to the keys used to identify the specimens, etc.).

And even worse... how many type localities are wrong?

-Derek

On Tue, Apr 20, 2021 at 9:39 AM Douglas Yanega <dyanega at gmail.com> wrote:

> I'm ambivalent regarding verbatim label data, because it can be extremely
> helpful in some cases, and extremely damaging in others.
>
> Some of you may recall my having given talks, or unhappy comments at
> meetings, regarding the empirical data on error rates on original labels of
> insect specimens. It's pretty disheartening: across tens of thousands of
> specimens assayed in roughly 10 major entomological museums, somewhere
> between 15 and 20% of all original labels had data omissions or errors
> requiring correction before georeferencing. While a fair percentage of
> these are omissions that are easily fixed, or obvious typos, roughly half
> either cannot be fixed (e.g., a place name that occurs in more than one
> county, like "Sulphur Springs, Arkansas"), or are errors that MUST be fixed
> but are not immediately obvious.
>
> Such statements have been known to make people roll their eyes at me,
> thinking that I overstate the problem, but it's a genuine issue, and the
> evidence for it includes lines that aren't immediately obvious, such as
> comparing labels produced by different people who were collecting together.
> As just one "tip-of-the-iceberg" example, consider these data labels,
> produced by six professional researchers from several high-profile
> entomology museums on an NSF-funded field trip to Mexico:
>
> (1) Chihuahua, 72 km NE Chihuahua, El Carrion, 27-VIII-91
> (2) Chihuahua, El Corrion, 72 km NE Chihuahua, 27-VIII-91
> (3) Chihuahua, El Morrion, 67 km NW Chihuahua, 27-VIII-91, 1200 m
> (4) Chihuahua, 67 km N El Morrion, 27-VIII-91
> (5) Chihuahua, 67 km N El Morrion, 27-III-91
> (6) Chihuahua, 74 km NE Chihuahua, 27-VIII-91
>
> These labels all refer to the exact same collecting event, yet you'll note
> that no two are the same. You'll also note that *in the absence of the
> comparison*, none of them has an obvious error.
>
> Worse still, *they are all wrong*. The actual data for this particular
> collecting event are
>
> Chihuahua, El Morrion, 67 km NE Chihuahua, 27-VIII-91, 1200 m
>
> Specifically, the six labels had the following errors: (1) and (2) the
> wrong mileage *and* the wrong place name; (3) the wrong cardinal direction;
> (4) the wrong reference point; (5) the wrong reference point and the wrong
> month; and (6) the wrong mileage. Note also that the georeferences
> generated for these six labels result in two points that are 67 km from the
> actual location, and one over 100 km off.
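The side-by-side comparison described above can be mechanized once each label has been transcribed into fields. Below is a minimal Python sketch, assuming the six labels have already been hand-parsed into a hypothetical `LabelData` record (place, distance, direction, reference point, date); the record and the helpers `discrepancies` and `disagreeing_fields` are illustrative only, not an existing tool. Omitted fields are not counted as errors, and elevation is left out because most of the labels omit it. `disagreeing_fields` needs no corrected record at all: it flags every field on which the labels disagree with one another, which is exactly the signal that is invisible when each label is viewed alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabelData:
    """Fields hand-transcribed from one label; None means the label omits it."""
    place: Optional[str]        # named collecting locality
    distance_km: Optional[int]  # length of the offset
    direction: Optional[str]    # cardinal direction of the offset
    reference: Optional[str]    # point the offset is measured from
    date: Optional[str]         # as written: day-month(Roman)-year


FIELDS = ("place", "distance_km", "direction", "reference", "date")

# Corrected data for the collecting event, as given in the message.
ACTUAL = LabelData("El Morrion", 67, "NE", "Chihuahua", "27-VIII-91")

# The six labels in the order quoted (state omitted: all say "Chihuahua").
LABELS = [
    LabelData("El Carrion", 72, "NE", "Chihuahua", "27-VIII-91"),
    LabelData("El Corrion", 72, "NE", "Chihuahua", "27-VIII-91"),
    LabelData("El Morrion", 67, "NW", "Chihuahua", "27-VIII-91"),
    LabelData(None, 67, "N", "El Morrion", "27-VIII-91"),
    LabelData(None, 67, "N", "El Morrion", "27-III-91"),
    LabelData(None, 74, "NE", "Chihuahua", "27-VIII-91"),
]


def discrepancies(label: LabelData, actual: LabelData) -> list[str]:
    """Fields that the label states and that disagree with the corrected data."""
    return [
        name for name in FIELDS
        if getattr(label, name) is not None
        and getattr(label, name) != getattr(actual, name)
    ]


def disagreeing_fields(labels: list[LabelData]) -> list[str]:
    """Fields on which the labels disagree with one another.

    Needs no corrected record: this is what a side-by-side comparison shows.
    """
    flagged = []
    for name in FIELDS:
        stated = {getattr(lab, name) for lab in labels} - {None}
        if len(stated) > 1:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    for number, label in enumerate(LABELS, start=1):
        print(number, discrepancies(label, ACTUAL))
    print("labels disagree on:", disagreeing_fields(LABELS))
```

A naive field-by-field check like this also flags the direction on labels (4) and (5), since "N" differs from "NE", even though that offset is only meaningful relative to the wrong reference point; interpreting the flags still takes a person.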
>
> When you look specifically for examples like this, with multiple
> collectors' data used side-by-side to evaluate label accuracy, it's
> frightening how poorly people do. It also means that treating verbatim
> label data as *inherently trustworthy* is a serious mistake. As data
> suppliers and consumers, we need to be far more critical. Label data
> underlies so much of people's research, and if we supply or use bad data,
> that undermines the quality of the resulting research.
>
> The question is whether we are better off displaying the verbatim data, or
> not, and to me that depends on whether serious quality control has or has
> not *already been exercised*.
>
> My points are these:
>
> (1) If the process of data capture is limited to entering verbatim label
> data and then simply parsing it out into other fields, it is much less
> likely that the person capturing the data will notice the roughly 10% of
> labels whose data are wrong in ways that aren't obvious. If, however, data
> capture uses verbatim data only as the starting point, then the person
> trying to make sense of a label by georeferencing it themselves is more
> likely to view it critically and catch any errors.
>
> (2) If we assume for the moment that you have done the right thing, and
> fixed an error, how are users of your data going to know which version of
> the data they should trust? If a specimen has verbatim data listing a
> country or state or county or mileage or direction that is *not the same
> as the parsed data*, is that not going to confuse them, if they notice
> the discrepancy?
>
> (3) My overall feeling is that including verbatim data is only genuinely
> beneficial to users if quality control has NOT been applied, AND if
> external users have a reliable way to communicate with the data providers
> to *report an error and get it fixed*. In other words, having *bad*
> verbatim data made visible makes it more likely that external users will
> find errors. If quality control HAS been applied, and the data are clean,
> then the discrepancy between verbatim and parsed data only stands to
> confuse external users. Given that the specimens will have a GUID label,
> any discrepancy between what the data labels say and what the parsed data
> say won't be a problem, because the data labels are not what you'll refer
> to when tracking a specimen down.
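Points (1) and (2) can be made concrete with a small parser. Below is a minimal Python sketch, assuming labels written in the same word order as the corrected label above; the regular expression and the helpers `parse_label` and `differing_fields` are hypothetical, for illustration only, and are not part of Darwin Core or any collection database. Parsing the third label, which has the wrong direction, succeeds without any warning, so mechanical parsing alone gives the data-entry person no reason to question it. Once the direction is corrected, comparing the two parsed versions surfaces exactly the verbatim-versus-parsed discrepancy a user of the data would run into.

```python
import re
from typing import Optional

# Hypothetical pattern, covering only labels shaped like
# "<state>, [<place>,] <n> km <direction> <reference>, <d>-<Roman month>-<yy>[, <elev> m]".
LABEL_RE = re.compile(
    r"^(?P<state>[^,]+),\s*"
    r"(?:(?P<place>[^,]+?),\s*)?"
    r"(?P<distance_km>\d+)\s*km\s+(?P<direction>[NSEW]{1,3})\s+"
    r"(?P<reference>[^,]+),\s*"
    r"(?P<date>\d{1,2}-[IVX]+-\d{2})"
    r"(?:,\s*(?P<elevation_m>\d+)\s*m)?$"
)


def parse_label(text: str) -> Optional[dict[str, Optional[str]]]:
    """Split a verbatim label into fields; None if it doesn't fit the pattern.

    A successful parse says nothing about whether the data are right.
    """
    match = LABEL_RE.match(text.strip())
    return match.groupdict() if match else None


def differing_fields(verbatim: dict, interpreted: dict) -> list[str]:
    """Fields whose interpreted (corrected) value differs from the verbatim one."""
    return [key for key, value in interpreted.items() if verbatim.get(key) != value]


# The third label as written, and the corrected data for the same event.
VERBATIM = "Chihuahua, El Morrion, 67 km NW Chihuahua, 27-VIII-91, 1200 m"
CORRECTED = parse_label("Chihuahua, El Morrion, 67 km NE Chihuahua, 27-VIII-91, 1200 m")

if __name__ == "__main__":
    parsed = parse_label(VERBATIM)
    print(parsed)                               # parses cleanly despite the error
    print(differing_fields(parsed, CORRECTED))  # the discrepancy a user would see
```

Labels in other word orders, such as the first one (where the place name follows the offset), simply fail to match and return `None`; a real capture workflow would need far more flexible handling than one regular expression.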
>
> It's a complex issue.
>
> --
> Doug Yanega      Dept. of Entomology       Entomology Research Museum
> Univ. of California, Riverside, CA 92521-0314     skype: dyanega
> phone: (951) 827-4315 (disclaimer: opinions are mine, not UCR's)
>              https://faculty.ucr.edu/~heraty/yanega.html
>   "There are some enterprises in which a careful disorderliness
>         is the true method" - Herman Melville, Moby Dick, Chap. 82
>
> _______________________________________________
> Nhcoll-l mailing list
> Nhcoll-l at mailman.yale.edu
> https://mailman.yale.edu/mailman/listinfo/nhcoll-l
>
> _______________________________________________
> NHCOLL-L is brought to you by the Society for the Preservation of
> Natural History Collections (SPNHC), an international society whose
> mission is to improve the preservation, conservation and management of
> natural history collections to ensure their continuing value to
> society. See http://www.spnhc.org for membership information.
> Advertising on NH-COLL-L is inappropriate.
>


-- 

+++++++++++++++++++++++++++++++++++
*Derek S. Sikes*, Curator of Insects, Professor of Entomology
University of Alaska Museum (UAM)
University of Alaska Fairbanks
1962 Yukon Drive, Fairbanks, AK   99775-6960
dssikes at alaska.edu phone: 907-474-6278
he/him/his
University of Alaska Museum <https://www.uaf.edu/museum/collections/ento/>
- search 357,704 digitized arthropod records <http://arctos.database.museum/uam_ento>
+++++++++++++++++++++++++++++++++++

Interested in Alaskan Entomology? Join the Alaska Entomological
Society and / or sign up for the email listserv "Alaska Entomological
Network" at
http://www.akentsoc.org/contact_us