[NHCOLL-L:5554] Re: Cloud sourcing

Doug Yanega dyanega at ucr.edu
Thu Jul 14 12:34:46 EDT 2011


Paul Callomon wrote:

>Regarding transcriptions: during the 1990s we paid a team of people 
>to transcribe our hand-written ledgers (six shelf feet of them, 
>about 300,000 specimen records) into a database. Analyzing the 
>resulting data revealed persistent problems with word recognition. 
>That is, folks often incorrectly transcribed a word because being 
>from outside our world they could not make educated guesses based on 
>knowledge of the field. The data was very dirty indeed, and cleaning 
>it up is proving an almost never-ending task.
>Testing random sample errors, we find that reasonably proficient 
>taxonomists with a working knowledge of geography and foreign 
>languages would have guessed right almost every time. We couldn't 
>have paid them enough to do the job, however.

I had a postdoc in the early 1990s to transcribe label data (this 
was the beginning phase of the specimen database at the INHS), and - 
given that I'm still doing the same basic thing today (as project 
manager *and* data entry person, both) - would make two observations 
about the problems above:

(1) The georeferencing resources available at one's literal 
fingertips *today* were total pie-in-the-sky pipe dreams back then: 
things like Google Earth, the USGS GNIS, and the Fuzzy Gazetteer were 
not in the toolkit, but if they had been, that error rate you mention 
would have been far, far lower. Still, "a working knowledge of 
geography" is essential regardless of the sophistication of one's 
tools; we have always made this a prime criterion for hiring a data 
entry technician.
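
As a rough illustration of the kind of fuzzy place-name lookup meant 
in (1), here is a minimal sketch using Python's standard difflib 
module. The tiny GAZETTEER list and the suggest_places function are 
hypothetical stand-ins invented for this example, not part of any of 
the tools named above; a real gazetteer would be far larger and would 
be queried differently.

```python
import difflib

# Hypothetical mini-gazetteer, for illustration only.
GAZETTEER = ["Riverside", "Champaign", "Urbana", "Carbondale", "Kankakee"]


def suggest_places(transcribed, names=GAZETTEER, n=3, cutoff=0.7):
    """Return up to n gazetteer names that resemble a possibly misread label."""
    # Compare case-insensitively, but return the gazetteer's own spelling.
    lookup = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(
        transcribed.lower(), list(lookup), n=n, cutoff=cutoff
    )
    return [lookup[hit] for hit in hits]


if __name__ == "__main__":
    # A transcriber unfamiliar with Illinois might type "Champagne".
    print(suggest_places("Champagne"))
```

A lookup like this only proposes candidates; someone with a working 
knowledge of geography still has to decide which one, if any, is right.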

(2) Having a taxonomist, per se, is - in my experience - only 
important when one does not trust the existing identifications on 
one's specimens, or when the label data involve many scientific names 
*other than* the organism being databased (whose name should be in 
the system already, and never, ever manually typed in by a data entry 
person). Nonetheless, the idea that people with a professional 
background are too expensive to hire is no longer true; the job 
market is (sadly) so desperately thin that you can get piles of 
applications from people with PhDs for a $15/hour soft-money 
technician job such as this.

Sincerely,
-- 

Doug Yanega        Dept. of Entomology         Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314        skype: dyanega
phone: (951) 827-4315 (standard disclaimer: opinions are mine, not UCR's)
              http://cache.ucr.edu/~heraty/yanega.html
   "There are some enterprises in which a careful disorderliness
         is the true method" - Herman Melville, Moby Dick, Chap. 82

