[NHCOLL-L:5554] Re: Cloud sourcing
Doug Yanega
dyanega at ucr.edu
Thu Jul 14 12:34:46 EDT 2011
Paul Callomon wrote:
>Regarding transcriptions: during the 1990s we paid a team of people
>to transcribe our hand-written ledgers (six shelf feet of them,
>about 300,000 specimen records) into a database. Analyzing the
>resulting data revealed persistent problems with word recognition.
>That is, folks often incorrectly transcribed a word because being
>from outside our world they could not make educated guesses based on
>knowledge of the field. The data was very dirty indeed, and cleaning
>it up is proving an almost never-ending task.
>Testing random sample errors, we find that reasonably proficient
>taxonomists with a working knowledge of geography and foreign
>languages would have guessed right almost every time. We couldn't
>have paid them enough to do the job, however.
I had a postdoc in the early 1990s to transcribe label data (this
was the beginning phase of the specimen database at the INHS), and -
given that I'm still doing the same basic thing today (as both project
manager *and* data entry person) - I would make two observations
about the problems above:
(1) The georeferencing resources available at one's literal
fingertips *today* were total pie-in-the-sky pipe dreams back then:
things like Google Earth, the USGS GNIS, and the Fuzzy Gazetteer were
not in the toolkit, but if they had been, that error rate you mention
would have been far, far lower. Still, "a working knowledge of
geography" is essential regardless of the sophistication of one's
tools; we have always made this a prime criterion for hiring a data
entry technician.
(2) Having a taxonomist, per se, is - in my experience - only
important when one does not trust the existing identifications on
one's specimens, or when the label data involve many scientific names
*other than* the organism being databased (whose name should be in
the system already, and never, ever manually typed in by a data entry
person). Nonetheless, the idea that people with a professional
background are too expensive to hire is no longer true; the job
market is (sadly) so desperately thin that you can get piles of
applications from people with PhDs for a $15/hour soft-money
technician job such as this.
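To illustrate both points, here is a minimal sketch of checking a transcribed string against a controlled list (place names or taxon names) and offering the nearest entries to pick from, rather than accepting free typing. It uses Python's standard difflib as a stand-in; it is not how the Fuzzy Gazetteer or the GNIS actually match names, and the function name suggest_matches and the short PLACES list are made up for the example.

```python
import difflib


def suggest_matches(transcribed, authority, n=3, cutoff=0.6):
    """Return up to n entries of `authority` that resemble `transcribed`.

    Hypothetical helper: the data entry person picks from the returned
    list instead of typing the name by hand.
    """
    # Compare case-insensitively, but hand back the authority's own spelling.
    by_lower = {}
    for entry in authority:
        by_lower.setdefault(entry.lower(), entry)
    hits = difflib.get_close_matches(
        transcribed.lower(), list(by_lower), n=n, cutoff=cutoff
    )
    return [by_lower[h] for h in hits]


# Hypothetical authority list; a real one would come from a gazetteer
# or from the names already in the specimen database.
PLACES = ["Riverside", "Champaign", "Urbana", "Philadelphia"]

# A misread or mistyped ledger entry is matched to the closest known name.
print(suggest_matches("Philadelpia", PLACES))
```

An empty result is the useful case to flag for someone with a working knowledge of geography, since it means nothing in the list resembles what was transcribed.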
Sincerely,
--
Doug Yanega Dept. of Entomology Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314 skype: dyanega
phone: (951) 827-4315 (standard disclaimer: opinions are mine, not UCR's)
http://cache.ucr.edu/~heraty/yanega.html
"There are some enterprises in which a careful disorderliness
is the true method" - Herman Melville, Moby Dick, Chap. 82