[NHCOLL-L:5552] Cloud sourcing

Paul Callomon callomon at ansp.org
Wed Jul 13 22:08:40 EDT 2011


Regarding transcriptions: during the 1990s we paid a team of people to transcribe our handwritten ledgers (six shelf feet of them, about 300,000 specimen records) into a database. Analyzing the resulting data revealed persistent problems with word recognition. That is, folks often transcribed a word incorrectly because, being from outside our world, they could not make educated guesses based on knowledge of the field. The data was very dirty indeed, and cleaning it up is proving an almost never-ending task.
Testing a random sample of the errors, we find that reasonably proficient taxonomists with a working knowledge of geography and foreign languages would have guessed right almost every time. We couldn't have paid them enough to do the job, however.

Paul Callomon
Academy of Natural Sciences
Philadelphia PA

