[NHCOLL-L:5547] Re: Crowdsourcing labels
Doug Yanega
dyanega at ucr.edu
Wed Jul 13 14:27:03 EDT 2011
Chris Norris wrote:
>Would you consider using crowdsourcing methods to transcribe
>handwritten specimen labels? Are you horrified by the idea? Do you
>have any idea what crowdsourcing is?
>
>Rob Guralnick from the University of Colorado is looking for
>feedback on this issue from the collections community. You can read
>more about it, and comment, by following this link:
>
>http://soyouthinkyoucandigitize.wordpress.com/2011/07/11/old-weathers-crowd-and-the-challenge-of-digitization/
As I am in charge of a massive label transcription project, it may not
surprise you that I have a question that I see as crucial, but that is
not discussed on the linked webpage, nor in Penny's follow-up regarding
ALA.
Specifically: a crowdsourcing project requires, as an absolute first
step, that each and every unit (in this case, a specimen plus its
labels) (1) be assigned a unique identifying label if it does not
already have one, and (2) have its existing labels removed (generally;
this might be different with herbarium sheets), photographed, and
then placed back with the specimen.
The time, effort, and expense of getting just this first step done are
not trivial. They are so non-trivial, in fact, that I have to wonder
whether anyone has ever done an actual budgetary analysis that
compares the cost of taking all of those digital images (especially
the labor cost) with the alternative: namely, that instead of paying
a technician X amount per hour to handle specimens and photograph
labels, one pays that technician to handle specimens and simply
transcribe the labels.
Note that the crowdsourcing effort is not cost-free; there are
considerable expenses in designing, creating and maintaining the
specialized infrastructure that supports it (*not* counting the
underlying database), right down to the need to pay someone to write
instructions for the volunteers to follow, and those expenses have no
parallel in a project that simply hires technicians to enter data
directly.
Accordingly, the comparative costs are not something simple and
straightforward to establish.
Let's say project X hires a technician to take photographs, and this
technician manages to process 100 specimens per hour. The cost of
this step *per specimen* is not simply 1/100th of their hourly
salary, but must also include the investment in the camera and
software used to take the photos and put them online. The next step,
serving those images to volunteers and having them transcribe the
label data, requires, even if the volunteer labor is free, personnel
to design, create and maintain the supporting infrastructure, as noted
above (people who may be paid by the hour, in some cases). That also
adds to the cost per specimen.
Now, how does one compare that to project Y where a technician simply
sits down and types in label data, at a rate of 50 specimens per
hour? Which project is more cost-effective? It *might* be project Y,
since it has far fewer expenses. It might depend rather heavily on
the scale; a project involving only 50,000 specimens will see a much
worse payoff on "up-front" infrastructure investments than one that
involves 500,000, for example.
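
To make the shape of that comparison concrete, here is a rough sketch
in Python. Only the two processing rates (100 and 50 specimens per
hour) come from the example above; the hourly wage and the fixed-cost
totals are placeholder assumptions, not figures from any real project,
and volunteer labor is treated as free. Camera, software and
infrastructure costs are lumped into a single "fixed cost" that is
spread over the whole project.

```python
def cost_per_specimen(hourly_wage, specimens_per_hour, fixed_costs, n_specimens):
    """Technician labor per specimen plus fixed costs spread over the project."""
    labor = hourly_wage / specimens_per_hour
    amortized = fixed_costs / n_specimens
    return labor + amortized


def break_even_size(hourly_wage, rate_x, fixed_x, rate_y, fixed_y):
    """Project size at which X and Y cost the same per specimen.

    Returns None if X saves no technician labor per specimen, in which
    case its extra fixed costs are never recovered.
    """
    labor_saving = hourly_wage / rate_y - hourly_wage / rate_x
    if labor_saving <= 0:
        return None
    return max(0.0, (fixed_x - fixed_y) / labor_saving)


# Rates are taken from the example in the text.
RATE_X = 100  # specimens photographed per hour (project X)
RATE_Y = 50   # specimens transcribed per hour (project Y)

# Everything below is a hypothetical placeholder, chosen only for illustration.
WAGE = 20.0         # technician pay per hour
FIXED_X = 60_000.0  # camera, software, crowdsourcing infrastructure, instructions
FIXED_Y = 0.0       # direct data entry, assuming the database already exists

if __name__ == "__main__":
    for n in (50_000, 500_000):
        x = cost_per_specimen(WAGE, RATE_X, FIXED_X, n)
        y = cost_per_specimen(WAGE, RATE_Y, FIXED_Y, n)
        print(f"{n:>7} specimens: X = {x:.2f}, Y = {y:.2f} per specimen")
    print("break-even size:", break_even_size(WAGE, RATE_X, FIXED_X, RATE_Y, FIXED_Y))
```

With these made-up inputs, project Y is cheaper at 50,000 specimens
and project X is cheaper at 500,000, but the crossover point moves
with every assumption, which is why real budget numbers are needed.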
So, who, if anyone, has ever crunched the proper numbers to determine
which approach is more cost-effective?
Sincerely,
--
Doug Yanega Dept. of Entomology Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314 skype: dyanega
phone: (951) 827-4315 (standard disclaimer: opinions are mine, not UCR's)
http://cache.ucr.edu/~heraty/yanega.html
"There are some enterprises in which a careful disorderliness
is the true method" - Herman Melville, Moby Dick, Chap. 82