[NHCOLL-L:5550] Re: Crowdsourcing labels
Penny Berents
Penny.Berents at austmus.gov.au
Wed Jul 13 20:22:10 EDT 2011
Hi Doug,
At the AM we are attempting to do exactly the analysis that you describe. We are running a pilot project based on our cicada collection and with support from the ALA. We have a team of trained volunteers who take images of the specimens and labels and assign a unique identifier (registration number).
We have had assistance from the ALA to employ volunteer coordinators and to develop the web portal.
The assessment at the end of the pilot will examine the real costs of the project and the uptake by 'virtual volunteers' to transcribe labels.
Cheers
Penny
Dr Penny Berents
Head of Natural Science Collections
Australian Museum
6 College Street Sydney NSW 2010 Australia
t 61 2 9320 6134 f 61 2 9320 6210
www.australianmuseum.net.au
Inspiring the exploration of nature and cultures
-----Original Message-----
From: owner-nhcoll-l at lists.yale.edu [mailto:owner-nhcoll-l at lists.yale.edu] On Behalf Of Doug Yanega
Sent: Thursday, 14 July 2011 4:27 AM
To: christopher.norris at yale.edu
Cc: NHCOLL-L at lists.yale.edu
Subject: [NHCOLL-L:5547] Re: Crowdsourcing labels
Chris Norris wrote:
>Would you consider using crowdsourcing methods to transcribe
>handwritten specimen labels? Are you horrified by the idea? Do you
>have any idea what crowdsourcing is?
>
>Rob Guralnick from the University of Colorado is looking for
>feedback on this issue from the collections community. You can read
>more about it, and comment, by following this link:
>
>http://soyouthinkyoucandigitize.wordpress.com/2011/07/11/old-weathers-crowd-and-the-challenge-of-digitization/
As I am in charge of a massive label transcription project, it may
not surprise you that I have a question that I see as crucial, but
that is not discussed on the linked webpage, nor in Penny's follow-up
regarding ALA.
Specifically: a crowdsourcing project requires, as an absolute first
step, that each and every unit (in this case, a specimen plus its
labels) (1) is assigned a unique identifying label if it does not
already have one, and (2) has its existing labels removed (generally
- this might be different with herbarium sheets), photographed, and
then placed back with the specimen.
The time, effort, and expense of getting just this first step done is
not trivial. It is so non-trivial, in fact, that I have to wonder
whether anyone has ever done an actual budgetary analysis that
compares the cost of taking all of those digital images (especially
the labor cost) with the alternative: namely, that instead of being
paid X amount per hour to handle specimens and photograph labels, the
technician is paid to handle specimens and simply transcribe the
labels.
Note that the crowdsourcing effort is not cost-free; there are
considerable expenses designing, creating and maintaining the
specialized infrastructure that supports it (*not* counting the
underlying database), right down to the need to pay someone to write
instructions for the volunteers to follow, and those expenses have no
parallel in a project that simply hires technicians to enter data
directly.
Accordingly, the comparative costs are not something simple and
straightforward to establish.
Let's say project X hires a technician to take photographs, and this
technician manages to process 100 specimens per hour. The cost of
this step *per specimen* is not simply 1/100th of their hourly
salary, but must also include the investment in the camera and
software used to take the photos and put them online. The next step,
serving those images to volunteers and having them transcribe the
label data, even if the volunteer labor is free, requires the
personnel to design, create and maintain it, as noted above (people
who may be making hourly salaries, in some cases). That also adds to
the cost per specimen.
Now, how does one compare that to project Y where a technician simply
sits down and types in label data, at a rate of 50 specimens per
hour? Which project is more cost-effective? It *might* be project Y,
since it has far fewer expenses. It might depend rather heavily on
the scale; a project involving only 50,000 specimens will give much
worse payoffs for "up-front" infrastructure investments than one that
involves 500,000, for example.
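
The comparison between project X and project Y can be sketched as a small calculation. This is only an illustration under stated assumptions: the rates of 100 and 50 specimens per hour come from the example above, but the $20 hourly wage and the $60,000 in up-front costs for project X (camera, software, portal, volunteer instructions) are invented figures, and the function names are hypothetical. It also assumes both technicians earn the same wage, volunteer labor is free, and project Y has no fixed costs.

```python
def cost_per_specimen(hourly_wage, specimens_per_hour, fixed_costs, n_specimens):
    """Technician labor per specimen plus fixed costs spread over the whole project."""
    return hourly_wage / specimens_per_hour + fixed_costs / n_specimens


def break_even_specimens(hourly_wage, rate_x, fixed_x, rate_y, fixed_y=0.0):
    """Project size above which X (photograph, then crowdsource) beats Y (direct entry).

    Returns None if X saves no labor per specimen and so never catches up.
    """
    labor_saving = hourly_wage / rate_y - hourly_wage / rate_x
    if labor_saving <= 0:
        return None
    return (fixed_x - fixed_y) / labor_saving


if __name__ == "__main__":
    WAGE = 20.0        # assumed hourly wage in dollars (not from the thread)
    FIXED_X = 60000.0  # assumed up-front cost of project X (not from the thread)

    for n in (50_000, 500_000):
        x = cost_per_specimen(WAGE, 100, FIXED_X, n)
        y = cost_per_specimen(WAGE, 50, 0.0, n)
        print(f"{n:>7} specimens: X = ${x:.2f}, Y = ${y:.2f} per specimen")

    print("break-even size:", break_even_specimens(WAGE, 100, FIXED_X, 50))
```

With these made-up inputs, project Y is cheaper at 50,000 specimens and project X is cheaper at 500,000, with the crossover at roughly 300,000. Different wage or infrastructure figures would move that crossover considerably, which is why real numbers are needed.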
So, who, if anyone, has ever crunched the proper numbers to determine
which approach is more cost-effective?
Sincerely,
--
Doug Yanega Dept. of Entomology Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314 skype: dyanega
phone: (951) 827-4315 (standard disclaimer: opinions are mine, not UCR's)
http://cache.ucr.edu/~heraty/yanega.html
"There are some enterprises in which a careful disorderliness
is the true method" - Herman Melville, Moby Dick, Chap. 82