[NHCOLL-L:5550] Re: Crowdsourcing labels

Penny Berents Penny.Berents at austmus.gov.au
Wed Jul 13 20:22:10 EDT 2011


Hi Doug,
At the Australian Museum (AM) we are attempting exactly the analysis that you describe. We are running a pilot project based on our cicada collection, with support from the Atlas of Living Australia (ALA). We have a team of trained volunteers who take images of the specimens and labels and assign each specimen a unique identifier (registration number).
The ALA has helped us to employ volunteer coordinators and to develop the web portal.
The assessment at the end of the pilot will examine the real costs of the project and how readily 'virtual volunteers' take up the work of transcribing labels.
Cheers
Penny

Dr Penny Berents
Head of Natural Science Collections

Australian Museum
6 College Street Sydney NSW 2010 Australia 
t 61 2 9320 6134   f 61 2 9320 6210
www.australianmuseum.net.au

Inspiring the exploration of nature and cultures




-----Original Message-----
From: owner-nhcoll-l at lists.yale.edu [mailto:owner-nhcoll-l at lists.yale.edu] On Behalf Of Doug Yanega
Sent: Thursday, 14 July 2011 4:27 AM
To: christopher.norris at yale.edu
Cc: NHCOLL-L at lists.yale.edu
Subject: [NHCOLL-L:5547] Re: Crowdsourcing labels

Chris Norris wrote:

>Would you consider using crowdsourcing methods to transcribe 
>handwritten specimen labels? Are you horrified by the idea? Do you 
>have any idea what crowdsourcing is?
>
>Rob Guralnick from the University of Colorado is looking for 
>feedback on this issue from the collections community. You can read 
>more about it, and comment, by following this link:
>
>http://soyouthinkyoucandigitize.wordpress.com/2011/07/11/old-weathers-crowd-and-the-challenge-of-digitization/

Since I am in charge of a massive label transcription project, it may 
not surprise you that I have a question that I see as crucial, but 
that is not discussed on the linked webpage, nor in Penny's follow-up 
regarding ALA.

Specifically: a crowdsourcing project requires, as an absolute first 
step, that each and every unit (in this case, a specimen plus its 
labels) (1) is assigned a unique identifying label if it does not 
already have one, and (2) has its existing labels removed (generally 
- this might be different with herbarium sheets), photographed, and 
then placed back with the specimen.

The time, effort, and expense in getting just this first step done is 
not trivial. It is so non-trivial, in fact, that I have to wonder 
whether anyone has ever done an actual budgetary analysis that 
compares the cost of taking all of those digital images (especially 
the labor cost) with the alternative; namely, that instead of paying 
a technician X amount per hour to handle specimens and photograph 
labels, that technician is paid to handle specimens and simply 
transcribe the labels.

Note that the crowdsourcing effort is not cost-free; there are 
considerable expenses designing, creating and maintaining the 
specialized infrastructure that supports it (*not* counting the 
underlying database), right down to the need to pay someone to write 
instructions for the volunteers to follow, and those expenses have no 
parallel in a project that simply hires technicians to enter data 
directly.

Accordingly, the comparative costs are not something simple and 
straightforward to establish.

Let's say project X hires a technician to take photographs, and this 
technician manages to process 100 specimens per hour. The cost of 
this step *per specimen* is not simply 1/100th of their hourly 
salary, but must also include the investment in the camera and 
software used to take the photos and put them online. The next step, 
serving those images to volunteers and having them transcribe the 
label data, even if the volunteer labor is free, requires the 
personnel to design, create and maintain it, as noted above (people 
who may be making hourly salaries, in some cases). That also adds to 
the cost per specimen.
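
The per-specimen arithmetic in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions: the hourly wage and the fixed-cost total are made-up placeholders, not figures from any real project, and the function name is hypothetical. Only the rate of 100 specimens per hour comes from the example itself.

```python
def cost_per_specimen(hourly_wage, specimens_per_hour, fixed_costs, n_specimens):
    """Labor cost per specimen plus fixed costs spread over the whole project."""
    labor = hourly_wage / specimens_per_hour      # e.g. 1/100th of the hourly wage
    amortized = fixed_costs / n_specimens         # camera, software, portal, instructions
    return labor + amortized


# Assumed placeholder figures: a $20/hour technician and $30,000 of fixed
# costs (equipment plus crowdsourcing infrastructure) over 50,000 specimens.
imaging_cost = cost_per_specimen(20.0, 100, 30_000.0, 50_000)
print(f"Cost per specimen: ${imaging_cost:.2f}")
```

With these placeholder numbers the fixed costs, not the technician's time, make up most of the per-specimen cost, which is the point of the paragraph above.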

Now, how does one compare that to project Y where a technician simply 
sits down and types in label data, at a rate of 50 specimens per 
hour? Which project is more cost-effective? It *might* be project Y, 
since it has far fewer expenses. It might depend rather heavily on 
the scale; for example, a project involving only 50,000 specimens 
will get a much worse payoff from "up-front" infrastructure 
investments than one that involves 500,000.
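
To make the scale argument concrete, here is a rough sketch of a break-even calculation. It is a toy model, not an actual budgetary analysis: the wage and the fixed-cost figure are assumed placeholders, volunteer transcription is treated as free, and quality control is ignored. The rates (100 and 50 specimens per hour) and the two project sizes are the ones used in the example above; all function names are hypothetical.

```python
def total_cost(n_specimens, hourly_wage, specimens_per_hour, fixed_costs=0.0):
    """Fixed costs plus paid technician time for the whole project."""
    return fixed_costs + n_specimens * hourly_wage / specimens_per_hour


def break_even_size(hourly_wage, rate_x, rate_y, fixed_x, fixed_y=0.0):
    """Project size at which X (photograph) and Y (transcribe) cost the same.

    Returns None when X saves no labor per specimen, so it never catches up.
    """
    saving_per_specimen = hourly_wage / rate_y - hourly_wage / rate_x
    if saving_per_specimen <= 0:
        return None
    return (fixed_x - fixed_y) / saving_per_specimen


WAGE = 20.0          # assumed hourly wage
FIXED_X = 30_000.0   # assumed fixed costs for project X

print("Break-even size:", break_even_size(WAGE, 100, 50, FIXED_X))
for n in (50_000, 500_000):
    x = total_cost(n, WAGE, 100, FIXED_X)
    y = total_cost(n, WAGE, 50)
    cheaper = "X" if x < y else "Y"
    print(f"{n} specimens: X = {x:.0f}, Y = {y:.0f}, cheaper: {cheaper}")
```

Under these assumptions project Y is cheaper at 50,000 specimens and project X at 500,000, but the crossover point moves with every input, so real figures are needed before drawing any conclusion.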

So, who, if anyone, has ever crunched the proper numbers to determine 
which approach is more cost-effective?

Sincerely,
-- 

Doug Yanega        Dept. of Entomology         Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314        skype: dyanega
phone: (951) 827-4315 (standard disclaimer: opinions are mine, not UCR's)
              http://cache.ucr.edu/~heraty/yanega.html
   "There are some enterprises in which a careful disorderliness
         is the true method" - Herman Melville, Moby Dick, Chap. 82


