[Nhcoll-l] Fwd: iDigBio Augmenting OCR October Workshop, February Hackathon Invitation

Karen Reeds karen.reeds at verizon.net
Wed Aug 22 16:17:57 EDT 2012


Dear Deb    8/22/2012

I hope you can record as much information as possible about 
collectors' affiliations, annotators of the label, and miscellaneous 
notes of economic/ethnographic/medical interest.

Digitized herbaria at the NY Botanical Garden, Natural History Museum 
(London), Academy of Natural Sciences (Philadelphia), Linnean Society 
of London, and Swedish Museum of Natural History have already proved 
invaluable to me as a historian of botany and exhibition curator.

Thank you all for the care and work that is going into these efforts!

Karen

cc: History of Natural History <HIST-NAT-HIST at JISCMAIL.AC.UK>

>Date: Wed, 22 Aug 2012 15:51:05 -0400
>From: Deb Paul <dpaul at fsu.edu>
>Subject: [Nhcoll-l] iDigBio Augmenting OCR October Workshop,
>	February Hackathon Invitation
>
>Augmented OCR Best Practices Workshop and Hack-a-thon Planning
>
>iDigBio (https://www.idigbio.org/) is running a workshop (October 
>1-2, 2012) and hack-a-thon (February 2013) to identify best 
>practices and develop tools to get information from museum labels 
>into computers.
>
>We are seeking individuals to participate in the "iDigBio Augmenting 
>OCR" workshop on October 1-2. The objective of the workshop is to 
>improve OCR output, and the algorithms that subsequently process it, 
>so that the content of biological collection specimen labels and 
>notes can be extracted and inserted efficiently and accurately into 
>a database for future use. Participants in the October workshop plan 
>to narrow the focus of the hack-a-thon to specific programmatic goals 
>for the software developers who will attend it in February 2013.
>
>Broadly, digitization can involve four main steps: create an image; 
>convert the image to text using Optical Character Recognition (OCR) 
>and/or human typists; break the text into semantically useful fields 
>such as family, scientific name, collector, date collected, location, 
>habitat, growth habit, and other fields; and finally format this 
>information for insertion into a database. Participants will help 
>identify and collect images representative of those the biology 
>community will need to process. This collection of images will serve 
>as the working set for developers at the February hack-a-thon.
>
>The October workshop participants plan to identify OCR output 
>products that will be useful to the community, as well as metrics 
>for evaluating how well different automation approaches produce 
>these products. These may include measures of OCR accuracy, of the 
>accuracy of automated error correction, and of how effectively text 
>is broken into meaningful semantic units (e.g., precision, recall, 
>and F-score). We seek biologists, programmers, and others involved 
>in the digitization process to take part in the October workshop, 
>help plan the February hack-a-thon, and participate in the 
>hack-a-thon itself.
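As a rough illustration of the precision, recall, and F-score metrics mentioned for evaluating how well label text is broken into semantic units, here is a minimal Python sketch that scores automatically parsed label fields against a hand-keyed transcription. It assumes, hypothetically, that each label is represented as a dictionary mapping field names to values and that a field counts as correct only on an exact, case-insensitive match. The function name, field names, and sample values are invented for illustration; this is not the working group's evaluation method.

```python
from typing import Dict, Tuple


def field_scores(predicted: Dict[str, str],
                 gold: Dict[str, str]) -> Tuple[float, float, float]:
    """Hypothetical scorer: precision, recall, and F1 over (field, value) pairs.

    A predicted pair is a true positive only if the same field holds the
    same value (after trimming whitespace and lowercasing) in the gold record.
    Empty values are ignored on both sides.
    """
    pred_pairs = {(k, v.strip().lower()) for k, v in predicted.items() if v.strip()}
    gold_pairs = {(k, v.strip().lower()) for k, v in gold.items() if v.strip()}
    true_pos = len(pred_pairs & gold_pairs)

    precision = true_pos / len(pred_pairs) if pred_pairs else 0.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: a hand-keyed record versus parser output on the same label.
gold = {
    "family": "Rosaceae",
    "scientific_name": "Rosa carolina",
    "collector": "J. Smith",
    "date_collected": "1887-06-12",
}
predicted = {
    "family": "Rosaceae",
    "scientific_name": "Rosa caroIina",  # OCR misread: capital I for l
    "collector": "J. Smith",
}

p, r, f = field_scores(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
# prints: precision=0.667 recall=0.500 f1=0.571
```

Exact matching is deliberately strict: the single misread in the scientific name both adds a false positive and leaves a gold field unmatched. A real evaluation might instead use a character-level similarity measure so that near misses earn partial credit.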
>
>Anyone can view our wish list of possible goals for optimizing the 
>machine and natural language processing algorithms used on OCR 
>output from specimen labels at 
>http://tinyurl.com/OCRHackathonWishList
>
>If you are interested in participating or would like to know more, 
>please email Debbie Paul (dpaul at fsu.edu) as soon as possible. 
>The deadline for the October 1-2 workshop is Thursday, August 30th.
>
>Looking forward to your participation,
>  From all of us in the iDigBio Augmenting OCR Working Group
>Please forward to other interested listservs. Thanks!
>
>--
>Deborah Paul
>User Services, iDigBio
>Institute for Digital Information, iDigInfo
>Florida State University
>Tallahassee, Florida 32308
>850-644-6366
>


-- 
Karen Reeds, PhD, FLS 	karen.reeds at verizon.net
  Princeton Research Forum, a community of independent scholars 
http://www.princetonresearchforum.org/

Guest Curator, Botanica Magnifica: Photographs by Jonathan Singer
Exhibition -- through August 26, 2012  LAST WEEK!
New Jersey State Museum, 205 West State Street, Trenton, NJ  609 292-6464
Tues-Sat 9-4:45 pm, Sunday 12-4. Closed Mondays and public holidays. 
Free admission!
http://www.state.nj.us/state/museum/dos_museum_exhibit-singer.html
http://njstatemuseum.blogspot.com/search?updated-max=2012-01-24T13:57:00-05:00&max-results=5
http://www.princetonmagazine.com/wordpress/?p=789
http://www.jonathan-singer-photography.com/

http://www.nytimes.com/2012/08/05/nyregion/botanica-magnifica-photographs-by-jonathan-singer-is-at-the-new-jersey-state-museum.html

Just in:
http://www.newsworks.org/index.php/new-jersey-more/item/43087-rare-flowers-exquisite-detail-figure-in-nj-photographers-work?Itemid=4
