<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    What: Visualize Your Text Data Using OCR Output<br>

    Why: Fast access to your data, reveal the unexpected<br>

    When: Wednesday, 22 January 2014, 10 AM EST<br>

    Where: <a class="moz-txt-link-freetext"

      href="http://idigbio.adobeconnect.com/augmentocr">http://idigbio.adobeconnect.com/augmentocr</a><br>

    Who: All Are Welcome!<br>

    <p class="MsoNormal">Note: Headsets are recommended for the best

      experience with <a

        href="https://www.idigbio.org/wiki/index.php/Web_Conferencing">AdobeConnect</a>.

      If this is your first time using AdobeConnect, please log in 15

      minutes early.</p>

    <p class="MsoNormal">Twitter: @iDigBio #citscribe #ocrviz</p>

    <p class="MsoNormal">See your data in a whole new way!

      Museum specimen labels, note cards, field notebooks, ledgers and

      other primary source materials are being imaged in many

      digitization projects. Other projects plan to OCR their materials

      or have questions about what they can do with the output.<br>

    </p>

    <p class="MsoNormal">OCR text output from these sources opens a

      window to your data, <i style="mso-bidi-font-style:normal">before</i>

      the data elements are entered into the database fields. It gives

      you unprecedented, fast access to your data, revealing insights to

      facilitate research, data validation, and public participation in

      science. Come see a demonstration of how you might do this with

      OCR output. As part of the recent <a

        href="https://www.idigbio.org/content/citscribe-hackathon">iDigBio

        CITScribe Hackathon</a>, <a

href="https://www.facebook.com/photo.php?fbid=645283398848943&amp;l=bbbcf70f3b">the

LlLl

        team</a> demonstrated one technique to do this visualization

      with <a href="http://search.carrot2.org/stable/search">Carrot<sup>2</sup></a>

      and <a href="https://developers.google.com/chart/">Google Charts</a>

      using OCR output <a

href="http://www.techopedia.com/definition/1210/index-idx-database-systems">indexed</a>

      by <a href="http://lucene.apache.org/solr/">Apache Solr</a> and <span

        style="font-size:11pt;font-family:Calibri,sans-serif">highlighting

        OCR errors using an n-gram model, a probabilistic model for

        estimating the likelihood that a string is a valid word.</span>

      Find what you want fast, and <i

        style="mso-bidi-font-style:normal">discover</i> unexpected,

      informative search terms. The same approach can be used to guide

      what needs to be validated using crowdsourcing outputs, on a

      per-field basis. All are welcome.<br>

      <br>

      See you there! Yes, please share the link, spread the word, and

      yes, it will be recorded.<br>

      Andrea M, Jason B, Miao C, Sylvia O, Reed B, William U and @idbdeb

      from the @idigbio #citscribe <a

href="https://www.facebook.com/photo.php?fbid=645283398848943&amp;l=bbbcf70f3b">LlLl

        Team</a>, et al. from the iDigBio CITScribe Hackathon and iDigBio<br>

      <br>

      NB. Work inspired by a Biodiversity Information Standards (TDWG)

      2013 talk <a

href="http://www.tdwg.org/fileadmin/2013conference/slides/Drinkwater_OCRforHerbaria.pptx">The

        use of OCR in the digitisation of herbarium specimens</a>, by Robyn

      Drinkwater, Robert Cubey, and Elspeth Haston, RBGE. </p>
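
    <p class="MsoNormal">For readers curious about the n-gram idea
      mentioned above, here is a minimal sketch, not the hackathon
      team's code. It assumes a character bigram model with add-one
      smoothing, trained on a small list of known-good words, and
      scores each token by its average log-probability. The function
      names, the padding characters and the threshold are all
      illustrative assumptions.</p>

```python
import math
from collections import Counter


def train_char_bigrams(words):
    """Count character bigrams and their left-hand contexts.

    Assumption: `words` is a list of known-good words (for example a
    dictionary or cleaned label text). '^' and '$' mark word boundaries.
    """
    bigrams, contexts, alphabet = Counter(), Counter(), set()
    for word in words:
        padded = "^" + word.lower() + "$"
        alphabet.update(padded)
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, len(alphabet)


def avg_log_prob(token, model):
    """Average add-one-smoothed log-probability of the token's bigrams."""
    bigrams, contexts, vocab_size = model
    padded = "^" + token.lower() + "$"
    pairs = list(zip(padded, padded[1:]))
    total = sum(
        math.log((bigrams[p] + 1) / (contexts[p[0]] + vocab_size))
        for p in pairs
    )
    return total / len(pairs)


def flag_suspect_tokens(text, model, threshold):
    """Return whitespace-separated tokens scoring below the threshold."""
    return [t for t in text.split() if avg_log_prob(t, model) < threshold]


# Tiny illustrative training list; a real model would need far more text.
model = train_char_bigrams([
    "herbarium", "specimen", "collector", "florida", "county",
    "museum", "label", "notebook", "ledger", "field",
])
good = avg_log_prob("herbarium", model)
bad = avg_log_prob("hxrbqrzvm", model)
```

    <p class="MsoNormal">Tokens that score well below typical words,
      such as the garbled string here, are candidates for highlighting
      as possible OCR errors. A suitable threshold would have to be
      tuned on real OCR output.</p>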

    keywords: OCR ML NLP SOLR GoogleCharts CARROT2

    <pre class="moz-signature" cols="72">-- 

Upcoming iDigBio Events <a class="moz-txt-link-freetext" href="https://www.idigbio.org/outreach-events-sidebar">https://www.idigbio.org/outreach-events-sidebar</a>

--Deborah Paul

iDigBio Technology Specialist

Institute for Digital Information, 234 LSB

Florida State University

Tallahassee, Florida 32306

850-644-6366</pre>

  </body>

</html>