[NHCOLL-L:2032] Fwd: Fwd: Genbank

John Deck jdeck at socrates.Berkeley.EDU
Mon Aug 11 17:43:36 EDT 2003


While specimen_voucher is the likely field for storing specimen voucher
information in GenBank, it is not always used, and when it is used it often
references specimens in non-standard ways.  For instance: 1) it may
reference a frozen tissue collection number rather than the standard
catalog number, which is the more common way of discovering specimen
metadata; 2) it usually does not reference the collection, even though
catalog numbers are often not unique across collections; 3) users sometimes
enter specimen voucher data in the wrong field; and 4) there are many other
data inconsistencies, too numerous to list here.

My recommendation is to adopt the Darwin Core standard
(http://tsadev.speciesanalyst.net/documentation/ow.asp?DarwinCoreV2) for
identifying specimens within GenBank (in place of the specimen_voucher
field).  It is fairly simple, requiring only three fields to uniquely
identify a specimen:

InstitutionCode
CollectionCode
CatalogNumber

Though still not a perfect solution, having these three fields available for
data input on the GenBank side should go a long way in cleaning up some of
the many data inconsistencies I have found in perusing accessions from many
different types of collections.
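
As a rough illustration of why the three fields matter together, here is a
minimal Python sketch.  The class name SpecimenKey, the colon-delimited
text form, and the sample values are all assumptions made for this example;
they are not part of Darwin Core or of any GenBank format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecimenKey:
    """The three Darwin Core fields that together identify one specimen."""
    institution_code: str
    collection_code: str
    catalog_number: str

    @classmethod
    def parse(cls, text: str, sep: str = ":") -> "SpecimenKey":
        """Parse 'Institution:Collection:CatalogNumber'.

        The delimited form is a hypothetical convention for this sketch.
        """
        parts = [p.strip() for p in text.split(sep)]
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected three non-empty fields, got {text!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return ":".join(
            (self.institution_code, self.collection_code, self.catalog_number)
        )


# Made-up values: the same catalog number in two collections of one institution.
a = SpecimenKey.parse("MUS:Herp:12345")
b = SpecimenKey.parse("MUS:Mamm:12345")

# The catalog number alone is ambiguous; the full key is not.
print(a.catalog_number == b.catalog_number)
print(a == b)
```

Because the key is a frozen dataclass it is hashable, so it could serve
directly as a dictionary key when matching museum records against accessions.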

Finally, I've developed an online application to help anyone link their
specimen database to prior GenBank submissions
(http://bnhmdev.berkeley.edu/genbank/index.php).  This is (*hopefully*) a
temporary application, in that the proper data should be stored at GenBank
and at museums, not in a separate database.  It is, however, useful for
cleaning up years of disjointed data and for helping museums collect the
data that match their institution code.

John Deck
Berkeley Natural History Museums Informatics Coordinator
(510) 643-3191


