[NHCOLL-L:1769] Re: GenBank

Una Smith una at lanl.gov
Fri Nov 22 14:07:44 EST 2002


On Fri, Nov 22, 2002 at 01:02:02PM -0500, Gregory Watkins-Colwell wrote:
>Today I was given a reprint from somebody who borrowed YPM specimens 
>recently and conducted a molecular analysis.  They deposited their 
>sequences at GenBank (at the requirement of the Journal they used for the 
>paper).  In the paper, they cite the GenBank "Accession Number" and not any 
>museum catalog numbers.  Nowhere in the paper is there an appendix or 
>anything indicating which sequences have voucher specimens or where those 
>voucher specimens might be.  Has this happened to anyone else before?  Does 
>GenBank keep any records that would link their "Accession Number" to the 
>actual catalog number for the voucher specimen?

Yes, it happens all the time.

Details of the sample source are normally included in the notes
section of each GenBank accession record.  If the authors did
not include such notes initially, you can ask them to submit an
update to the GenBank accession record, giving the provenance
of the sequences.  You can also ask them to give you this data,
for YPM's own records.
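
For anyone who wants to check a batch of records for provenance, here is a minimal Python sketch that pulls the qualifiers out of the "source" feature of a GenBank flat-file record.  It assumes the voucher was recorded in the /specimen_voucher qualifier, which authors may or may not have filled in.  The sample record, organism, and catalog number are invented for illustration, and the function name is my own.

```python
import re

# Hypothetical excerpt of a GenBank flat-file record. The organism and
# the catalog number are made up for illustration only.
SAMPLE_RECORD = '''\
FEATURES             Location/Qualifiers
     source          1..657
                     /organism="Plethodon cinereus"
                     /mol_type="genomic DNA"
                     /specimen_voucher="YPM 000000"
                     /note="tissue from liver; collected in
                     Connecticut"
     gene            1..657
                     /gene="cytb"
'''


def source_qualifiers(flat_text):
    """Return the qualifiers of the first 'source' feature as a dict."""
    qualifiers = {}
    in_source = False
    current_key = None
    for line in flat_text.splitlines():
        # Feature keys are indented by exactly five spaces.
        feature = re.match(r'^ {5}(\S+)\s+\S', line)
        if feature:
            if in_source:
                break  # reached the feature after 'source'
            in_source = feature.group(1) == 'source'
            current_key = None
            continue
        if not in_source:
            continue
        text = line.strip()
        qualifier = re.match(r'^/(\w+)=(.*)$', text)
        if qualifier:
            current_key = qualifier.group(1)
            qualifiers[current_key] = qualifier.group(2).strip('"')
        elif text.startswith('/'):
            current_key = None  # value-less qualifier; ignored here
        elif current_key:
            # Continuation line of a multi-line qualifier value.
            joined = qualifiers[current_key] + ' ' + text.strip('"')
            qualifiers[current_key] = joined.strip()
    return qualifiers


if __name__ == '__main__':
    found = source_qualifiers(SAMPLE_RECORD)
    print(found.get('specimen_voucher', 'no voucher recorded'))
```

A record with no voucher qualifier simply yields a dict without that key, which is the case where you would write to the authors.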


>Do most molecular-based journals rely entirely upon the GenBank number and 
>not the museum catalog numbers for such things?  This just seems wrong.

Yes.  All most journals require is a GenBank accession number;
details of provenance are supposed to be in the accession record.

GenBank accession records are normally embargoed for some time
after submission to GenBank.  This allows the authors of a paper
to include the required accession numbers in their manuscripts
without giving public access to the sequences in GenBank before
the manuscript is published.  During this embargo period, the
journal editors (or more importantly, the reviewers) cannot get
the accession records from GenBank.  So it isn't fair to expect
them to police this aspect of the authors' work.

For what it's worth, I spend a lot of time annotating this kind 
of data for HIV.  See the search interface to the HIV Sequence
Database at http://hiv-web.lanl.gov/.  All our records come via
GenBank.  We then get the original papers and read them, get any
relevant papers they cite (ones that contain no sequence info but
report some details of the patients), run analyses, and often
write to the authors for clarification.  Good annotation is a
huge, tedious, endless, and often thankless task!  It is common
for multiple sequences to be obtained from a single sample, and
for multiple samples to be obtained from a single patient, yet
these linkages are usually not recorded in the GenBank records.
We get to know the literature pretty well, though, so we can
spot these linkages and write to the authors for confirmation.
The HIV sequence database lists over 70,000 sequences and
thousands of patients with multiple sequences.
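
To make the linkage idea concrete, here is a small Python sketch of grouping accessions by sample and by patient once those links have been confirmed.  This is only an illustration, not how our database is actually built; the accessions, patient labels, and sample labels are all invented placeholders.

```python
from collections import defaultdict

# Invented placeholder records: (accession, patient, sample).
RECORDS = [
    ("ACC0001", "patient-A", "sample-1"),
    ("ACC0002", "patient-A", "sample-1"),
    ("ACC0003", "patient-A", "sample-2"),
    ("ACC0004", "patient-B", "sample-3"),
]


def link_records(records):
    """Group accessions as patient -> sample -> [accessions]."""
    linked = defaultdict(lambda: defaultdict(list))
    for accession, patient, sample in records:
        linked[patient][sample].append(accession)
    return linked


def patients_with_multiple_sequences(linked):
    """Return the sorted patients that have more than one sequence."""
    return sorted(
        patient
        for patient, samples in linked.items()
        if sum(len(accessions) for accessions in samples.values()) > 1
    )


if __name__ == "__main__":
    linked = link_records(RECORDS)
    for patient in patients_with_multiple_sequences(linked):
        print(patient, dict(linked[patient]))
```

The same grouping would work for museum material, with the catalog number standing in for the sample and the specimen lot or individual standing in for the patient.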

	Una Smith

Los Alamos National Laboratory, MS K-710, Los Alamos, NM  87545

