[Nhcoll-l] Digital objects vs. physical objects in collection management databases and how to manage them

Tue Aug 25 19:02:49 EDT 2020

Speaking from the perspective of managing diverse marine invertebrates:
we would consider any digital products to be derivatives of (and hence
"filed with" or linked to) the original specimen. Linking an
image or a CT scan back to a specimen (to be precise: to a specimen record
distinguished by a unique specimenID) would be treated just like linking a
molecular sequencing extract back to a specimen record.

The workflow that often goes with marine invertebrates has led us away from
the "prep" concept (which seems to come naturally for vertebrate
collections where there is a roughly fixed number of kinds of preps for
specimens) to a parent/child concept. A "specimen" can be an unsorted lot
(multi-phyletic), or a jar of individuals assumed to be the same species,
or one critter in a jar. The moment an individual (or sub-lot) is removed
or distinguished in some way, it gets a new specimenID that is linked as a
"child" of the "parent" specimen record. That process can continue (all the
worms from an unsorted lot could be given their own specimenID as a lot,
and then individuals pulled from the worm jar could each get their own
specimenID when they're isolated for sequencing). The idea is that anything
you will want to refer to individually later (in a publication, etc.)
should get its own specimenID.
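
A minimal sketch of that parent/child linkage, for illustration only. The
names used here (Specimen, Catalog, parent_id, lineage) are hypothetical
and do not describe any particular collection's schema:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Specimen:
    specimen_id: int
    description: str
    parent_id: Optional[int] = None  # None marks a top-level record


class Catalog:
    """Toy in-memory catalog; a real system would keep this in a database."""

    def __init__(self) -> None:
        self.records: Dict[int, Specimen] = {}
        self._next_id = 1

    def add(self, description: str,
            parent: Optional[Specimen] = None) -> Specimen:
        # Every record gets its own specimenID; a child also stores its parent's.
        rec = Specimen(self._next_id, description,
                       parent.specimen_id if parent else None)
        self.records[rec.specimen_id] = rec
        self._next_id += 1
        return rec

    def lineage(self, rec: Specimen) -> List[int]:
        # Walk parent links from a record up to its top-level ancestor.
        chain = [rec.specimen_id]
        while rec.parent_id is not None:
            rec = self.records[rec.parent_id]
            chain.append(rec.specimen_id)
        return chain


catalog = Catalog()
lot = catalog.add("unsorted lot, multi-phyletic")
worms = catalog.add("all worms from the lot", parent=lot)
worm = catalog.add("one worm isolated for sequencing", parent=worms)
print(catalog.lineage(worm))  # [3, 2, 1]
```

Each split just adds one record and one parent pointer, so the chain can
be as deep as the sorting workflow requires.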

Now, having said that, and thinking about images, CT scans, and molecular
extracts, I suspect that I think about those in much the same way that
vertebrate people think about "preps". An image never gets its own
specimenID, but always points to a specimenID (yes, there's an imageID for
the image record, but that's just the identifier for the image record and
does not pretend to be anything else). The same is true for molecular
extracts, and would be for CT scans if we had any. Each of those derived
products (images, extracts, CT scans) would probably get its own table in a
database, since there are properties unique to it that are irrelevant to
the others. But they exist as derivatives of the specimen identified by its
specimenID (and use that specimenID to maintain the linkage). It seems to
me that this is analogous to the "prep" idea, which has a defined set of
physical derivatives that can come from a specimen, each one of which may
have its own properties. Some of these derivatives (images, CT scans) are
electronic-only, and some are physical (molecular extract), but they're
treated similarly as defined derivatives of specimens.
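
One way to picture that layout is a table per derivative type, each
carrying the specimenID of the record it came from. The sketch below uses
Python's built-in sqlite3 module; the table and column names (specimen,
image, extract, file_name, tissue_type) and the derivatives() helper are
assumptions made for the example, not an actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the links
conn.executescript("""
CREATE TABLE specimen (
    specimen_id INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE image (
    image_id    INTEGER PRIMARY KEY,  -- identifies the image record only
    specimen_id INTEGER NOT NULL REFERENCES specimen(specimen_id),
    file_name   TEXT NOT NULL
);
CREATE TABLE extract (
    extract_id  INTEGER PRIMARY KEY,
    specimen_id INTEGER NOT NULL REFERENCES specimen(specimen_id),
    tissue_type TEXT NOT NULL
);
""")

conn.execute("INSERT INTO specimen VALUES (1, 'one critter in a jar')")
conn.execute("INSERT INTO image VALUES (10, 1, 'dorsal.jpg')")
conn.execute("INSERT INTO image VALUES (11, 1, 'ventral.jpg')")
conn.execute("INSERT INTO extract VALUES (20, 1, 'leg muscle')")


def derivatives(conn, specimen_id):
    """List (kind, id) for every derivative linked to one specimenID."""
    return conn.execute("""
        SELECT 'image' AS kind, image_id AS id
          FROM image WHERE specimen_id = ?
        UNION ALL
        SELECT 'extract', extract_id
          FROM extract WHERE specimen_id = ?
        ORDER BY kind, id
    """, (specimen_id, specimen_id)).fetchall()


print(derivatives(conn, 1))
# [('extract', 20), ('image', 10), ('image', 11)]
```

With foreign keys switched on, an image row that points at a specimenID
that does not exist is rejected, so a derivative cannot lose its specimen.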

A field we maintain in the specimen table is "specimen exists (yes/no)". We
have situations where there are derivative products but no physical
specimen remains in the collection: when the specimen was consumed for
molecular analysis, for example. But that same system would accommodate a
situation where there never was a physical specimen, only an image, for
example. We're mostly specimen-based (not so much observation-based), so
in practice almost all such cases in our collection started with a
physical specimen.
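
As a rough sketch of how such a flag could keep collection counts
separate from derivative-only records (the SpecimenRecord class and the
two helper functions are hypothetical names chosen for the example):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpecimenRecord:
    specimen_id: int
    specimen_exists: bool  # the "specimen exists (yes/no)" field
    derivatives: List[str] = field(default_factory=list)


records = [
    SpecimenRecord(1, True, ["image"]),
    SpecimenRecord(2, False, ["extract"]),  # consumed for molecular analysis
    SpecimenRecord(3, False, ["image"]),    # image only, never a specimen
]


def physical_count(recs: List[SpecimenRecord]) -> int:
    # Count only records whose physical specimen is still in the collection.
    return sum(1 for r in recs if r.specimen_exists)


def derivative_only(recs: List[SpecimenRecord]) -> List[int]:
    # Records that survive only through their derivative products.
    return [r.specimen_id for r in recs
            if not r.specimen_exists and r.derivatives]


print(physical_count(records))   # 1
print(derivative_only(records))  # [2, 3]
```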

-Dean
-- 
Dean Pentcheff
pentcheff at gmail.com
pentcheff at nhm.org <dpentche at nhm.org>
https://research.nhm.org/disco

On Tue, Aug 25, 2020 at 2:30 PM Bentley, Andrew Charles <abentley at ku.edu>
wrote:

> Zach
>
>
>
> Thanks for this articulation of your process.  With regard to the issues
> you raise (which are similar to issues raised in other replies I have
> received):
>
>
>
>    1. Space – not sure this is a valid reason for using preps given that
>    there are other mechanisms of tagging individuals in a lot rather than
>    separating them out as you indicate – wrapping in cheesecloth or tagging.
>    Also, imaged individuals can usually be discerned simply by visual
>    inspection of distinguishing characteristics – fin damage, body shape,
>    size, etc.
>    2. Searching - not sure this is a valid reason either given that it is
>    just as easy to search a database for those records that have attachments
>    as searching for preps.  You could add metadata about the attachment that
>    could facilitate searching for various kinds of attachments in the same
>    manner.
>    3. Data integration – These attachments are still published to the
>    aggregators as associated with the occurrence record through extensions or
>    otherwise, even though they are not preps.  See this example from my
>    collection where Genbank sequences, images and citations are all published
>    as part of the record of this tissue -
>    https://www.gbif.org/occurrence/656980275.  CT scans would similarly
>    be included as linkages to Morphosource.
>
>
>
> I am still stuck thinking that an image or a CT scan is simply a digital
> representation of a specimen and not a prep in the traditional sense but
> maybe I am thinking too narrowly.  I have yet to see a compelling argument
> for preps.  For instance, if you were to scan a publication or field
> notebook, would this represent a separate “prep” of the publication or
> field note page or is it simply a digital representation of the same
> thing?  Is the distinction that more information can be gleaned from a CT
> scan than can be gleaned from the specimen itself without dissection?  Is
> that true of an image?  What more information is available besides
> coloration from an image taken while alive or shortly after euthanizing?  I
> am still worried by the possible confusion with collection stats and
> digital representations being counted as specimens.  I am also worried
> about the process of publishing data to aggregators where currently digital
> media are published as part of an Audubon core extension and not as
> occurrences (which they would be as preps).
>
>
>
> Still mulling this over in my brain but it would be great if we had some
> community consensus as to how to treat these things – which there currently
> is not, given the replies I have been receiving.  I will admit that some of
> my thinking is driven by the Specify data model that we use for our
> collections.  I would be interested in hearing how other CMSs deal with
> these, or whether it is similarly all over the map.
>
>
>
> Andy
>
>
>
>      A  :                A  :               A  :
>
>  }<(((_°>.,.,.,.}<(((_°>.,.,.,.}<)))_°>
>
>      V                   V                  V
>
> Andy Bentley
>
> Ichthyology Collection Manager
>
> University of Kansas
>
> Biodiversity Institute
>
> Dyche Hall
>
> 1345 Jayhawk Boulevard
>
> Lawrence, KS, 66045-7561
>
> USA
>
>
>
> Tel: (785) 864-3863
>
> Fax: (785) 864-5335
>
> Email: abentley at ku.edu
>
> http://ichthyology.biodiversity.ku.edu
>
>
>
>      A  :                A  :                A  :
>
>  }<(((_°>.,.,.,.}<(((_°>.,.,.,.}<)))_°>
>
>      V                   V                   V
>
>
>
>
>
> *From: *"zrandall at flmnh.ufl.edu" <zrandall at flmnh.ufl.edu>
> *Date: *Tuesday, August 25, 2020 at 3:22 PM
> *To: *"SchindelD at si.edu" <schindeld at si.edu>, Andrew Bentley <
> abentley at ku.edu>, "nhcoll-l at mailman.yale.edu" <nhcoll-l at mailman.yale.edu>
> *Subject: *RE: Digital objects vs. physical objects in collection
> management databases and how to manage them
>
>
>
> Hi Andy,
>
>
>
> This is a great topic. We produce a large number of 2D images (live and
> preserved) and CT data for our fish collection here at UF. We treat these
> data as prep types for a collection object. A major reason for this
> approach and not separating lots is to conserve collection space. Given the
> rate at which we are imaging our specimens, we wouldn’t be able to also support
> our future growth of newly acquired collections. Additionally, we see the
> value of having image data as prep types so that the collection object can
> be the one stop for all “metadata” including from other individuals from
> the same lot. For example, although we try to CT scan individuals from
> tissued lots to increase data value, we usually don’t scan the individuals
> that were tissued because it would be a loss in morphological data captured
> (e.g. missing fins, epaxial tissue, etc.).  One catalog number to rule them
> all. Guess we’re a bunch of lumpers at heart.
>
>
>
> Our system for tracking down the imaged individual in a lot is still being
> improved (luckily we rarely get those types of requests). When possible, we
> image lots with only one individual. If we image an individual from a lot
> with several individuals, we wrap that individual in cheesecloth and/or
> include tag(s).
>
>
>
> Adding these media as prep types in Specify allows us to query the number
> of multimedia files that we have, as we do for tissues. This number only
> includes files existing in Specify, since derivatives and raw data are a
> whole other can of worms.
>
>
>
>
>
> Best,
>
> Zach
>
> --
>
> Zachary S. Randall
>
> Biological Scientist & Imaging Lab Manager
>
> Florida Museum of Natural History
>
> 1659 Museum Road
>
> Gainesville, FL 32611-7800
>
> 352-273-1958|Rm. 277
>
>
>
> www.zacharyrandall.org
>
> Twitter: @Zach__Randall
>
>
>
>
>
>
>
> *From:* Nhcoll-l <nhcoll-l-bounces at mailman.yale.edu> * On Behalf Of *Schindel,
> David
> *Sent:* Tuesday, August 25, 2020 11:09 AM
> *To:* Bentley, Andrew Charles <abentley at ku.edu>; nhcoll-l at mailman.yale.edu
> *Subject:* Re: [Nhcoll-l] Digital objects vs. physical objects in
> collection management databases and how to manage them
>
>
>
> Hi, Andy,
>
>
>
> We've had discussions about this in the Interagency Working Group on
> Scientific  Collections (IWGSC; see usfsc.nal.usda.gov).  If the digital
> representations of an object are not published, they would be archival
> material directly related to the specimen, and therefore part of the
> collection.  They would be equivalent to field notes, locality maps, audio
> and video recordings, etc.  If they are submitted to a public database or
> other open access data repository (GenBank, CTBase, etc.) then these are
> publication events that can be (and in a perfect world, would be) linked to
> the specimen record along with scholarly publications in which the specimen
> is cited.
>
>
>
> In both cases, a comprehensively curated system of specimen digitization
> would allow users to discover and navigate to all these assets.
>
>
>
> Best regards and stay well -
>
>
>
> David
>
>
>
> David E. Schindel, Research Associate
>
> Office of the Provost
>
> Smithsonian Institution
>
> Email: schindeld at si.edu
>
>
> ------------------------------
>
> *From:* Nhcoll-l <nhcoll-l-bounces at mailman.yale.edu> on behalf of
> Bentley, Andrew Charles <abentley at ku.edu>
> *Sent:* Monday, August 24, 2020 4:16 PM
> *To:* nhcoll-l at mailman.yale.edu <nhcoll-l at mailman.yale.edu>
> *Subject:* [Nhcoll-l] Digital objects vs. physical objects in collection
> management databases and how to manage them
>
>
>
> Hi all
>
>
>
> I am trying to resolve a philosophical conundrum brought on by the
> ever-increasing mountain of digital data being produced from and associated
> with natural history collections.  My question is whether digital
> representations of an object (images, CT scans, etc.) should be treated as
> preparations of an object in a collections database similar to other
> physical preparations or treated differently?  For instance, in a fish
> collection like mine, you have a lot that has a certain number of
> specimens.  Some of those may be subsequently cleared and stained or have
> skeletons prepared.  These are traditionally handled as preparations of the
> original lot with the same catalog number (although in some collections
> they are treated as separate catalog numbers).  Now, however, you have
> digital representations of those physical objects such as images, CT scans,
> etc.  Should these also be treated as preparations or be treated
> differently - as digital products or linked as attachments to the physical
> objects?  To me, they are not physical objects but digital representations
> of the original object.  As such, they are somewhat different to a
> preparation.  This has implications when totaling traditional counts of
> objects in a collection as well as when publishing data from a collection
> to the aggregator community.  In some instances, this may be governed by
> the data model and business rules of the CMS you are using or by your
> personal preference.
>
>
>
> I would be interested in hearing your views on this and how you handle
> this in your collection as I am not sure there is any community consensus
> as to which way to handle these.  I have heard of both methods being used
> in various collections.
>
>
>
> Thanks in advance
>
>
>
> Andy
>
>
>
>
>
> _______________________________________________
> Nhcoll-l mailing list
> Nhcoll-l at mailman.yale.edu
> https://mailman.yale.edu/mailman/listinfo/nhcoll-l
>
> _______________________________________________
> NHCOLL-L is brought to you by the Society for the Preservation of
> Natural History Collections (SPNHC), an international society whose
> mission is to improve the preservation, conservation and management of
> natural history collections to ensure their continuing value to
> society. See http://www.spnhc.org for membership information.
> Advertising on NH-COLL-L is inappropriate.
>