[Nhcoll-l] Unique IDs for museum objects versus specimens

Thu Aug 14 15:35:17 EDT 2014

Catching up on some responses...

It's interesting, and I think encouraging, that I'm hearing responses that
reprise many of the discussions we've had about this subject over the last
5 years or so. That suggests to me that we've pretty well plowed the ground
and are well positioned to work out solutions, rather than continuing a
hunt for use cases and edge cases.

The cascading number/letter schemes are, indeed, appealing. They give a
direct visual guide to the physical hierarchy. But, as has been pointed
out, they add complexity and a very real entry-point for human error. It's
a matter of opinion (and we all know how strongly we hold such
near-religious opinions :) but I lean towards a single, opaque, numerical
identifier for each object, rather than a more-explicit,
information-conveying string. The implication, of course, is that there's
an information system that makes it easy and immediate to see the
relationships of that ID to any/all other IDs in the collection. I'm sure I
haven't seen as many seemed-like-a-good-idea-at-the-time complex ID schemes
as Andy has seen, but even I've seen enough that I'm terribly disinclined
to invent yet one more. (Ah, I see John Simmons has jumped in with that
viewpoint, too, and very well put.)

Key to keep in mind here is that the idea I raised was precipitated by
tracking objects (and derivatives), and is independent of taxonomic sorting
(or subsorting). I may not have been clear about that initially. Some
levels of subdivision may, indeed, be assigned taxon names at various
ranks, but that's not directly related to tracking the things themselves.

And relative to some other comments — indeed you're all correct that it's
not really a "mystery" why we have the schemes we have. Collections
databases were, for obvious and justifiable reasons, usually derived
directly from the pre-existing paper ledger schemes. But as Rob implies,
we're now at a point where we're trying to expose the existence and nature
of specimens across all collections in addition to tracking jars in one
room. Schemes we come up with now should have that target in mind.

On that note, I think that what I have in mind is at once more focussed
and more abstract than Rob's semantic ontology suggestions. More focussed
in that it's directed to being a simple way of tracking objects as they
move through the specimen curation pipeline. More abstract in that it's a
simple directed graph, independent of semantics.

Having said that, I think that a collection object bookkeeping system based
on a branching parent-child hierarchy can be:
1. Overlaid with potentially very useful semantics, ontology-based or
otherwise, but can be used with no semantic implications if desired.
2. Used as an underlying scheme for global-scope assemblages of specimen
information.

On #2, my assumption is that we're unlikely to use within-institution ID
tags directly in a global context, if only because we can't discard all the
existing numbering schemes that already have an object ID of "1234".
However, if every object has a simple unique identifier within the
institution, it seems very tractable to assign each of those objects a
globally unique identifier for public exposure. If the in-house identifier
is also globally unique, so much the better (though that does suffer from
the very-long-label problem that was pointed out).
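Here is a minimal sketch of that in-house-to-global mapping. It assumes each institution has a short code and every object has a positive integer in-house ID. It uses Python's name-based `uuid.uuid5`, so the same institution code and local ID always produce the same global identifier. The namespace URL and institution codes are hypothetical, and this is one possible scheme, not an agreed standard for specimen GUIDs.

```python
import uuid

# Hypothetical namespace for one institution's collection objects; it would be
# fixed once and never changed, so that minted GUIDs stay stable.
INSTITUTION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/collections")

def global_id(institution_code: str, local_id: int) -> str:
    """Derive a deterministic, globally unique ID from a short in-house integer ID.

    The in-house ID stays short for physical labels; the UUID is for public exposure.
    """
    if local_id <= 0:
        raise ValueError("in-house IDs are assumed to be positive integers")
    name = f"{institution_code}:{local_id}"
    return f"urn:uuid:{uuid.uuid5(INSTITUTION_NAMESPACE, name)}"

# Two institutions can both have an object "1234" without colliding globally.
a = global_id("INST-A", 1234)
b = global_id("INST-B", 1234)
print(a != b)                          # True
print(a == global_id("INST-A", 1234))  # True: same input, same GUID
```

Because the mapping is deterministic, the GUID can be recomputed at any time and does not need to be stored alongside the in-house number.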

It sounds as though Arctos has a pretty straightforward branching
hierarchical scheme.

In Specify, as I understand it, not quite yet. Specify has:
- Containers — which unify all the parts of a common-base object (by the
analogy that all the parts are in the same physical container). However,
there is no way to represent the chain of derivation of parts.
- Relationships — seems fertile, as it's a joining table that makes
directed connections between collection objects. I don't know what would
need to be done to the UI to make this an easily-accessed way of creating
object parent-child relationships and recalling the descendent and parent
hierarchies. I don't think there's (currently) any way of propagating
information from parent to child as a relationship is created (e.g. the
child object automatically gets the same collecting-event information as
its parent).
- Projects — like containers, unify all the bits of a common-base set, but
aren't directed.
- Preps — predefined sub-parts of collection objects. These are limited to
a single level of derivation and to predefined kinds of derivatives (though
the predefined kinds are, of course, definable by the user).

It is, I think, important to be able to capture the chain of derivation.
One example that is not at all abstract these days: We have DNA extracts
that come from one leg of one individual from an unsorted lot. We also have
environmental DNA extracts that come from the alcohol of the unsorted lot
jar. While it's essential to know that those both derive from the same
collecting event, it's also essential to know the particular parent of each
of those DNA extracts.
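If each derivative points at its particular parent, the two extracts in that example stay distinct. They still share the lot (and therefore its collecting event) as their nearest common ancestor. The sketch below uses made-up IDs, with a plain dictionary of child->parent links standing in for the database.

```python
from typing import Dict, List, Optional

# Made-up child -> parent links for the example above (None = top level).
PARENT: Dict[str, Optional[str]] = {
    "lot-1": None,       # unsorted lot; carries the collecting event
    "ind-7": "lot-1",    # one individual sorted out of the lot
    "leg-7a": "ind-7",   # one leg taken from that individual
    "dna-1": "leg-7a",   # DNA extract from the leg
    "edna-1": "lot-1",   # environmental DNA from the jar's alcohol
}

def lineage(obj: str) -> List[str]:
    """obj itself, then its parent, grandparent, ... up to the top of the chain."""
    chain: List[str] = []
    current: Optional[str] = obj
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain

def nearest_common_ancestor(a: str, b: str) -> Optional[str]:
    """First object in b's lineage that also appears in a's lineage."""
    in_a = set(lineage(a))
    for node in lineage(b):
        if node in in_a:
            return node
    return None

print(PARENT["dna-1"], PARENT["edna-1"])           # leg-7a lot-1
print(nearest_common_ancestor("dna-1", "edna-1"))  # lot-1
```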

So in Specify, maybe what I'm looking for could be built on top of
"Relationships" (Hmph. Not surprising, in retrospect). There would need to
be UI components that allow for:
- Easy visualization of an object's descendants and of its parents.
- Easy child-creation based on a particular object as parent.
- Information propagation from parent to children.
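For the first of those components, an indented outline of everything derived from an object may be enough to start with. The sketch below uses made-up IDs and labels, loosely modelled on the dredge-sample example in this thread.

```python
from collections import defaultdict
from typing import List

# Made-up (child, parent) links; None marks a top-level object.
LINKS = [(100, None), (101, 100), (102, 100), (103, 101), (104, 103), (105, 103)]
LABEL = {100: "unsorted lot", 101: "crustacea lot", 102: "polychaete lot",
         103: "individual", 104: "parasite vial", 105: "parasite vial"}

# Reverse index: parent -> list of children.
children = defaultdict(list)
for child, parent in LINKS:
    if parent is not None:
        children[parent].append(child)

def render_descendants(obj_id: int, depth: int = 0) -> List[str]:
    """Indented outline of obj_id and everything derived from it, depth-first."""
    lines = [f"{'  ' * depth}{obj_id} {LABEL[obj_id]}"]
    for child in sorted(children[obj_id]):
        lines.extend(render_descendants(child, depth + 1))
    return lines

print("\n".join(render_descendants(100)))
```

This prints the lot at the top, the crustacea lot, the individual and its two parasite vials nested beneath it, and finally the polychaete lot as a second branch.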

The last piece of functionality is the part, I think, that will take some
head-scratching and room-with-whiteboard-and-coffee time. Some information
should propagate from parent to child (e.g. accession information,
collecting event information). Some information must be assignable
per-object (e.g. object type [unsorted lot, individual, DNA extract]). I
can see (at least) two approaches to doing that.

A "soft" approach copies a set of fields from the parent object to the
newly-created child at the time of child creation (and of course also
creates the child->parent pointer). After that, it's up to the database
user to maintain consistency. So, if someone corrects an erroneous link to
the wrong collecting event in a parent record, she also needs to find and
fix that field in all the children.
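The sketch below illustrates the "soft" approach. It assumes, hypothetically, that accession and collecting event are the inherited fields. Those fields are copied exactly once, when the child is created. The last lines show that a later correction to the parent does not reach the child, which is the consistency burden described above.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

# Assumption: which fields are inherited vs. per-object is a local policy choice.
INHERITED_FIELDS = ("accession", "collecting_event")

_next_id = count(1)  # stand-in for a simple, opaque ID sequence

@dataclass
class CollectionObject:
    object_type: str
    accession: Optional[str] = None
    collecting_event: Optional[str] = None
    parent: Optional["CollectionObject"] = None
    id: int = field(default_factory=lambda: next(_next_id))

def create_child(parent: CollectionObject, object_type: str) -> CollectionObject:
    """'Soft' propagation: copy inherited fields once, at creation time,
    and record the child -> parent pointer."""
    copied = {name: getattr(parent, name) for name in INHERITED_FIELDS}
    return CollectionObject(object_type=object_type, parent=parent, **copied)

lot = CollectionObject("unsorted lot", accession="ACC-1", collecting_event="CE-17")
extract = create_child(lot, "DNA extract")
print(extract.collecting_event)   # CE-17

# A later correction to the parent does NOT reach the child; a person must fix it.
lot.collecting_event = "CE-18"
print(extract.collecting_event)   # CE-17
```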

A "hard" approach has coded logic that pushes corrections like that down
the ordered chain to all the children. Care would need to be taken to pick
and choose which fields to overwrite.
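The sketch below illustrates the "hard" approach. The `PUSHED_FIELDS` set is a hypothetical "pick and choose" list. Only those fields are overwritten in descendants; anything else (here, a per-object `preparation` value) is left alone.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical "pick and choose" list: only these fields are pushed down.
PUSHED_FIELDS = {"accession", "collecting_event"}

@dataclass
class Node:
    object_type: str
    data: Dict[str, Any] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, object_type: str) -> "Node":
        # The child starts with copies of the pushed fields only.
        inherited = {k: v for k, v in self.data.items() if k in PUSHED_FIELDS}
        child = Node(object_type, inherited)
        self.children.append(child)
        return child

def update_and_push_down(node: Node, name: str, value: Any) -> int:
    """Set a field on `node`; if it is a pushed field, overwrite it in every
    descendant too. Returns how many descendant records were changed."""
    node.data[name] = value
    if name not in PUSHED_FIELDS:
        return 0
    changed = 0
    stack = list(node.children)
    while stack:
        current = stack.pop()
        if current.data.get(name) != value:
            current.data[name] = value
            changed += 1
        stack.extend(current.children)
    return changed

lot = Node("unsorted lot", {"collecting_event": "CE-17", "preparation": "ethanol"})
individual = lot.add_child("individual")
extract = individual.add_child("DNA extract")
extract.data["preparation"] = "frozen"          # per-object value

print(update_and_push_down(lot, "collecting_event", "CE-18"))  # 2
print(extract.data)  # {'collecting_event': 'CE-18', 'preparation': 'frozen'}
print(update_and_push_down(lot, "preparation", "95% ethanol"))  # 0
```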

A possible intermediate could be an "advisory" approach that, on a change
to a parent, recommends a set of changes to the user that should be
propagated (but it's his responsibility to do so, or choose not to).
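The sketch below illustrates the "advisory" variant, using hypothetical flat records. It only proposes edits, and only for descendants that still carry the parent's old value. A descendant that already differs is assumed to have been changed deliberately and is left for a person to judge. That rule is an assumption, not something settled in this discussion.

```python
from typing import Dict, List, Tuple

# Hypothetical flat records: id -> {"parent": parent_id or None, field: value, ...}
Records = Dict[int, dict]

def advise_propagation(records: Records, parent_id: int, name: str,
                       old: str, new: str) -> List[Tuple[int, str, str, str]]:
    """Return suggested (record_id, field, old, new) edits; nothing is applied."""
    suggestions = []
    frontier = [parent_id]
    while frontier:
        current = frontier.pop()
        for rid, rec in records.items():
            if rec.get("parent") == current:
                if rec.get(name) == old:
                    suggestions.append((rid, name, old, new))
                frontier.append(rid)   # keep walking down either way
    return sorted(suggestions)

records = {
    1: {"parent": None, "collecting_event": "CE-17"},
    2: {"parent": 1, "collecting_event": "CE-17"},
    3: {"parent": 2, "collecting_event": "CE-17"},
    4: {"parent": 1, "collecting_event": "CE-20"},  # already differs: left alone
}
records[1]["collecting_event"] = "CE-18"           # the correction on the parent
for suggestion in advise_propagation(records, 1, "collecting_event", "CE-17", "CE-18"):
    print(suggestion)
# (2, 'collecting_event', 'CE-17', 'CE-18')
# (3, 'collecting_event', 'CE-17', 'CE-18')
```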

It would also seem advisable to have code that does a consistency audit to
find potential bogosities (e.g. someone forced a collecting-event change on
a child but not on the parent).
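The sketch below is a minimal consistency audit, again assuming hypothetical inherited fields. Comparing each record only with its direct parent is sufficient. If a value differs anywhere along a chain, it will differ at the first link where the chain diverges.

```python
from typing import Dict, List

INHERITED_FIELDS = ("accession", "collecting_event")   # assumption: local policy

def audit(records: Dict[int, dict]) -> List[str]:
    """Report children whose inherited fields disagree with their direct parent,
    plus links to parents that do not exist. Each record has a "parent" key
    (None for top-level objects)."""
    problems = []
    for rid in sorted(records):
        rec = records[rid]
        parent_id = rec.get("parent")
        if parent_id is None:
            continue
        if parent_id not in records:
            problems.append(f"{rid}: parent {parent_id} does not exist")
            continue
        parent = records[parent_id]
        for name in INHERITED_FIELDS:
            if rec.get(name) != parent.get(name):
                problems.append(
                    f"{rid}: {name} is {rec.get(name)!r} "
                    f"but parent {parent_id} has {parent.get(name)!r}"
                )
    return problems

records = {
    1: {"parent": None, "accession": "ACC-1", "collecting_event": "CE-17"},
    2: {"parent": 1, "accession": "ACC-1", "collecting_event": "CE-17"},
    3: {"parent": 2, "accession": "ACC-1", "collecting_event": "CE-99"},  # forced on child only
    4: {"parent": 9, "accession": "ACC-1", "collecting_event": "CE-17"},  # dangling link
}
for line in audit(records):
    print(line)
# 3: collecting_event is 'CE-99' but parent 2 has 'CE-17'
# 4: parent 9 does not exist
```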

Damn. I'll shut up now.

For a little while...

-Dean
-- 
Dean Pentcheff
pentcheff at gmail.com
dpentche at nhm.org

On Thu, Aug 14, 2014 at 9:58 AM, DLMcDonald <dlmcdonald at alaska.edu> wrote:

> Arctos has two relevant object types:
>
> 1) Cataloged Items <http://arctosdb.org/documentation/catalog/> are what
> a Curator says they are: Individuals
> <http://arctos.database.museum/guid/UAM:Mamm:19268>, lots
> <http://arctos.database.museum/guid/CUMV:Fish:11005>, a collection's part
> of a co-cataloged individual
> <http://arctos.database.museum/guid/DMNS:Mamm:12422>, etc.
> 2) Parts <http://arctosdb.org/documentation/parts/> are physical objects
> (and simultaneously leaf nodes of a hierarchical object tracking system).
>
> It is possible to catalog the aggregate (slide
> <http://arctos.database.museum/guid/MSB:Para:14740>, nest
> <http://arctos.database.museum/guid/MVZ:Egg:10588>, whatever
> <http://arctos.database.museum/guid/UAM:ES:3397>) and call the
> individuals parts, or to catalog the individuals, or any combination
> <http://arctos.database.museum/guid/MSB:Para:1145> thereof. The choice is
> somewhat arbitrary from a curatorial perspective, perhaps less so when
> considering potential citations.
>
> Arctos creates relationships between cataloged items through Identifiers,
> and provides resolvable identifiers for all specimens. This allows
> hierarchical relationships for tracking split lots, and also specimens to
> GenBank <http://arctos.database.museum/guid/UAM:Mamm:30681>, or to other
> specimen databases <http://arctos.database.museum/guid/UAM:ES:12505>, or
> within and across Arctos collections (host
> <http://arctos.database.museum/guid/MSB:Bird:34763>/parasite
> <http://arctos.database.museum/guid/KNWR:Ento:8763>, predator
> <http://arctos.database.museum/guid/DMNS:Bird:34623>/prey
> <http://arctos.database.museum/guid/DMNS:Mamm:13143>, etc
> <http://arctos.database.museum/guid/MVZ:Herp:256547>.). If the "target"
> provides machine-readable data, queries such as botfly parasites of voles
> <http://arctos.database.museum/SpecimenResults.cfm?related_term_val_1=Microtus&scientific_name=Cuterebra>
> become possible.
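A query like the one described here (botfly parasites of voles) reduces to a filter over typed, directed links between specimen records. The sketch below uses made-up GUIDs and a hypothetical "parasite of" relationship term. It illustrates the idea only and does not reflect Arctos's actual schema or search interface.

```python
from typing import List, NamedTuple

class Specimen(NamedTuple):
    guid: str
    scientific_name: str

class Relationship(NamedTuple):
    subject: str      # GUID of the specimen the statement is about
    predicate: str    # hypothetical relationship term
    target: str       # GUID of the related specimen

# Made-up example data; GUIDs are placeholders.
SPECIMENS = {s.guid: s for s in [
    Specimen("EX:Mamm:1", "Microtus oeconomus"),
    Specimen("EX:Mamm:2", "Myodes rutilus"),
    Specimen("EX:Ento:1", "Cuterebra sp."),
    Specimen("EX:Ento:2", "Cuterebra sp."),
]}
RELATIONSHIPS = [
    Relationship("EX:Ento:1", "parasite of", "EX:Mamm:1"),
    Relationship("EX:Ento:2", "parasite of", "EX:Mamm:2"),
]

def parasites_of(host_genus: str, parasite_genus: str) -> List[str]:
    """GUIDs of parasite_genus specimens linked as parasites of host_genus specimens."""
    hits = []
    for rel in RELATIONSHIPS:
        if rel.predicate != "parasite of":
            continue
        parasite, host = SPECIMENS[rel.subject], SPECIMENS[rel.target]
        if (parasite.scientific_name.split()[0] == parasite_genus
                and host.scientific_name.split()[0] == host_genus):
            hits.append(parasite.guid)
    return hits

print(parasites_of("Microtus", "Cuterebra"))  # ['EX:Ento:1']
```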
>
> -D
>
>
>
> On Thu, Aug 14, 2014 at 7:18 AM, Bentley, Andrew Charles <abentley at ku.edu>
> wrote:
>
>>  There are a number of issues brought up here.
>>
>>
>>
>> The first is numbering systems.  Having worked with a large number of
>> collections in my role as Usability specialist for Specify I can attest
>> that I have come across virtually every numbering system known to mankind.
>> The only one that works effectively is a simple number.  The more complex a
>> number becomes (adding delimiters or sub numbers or sub-sub numbers) the
>> more error prone the possibilities become. Not only that, but it is no
>> longer a number (in digital jargon) but is now a string which has all sorts
>> of other implications in the digital world.
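The example below illustrates the number-versus-string point with made-up IDs. Delimited identifiers sort as text rather than numerically, so every system that handles them needs an agreed parsing rule. The `natural_key` function is one hypothetical such rule.

```python
from typing import Tuple

plain = [9, 10, 100, 2]
delimited = ["9", "10", "100", "2", "10.2", "10.10"]

print(sorted(plain))      # [2, 9, 10, 100]
# Text order: "100" lands before "2", and "10.10" before "10.2".
print(sorted(delimited))  # ['10', '10.10', '10.2', '100', '2', '9']

def natural_key(identifier: str) -> Tuple[int, ...]:
    """Hypothetical shared rule: split on '.' and compare the parts as integers."""
    return tuple(int(part) for part in identifier.split("."))

print(sorted(delimited, key=natural_key))  # ['2', '9', '10', '10.2', '10.10', '100']
```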
>>
>>
>>
>> Secondly, we have the issue of lots vs. specimens.  It has always baffled
>> me why entomology collections do not work on a hybrid lot system as
>> opposed to a specimen-based system.  With this system you could number the
>> original “bag” or lot of mixed specimens as soon as it arrives.  As that
>> bag is separated out into other “bags” or lots through identification etc.,
>> they too can be numbered until you get to the point where individual
>> specimens are being extracted, identified and cataloged with an individual
>> number.  If any of these “bags” or lots becomes “empty” through this
>> process, they could of course be deleted from the database or maintained
>> and indicated as being empty with a count of zero.  All of this material
>> would of course be linked by the collecting event and locality information
>> as having come from the same collection but you could also group these
>> together in other ways to indicate that they are related – see below.
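The sketch below models that hybrid lot workflow, using hypothetical names. Every bag or lot gets a simple number when it is created. Splitting moves a count into a newly numbered child lot. An emptied lot is kept with a count of zero rather than deleted.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_numbers = count(1)   # stand-in for the catalog's simple number sequence

@dataclass
class Lot:
    specimen_count: int
    identification: Optional[str] = None
    parent: Optional["Lot"] = None
    number: int = field(default_factory=lambda: next(_numbers))

    @property
    def is_empty(self) -> bool:
        return self.specimen_count == 0

def split(lot: Lot, n: int, identification: Optional[str] = None) -> Lot:
    """Move n specimens out of `lot` into a newly numbered child lot.
    The source lot is kept (with count 0 if emptied) rather than deleted."""
    if not 0 < n <= lot.specimen_count:
        raise ValueError(f"cannot take {n} from a lot of {lot.specimen_count}")
    lot.specimen_count -= n
    return Lot(specimen_count=n, identification=identification, parent=lot)

bag = Lot(specimen_count=50)                  # numbered as soon as it arrives
beetles = split(bag, 20, "Coleoptera")
one = split(beetles, 1, "Carabus nemoralis")  # an individually cataloged specimen
rest = split(bag, 30, "Diptera")
print(bag.number, beetles.number, one.number, rest.number)  # 1 2 3 4
print(bag.is_empty, beetles.specimen_count)                 # True 19
```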
>>
>>
>>
>> The other issue is derivatives.  There are a number of ways in which
>> these can be handled, but they all fall into two main categories.  They are
>> either treated as “new” objects or they are treated as “preparations” of
>> the original object.  If they are “new” objects they should be given a new
>> number.  If they are “preparations” of the original object, then they
>> should retain the original number and should be indicated as preparations
>> by a preparation type (skeleton, C&S, tissue, DNA extract etc.).  The
>> preparation scenario is the easiest in terms of keeping track of how these
>> items are related as they will all have the same number.  The “new” items
>> are a little trickier but can also be handled.  In Specify there are
>> three mechanisms for handling these:
>>
>>
>>
>> 1.       Containers – within Specify there is a concept of containers
>> whereby distinct objects with individual catalog numbers can be linked
>> together within a single container to indicate that they came from the same
>> “parent” object or are linked in some other way.  This is most commonly
>> used in herbaria and paleo collections where multiple objects on the same
>> sheet or rock have been given different catalog numbers.
>>
>> 2.       Relationships – there is also a concept of relationships
>> whereby individuals from different collections can also be linked, e.g.
>> tissue/voucher, host/parasite, host/pollinator etc. This accommodates some
>> instances where the linked items are not from the same taxonomic group and
>> are in distinct collections.
>>
>> 3.       Projects – there is also a project table in Specify that allows
>> for multiple collection objects to be grouped together in a defined manner
>> with a project title and other information.
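The sketch below contrasts the two categories of derivative, using hypothetical class and field names (not Specify's data model). A preparation reuses its object's catalog number and is distinguished by preparation type. A "new" object gets its own number plus a link back to its source.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Preparation:
    prep_type: str            # e.g. "skeleton", "C&S", "tissue", "DNA extract"

@dataclass
class CatalogedObject:
    catalog_number: int
    preparations: List[Preparation] = field(default_factory=list)
    derived_from: Optional[int] = None   # catalog number of the source object, if any

def label(obj: CatalogedObject, prep: Optional[Preparation] = None) -> str:
    """Preparations reuse the object's number; new objects carry their own."""
    text = str(obj.catalog_number)
    if prep is not None:
        text += f" ({prep.prep_type})"
    if obj.derived_from is not None:
        text += f" <- {obj.derived_from}"
    return text

fish = CatalogedObject(1001, [Preparation("skeleton"), Preparation("tissue")])
extract = CatalogedObject(2050, derived_from=1001)   # treated as a "new" object

print([label(fish, p) for p in fish.preparations])  # ['1001 (skeleton)', '1001 (tissue)']
print(label(extract))                                # 2050 <- 1001
```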
>>
>>
>>
>> There are so many ways in which different collections and even
>> collections within the same discipline do things that it is near impossible
>> to cover every variation found.  The only way to handle this is to offer as
>> many options as possible to accommodate these relationships.
>>
>>
>>
>> Andy
>>
>>
>>
>>     A  :             A  :             A  :
>>  }<(((_°>.,.,.,.}<(((_°>.,.,.,.}<)))_°>
>>     V                V                V
>> Andy Bentley
>> Ichthyology Collection Manager
>> University of Kansas
>> Biodiversity Institute
>>
>> Dyche Hall
>> 1345 Jayhawk Boulevard
>> Lawrence, KS, 66045-7561
>> USA
>>
>> Tel: (785) 864-3863
>> Fax: (785) 864-5335
>> Email: abentley at ku.edu
>>
>> http://ichthyology.biodiversity.ku.edu
>>
>>
>>
>> SPNHC President
>>
>> http://www.spnhc.org
>>
>>
>>
>>                            :                 :
>>     A  :             A  :             A  :
>>  }<(((_°>.,.,.,.}<(((_°>.,.,.,.}<)))_°>
>>     V                V                V
>>
>>
>>
>> *From:* nhcoll-l-bounces at mailman.yale.edu [mailto:
>> nhcoll-l-bounces at mailman.yale.edu] *On Behalf Of *Dean Pentcheff
>> *Sent:* Wednesday, August 13, 2014 8:12 PM
>> *To:* Colin Favret
>> *Cc:* nhcoll-l at mailman.yale.edu
>> *Subject:* Re: [Nhcoll-l] Unique IDs for museum objects versus specimens
>>
>>
>>
>> This is an issue that I've raised in the past with the Specify team (and
>> plan to raise again in the near future — fair warning, guys :)
>>
>>
>>
>> The precipitating example for us comes from marine specimens. Often an
>> unsorted jar of material will arrive (e.g. from a dredge sample) to be
>> cataloged in the collection — this unsorted lot should get a unique ID — it
>> may be around for years before it's touched. Then we may pull out (for
>> example) all the crustacea into another jar. This partly-sorted lot also
>> needs a unique ID (it may go to a different room under different staff, so
>> just keeping it with the original jar is not an option). Then we may pull
>> out a single individual, identify it, and use that in a publication, so
>> that, too, needs an ID. A visiting researcher then examines that individual
>> and pulls off parasitic crustacea, identifying each and putting them into
>> individual vials, each of which needs an ID. Etc.
>>
>>
>>
>> What we have is a clear hierarchical branching parent-child relationship
>> from the initial unsorted lot down to the individual parasites (and their
>> parasites, and their molecular derivatives, etc.). Logically, the way to
>> accommodate this is to have any "thing" in the collection identified with a
>> unique ID. Any derived or subsorted "thing" gets another unique ID and (and
>> this is critical) is linked to its parent so that all the information from
>> the parent (and on up the chain to the top) is immediately available via
>> any "child" ID.
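The parent's information can be made "immediately available via any child ID" without copying anything. One way is to resolve each field by walking up the chain until some record supplies it. The sketch below uses made-up records that loosely follow the example above. It is an alternative to copying fields down, not a claim about how any particular database does it.

```python
from typing import Any, Dict, Optional

# Made-up records: each stores only what is specific to it, plus its parent link.
RECORDS: Dict[str, Dict[str, Any]] = {
    "J-1": {"parent": None, "type": "unsorted lot", "collecting_event": "dredge, station 12"},
    "J-2": {"parent": "J-1", "type": "partly sorted lot (crustacea)"},
    "I-1": {"parent": "J-2", "type": "individual", "identification": "Idotea sp."},
    "V-1": {"parent": "I-1", "type": "vial", "identification": "parasitic copepod"},
}

def resolve(record_id: str, name: str) -> Optional[Any]:
    """Nearest value of `name`: the record's own, else its parent's, and so on up."""
    current: Optional[str] = record_id
    while current is not None:
        record = RECORDS[current]
        if name in record:
            return record[name]
        current = record["parent"]
    return None

print(resolve("V-1", "collecting_event"))  # dredge, station 12  (from J-1)
print(resolve("V-1", "identification"))    # parasitic copepod   (its own value)
print(resolve("J-2", "identification"))    # None
```

With this design a correction made at the top of the chain is seen immediately by every descendant. The cost is a walk up the chain on every read.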
>>
>>
>>
>> Every "thing" gets a first-class ID (no sub-IDs or a limited list of
>> "preps" from an initial object). Key to the concept is retaining the
>> parent-child-grandchild-... chain. At any moment, one should be able to
>> retrieve any ID's entire chain of parents (and their associated data), or
>> any ID's entire chain of derived children (and their associated data).
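Both retrievals can be done with parent pointers plus a reverse index built from them, as in the sketch below. IDs and comments are made up to mirror the dredge-sample example. A real system would also return each record's associated data, which is omitted here for brevity.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional

# Made-up parent pointers: every "thing" has a first-class ID and at most one parent.
PARENT: Dict[int, Optional[int]] = {
    1: None,     # unsorted dredge lot
    2: 1,        # crustacea pulled from it
    3: 2,        # an identified individual
    4: 3, 5: 3,  # parasites removed from the individual
    6: 4,        # DNA extract from one parasite
}

# Build the reverse (parent -> children) index once.
CHILDREN: Dict[int, List[int]] = defaultdict(list)
for child, parent in PARENT.items():
    if parent is not None:
        CHILDREN[parent].append(child)

def parent_chain(obj_id: int) -> List[int]:
    """Parents of obj_id, nearest first, up to the top of the chain."""
    chain = []
    parent = PARENT[obj_id]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain

def derived_children(obj_id: int) -> List[int]:
    """Everything derived from obj_id, breadth-first (children, grandchildren, ...)."""
    found, queue = [], deque(CHILDREN[obj_id])
    while queue:
        current = queue.popleft()
        found.append(current)
        queue.extend(CHILDREN[current])
    return found

print(parent_chain(6))      # [4, 3, 2, 1]
print(derived_children(2))  # [3, 4, 5, 6]
```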
>>
>>
>>
>> It is a mystery to me why this scheme is not the standard model for
>> specimen databases where there is a habit of creating chains of
>> derivatives over time. There certainly are implementation details that need
>> careful consideration (for example with propagation of data down the chain,
>> how "locked" that propagation is, and how to handle things that get
>> completely subdivided so they no longer exist as such, but whose data must
>> persist), but it seems like a very clean, very flexible base model.
>>
>>
>>  -Dean
>> --
>> Dean Pentcheff
>> pentcheff at gmail.com
>> dpentche at nhm.org
>>
>>
>>
>> On Wed, Aug 13, 2014 at 4:01 PM, Colin Favret <ColinFavret at aphidnet.org>
>> wrote:
>>
>>  Has anyone dealt with the distinction between issuing unique IDs (for
>> labels and database records) for museum objects versus specimens? A case in
>> point might be a microscope slide with 100 specimens on it (or a jar,
>> envelope, etc.). These specimens can be of multiple taxa, different sexes,
>> life stages, etc. I believe most collections label the museum object
>> (slide, jar, envelope, etc.) with a unique identifier and then treat the
>> specimens as a lot, but this doesn't fully parse out the data associated
>> with the various specimens in a specimen database.
>>
>>
>>
>> I've developed my own solution (unique ID label for the object, decimal
>> numbers but no label for the individual specimens or specimen lots - e.g.
>> INST123456 for the slide, INST123456.001 for the first specimen lot,
>> INST123456.002 for the second, etc.).
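The sketch below generates and checks IDs in that decimal scheme. It assumes an upper-case letter prefix and exactly three digits after the point, as in the examples. Both are guesses about the format, and the regular expression is a hypothetical validation rule, not an existing standard.

```python
import re
from typing import Tuple

# Assumed format: letter prefix + object number, optionally ".NNN" for a lot on that object.
SUB_ID = re.compile(r"^([A-Z]+)(\d+)(?:\.(\d{3}))?$")

def make_sub_id(object_id: str, lot_index: int) -> str:
    """INST123456 + 1 -> INST123456.001 (three digits assumed, so at most 999 lots)."""
    if not 1 <= lot_index <= 999:
        raise ValueError("lot index must be between 1 and 999")
    return f"{object_id}.{lot_index:03d}"

def parse(identifier: str) -> Tuple[str, int, int]:
    """Split an ID into (prefix, object number, lot index); 0 means the object itself."""
    match = SUB_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a valid ID: {identifier!r}")
    prefix, number, lot = match.groups()
    return prefix, int(number), int(lot) if lot else 0

print(make_sub_id("INST123456", 2))   # INST123456.002
print(parse("INST123456.002"))        # ('INST', 123456, 2)
print(parse("INST123456"))            # ('INST', 123456, 0)
```

Validating on entry catches malformed IDs such as `INST123456.2`. It does not, however, change the fact that the sub-IDs are strings rather than numbers.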
>>
>>
>>
>> But I'm wondering what others have done or if there is anything out there
>> approaching an industry standard.
>>
>>
>>
>> Thanks for your input!
>>
>>
>>
>> Colin
>>
>>
>>
>> Colin Favret
>>
>> Université de Montréal
>>
>> Favret.AphidNet.org <http://favret.aphidnet.org/>
>>
>>
>> _______________________________________________
>> Nhcoll-l mailing list
>> Nhcoll-l at mailman.yale.edu
>> http://mailman.yale.edu/mailman/listinfo/nhcoll-l
>>
>> _______________________________________________
>> NHCOLL-L is brought to you by the Society for the Preservation of
>> Natural History Collections (SPNHC), an international society whose
>> mission is to improve the preservation, conservation and management of
>> natural history collections to ensure their continuing value to
>> society. See http://www.spnhc.org for membership information.
>>
>>
>>
>>
>>
>