[eml-dev] EML question

Gail Steinhart gss1 at cornell.edu
Mon Jun 30 10:00:08 PDT 2008


Thanks everyone, for your helpful suggestions.

I think where the species lists are not too long, 
we can manage to get them (and the higher taxonomic 
levels) into the metadata, and will aim for that. 
Where we have too many (probably just the phytoplankton 
data set), we'll decide how far down the 
taxonomic tree we can go in terms of getting that 
info into the metadata, and upload a table that 
lists all the species and the codes used in the 
data set. For phyto we have a species list, but I 
know of no easy way to get the entire hierarchy 
above that from the list (does anyone?), and we 
don't have time to look up and record this info for 
possibly hundreds of species. Lacking that, I think 
we are shooting for a compromise between 
including as much info as we can and actually completing metadata records.
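
For what it's worth, here is a rough Python sketch of how a species table with partial lineages might be turned into "list"-style taxonomicCoverage. It is only an illustration: the function names and the example lineages are hypothetical, it assumes each row already carries whatever ranks we know (highest first), and it does not look up any missing hierarchy.

```python
import xml.etree.ElementTree as ET

# Illustrative lineages only (not our actual data). Each entry lists the
# ranks we happen to know, highest rank first; entries may differ in depth.
LINEAGES = [
    [("family", "Percidae"), ("genus", "Perca"),
     ("species", "Perca flavescens")],
    [("genus", "Macrocystis"), ("species", "Macrocystis pyrifera")],
]


def nested_classification(ranks):
    """Nest one taxonomicClassification per rank, highest rank outermost."""
    root = None
    parent = None
    for rank_name, rank_value in ranks:
        if parent is None:
            node = ET.Element("taxonomicClassification")
            root = node
        else:
            node = ET.SubElement(parent, "taxonomicClassification")
        ET.SubElement(node, "taxonRankName").text = rank_name
        ET.SubElement(node, "taxonRankValue").text = rank_value
        parent = node
    return root


def list_style_coverage(lineages):
    """One top-level block per species; shared higher taxa are repeated."""
    coverage = ET.Element("taxonomicCoverage")
    for ranks in lineages:
        if ranks:  # skip rows with no taxonomic information at all
            coverage.append(nested_classification(ranks))
    return coverage


coverage = list_style_coverage(LINEAGES)
print(ET.tostring(coverage, encoding="unicode"))
```

Rows with a deeper known lineage simply produce deeper nesting, so the fish and the phytoplankton could go through the same routine even if we only know genus and species for the latter.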

Best,
Gail

At 01:22 AM 6/28/2008, David Blankman wrote:
>Hi Gail,
>
>I wanted to add an observation that may be 
>covered in LTER EML best practice 
>recommendations, but one that I think is worth 
>noting. People rarely search at the same 
>taxonomic level that is represented by the data. 
>Generally data is presented at the species 
>level, but taxonomic searches are usually done higher up on the tree.
>
>The searching rank differs both for the 
>taxonomic group and domain of the 
>researcher.  For example, most "plant" people 
>are rarely interested in distinctions above the 
>level of Family. The same is probably not the 
>case for people interested in insects or for 
>invertebrates in general. People interested in 
>mammals on the other hand, probably are 
>interested in distinctions further down the tree.
>
>I don't know if there is any clear rule for what 
>rank is most likely to be searched for any 
>given taxonomic group, but there may be some general guidelines.
>
>I say all of this in order to recommend 
>including documentation above the species level. 
>Clearly the more of the taxonomic tree that is 
>included the greater is the likelihood that your 
>data will be found by someone's taxonomic search.
>
>Wade, as usual, has made life easier by 
>providing a variety of different trees. If, on 
>the other hand, you want to provide a minimal set 
>of taxonomic coverage, find out from the domain 
>scientists what their search heuristics are.
>
>David Blankman
>Director of Information Management, Israel LTER/Ma'arag
>Mitrani Department of Desert Ecology
>Jacob Blaustein Desert Research Institute
>Ben Gurion University
>Midreshet Ben Gurion, 84990 Israel
>972-54-685-9345 (cell)
>1-505-349-5680 (Skype)
>
>
>
>On Sat, Jun 28, 2008 at 12:49 AM, Wade Sheldon 
><sheldon at uga.edu> wrote:
>Hi Gail,
>
>I agree with Margaret's comments on the LTER EML 
>best practice recommendations (I recall I wrote 
>that section, as the only person populating 
>taxonomicCoverage in LTER at that point). EML 
>was still fairly untested in 2004 and the 
>trade-offs between using the taxonomicCoverage 
>tree and data tables were hard to anticipate (and perhaps still are).
>
>If you are unclear on the difference between the 
>"list" style and "tree" style of tag nesting 
>(based on what Callie quoted), you can use our 
>taxonomic database web application to generate 
>species lists as EML 2.01 documents with either 
>implementation at: 
><http://gce-lter.marsci.uga.edu/public/app/all_species_lists.asp>
>
>At GCE LTER we use the "list" style (without tag 
>nesting within common taxa) for our data sets 
>regardless of the number of references, but 
>that's easy for us because the taxonomicCoverage 
>is automatically generated from our taxonomic 
>database for all referenced species, so it's no 
>more effort. We then include species codes in 
>the primary data tables with codes defined in 
>the attribute metadata. However in EML 2.01 we 
>can't link the codes to the taxonomicCoverage 
>nodes anyway, so there's no linkage to the 
>taxonomic details. That may argue for the 
>secondary table approach Margaret uses (where 
>you can even define a foreign key relationship 
>between entities using the EML "constraint" module).
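
To make the "list" versus "tree" distinction concrete, here is a small Python sketch of the "tree" style, in which species that share a higher taxon are nested under a single element instead of repeating it. The function name and the two example lineages are made up for illustration; this is not the code behind the GCE application.

```python
import xml.etree.ElementTree as ET


def tree_style_coverage(lineages):
    """Merge lineages so each shared (rank, value) pair appears only once."""
    coverage = ET.Element("taxonomicCoverage")
    for ranks in lineages:
        parent = coverage
        for rank_name, rank_value in ranks:
            # Reuse an existing child with the same rank name and value.
            match = None
            for child in parent.findall("taxonomicClassification"):
                if (child.findtext("taxonRankName") == rank_name
                        and child.findtext("taxonRankValue") == rank_value):
                    match = child
                    break
            if match is None:
                match = ET.SubElement(parent, "taxonomicClassification")
                ET.SubElement(match, "taxonRankName").text = rank_name
                ET.SubElement(match, "taxonRankValue").text = rank_value
            parent = match
    return coverage


# Two illustrative species in the same family (highest rank first).
tree = tree_style_coverage([
    [("family", "Percidae"), ("genus", "Perca"),
     ("species", "Perca flavescens")],
    [("family", "Percidae"), ("genus", "Sander"),
     ("species", "Sander vitreus")],
])
print(ET.tostring(tree, encoding="unicode"))
```

In the "list" style the same input would yield two top-level blocks, each repeating the family element; here the family appears once with two genus children.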
>
>As for whether to include higher level taxa or 
>not, the key advantage as Margaret said is to 
>support metadata searches. As for how many 
>end-users search for data based on taxonomic 
>terms, perhaps Matt et al. can answer based on Metacat search logs.
>
>Regards,
>
>Wade Sheldon
>GCE-LTER
>
>
>Margaret O'Brien wrote:
> > Hi Gail -
> > Adding to what Callie told you, I have seen several ways to include
> > taxonomic information in EML. By the way, the document that Callie
> > referenced is not really precise in its recommendations, partly because
> > in 2004 there were not a large number of rich EML files to learn from.
> > It is in need of an update, and somewhat specific to LTER needs, but if
> > you are interested in seeing how one group uses EML, I can get a copy to
> > you.
> >
> > We often put taxonomic information in a data table as you have
> > suggested. This is the simplest method when the list is long, or is
> > already included in the table to be published. If a dataset is concerned
> > with only a few species, then we include a taxonomicCoverage tree with
> > all the ranks labeled. The flexibility of EML means that you could
> > include any (or all) ranks, or just the unique binomial. The entire
> > binomial should be included as one string, according to the rules of
> > binomial nomenclature. So this form is recommended:
> > <taxonomicClassification>
> >   <taxonRankName>genus</taxonRankName>
> >   <taxonRankValue>Macrocystis</taxonRankValue>
> >   <taxonomicClassification>
> >     <taxonRankName>species</taxonRankName>
> >     <taxonRankValue>Macrocystis pyrifera</taxonRankValue>
> >   </taxonomicClassification>
> > </taxonomicClassification>
> >
> > but not
> > <taxonomicClassification>
> >   <taxonRankName>genus</taxonRankName>
> >   <taxonRankValue>Macrocystis</taxonRankValue>
> >   <taxonomicClassification>
> >     <taxonRankName>species</taxonRankName>
> >     <taxonRankValue>pyrifera</taxonRankValue>
> >   </taxonomicClassification>
> > </taxonomicClassification>
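
A quick way to catch the second form in existing files might look like the Python sketch below. It is only a rough check under stated assumptions: the function name is hypothetical, ranks are assumed to be labeled exactly "genus" and "species", and a species value counts as a full binomial if it has at least two words and starts with the enclosing genus.

```python
import xml.etree.ElementTree as ET

RECOMMENDED = """
<taxonomicClassification>
  <taxonRankName>genus</taxonRankName>
  <taxonRankValue>Macrocystis</taxonRankValue>
  <taxonomicClassification>
    <taxonRankName>species</taxonRankName>
    <taxonRankValue>Macrocystis pyrifera</taxonRankValue>
  </taxonomicClassification>
</taxonomicClassification>
"""

NOT_RECOMMENDED = RECOMMENDED.replace("Macrocystis pyrifera", "pyrifera")


def bare_epithets(node, genus=None):
    """Return species-rank values that are not written as a full binomial."""
    problems = []
    rank = (node.findtext("taxonRankName") or "").strip().lower()
    value = (node.findtext("taxonRankValue") or "").strip()
    if rank == "genus":
        genus = value  # remember the enclosing genus for nested species
    elif rank == "species":
        words = value.split()
        if len(words) < 2 or (genus and words[0] != genus):
            problems.append(value)
    # Recurse into nested classifications, passing the genus down.
    for child in node.findall("taxonomicClassification"):
        problems.extend(bare_epithets(child, genus))
    return problems


good = bare_epithets(ET.fromstring(RECOMMENDED))
bad = bare_epithets(ET.fromstring(NOT_RECOMMENDED))
print(good, bad)
```

The same function can be pointed at a whole taxonomicCoverage element, since it simply walks every nested taxonomicClassification it finds.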
> >
> > Ideally, it would be great to get all the taxonomic info into the
> > metadata so that it can be effectively searched. This can be impractical
> > though, and if many taxa are included, the metadata can be quite
> > extensive. I have cc'd the EML development group with your question, in
> > case any others want to chime in. Please let this group know of your
> > experiences using EML -
> > Regards,
> > Margaret O'Brien
> >
> > ========================
> > Margaret O'Brien
> > Information Management
> > Santa Barbara Coastal LTER
> > Marine Science Institute
> > University of California
> > Santa Barbara, CA  93106-6150
> >
> > 805-893-2071
> > mob at icess.ucsb.edu
> > http://sbc.lternet.edu
> > ========================
> >
> >
> >
> > Callie Bowdish wrote:
> >> Hi Gail,
> >>
> >> Here is a section out of the LTER emlbestpractices_oct2004.doc. I
> >> think the phrase "organisms relevant to the study" and "broader
> >> taxonomic searches" are helpful things to keep in mind when making
> >> decisions on how much taxonomic information to include. It is also
> >> considered important to include the Classification System or authority
> > >> that was used for naming when possible. Archived data is designed to
> > >> last for a long time, so the ability to find something that does not
> > >> seem important now may be valuable in the future. That is also
> > >> a good reason to put some thought into including the Classification
> > >> System and choosing which taxa to include in the EML document.
> >>
> >> "<taxonomicCoverage> The <taxonomicCoverage> element (see Example 2.1)
> >> should be used to document taxonomic information for all organisms
> >> relevant to the study. Genus, species name binomial and common name
> >> should always be included, but higher level taxa should also be
> >> included whenever possible to support broader taxonomic searches.
> >> Blocks of <taxonomicClassification> elements should be hierarchically
> >> nested within a single <taxonomicCoverage> element as illustrated in
> >> Example 2.1 rather than repeated at the same level. The
> >> <generalTaxonomicCoverage> element should be included to describe the
> >> general procedure of how the taxonomy was determined (keys used,
> >> etc.), should include a general textual description of all flora/fauna
> >> in the study (scope), as well as how finely grained the taxonomy is
> >> broken down to – for example "family" or "genus and species."
> >>
> >> Note that elements within common <taxonRankName> entries can be
> >> combined in the hierarchy to create a taxonomic "tree" (not
> >> illustrated), but this practice may impede combining and re-using
> >> <taxonomicClassification> information from multiple documents and is
> >> not generally recommended for data set documentation."
> >>
> >> I have also cc'd Matt Jones at NCEAS and Margaret who is an LTER
> >> information manager to see if they have any comments or insight into
> >> your "best practice" question.
> >>
> >> Callie
> >>
> >>
> >> Gail Steinhart wrote:
> >>> Hi Callie,
> >>>
> >>> We're wondering if there is a "best practice" when it comes to
> >>> specifying taxonomic coverage in EML. We have some data sets where
> >>> there are a couple of dozen species (fish), and others where there
> >>> might be hundreds (phytoplankton). In most cases we have or can make
> >>> (without too much effort) a complete table of species and upload that
> > >>> as a data table, but is that overkill? Would it be better to simply
> > >>> specify a higher taxon (phytoplankton rather than all of the
> > >>> species)? Can you offer any advice on that?
> >>>
> >>> Thanks,
> >>> Gail
> >>>
> >>>
> >>>
> >>> Gail Steinhart
> >>> Research Data & Environmental Sciences Librarian
> >>> Albert R. Mann Library
> >>> Cornell University
> >>> Ithaca, NY 14853
> >>>
> >>> Phone: 607-255-7251
> >>> Fax: 607-255-0318
> > >>> E-mail: GSS1 at cornell.edu
> >>>
> > _______________________________________________
> > Eml-dev mailing list
> > Eml-dev at ecoinformatics.org
> > http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>
>--
>______________________________________________________________________________
>
>Wade M. Sheldon
>GCE-LTER Information Manager/SIMO Database Administrator
>School of Marine Programs
>University of Georgia
>Athens, GA 30602-3636
>Email: sheldon at uga.edu
>WWW: 
><http://gce-lter.marsci.uga.edu/public/app/personnel_bios.asp?id=wsheldon>
>
>
>
>
>
>--
>Nature is trying very hard to make us succeed, 
>but nature does not depend on us. We are not the only experiment.
>- R. Buckminster Fuller
>
>If I am not for myself, then who will be for me? 
>If I am for myself alone, then who am I? If not now, when?
>- Rabbi Hillel



Gail Steinhart
Research Data & Environmental Sciences Librarian
Albert R. Mann Library
Cornell University
Ithaca, NY 14853

Phone: 607-255-7251
Fax: 607-255-0318
E-mail: GSS1 at cornell.edu