[eml-dev] [Bug 585] - internationalization needed in EML

Éamonn Ó Tuama eotuama at gbif.org
Tue Dec 9 14:44:51 PST 2008


Hi Matt,

I agree about the inaccessibility of ISO standards - I also had to use a 
draft release of ISO 19115. At least the ISO 19139 XSD schemas are 
freely available once you accept a stern copyright notice. You can 
view them here: 
http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/ 
or download them as a zipped archive by going one directory up and 
searching for 19139: 
http://standards.iso.org/ittf/PubliclyAvailableStandards/

Regarding the points you raise below -

'contact' : details for the metadata writer could be different to those 
of the data custodian, collector, etc.

'locale' : I presume the combination of language code and country is 
used because one country can have multiple languages and one language 
can be spoken in many countries.

I was using the term "attribute" in the general sense of a property and 
not in the strict XML sense of element vs attribute. 'locale' is 
actually expressed as a repeating element in ISO 19139.
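
To make the language/country point concrete, here is a minimal Python 
sketch. The Locale class, its field names and the sample values are 
illustrative assumptions, not taken from the ISO 19139 schemas; it only 
shows why a language code on its own does not identify a locale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locale:
    """Hypothetical stand-in for a locale: language + country + character set."""
    language: str       # e.g. an ISO 639-2 code such as "fra"
    country: str        # e.g. an ISO 3166 code such as "CA"
    character_set: str  # e.g. "utf8"


# A metadata document may declare several locales (repeating element).
locales = [
    Locale("eng", "CA", "utf8"),
    Locale("fra", "CA", "utf8"),
    Locale("fra", "FR", "utf8"),
]


def languages_in(country, locales):
    """One country can have multiple languages."""
    return sorted({loc.language for loc in locales if loc.country == country})


def countries_using(language, locales):
    """One language can be spoken in many countries."""
    return sorted({loc.country for loc in locales if loc.language == language})


print(languages_in("CA", locales))
print(countries_using("fra", locales))
```

Because the pair (language, country) is the key, "fra"/"CA" and 
"fra"/"FR" remain distinct entries even though they share a language.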

I have attached an incomplete example of an ISO 19139 instance document 
to show you how I understand multiple locales are used with related, 
translated free text elements. I have based this on an example in an 
INSPIRE document  "Draft Guidelines – INSPIRE metadata implementing 
rules based on ISO 19115 and ISO 19119" 
(http://inspire.brgm.fr/Documents/MD_IR_and_ISO_20080425.pdf)
See Section A.6, pages 34-37.
I'm only beginning to explore the schemas themselves and the 
multilingual aspects of ISO 19139 so apart from what I have copied from 
the INSPIRE doc, the encoding in the example may not be fully correct.
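
As a rough illustration of the pattern described above (this is not the 
attached file), the Python sketch below parses a small ISO 19139-style 
fragment in which one title element carries a default-language string 
plus a localised translation that points back to a declared locale. The 
element names follow the PT_FreeText pattern as it is commonly 
described for ISO 19139, but the fragment is flattened and abbreviated 
(the title would normally sit deeper, inside the citation) and the 
codeList values are placeholders, so treat it as a sketch under those 
assumptions rather than a schema-valid instance. The helper function 
names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Abbreviated, NOT schema-valid: structure flattened for illustration.
SAMPLE = """\
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <gmd:language><gco:CharacterString>eng</gco:CharacterString></gmd:language>
  <gmd:locale>
    <gmd:PT_Locale id="locale-es">
      <gmd:languageCode>
        <gmd:LanguageCode codeList="#LanguageCode" codeListValue="spa"/>
      </gmd:languageCode>
      <gmd:characterEncoding>
        <gmd:MD_CharacterSetCode codeList="#MD_CharacterSetCode"
                                 codeListValue="utf8"/>
      </gmd:characterEncoding>
    </gmd:PT_Locale>
  </gmd:locale>
  <gmd:title xsi:type="gmd:PT_FreeText_PropertyType">
    <gco:CharacterString>Forests of New Mexico</gco:CharacterString>
    <gmd:PT_FreeText>
      <gmd:textGroup>
        <gmd:LocalisedCharacterString locale="#locale-es">Bosques del Nuevo México</gmd:LocalisedCharacterString>
      </gmd:textGroup>
    </gmd:PT_FreeText>
  </gmd:title>
</gmd:MD_Metadata>
"""


def locale_languages(root):
    """Map each declared locale id to its language code."""
    result = {}
    for pt_locale in root.iterfind("gmd:locale/gmd:PT_Locale", NS):
        code = pt_locale.find("gmd:languageCode/gmd:LanguageCode", NS)
        result[pt_locale.get("id")] = code.get("codeListValue")
    return result


def translations(root, element):
    """Collect the default text and all localised texts of one element."""
    languages = locale_languages(root)
    default_language = root.findtext(
        "gmd:language/gco:CharacterString", namespaces=NS
    )
    texts = {
        default_language: element.findtext("gco:CharacterString", namespaces=NS)
    }
    path = "gmd:PT_FreeText/gmd:textGroup/gmd:LocalisedCharacterString"
    for localised in element.iterfind(path, NS):
        locale_id = localised.get("locale", "").lstrip("#")
        # Fall back to the raw id if the locale was never declared.
        texts[languages.get(locale_id, locale_id)] = localised.text
    return texts


root = ET.fromstring(SAMPLE)
titles = translations(root, root.find("gmd:title", NS))
print(titles)
```

The point of the pattern is that the translations live inside a single 
title element, so it is unambiguous that they are the same title in 
different languages; a second, semantically different title would be a 
separate element with its own set of translations.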

I have joined the eml-dev list so I assume this will now get posted there.

Éamonn



Matt Jones wrote:
> Thanks, Eamonn, for the information.  Very useful.
>
> The frustrating thing about ISO standards is how impossible they are 
> to obtain.  I have an old draft copy of ISO 19115, but neither I nor 
> the UC library has a copy of the current standard or of the 19139.  I 
> have a fundamental philosophical problem with standards that are not 
> free and open.  Nevertheless, I will continue to try to find a copy of 
> these so that I can look into it.
>
> In the meantime, a couple of comments below in your note...
>
> On Tue, Dec 9, 2008 at 12:23 AM, Eamonn O Tuama (GBIF) 
> <eotuama at gbif.org> wrote:
>
>     Hi All,
>
>     I presume any extensions to EML will involve changes to the
>     schemas and therefore versioning. I do not know how complicated
>     that will be – someone familiar with the EML schemas construction
>     is best suited to answering. However, I think we might be able to
>     learn something from the ISO 19115/19139 standard regarding
>     multilingual metadata.
>
>     First of all, it provides a distinct set of attributes for the
>     metadata document itself (rather than the data the metadata
>     document is describing). These include:
>
>     1. fileIdentifier
>
> same as EML packageID
>
>     2. language
>
> see my earlier discussion on this issue
>
>     3. characterSet
>
> same as 'encoding' attribute in the XML prolog
>
>     4. contact
>
> Why would the metadata contact be different from the data contact?  
> We have trouble enough keeping one up to date.
>
>     5. dateStamp
>
> this would be useful, we should consider adding it to EML.  
> Presumably, this is the date on which the metadata document was last 
> updated, which would probably belong in the 'maintenance' section of EML
>
>     6. metadataStandardName
>
> provided in the namespace of EML
>
>     7. metadataStandardVersion
>
> provided in the namespace of EML
>
>     8. locale
>
> this could be useful, although it seems like providing the language 
> code would be just as effective and essentially redundant.
>
>     ISO 19139 (the implementation standard for the conceptual model in
>     ISO 19115) also provides a means for encoding multilingual
>     metadata. This is achieved through use of an optional, repeatable
>     "locale" attribute consisting of language, country and
>     characterset encodings.
>
> This sounds interesting.  So, how does it repeat?  XML attributes are 
> not repeatable, nor do they have substructure.  Is it an element?  And 
> if so, is it a child element of every other element in the model?
>
>     Multiple instances of locale may be defined for a metadata
>     document and translations representing those locales provided for
>     each metadata element. So, repeatability in multiple languages is
>     built in.
>
> I don't quite see how this would work.  Could you show a brief snippet 
> as an example?  For example, for the title of the dataset, how would 
> you encode two titles, each in both English and Spanish, and be able 
> to tell which of the elements were semantically linked?  Here's one 
> way I could see doing it, but it's a bit clunky:
>  <title>
>     <translation xml:lang="en">Forests of New Mexico</translation>
>     <translation xml:lang="es">Bosques del Nuevo México</translation>
>  </title>
>  <title>
>     <translation xml:lang="en">Survey of Plants and Animals</translation>
>     <translation xml:lang="es">Estudio de Plantas y Animales</translation>
>  </title>
>  
> How would the ISO 19139 propose representing this content?
>
>     The ability to work with multiple languages is seen as a strong
>     advantage in moving from the FGDC metadata standard to the North
>     American Profile (NAP) of ISO 19115. The problem, at the moment,
>     is that a biological profile in ISO 19115 does not exist but it
>     seems that work is underway to express the FGDC Biological Profile
>     in ISO. (I understand that EML based their taxonomic module
>     directly on the FGDC biological profile component.)
>
> Actually, the BDP standard first got these fields from EML 1.3.x and 
> 1.4.x, and then EML 2.x reincorporated the changes from the BDP. 
> Either way, we've been looking at replacing the EML taxonomic module 
> with something more in line with TDWG standards, in particular with 
> TCS.  I have worked out a new set of schemas for eml-taxon with Jessie 
> Kennedy and Bob Peet that directly incorporate TCS, but I haven't had 
> time to introduce these changes to the rest of the EML community.  On 
> the TODO list.  Nevertheless, as you said, there's a lot of 
> compatibility between EML and the BDP.
>
>      
>
>     The European Union, because of its composition, has always faced
>     the challenge of dealing with multiple languages. A document by
>     the European Committee for Standardisation (CEN) on "Geographic
>     information — Standards, specifications, technical reports and
>     guidelines, required to implement Spatial Data Infrastructure"
>     (can't find URL where I downloaded originally but have PDF if
>     anyone wants it) provides some insights on "Cultural and
>     Linguistic Adaptability" where it places the emphasis on use of
>     multilingual thesauri rather than efforts to translate element
>     contents.
>
> Interesting.  I'd like to see that.  So, given a metadata document in 
> Chinese, they are arguing that scientists who speak other languages 
> can get by with multilingual thesaurus entries in place of the 
> natural language metadata?  I find this somewhat unconvincing if you 
> really want to re-use the data.
>
> Thanks for your comments, Eamonn.
>
> Matt
>
>     See also Nowak et al paper "Issues of multilinguality in creating
>     a European SDI – the perspective for spatial data interoperability"
>
>     http://www.ec-gis.org/Workshops/11ec-gis/papers/309nowak.pdf
>
>      
>
>     Regards,
>
>      
>
>     Éamonn
>
>      
>
>      
>
>     *From:* David Blankman [mailto:dblankman1 at gmail.com]
>     *Sent:* 08 December 2008 20:59
>     *To:* Matt Jones
>     *Cc:* inigo san gil; eml-dev at ecoinformatics.org;
>     bugzilla-daemon at ecoinformatics.org; Vivian B Hutchison;
>     burkeker at gate.sinica.edu.tw; chin at tfri.gov.tw;
>     guoxb at igsnrr.ac.cn; hehl at igsnrr.ac.cn; lijh at sdb.cnic.cn;
>     Aikiko Ogawa; Eamonn O Tuama; Kristin Vanderbilt; Schentz Herbert;
>     Shang; Su Wen; Werf, Bert van der
>
>     *Subject:* Re: [eml-dev] [Bug 585] - internationalization needed
>     in EML
>
>      
>
>     As I think back upon the discussions in China and my discussions
>     with Matt at ISEI, I keep returning to my initial thought that
>     multiple language versions of EML documents are probably better
>     handled by creating separate EML documents for each language used.
>     EML is already complex; I see no reason to make it more complex.
>
>
>
>     In the ILTER situation we are asking ILTER member networks to
>     provide a core of EML in English, on the understanding that more
>     complete metadata may be in another language. In this case, should
>     there be an EML module, eml-ilter or eml-language, analogous to
>     eml-access, that specifies the identifier of the "main"
>     eml-document and the language of that document? This module might
>     also include an element to record a brief statement about the
>     amount of data in that foreign language. I am not sure what else
>     might be appropriate for this module. I know that Matt was
>     thinking that there might be some modifications to metacat
>     replication that might be needed.
>
>     David
>
>
>
>
>     On Mon, Dec 8, 2008 at 1:34 PM, Matt Jones
>     <jones at nceas.ucsb.edu> wrote:
>
>     David and I discussed (briefly) some of these issues at ISEI.  And
>     we also discussed them at the ILTER meeting in China.  The
>     'language' tag in eml-resource defines the language of the
>     resource, which in the case of eml-dataset resources means the
>     language of the data.  Interestingly, we don't really have a
>     language tag per se for the EML document content itself, except
>     that all XML documents can use the built-in "xml:lang" attribute,
>     which is optional for all XML elements
>     (http://www.w3.org/TR/REC-xml/#sec-lang-tag).  This allows one to
>     set the language for each and every element in an XML document,
>     such as:
>
>     <title xml:lang="en">North American Forests</title>
>     <title xml:lang="es">Bosques de Norte Americano</title>
>
>     Two problems we would need to address with this approach come
>     immediately to mind:
>
>     1) Many elements in EML are not repeatable, and therefore it is
>     not possible to have one copy of the element in English and
>     another in a different language. So cardinality would have to be
>     updated throughout the EML schemas, which would make some aspects
>     of validation more confusing.
>     2) For those elements that are already repeatable or are made
>     repeatable through a revision, there is no mechanism to indicate
>     that the two element nodes are meant to have the same semantic
>     meaning in different languages, as opposed to two semantically
>     different elements that happen to also differ in their language.
>
>     This second issue is the one that would require more structural
>     changes to EML.  For example, one might sometimes want to have
>     more than one title (which is why title is currently repeatable),
>     but other times want to have one title in two different
>     languages.  Either way, EML's current structures don't allow these
>     subtleties to be specified.
>
>     Matt
>
>      
>
>     On Fri, Dec 5, 2008 at 12:54 PM, inigo san gil
>     <isangil at lternet.edu> wrote:
>
>
>     Metadata folks:
>
>     I think this opens (perhaps re-opens) an interesting discussion.
>
>     EML's resource (main module) offers us a <language> element that,
>     as I understand it, serves to specify the language used for the
>     document.
>     The cardinality is set to <= 1, so it is optional and, if used,
>     can name only one language.
>
>     However, we understood from Kristin Vanderbilt and David Blankman
>     that at a recent ILTER meeting, there was an agreement to provide
>     referential-level EML for all metadata in English (and perhaps
>     richer EML in their native languages).
>     The option David proposes, providing content in two languages,
>     one being English, does not play well with the EML schema as is.
>     There are options in the interim, while we think about whether
>     'we' tweak the EML schema.  Some solutions go in the direction of
>     "duplicating" the original EML record: take what is in the native
>     language, and either have it translated at some minimal-compliance
>     level of EML (ouch) or run it by a translation web service and
>     laugh (or rather cry) at the results.
>
>     There are of course many other approaches to this problem, Mark
>     Servilla mentioned some in the hallways of the LTER Network Office.
>
>     The thing is that part of the international community in ecology has
>     expressed formal interest in, and commitment to, using EML to
>     document their metadata. The ILTER group quickly realized the
>     Babelian challenge ahead (see Blankman's ISEI-6 presentation &
>     future paper), and David, Akiko Ogawa and others took to helping
>     the ILTER provide basic EML in English (remember, ILTER committed
>     to use English -chinglish and spanglish- as the lingua franca for
>     referential-level EML: EML level 1, with title, creator, abstract
>     and contact at least).
>
>     Cheers,
>     Inigo
>
>
>
>
>     bugzilla-daemon at ecoinformatics.org wrote:
>
>     http://bugzilla.ecoinformatics.org/show_bug.cgi?id=585
>
>
>
>
>
>     ------- Comment #2 from mob at icess.ucsb.edu  2008-12-05 09:31 -------
>     This comment from an email from David Blankman:
>     As EML is becoming an international standard, we need to start
>     thinking about ways to make EML more intelligent about multiple
>     languages. While EML allows multiple titles, there is currently no
>     way to indicate that multiple titles are equivalent. For example,
>     if I have:
>     <title> North American Forests </title>  AND
>     <title> Bosques de Norte Americano</title>
>
>     EML currently has no way to indicate that these are the same
>     title, just in a different language.
>
>     Matt and I were talking about this at the ISEI-Cancun meeting, but
>     I thought that it would be a good idea to get this discussion
>     started within eml-dev and the ILTER group as well.
>     _______________________________________________
>     Eml-dev mailing list
>     Eml-dev at ecoinformatics.org <mailto:Eml-dev at ecoinformatics.org>
>     http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>      
>
>
>     -- 
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     Matthew B. Jones
>     Director of Informatics Research and Development
>     National Center for Ecological Analysis and Synthesis (NCEAS)
>     UC Santa Barbara
>     jones at nceas.ucsb.edu <mailto:jones at nceas.ucsb.edu>                
>           Ph: 1-907-523-1960
>     http://www.nceas.ucsb.edu/ecoinfo
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>     -- 
>     Nature is trying very hard to make us succeed, but nature does not
>     depend on us. We are not the only experiment.
>      - R. Buckminster Fuller
>
>     If I am not for myself, then who will be for me? If I am for
>     myself alone, then who am I? If not now, when?
>     - Rabbi Hillel
>
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ISO-19139-multilingual-example.xml
Type: text/xml
Size: 1757 bytes
Desc: not available
URL: <http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/eml-dev/attachments/20081209/09c3e1a1/attachment-0001.xml>

