[eml-dev] [Bug 585] - internationalization needed in EML
Éamonn Ó Tuama
eotuama at gbif.org
Tue Dec 9 14:44:51 PST 2008
Hi Matt,
I agree about the inaccessibility of ISO standards - I also had to use a
draft release of ISO 19115. At least the ISO 19139 XSD schemas are
freely available once you accept a stern copyright notice. You can
view them here:
http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/
or download them as a zipped archive by going one directory up and
searching for 19139:
http://standards.iso.org/ittf/PubliclyAvailableStandards/
Regarding the points you raise below -
'contact' : details for the metadata writer could be different to those
of the data custodian, collector, etc.
'locale' : I presume the combination of language code and country is
used because one country can have multiple languages and one language
can be spoken in many countries.
I was using the term "attribute" in the general sense of a property and
not in the strict XML sense of element vs attribute. 'locale' is
actually expressed as a repeating element in ISO 19139.
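To make the element-versus-attribute point concrete, here is a minimal
sketch in Python that reads a record declaring two locales as repeated
child elements. It assumes the element names and namespace URI of the
ISO 19139 gmd schema (PT_Locale, languageCode, country,
characterEncoding), which should be verified against the XSDs. The id
values and code values are invented for illustration, and the codeList
attributes that real instances carry are omitted.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed from the ISO 19139 gmd schema; verify against the XSDs.
GMD = "http://www.isotc211.org/2005/gmd"
NS = {"gmd": GMD}

# Illustrative record: 'locale' repeats as a child element of the metadata root.
# The id values and code values are invented for this example, and the
# codeList attributes found in real instances are left out for brevity.
RECORD = f"""
<gmd:MD_Metadata xmlns:gmd="{GMD}">
  <gmd:locale>
    <gmd:PT_Locale id="locale-en">
      <gmd:languageCode><gmd:LanguageCode codeListValue="eng"/></gmd:languageCode>
      <gmd:country><gmd:Country codeListValue="IE"/></gmd:country>
      <gmd:characterEncoding><gmd:MD_CharacterSetCode codeListValue="utf8"/></gmd:characterEncoding>
    </gmd:PT_Locale>
  </gmd:locale>
  <gmd:locale>
    <gmd:PT_Locale id="locale-ga">
      <gmd:languageCode><gmd:LanguageCode codeListValue="gle"/></gmd:languageCode>
      <gmd:country><gmd:Country codeListValue="IE"/></gmd:country>
      <gmd:characterEncoding><gmd:MD_CharacterSetCode codeListValue="utf8"/></gmd:characterEncoding>
    </gmd:PT_Locale>
  </gmd:locale>
</gmd:MD_Metadata>
"""


def code_value(parent, path):
    """Return the codeListValue attribute of the element at path, or None."""
    node = parent.find(path, NS)
    return node.get("codeListValue") if node is not None else None


def list_locales(xml_text):
    """Return (id, language, country, character set) for each declared locale."""
    root = ET.fromstring(xml_text)
    locales = []
    for loc in root.findall("gmd:locale/gmd:PT_Locale", NS):
        locales.append((
            loc.get("id"),
            code_value(loc, "gmd:languageCode/gmd:LanguageCode"),
            code_value(loc, "gmd:country/gmd:Country"),
            code_value(loc, "gmd:characterEncoding/gmd:MD_CharacterSetCode"),
        ))
    return locales


for entry in list_locales(RECORD):
    print(entry)
```

Each locale is an ordinary element with its own substructure, so it can
repeat and can carry language, country and character set together,
which an XML attribute could not do.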
I have attached an incomplete example of an ISO 19139 instance document
to show you how I understand multiple locales are used with related,
translated free text elements. I have based this on an example in an
INSPIRE document "Draft Guidelines – INSPIRE metadata implementing
rules based on ISO 19115 and ISO 19119"
(http://inspire.brgm.fr/Documents/MD_IR_and_ISO_20080425.pdf).
See Section A.6, pages 34-37.
I'm only beginning to explore the schemas themselves and the
multilingual aspects of ISO 19139 so apart from what I have copied from
the INSPIRE doc, the encoding in the example may not be fully correct.
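As a rough illustration of the pattern (a sketch, not the attached file
itself), the Python below reads one title that carries its default text
plus a translation pointing at a locale id. It assumes the ISO 19139
element names PT_FreeText, textGroup and LocalisedCharacterString and
the gmd/gco namespace URIs, all of which should be checked against the
XSDs. The ids "locale-en" and "locale-es" and the helper name
title_by_locale are hypothetical, and the xsi:type attribute that real
instances put on the title element is omitted.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed from the ISO 19139 schemas; verify against the XSDs.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
NS = {"gmd": GMD, "gco": GCO}

# One title element holding the default-language text and one translation.
# The locale id "locale-es" is invented; it would refer to a PT_Locale
# declared elsewhere in the same record.
TITLE = f"""
<gmd:title xmlns:gmd="{GMD}" xmlns:gco="{GCO}">
  <gco:CharacterString>Forests of New Mexico</gco:CharacterString>
  <gmd:PT_FreeText>
    <gmd:textGroup>
      <gmd:LocalisedCharacterString locale="#locale-es">Bosques del Nuevo México</gmd:LocalisedCharacterString>
    </gmd:textGroup>
  </gmd:PT_FreeText>
</gmd:title>
"""


def title_by_locale(title_xml, default_locale="locale-en"):
    """Map locale id -> text for one free-text element and its translations."""
    root = ET.fromstring(title_xml)
    texts = {}
    # The plain CharacterString is the text in the record's default language.
    default = root.find("gco:CharacterString", NS)
    if default is not None and default.text:
        texts[default_locale] = default.text.strip()
    # Each localised string points at a declared locale via "#id".
    path = "gmd:PT_FreeText/gmd:textGroup/gmd:LocalisedCharacterString"
    for node in root.findall(path, NS):
        locale_id = node.get("locale", "").lstrip("#")
        texts[locale_id] = (node.text or "").strip()
    return texts


print(title_by_locale(TITLE))
```

Because the translations live inside a single title element, a second
title with its own translations stays distinct from the first, which is
the semantic linking Matt asks about below.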
I have joined the eml-dev list so I assume this will now get posted there.
Éamonn
Matt Jones wrote:
> Thanks, Éamonn, for the information. Very useful.
>
> The frustrating thing about ISO standards is how impossible they are
> to obtain. I have an old draft copy of ISO 19115, but neither I nor
> the UC library has a copy of the current standard or of the 19139. I
> have a fundamental philosophical problem with standards that are not
> free and open. Nevertheless, I will continue to try to find a copy of
> these so that I can look into it.
>
> In the meantime, a couple of comments below in your note...
>
> On Tue, Dec 9, 2008 at 12:23 AM, Eamonn O Tuama (GBIF)
> <eotuama at gbif.org <mailto:eotuama at gbif.org>> wrote:
>
> Hi All,
>
> I presume any extensions to EML will involve changes to the
> schemas and therefore versioning. I do not know how complicated
> that will be; someone familiar with how the EML schemas are
> constructed is best placed to answer. However, I think we might be
> able to learn something from the ISO 19115/19139 standard regarding
> multilingual metadata.
>
> First of all, it provides a distinct set of attributes for the
> metadata document itself (rather than the data the metadata
> document is describing). These include:
>
> 1. fileIdentifier
>
> same as EML packageID
>
> 2. language
>
> see my earlier discussion on this issue
>
> 3. characterSet
>
> same as 'encoding' attribute in the XML prolog
>
> 4. contact
>
> why would the metadata contact be different from the data contact?
> We have trouble enough keeping one up to date
>
> 5. dateStamp
>
> this would be useful, we should consider adding it to EML.
> Presumably, this is the date on which the metadata document was last
> updated, which would probably belong in the 'maintenance' section of EML
>
> 6. metadataStandardName
>
> provided in the namespace of EML
>
> 7. metadataStandardVersion
>
> provided in the namespace of EML
>
> 8. locale
>
> this could be useful, although it seems like providing the language
> code would be just as effective and essentially redundant.
>
> ISO 19139 (the implementation standard for the conceptual model in
> ISO 19115) also provides a means for encoding multilingual
> metadata. This is achieved through use of an optional, repeatable
> "locale" attribute consisting of language, country and
> characterset encodings.
>
> This sounds interesting. So, how does it repeat? XML attributes are
> not repeatable, nor do they have substructure. Is it an element? And
> if so, is it a child element of every other element in the model?
>
> Multiple instances of locale may be defined for a metadata
> document and translations representing those locales provided for
> each metadata element. So, repeatability in multiple languages is
> built in.
>
> I don't quite see how this would work. Could you show a brief snippet
> as an example? For example, for the title of the dataset, how would
> you encode two titles, each in both English and Spanish, and be able
> to tell which of the elements were semantically linked? Here's one
> way I could see doing it, but it's a bit clunky:
> <title>
>   <translation xml:lang="en">Forests of New Mexico</translation>
>   <translation xml:lang="es">Bosques del Nuevo México</translation>
> </title>
> <title>
>   <translation xml:lang="en">Survey of Plants and Animals</translation>
>   <translation xml:lang="es">Estudio de Plantas y Animales</translation>
> </title>
>
> How would ISO 19139 propose representing this content?
>
> The ability to work with multiple languages is seen as a strong
> advantage in moving from the FGDC metadata standard to the North
> American Profile (NAP) of ISO 19115. The problem, at the moment,
> is that a biological profile in ISO 19115 does not exist but it
> seems that work is underway to express the FGDC Biological Profile
> in ISO. (I understand that EML based their taxonomic module
> directly on the FGDC biological profile component.)
>
> Actually, the BDP standard first got these fields from EML 1.3.x and
> 1.4.x, and then EML 2.x reincorporated the changes from the BDP.
> Either way, we've been looking at replacing the EML taxonomic module
> with something more in line with TDWG standards, in particular with
> TCS. I have worked out a new set of schemas for eml-taxon with Jessie
> Kennedy and Bob Peet that directly incorporate TCS, but I haven't had
> time to introduce these changes to the rest of the EML community. On
> the TODO list. Nevertheless, as you said, there's a lot of
> compatibility between EML and the BDP.
>
>
>
> The European Union, because of its composition, has always faced
> the challenge of dealing with multiple languages. A document by
> the European Committee for Standardisation (CEN) on "Geographic
> information — Standards, specifications, technical reports and
> guidelines, required to implement Spatial Data Infrastructure"
> (can't find URL where I downloaded originally but have PDF if
> anyone wants it) provides some insights on "Cultural and
> Linguistic Adaptability" where it places the emphasis on use of
> multilingual thesauri rather than efforts to translate element
> contents.
>
> Interesting. I'd like to see that. So, given a metadata document in
> Chinese, they are arguing that scientists who speak other languages
> can get by with multilingual thesaurus entries in place of the
> natural language metadata? I find this somewhat unconvincing if you
> really want to re-use the data.
>
> Thanks for your comments, Éamonn.
>
> Matt
>
> See also the Nowak et al. paper "Issues of multilinguality in creating
> a European SDI – the perspective for spatial data interoperability"
>
> http://www.ec-gis.org/Workshops/11ec-gis/papers/309nowak.pdf
>
>
>
> Regards,
>
>
>
> Éamonn
>
>
>
>
>
> *From:* David Blankman [mailto:dblankman1 at gmail.com
> <mailto:dblankman1 at gmail.com>]
> *Sent:* 08 December 2008 20:59
> *To:* Matt Jones
> *Cc:* inigo san gil; eml-dev at ecoinformatics.org
> <mailto:eml-dev at ecoinformatics.org>;
> bugzilla-daemon at ecoinformatics.org
> <mailto:bugzilla-daemon at ecoinformatics.org>; Vivian B Hutchison;
> burkeker at gate.sinica.edu.tw <mailto:burkeker at gate.sinica.edu.tw>;
> chin at tfri.gov.tw <mailto:chin at tfri.gov.tw>; guoxb at igsnrr.ac.cn
> <mailto:guoxb at igsnrr.ac.cn>; hehl at igsnrr.ac.cn
> <mailto:hehl at igsnrr.ac.cn>; lijh at sdb.cnic.cn
> <mailto:lijh at sdb.cnic.cn>; Aikiko Ogawa; Eamonn O Tuama; Kristin
> Vanderbilt; Schentz Herbert; Shang; Su Wen; Werf, Bert van der
>
> *Subject:* Re: [eml-dev] [Bug 585] - internationalization needed
> in EML
>
>
>
> As I think back upon the discussions in China and my discussions
> with Matt at ISEI, it seems to me, as I initially thought, that
> multiple language versions of EML documents are probably better
> handled by creating separate EML documents for each language used.
> EML is already complex; I see no reason to make it more complex.
>
>
>
> In the ILTER situation we are asking ILTER member networks to
> provide a core of EML in English, on the understanding that more
> complete metadata may be in another language. In this case, should
> there be an EML module, eml-ilter or eml-language, analogous to
> eml-access, that specifies the identifier of the "main"
> eml-document and the language of that document? This module might
> also include an element to record a brief statement about the
> amount of data in that foreign language. I am not sure what else
> might be appropriate for this module. I know that Matt was
> thinking that some modifications to Metacat replication might be
> needed.
>
> David
>
>
>
>
> On Mon, Dec 8, 2008 at 1:34 PM, Matt Jones <jones at nceas.ucsb.edu
> <mailto:jones at nceas.ucsb.edu>> wrote:
>
> David and I discussed (briefly) some of these issues at ISEI. And
> we also discussed them at the ILTER meeting in China. The
> 'language' tag in eml-resource defines the language of the
> resource, which in the case of eml-dataset resources means the
> language of the data. Interestingly, we don't really have a
> language tag per se for the EML document content itself, except
> that all XML documents can use the built-in "xml:lang" attribute,
> which is optional for all XML elements
> (http://www.w3.org/TR/REC-xml/#sec-lang-tag). This allows one to
> set the language for each and every element in an XML document,
> such as:
>
> <title xml:lang="en">North American Forests</title>
> <title xml:lang="es">Bosques de Norte Americano</title>
>
> Two problems we would need to address with this approach come
> immediately to mind:
>
> 1) Many elements in EML are not repeatable, and therefore it is
> not possible to have one copy of the element in English and
> another in a different language. So cardinality would have to be
> updated throughout the EML schemas, which would make some aspects
> of validation more confusing.
> 2) For those elements that are already repeatable or are made
> repeatable through a revision, there is no mechanism to indicate
> that the two element nodes are meant to have the same semantic
> meaning in different languages, as opposed to two semantically
> different elements that happen to also differ in their language.
>
> This second issue is the one that would require more structural
> changes to EML. For example, one might sometimes want to have
> more than one title (which is why title is currently repeatable),
> but other times want to have one title in two different
> languages. Either way, EML's current structures don't allow these
> subtleties to be specified.
>
> Matt
>
>
>
> On Fri, Dec 5, 2008 at 12:54 PM, inigo san gil
> <isangil at lternet.edu <mailto:isangil at lternet.edu>> wrote:
>
>
> Metadata folks:
>
> I think this opens (perhaps re-opens) an interesting discussion.
>
> EML's resource (main module) offers us a <language> element that,
> as I understand it, serves to specify the language used for the
> document. The cardinality is set to <= 1, so it is optional and,
> if used, can name only one language.
>
> However, we understood from Kristin Vanderbilt and David Blankman
> that at a recent ILTER meeting, there was an agreement to provide
> referential-level EML for all metadata in English (and perhaps
> richer EML in their native languages).
> The option David proposes, providing content in two languages,
> one being English, does not play well with the EML schema as is.
> There are options in the interim, while we think about whether 'we'
> tweak the EML schema. Some solutions go in the direction of
> "duplicating" the original EML record: take what is in the native
> language, and either have it translated at some minimal-compliance
> level of EML (ouch) or run it by a translation web service and
> laugh (or rather cry) at the results.
>
> There are of course many other approaches to this problem, Mark
> Servilla mentioned some in the hallways of the LTER Network Office.
>
> The thing is that part of the international community in ecology has
> expressed formal interest/commitment in using EML to document their
> metadata. The ILTER group quickly realized the Babelian challenge
> ahead (see Blankman's ISEI-6 presentation & future paper), and
> David, Akiko Ogawa and others took on helping the ILTER provide
> basic EML in English (remember, ILTER committed to use English
> -chinglish and spanglish- as the lingua franca for referential-level
> EML: EML level 1, with title, creator, abstract and contact at least).
>
> Cheers,
> Inigo
>
>
>
>
> bugzilla-daemon at ecoinformatics.org
> <mailto:bugzilla-daemon at ecoinformatics.org> wrote:
>
> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=585
>
>
>
>
>
> ------- Comment #2 from mob at icess.ucsb.edu
> <mailto:mob at icess.ucsb.edu> 2008-12-05 09:31 -------
> This comment from an email from David Blankman:
> As EML is becoming an international standard, we need to start
> thinking about ways to make EML more intelligent about multiple
> languages. While EML allows multiple titles, there is currently no
> way to indicate that multiple titles are equivalent. For example,
> if I have:
> <title> North American Forests </title> AND
> <title> Bosques de Norte Americano</title>
>
> EML currently has no way to indicate that these are the same
> title, just in a different language.
>
> Matt and I were talking about this at the ISEI-Cancun meeting, but
> I thought that it would be a good idea to get this discussion
> started within eml-dev and the ILTER group as well.
> _______________________________________________
> Eml-dev mailing list
> Eml-dev at ecoinformatics.org <mailto:Eml-dev at ecoinformatics.org>
> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>
>
>
>
>
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Matthew B. Jones
> Director of Informatics Research and Development
> National Center for Ecological Analysis and Synthesis (NCEAS)
> UC Santa Barbara
> jones at nceas.ucsb.edu <mailto:jones at nceas.ucsb.edu>
> Ph: 1-907-523-1960
> http://www.nceas.ucsb.edu/ecoinfo
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>
>
>
> --
> Nature is trying very hard to make us succeed, but nature does not
> depend on us. We are not the only experiment.
> - R. Buckminster Fuller
>
> If I am not for myself, then who will be for me? If I am for
> myself alone, then who am I? If not now, when?
> - Rabbi Hillel
>
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ISO-19139-multilingual-example.xml
Type: text/xml
Size: 1757 bytes
Desc: not available
URL: <http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/eml-dev/attachments/20081209/09c3e1a1/attachment-0001.xml>