[eml-dev] [Bug 585] - internationalization needed in EML
Matt Jones
jones at nceas.ucsb.edu
Mon Dec 8 10:34:59 PST 2008
David and I briefly discussed some of these issues at ISEI, and we also
discussed them at the ILTER meeting in China. The 'language' tag in
eml-resource defines the language of the resource, which in the case of
eml-dataset resources means the language of the data. Interestingly, we
don't really have a language tag per se for the EML document content itself,
except that all XML documents can use the built-in "xml:lang" attribute,
which is optional for all XML elements
(http://www.w3.org/TR/REC-xml/#sec-lang-tag). This allows one to set the
language for each and every element in an XML document, such as:
<title xml:lang="en">North American Forests</title>
<title xml:lang="es">Bosques de Norte Americano</title>
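To illustrate how a consumer of such a document might read these attributes, here is a minimal sketch using Python's standard xml.etree.ElementTree. The <dataset> wrapper and the function name titles_by_language are assumptions made for illustration, not a valid EML document or existing EML tooling. The sketch reads only attributes set directly on each title and ignores xml:lang values inherited from parent elements.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined "xml" prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment: the <dataset> wrapper is only a stand-in root element.
SAMPLE = """<dataset>
  <title xml:lang="en">North American Forests</title>
  <title xml:lang="es">Bosques de Norte Americano</title>
</dataset>"""


def titles_by_language(xml_text):
    """Map each xml:lang value to the text of the title that carries it."""
    root = ET.fromstring(xml_text)
    return {title.get(XML_LANG): title.text for title in root.iter("title")}


titles = titles_by_language(SAMPLE)
for lang, text in titles.items():
    print(lang, text)
```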
Two problems we would need to address with this approach come immediately to
mind:
1) Many elements in EML are not repeatable, and therefore it is not possible
to have one copy of the element in English and another in a different
language. So cardinality would have to be updated throughout the EML
schemas, which would make some aspects of validation more confusing.
2) For those elements that are already repeatable or are made repeatable
through a revision, there is no mechanism to indicate that the two element
nodes are meant to have the same semantic meaning in different languages,
as opposed to being two semantically different elements that happen to also
differ in their language.
This second issue is the one that would require more structural changes to
EML. For example, one might sometimes want to have more than one title
(which is why title is currently repeatable), but other times want to have
one title in two different languages. Either way, EML's current structures
don't allow these subtleties to be specified.
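
The ambiguity can be made concrete with a small sketch, again assuming Python's xml.etree.ElementTree. The second English title and the <dataset> wrapper are hypothetical, invented only for this example. Grouping titles by xml:lang shows which languages are present, but nothing in the markup says which English title the Spanish one translates.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment: two distinct English titles plus one Spanish title.
SAMPLE = """<dataset>
  <title xml:lang="en">North American Forests</title>
  <title xml:lang="en">Forest Plots of North America</title>
  <title xml:lang="es">Bosques de Norte Americano</title>
</dataset>"""


def group_titles(xml_text):
    """Collect title texts per language, preserving document order."""
    groups = defaultdict(list)
    for title in ET.fromstring(xml_text).iter("title"):
        groups[title.get(XML_LANG)].append(title.text)
    return dict(groups)


groups = group_titles(SAMPLE)
# The language tags alone cannot pair the "es" title with either "en" title.
print(groups)
```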
Matt
On Fri, Dec 5, 2008 at 12:54 PM, inigo san gil <isangil at lternet.edu> wrote:
>
> Metadata folks:
>
> I think this opens (perhaps re-opens) an interesting discussion.
>
> EML's resource (main module) offers us a <language> element that,
> as I understand it, serves to specify the language used for the document.
> The cardinality is set to <= 1, so it is optional, and if used, only one
> language.
>
> However, we understood from Kristin Vanderbilt and David Blankman
> that at a recent ILTER meeting, there was an agreement to provide
> referential-level EML for all metadata in English (and perhaps
> richer EML in their native languages).
> The option David proposes, providing content in two languages,
> one being English, does not play well with the EML schema as is.
> There are options in the interim, while we think about whether 'we' tweak
> the EML schema. Some solutions go in the direction of "duplicating" the
> original EML record: take what is in the native language, and either
> have it translated at some minimal-compliance level of EML (ouch) or
> run it by a translation web service and laugh (or rather cry) at the
> results.
>
> There are of course many other approaches to this problem; Mark
> Servilla mentioned some in the hallways of the LTER Network Office.
>
> The thing is that part of the international community in ecology has
> expressed formal interest/commitment in using EML to document their
> metadata. The ILTER group quickly realized the Babelian challenge
> ahead (see Blankman's ISEI-6 presentation & future paper), and
> David, Akiko Ogawa and others took on helping the ILTER provide
> basic EML in English (remember, ILTER committed to use English
> -chinglish and spanglish- as the lingua franca for referential-level EML:
> EML level 1, with title, creator, abstract, contact at least).
>
> Cheers,
> Inigo
>
>
>
> bugzilla-daemon at ecoinformatics.org wrote:
>
>> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=585
>>
>>
>>
>>
>>
>> ------- Comment #2 from mob at icess.ucsb.edu 2008-12-05 09:31 -------
>> This comment from an email from David Blankman:
>> As EML is becoming an international standard, we need to start thinking about
>> ways to make EML more intelligent about multiple languages. While EML allows
>> multiple titles, there is currently no way to indicate that multiple titles
>> are equivalent. For example, if I have:
>> <title> North American Forests </title> AND
>> <title> Bosques de Norte Americano</title>
>>
>> EML currently has no way to indicate that these are the same title, just in a
>> different language.
>>
>> Matt and I were talking about this at the ISEI-Cancun meeting, but I thought
>> that it would be a good idea to get this discussion started within eml-dev and
>> the ILTER group as well.
>> _______________________________________________
>> Eml-dev mailing list
>> Eml-dev at ecoinformatics.org
>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>
>>
>
>
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthew B. Jones
Director of Informatics Research and Development
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara
jones at nceas.ucsb.edu Ph: 1-907-523-1960
http://www.nceas.ucsb.edu/ecoinfo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~