[eml-dev] EML 2.0.2 changes to text leaf nodes
Mark Servilla
servilla at lternet.edu
Fri Mar 21 20:44:51 PDT 2008
Hi Everyone,
This is a great discussion, and it certainly presses the issue of meaning
versus presentation in XML. In my humble opinion, I disagree with the
movement toward allowing more presentation-like tagging within EML
specifically, and XML in general. I realize that it simplifies the
decisions to be made within the rendering process, but it does not add
any meaning to the textual components of the content in question, except
what the human viewing the rendered content infers. This is because the
"meaning" is context sensitive. As an example, I can infer that
"Ephedra trifurca" is a species name in the title "Sex in Ephedra
trifurca (Ephedraceae) with Relation to Chihuahuan Desert Habitats"
(Brunt, J.W. et al., 1987) because it is italicized and I am familiar
with plant ecology. Based on the suggested changes to the EML 2.0.1
schema, this title would be written "<title>Sex in <emphasis>Ephedra
trifurca</emphasis> (Ephedraceae) with Relation to Chihuahuan Desert
Habitats</title>".

Would it not be more powerful to provide semantic tagging for textual
components, thereby giving the content specific and concise meaning? As
an alternative: "<title>Sex in <species>Ephedra trifurca</species>
(Ephedraceae) with Relation to Chihuahuan Desert Habitats</title>". In
the latter example, "Ephedra trifurca" is explicitly defined as a
species, and the rendering process can decide how to publish the text
based on its meaning. This approach may open a can of worms because of
the unlimited number of possible tags, but it is certainly more
informative in systems where context cannot be inferred, such as
machine-to-machine interactions.

I would make a similar argument against using superscript and subscript
in both chemical and mathematical formulas: the former (superscript) can
easily result in mistaking an exponent for a footnote, while the latter
(subscript) can result in mistaking a chemical formula for a variable
index in a mathematical expression. I believe I understand the
motivation for the suggested changes, but I don't believe they will be
of benefit in the long run. Please bang on me if I am really missing
something here. And with the economy tanking, this is only my 0.0002
cents.
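A minimal sketch of the trade-off, in Python using only the standard library's xml.etree.ElementTree. The <species> tag, the ITALIC_TAGS rendering rule, and the helper names are hypothetical illustrations, not part of EML 2.0.1 or the proposed 2.0.2. Both encodings render to the same italicized HTML, but only the semantic one lets a program extract the species name without a human inferring it.

```python
import xml.etree.ElementTree as ET
from html import escape

# Two encodings of the same title. <species> is a HYPOTHETICAL semantic tag
# (the alternative proposed above); <emphasis> is the presentational tag.
SEMANTIC_TITLE = (
    "<title>Sex in <species>Ephedra trifurca</species> (Ephedraceae) "
    "with Relation to Chihuahuan Desert Habitats</title>"
)
PRESENTATIONAL_TITLE = (
    "<title>Sex in <emphasis>Ephedra trifurca</emphasis> (Ephedraceae) "
    "with Relation to Chihuahuan Desert Habitats</title>"
)

# Hypothetical rendering rule: the decision lives in the consumer
# (the "stylesheet"), not in the metadata document itself.
ITALIC_TAGS = {"species", "emphasis"}


def flatten(xml_text: str) -> str:
    """Discard all inline markup and return the plain-text title."""
    return "".join(ET.fromstring(xml_text).itertext())


def species_names(xml_text: str) -> list[str]:
    """Extract species names; this works only if a tag carries that meaning."""
    root = ET.fromstring(xml_text)
    return ["".join(el.itertext()) for el in root.iter("species")]


def to_html(xml_text: str) -> str:
    """Render one level of inline children, italicizing tags in ITALIC_TAGS."""
    root = ET.fromstring(xml_text)
    parts = [escape(root.text or "")]
    for child in root:
        inner = escape("".join(child.itertext()))
        parts.append(f"<i>{inner}</i>" if child.tag in ITALIC_TAGS else inner)
        parts.append(escape(child.tail or ""))
    return "".join(parts)


if __name__ == "__main__":
    for title in (SEMANTIC_TITLE, PRESENTATIONAL_TITLE):
        print(to_html(title))
        print(species_names(title))
```

Running it prints the same HTML line for both titles, followed by ['Ephedra trifurca'] for the semantic version and [] for the presentational one: the rendering is identical, but only one encoding is machine-readable.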
Sincerely,
Mark
inigo wrote:
>
>
> ...And how do you envision, in practice, XSL interpreting
> a bare "string" into formatted text? If you don't give any cues
> to XSLT in the form of markup tags (for when to emphasize,
> start a new line or a new section, underline, or use boldface)
> it is a guessing game.
>
> Those markup tags do not get in the way of content.
> Whenever needed, XSL can flatten out all the content
> of a branch (leaf) and pipe it as desired. On the contrary,
> without markup for formatting, you lose all the richness
> associated with text. Did you ever wonder why the vast
> majority of people choose <i>MS Word</i> or <i>OpenOffice</i>
> as opposed to 'vi', 'ed', or DOS 'edit'? We are not just
> programming here; we are passing content with certain
> syntactical and formal cues to the reader. Did you ever
> wonder why a scientist in Geneva decided to come
> up with HTML? Maybe adding some tags (title, underline,
> strikeout, italics, boldface, a suite of fonts, etc.) was
> not such a bad idea to replace the good ol' Gopher.
> Imagine e-commerce in flat text.
>
> The extreme case is that of people who pass ASCII-based
> "maps" of plot divisions (e.g., Cedar Creek LTER), which are
> completely destroyed by the Metacat stylesheets because they
> do not honor even minimal markup (such as "literalLayout").
> And what about methodologies that are not well described
> by the <substep>-<description> pairing?
> A little formatting goes a long way in helping the reader,
> and it does not get that much in the way of
> the "content". But
> Christopher Jones wrote:
>> Hi all,
>>
>> I strongly agree that content and presentation, ideally, should be
>> kept separate by allowing stylesheets to handle the latter. I'm
>> struggling a bit with what constitutes 'content'. A structural tag
>> such as <title> lends 'meaning' to the contained text, at least in
>> English. A <b> tag in HTML seems much more presentational - it
>> doesn't add meaning, merely emphasis. However, when formatting
>> conventions in scientific domains lend 'meaning' to text, like
>> italicizing species binomials, it seems that we need to provide the
>> facility for this, lest we lose semantic information.
>>
>> I agree with Wade that we walk a fine line here between expressing
>> semantics and presenting. Cluttered EML docs could abound. Is the
>> preservation of 'meaning' worth the trade-off?
>>
>> On Mar 20, 2008, at 3:06:43 PM, Wade Sheldon wrote:
>>
>>> I think your casual example makes this point very well - what real
>>> use is preserving <emphasis> markup in a data set title? That's what
>>> XSL is for. If this is a legacy issue for some metadata providers,
>>> then I think they should be encouraged (or helped) to offload
>>> embedded display markup when porting to EML.
>>>
>>
>> True, my example was a bit simple. A better example would be the
>> species binomial case:
>>
>> <title>
>> Acetylene reduction and 15N2 uptake rates for
>> <emphasis>Alnus tenuifolia</emphasis> and
>> <emphasis>Alnus crispa</emphasis>
>> in six different successional habitats
>> </title>
>>
>> where the stylesheet renders <emphasis> elements within <title> in
>> italics. This certainly is a presentation issue, but one that imparts
>> meaning based on known conventions. Notice how "15N2" (i.e., ¹⁵N₂)
>> also seems to lose meaning in this title without appropriate
>> formatting.
>>
>> Perhaps there is another way to deal with this, though? It seems too
>> big a job to try to infer meaning from plain xs:string word
>> combinations (such as Alnus tenuifolia) and then present them
>> correctly with the right markup for presentation.
>>
>> On Mar 20, 2008, at 3:22:06 PM, inigo wrote:
>>
>>> Margaret O'Brien and I, with help from Mark Servilla and, to some
>>> extent, J. Brunt and Corinna Gries, worked on this minor fix. In it,
>>> we addressed the bug that Chris is talking about, yet the workaround
>>> that Chris is proposing does not fix the fact that there are
>>> DocBook 4.* Schema tags present in the documentation module of EML
>>> that are not declared in the text module of EML. Examples are <url>
>>> and <citetitle>. By redefining the types, we address these errors
>>> partially, yet some stringent XML editors (XML Spy 2007, 2008) will
>>> flag the existence of these undeclared tags as critical errors. This
>>> makes the schema rather unprofessional.
>>>
>>
>> On Mar 20, 2008, at 3:39:10 PM, James Brunt wrote:
>>
>>> Also, I'm in agreement with Inigo that making the schema "clean"
>>> should be a priority in this bug-fix release.
>>>
>>
>> Fair enough. Consistent and complete support for either DocBook 4.x
>> or DocBook 5.x throughout the EML schemas (in the eml-text module and
>> the documentation tags in every module) seems like a good goal, and
>> one that isn't particularly onerous. Likewise, an audit of the
>> documentation tags is in order to ensure completeness.
>>
>> Questions -
>>
>> Have the EML-2.0.2 proposed fixes stated in the "Community opinion on
>> minor revision of EML" post been implemented in a branch in the
>> Ecoinformatics EML repository? If so, are they tagged?
>>
>> Besides bugs #2054 and #2073, have the other 11 bullets in that email
>> post been entered into the ecoinfo Bugzilla?
>>
>> Cheers,
>> Chris
>> _________________________________________________________________
>> christopher jones cjones at msi.ucsb.edu (805) 680-5946
>> marine science institute university of california, santa barbara
>> _________________________________________________________________
>>
>> _______________________________________________
>> Eml-dev mailing list
>> Eml-dev at ecoinformatics.org
>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>
--
Mark Servilla, Ph.D.
LTER Network Office
Department of Biology
MSC 03 2020
1 University of New Mexico
Albuquerque, NM 87131-0001
servilla at lternet.edu
Office (505) 277-2619
Cell (505) 453-8593