[eml-dev] [Bug 2512] - require text content in elements to be non-empty

bugzilla-daemon at ecoinformatics.org bugzilla-daemon at ecoinformatics.org
Sat Nov 8 12:56:55 PST 2008


http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2512





------- Comment #3 from mob at icess.ucsb.edu  2008-11-08 12:56 -------
We need to look at the effect on instance documents of switching all xs:string
to NonEmptyStringType. This type switch will probably have a bigger effect on
authors' ability to migrate their documents than the changes to the document
structure itself. Structural changes will be handled by the XSL stylesheet, but
retyping all strings means that content could now be required where none
previously existed.

To start, I considered just the anonymous simple-type elements that are
required by EML and have type="xs:string". It seemed reasonable that if an
element is optional, its content can also be optional. In all, there are 81 of
these, which are generally easy to retype with a statement like:
sed -e '/<xs:element name/{
/minOccurs="0"/!s/xs:string/res:NonEmptyStringType/
}
'
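
As a rough illustration of what that statement does, here is a small sketch run against a made-up three-line schema fragment. The file name sample.xsd and the particular element declarations are assumptions chosen for the demo, not lines copied from the EML schemas. Only the declarations without minOccurs="0" should be retyped.

```shell
#!/bin/sh
# Hypothetical sample input; the declarations are illustrative only.
tmp=$(mktemp -d)
cat > "$tmp/sample.xsd" <<'EOF'
<xs:element name="title" type="xs:string"/>
<xs:element name="shortName" type="xs:string" minOccurs="0"/>
<xs:element name="surName" type="xs:string" maxOccurs="unbounded"/>
EOF

# On lines that declare a named element, retype xs:string
# unless the same line also carries minOccurs="0".
retyped=$(sed -e '/<xs:element name/{
/minOccurs="0"/!s/xs:string/res:NonEmptyStringType/
}' "$tmp/sample.xsd")

printf '%s\n' "$retyped"
rm -rf "$tmp"
```

Note that sed works one line at a time, so this only behaves as intended when the name, type and minOccurs attributes of a declaration all sit on the same line. A declaration that wraps minOccurs="0" onto a following line would be retyped anyway and would need a manual check.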

There are other elements that could be examined and retyped manually, or that
would be caught by a general s/xs:string/res:NonEmptyStringType/. For example,
see <keyword> (eml-resource.xsd): it is a complexType/simpleContent, so the
reference to xs:string occurs below the element declaration. Other elements
(and many attributes) use xs:restriction base="xs:string" as the start of an
enumeration list, but changing these to base="NonEmptyStringType" seems
superfluous.
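
To make the trade-off concrete, the sketch below compares three substitutions on a simplified, made-up fragment shaped like the <keyword> case, followed by an enumeration. The fragment, the file name sample.xsd and the third "extension only" variant are all assumptions for illustration; the third variant is not something proposed in this bug, just one possible middle ground.

```shell
#!/bin/sh
# Hypothetical, simplified fragment; not copied from eml-resource.xsd.
tmp=$(mktemp -d)
cat > "$tmp/sample.xsd" <<'EOF'
<xs:element name="keyword">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="keywordType" type="KeyTypeCode" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
<xs:simpleType name="KeyTypeCode">
  <xs:restriction base="xs:string">
    <xs:enumeration value="place"/>
  </xs:restriction>
</xs:simpleType>
EOF

# Count the lines that each variant retypes ("|| true" keeps a zero count from
# being treated as a failure).
# 1. Element-anchored form: the element line itself has no xs:string.
anchored=$(sed -e '/<xs:element name/{
/minOccurs="0"/!s/xs:string/res:NonEmptyStringType/
}' "$tmp/sample.xsd" | grep -c 'res:NonEmptyStringType' || true)

# 2. General form: catches the extension base, and the enumeration base too.
general=$(sed -e 's/xs:string/res:NonEmptyStringType/' "$tmp/sample.xsd" \
  | grep -c 'res:NonEmptyStringType' || true)

# 3. Assumed middle ground: touch xs:extension lines only.
narrow=$(sed -e '/<xs:extension /s/xs:string/res:NonEmptyStringType/' "$tmp/sample.xsd" \
  | grep -c 'res:NonEmptyStringType' || true)

printf 'anchored=%s general=%s narrow=%s\n' "$anchored" "$general" "$narrow"
rm -rf "$tmp"
```

On this fragment the element-anchored form changes nothing, the general form also rewrites the enumeration's restriction base, and the extension-only form changes just the simpleContent base. Whether that last form is safe across all the schema files would still need checking by hand.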

So to start, only one schema file, "eml-resource.xsd", has been checked into
CVS, so that others can try out the effect of NonEmptyStringType while its
scope is small. In particular, I was thinking about Morpho. Seven element
declarations in this file that were formerly xs:string are now
NonEmptyStringType; see the list below. I think the Morpho wizards deal only
with title, references and keyword, although all of them are available in the
tree editor. My local copy has all 81 (anonymous, simple) element declarations
retyped (in 17 schema docs), plus the 5 anonymous attributes. I am testing a
variety of EML201 documents from the LTER metacat against this schema as I
convert them, basically while I work on the XSL stylesheet.

title
distribution/connectionDefinition/parameterDefinition/name
distribution/connectionDefinition/parameterDefinition/description
distribution/connection/parameter/name
distribution/connection/parameter/description
distribution/offline/MediumName
references (multiple paths)
keyword (a named type)

