About projectDB Schema


Table of Contents

1. Introduction
2. Features and differences from EML 2
2.1. Root-level element is <researchProject>
2.2. Import the resource group
2.3. New nodes added
3. Module Overview
3.1. Introduction to projectDB modules and their use
3.2. Root-level structure
3.2.1. The lter-project module - Research context for resources
3.3. Modules Used
3.3.1. The eml-resource module - Base information for all resources
3.3.2. The eml-physical module - Physical file format
3.3.3. The eml-party module - People and organization information
3.3.4. The eml-coverage module - Geographic, temporal, and taxonomic extents of resources
3.3.5. The eml-literature module - Citation specific information
3.3.6. The eml-access module - Access control rules for resources
4. Module Descriptions (Normative)
4.1. lter-project
4.2. eml-access
4.3. eml-coverage
4.4. eml-literature
4.5. eml-party
4.6. eml-physical
4.7. eml-resource
4.8. eml-text
4.9. eml-unitTypeDefinitions
Index

Chapter 1. Introduction

The schema for the projectDB is closely based on the EML project module. The EML development community is aware of this use, and we plan to recommend a set of changes to the EML schema in the course of this work.

Chapter 2. Features and differences from EML 2

The projectDB schema incorporates most of the important features of EML 2. We have imported the EML 2.1 series of schema documents, in anticipation of its release early in 2009. Some changes have been made to the project schema for projectDB, so it differs from the EML 2 project module in the following ways:

2.1. Root-level element is <researchProject>

The root-level element is <researchProject> instead of <eml>.

  • Documents written against this schema may use any namespace prefix, but authors creating documents for LTER applications should use the prefix "lter".
  • At some future time, the root-level element may become <eml:eml>, with <researchProject> as its first child. This structure would be analogous to the dataset, citation, software, and protocol modules in EML 2.
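Namespace-aware XML parsers identify elements by namespace URI rather than by prefix, which is why any prefix is acceptable. The Python sketch below shows this with the standard-library `xml.etree.ElementTree` module. The namespace URI `https://example.org/lter-project` is a hypothetical placeholder, not the schema's actual target namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; substitute the real lter-project target namespace.
LTER_NS = "https://example.org/lter-project"

# The same root element written with two different prefixes.
doc_recommended = f'<lter:researchProject xmlns:lter="{LTER_NS}"/>'
doc_other_prefix = f'<proj:researchProject xmlns:proj="{LTER_NS}"/>'


def root_tag(xml_text: str) -> str:
    """Return the root tag in ElementTree's '{namespace}localName' form."""
    return ET.fromstring(xml_text).tag


# ElementTree expands each prefix to its namespace URI, so the tags compare equal.
print(root_tag(doc_recommended))  # {https://example.org/lter-project}researchProject
print(root_tag(doc_recommended) == root_tag(doc_other_prefix))  # True
```

The "lter" prefix recommendation is therefore a readability convention for LTER applications rather than something a validating parser enforces.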

2.2. Import the resource group

The project schema uses the resource group, as do other top-level EML elements.

2.3. New nodes added

Four new nodes were added to the eml-project schema to accommodate projectDB use cases; all four are optional and repeatable. One further node was added to the temporal coverage in the resource group.

  1. <reporting>: to contain information about reporting needs. This node is generic, with elements for the name of a report section and a value (text), and attributes describing the report's recipient and a date.
  2. <permissions>: to contain information about project management. Like <reporting>, this node is generic, with elements for a permission category and a value (text), and attributes describing the permission grantor and a date.
  3. <associatedMaterial>: to contain distribution information about a resource associated with the project, such as a dataset or publication.
  4. <associatedProject>: to contain the name and relationship of a project associated with the project being described. The relationship is limited to "parent", since this is the only information required to build genealogical relationships. In the EML 2.1 project schema module, related projects can be nested. This may be appropriate for a dataset or citation, but for this use a nested structure could produce complex, branching documents in which relationships are difficult to follow. A flatter structure, closer to "triples", results in simpler instance documents and is simpler to implement.
  5. <ongoing>: the temporal coverage node (in the resource group) can accommodate projects without an end date by using the new <ongoing> element. This element has an attribute "asOfDate" to assist in tracking the element's maintenance history. EML currently has no mechanism for describing "ongoing" resources.
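To illustrate why a flat, triple-like structure is easy to process, the Python sketch below represents each <associatedProject> entry as a (project, "parent", project) triple and follows the parent links to recover a project's lineage. The project identifiers and the tuple representation are hypothetical; the sketch does not parse real lter-project instance documents.

```python
from typing import Dict, List, Tuple

# Hypothetical triples: (child project, relationship, parent project).
# "parent" is the only relationship the lter-project schema uses.
Triple = Tuple[str, str, str]

triples: List[Triple] = [
    ("proj-3", "parent", "proj-2"),
    ("proj-2", "parent", "proj-1"),
    ("proj-4", "parent", "proj-1"),
]


def parent_map(rows: List[Triple]) -> Dict[str, List[str]]:
    """Index triples by child; <associatedProject> is repeatable, so allow several parents."""
    parents: Dict[str, List[str]] = {}
    for child, relation, parent in rows:
        if relation == "parent":
            parents.setdefault(child, []).append(parent)
    return parents


def ancestors(project: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return all ancestors of `project`, breadth-first, skipping repeats (cycle-safe)."""
    seen: List[str] = []
    queue = list(parents.get(project, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(parents.get(current, []))
    return seen


print(ancestors("proj-3", parent_map(triples)))  # ['proj-2', 'proj-1']
```

Each triple stands alone, so adding or removing a relationship never requires restructuring a nested document.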

Chapter 3. Module Overview

3.1. Introduction to projectDB modules and their use

The following section briefly describes each EML module used by the projectDB.

3.2. Root-level structure

3.2.1. The lter-project module - Research context for resources

This module is based on the eml-project module. It describes a project that might form a context for research. The definition of a "project" is not constrained: it may include scientific research investigations, student-led thesis projects, working groups of limited scope, or cyberinfrastructure coding projects. It is intended to house information on how a project was created, including descriptions of motivations and goals, funding, personnel, and the study area or study design. This module also has descriptors for associated resources and projects, for material intended for (or derived from) reports, and for project management issues such as approval or permission.

Unlike the eml-project module, instance documents written against this schema are standalone documents; the lter:researchProject element is a top-level element. It includes the "resource group", which contains basic elements for title, abstract, responsible parties, coverage, keywords, etc., as do other top-level EML resources.

Since it is derived from the EML-2.1.0 family of schemas, most namespaces were left unchanged. If an imported schema was edited to provide functionality for the lter-project schema, its namespace was updated to include the term 'lter-' to distinguish it from the namespaces used by EML.

3.3. Modules Used

The following EML modules are used by the projectDB schema:

3.3.1. The eml-resource module - Base information for all resources

The eml-resource module contains general information that describes dataset, literature, protocol, and software resources. These four types of resources share a common set of information, but each also has information that is unique to its type. Each resource type uses the eml-resource module to document the information common to all resources, and then extends eml-resource with modules specific to that resource type. For instance, all resources have creators, titles, and perhaps keywords, but only a dataset resource would contain a "data table". Likewise, a literature resource may have an ISBN associated with it, whereas the other resource types would not.

The eml-resource module is exclusively used by other modules, and is therefore not a stand-alone module.
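Because every resource type carries the same resource-group fields, code that reads those fields can treat the types uniformly. The Python sketch below extracts the title, creator surnames, and keywords from either a <dataset> or a <researchProject>. The element names follow the eml-resource and eml-party modules listed in the index; the sample documents are hypothetical and omit namespaces for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Two hypothetical, namespace-free resources sharing resource-group fields.
DATASET = """<dataset>
  <title>Stream chemistry</title>
  <creator><individualName><surName>Lee</surName></individualName></creator>
  <keywordSet><keyword>nitrogen</keyword></keywordSet>
</dataset>"""

PROJECT = """<researchProject>
  <title>Watershed nutrient study</title>
  <creator><individualName><surName>Ortiz</surName></individualName></creator>
  <keywordSet><keyword>nitrogen</keyword><keyword>streams</keyword></keywordSet>
</researchProject>"""


def resource_summary(xml_text: str) -> Dict[str, object]:
    """Collect common resource-group fields regardless of the resource type."""
    root = ET.fromstring(xml_text)
    surnames: List[str] = [
        el.text for el in root.findall("creator/individualName/surName") if el.text
    ]
    keywords: List[str] = [
        el.text for el in root.findall("keywordSet/keyword") if el.text
    ]
    return {
        "type": root.tag,
        "title": root.findtext("title"),
        "creators": surnames,
        "keywords": keywords,
    }


for doc in (DATASET, PROJECT):
    print(resource_summary(doc))
```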

3.3.2. The eml-physical module - Physical file format

The eml-physical module describes the external and internal physical characteristics of a data object, as well as the information required for its distribution. Examples of external physical characteristics are the filename, size, compression, encoding methods, and authentication of a file or byte stream. Internal physical characteristics describe the format of the data object. Named binary or otherwise proprietary formats can be cited (e.g., Microsoft Access 2000), and text formats can be precisely described (e.g., ASCII text delimited with commas). For text formats, the module also includes the information needed to parse the data object and extract the entity and its attributes. Distribution information describes how to retrieve the data object, either online (e.g., a URL or other connection information) or offline (e.g., a data object residing on an archival tape).
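For delimited text, the physical description supplies the parameters a reader needs to parse the data. The Python sketch below applies three of them (`numHeaderLines`, `fieldDelimiter`, and `quoteCharacter`, all listed in the index under eml-physical) using the standard `csv` module. The dictionary layout and the sample data are hypothetical, and a complete reader would also honor settings such as `recordDelimiter` and `numFooterLines`.

```python
import csv
from typing import Dict, List

# Hypothetical values, as they might be read from a <physical> description.
physical = {
    "numHeaderLines": 1,
    "fieldDelimiter": ",",
    "quoteCharacter": '"',
}

raw = 'site,date,note\nA1,2008-06-01,"dry, windy"\nB2,2008-06-02,calm\n'


def read_records(text: str, phys: Dict[str, object]) -> List[List[str]]:
    """Skip the declared header lines, then split records using the declared delimiters."""
    lines = text.splitlines()[int(phys["numHeaderLines"]):]
    reader = csv.reader(
        lines,
        delimiter=str(phys["fieldDelimiter"]),
        quotechar=str(phys["quoteCharacter"]),
    )
    return [row for row in reader]


print(read_records(raw, physical))
# [['A1', '2008-06-01', 'dry, windy'], ['B2', '2008-06-02', 'calm']]
```

The quote character matters here: without it, the comma inside "dry, windy" would split one field into two.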

The eml-physical module, like other modules, may be "referenced" via the <references> tag. This allows a physical description to be written once, and then used as a reference in other locations within the EML document via its ID.

3.3.3. The eml-party module - People and organization information

The eml-party module describes a responsible party (person or organization), and is typically used to name the originator of a resource or metadata document. It contains detailed contact information for the party, be it an individual person, an organization, or a named position within an organization. The eml-party module is used throughout the other EML modules where detailed contact information is needed.

The eml-party module, like other modules, may be "referenced" via the <references> tag. This allows a party to be described once, and then used as a reference in other locations within the EML document via its ID.
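Resolving a reference amounts to finding the element whose `id` attribute matches the text of <references>. The Python sketch below does this with `xml.etree.ElementTree`: a party is defined once as a <creator> and reused as a <contact>. The sample document is hypothetical and omits namespaces.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical, namespace-free fragment: the party is described once, then referenced.
DOC = """<researchProject>
  <creator id="p1">
    <individualName><givenName>Ana</givenName><surName>Ortiz</surName></individualName>
  </creator>
  <contact><references>p1</references></contact>
</researchProject>"""


def resolve(element: ET.Element, ids: Dict[str, ET.Element]) -> ET.Element:
    """Return the element a <references> child points to, or `element` itself."""
    ref = element.find("references")
    if ref is None or ref.text is None:
        return element
    return ids[ref.text.strip()]


root = ET.fromstring(DOC)
# Index every element that carries an id attribute.
ids = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

contact = resolve(root.find("contact"), ids)
print(contact.findtext("individualName/surName"))  # Ortiz
```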

3.3.4. The eml-coverage module - Geographic, temporal, and taxonomic extents of resources

The eml-coverage module contains fields for describing the coverage of a resource in terms of time, space, and taxonomy. These coverages (temporal, spatial, and taxonomic) represent the extent of applicability of the resource in those domains. The geographic coverage section allows two means of expressing coverage on the surface of the earth: 1) a set of bounding coordinates that define the north, south, east, and west limits of a rectangular area, optionally including bounding altitudes, and 2) a G-Ring polygon definition, in which an irregularly shaped area is defined by an ordered list of latitude/longitude coordinates. A G-Ring may also include an "inner G-Ring" that defines one or more "cut-outs" in the area, i.e., the donut-hole concept.
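The donut-hole semantics of an outer G-Ring with an exclusion ring can be made concrete with a point-in-polygon test. The Python sketch below uses the standard even-odd ray-casting algorithm on planar (longitude, latitude) pairs, such as might be read from gRingPoint elements. It is an illustration only: it ignores antimeridian and polar cases, and the coordinates are hypothetical. A point is within the coverage if it lies inside the outer ring and outside every exclusion ring.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees


def inside_ring(pt: Point, ring: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray cast toward +longitude."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_gpolygon(pt: Point, outer: List[Point], exclusions: List[List[Point]]) -> bool:
    """Inside the outer G-Ring and outside every exclusion G-Ring (the 'hole')."""
    return inside_ring(pt, outer) and not any(inside_ring(pt, r) for r in exclusions)


outer = [(-120.0, 34.0), (-119.0, 34.0), (-119.0, 35.0), (-120.0, 35.0)]
hole = [(-119.6, 34.4), (-119.4, 34.4), (-119.4, 34.6), (-119.6, 34.6)]

print(in_gpolygon((-119.8, 34.2), outer, [hole]))  # True: inside outer, outside hole
print(in_gpolygon((-119.5, 34.5), outer, [hole]))  # False: inside the hole
```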

The temporal coverage section allows for the definition of either a single date/time or a range of dates/times. These date/times may be expressed as a calendar date according to the ISO 8601 Date and Time Specification, or by using an alternate time scale, such as the geologic time scale. In order to express an "ongoing" time frame, the end date in the range would likely use the alternate time scale fields with a value of "ongoing", whereas the begin date would use the specific calendar date fields.
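A consumer of temporal coverage must decide how to treat an open-ended range. The Python sketch below checks whether a date falls within a range whose end is either an ISO 8601 calendar date or the literal value "ongoing", which it treats as having no upper bound. The function names and the plain-string convention for the end value are hypothetical simplifications of the coverage elements.

```python
from datetime import date
from typing import Union

# An end value is either a calendar date or the open marker "ongoing".
End = Union[date, str]


def parse_end(value: str) -> End:
    """Turn an ISO 8601 end date into a date; keep 'ongoing' as an open bound."""
    if value.strip().lower() == "ongoing":
        return "ongoing"
    return date.fromisoformat(value.strip())


def covers(begin: date, end: End, when: date) -> bool:
    """True if `when` falls in [begin, end]; an 'ongoing' end has no upper limit."""
    if when < begin:
        return False
    if isinstance(end, str):  # "ongoing": the range is still open
        return True
    return when <= end


begin = date.fromisoformat("2004-01-01")
print(covers(begin, parse_end("ongoing"), date(2030, 6, 1)))     # True
print(covers(begin, parse_end("2008-12-31"), date(2010, 1, 1)))  # False
```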

The taxonomic coverage section allows for detailed description of the taxonomic extent of the dataset or resource. The taxonomic classification consists of a recursive set of taxon rank names, their values, and their common names. This construct allows for a taxonomic hierarchy to be built to show the level of identification (e.g. Rank Name = Kingdom, Rank Value = Animalia, Common Name = Animals, and so on down the hierarchy.) The taxonomic coverage module also allows for the definition of the classification system in cases where alternative systems are used.
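Because <taxonomicClassification> elements nest recursively, a reader can recover the hierarchy by descending one level at a time. The Python sketch below does this with `xml.etree.ElementTree`. The element names (`taxonomicClassification`, `taxonRankName`, `taxonRankValue`, `commonName`) come from the eml-coverage module; the sample document is hypothetical, omits namespaces, and the sketch assumes a single classification at each level.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

COVERAGE = """<taxonomicCoverage>
  <taxonomicClassification>
    <taxonRankName>Kingdom</taxonRankName>
    <taxonRankValue>Animalia</taxonRankValue>
    <commonName>Animals</commonName>
    <taxonomicClassification>
      <taxonRankName>Phylum</taxonRankName>
      <taxonRankValue>Chordata</taxonRankValue>
      <taxonomicClassification>
        <taxonRankName>Class</taxonRankName>
        <taxonRankValue>Aves</taxonRankValue>
        <commonName>Birds</commonName>
      </taxonomicClassification>
    </taxonomicClassification>
  </taxonomicClassification>
</taxonomicCoverage>"""

Rank = Tuple[Optional[str], Optional[str], Optional[str]]


def lineage(coverage: ET.Element) -> List[Rank]:
    """Follow the nested taxonomicClassification children down to the leaf."""
    ranks: List[Rank] = []
    node = coverage.find("taxonomicClassification")
    while node is not None:
        ranks.append((
            node.findtext("taxonRankName"),
            node.findtext("taxonRankValue"),
            node.findtext("commonName"),  # None when no common name is given
        ))
        node = node.find("taxonomicClassification")
    return ranks


for rank in lineage(ET.fromstring(COVERAGE)):
    print(rank)
```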

The eml-coverage module, like other modules, may be "referenced" via the <references> tag. This allows the coverage extent to be described once, and then used as a reference in other locations within the EML document via its ID.

3.3.5. The eml-literature module - Citation specific information

The eml-literature module contains information that describes literature resources. It is intended to provide overview information about the literature citation, including title, abstract, keywords, and contacts. Citation types follow the conventions laid out by EndNote, and there is an attempt to represent a compatible subset of the EndNote citation types. These citation types include: article, book, chapter, edited book, manuscript, report, thesis, conference proceedings, personal communication, map, generic, audio visual, and presentation. The "generic" citation type is used when none of the other types fits.

The eml-literature module, like other modules, may be "referenced" via the <references> tag. This allows a citation to be described once, and then used as a reference in other locations within the EML document via its ID.

3.3.6. The eml-access module - Access control rules for resources

The eml-access module describes the level of access that is to be allowed or denied to a resource for a particular user or group of users, and can be described independently for metadata and data. The eml-access module uses a reference to a particular authentication system to determine the set of principals (users or groups) that can be specified in the access rules. The special principal 'public' can be used to indicate that any user or group has access permission, thereby making it easier to specify that anonymous access is allowed.

There are two mechanisms for including access control via the eml-access module. 1) Each top-level resource module (eml-dataset, eml-literature, eml-software, and eml-protocol) may be accompanied by an optional <access> element that establishes the default access control at the resource level for the entire EML package. If this access element is omitted from the document, the package submitter should be given full access to the package, and all other users should be denied all access. To make the package publicly viewable, the EML author must explicitly include a rule stating so. 2) Exceptions for particular entity-level components of the package can be controlled at a finer grain by using an access description in that entity's physical/distribution tree. When access control rules are specified at this level, they apply only to the data in the parent distribution element, not to the metadata. Thus, they control access to the content of the <inline> element, as well as to resources referenced by the <online/url> and <online/connection> elements. These exceptions are applied after the default package-level access rules, so they effectively override the default rules where the two overlap.

In previous versions of EML, access rules for entity-level distribution were contained in <additionalMetadata> sections and referenced via the <describes> tag. Although in theory these could reference any node, in practice node-level access control proved problematic. Since the most common use of access control rules was to limit access to specific data entities, EML 2.1.0 places the access tree explicitly in each entity's physical/distribution tree.

Access is specified with a choice of child elements, either <allow> or <deny>. Within these rules, values are assigned to each <principal> using the <permission> element. Users given "read" permission can view the resource; "write" allows changes to the resource, excluding changes to the access rules; "changePermission" includes "write" plus the ability to change the access rules. Users allowed "all" may do all of the above.

An example is given below, with non-critical sections deleted:

  <eml>
    <access
        authSystem="ldap://ldap.ecoinformatics.org:389/dc=ecoinformatics,dc=org"
        order="allowFirst">
      <allow>
        <principal>uid=alice,o=NASA,dc=ecoinformatics,dc=org</principal>
        <permission>read</permission>
        <permission>write</permission>
      </allow>
    </access>
    <dataset>
      ...
      <dataTable id="entity123">
        ...
        <physical>
          ...
          <distribution>
            ...
            <access id="access123"
                authSystem="ldap://ldap.ecoinformatics.org:389/dc=ecoinformatics,dc=org"
                order="allowFirst">
              <deny>
                <principal>uid=alice,o=NASA,dc=ecoinformatics,dc=org</principal>
                <permission>write</permission>
              </deny>
            </access>
          </distribution>
        </physical>
      </dataTable>
      <dataTable id="entity234">
        ...
        <physical>
          ...
          <distribution>
            ...
            <access>
              <references>access123</references>
            </access>
          </distribution>
        </physical>
      </dataTable>
      ...
    </dataset>
  </eml>
In this example, the overall default access allows the user alice (but no one else) to read and write all metadata and data. However, under "entity123" and "entity234" there is an additional rule stating that alice does not have write permission. The net effect is that Alice can read and change the metadata, but cannot change the two data entities. In addition, Alice cannot change these access rules, although the submitter can.

This example also shows how the eml-access module, like other modules, may be "referenced" via the <references> tag. This allows an access control document to be described once, and then used as a reference in other locations within the EML document via its ID.

In summary, access rules can be applied in two places in an EML document. Default access rules are established in the top-level <access> element for the main EML resource (e.g., "/eml/access"). These defaults can be overridden for particular data entities by adding <access> elements to the physical/distribution trees of those entities.
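The evaluation order described above can be modeled in a few lines. The Python sketch below is a simplified model, not the behavior of any particular EML repository: it expands "changePermission" and "all" as described for the <permission> element, applies one <access> block's rules in the sequence given by `order` (with "allowFirst", deny rules are applied after allow rules, so a matching deny wins), and then applies entity-level rules on top of the package-level result. Principal matching is reduced to exact string comparison plus the special principal "public", and the rule tuples are a hypothetical in-memory form. With the rules from the example, it reproduces the outcome for Alice.

```python
from typing import Iterable, List, Set, Tuple

# (kind, principal, permissions): hypothetical form of one <allow> or <deny>.
Rule = Tuple[str, str, Tuple[str, ...]]

# Permission expansion as described for the <permission> element.
EXPANDS = {
    "read": {"read"},
    "write": {"write"},
    "changePermission": {"write", "changePermission"},
    "all": {"read", "write", "changePermission"},
}


def expand(perms: Iterable[str]) -> Set[str]:
    """Expand composite permissions into individual rights."""
    result: Set[str] = set()
    for p in perms:
        result |= EXPANDS[p]
    return result


def apply_rules(granted: Set[str], user: str, rules: List[Rule],
                order: str = "allowFirst") -> Set[str]:
    """Apply one <access> block's rules to `granted` for `user`; later passes win."""
    kinds = ["allow", "deny"] if order == "allowFirst" else ["deny", "allow"]
    granted = set(granted)
    for kind in kinds:
        for rule_kind, principal, perms in rules:
            if rule_kind != kind or principal not in (user, "public"):
                continue
            if kind == "allow":
                granted |= expand(perms)
            else:
                granted -= expand(perms)
    return granted


ALICE = "uid=alice,o=NASA,dc=ecoinformatics,dc=org"

# Package-level default, then the entity-level exception from the example.
package_rules: List[Rule] = [("allow", ALICE, ("read", "write"))]
entity_rules: List[Rule] = [("deny", ALICE, ("write",))]

metadata_rights = apply_rules(set(), ALICE, package_rules)
entity_rights = apply_rules(metadata_rights, ALICE, entity_rules)

print(sorted(metadata_rights))  # ['read', 'write']
print(sorted(entity_rights))    # ['read']
```

Neither result contains "changePermission", matching the statement that Alice cannot change the access rules.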

Chapter 4. Module Descriptions (Normative)

4.1. lter-project

Normative technical docs for lter-project

4.2. eml-access

Normative technical docs for eml-access

4.3. eml-coverage

Normative technical docs for eml-coverage

4.4. eml-literature

Normative technical docs for eml-literature

4.5. eml-party

Normative technical docs for eml-party

4.6. eml-physical

Normative technical docs for eml-physical

4.7. eml-resource

Normative technical docs for eml-resource

4.8. eml-text

Normative technical docs for eml-text

4.9. eml-unitTypeDefinitions

Normative technical docs for eml-unitTypeDefinitions

Index

A
associatedMaterial-lter-project
associatedProject-lter-project
access-eml-access
allow-eml-access
alternativeTimeScale-eml-coverage
altitudeMaximum-eml-coverage
altitudeMinimum-eml-coverage
altitudeUnits-eml-coverage
article-eml-literature
audioVisual-eml-literature
address-eml-party
administrativeArea-eml-party
access-eml-physical
attributeOrientation-eml-physical
authentication-eml-physical
abstract-eml-resource
additionalInfo-eml-resource
alternateIdentifier-eml-resource
associatedParty-eml-resource
B
beginDate-eml-coverage
boundingCoordinates-eml-coverage
boundingAltitudes-eml-coverage
book-eml-literature
bookTitle-eml-literature
binaryRasterFormat-eml-physical
byteorder-eml-physical
bandrowbytes-eml-physical
bandgapbytes-eml-physical
C
citation-lter-project
coverage-lter-project
categoryTitle-lter-project
categoryValue-lter-project
calendarDate-eml-coverage
classificationSystem-eml-coverage
classificationSystemCitation-eml-coverage
classificationSystemModifications-eml-coverage
commonName-eml-coverage
citation-eml-literature
contact-eml-literature
chapter-eml-literature
conferenceProceedings-eml-literature
chapterNumber-eml-literature
conferenceName-eml-literature
conferenceDate-eml-literature
conferenceLocation-eml-literature
communicationType-eml-literature
city-eml-party
country-eml-party
compressionMethod-eml-physical
characterEncoding-eml-physical
collapseDelimiters-eml-physical
complex-eml-physical
citation-eml-physical
connection-eml-physical
creator-eml-resource
coverage-eml-resource
connection-eml-resource
connectionDefinition-eml-resource
citetitle-eml-text
D
descriptor-lter-project
descriptorValue-lter-project
designDescription-lter-project
description-lter-project
distribution-lter-project
deny-eml-access
datasetGPolygon-eml-coverage
datasetGPolygonOuterGRing-eml-coverage
datasetGPolygonExclusionGRing-eml-coverage
degree-eml-literature
deliveryPoint-eml-party
dataFormat-eml-physical
distribution-eml-physical
distribution-eml-resource
description-eml-resource
definition-eml-resource
defaultValue-eml-resource
E
endDate-eml-coverage
eastBoundingCoordinate-eml-coverage
editedBook-eml-literature
edition-eml-literature
editor-eml-literature
electronicMailAddress-eml-party
encodingMethod-eml-physical
externallyDefinedFormat-eml-physical
emphasis-eml-text
F
funding-lter-project
fieldDelimiter-eml-physical
fieldWidth-eml-physical
fieldStartColumn-eml-physical
formatName-eml-physical
formatVersion-eml-physical
G
geographicCoverage-eml-coverage
geographicDescription-eml-coverage
gRingPoint-eml-coverage
gRing-eml-coverage
gRingLatitude-eml-coverage
gRingLongitude-eml-coverage
generalTaxonomicCoverage-eml-coverage
generic-eml-literature
geographicCoverage-eml-literature
givenName-eml-party
H
I
identificationReference-eml-coverage
identifierName-eml-coverage
issue-eml-literature
institution-eml-literature
individualName-eml-party
inline-eml-physical
intellectualRights-eml-resource
inline-eml-resource
itemizedlist-eml-text
J
journal-eml-literature
K
keywordSet-eml-resource
keyword-eml-resource
keywordThesaurus-eml-resource
L
literalCharacter-eml-physical
lineNumber-eml-physical
layout-eml-physical
language-eml-resource
literalLayout-eml-text
listitem-eml-text
M
manuscript-eml-literature
map-eml-literature
maxRecordLength-eml-physical
multiBand-eml-physical
metadataProvider-eml-resource
mediumName-eml-resource
mediumDensity-eml-resource
mediumDensityUnits-eml-resource
mediumVolume-eml-resource
mediumFormat-eml-resource
mediumNote-eml-resource
N
northBoundingCoordinate-eml-coverage
numberOfVolumes-eml-literature
numHeaderLines-eml-physical
numFooterLines-eml-physical
numPhysicalLinesPerRecord-eml-physical
nbands-eml-physical
nbits-eml-physical
name-eml-resource
O
ongoing-eml-coverage
originator-eml-coverage
originalPublication-eml-literature
organizationName-eml-party
onlineUrl-eml-party
objectName-eml-physical
online-eml-physical
offline-eml-physical
onlineDescription-eml-physical
online-eml-resource
offline-eml-resource
onlineDescription-eml-resource
orderedlist-eml-text
P
permissions-lter-project
permissionCategory-lter-project
principal-eml-access
permission-eml-access
personalCommunication-eml-literature
presentation-eml-literature
pageRange-eml-literature
publisher-eml-literature
publicationPlace-eml-literature
performer-eml-literature
positionName-eml-party
phone-eml-party
postalCode-eml-party
party-eml-party
physical-eml-physical
physicalLineDelimiter-eml-physical
pubDate-eml-resource
parameterDefinition-eml-resource
parameter-eml-resource
para-eml-text
Q
quoteCharacter-eml-physical
R
researchProject-lter-project
reporting-lter-project
reportSection-lter-project
rangeOfDates-eml-coverage
repository-eml-coverage
report-eml-literature
reportNumber-eml-literature
recipient-eml-literature
referenceType-eml-literature
reprintEdition-eml-literature
reviewedItem-eml-literature
recordDelimiter-eml-physical
rowColumnOrientation-eml-physical
role-eml-resource
references-eml-resource
S
studyAreaDescription-lter-project
sectionTitle-lter-project
sectionValue-lter-project
singleDateTime-eml-coverage
southBoundingCoordinate-eml-coverage
specimen-eml-coverage
scale-eml-literature
salutation-eml-party
surName-eml-party
size-eml-physical
simpleDelimited-eml-physical
skipbytes-eml-physical
shortName-eml-resource
series-eml-resource
schemeName-eml-resource
section-eml-text
subscript-eml-text
superscript-eml-text
T
temporalCoverage-lter-project
temporalCoverage-eml-coverage
taxonomicCoverage-eml-coverage
time-eml-coverage
timeScaleName-eml-coverage
timeScaleAgeEstimate-eml-coverage
timeScaleAgeUncertainty-eml-coverage
timeScaleAgeExplanation-eml-coverage
timeScaleCitation-eml-coverage
taxonomicSystem-eml-coverage
taxonomicProcedures-eml-coverage
taxonomicCompleteness-eml-coverage
taxonomicClassification-eml-coverage
taxonRankName-eml-coverage
taxonRankValue-eml-coverage
thesis-eml-literature
totalPages-eml-literature
totalFigures-eml-literature
totalTables-eml-literature
textFormat-eml-physical
textFixed-eml-physical
textDelimited-eml-physical
totalrowbytes-eml-physical
title-eml-resource
temporalCoverage-eml-resource
text-eml-text
title-eml-text
U
userId-eml-party
url-eml-physical
url-eml-resource
ulink-eml-text
V
vouchers-eml-coverage
volume-eml-literature
value-eml-resource
W
westBoundingCoordinate-eml-coverage
X
Y
Z