eml-physical Documentation

Module Documentation: eml-physical

The eml-physical module - Physical file format

The eml-physical module describes the external and internal physical characteristics of a data object as well as the information required for its distribution. External physical characteristics include the filename, size, compression, encoding methods, and authentication of a file or byte stream. Internal physical characteristics describe the format of the data object. Named binary or otherwise proprietary formats can be cited (e.g., Microsoft Access 2000), and text formats can be precisely described (e.g., ASCII text delimited with commas). For text formats, the module also includes the information needed to parse the data object and extract the entity and its attributes. Distribution information describes how to retrieve the data object. The retrieval information can be either online (e.g., a URL or other connection information) or offline (e.g., a data object residing on an archival tape).

The eml-physical module, like other modules, may be "referenced" via the <references> tag. This allows a physical document to be described once, and then used as a reference in other locations within the EML document via its ID.

Module details

Recommended Usage:	Any data object that is being described by EML needs this information so that the entities and attributes that reside within the data object can be extracted.
Stand-alone:	yes
Imports:	eml-documentation, eml-literature, eml-resource, eml-access
Imported By:

Element Definitions:

physical

This element has no default value.

Content of this field:

Description of this field:

Type: PhysicalType

The content model for physical is a CHOICE between "references" and all of the elements that let you describe the internal/external characteristics and distribution of a data object (e.g., dataObject, dataFormat, distribution). A physical element can contain a reference to a physical element defined elsewhere. Using a reference means that the referenced physical is identical, not just in name but in its complete description.

objectName

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

The name of the data object. This is possibly distinct from the entity name in that one physical object can contain multiple entities, even though that is not a recommended practice. The objectName is often the name of a file in a file system or of a file that is accessible on the network.
Example(s):
rainfall-sev-2002-10.txt

size

This element has no default value.

Content of this field:

Description of this field:

Attributes:	Use:	Default Value:
unit	optional	byte

This element gives the physical size of the entity. The size is expressed in bytes unless the unit attribute is provided to specify a different unit.
Example(s):
134

authentication

This element has no default value.

Content of this field:

Description of this field:

Attributes:	Use:	Default Value:
method	optional

This element describes authentication procedures or techniques, typically by giving a checksum value for the object. The method used to compute the authentication value (e.g., MD5) is listed in the method attribute.
Example(s):
f5b2177ea03aea73de12da81f896fe40
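
A checksum of this kind can be verified after download. The following is a minimal sketch, assuming the method attribute names an algorithm that Python's hashlib recognises (such as MD5); the function name matches_checksum is hypothetical and not part of EML.

```python
import hashlib

def matches_checksum(data: bytes, expected_hex: str, method: str = "MD5") -> bool:
    """Return True if the digest of `data` under `method` equals `expected_hex`.

    `method` stands in for the value of the method attribute and
    `expected_hex` for the text content of the authentication element.
    """
    digest = hashlib.new(method.lower(), data).hexdigest()
    return digest == expected_hex.strip().lower()

# Illustrative payload; a real check would read the retrieved data object.
payload = b"abc"
print(matches_checksum(payload, hashlib.md5(payload).hexdigest()))
```

Comparing lowercase hexadecimal strings sidesteps differences in how the checksum was capitalised in the metadata.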

compressionMethod

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

This element lists a compression method used to compress the object, such as zip, compress, etc. Compression and encoding methods must be listed in the order in which they were applied, so that decompression and decoding can proceed in the reverse order of the listing. For example, if a file is compressed using zip and then encoded using MIME base64, the compression method would be listed first and the encoding method second.
Example(s):
zip
gzip
compress

encodingMethod

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

This element lists an encoding method used to encode the object, such as base64, BinHex, etc. Compression and encoding methods must be listed in the order in which they were applied, so that decompression and decoding can proceed in the reverse order of the listing. For example, if a file is compressed using zip and then encoded using MIME base64, the compression method would be listed first and the encoding method second.
Example(s):
base64
uuencode
binhex
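
The ordering rule can be illustrated with a short sketch. It assumes the listed method names have been collected into a Python list in document order, and it uses gzip rather than zip because gzip compresses a single byte stream; the DECODERS table and the recover function are hypothetical helpers, not part of EML.

```python
import base64
import gzip

# Inverse operation for each method name that this sketch understands.
DECODERS = {
    "gzip": gzip.decompress,
    "base64": base64.b64decode,
}

def recover(data: bytes, applied_methods: list[str]) -> bytes:
    """Undo the listed methods, starting with the one applied last."""
    for method in reversed(applied_methods):
        data = DECODERS[method](data)
    return data

original = b"DATE,PLOT\n2002-01-15,hfr5\n"
stored = base64.b64encode(gzip.compress(original))  # compressed first, then encoded
restored = recover(stored, ["gzip", "base64"])       # listed in the order applied
```

Here the base64 layer is removed first and the gzip layer second, which is the reverse of the listing.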

characterEncoding

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

This element contains the name of the character encoding. This is typically ASCII or UTF-8, or one of the other common encodings.
Example(s):
UTF-8

dataFormat

This element has no default value.

Content of this field:

Description of this field:

Elements:	Use:	How many:
A choice of (
textFormat	required
OR
externallyDefinedFormat	required
OR
binaryRasterFormat	required
)

This element is the parent of a CHOICE among three possible formats (textFormat, externallyDefinedFormat, or binaryRasterFormat) that describe the internal physical characteristics of the data object. Using this information, the user should be able to parse the physical object to extract the entity and its attributes. Note that this is the format of the physical object itself.

textFormat

This element has no default value.

Content of this field:

Description of this field:

Elements:	Use:	How many:
A sequence of (
numHeaderLines	optional
numFooterLines	optional
recordDelimiter	optional	unbounded
physicalLineDelimiter	optional	unbounded
numPhysicalLinesPerRecord	optional
maxRecordLength	optional
attributeOrientation	required
A choice of (
simpleDelimited	required
OR
complex	required
)
)

Description of a text formatted object. The description includes detailed parsing instructions for extracting attributes from the bytestream for simple delimited file formats (e.g., CSV), fixed format files that use fixed columns for attribute locations, and mixtures of the two. It also supports records that span multiple lines.

numHeaderLines

This element has no default value.

Content of this field:

Description of this field:

Type: xs:int

Number of header lines preceding data. Lines are determined by the physicalLineDelimiter, or if it is absent, by the recordDelimiter. This value indicates the number of header lines that should be skipped before starting to parse the data.
Example(s):
4

numFooterLines

This element has no default value.

Content of this field:

Description of this field:

Type: xs:int

Number of footer lines following data. Lines are determined by the physicalLineDelimiter, or if it is absent, by the recordDelimiter. This value indicates the number of footer lines that should be skipped after parsing the data. If this value is omitted, parsers should assume the data continues to the end of the data stream.
Example(s):
4
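
As a minimal sketch of how numHeaderLines and numFooterLines might be applied, the function below splits a text stream on a single delimiter and drops the indicated lines. The function name data_lines and its parameters are hypothetical, and it assumes the delimiter has already been converted to the actual character(s).

```python
def data_lines(text: str, delimiter: str = "\n",
               num_header: int = 0, num_footer: int = 0) -> list[str]:
    """Return only the lines that hold data, skipping header and footer lines."""
    lines = text.split(delimiter)
    if lines and lines[-1] == "":
        lines.pop()  # a trailing delimiter leaves one empty piece behind
    end = max(len(lines) - num_footer, num_header)
    return lines[num_header:end]

sample = "Rainfall data\nunits: mm\n1,2\n3,4\nend of file\n"
print(data_lines(sample, "\n", num_header=2, num_footer=1))
```

With both counts left at zero, every line is treated as data, which matches the rule that omitted footer information means the data run to the end of the stream.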

recordDelimiter

This element has no default value.

Content of this field:

Description of this field:

Type: xs:string

This element specifies the record delimiter character when the format is text. The record delimiter is usually a linefeed (\n) on UNIX, a carriage return (\r) on MacOS, or both (\r\n) on Windows/DOS. Multiline records are usually delimited with two line ending characters, for example on UNIX it would be two linefeed characters (\n\n). As record delimiters are often non-printing characters, one can use the special values "\n" to represent a linefeed (ASCII 0x0a) and "\r" to represent a carriage return (ASCII 0x0d). Alternatively, one can use the hex value to represent character values (e.g., 0x0a).
Example(s):
\n\r
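
A processor has to turn these textual representations into actual characters before splitting the stream. The sketch below is one possible approach, assuming hex values are always written as "0x" followed by exactly two hex digits and that "\t" is accepted alongside "\n" and "\r"; the function name decode_delimiter is hypothetical.

```python
import re

# Matches either a two-digit hex value (0x0a) or a backslash escape (\n, \r, \t).
_TOKEN = re.compile(r"0x([0-9A-Fa-f]{2})|\\([nrt])")
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t"}

def decode_delimiter(spec: str) -> str:
    """Convert a delimiter as written in the metadata into real characters."""
    def replace(match: re.Match) -> str:
        hex_digits, escape = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        return _ESCAPES[escape]
    return _TOKEN.sub(replace, spec)

print(repr(decode_delimiter(r"\n")))
print(repr(decode_delimiter("0x0d0x0a")))
```

Any other text, such as a comma, passes through unchanged. A delimiter that literally consists of the characters "0x0a" cannot be expressed under this assumption.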

physicalLineDelimiter

This element has no default value.

Content of this field:

Description of this field:

Type: xs:string

This element specifies the physical line delimiter character when the format is text. The line delimiter is usually a linefeed (\n) on UNIX, a carriage return (\r) on MacOS, or both (\r\n) on Windows/DOS. Multiline records are usually delimited with two line ending characters, for example on UNIX it would be two linefeed characters (\n\n). As line delimiters are often non-printing characters, one can use the special values "\n" to represent a linefeed (ASCII 0x0a) and "\r" to represent a carriage return (ASCII 0x0d). Alternatively, one can use the hex value to represent character values (e.g., 0x0a). If this value is not provided, processors should assume that the physical line delimiter is the same as the record delimiter.
Example(s):
\n\r

numPhysicalLinesPerRecord

This element has no default value.

Content of this field:

Description of this field:

Type: xs:unsignedInt

A single logical data record may be written over several physical lines in a file, with no special marker to indicate the end of a record. In such cases, it is necessary to know the number of lines per record in order to correctly read them. If this value is not provided, processors should assume that records are wholly contained on one physical line. If the value is greater than 1, then processors should examine the lineNumber field for each attribute to determine which line of the record contains the information.
Example(s):
3
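
The grouping of physical lines into logical records can be sketched as follows. This assumes the physical lines have already been split out of the stream; the function name group_records is hypothetical.

```python
def group_records(physical_lines: list[str],
                  lines_per_record: int = 1) -> list[list[str]]:
    """Group consecutive physical lines into logical records.

    The default of 1 mirrors the rule that a record is assumed to sit on a
    single physical line when numPhysicalLinesPerRecord is not provided.
    """
    if lines_per_record < 1:
        raise ValueError("lines_per_record must be at least 1")
    return [physical_lines[i:i + lines_per_record]
            for i in range(0, len(physical_lines), lines_per_record)]

records = group_records(["a1", "a2", "a3", "b1", "b2", "b3"], lines_per_record=3)
print(records)
```

Within each record, the line at index lineNumber - 1 would then hold a given attribute, assuming lineNumber counts from 1.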

maxRecordLength

This element has no default value.

Content of this field:

Description of this field:

Type: xs:unsignedLong

The maximum number of characters in any record in the physical file. For delimited files, the record length varies and this is not particularly useful. However, for fixed format files that do not contain record delimiters, this field is critical to tell processors when one record stops and another begins.
Example(s):
597
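
For a fixed format stream without record delimiters, record boundaries follow from the record length alone. The sketch below assumes every record is padded to exactly maxRecordLength characters, which the schema does not guarantee; the function name split_fixed_records is hypothetical.

```python
def split_fixed_records(stream: str, record_length: int) -> list[str]:
    """Cut an undelimited stream into records of `record_length` characters."""
    if record_length < 1:
        raise ValueError("record_length must be at least 1")
    return [stream[i:i + record_length]
            for i in range(0, len(stream), record_length)]

print(split_fixed_records("2002-01-15hfr52002-01-16hfr6", 14))
```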

attributeOrientation

This element has no default value.

Content of this field:

Description of this field:

Specifies whether the attributes described in the physical stream are found in columns or rows. The valid values are column or row. If set to 'column', then the attributes are in columns. If set to 'row', then the attributes are in rows. Row orientation is rare, but some systems such as SPlus and R utilize it. For example, some data with column orientation:

DATE PLOT SPECIES
2002-01-15 hfr5 acer rubrum
2002-01-15 hfr5 acer xxxx

The same data in a table with row orientation:

DATE 2002-01-15
PLOT hfr5
SPECIES acer rubrum acer xxxx
Example(s):
column
row

Derived from: xs:string (by xs:restriction)

Allowed values:

column
row
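
The two orientations can be read into the same in-memory structure. The following sketch assumes the table has already been split into fields and that attribute names appear in the first line (column orientation) or the first field of each line (row orientation); the function name attributes_by_name is hypothetical.

```python
def attributes_by_name(table: list[list[str]],
                       orientation: str) -> dict[str, list[str]]:
    """Map each attribute name to its list of values for either orientation."""
    if orientation == "column":
        header, *records = table
        return {name: [record[i] for record in records]
                for i, name in enumerate(header)}
    if orientation == "row":
        return {row[0]: row[1:] for row in table}
    raise ValueError("orientation must be 'column' or 'row'")

by_column = [
    ["DATE", "PLOT", "SPECIES"],
    ["2002-01-15", "hfr5", "acer rubrum"],
    ["2002-01-15", "hfr5", "acer xxxx"],
]
by_row = [
    ["DATE", "2002-01-15", "2002-01-15"],
    ["PLOT", "hfr5", "hfr5"],
    ["SPECIES", "acer rubrum", "acer xxxx"],
]
print(attributes_by_name(by_column, "column") == attributes_by_name(by_row, "row"))
```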

simpleDelimited

This element has no default value.

Content of this field:

Description of this field:

Elements:	Use:	How many:
A sequence of (
fieldDelimiter	required	unbounded
collapseDelimiters	optional
quoteCharacter	optional	unbounded
literalCharacter	optional	unbounded
)

A simple delimited format that uses one of a series of delimiters to indicate the ends of fields in the data stream. More complex formats such as fixed format or mixed delimited and fixed formats can be described using the "complex" element.

fieldDelimiter

This element has no default value.

Content of this field:

Description of this field:

Type: xs:string

This element specifies a character to be used in the object for indicating the ending column for an attribute. The delimiter character itself is not part of the attribute value, but rather is present in the column following the last character of the value. Typical delimiter characters include commas, tabs, spaces, and semicolons. The only time the fieldDelimiter character is not interpreted as a delimiter is if it is contained in a quoted string (see quoteCharacter) or is immediately preceded by a literalCharacter. Non-printable delimiter characters can be provided as their hex values, and tab characters can be given as the string "\t". Processors should assume that the field starts in the column following the previous field if the previous field was fixed, or in the column following the delimiter from the previous field if the previous field was delimited.
Example(s):
,
\t
0x09
0x20

collapseDelimiters

This element has no default value.

Content of this field:

Description of this field:

The collapseDelimiters element specifies whether sequential delimiters should be treated as a single delimiter or multiple delimiters. An example is when a space delimiter is used; often there may be several repeated spaces that should be treated as a single delimiter, but not always. The valid values are yes or no. If it is set to yes, then consecutive delimiters will be collapsed to one. If it is set to no or is absent, then consecutive delimiters will be treated as separate delimiters; this is the default behaviour.
Example(s):
yes
no

Derived from: xs:string (by xs:restriction)

Allowed values:

yes
no

quoteCharacter

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

This element specifies a character to be used in the object for quoting values so that field delimiters can be used within the value. This basically allows delimiter "escaping". The quoteCharacter is typically a " or '. When a processor encounters a quote character, it should not interpret any following characters as a delimiter until a matching quote character has been encountered (i.e., quotes come in pairs). It is an error to not provide a closing quote before the record ends. Non-printable quote characters can be provided as their hex values.
Example(s):
"
'

literalCharacter

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

This element specifies a character to be used for escaping special character values so that they are treated as literal values. This allows "escaping" for special characters like quotes, commas, and spaces when they are intended to be used in an attribute value rather than being intended as a delimiter. The literalCharacter is typically a \.
Example(s):
\
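
The interaction of fieldDelimiter, collapseDelimiters, quoteCharacter, and literalCharacter can be sketched with a small tokenizer. This is an illustration under simplifying assumptions (one single-character delimiter, one quote character, one literal character, and quote characters dropped from the returned values), not a reference implementation; the function name split_fields is hypothetical.

```python
def split_fields(record, delimiter=",", quote=None, literal=None, collapse=False):
    """Split one record into field values, honouring quoting and escaping."""
    fields, current = [], []
    in_quotes = False        # inside a quoted value, delimiters are ordinary text
    take_literally = False   # the previous character was the literal character
    after_delimiter = False  # the previous character acted as a delimiter
    for ch in record:
        if take_literally:
            current.append(ch)
            take_literally = False
            after_delimiter = False
        elif ch == literal:
            take_literally = True
        elif ch == quote:
            in_quotes = not in_quotes
            after_delimiter = False
        elif ch == delimiter and not in_quotes:
            if not (collapse and after_delimiter):
                fields.append("".join(current))
                current = []
            after_delimiter = True
        else:
            current.append(ch)
            after_delimiter = False
    if in_quotes:
        raise ValueError("record ended inside a quoted value")
    fields.append("".join(current))
    return fields

print(split_fields('a,"b,c",d', quote='"'))
print(split_fields("1  2   3", delimiter=" ", collapse=True))
```

With collapse left at its default, two adjacent delimiters produce an empty field between them, which corresponds to the documented behaviour when collapseDelimiters is no or absent.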

complex

This element has no default value.

Content of this field:

Description of this field:

Elements:	Use:	How many:
A choice of (
textFixed	required
OR
textDelimited	required
)

A complex text format that can describe delimited fields, fixed width fields, and mixtures of the two. This supports multiline records (where one record is distributed across multiple physical lines). When using the complex format, the number of textFixed and textDelimited elements should exactly equal the number of attributes that have been described for the entity, and the order of the textFixed and textDelimited elements should correspond to the order of the attributes as described in the entity. Thus, for a delimited file with fourteen attributes, one should provide exactly fourteen textDelimited elements.

textFixed

This element has no default value.

Content of this field:

Description of this field:

Elements:	Use:	How many:
A sequence of (
fieldWidth	required
lineNumber	optional
fieldStartColumn	optional
)

Describes the physical format of data sequences that use a fixed number of characters in a specified position in the stream to locate attribute values. This method is common in sensor-derived data and in legacy database systems. To parse it, one must know the number of characters for each attribute and the starting column and line to begin reading the value.

fieldWidth

This element has no default value.

Content of this field:

Description of this field:

Type: xs:unsignedLong

Fixed width fields have a set length, thus the end of the field can always be determined by adding the fieldWidth to the starting column number.
Example(s):
7

lineNumber

This element has no default value.

Content of this field:

Description of this field:

Type: xs:unsignedLong

A single logical data record may be written over several physical lines in a file, with no special marker to indicate the end of a record. In such cases, the relative location of a data field must be indicated by both relative row and column number. The lineNumber should never be greater than the number of physical lines per record.
Example(s):
3

fieldStartColumn

This element has no default value.

Content of this field:

Description of this field:

Type: xs:long

Fixed width fields have a set length, thus the end of the field can always be determined by adding the fieldWidth to the starting column number. If the starting column is not provided, processors should assume that the field starts in the column following the previous field if the previous field was fixed, or in the column following the delimiter from the previous field if the previous field was delimited.
Example(s):
58
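
The rule for locating fixed width fields can be sketched as below. It assumes column numbers count from 1, which this documentation does not state, and it represents each textFixed description as a plain dictionary; the function name extract_fixed_fields is hypothetical.

```python
def extract_fixed_fields(line: str, fields: list[dict]) -> list[str]:
    """Slice fixed width values out of one physical line.

    Each entry in `fields` holds 'fieldWidth' and, optionally,
    'fieldStartColumn' (assumed here to be 1-based).
    """
    values = []
    next_column = 1  # column following the previous field
    for spec in fields:
        start = spec.get("fieldStartColumn", next_column)
        end = start + spec["fieldWidth"]  # first column after this field
        values.append(line[start - 1:end - 1])
        next_column = end
    return values

line = "2002-01-15hfr5 acer rubrum"
layout = [
    {"fieldWidth": 10},
    {"fieldWidth": 4},
    {"fieldWidth": 11, "fieldStartColumn": 16},
]
print(extract_fixed_fields(line, layout))
```

The second field has no fieldStartColumn, so it is read from the column immediately after the first field; the third field states its start explicitly to skip the space in column 15.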

textDelimited

This element has no default value.

Content of this field:

Description of this field:

Elements:	Use:	How many:
A sequence of (
fieldDelimiter	required
collapseDelimiters	optional
lineNumber	optional
quoteCharacter	optional	unbounded
literalCharacter	optional	unbounded
)

Describes the physical format of data sequences that use delimiters in the stream to locate attribute values. This method is common in data exported from spreadsheets and database systems. To parse it, one must know the character that indicates the end of each attribute and the line to begin reading the value.

fieldDelimiter

This element has no default value.

Content of this field:

Description of this field:

Type: xs:string

This element specifies a character to be used in the object for indicating the ending column for an attribute. The delimiter character itself is not part of the attribute value, but rather is present in the column following the last character of the value. Typical delimiter characters include commas, tabs, spaces, and semicolons. The only time the fieldDelimiter character is not interpreted as a delimiter is if it is contained in a quoted string (see quoteCharacter) or is immediately preceded by a literalCharacter. Non-printable delimiter characters can be provided as their hex values, and tab characters can be given as the string "\t". Processors should assume that the field starts in the column following the previous field if the previous field was fixed, or in the column following the delimiter from the previous field if the previous field was delimited.
Example(s):
,
\t
0x09
0x20

collapseDelimiters

This element has no default value.

Content of this field:

Description of this field:

The collapseDelimiters element specifies whether sequential delimiters should be treated as a single delimiter or multiple delimiters. An example is when a space delimiter is used; often there may be several repeated spaces that should be treated as a single delimiter, but not always. The valid values are yes or no. If it is set to yes, then consecutive delimiters will be collapsed to one. If it is set to no or is absent, then consecutive delimiters will be treated as separate delimiters; this is the default behaviour.
Example(s):
yes
no

Derived from: xs:string (by xs:restriction)

Allowed values:

yes
no

lineNumber

This element has no default value.

Content of this field:

Description of this field:

Type: xs:unsignedLong

A single logical data record may be written over several physical lines in a file, with no special marker to indicate the end of a record. In such cases, the relative location of a data field must be indicated by both relative row and column number. The lineNumber should never be greater than the number of physical lines per record. When parsing the first field on a physical line as a delimited field, processors should assume that the field data starts in the first column. Otherwise, follow the rules indicated under fieldDelimiter.
Example(s):
3

quoteCharacter

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

This element specifies a character to be used in the object for quoting values so that field delimiters can be used within the value. This basically allows delimiter "escaping". The quoteCharacter is typically a " or '. When a processor encounters a quote character, it should not interpret any following characters as a delimiter until a matching quote character has been encountered (i.e., quotes come in pairs). It is an error to not provide a closing quote before the record ends. Non-printable quote characters can be provided as their hex values.
Example(s):
"
'

literalCharacter

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

This element specifies a character to be used for escaping special character values so that they are treated as literal values. This allows "escaping" for special characters like quotes, commas, and spaces when they are intended to be used in an attribute value rather than being intended as a delimiter. The literalCharacter is typically a \.
Example(s):
\

externallyDefinedFormat

This element has no default value.

Content of this field:

Description of this field:

Elements:	Use:	How many:
A sequence of (
formatName	required
formatVersion	optional
citation	optional
)

Information about a non-text or proprietary formatted object. The description names the format explicitly, but assumes a processor implicitly knows how to parse that format to extract the data. A format version can be included. This is mainly used for proprietary formats, including binary files like Microsoft Excel and text formats like ESRI's ArcInfo export format. This is not a recommended way to permanently archive data because the software to parse the format is unlikely to be available over extended periods, but is included to allow for commonly used physical formats.

formatName

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

Name of the format of the data object.
Example(s):
Microsoft Excel

formatVersion

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

Version of the format of the data object.
Example(s):
2000 (9.0.2720)

citation

This element has no default value.

Content of this field:

Description of this field:

Type: cit:CitationType

Citation providing more detail about the physical format, including parsing information or information about the software required for reading the object.

binaryRasterFormat

This element has no default value.

Content of this field:

Description of this field:

Elements:	Use:	How many:
A sequence of (
rowColumnOrientation	required
multiBand	optional
nbits	required
byteorder	required
skipbytes	optional
bandrowbytes	optional
totalrowbytes	optional
bandgapbytes	optional
)

The binaryRasterFormat element is a container for various parameters used to describe the contents of binary raster image files. It is based on a white paper on the ESRI site that describes the header information used for BIP and BIL files ("Extendable Image Formats for ArcView GIS 3.1 and 3.2").

rowColumnOrientation

This element has no default value.

Content of this field:

Description of this field:

Specifies whether the data should be read across rows or down columns. The valid values are column or row. If set to 'column', then the data are read down columns. If set to 'row', then the data are read across rows.
Example(s):
column
row

Derived from: xs:string (by xs:restriction)

Allowed values:

column
row

multiBand

This element has no default value.

Content of this field:

Description of this field:

Elements:	Use:	How many:
A sequence of (
nbands	required
layout	required
)

Information needed to properly interpret a multiband image.

nbands

This element has no default value.

Content of this field:

Description of this field:

Type: xs:int

The number of spectral bands in the image. Must be greater than 1.
Example(s):
2

layout

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

The organization of the bands in the image file. Acceptable values are bil - Band interleaved by line. bip - Band interleaved by pixel. bsq - Band sequential.
Example(s):
bil
bip
bsq
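
The three layouts differ only in the order in which row, column, and band vary. The sketch below computes the byte offset of one sample under the assumption that there is no header and no padding (that is, skipbytes and bandgapbytes are zero and rows carry no extra bytes). The parameters nrows, ncols, and bytes_per_sample, and the function name sample_offset, are hypothetical and are not elements of this module.

```python
def sample_offset(layout: str, row: int, col: int, band: int,
                  nrows: int, ncols: int, nbands: int,
                  bytes_per_sample: int = 1) -> int:
    """Byte offset of a sample, using zero-based row, col, and band indexes."""
    if layout == "bsq":    # band sequential: one whole band after another
        index = (band * nrows + row) * ncols + col
    elif layout == "bil":  # interleaved by line: each row holds one line per band
        index = (row * nbands + band) * ncols + col
    elif layout == "bip":  # interleaved by pixel: all bands of a pixel are adjacent
        index = (row * ncols + col) * nbands + band
    else:
        raise ValueError("layout must be 'bil', 'bip', or 'bsq'")
    return index * bytes_per_sample

for name in ("bsq", "bil", "bip"):
    print(name, sample_offset(name, row=0, col=1, band=1, nrows=2, ncols=3, nbands=2))
```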

nbits

This element has no default value.

Content of this field:

Description of this field:

Type: xs:int

The number of bits per pixel per band. Acceptable values are typically 1, 4, 8, 16, and 32. The default value is eight bits per pixel per band. For a true color image with three bands (R, G, B) stored using eight bits for each pixel in each band, nbits equals eight and nbands equals three, for a total of twenty-four bits per pixel.
Example(s):
8

byteorder

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

The byte order in which values are stored. The byte order is important for sixteen-bit and higher images that have two or more bytes per pixel. Acceptable values are little-endian (common on Intel systems like PCs) and big-endian (common on Motorola platforms).
Example(s):
little-endian
big-endian
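
The effect of byte order on multi-byte samples can be shown with a short sketch. It assumes unsigned integer samples and an nbits value that is a multiple of eight; the function name decode_samples is hypothetical.

```python
def decode_samples(raw: bytes, nbits: int, byteorder: str) -> list[int]:
    """Interpret `raw` as consecutive unsigned samples of `nbits` bits each."""
    if nbits % 8 != 0:
        raise ValueError("this sketch only handles whole-byte sample sizes")
    width = nbits // 8
    order = {"little-endian": "little", "big-endian": "big"}[byteorder]
    return [int.from_bytes(raw[i:i + width], order)
            for i in range(0, len(raw), width)]

raw = b"\x01\x00\x00\x02"
print(decode_samples(raw, 16, "little-endian"))
print(decode_samples(raw, 16, "big-endian"))
```

The same four bytes yield different sample values under the two byte orders, whereas eight-bit samples are unaffected.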

skipbytes

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

The number of bytes of data in the image file to skip in order to reach the start of the image data. This keyword allows you to bypass any existing image header information in the file. The default value is zero bytes.
Example(s):
0

bandrowbytes

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

The number of bytes per band per row. This must be an integer. This keyword is used only with BIL files when there are extra bits at the end of each band within a row that must be skipped.
Example(s):
3

totalrowbytes

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

The total number of bytes of data per row. Use totalrowbytes when there are extra trailing bits at the end of each row.
Example(s):
8

bandgapbytes

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

The number of bytes between bands in a BSQ format image. The default is zero.
Example(s):
1

distribution

This element has no default value.

Content of this field:

Description of this field:

Type: PhysicalDistributionType

This element provides information on how the resource is distributed. Connections to online systems can be described as URLs or as a list of connection parameters. Please see the Type definition for complete information.

online

This element has no default value.

Content of this field:

Description of this field:

Type: PhysicalOnlineType

Information for a resource that is distributed online. Please see the Type definition for complete information.

offline

This element has no default value.

Content of this field:

Description of this field:

Type: res:OfflineType

Information for a resource that is distributed offline. Please see the Type definition for complete information.

inline

This element has no default value.

Content of this field:

Description of this field:

Type: res:InlineType

Information for a resource that is distributed inline, i.e., along with the metadata. Please see the Type definition for complete information.

access

This element has no default value.

Content of this field:

Description of this field:

Type: acc:AccessType

When this element occurs in a distribution module, it controls access only to the resource being described by the same distribution parent. Please see the Type definition for complete information on constructing an access tree.

onlineDescription

This element has no default value.

Content of this field:

Description of this field:

Type: res:NonEmptyStringType

The onlineDescription element can hold a brief description of the content of the online element's online|offline|inline child. This description element could supply content for an html anchor tag.

url

This element has no default value.

Content of this field:

Description of this field:

Type: res:UrlType

The URL of the resource that is available online. Please see the Type definition for complete information.

connection

This element has no default value.

Content of this field:

Description of this field:

Type: res:ConnectionType

A connection to a resource that is available online. Please see the Type definition for complete information.

Attribute Definitions:

unit

Use: optional

Default value: byte

This attribute gives the unit of measurement for the size of the entity, and is by default a byte.
Example(s):
byte

method

Type: xs:string

Use: optional

This attribute names the method used to calculate an authentication checksum that can be used to validate a bytestream. Typical checksum methods include MD5 and CRC.
Example(s):
MD5

id

Type: res:IDType

Use: optional

system

Type: res:SystemType

Use: optional

scope

Type: res:ScopeType

Use: optional

Default value: document

id

Type: res:IDType

Use: optional

system

Type: res:SystemType

Use: optional

scope

Type: res:ScopeType

Use: optional

Default value: document

Complex Type Definitions:

PhysicalType

Content of this field:

Description of this field:

Elements:	Use:	How many:
A choice of (
A sequence of (
objectName	required
size	optional
authentication	optional	unbounded
A choice of (
compressionMethod	required
OR
encodingMethod	required
)
characterEncoding	optional
dataFormat	required
distribution	optional	unbounded
)
OR
res:ReferencesGroup
)
Attributes:	Use:	Default Value:
id	optional
system	optional
scope	optional	document

The eml-physical module describes the physical characteristics of a data object and the information required for its distribution. External physical characteristics include the filename, size, compression, encoding methods, and authentication of a file or byte stream. Internal physical characteristics describe the format of the data object. Proprietary formats can be cited (e.g., Microsoft Access 2000), or text formats can be precisely described (e.g., ASCII text delimited with commas). The module includes the information needed to parse the text data object to extract the entity and its attributes. Distribution information describes how to retrieve the data object, either as online (a URL or connection definition), offline (e.g., a data object residing on an archival tape), or inline (i.e., the data are included with the metadata).

Like many other EML elements, a physical Type can contain a reference to another physical element defined elsewhere in the document instead of a description of the resource. Using a reference means that the referenced physical is identical, not just in name but identical in its complete description.
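
Resolving such a reference amounts to looking up the physical element whose id matches. The sketch below uses a deliberately simplified fragment (no namespaces, required children such as dataFormat omitted, and the enclosing dataset and dataTable elements chosen for illustration); it assumes the references element carries the target id as its text. The function name resolve_physical is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, non-validating fragment used only for illustration.
SAMPLE = """
<dataset>
  <dataTable>
    <physical id="phys.1">
      <objectName>rainfall-sev-2002-10.txt</objectName>
    </physical>
  </dataTable>
  <dataTable>
    <physical>
      <references>phys.1</references>
    </physical>
  </dataTable>
</dataset>
"""

def resolve_physical(root: ET.Element, physical: ET.Element) -> ET.Element:
    """Return the physical element that `physical` refers to, or itself."""
    ref = physical.find("references")
    if ref is None:
        return physical
    target_id = (ref.text or "").strip()
    for candidate in root.iter("physical"):
        if candidate.get("id") == target_id:
            return candidate
    raise KeyError(target_id)

root = ET.fromstring(SAMPLE)
described, referencing = list(root.iter("physical"))
print(resolve_physical(root, referencing).findtext("objectName"))
```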

PhysicalDistributionType

Content of this field:

Description of this field:

Elements:	Use:	How many:
A choice of (
A sequence of (
A choice of (
online	required
OR
offline	required
OR
inline	required
)
access	optional
)
OR
res:ReferencesGroup
)
Attributes:	Use:	Default Value:
id	optional
system	optional
scope	optional	document

The PhysicalDistributionType contains the information required for retrieving the resource.

It differs from the res:DistributionType in two ways:

Generally, the PhysicalDistributionType is intended for download, whereas the distribution type at the resource level is intended primarily for information.

The phys:PhysicalDistributionType includes an optional access tree which can be used to override access rules applied at the resource level. Access for the document's included entities can then be managed individually.

Also see individual sub elements for more information.

PhysicalOnlineType

Content of this field:

Description of this field:

Elements:	Use:	How many:
A sequence of (
onlineDescription	optional
A choice of (
url	required
OR
connection	required
)
)

Distribution information for accessing the resource online, represented either as a URL or as the series of named parameters needed to connect. The URL field can contain a simple web address or an entire query string. The connection element allows the components of a complex protocol to be described individually.

The PhysicalOnlineType differs from the res:OnlineType in that this type only allows a connectionDefinition to appear as the child of a connection. In other words, in a PhysicalOnlineType, the connectionDefinition cannot be abstracted, and must be included as part of an actual connection.
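
A consumer of this type has to branch on which child is present. The following sketch does so for a simplified, namespace-free fragment; the URL is a placeholder and the function name describe_online is hypothetical.

```python
import xml.etree.ElementTree as ET

def describe_online(online: ET.Element) -> tuple[str, str]:
    """Return which access route is present and its optional description."""
    description = (online.findtext("onlineDescription") or "").strip()
    if online.find("url") is not None:
        return ("url", description)
    if online.find("connection") is not None:
        return ("connection", description)
    raise ValueError("online must contain either url or connection")

# Illustrative fragment; example.org is a placeholder host.
fragment = ET.fromstring(
    "<online>"
    "<onlineDescription>Monthly rainfall</onlineDescription>"
    "<url>https://example.org/data/rainfall-sev-2002-10.txt</url>"
    "</online>"
)
kind, description = describe_online(fragment)
print(kind, description)
```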

Simple Type Definitions:

Group Definitions:

Web Contact: jones@nceas.ucsb.edu