Module Documentation: eml-physical
The eml-physical module - Physical file format

The eml-physical module describes the external and internal physical characteristics of a data object, as well as the information required for its distribution. Examples of external physical characteristics are the filename, size, compression, encoding methods, and authentication of a file or byte stream. Internal physical characteristics describe the format of the data object. Named binary or otherwise proprietary formats can be cited (e.g., Microsoft Access 2000), and text formats can be precisely described (e.g., ASCII text delimited with commas). For text formats, the module also includes the information needed to parse the data object and extract the entity and its attributes from it. Distribution information describes how to retrieve the data object, either online (e.g., a URL or other connection information) or offline (e.g., a data object residing on an archival tape).

The eml-physical module, like other modules, may be "referenced" via the <references> tag. This allows a physical description to be written once and then reused elsewhere in the EML document by referring to its ID.

Module details
Recommended Usage: Any data object described by EML needs this information so that the entities and attributes residing within the data object can be extracted.
Stand-alone: yes
Imports: eml-documentation, eml-literature, eml-resource, eml-access
Imported By:

Element Definitions:

physical  This element has no default value.
Content of this field: Description of this field:
Type: PhysicalType
The content model for physical is a CHOICE between "references" and all of the elements that describe the internal and external characteristics and distribution of a data object (e.g., objectName, dataFormat, distribution). A physical element can contain a reference to a physical element defined elsewhere. Using a reference means that the referenced physical element is identical, not just in name but in its complete description.
objectName  This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
The name of the data object. This may differ from the entity name, because one physical object can contain multiple entities, although that is not a recommended practice. The objectName is often the name of a file on a filesystem or of one accessible over the network.
Example(s):
rainfall-sev-2002-10.txt
size  This element has no default value.
Content of this field: Description of this field:
Attributes: Use: Default Value:
unit optional byte
This element contains the physical size of the entity, expressed in bytes by default; the unit attribute can be used to specify other units.
Example(s):
134
authentication  This element has no default value.
Content of this field: Description of this field:
Attributes: Use: Default Value:
method optional
This element describes authentication procedures or techniques, typically by giving a checksum value for the object. The method used to compute the authentication value (e.g., MD5) is listed in the method attribute.
Example(s):
f5b2177ea03aea73de12da81f896fe40
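As a sketch of how a processor might check an authentication value, the following Python computes a file's MD5 digest with the standard hashlib module and compares it with the stored hex string. The function names are hypothetical, and only the MD5 method is handled; other method values would need their own branch.

```python
import hashlib


def md5_hex(path: str, chunk_size: int = 65536) -> str:
    """Return the lowercase hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_authentication(path: str, expected: str, method: str = "MD5") -> bool:
    """Compare a file against an authentication value (MD5 only in this sketch)."""
    if method.upper() != "MD5":
        raise ValueError(f"method not handled in this sketch: {method}")
    # Hex digests are compared case-insensitively by normalizing to lowercase.
    return md5_hex(path) == expected.strip().lower()
```

Reading in chunks keeps memory use flat for large data objects.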
compressionMethod  This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
This element lists a compression method used to compress the object, such as zip or compress. Compression and encoding methods must be listed in the order in which they were applied, so that decompression and decoding can occur in the reverse order of the listing. For example, if a file is compressed using zip and then encoded using MIME base64, the compression method is listed first and the encoding method second.
Example(s):
zip
gzip
compress
encodingMethod  This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
This element lists an encoding method used to encode the object, such as base64 or binhex. Compression and encoding methods must be listed in the order in which they were applied, so that decompression and decoding can occur in the reverse order of the listing. For example, if a file is compressed using zip and then encoded using MIME base64, the compression method is listed first and the encoding method second.
Example(s):
base64
uuencode
binhex
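The reverse-order rule can be sketched in Python as follows. The list `methods` stands for the compressionMethod and encodingMethod values in document order; the function name and the method-name strings it recognizes are assumptions for illustration. Because zip is an archive container rather than a simple stream codec, this sketch uses gzip (also among the examples) together with base64.

```python
import base64
import gzip


def decode_object(data: bytes, methods: list[str]) -> bytes:
    """Undo compression/encoding steps given in the order they were applied."""
    # Hypothetical mapping from method names to the inverse operation.
    undo = {
        "gzip": gzip.decompress,
        "base64": base64.b64decode,
    }
    # The last method listed was applied last, so it is undone first.
    for method in reversed(methods):
        if method not in undo:
            raise ValueError(f"no decoder for {method!r} in this sketch")
        data = undo[method](data)
    return data


payload = base64.b64encode(gzip.compress(b"rainfall"))
print(decode_object(payload, ["gzip", "base64"]))  # b'rainfall'
```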
characterEncoding  This element has no default value.
Content of this field: Description of this field:
Type: xs:string
This element contains the name of the character encoding. This is typically ASCII or UTF-8, or one of the other common encodings.
Example(s):
UTF-8
dataFormat  This element has no default value.
Content of this field: Description of this field:
Elements: Use: How many:
A choice of (
textFormat required
OR
externallyDefinedFormat required
OR
binaryRasterFormat required
)
This element is the parent of a CHOICE between three possible formats that describe the internal physical characteristics of the data object. Using this information, the user should be able to parse the physical object to extract the entity and its attributes. Note that this is the format of the physical object itself.
textFormat  This element has no default value.
Content of this field: Description of this field:
Elements: Use: How many:
A sequence of (
numHeaderLines optional
numFooterLines optional
recordDelimiter optional unbounded
physicalLineDelimiter optional unbounded
numPhysicalLinesPerRecord optional
maxRecordLength optional
attributeOrientation required
A choice of (
simpleDelimited required
OR
complex required
)
)
Description of a text formatted object. The description includes detailed parsing instructions for extracting attributes from the bytestream for simple delimited file formats (e.g., CSV), fixed format files that use fixed columns for attribute locations, and mixtures of the two. It also supports records that span multiple lines.
numHeaderLines  This element has no default value.
Content of this field: Description of this field:
Type: xs:int
Number of header lines preceding the data. Lines are determined by the physicalLineDelimiter or, if it is absent, by the recordDelimiter. This value indicates the number of header lines that should be skipped before parsing of the data begins.
Example(s):
4
numFooterLines  This element has no default value.
Content of this field: Description of this field:
Type: xs:int
Number of footer lines following the data. Lines are determined by the physicalLineDelimiter or, if it is absent, by the recordDelimiter. This value indicates the number of footer lines that should be skipped after parsing the data. If this value is omitted, parsers should assume that the data continue to the end of the data stream.
Example(s):
4
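A minimal sketch of applying numHeaderLines and numFooterLines, assuming the object has already been split into lines using the physicalLineDelimiter (or the recordDelimiter); `data_lines` is a hypothetical helper name and the sample lines are made up.

```python
def data_lines(lines: list[str], num_header_lines: int = 0,
               num_footer_lines: int = 0) -> list[str]:
    """Drop numHeaderLines from the top and numFooterLines from the bottom."""
    # With no footer lines, end equals len(lines) and the slice runs to the end.
    end = len(lines) - num_footer_lines
    return lines[num_header_lines:end]


lines = ["Rainfall data", "date,mm", "2002-10-01,0.3", "2002-10-02,1.2", "END"]
print(data_lines(lines, num_header_lines=2, num_footer_lines=1))
# ['2002-10-01,0.3', '2002-10-02,1.2']
```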
recordDelimiter  This element has no default value.
Content of this field: Description of this field:
Type: xs:string
This element specifies the record delimiter character when the format is text. The record delimiter is usually a linefeed (\n) on UNIX, a carriage return (\r) on classic Mac OS, or both (\r\n) on Windows/DOS. Multiline records are usually delimited with two line-ending characters; on UNIX, for example, this would be two linefeed characters (\n\n). Because record delimiters are often non-printing characters, one can use the special value "\n" to represent a linefeed (ASCII 0x0a) and "\r" to represent a carriage return (ASCII 0x0d). Alternatively, one can use the hex value to represent a character (e.g., 0x0a).
Example(s):
\r\n
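One way a processor might turn a delimiter value into the actual characters, assuming the conventions described above: the "\n", "\r" and "\t" escapes, or a single 0xNN hex code. `decode_delimiter` is a hypothetical name, and its handling of concatenated escapes such as "\r\n" is an assumption.

```python
def decode_delimiter(spec: str) -> str:
    """Translate an EML delimiter value into the character(s) it stands for."""
    # Two-character escape sequences (backslash + letter) and their meaning.
    escapes = {"\\n": "\n", "\\r": "\r", "\\t": "\t"}
    if spec.lower().startswith("0x"):
        # int() with base 16 accepts the 0x prefix, e.g. "0x0a" -> 10.
        return chr(int(spec, 16))
    out, i = [], 0
    while i < len(spec):
        pair = spec[i:i + 2]
        if pair in escapes:
            out.append(escapes[pair])
            i += 2
        else:
            out.append(spec[i])  # any other character is taken literally
            i += 1
    return "".join(out)
```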
physicalLineDelimiter  This element has no default value.
Content of this field: Description of this field:
This element specifies the physical line delimiter character when the format is text. The line delimiter is usually a linefeed (\n) on UNIX, a carriage return (\r) on classic Mac OS, or both (\r\n) on Windows/DOS. Multiline records are usually delimited with two line-ending characters; on UNIX, for example, this would be two linefeed characters (\n\n). Because line delimiters are often non-printing characters, one can use the special value "\n" to represent a linefeed (ASCII 0x0a) and "\r" to represent a carriage return (ASCII 0x0d). Alternatively, one can use the hex value to represent a character (e.g., 0x0a). If this value is not provided, processors should assume that the physical line delimiter is the same as the record delimiter.
Example(s):
\r\n
numPhysicalLinesPerRecord  This element has no default value.
Content of this field: Description of this field:
Type: xs:unsignedInt
A single logical data record may be written over several physical lines in a file, with no special marker to indicate the end of a record. In such cases, it is necessary to know the number of lines per record in order to read the records correctly. If this value is not provided, processors should assume that each record is wholly contained on one physical line. If the value is greater than 1, processors should examine the lineNumber field of each attribute to determine which line of the record contains its value.
Example(s):
3
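The grouping described above can be sketched as follows. The sketch assumes every record has exactly numPhysicalLinesPerRecord lines and treats lineNumber as 1-based (consistent with the rule that it never exceeds the number of lines per record); both function names are hypothetical.

```python
def group_records(lines: list[str], lines_per_record: int = 1) -> list[list[str]]:
    """Group physical lines into logical records of numPhysicalLinesPerRecord lines."""
    if lines_per_record < 1:
        raise ValueError("numPhysicalLinesPerRecord must be at least 1")
    if len(lines) % lines_per_record:
        raise ValueError("the last record is incomplete")
    return [lines[i:i + lines_per_record]
            for i in range(0, len(lines), lines_per_record)]


def line_for_attribute(record: list[str], line_number: int = 1) -> str:
    """Return the physical line holding an attribute (lineNumber is 1-based)."""
    return record[line_number - 1]
```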
maxRecordLength  This element has no default value.
Content of this field: Description of this field:
Type: xs:unsignedLong
The maximum number of characters in any record in the physical file. For delimited files, the record length varies, so this value is not particularly useful. However, for fixed-format files that do not contain record delimiters, this field is critical for telling processors where one record stops and the next begins.
Example(s):
597
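For a fixed-format stream without record delimiters, a processor might cut records by length as below. This assumes every record is padded to exactly maxRecordLength characters; files with shorter records would need additional rules. The helper name is hypothetical.

```python
def split_fixed_records(stream: str, record_length: int) -> list[str]:
    """Split an undelimited stream into records of maxRecordLength characters."""
    if record_length < 1:
        raise ValueError("record length must be positive")
    # Step through the stream one record length at a time.
    return [stream[i:i + record_length]
            for i in range(0, len(stream), record_length)]
```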
attributeOrientation  This element has no default value.
Content of this field: Description of this field:
Specifies whether the attributes described in the physical stream are found in columns or rows. The valid values are column or row. If set to 'column', the attributes are in columns; if set to 'row', the attributes are in rows. Row orientation is rare, but some systems such as S-PLUS and R use it. For example, some data with column orientation:

  DATE        PLOT  SPECIES
  2002-01-15  hfr5  acer rubrum
  2002-01-15  hfr5  acer xxxx

The same data in a row-oriented table:

  DATE     2002-01-15   2002-01-15
  PLOT     hfr5         hfr5
  SPECIES  acer rubrum  acer xxxx
Example(s):
column
row

Derived from: xs:string (by xs:restriction)

Allowed values:

  • column
  • row
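Converting row-oriented data into the more common column orientation is a transpose. The sketch below assumes each row starts with the attribute name followed by one value per record, as in the DATE/PLOT/SPECIES example; `rows_to_columns` is a hypothetical name.

```python
def rows_to_columns(rows: list[list[str]]) -> list[list[str]]:
    """Turn row-oriented data into a header row plus one row per record."""
    names = [row[0] for row in rows]    # first cell of each row: attribute name
    values = [row[1:] for row in rows]  # remaining cells: one value per record
    # zip(*values) yields one tuple per record; it stops at the shortest row,
    # so ragged input would be silently truncated.
    return [names] + [list(record) for record in zip(*values)]
```

On Python 3.10 and later, `zip(*values, strict=True)` would raise a ValueError on ragged rows instead of truncating.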

simpleDelimited  This element has no default value.
Content of this field: Description of this field:
Elements: Use: How many:
A sequence of (
fieldDelimiter required unbounded
collapseDelimiters optional
quoteCharacter optional unbounded
literalCharacter optional unbounded
)
A simple delimited format that uses one of a series of delimiters to indicate the ends of fields in the data stream. More complex formats such as fixed format or mixed delimited and fixed formats can be described using the "complex" element.
fieldDelimiter  This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
This element specifies a character used in the object to mark the end of an attribute's column. The delimiter character itself is not part of the attribute value; it appears in the column following the last character of the value. Typical delimiter characters include commas, tabs, spaces, and semicolons. The only time the fieldDelimiter character is not interpreted as a delimiter is when it is contained in a quoted string (see quoteCharacter) or is immediately preceded by a literalCharacter. Non-printable delimiter characters can be provided as their hex values, and tab characters as the string "\t". Processors should assume that a field starts in the column following the previous field if the previous field was fixed, or in the column following the previous field's delimiter if the previous field was delimited.
Example(s):
,
\t
0x09
0x20
collapseDelimiters  This element has no default value.
Content of this field: Description of this field:
The collapseDelimiters element specifies whether sequential delimiters should be treated as a single delimiter or as multiple delimiters. For example, when a space delimiter is used, there may be several repeated spaces that should be treated as a single delimiter, but not always. The valid values are yes or no. If set to yes, consecutive delimiters are collapsed to one. If set to no or absent (the default), consecutive delimiters are treated as separate delimiters.
Example(s):
yes
no

Derived from: xs:string (by xs:restriction)

Allowed values:

  • yes
  • no
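A minimal sketch of collapseDelimiters, using Python's re module to merge runs of the delimiter before splitting; the function name is hypothetical, and quoting is ignored here.

```python
import re


def split_fields(line: str, delimiter: str, collapse: bool = False) -> list[str]:
    """Split a record on `delimiter`, optionally collapsing repeated delimiters."""
    if collapse:
        # collapseDelimiters = yes: replace each run of delimiters with one.
        line = re.sub(f"(?:{re.escape(delimiter)})+", delimiter, line)
    # collapseDelimiters = no: adjacent delimiters yield empty fields.
    return line.split(delimiter)
```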

quoteCharacter  This element has no default value.
Content of this field: Description of this field:
Type: xs:string
This element specifies a character used in the object to quote values so that field delimiters can appear within a value; in effect, it allows delimiter "escaping". The quoteCharacter is typically " or '. When a processor encounters a quote character, it should not interpret any following characters as a delimiter until a matching quote character has been encountered (i.e., quotes come in pairs). It is an error if the closing quote is missing when the record ends. Non-printable quote characters can be provided as their hex values.
Example(s):
"
'
literalCharacter  This element has no default value.
Content of this field: Description of this field:
Type: xs:string
This element specifies a character to be used for escaping special character values so that they are treated as literal values. This allows "escaping" for special characters like quotes, commas, and spaces when they are intended to be used in an attribute value rather than being intended as a delimiter. The literalCharacter is typically a \.
Example(s):
\
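The quoteCharacter and literalCharacter rules combine into a small character-by-character tokenizer. This is a sketch under stated assumptions: a single-character delimiter, quote characters dropped from the returned values, and the literal character escaping whatever character follows it; `tokenize` is a hypothetical name.

```python
def tokenize(line: str, delimiter: str = ",", quote: str = '"',
             literal: str = "\\") -> list[str]:
    """Split one record, honouring quoteCharacter and literalCharacter."""
    fields, current = [], []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if literal and ch == literal and i + 1 < len(line):
            current.append(line[i + 1])  # keep the escaped character verbatim
            i += 2
            continue
        if quote and ch == quote:
            in_quotes = not in_quotes    # quotes come in pairs
        elif ch == delimiter and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("record ended before the closing quote")
    fields.append("".join(current))
    return fields
```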
complex  This element has no default value.
Content of this field: Description of this field:
Elements: Use: How many:
A choice of (
textFixed required
OR
textDelimited required
)
A complex text format that can describe delimited fields, fixed-width fields, and mixtures of the two. It supports multiline records (where one record is distributed across multiple physical lines). When using the complex format, the number of textFixed and textDelimited elements should exactly equal the number of attributes described for the entity, and the order of the textFixed and textDelimited elements should correspond to the order of the attributes as described in the entity. Thus, for a delimited file with fourteen attributes, one should provide exactly fourteen textDelimited elements.
textFixed  This element has no default value.
Content of this field: Description of this field:
Elements: Use: How many:
A sequence of (
fieldWidth required
lineNumber optional
fieldStartColumn optional
)
Describes the physical format of data sequences that use a fixed number of characters in a specified position in the stream to locate attribute values. This method is common in sensor-derived data and in legacy database systems. To parse it, one must know the number of characters for each attribute and the starting column and line to begin reading the value.
fieldWidth  This element has no default value.
Content of this field: Description of this field:
Type: xs:unsignedLong
Fixed width fields have a set length, thus the end of the field can always be determined by adding the fieldWidth to the starting column number.
Example(s):
7
lineNumber  This element has no default value.
Content of this field: Description of this field:
Type: xs:unsignedLong
A single logical data record may be written over several physical lines in a file, with no special marker to indicate the end of a record. In such cases, the relative location of a data field must be indicated by both relative row and column number. The lineNumber should never be greater than the number of physical lines per record.
Example(s):
3
fieldStartColumn  This element has no default value.
Content of this field: Description of this field:
Type: xs:long
Fixed width fields have a set length, thus the end of the field can always be determined by adding the fieldWidth to the starting column number. If the starting column is not provided, processors should assume that the field starts in the column following the previous field if the previous field was fixed, or in the column following the delimiter from the previous field if the previous field was delimited.
Example(s):
58
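A sketch of extracting textFixed fields from one logical record. It assumes columns and lineNumber are 1-based, and that a field without fieldStartColumn begins right after the previous field on the same line (or at column 1 if it is the first field on that line); `FixedField` and `parse_fixed` are hypothetical names mirroring the elements above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixedField:
    """Hypothetical mirror of a textFixed element."""
    field_width: int
    line_number: int = 1                      # 1-based line within the record
    field_start_column: Optional[int] = None  # 1-based; None = follow previous field


def parse_fixed(record: list[str], specs: list[FixedField]) -> list[str]:
    """Extract fixed-width values from one logical record (a list of lines)."""
    values = []
    next_col: dict[int, int] = {}  # next unread 1-based column on each line
    for spec in specs:
        line = record[spec.line_number - 1]
        if spec.field_start_column is not None:
            start = spec.field_start_column
        else:
            start = next_col.get(spec.line_number, 1)
        end = start + spec.field_width          # first column after the field
        values.append(line[start - 1:end - 1])  # 1-based columns -> slice indices
        next_col[spec.line_number] = end
    return values
```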
textDelimited  This element has no default value.
Content of this field: Description of this field:
Elements: Use: How many:
A sequence of (
fieldDelimiter required
collapseDelimiters optional
lineNumber optional
quoteCharacter optional unbounded
literalCharacter optional unbounded
)
Describes the physical format of data sequences that use delimiters in the stream to locate attribute values. This method is common in data exported from spreadsheets and database systems. To parse it, one must know the character that indicates the end of each attribute and the line on which to begin reading the value.
fieldDelimiter  This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
This element specifies a character used in the object to mark the end of an attribute's column. The delimiter character itself is not part of the attribute value; it appears in the column following the last character of the value. Typical delimiter characters include commas, tabs, spaces, and semicolons. The only time the fieldDelimiter character is not interpreted as a delimiter is when it is contained in a quoted string (see quoteCharacter) or is immediately preceded by a literalCharacter. Non-printable delimiter characters can be provided as their hex values, and tab characters as the string "\t". Processors should assume that a field starts in the column following the previous field if the previous field was fixed, or in the column following the previous field's delimiter if the previous field was delimited.
Example(s):
,
\t
0x09
0x20
collapseDelimiters  This element has no default value.
Content of this field: Description of this field:
The collapseDelimiters element specifies whether sequential delimiters should be treated as a single delimiter or as multiple delimiters. For example, when a space delimiter is used, there may be several repeated spaces that should be treated as a single delimiter, but not always. The valid values are yes or no. If set to yes, consecutive delimiters are collapsed to one. If set to no or absent (the default), consecutive delimiters are treated as separate delimiters.
Example(s):
yes
no

Derived from: xs:string (by xs:restriction)

Allowed values:

  • yes
  • no

lineNumber  This element has no default value.
Content of this field: Description of this field:
Type: xs:unsignedLong
A single logical data record may be written over several physical lines in a file, with no special marker to indicate the end of a record. In such cases, the relative location of a data field must be indicated by both relative row and column number. The lineNumber should never be greater than the number of physical lines per record. When parsing the first field on a physical line as a delimited field, processors should assume that the field data starts in the first column. Otherwise, follow the rules indicated under fieldDelimiter.
Example(s):
3
quoteCharacter  This element has no default value.
Content of this field: Description of this field:
Type: xs:string
This element specifies a character used in the object to quote values so that field delimiters can appear within a value; in effect, it allows delimiter "escaping". The quoteCharacter is typically " or '. When a processor encounters a quote character, it should not interpret any following characters as a delimiter until a matching quote character has been encountered (i.e., quotes come in pairs). It is an error if the closing quote is missing when the record ends. Non-printable quote characters can be provided as their hex values.
Example(s):
"
'
literalCharacter  This element has no default value.
Content of this field: Description of this field:
Type: xs:string
This element specifies a character to be used for escaping special character values so that they are treated as literal values. This allows "escaping" for special characters like quotes, commas, and spaces when they are intended to be used in an attribute value rather than being intended as a delimiter. The literalCharacter is typically a \.
Example(s):
\
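When a complex format mixes textFixed and textDelimited fields on one line, the start-position rules from fieldDelimiter and fieldStartColumn can be sketched as follows. Each spec is a hypothetical ("fixed", width) or ("delimited", delimiter) tuple; quoting, literal characters, and multi-line records are left out to keep the sketch short, and treating a missing delimiter after the last field as end of line is an assumption.

```python
from typing import Union


def parse_mixed(line: str, specs: list[tuple[str, Union[int, str]]]) -> list[str]:
    """Parse one physical line containing fixed and delimited fields."""
    values, pos = [], 0  # pos: 0-based index where the next field starts
    for kind, arg in specs:
        if kind == "fixed":
            # Fixed field: the next field starts right after it.
            values.append(line[pos:pos + arg])
            pos += arg
        else:
            end = line.find(arg, pos)
            if end == -1:  # no closing delimiter: the field runs to end of line
                end = len(line)
            values.append(line[pos:end])
            # Delimited field: the next field starts after the delimiter.
            pos = end + len(arg)
    return values
```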
externallyDefinedFormat  This element has no default value.
Content of this field: Description of this field:
Elements: Use: How many:
A sequence of (
formatName required
formatVersion optional
citation optional
)
Information about a non-text or proprietary formatted object. The description names the format explicitly but assumes that a processor knows how to parse that format to extract the data. A format version can be included. This is mainly used for proprietary formats, including binary files such as Microsoft Excel and text formats such as ESRI's ArcInfo export format. It is not a recommended way to archive data permanently, because software to parse the format is unlikely to remain available over extended periods, but it is included to accommodate commonly used physical formats.
formatName  This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
Name of the format of the data object
Example(s):
Microsoft Excel
formatVersion  This element has no default value.
Content of this field: Description of this field:
Type: xs:string
Version of the format of the data object
Example(s):
2000 (9.0.2720)
citation  This element has no default value.
Content of this field: Description of this field:
Type: cit:CitationType
Citation providing more detail about the physical format, including parsing information or information about the software required for reading the object.
binaryRasterFormat  This element has no default value.
Content of this field: Description of this field:
Elements: Use: How many:
A sequence of (
rowColumnOrientation required
multiBand optional
nbits required
byteorder required
skipbytes optional
bandrowbytes optional
totalrowbytes optional
bandgapbytes optional
)
The binaryRasterFormat element is a container for various parameters used to describe the contents of binary raster image files. It is based on a white paper on the ESRI site that describes the header information used for BIP and BIL files ("Extendable Image Formats for ArcView GIS 3.1 and 3.2").
rowColumnOrientation  This element has no default value.
Content of this field: Description of this field:
Specifies whether the data should be read across rows or down columns. The valid values are column or row. If set to 'column', then the data are read down columns. If set to 'row', then the data are read across rows.
Example(s):
column
row

Derived from: xs:string (by xs:restriction)

Allowed values:

  • column
  • row

multiBand  This element has no default value.
Content of this field: Description of this field:
Elements: Use: How many:
A sequence of (
nbands required
layout required
)
Information needed to properly interpret a multiband image.
nbands  This element has no default value.
Content of this field: Description of this field:
Type: xs:int
The number of spectral bands in the image. Must be greater than 1.
Example(s):
2
layout  This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
The organization of the bands in the image file. Acceptable values are:

  • bil: band interleaved by line
  • bip: band interleaved by pixel
  • bsq: band sequential

Example(s):
bil
bip
bsq
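The three layouts differ only in how the sample index is computed. The sketch below gives the byte offset of one sample in an unpadded file, assuming 0-based indices, whole-byte samples, and that the row and column counts come from elsewhere in the metadata (they are not part of this module); `pixel_offset` is a hypothetical name.

```python
def pixel_offset(layout: str, row: int, col: int, band: int,
                 nrows: int, ncols: int, nbands: int, nbits: int = 8) -> int:
    """Byte offset of one sample in an unpadded raster (0-based indices)."""
    if nbits % 8:
        raise ValueError("this sketch handles whole-byte samples only")
    size = nbits // 8
    if layout == "bip":    # all bands of one pixel are adjacent
        index = (row * ncols + col) * nbands + band
    elif layout == "bil":  # each image row holds band 0's row, then band 1's, ...
        index = (row * nbands + band) * ncols + col
    elif layout == "bsq":  # each band is stored as a complete image
        index = (band * nrows + row) * ncols + col
    else:
        raise ValueError(f"unknown layout {layout!r}")
    return index * size
```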
nbits  This element has no default value.
Content of this field: Description of this field:
Type: xs:int
The number of bits per pixel per band. Acceptable values are typically 1, 4, 8, 16, and 32. The default value is eight bits per pixel per band. For a true color image with three bands (R, G, B) stored using eight bits for each pixel in each band, nbits equals eight and nbands equals three, for a total of twenty-four bits per pixel.
Example(s):
8
byteorder  This element has no default value.
Content of this field: Description of this field:
Type: res:NonEmptyStringType
The byte order in which values are stored. The byte order matters for images of sixteen bits or more, which have two or more bytes per pixel. Acceptable values are little-endian (common on Intel systems such as PCs) and big-endian (common on Motorola platforms).
Example(s):
little-endian
big-endian
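Reading samples with a given byte order can be sketched with Python's struct module, where "<" selects little-endian and ">" big-endian. Treating samples as unsigned integers is an assumption of this sketch, and `read_samples` is a hypothetical name.

```python
import struct


def read_samples(raw: bytes, byteorder: str, nbits: int = 16) -> tuple[int, ...]:
    """Unpack unsigned integer samples according to the byteorder value."""
    prefix = {"little-endian": "<", "big-endian": ">"}[byteorder]
    code = {8: "B", 16: "H", 32: "I"}[nbits]  # unsigned 1-, 2-, 4-byte ints
    size = nbits // 8
    count = len(raw) // size
    # Ignore any trailing bytes that do not form a complete sample.
    return struct.unpack(f"{prefix}{count}{code}", raw[:count * size])
```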
skipbytes  This element has no default value.
Content of this field: Description of this field:
Type: xs:string
The number of bytes of data in the image file to skip in order to reach the start of the image data. This keyword allows you to bypass any existing image header information in the file. The default value is zero bytes.
Example(s):
0
bandrowbytes  This element has no default value.
Content of this field: Description of this field:
Type: xs:string
The number of bytes per band per row. This must be an integer. This keyword is used only with BIL files when there are extra bits at the end of each band within a row that must be skipped.
Example(s):
3
totalrowbytes  This element has no default value.
Content of this field: Description of this field:
Type: xs:string
The total number of bytes of data per row. Use totalrowbytes when there are extra trailing bits at the end of each row.
Example(s):
8
bandgapbytes  This element has no default value.
Content of this field: Description of this field:
Type: xs:string
The number of bytes between bands in a BSQ format image. The default is zero.
Example(s):
1
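The padding keywords change where each sample sits in the file. A sketch for BIL and BSQ, assuming 0-based indices, whole-byte samples, and row/column counts taken from elsewhere in the metadata; the function names are hypothetical. Since skipbytes, bandrowbytes, totalrowbytes, and bandgapbytes are typed as xs:string here, their values would need converting with int() before use.

```python
def bil_offset(row: int, col: int, band: int, bandrowbytes: int,
               totalrowbytes: int, sample_bytes: int = 1,
               skipbytes: int = 0) -> int:
    """Offset of a sample in a BIL file whose band rows may carry padding."""
    # Skip the header, whole rows, earlier bands in this row, then earlier pixels.
    return skipbytes + row * totalrowbytes + band * bandrowbytes + col * sample_bytes


def bsq_offset(row: int, col: int, band: int, nrows: int, ncols: int,
               sample_bytes: int = 1, skipbytes: int = 0,
               bandgapbytes: int = 0) -> int:
    """Offset of a sample in a BSQ file with gaps between bands."""
    band_size = nrows * ncols * sample_bytes
    # Each earlier band contributes its data plus the gap that follows it.
    return (skipbytes + band * (band_size + bandgapbytes)
            + (row * ncols + col) * sample_bytes)
```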
distribution  This element has no default value.
Content of this field: Description of this field:
Type: PhysicalDistributionType
This element provides information on how the resource is distributed. Connections to online systems can be described as URLs or as a list of connection parameters.
online  This element has no default value.
Content of this field: Description of this field:
Type: PhysicalOnlineType
offline  This element has no default value.
Content of this field: Description of this field:
Type: res:OfflineType
inline  This element has no default value.
Content of this field: Description of this field:
Type: res:InlineType
access  This element has no default value.
Content of this field: Description of this field:
Type: acc:AccessType
onlineDescription  This element has no default value.
Content of this field: Description of this field:
Type: xs:string
url  This element has no default value.
Content of this field: Description of this field:
Type: res:UrlType
connection  This element has no default value.
Content of this field: Description of this field:
Type: res:ConnectionType

Attribute Definitions:

unit

Use: optional

Default value: byte

This attribute gives the unit of measurement for the size of the entity; it defaults to byte.
Example(s):
byte
method

Type: xs:string

Use: optional

This attribute names the method used to calculate an authentication checksum that can be used to validate a byte stream. Typical checksum methods include MD5 and CRC.
Example(s):
MD5
id

Type: res:IDType

Use: optional

system

Type: res:SystemType

Use: optional

scope

Type: res:ScopeType

Use: optional

Default value: document


Complex Type Definitions:

PhysicalType 
Content of this field: Description of this field:
Elements: Use: How many:
A choice of (
A sequence of (
objectName required
size optional
authentication optional unbounded
A choice of (
compressionMethod required
OR
encodingMethod required
)
characterEncoding optional
dataFormat required
distribution optional unbounded
)
OR
res:ReferencesGroup    
)
Attributes: Use: Default Value:
id optional
system optional
scope optional document
PhysicalDistributionType 
Content of this field: Description of this field:
Elements: Use: How many:
A choice of (
A sequence of (
A choice of (
online required
OR
offline required
OR
inline required
)
access optional
)
OR
res:ReferencesGroup    
)
Attributes: Use: Default Value:
id optional
system optional
scope optional document
PhysicalOnlineType 
Content of this field: Description of this field:
Elements: Use: How many:
A sequence of (
onlineDescription optional
A choice of (
url required
OR
connection required
)
)

Simple Type Definitions:

Group Definitions:

Web Contact: jones@nceas.ucsb.edu