ASDF Schemas¶
ASDF uses JSON Schema to perform validation of ASDF files. Schema validation of ASDF files serves the following purposes:
Allows ASDF readers to detect whether an ASDF file has been modified in a way that would render it unreadable or unrecognizable.
Enables interoperability between ASDF implementations. Implementations that recognize the same schema definitions should be able to interpret files containing instances of data types that conform to those schemas.
Allows for the definition of custom data types. External software packages can provide ASDF schemas that correspond to types provided by that package, and then serialize instances of those types in a way that is standardized and portable.
All ASDF implementations must implement the types defined by the core schemas and validate against them when reading files. [1] The ASDF specification also defines other schemas, which are optional for ASDF implementations and maintained as part of the specification (mainly for historical reasons):
The ASDF specification also defines two metaschemas, YAML Schema and the ASDF metaschema, which are used to validate the ASDF schemas themselves.
More information on the schemas defined by ASDF can be found in ASDF Specification Schema Definitions.
Schema Implementation¶
ASDF schemas are encoded in YAML and conform to a superset of JSON Schema called YAML Schema. The version of YAML supported by ASDF is 1.1. Accordingly, all schemas begin with the following YAML header:
%YAML 1.1
---
The following top-level attributes are required for all ASDF schemas:
$schema: Indicates the metaschema definition used to validate this schema
id: A name that uniquely identifies the schema
Each of these attributes is described in more detail below.
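As a rough illustration of the requirement above, the following sketch checks a schema that has already been parsed into a Python dictionary. The helper name `missing_top_level_attributes` is hypothetical and is not part of any ASDF library.

```python
# Hypothetical helper for illustration; not part of any ASDF implementation.
REQUIRED_TOP_LEVEL = ("$schema", "id")


def missing_top_level_attributes(schema):
    """Return the required top-level attributes absent from a parsed schema."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in schema]


# A schema as it might look after the YAML has been parsed into a dict.
schema = {
    "$schema": "http://stsci.edu/schemas/yaml-schema/draft-01",
    "id": "http://example.org/schemas/foo/metadata-1.0.0",
    "type": "object",
}

print(missing_top_level_attributes(schema))              # []
print(missing_top_level_attributes({"type": "object"}))  # ['$schema', 'id']
```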
$schema¶
ASDF schemas use the top-level $schema attribute to declare the metaschema
that is used to validate the schema itself. Most custom ASDF schemas will
conform to YAML Schema and so will have the following
top-level attribute:
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
Some ASDF schemas use the ASDF metaschema instead (e.g. ndarray). It is also possible to create custom metaschemas.
Warning
Creating a new custom metaschema is highly discouraged. Many JSON Schema libraries do not support custom metaschemas. A custom metaschema, if one is created, should inherit from YAML Schema. [2]
Some ASDF implementations may choose to validate the schemas themselves (e.g.
as part of a regression testing suite). The $schema keyword should be used
to determine the metaschema to be used for validation. All schemas should also
validate successfully against YAML Schema.
id¶
The id represents the globally unique name of the schema. It must be a
valid URI and cannot be an empty
string or an empty fragment (e.g. #). See Naming Conventions for
conventions to ensure global uniqueness.
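The constraints on id can be approximated with the standard library. This is a loose sketch rather than a full URI validator: the function name `is_acceptable_id` is hypothetical, and treating "has a URI scheme" as the test for a valid URI is a simplifying assumption.

```python
from urllib.parse import urlsplit


def is_acceptable_id(value):
    """Loosely check an id: non-empty, not the empty fragment, has a scheme."""
    if not isinstance(value, str) or value == "":
        return False
    if value == "#":  # the empty fragment is explicitly disallowed
        return False
    # Assumption of this sketch: a URI is considered valid if it has a scheme.
    return urlsplit(value).scheme != ""


print(is_acceptable_id("http://stsci.edu/schemas/asdf/core/ndarray-1.1.0"))
print(is_acceptable_id("#"))
```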
While the id must be a valid URI, it does not have to describe a real
location on disk or on a network. For example, the id values for all
schemas in the ASDF core schemas begin with the prefix
http://stsci.edu/schemas/asdf/. However, as of this writing, none of the
schemas are actually hosted at that location.
The id keyword is used for reference resolution both within a schema and
between schemas. Relative references within a schema are resolved against the
id of that schema. A reference to an external schema uses the id of
that schema. See References below for additional information.
Each ASDF implementation must define how to resolve a schema id to a real
resource that contains the schema itself. This resource will often be a local
file but this detail is left up to the implementation.
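One possible resolution strategy is a table that maps id prefixes to local directories. The sketch below is only an illustration of that idea: the directory `/opt/schemas/stsci.edu/asdf`, the `.yaml` suffix, and the function name `resolve_schema_id` are all assumptions, and a real implementation is free to resolve ids differently.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from id prefix to a local directory. The directory
# layout and the ".yaml" suffix are assumptions made for this sketch.
PREFIX_TO_DIRECTORY = {
    "http://stsci.edu/schemas/asdf/": "/opt/schemas/stsci.edu/asdf",
}


def resolve_schema_id(schema_id):
    """Map a schema id to the local file assumed to contain the schema."""
    for prefix, directory in PREFIX_TO_DIRECTORY.items():
        if schema_id.startswith(prefix):
            relative = schema_id[len(prefix):]
            return PurePosixPath(directory) / (relative + ".yaml")
    raise LookupError(f"no local resource registered for {schema_id!r}")


print(resolve_schema_id("http://stsci.edu/schemas/asdf/core/ndarray-1.1.0"))
```

Keeping the mapping in data rather than in code means a package that ships its own schemas only needs to register one more prefix.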
Descriptive information¶
Each schema may optionally contain descriptive fields: title,
description and examples. These fields may contain core markdown
syntax (which will be used for the purposes of rendering schema documentation
by, for example, sphinx-asdf).
title: A one-line summary of the data type described by the schema
description: A lengthier prose description of the schema
examples: A list of example content that conforms to the schema, illustrating how to use it.
References¶
JSON Schema allows referencing other schemas by including
a mapping that contains a single key, $ref, whose value is the URI
of the referenced schema.
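Resolving a $ref value against the id of the enclosing schema follows ordinary URI resolution, which the Python standard library can demonstrate. This is a sketch only: the function name `resolve_ref` is hypothetical, and the fragment `#/definitions/datatype` and the relative reference `complex-1.0.0` are illustrative values, not quotations from a particular schema.

```python
from urllib.parse import urljoin

schema_id = "http://stsci.edu/schemas/asdf/core/ndarray-1.1.0"


def resolve_ref(base_id, ref):
    """Resolve a $ref value against the id of the schema that contains it."""
    return urljoin(base_id, ref)


# A reference within the same schema keeps the id and adds the fragment.
print(resolve_ref(schema_id, "#/definitions/datatype"))
# A relative reference replaces the last path segment of the id.
print(resolve_ref(schema_id, "complex-1.0.0"))
# An absolute reference to an external schema is used unchanged.
print(resolve_ref(schema_id, "http://stsci.edu/schemas/asdf/unit/unit-1.0.0"))
```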
Naming Conventions¶
Schema id attributes must be valid URIs. Furthermore, tag URIs must
conform to the tag URI scheme defined in RFC 4151.
Certain conventions are strongly recommended in order to ensure
uniqueness and to enable a simple correspondence between the id and tag
attributes. These conventions are described below.
All schema ids should encode the following information:
organization: Indicates the organization that created the schema
standard: The “standard” this schema belongs to. This will usually correspond to the name of the software package that provides this schema.
name: The name of the data type corresponding to this schema.
version: The version of the schema. See Versioning Conventions for more details.
Consider ASDF core schemas as an example. In this case, the
organization is stsci.edu, which is the web address of the organization
that created the schemas. The standard is asdf. Each individual schema
in the core schemas has a different name field. In the case of the
ndarray data type, for example, the name is
core/ndarray. The version of ndarray is 1.1.0.
Some types in the core schemas have multiple versions, such as
ndarray-1.0.0 and ndarray-1.1.0.
While schema ids can be any valid URI, under this convention they always begin
with http://. The general format of the id attribute becomes:
http://<organization>/schemas/<standard>/<name>-<version>
Continuing with the example of ndarray, we have:
id: "http://stsci.edu/schemas/asdf/core/ndarray-1.1.0"
The idea behind the convention for id is that it should be possible (in
principle if not in practice) for schemas to be hosted at the corresponding
URL. This motivates the choice of the organization’s web address as the
organization component. However, this is not a requirement. The primary
objective is to create a globally unique id.
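The general format above is simple enough to express as a small function. The name `make_schema_id` is hypothetical and serves only to illustrate the convention.

```python
def make_schema_id(organization, standard, name, version):
    """Assemble a schema id from its components under the naming convention."""
    return f"http://{organization}/schemas/{standard}/{name}-{version}"


print(make_schema_id("stsci.edu", "asdf", "core/ndarray", "1.1.0"))
# http://stsci.edu/schemas/asdf/core/ndarray-1.1.0
```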
Designing a new tag and schema¶
This section will walk through the development of a new tag and schema. In the
example, suppose we work at the Example Organization, which can be
found on the world wide web at example.org. We’re developing a new
instrument, foo, and we need a way to define the specialized metadata to
describe the exposures that it will be generating.
According to the Naming Conventions, our tag and id will
consist of the following components:
organization: example.org
standard: foo
name: metadata
version: 1.0.0 (by convention the starting version for all new schemas)
So, for our example instrument metadata, the tag is:
tag:example.org:foo/metadata-1.0.0
Each tag should be associated with a schema that validates it. Each
schema must also have a universally unique id, which takes the form of a
unique URI. We will assign the following URI to refer to our schema:
http://example.org/schemas/foo/metadata-1.0.0
Note
Note that this URI doesn’t actually have to resolve to anything. In fact,
visiting that URL in your web browser is likely to bring up a 404 error.
All that’s necessary is that it is universally unique and that the tool reading
the ASDF file is able to map from a tag name to a schema URI, and then load the
associated schema.
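When the naming conventions are followed, the mapping from tag to schema URI can be purely mechanical. The sketch below infers the tag layout from the example tag in this section, so the pattern and the function name `tag_to_schema_uri` are assumptions; an implementation may equally well use an explicit lookup table.

```python
import re

# Tag layout inferred from the example above:
#   tag:<organization>:<standard>/<name>-<version>
TAG_PATTERN = re.compile(r"^tag:(?P<organization>[^:]+):(?P<path>.+)$")


def tag_to_schema_uri(tag):
    """Derive the conventional schema URI for a tag."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a recognized tag: {tag!r}")
    return f"http://{match['organization']}/schemas/{match['path']}"


print(tag_to_schema_uri("tag:example.org:foo/metadata-1.0.0"))
# http://example.org/schemas/foo/metadata-1.0.0
```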
Therefore, in our schema file, we have the following defining the id of the schema:
id: "http://example.org/schemas/foo/metadata-1.0.0"
Since our schema is just a basic ASDF schema, we will declare that it conforms to YAML Schema defined by the ASDF specification:
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
Descriptive information¶
Continuing our example, we include some descriptive metadata about the data type declared by the schema itself:
title: |
  Metadata for the foo instrument.
description: |
  This stores some information about an exposure from the foo instrument.
examples:
  -
    - A minimal description of an exposure.
    - |
        tag:example.org:foo/metadata-1.0.0
          exposure_time: 0.001
The schema proper¶
The rest of the schema describes the acceptable data types and their structure. The format used for this description comes straight from JSON Schema. Rather than document everything it can do here, we refer you to the resources available at json-schema.org.
In our example, we’ll define two metadata elements: the name of the investigator, and the exposure time, each of which also has a description:
type: object
properties:
  investigator:
    type: string
    description: |
      The name of the principal investigator who requested the
      exposure.
  exposure_time:
    type: number
    description: |
      The time of the exposure, in nanoseconds.
We’ll also define an optional element for the exposure time unit. This is a somewhat contrived example to demonstrate how to include elements in your schema that are based on the custom types defined in the ASDF core schemas:
  exposure_time_units:
    allOf:
      - $ref: "http://stsci.edu/schemas/asdf/unit/unit-1.0.0"
    description: |
      The unit of the exposure time.
In brief this requires that any value for exposure_time_units must
comply with the unit schema.
Lastly, we’ll declare exposure_time as being required, and allow
extra elements to be added:
required: [exposure_time]
additionalProperties: true
The complete example¶
Here is our complete schema example:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://example.org/schemas/foo/metadata-1.0.0"
title: |
  Metadata for the foo instrument.
description: |
  This stores some information about an exposure from the foo instrument.
examples:
  -
    - A minimal description of an exposure.
    - |
        tag:example.org:foo/metadata-1.0.0
          exposure_time: 0.001
type: object
properties:
  investigator:
    type: string
    description: |
      The name of the principal investigator who requested the
      exposure.
  exposure_time:
    type: number
    description: |
      The time of the exposure, in nanoseconds.
  exposure_time_units:
    allOf:
      - $ref: "http://stsci.edu/schemas/asdf/unit/unit-1.0.0"
    description: |
      The unit of the exposure time.
required: [exposure_time]
additionalProperties: true
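To make the meaning of the type, properties, required, and additionalProperties keywords concrete, the sketch below hand-checks an instance against a parsed subset of the schema above. It is deliberately not a JSON Schema implementation: it omits exposure_time_units because that would require resolving the $ref, and the names `SCHEMA`, `TYPE_CHECKS`, and `validation_errors` are hypothetical. Real implementations would normally delegate this work to a JSON Schema library.

```python
# Parsed form of the keywords from the schema above that this sketch checks.
SCHEMA = {
    "type": "object",
    "properties": {
        "investigator": {"type": "string"},
        "exposure_time": {"type": "number"},
    },
    "required": ["exposure_time"],
    "additionalProperties": True,
}

TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so it is excluded explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
}


def validation_errors(instance, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    if not TYPE_CHECKS[schema["type"]](instance):
        return [f"expected {schema['type']}"]
    errors = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property {key!r}")
    properties = schema.get("properties", {})
    for key, subschema in properties.items():
        if key in instance and not TYPE_CHECKS[subschema["type"]](instance[key]):
            errors.append(f"{key!r} is not of type {subschema['type']}")
    if schema.get("additionalProperties", True) is False:
        for key in sorted(set(instance) - set(properties)):
            errors.append(f"unexpected property {key!r}")
    return errors


print(validation_errors({"exposure_time": 0.001}, SCHEMA))  # []
print(validation_errors({"investigator": "A. Person"}, SCHEMA))
```

Because additionalProperties is true, an instance carrying extra keys still validates, which is what later makes it possible to extend a schema of this kind.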
Extending an existing schema¶
JSON Schema does not support the concept of inheritance, which makes it somewhat awkward to express type hierarchies. However, it is possible to create a custom schema that adds attributes to an existing schema (e.g. one defined in the ASDF core schemas). It is important to remember that it is not possible to override or remove any of the attributes from the existing schema.
The following important caveats apply when extending an existing schema:
It is not possible to redefine, override, or delete any attributes in the original schema.
It will not be possible to add attributes to any node where the original schema declares additionalProperties: false.
Instances of the custom type will not be recognized as instances of the original type when resolving schema references or processing YAML tags (i.e. there is no concept of polymorphism).
Here’s an example of extending a schema using the software schema defined in the ASDF core schemas. Here’s the original schema, for reference:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://stsci.edu/schemas/asdf/core/software-1.0.0"
title: |
  Describes a software package.
description: |
  General-purpose description of a software package.
type: object
properties:
  name:
    description: |
      The name of the application or library.
    type: string
  author:
    description: |
      The author (or institution) that produced the software package.
    type: string
  homepage:
    description: |
      A URI to the homepage of the software.
    type: string
    format: uri
  version:
    description: |
      The version of the software used. It is recommended, but not
      required, that this follows the (Semantic Versioning
      Specification)[http://semver.org/spec/v2.0.0.html].
    type: string
required: [name, version]
additionalProperties: true
...
Since the software schema permits additional properties, we are free to extend it to include an email address for contacting the author:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://somewhere.org/schemas/software_extended-1.0.0"
title: |
  Describes a software package.
description: |
  Extension of ASDF core software schema to include the
  software author's contact email.
allOf:
  - $ref: http://stsci.edu/schemas/asdf/core/software-1.0.0
  - properties:
      author_email:
        description: |
          The contact email of the software author.
        type: string
    required: [author_email]
...
The crucial portion of this schema definition is the way that the allOf
operator is used to join a reference to the base software schema with the
definition of a new property called author_email.
The allOf combiner means that any instance that is validated against
software_extended-1.0.0 will have to conform to both the base software schema
and the properties specific to the extended schema.
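The effect of allOf can be sketched by checking only the required keyword of each subschema. This is a simplification under stated assumptions: `BASE` and `EXTENSION` are hand-written stand-ins for the base software schema (with its $ref already resolved) and the extension, and the helper names are hypothetical.

```python
# Stand-ins for the two subschemas joined by allOf, reduced to "required".
BASE = {"required": ["name", "version"]}
EXTENSION = {"required": ["author_email"]}


def has_required(instance, subschema):
    """Check that every required property of one subschema is present."""
    return all(key in instance for key in subschema.get("required", []))


def conforms_to_all(instance, subschemas):
    """allOf semantics: the instance must satisfy every subschema."""
    return all(has_required(instance, subschema) for subschema in subschemas)


complete = {"name": "foo", "version": "1.0.0", "author_email": "a@example.org"}
base_only = {"name": "foo", "version": "1.0.0"}

print(conforms_to_all(complete, [BASE, EXTENSION]))   # True
print(conforms_to_all(base_only, [BASE, EXTENSION]))  # False
```

An instance that satisfies only the base schema is rejected, which shows why the extended schema can add requirements but never relax them.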
Default annotation¶
Warning
It is recommended that default no longer be used in new schemas.
As noted below, newer versions of the ASDF specification ignore these values.
The JSON Schema spec includes a schema annotation attribute called default that
can be used to describe the default value of a data attribute when that attribute
is missing. Recent versions of the spec point out
that there is no single correct way to choose an annotation value when multiple
are available due to references and combiners. This presents a problem when
trying to fill in missing data in a file based on the schema default: if
multiple conflicting values are available, the software does not know how to choose.
Previous versions of the ASDF specification did not offer guidance on how
to use default. The Python reference implementation read the first default
that it encountered as a literal value and inserted that value into the tree when
the corresponding attribute was otherwise missing. Until version 2.8, it also
removed attributes on write whose values matched their schema defaults. The
resulting files would appear to the casual viewer to be missing data, and may in
fact be invalid against their schemas if any of the removed attributes were required.
Implementations must not remove attributes with default values from the tree.
Beginning with ASDF specification 1.6.0, implementations also must not fill default values
directly from the schema. This will avoid ambiguity when multiple schema defaults
are present, and also permit the default attribute to contain a description
that is not appropriate to use as a literal default value. For example:
default: An array of zeros matching the dimensions of the data array.
For ASDF specification < 1.6.0, filling default values from the schema is required. This is necessary to support files written by older versions of the Python implementation.
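The version-dependent rule can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation: the function names are hypothetical, version strings are assumed to be three dot-separated integers, and only top-level properties with a literal default are considered.

```python
def parse_version(text):
    """Turn a version string such as '1.6.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def must_fill_defaults(standard_version):
    """Defaults are filled only for ASDF specification versions below 1.6.0."""
    return parse_version(standard_version) < (1, 6, 0)


def fill_defaults(tree, properties, standard_version):
    """Return a copy of tree with missing attributes filled where required."""
    filled = dict(tree)
    if not must_fill_defaults(standard_version):
        return filled  # 1.6.0 and later: never fill from the schema
    for key, subschema in properties.items():
        if key not in filled and "default" in subschema:
            filled[key] = subschema["default"]
    return filled


properties = {"exposure_time_units": {"default": "s"}}
print(fill_defaults({}, properties, "1.5.0"))  # {'exposure_time_units': 's'}
print(fill_defaults({}, properties, "1.6.0"))  # {}
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating "1.10.0" as older than "1.6.0".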
Footnotes