Goal:

Given any persistent identifier for a data publication, be able to retrieve all of the available metadata for it without customization for the identifier scheme or source.

...

Common Convention Options:

1) HTTP Accept Header:

Summary: Given an identifier, resolve it using the scheme's existing resolution mechanism, then use an "Accept: application/ld+json" header to retrieve the metadata.

...

To access metadata given a persistent identifier, clients should resolve the identifier to a landing page URL via the resolver mechanism specific to that identifier type (e.g. doi.org for DOIs), and then request that URL with an "application/ld+json" Accept header. The client should be prepared to follow a 303 (See Other) redirect response from the server.
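The client-side steps can be sketched with only the Python standard library. This is a minimal sketch under stated assumptions, not a reference client: the `scheme:value` input syntax, the `RESOLVERS` table (only doi.org comes from this proposal; the Handle resolver entry is illustrative), and the helper names `build_metadata_request` and `fetch_metadata` are all hypothetical. It requests `application/ld+json`, the registered JSON-LD media type.

```python
import json
import urllib.request

# Hypothetical scheme -> resolver table. doi.org is named in the proposal;
# the Handle entry is only an illustration of adding another scheme.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}

# Registered media type for JSON-LD.
JSON_LD = "application/ld+json"


def build_metadata_request(pid: str) -> urllib.request.Request:
    """Turn 'scheme:value' (e.g. 'doi:10.1234/example') into a GET request
    against that scheme's resolver, asking for JSON-LD."""
    scheme, sep, value = pid.partition(":")
    if not sep or scheme.lower() not in RESOLVERS:
        raise ValueError(f"unsupported identifier: {pid!r}")
    url = RESOLVERS[scheme.lower()] + value
    return urllib.request.Request(url, headers={"Accept": JSON_LD})


def fetch_metadata(pid: str, timeout: float = 30.0) -> dict:
    """Resolve the identifier and return the parsed JSON-LD document.

    urlopen's default redirect handler follows 301, 302, 303 and 307
    responses and copies the Accept header onto the redirected request,
    so resolver redirects and the server's 303 are handled transparently."""
    request = build_metadata_request(pid)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get_content_type()
        # Treat anything other than JSON-LD as a failed negotiation.
        if content_type != JSON_LD:
            raise ValueError(f"expected {JSON_LD}, got {content_type}")
        return json.loads(response.read())
```

Relying on urllib's built-in redirect handling keeps the client small; a production client might additionally accept `application/json` as a fallback or retry on transient network errors.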

Justification: Group discussion to date has indicated no strong preference regarding the exact mechanism selected, e.g. between Accept headers and URL naming conventions, particularly given the low anticipated cost of adding either mechanism over existing interfaces. Following the Cool URIs best practices, using Accept headers appears more consistent with other semantic web applications and, because servers can answer with a 303 redirect, still allows applications to expose standard or software-specific metadata URLs.
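On the server side, one landing-page URL can serve HTML to browsers and answer JSON-LD requests with a 303 redirect. The sketch below reduces that decision to a pure function so it can be dropped into any web framework. The `/metadata` suffix for the metadata URL and the function names are hypothetical, and the Accept parsing is deliberately simplified: it does not rank competing media types by q value, and it ignores wildcards so that browsers sending `*/*` still get HTML.

```python
from typing import Optional

JSON_LD = "application/ld+json"


def accepts_json_ld(accept_header: Optional[str]) -> bool:
    """True if application/ld+json is listed with a non-zero q value.

    Simplified parser: wildcards are ignored and other types' q values
    are not compared against JSON-LD's."""
    if not accept_header:
        return False
    for item in accept_header.split(","):
        media_type, *params = [p.strip() for p in item.split(";")]
        if media_type.lower() != JSON_LD:
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def negotiate(landing_path: str, accept_header: Optional[str]) -> tuple[int, dict[str, str]]:
    """Decide how a landing-page URL answers a GET.

    Returns (status, headers): a 303 See Other to a separate metadata URL
    for JSON-LD requests, otherwise a 200 with the HTML landing page."""
    if accepts_json_ld(accept_header):
        # Hypothetical layout: metadata lives at '<landing path>/metadata'.
        location = landing_path.rstrip("/") + "/metadata"
        return 303, {"Location": location, "Vary": "Accept"}
    return 200, {"Content-Type": "text/html; charset=utf-8", "Vary": "Accept"}
```

Because the 303 target is an ordinary URL, a repository can point it at whatever standard or software-specific metadata endpoint it already has; the `Vary: Accept` header tells caches that the response depends on the Accept header.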

Agreement: 

Issues/Concerns:

2) Metadata Format:

With the expectation that some groups implementing this specification will work to harmonize their data models and metadata choices, the core specification defines only a means to distinguish three types of entities with respect to metadata: the metadata document, the publication, and the component(s) of the publication. Using JSON-LD formatting, metadata providers may use any vocabulary (or vocabularies) to describe these three types of entities in the JSON-LD document they return. We propose using the OAI Object Reuse and Exchange (OAI-ORE) vocabulary to identify and relate these three entities, and adopting the JSON-LD serialization of OAI-ORE. This means:

  • The returned JSON-LD document contains one top-level JSON object that represents the document itself and is of ORE type "ResourceMap"
  • This JSON object has an ORE "describes" relationship to an embedded JSON object of ORE type "Aggregation"
  • The Aggregation JSON object has an ORE "aggregates" relationship to a JSON array of one or more JSON objects of ORE type "AggregatedResource"

Each of these entities (the ResourceMap, the Aggregation, and the one or more AggregatedResources) may carry arbitrary metadata, serialized as JSON-LD within the basic structure outlined above.
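The three-level structure can be assembled as a plain Python dictionary and serialized with `json`. This sketch assumes compact `ore:` and `dcterms:` prefixes declared in an inline `@context` (the IRIs are the published OAI-ORE and DCMI terms namespaces); the helper names `with_type` and `build_resource_map`, and all identifiers and metadata values, are placeholders rather than part of the proposal. Note that each level carries its own `dcterms:creator`, which is exactly the kind of metadata the structure lets a reader attribute correctly.

```python
import json

# Compact prefixes for the sketch: the published OAI-ORE and DCMI terms namespaces.
CONTEXT = {
    "ore": "http://www.openarchives.org/ore/terms/",
    "dcterms": "http://purl.org/dc/terms/",
}


def with_type(node: dict, ore_type: str) -> dict:
    """Copy `node`, putting `ore_type` first in its @type list and keeping any
    other types it already has (e.g. a domain-specific class)."""
    existing = node.get("@type", [])
    others = existing if isinstance(existing, list) else [existing]
    return {**node, "@type": [ore_type] + [t for t in others if t != ore_type]}


def build_resource_map(map_node: dict, aggregation_node: dict, resource_nodes: list) -> dict:
    """Nest ResourceMap -> describes -> Aggregation -> aggregates -> [resources]."""
    if not resource_nodes:
        raise ValueError("an Aggregation must aggregate at least one resource")
    aggregation = with_type(aggregation_node, "ore:Aggregation")
    aggregation["ore:aggregates"] = [
        with_type(r, "ore:AggregatedResource") for r in resource_nodes
    ]
    resource_map = with_type(map_node, "ore:ResourceMap")
    resource_map["ore:describes"] = aggregation
    return {"@context": CONTEXT, **resource_map}


if __name__ == "__main__":
    # Placeholder identifiers and values, for illustration only.
    doc = build_resource_map(
        {"@id": "https://example.org/pub/123/metadata",
         "dcterms:creator": "Example Repository"},
        {"@id": "https://doi.org/10.1234/example",
         "dcterms:title": "Example data publication",
         "dcterms:creator": "A. Researcher"},
        [{"@id": "https://example.org/pub/123/data.csv",
          "dcterms:format": "text/csv"}],
    )
    print(json.dumps(doc, indent=2))
```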

Justification:

While current practices across the NDS consortium and across the range of systems considered in RDA groups differ significantly in the structure of their data publications, the nature of the resources comprising a publication, and the types of descriptive information about them, there appears to be consensus that publications from different sources comprise one or more items, and that these are distinct from any document(s) that represent them. It thus makes sense for the core specification to include a means of identifying these entities and of navigating through a returned metadata document to discover them, independent of whether further agreement on the data model or metadata vocabularies is possible.

As an example of the value of this, consider a returned document that includes metadata identifying a creator, title, creation date, or similar concept. Standardizing how the document, the publication, and its included resources are distinguished in the metadata document lets a reader determine whether a given creator or creation date refers to the document, the publication, or a specific resource within it, all of which could differ.

As with the choice of HTTP Accept headers, no new mechanism appears to be needed: the OAI-ORE specification addresses these issues and has a defined JSON-LD serialization, and although it includes more than the types and relationships listed above, it does not require the use of any additional concepts. Further, OAI-ORE can coexist with other type systems, e.g. an AggregatedResource may also have the type myschema:file, prov:Entity, or someDBorg:query, which would indicate model or metadata constraints to anyone who knows those types.
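A client can use the fixed structure to tell which entity a given property describes, and to pick out resources that also carry a domain-specific type. This sketch assumes the document is already in the compact, prefixed form with `ore:` keys (in practice a JSON-LD processor would first compact it against a known context); `locate_property` and `resources_of_type` are hypothetical helper names, and `prov:Entity` merely stands in for any additional type.

```python
from typing import Any, Iterator, Optional


def _as_list(value: Any) -> list:
    # JSON-LD allows a single value wherever an array is allowed.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _types(node: dict) -> list:
    return _as_list(node.get("@type"))


def locate_property(doc: dict, prop: str) -> Iterator[tuple[str, Optional[str], Any]]:
    """Yield (role, @id, value) for each ORE entity carrying `prop`.

    role is 'document' (the ResourceMap), 'publication' (the Aggregation)
    or 'resource' (an AggregatedResource)."""
    if "ore:ResourceMap" not in _types(doc):
        raise ValueError("top-level object is not an ore:ResourceMap")
    if prop in doc:
        yield "document", doc.get("@id"), doc[prop]
    aggregation = doc.get("ore:describes") or {}
    if prop in aggregation:
        yield "publication", aggregation.get("@id"), aggregation[prop]
    for resource in _as_list(aggregation.get("ore:aggregates")):
        if prop in resource:
            yield "resource", resource.get("@id"), resource[prop]


def resources_of_type(doc: dict, extra_type: str) -> list:
    """Return the AggregatedResources that also carry `extra_type`."""
    aggregation = doc.get("ore:describes") or {}
    return [
        r for r in _as_list(aggregation.get("ore:aggregates"))
        if extra_type in _types(r)
    ]
```

For a document whose ResourceMap, Aggregation, and one AggregatedResource each carry a `dcterms:creator`, `locate_property(doc, "dcterms:creator")` yields three tuples labelled `document`, `publication`, and `resource`, so the three creators are never conflated.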

A valid question in deciding on a format for returned metadata (beyond the use of JSON-LD as a syntax) is whether further consensus on the data model and/or a minimal vocabulary can be reached. This is hard to answer without a concrete counterproposal, but the discussion involved in putting this proposal together noted that we do not have consensus across systems on whether data models are flat, hierarchical, hierarchical with the potential for multiple parents, or graph-oriented (e.g. a provenance graph or, in a more complex example, a graph combining provenance relationships with associations to instruments and samples, or with spatial relationships), nor on whether published resources are file-like (represented by a single stream of bytes), represent a query or workflow over a base resource, or represent live objects (e.g. streaming sensor data that differs when retrieved at different times).

Similarly, current practices involve the use of multiple vocabularies. While it seems possible that we could agree on a minimal model and minimal metadata (e.g. requiring that all models define a default hierarchy and that all resources have basic title, creator, creation date, and type metadata in some vocabulary such as Dublin Core), such agreements might take protracted discussion and result in more work for metadata providers without a clear benefit to users. (The alternative here is not to provide no guidance, but to make such extensions optional, so that, for example, hierarchical collections can be presented in a standard way while graph-oriented publications remain visible and do not have to be mapped into an artificial hierarchy unless doing so makes sense and is motivated by user interest.)

Agreement:

Issues/Concerns:

One potential issue with OAI-ORE is that, although it includes the notion of hierarchy (Aggregations may aggregate other Aggregations), each Aggregation must be described in its own ResourceMap. This means that a publication consisting of nested folders and files cannot be fully described in a single returned ResourceMap document if the ORE "aggregates" relationship is used to define the hierarchy. One can accept that limitation (as in current DataONE practice) or use an alternate term to define the internal structure of a publication (e.g. SEAD's current practice of using dcterms:hasPart relationships to define structure within a flat array of ORE-aggregated folders and files), so this is not unworkable. Nevertheless, it does embed one (optional) model for structuring data in the core specification, which could cause confusion.
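The hasPart-based alternative can be illustrated by rebuilding a folder tree from the flat `ore:aggregates` array of a single Aggregation. This is a sketch under assumptions, not SEAD's implementation: `dcterms:hasPart` values are taken to be either id strings or `{"@id": ...}` objects, any item not claimed as a part of another item is treated as top-level, and `folder_tree` is a hypothetical name.

```python
from typing import Any


def _as_list(value: Any) -> list:
    # JSON-LD allows a single value wherever an array is allowed.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _ref(value: Any) -> str:
    # A hasPart value may be a plain id string or a {"@id": ...} node reference.
    return value["@id"] if isinstance(value, dict) else value


def folder_tree(aggregation: dict) -> dict:
    """Nest the flat ore:aggregates array using dcterms:hasPart links.

    Returns {"@id": ..., "children": [...]} nodes rooted at the Aggregation;
    items that no other item lists as a part become its direct children."""
    resources = {r["@id"]: r for r in _as_list(aggregation.get("ore:aggregates"))}
    children = {
        rid: [_ref(p) for p in _as_list(r.get("dcterms:hasPart"))]
        for rid, r in resources.items()
    }
    claimed = {c for kids in children.values() for c in kids}
    roots = [rid for rid in resources if rid not in claimed]

    def nest(rid: str, ancestors: frozenset) -> dict:
        # Guard against hasPart cycles, which would otherwise recurse forever.
        if rid in ancestors:
            raise ValueError(f"hasPart cycle through {rid}")
        kids = [nest(c, ancestors | {rid}) for c in children.get(rid, [])]
        return {"@id": rid, "children": kids}

    return {"@id": aggregation.get("@id"),
            "children": [nest(r, frozenset()) for r in roots]}
```

Keeping every folder and file in one flat `ore:aggregates` array means the whole publication still fits in a single ResourceMap, while the nesting is carried entirely by `dcterms:hasPart`.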