RDA Products
Disclaimer: These analyses contain personal opinions and are meant to be informative, not authoritative or complete reviews of these efforts. Further, the current state of this page is DRAFT, and additions, corrections, and additional opinions are all very welcome. As the pilot proceeds, this information may help us decide where and how to maintain connections with RDA efforts.
For those looking for a shortcut - the groups are listed alphabetically and two of the most closely connected groups appear to be the PID Information Types and Research Data Repository Interoperability WGs.
Data Citation WG:
Product:
Recommends practices to support citation of data subsets - timestamp and version the data itself and generate persistent identifiers for stored, normalized queries of the data
Initial Analysis:
While the language is database-centric (versus file-centric), the recommendations and good practices are general (e.g. considering a path within a hierarchical file collection as a query) and could be applicable to this pilot if/when we look at a common way to browse structure within a dataset. Also includes some general good practices such as providing standard citation text on the landing page that the persistent identifier resolves to.
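As a rough illustration of the recommendation above (store a normalized query together with the data version and a timestamp, and cite the identifier assigned to that stored query), here is a minimal sketch. The function names, the "example:" identifier prefix, and the whitespace-only normalization are all assumptions made for illustration; real normalization depends on the query language, and a real system would mint identifiers through a PID service.

```python
import hashlib
import re
from datetime import datetime, timezone


def normalize_query(query: str) -> str:
    """Collapse runs of whitespace so trivially different queries compare equal."""
    return re.sub(r"\s+", " ", query.strip())


def subset_identifier(query: str, data_version: str) -> str:
    """Derive a stable identifier from the data version and the normalized query."""
    text = f"{data_version}\n{normalize_query(query)}"
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # "example:" is a made-up prefix, not a real PID scheme.
    return f"example:{digest[:16]}"


def cite_subset(query: str, data_version: str, store: dict) -> dict:
    """Store the query once and return the record a citation would resolve to."""
    pid = subset_identifier(query, data_version)
    if pid not in store:
        store[pid] = {
            "pid": pid,
            "query": normalize_query(query),
            "data_version": data_version,
            "stored_at": datetime.now(timezone.utc).isoformat(),
        }
    return store[pid]
```

The same approach would apply to a file-centric system by treating a path within a versioned file collection as the "query" string.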
Data Description Registry Interoperability (DDRI) WG:
Product:
Recommends a standard model for describing researchers, data, publications, and grants, and their relationships that can aid in discovery. Software/services exist that can harvest this information and support queries/visualization over the results.
Initial Analysis:
With the emphasis on a metadata model, this work is orthogonal to the initial goals of the pilot to standardize access, but use of the model directly or through inference (e.g. using models that can be converted to it) would help with interpreting these types of metadata/relationships. Given that the open source RD-switchboard software developed in connection with the group is able to harvest from different sources, it may be more directly relevant to standardizing access.
Data Foundation and Terminology WG:
Product:
Recommends a standard terminology for discussing digital data. The core vocabulary defines entities such as digital objects, persistent identifiers, metadata, bitstream, metadata repository, and checksum.
Initial Analysis:
The terminology was synthesized from a review of many systems and is probably a useful common language for this pilot as we talk about our individual systems.
Data Type Registries WG and WG #2:
Product:
Recommends a standard model for documenting controlled vocabularies (types) and defines a RESTful API for creating, reading, updating, deleting terms. An example would be definitions for scientific units such as temperature in degrees Celsius.
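To make the create/read/update/delete operations concrete, the following is an in-memory stand-in for a type registry. It is only a sketch: the class, its method names, the record fields, and the "example-type:" identifier prefix are hypothetical and do not reproduce the WG's actual API or data model.

```python
import uuid


class TypeRegistry:
    """In-memory stand-in for a type registry service (hypothetical, not the RDA API)."""

    def __init__(self):
        self._types = {}

    def create(self, name: str, definition: dict) -> str:
        """Register a new type and return the identifier assigned to it."""
        pid = f"example-type:{uuid.uuid4()}"  # made-up identifier scheme
        self._types[pid] = {"name": name, **definition}
        return pid

    def read(self, pid: str) -> dict:
        """Return a copy of the type record; raises KeyError if unknown."""
        return dict(self._types[pid])

    def update(self, pid: str, changes: dict) -> None:
        """Merge changed fields into an existing type record."""
        self._types[pid].update(changes)

    def delete(self, pid: str) -> None:
        """Remove a type record; raises KeyError if unknown."""
        del self._types[pid]


registry = TypeRegistry()
celsius = registry.create(
    "temperature", {"unit": "degree Celsius", "datatype": "float"}
)
```

In a RESTful service these four methods would typically correspond to POST, GET, PUT, and DELETE requests against a type's URL.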
Initial Analysis:
The group's output provides a way to define and manage types/terms with persistent identifiers through a standard interface. There is a prototype registry with example terms, but it does not appear that there are endorsed vocabularies available yet. The effort appears to be a service-oriented alternative to the practice in the RDF community of posting an RDF schema document for a vocabulary (e.g. for dcterms). For this pilot, the work is fairly orthogonal. As with the RDA groups producing specific vocabularies, though, it is potentially relevant in task 3, where we'll look for commonalities in metadata (with the pilot looking more to document de-facto standards and the capabilities enabled through their use than to select or develop something new).
Metadata Standards Directory WG:
Product:
The specific goal of the group is to provide a directory of available metadata standards to promote their adoption and help researchers discover relevant standards for their domain. The group website points at some existing directories. The group also contributed to the development of a Metadata Principles document that describes some principles, a categorization of metadata types/purposes, and the challenges and requirements for interoperability.
Initial Analysis:
The work of this group is also likely to be of interest to pilot participants. The most directly relevant aspect appears to be the requirements for interoperability, where the challenges of standardizing models are described. The principles document seems, however, to ignore the initial challenge of retrieving metadata given the different identifier schemes and metadata syntaxes used across systems. In that sense, this pilot should be complementary, addressing a 'pre-requisite': the ability to retrieve the metadata being made available regardless of source or identifier scheme. There is one area where there could be a direct connection: item 8 in the "Interoperability" section of the principles document describes documenting current usage of the various standards/their elements to provide guidance. This overlaps with the task in this pilot to document use and the capabilities that depend on/are enabled by that use.
PID Information Types WG:
Product:
This group explored the idea of creating a common API for retrieving metadata given an identifier, along with the basic syntax and meta-model for how the metadata would be formatted. The final report describes a prototype JSON API and service implementation that ties to the Data Type Registry.
Initial Analysis:
The work of this group is probably most directly aligned with the goals of this pilot effort. The idea of a common API for retrieving metadata from persistent identifier services is central to both. Despite this direct connection, there are several reasons, some noted by the WG itself, why the prototype API may not be a practical solution for the pilot, but it should definitely be considered. Some aspects the pilot might consider:
the PIT group notes two related metadata models (property-type-profile and type-profile) but does not appear to address working across PID systems with both types
would JSON-LD be a useful extension to connect simple terms with formal URIs, making the API less tied to a particular type definition approach?
while the PIT focus was on discovery metadata, it's extensible, so it could be applicable as a means to work with 'all' metadata, but that may require extensions for performance (one of the future directions items).
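The JSON-LD question in the list above can be illustrated with a small sketch: a context maps the short keys in a JSON response to formal URIs, so a client does not need to know which type definition approach produced them. The record contents are invented, the choice of Dublin Core terms is an assumption, and the function below only renames top-level keys; a real JSON-LD processor does considerably more (and, unlike this sketch, drops terms that have no mapping).

```python
import json

# A JSON-LD style context mapping short terms to formal URIs.
# The Dublin Core term URIs are real; using them here is an assumption.
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
}


def expand_terms(record: dict, context: dict) -> dict:
    """Replace known short keys with their URIs; keep unknown keys unchanged."""
    return {context.get(key, key): value for key, value in record.items()}


# An invented metadata response, standing in for what an API might return.
response = json.loads(
    '{"title": "Sample dataset", "creator": "A. Researcher", "size": 42}'
)
expanded = expand_terms(response, CONTEXT)
```

After expansion, two services that use different short names for the same property would produce records keyed by the same URI, which is the interoperability benefit the question points at.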
PID Kernel Information WG:
Product:
This group is just starting. It has the goal of creating a 'kernel' set of metadata types that could be stored in the type registry and used with the PIT API to support standardized data discovery.
Initial Analysis:
Like the other groups that are defining a particular vocabulary or metadata standard, this group's effort is orthogonal to this pilot's goal of enabling access to such metadata. Also, as with other such groups, the standard that emerges could be of interest to the pilot participants and relates to the effort to capture de-facto standards.
Research Data Repository Interoperability WG:
Product:
This group is just starting. It has the goal of establishing interoperability between data repository platforms, based on a common API and/or import/export formats, enabling, for example, machine migration of data between platforms. The draft primer discussed at the 11/2016 meeting provides a list of potential APIs and formats that the group plans to consider.
Initial Analysis:
The goals of this group are well aligned with this pilot and differ primarily in how far the effort will go in standardizing the core metadata standards and data structure, and perhaps in how much these standards are baked into the API/protocol. This is definitely a group we should interact with; at a minimum we can share analysis of candidate APIs/formats and exchange use cases/data samples. A joint effort may be a possibility. The initial interoperability options listed by this group don't include the PIT/Data Type Registry and other RDA products, though connections with other RDA groups were on their November meeting agenda.
Research Data Collections WG:
Product:
This group is developing a standard model for research collections and an API for working with them.
Initial Analysis:
The collection modeling effort is potentially relevant to the pilot group task of standardizing the ability to traverse data publication structure. Based on a quick look, it wasn't clear how this WG connects with aggregation models such as ORE or the Portland Common Data Model (in the review list for the Research Data Repository Interoperability WG), but the API appears to go further in handling mutable collections and supporting intersection/union queries.
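For readers unfamiliar with what mutable collections and intersection/union queries mean in practice, here is a minimal sketch using Python sets of member identifiers. The class, function names, and identifiers are hypothetical and are not taken from the WG's API.

```python
class Collection:
    """A mutable set of member identifiers (hypothetical model, not the WG's API)."""

    def __init__(self, name, members=()):
        self.name = name
        self.members = set(members)

    def add(self, member_id):
        """Add a member; adding an existing member has no effect."""
        self.members.add(member_id)

    def remove(self, member_id):
        """Remove a member if present."""
        self.members.discard(member_id)


def union(a, b):
    """Identifiers that appear in either collection."""
    return a.members | b.members


def intersection(a, b):
    """Identifiers that appear in both collections."""
    return a.members & b.members


# Made-up identifiers for illustration only.
field_data = Collection("field-data", ["id:1", "id:2", "id:3"])
published = Collection("published", ["id:2", "id:3", "id:4"])
published.remove("id:4")
field_data.add("id:5")
```

A static aggregation model would describe only the membership at one point in time; the add and remove operations are what make the collection mutable.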
Interest Groups:
There are a number of RDA Interest Groups that explore interoperability and sometimes serve as umbrella groups in areas such as metadata and persistent identifiers. In general, these groups don't create products at the level the working groups do, so they are not all included in the list here.
Long tail of research data IG:
Product:
This group is developing a set of good practices for institutional data management systems. They have created a matrix of desirable functionality for such systems.
Initial Analysis:
Of interest to NDS, but not directly relevant to this pilot. Interesting that the ability to export full-fidelity metadata + data records is not one of the functionalities identified in the matrix.