Draft Language for discussion:

(Jim, per TAC discussion on 7/14: reworked this document to describe a TAC task force on interoperability that will interact with, coordinate, and guide pilot projects tackling the types of deliverables already outlined here.)

Purpose:

To support the NDS Vision of a national capability for research data to be broadly accessible across disciplines and across repositories, services, and applications, and to help define an effective level of interoperability between data services participating in NDS Share, the NDS Technical Advisory Committee will launch a task force consisting of self-selecting TAC members and, if and where needed, invited external experts. The Task Force members will work to promote interoperability within NDS and will work to encourage interoperability as a goal within NDS pilot projects, to promote the development of pilot projects specifically focusing on interoperability, and to track and encourage coordination between pilot projects and NDS Share resource and service providers.

...

  • Developing an online list of interoperability-related pilot projects and references to their interoperability-related resources and accomplishments
  • Gathering information from NDS resource and service providers regarding their current mechanisms for import/export/exchange of data publications and their interest/involvement in interoperability pilot projects
  • Tracking (in collaboration with pilots) relevant external interoperability efforts (e.g. the proposed RDA Interoperability WG)
  • Reporting to the TAC and conveying TAC feedback to NDS Share providers, pilot projects, and the NDS Membership
  • Reporting at NDS National Meetings and helping to organize cross-group interoperability sessions at such meetings
  • Alerting the TAC when one or more options exist that could be recommended as baseline interoperability requirements for NDS Share resources

 

Interoperability-related Pilot Projects:

The TAC is currently reviewing two pilot project proposals, at least one of which appears to be interoperability-related. The 'pre-proposal' below would be an additional, complementary effort that would be open to any/all NDS participants and would focus specifically on the development of a data publication exchange mechanism (API or serialization format) intentionally consistent with the NDS Share interest in assuring that offered resources provide a basic level of interoperability.

 

Public Pilot Project Pre-proposal: (formatted to match the current Pilot Project Request Form (http://goo.gl/forms/uObA1cDJIUE02gqz2))

Project Title:  Open NDS Interoperability Pilot 

Contact person for the NDS Pilot submission (First and last name): James Myers

Email address for the contact person: myersjd@umich.edu

...

TBD (please volunteer (edit now)!)

Why is this project pertinent?

Making a broad range of published data accessible across the ecosystem of tools and services is integral to the NDS Vision. This pilot project, open to all members of the NDS community, intends to reach a consensus on a standard mechanism(s) for any application or service to access the full set of data and metadata comprising a data publication, regardless of source, and to work individually and collectively across the NDS Consortium to implement those interfaces within existing tools. Further, the pilot proposes the development of a suite of test data and tools to assess conformance to the agreed standards.

These mechanisms would be an important initial step in enabling data to flow seamlessly across tools and services in a National Data Service ecosystem. Our group anticipates that the proposed work will enable a broad range of further efforts, from assessments of the variations in data types, formats, size, purpose, and metadata vocabularies across disciplines and tools, to efforts that would define minimal metadata standards enabling interoperability at the deeper levels required for functionality such as federated faceted search or data fusion. Thus we anticipate direct and indirect benefits for NDS.

What community does this project serve?

The project's direct community is NDS itself. At the NDS 5 meeting, a significant number of individuals (10+), representing a broad range of NSF DataNet, DIBBS, and other projects, expressed interest in participating in an interoperability effort in NDS that this pilot would address. These projects span a wide range of disciplines and include projects that provide general, community-agnostic services.

What is the size of the community?

The pilot project is intended to be open. The active communities using tools or services provided by the initial project membership are expected to represent hundreds to thousands of users.

Can you describe the use case or use cases?

The family of use cases targeted by this group would include those in which a researcher, through some means, has identified a potentially relevant data publication, identified by a persistent identifier of some type, and wishes to inspect its contents and, if the data is indeed relevant, retrieve all or some of the sub-components of the publication for further processing in other inter-operable tools or services. The tools may support inspection, visualization, analysis, modeling, analytics, publication, or other actions relevant to scientific research. The focus of the project would be to define mechanisms to 'resolve' an identifier, regardless of type, to discover a common API or export format from which it would then be possible to identify and retrieve the components of the data publication and the metadata associated with the publication and with specific components. The degree to which the components and metadata retrieved can be interpreted, while important for deep interoperability, is not a focus of the pilot beyond potential tasks to document current/best practices. (A quick example: a researcher seeks a GeoTIFF file for display in a mapping application from within an identified data publication. The result of this pilot would enable the researcher to find the components within the publication and look for metadata, such as Dublin Core title, description, and format information, which, if supplied, would let them identify the correct file to display. This group may also discover that most/all participants supply a Dublin Core title or RDFS label as a short, human-readable name, and thus allow tools to automatically display a useful name regardless of which of the two is supplied.
If future work standardized and mandated specific minimal metadata, displaying a name could be fully automated and processing would be simplified. The ability to find components within a publication, to identify the metadata for that component, and to identify the vocabulary and value provided, all of which would be addressed by this pilot, would support any of these scenarios and would provide value regardless of whether further consensus could be achieved across NDS.)
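
As a rough illustration of the GeoTIFF example above, the sketch below shows how a tool might pick a display name and locate a component by format once a publication's components and metadata have been retrieved. The dictionary layout (`components`, `metadata`, `id`) and the sample publication are hypothetical, since the pilot has not defined an API or format; only the Dublin Core and RDFS property URIs are existing vocabulary terms, and the `image/tiff` format value is an assumption about what a provider might supply.

```python
# Existing vocabulary terms (Dublin Core Terms and RDF Schema).
DC_TITLE = "http://purl.org/dc/terms/title"
DC_FORMAT = "http://purl.org/dc/terms/format"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Properties tried, in order, when looking for a short human-readable name.
NAME_KEYS = (DC_TITLE, RDFS_LABEL)


def display_name(component):
    """Return the first available name, falling back to the component id."""
    metadata = component.get("metadata", {})
    for key in NAME_KEYS:
        value = metadata.get(key)
        if value:
            return value
    return component["id"]


def find_by_format(publication, media_type):
    """Return the components whose format metadata equals media_type."""
    return [
        component
        for component in publication.get("components", [])
        if component.get("metadata", {}).get(DC_FORMAT) == media_type
    ]


# Hypothetical publication, shaped the way a common export might present it.
publication = {
    "id": "example-publication",
    "components": [
        {
            "id": "data/elevation.tif",
            "metadata": {DC_TITLE: "Elevation raster", DC_FORMAT: "image/tiff"},
        },
        {
            "id": "data/readme.txt",
            "metadata": {RDFS_LABEL: "Read me", DC_FORMAT: "text/plain"},
        },
        {"id": "data/notes.bin", "metadata": {}},
    ],
}

for component in find_by_format(publication, "image/tiff"):
    print(component["id"], "->", display_name(component))
```

The fallback to the component id reflects the point made above: the tool remains useful when neither name property is supplied, and it simply improves when one is.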

What is the data need being addressed?

The use cases addressed would support a variety of needs including

    • the types of data transfer between tools described in the NDS Vision and vision video,
    • the need within NDS Share to be able to support continuing data publication and use as individual software components and compute/storage resources join or leave (i.e. the proposed work would simplify transferring data between resources if/when needed), 
    • the expressed desires of NDS participants to be able to make their tools more inter-operable without having to make bespoke agreements between individual projects, and
    • the interest of researchers in being able to combine multiple services provided by NDS participants into their overall research workflows.

Which existing tools/platforms/content sources do you aim to use? What part of these can you supply, and what do you need from the NDS?

The pilot participants will supply software, services, and test data developed through their ongoing projects. Given the range of DataNet, DIBBs, repository, and other projects participating in NDS, we anticipate considerable variation in the size/scope/architecture of individual tools. We also anticipate that participants in the pilot project will be developing extensions to their existing products and/or separate tools for generating, inspecting, and parsing information on data publications provided through the API(s) or export format(s) developed during the pilot. 

We anticipate the use of NDS Labs resources as a place to run some of the participants' applications/services, to share tools developed through the group's work, to store suites of test data and the results of interoperability tests, etc. At present, we anticipate that participants would primarily want to run one or a few related applications during their testing and to be able to read and write to a shared test data suite (which may consist of exported/serialized data publications or a service providing an API over test datasets). We anticipate that this use would primarily consist of dockerized components that could be spun up as needed for development and testing rather than large and/or long-lived service instances that would represent an ongoing use of resources. As a starting point, we would anticipate that persistent storage of less than 1 TB would be sufficient for the pilot. (Large, real-world data publications that would be of interest to the pilot are expected to be hosted in external repositories and/or in NDS Share, i.e. based on separate arrangements made between the creators of the publication and NDS.) We would request that NDS Labs provide login credentials and a minimal capability to launch and run applications to pilot participants upon request. We anticipate that some groups may be interested in specific dockerized containers, but the pilot as a whole does not have any request for support of any specific components and expects to rely on the base capability to create and launch containers and data volumes. Best-effort user support for using NDS Labs would be valuable.

The pilot group will also be producing group documentation and could potentially leverage the NDS Wiki if that could be made accessible to group members.

Describe what you propose to do in terms of interfaces, architectures, and other technological/design aspect for both existing/to be created components.

The specific API and/or export formats produced by the group and the details of test data suites and stand-alone software tools and libraries will be developed through the project's activities. Initial discussions have focused on the RESTful, and often JSON- or JSON-LD-based, interfaces that are common to many modern data applications/services and on the combination of BagIt and OAI-ORE standards for serializing and structuring data and metadata that are likewise used by multiple NDS participants. We thus anticipate that these are likely to be among the proposed directions for the pilot's efforts. (Note: The technologies mentioned are useful but not sufficient for creating the level of interoperability proposed.) The pilot does intend to focus tightly on the idea of an exchange API and/or serialization format rather than the architecture within given software components or the specifics of any service that may support the API and/or format.
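
To make the BagIt plus OAI-ORE direction concrete, here is a minimal sketch, not a proposed format, of serializing a small publication as a bag with an ORE resource map in JSON-LD, along with a checksum check of the kind a conformance tool might run. The `bagit.txt` declaration, `data/` payload directory, and `manifest-sha256.txt` follow the BagIt specification, and the `ore:` terms are from the OAI-ORE vocabulary. The file name `oremap.jsonld`, the `#aggregation` identifier, and the function names are assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path

ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"


def write_bag(bag_dir, files, title):
    """Write `files` (flat file name -> bytes) as a bag with an ORE map."""
    bag_dir = Path(bag_dir)
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    aggregated = []
    for name, content in sorted(files.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
        aggregated.append({"@id": f"data/{name}"})

    # Bag declaration and payload manifest, as required by BagIt.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )

    # ORE resource map stored as a tag file; its name is an assumption.
    resource_map = {
        "@context": {"ore": ORE, "dcterms": DCTERMS},
        "@id": "oremap.jsonld",
        "@type": "ore:ResourceMap",
        "ore:describes": {
            "@id": "#aggregation",
            "@type": "ore:Aggregation",
            "dcterms:title": title,
            "ore:aggregates": aggregated,
        },
    }
    (bag_dir / "oremap.jsonld").write_text(
        json.dumps(resource_map, indent=2), encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir):
    """Return payload paths that are missing or fail their checksum."""
    bag_dir = Path(bag_dir)
    problems = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(rel_path)
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            problems.append(rel_path)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = write_bag(Path(tmp) / "bag", {"obs.csv": b"t,v\n0,1\n"}, "Demo")
        print("problems:", verify_bag(bag))
```

A real exchange format would also need per-component metadata and an agreed way to reference the publication's persistent identifier, which is the kind of detail the pilot would have to settle.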

Are there other projects that are trying to provide a similar service solution? If so, what are these projects and how is this one different?

We are aware of a working group within RDA (https://rd-alliance.org/group/research-data-repository-interoperability-wg/case-statement/research-data-repository), as well as a wide range of past and current discussions on data interoperability within, for example, EarthCube. There are also specific relevant tools, from identifier minting and discovery services to RDA products (e.g. type registry) and vocabulary mapping services (e.g. the EarthCube Geosemantics service). We anticipate that the primary differences in the proposed pilot are its focus on the mechanics of getting/putting data and metadata to read/create data publications, rather than debating minimal metadata standards, and its emphasis on working code that will support an initial level of interoperability across the software and services of NDS participants.

We also anticipate that bespoke interoperability efforts between different DataNet/DIBBS/other projects, being pursued as part of those projects' existing plans, would provide relevant guidance for this effort and could potentially be leveraged in support of the pilot's efforts. There has also been work at a previous NDS Hackathon to demonstrate a 'universal resolver' that could provide data publication metadata in a common JSON format across different identifier schemes; at a minimum, this helps demonstrate the concept.
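
The resolver idea can be sketched as a first step that maps an identifier, whatever its scheme, to a resolution URL and wraps the result in one common record. This is not the Hackathon code, which is not reproduced here; the record fields and function names are hypothetical, and the sample identifiers are made up. The resolver base URLs are the public DOI, Handle, and N2T (ARK) resolvers.

```python
# Public resolver services for three common identifier schemes.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/ark:/",
}


def resolution_url(identifier):
    """Map a 'scheme:value' identifier (or a plain URL) to a resolvable URL."""
    identifier = identifier.strip()
    if identifier.lower().startswith(("http://", "https://")):
        return identifier  # Already a URL; nothing to translate.
    scheme, sep, value = identifier.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in RESOLVERS:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    # ARKs are often written 'ark:/NAAN/name'; drop the leading slash
    # because the resolver prefix above already ends with one.
    return RESOLVERS[scheme] + value.lstrip("/")


def describe(identifier):
    """Return a hypothetical common record for any supported identifier."""
    return {"identifier": identifier, "resolver_url": resolution_url(identifier)}


if __name__ == "__main__":
    # Made-up identifiers, used only to show the mapping.
    for pid in ("doi:10.5072/example", "hdl:2027.42/0000", "ark:/12345/x6abc"):
        print(describe(pid))
```

A fuller resolver would go on to fetch the landing resource and negotiate for the common API or export format, which is where the pilot's agreed mechanism would come in.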

If appropriate, include a review of your pertinent previous work, including content repositories, pieces of software or other tools you believe to be important to the execution of your project.

We expect that pilot participants will help develop this background material as the project proceeds. We anticipate that most participants will already have some form of API and/or export format from their prior work and that these in turn will reference underlying standards and software components.

...

Plan:

The group will be open and will solicit participation from the NDS community at large.

In the first month, the group will collect documentation of existing APIs and serialized formats in use today. The group will also document the range of options that could be pursued in defining a common mechanism. As part of this effort, the group will look at available standards and at interoperability mechanisms that have been proposed through RDA (e.g. https://rd-alliance.org/group/research-data-repository-interoperability-wg/case-statement/research-data-repository), EarthCube, or other forums.

The group will plan a series of online discussions to evaluate the options and will work toward a consensus recommendation for data interoperability mechanisms. As necessary, group leadership will focus the effort around a minimal set of options, based on a consensus across projects that are willing to implement them and an assessment of whether initial implementations can be developed within 6 months.

The project will leverage common software engineering tools (e.g. GitHub) and the cloud resources available through NDS Labs as applicable (e.g. to store test data and share prototype code and testing tools). The group does not anticipate needing staff time from NDS beyond basic support in using the NDS Labs cloud capabilities. (NDS Labs staff would be welcome to participate in the group, contributing time and effort on the same volunteer basis as other participants.)

Products:

...


...

Activities at NDS 6:

Presentations on the Task Force and its relationship to NDS Share, Labs, the TAC, and pilot projects were given at the NDS 6 tutorial day and in the main program.


Interoperability-related Pilot Projects:

The TAC ITF is currently tracking two pilot projects that are interoperability-related. The ITF would like to help these and other pilot efforts coordinate with each other and make the outcomes of their efforts that are relevant to NDS Share and the overall NDS vision more visible to the consortium. The ITF will specifically work with pilots to develop a recommendation, to be delivered through the TAC, regarding minimal requirements for participation in NDS Share.

Project Proposal Form: http://goo.gl/forms/uObA1cDJIUE02gqz2

Existing Interoperability Related Pilot(s):

Presented at NDS 6: 

Bringing Visibility to Food Security Data Results: Harvests of PRAGMA and RDA, Quan (Gabriel) Zhou, Indiana U

Proposed Interoperability Pilot(s):

Universally Accessible Data Publications Pilot