2018-03-01 Meeting notes

Date

01 Mar 2018

Attendees

chard
Jim
Ray
Craig
Larry
Alainna
Todd (CC)
Lee (CC)
Sandra

Goals

Discussion items

Item: FRDR Presentation

Who: Lee and Todd

Notes:

Collaboration between Compute Canada and the Canadian Association of Research Libraries.

- Compute Canada hosts the hardware and provides tech support

- Service side operated by Portage (e.g., curation)

FRDR

Platform for digital research data management and discovery of Canadian research data

Main areas: discovery, deposit, preservation

Discovery

- Harvester that harvests 31 Canadian repositories (125K datasets)

- Supports: OAI-PMH, CKAN, CSW, MarkLogic

- Often from institutional Dataverse instances and domain-specific repos

- Identified 200 repositories; prioritizing which to harvest

- Aim to augment not replace existing repos

- Also link to private data, but show that it's private

- Search interface developed at UBC
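
For reference, a minimal sketch of what one OAI-PMH harvest step might look like. This is not FRDR's actual harvester: the base URL, sample record, and function names are made up for illustration. It builds a ListRecords request and pulls identifiers and Dublin Core titles out of a response using only the Python standard library.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: send it without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        identifier = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [e.text for e in rec.iter(f"{{{DC_NS}}}title")]
        records.append({"identifier": identifier, "titles": titles})
    # A non-empty token means there are more pages to fetch.
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return records, token


# Hypothetical response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_records(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

A real harvester would fetch each URL, loop until no resumption token comes back, and handle the CKAN, CSW, and MarkLogic interfaces separately.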

Metadata

- Use Dublin Core/DataCite standards

- Mapping subject-specific metadata is one of the challenges

Deposit repository

Don't want to replace existing repositories, but provide a place for people who don't have a local or domain-specific option
Globus-based, allowing support for very large datasets
Storage is federated
- Currently all in the Compute Canada cloud
- Also allows institutions to bring their own storage

Download via HTTPS and Globus
Supports issuing of DOIs (via DataCite)

Preservation

- Archivematica integration

- Converts file formats into preservation formats

- Working to develop a registry of type mappings
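
As a rough illustration of what a registry of type mappings could look like, here is a minimal sketch. The entries and names are hypothetical placeholders, not Archivematica's or FRDR's actual preservation policy. It is just a lookup from source file extension to target preservation extension.

```python
from pathlib import PurePosixPath

# Hypothetical registry: source extension -> preservation extension.
# These pairs are placeholders for illustration only.
PRESERVATION_FORMATS = {
    ".doc": ".pdf",
    ".jpg": ".tif",
    ".wav": ".wav",  # already treated as a preservation format in this sketch
}


def preservation_name(filename):
    """Return the preservation-copy filename, or None if the type is unmapped."""
    path = PurePosixPath(filename)
    target = PRESERVATION_FORMATS.get(path.suffix.lower())
    if target is None:
        return None
    return str(path.with_suffix(target))
```

Returning None for unmapped types makes it easy to list which formats still need a registry entry.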

Current status: Limited production

Anyone can download and search
Deposit is limited to a few research groups

Future directions

Developing tools for supporting active research data
Near the point of collection, while the data is changing a lot
Need a space for sharing data, applying metadata and file organization earlier, etc.
So that by the time the data is to be published, it's ready

Architecture

Globus search platform
Harvesting themselves and other repositories
Globus publication system (open source code deployed)
- Reliant on Globus data transfer

Software citation/publication

Broad definition of data, so could preserve/publish a software repository this way
No virtual environments, so no way to run the software

Main efforts of development

Discovery UI
Publication repository
HTTPS
Preservation
Most effort on the deposit (40%)

Adding a new repository is not too hard. If they expose standard interfaces then it's easy. OAI often only has simple Dublin Core metadata, so only the other interfaces provide much information.

From most Dataverse repos, only getting Dublin Core.

Most records come from CKAN and other government repositories that expose more metadata

Is there a format for the crosswalk?

Document defined by the library community
6-month effort from 8 librarians to map metadata schemas
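
To make the crosswalk idea concrete, a minimal sketch follows. The source field names and the mapping are hypothetical, and the real crosswalk is the document defined by the library community. It renames source fields to Dublin Core elements and reports anything that has no mapping.

```python
# Hypothetical crosswalk: source field -> Dublin Core element.
CROSSWALK = {
    "name": "title",
    "author": "creator",
    "notes": "description",
    "tags": "subject",
}


def to_dublin_core(record, crosswalk=CROSSWALK):
    """Map a source record to Dublin Core; return (dc_record, unmapped_fields)."""
    dc = {}
    unmapped = []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            # Keep track of fields the crosswalk does not cover yet.
            unmapped.append(field)
            continue
        # Dublin Core elements are repeatable, so store every value in a list.
        values = value if isinstance(value, list) else [value]
        dc.setdefault(target, []).extend(values)
    return dc, unmapped


dc, unmapped = to_dublin_core(
    {"name": "Lake survey", "tags": ["water", "ice"], "bbox": "45,-75,46,-74"}
)
```

The unmapped list is where the subject-specific metadata shows up, which is the hard part noted above.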

2 year effort

3 close to full time, developers, managers, and other contributors
Many expert groups in Portage are helping

Archivematica as a service

Current approach is not automated; plan is to automate it in future
Not too much effort to set up in basic model
Had to create a farm of 4 servers, plus a queue listening for events in Globus; every time something is published it starts a job to perform the archival
Much configuration needed to get necessary performance (e.g., turn off virus scanner, etc.)
Investigating support for A/V formats
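
The farm-plus-queue setup described above can be sketched roughly as follows. This is an in-process stand-in using Python threads rather than 4 real servers, and the `archive` callback and event values are hypothetical. It does not use the actual Globus or Archivematica APIs.

```python
import queue
import threading


def run_archival_farm(events, archive, workers=4):
    """Process publish events with a small pool of workers; return the outcomes."""
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            event = jobs.get()
            if event is None:
                # Sentinel: no more work for this worker.
                break
            try:
                outcome = archive(event)
            except Exception as exc:
                # Record the failure instead of killing the worker.
                outcome = f"failed: {event}: {exc}"
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    # Each publish event becomes one archival job.
    for event in events:
        jobs.put(event)
    # One sentinel per worker so every thread shuts down.
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    return results


outcomes = run_archival_farm(["ds-1", "ds-2", "ds-3"], lambda e: f"archived {e}")
```

Outcomes arrive in completion order, not submission order, so anything downstream should not rely on ordering.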

Resource level

Fairly small VMs running much of this
Storage 50TB and likely increasing soon
Archivematica nodes are bigger to process AIPs

Feedback

Positive feedback; hits all use cases that have been given
Other technology still has its place, and this will work with it
- E.g., Dataverse for institutional repositories

Repeating in the US

Code is available and open source
so technically it wouldn't be a problem to repeat

Active data

Big problem upcoming is internal discoverability related to research data management
Version control
Looking at SEAD/Clowder type approach, OSF has interesting functionality, HubZero, GitLab

----

National Data Service Consortium
