2018-03-01 Meeting notes

Date

Attendees

  • chard

  • Jim

  • Ray
  • Craig
  • Larry
  • Alainna
  • Todd (CC)
  • Lee (CC)
  • Sandra

Goals

Discussion items

Item: FRDR Presentation
Who: Lee and Todd
Notes:
Collaboration between Compute Canada and the Canadian Association of Research Libraries.
    - Hosted on Compute Canada hardware, with Compute Canada providing tech support
    - Service side operated by Portage (e.g., curation)
FRDR
  • Platform for digital research data management and discovery of Canadian research data
  • Main areas: discovery, deposit, preservation
Discovery
    - Harvester that harvests 31 Canadian repositories (125K datasets)
    - Supports: OAI-PMH, CKAN, CSW, MarkLogic
    - Often from institutional Dataverse instances and domain-specific repos
        - Identified 200 repositories; prioritization is under way
    - Aim to augment not replace existing repos
    - Also link to private data, but show that it's private
    - Search interface developed at UBC
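As an illustration of the harvesting step, here is a minimal sketch of pulling simple Dublin Core fields out of an OAI-PMH `ListRecords` response. It is not the FRDR harvester; the sample response, the identifier, and the function name `parse_oai_dc` are made up for this example. Only the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response; a real harvester would fetch this over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:1234</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example lake temperature dataset</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:subject>limnology</dc:subject>
          <dc:subject>temperature</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_oai_dc(xml_text):
    """Return one dict per record: its identifier plus its Dublin Core fields."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        dc_root = record.find("oai:metadata/oai_dc:dc", NS)
        if dc_root is not None:
            for element in dc_root:
                # Strip the "{namespace}" prefix to get the bare element name.
                name = element.tag.split("}", 1)[-1]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "dc": fields})
    return records


if __name__ == "__main__":
    for rec in parse_oai_dc(SAMPLE_RESPONSE):
        print(rec["identifier"], rec["dc"])
```

Repeated elements such as `dc:subject` are kept as lists, since Dublin Core allows any element to occur more than once.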
Metadata
    - Use Dublin Core/DataCite standard
    - Mapping subject-specific metadata is one of the challenges
Deposit repository
  • Don't want to replace, but provide a place for people who don't have a local or domain-specific option
  • Globus-based allowing support for very large datasets
  • Storage is federated
    • Currently all in the Compute Canada cloud
    • Also allows institutions to bring their own storage
  • Download via HTTPS and Globus
  • Supports issuing of DOIs (via DataCite)
Preservation
    - Archivematica integration
    - Converts file formats into preservation formats
    - Working to develop a registry of type mappings
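A registry of type mappings could be as simple as a lookup from source format to preservation format. The sketch below is only an assumption about what such a registry might look like. The mappings in `TYPE_MAPPINGS` are illustrative placeholders, not Archivematica's actual format policies, and `preservation_name` is a hypothetical helper.

```python
from pathlib import PurePath

# Illustrative placeholder mappings only (source extension -> preservation extension).
TYPE_MAPPINGS = {
    ".doc": ".pdf",
    ".jpg": ".tif",
    ".mp3": ".wav",
}


def preservation_name(filename, registry=TYPE_MAPPINGS):
    """Return the preservation copy's filename, or None if no mapping is registered."""
    path = PurePath(filename)
    target = registry.get(path.suffix.lower())
    if target is None:
        return None  # no rule: the caller keeps the original file as-is
    return str(path.with_suffix(target))


if __name__ == "__main__":
    for name in ["photo.JPG", "report.doc", "data.csv"]:
        print(name, "->", preservation_name(name))
```

Returning `None` for unknown types makes the gaps in the registry visible, which is useful while the mappings are still being collected.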


Current status: Limited production

  • Anyone can download and search
  • Deposit is limited to a few research groups

Future directions

  • Developing tools for supporting active research data
  • Near the point of collection, when data is changing a lot
  • Need a space for sharing data, applying metadata and file organization earlier, etc.
  • So that the data is in good shape by the time it is ready for publication

Architecture

  • Globus search platform
  • Harvesting themselves and other repositories
  • Globus publication system (open source code deployed)
    • Reliant on Globus data transfer

Software citation/publication

  • Broad definition of data, could preserve/publish repository like this
  • No virtual environments so no way to run software

Main efforts of development

  • Discovery UI
  • Publication repository
  • HTTPS
  • Preservation
  • Most effort on the deposit (40%)

Adding a new repository is not too hard. If they expose standard interfaces then it's easy. OAI often only has simple Dublin Core metadata, so only the other interfaces provide much information.

Most Dataverse repos are only providing Dublin Core.

Most records come from CKAN and other government repositories, which expose more metadata.

Is there a format for the crosswalk?

  • Document defined by the library community
  • 6-month effort from 8 librarians to map metadata schemas
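To make the idea of a crosswalk concrete, here is a minimal sketch that maps source fields onto Dublin Core elements and reports whatever has no mapping. It is not the crosswalk document produced by the librarians. The `CKAN_TO_DC` table is a simplified, assumed mapping over a few CKAN-style field names (with tags flattened to plain strings), and `crosswalk` is a hypothetical helper.

```python
# Assumed, simplified mapping: CKAN-style source field -> Dublin Core element.
CKAN_TO_DC = {
    "title": "title",
    "notes": "description",
    "author": "creator",
    "license_title": "rights",
    "tags": "subject",
}


def crosswalk(record, mapping):
    """Map a source record to Dublin Core; also return the fields left unmapped."""
    dc = {}
    unmapped = []
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)
            continue
        # Dublin Core elements are repeatable, so every value is stored in a list.
        values = value if isinstance(value, list) else [value]
        dc.setdefault(target, []).extend(str(v) for v in values)
    return dc, sorted(unmapped)


if __name__ == "__main__":
    source = {
        "title": "Example survey",
        "tags": ["health", "survey"],
        "spatial": "POLYGON((...))",  # richer field with no simple Dublin Core home
    }
    print(crosswalk(source, CKAN_TO_DC))
```

Listing the unmapped fields separately shows where simple Dublin Core loses information from richer sources.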

2-year effort

  • 3 close to full time, developers, managers, and other contributors
  • Many expert groups in Portage are helping 

Archivematica as a service

  • Current approach is not automated, future to make this automated
  • Not too much effort to set up in basic model
  • Had to create a farm of 4 servers, plus a queue listening for events in Globus; every time something is published, it starts a job to perform the archival
  • Much configuration to get necessary performance (e.g., turning off the virus scanner)
  • Investigating support for A/V formats
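The queue-plus-workers arrangement described above can be sketched with Python's standard library. This is a toy model under stated assumptions, not the actual deployment. The event shape (`type`, `dataset_id`), the `archive` placeholder, and `run_workers` are all hypothetical, and threads stand in for the 4 servers.

```python
import queue
import threading

NUM_WORKERS = 4  # stands in for the farm of 4 servers mentioned in the notes


def archive(dataset_id):
    """Placeholder for the real archival job (hypothetical)."""
    return f"archived:{dataset_id}"


def run_workers(events, num_workers=NUM_WORKERS):
    """Queue one archival job per 'published' event and process them in parallel."""
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            dataset_id = jobs.get()
            if dataset_id is None:  # sentinel: no more work for this worker
                break
            outcome = archive(dataset_id)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()

    # The listener: only publication events trigger an archival job.
    for event in events:
        if event.get("type") == "published":
            jobs.put(event["dataset_id"])

    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return sorted(results)


if __name__ == "__main__":
    sample_events = [
        {"type": "published", "dataset_id": "a"},
        {"type": "updated", "dataset_id": "b"},
        {"type": "published", "dataset_id": "c"},
    ]
    print(run_workers(sample_events))
```

Decoupling the event listener from the workers through a queue means a burst of publications only lengthens the queue rather than overloading any one server.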

Resource level

  • Fairly small VMs running much of this
  • Storage 50TB and likely increasing soon
  • Archivematica nodes are bigger to process AIPs

Feedback

  • Positive feedback, hit all use cases that have been given
  • Other technology that still has its place and this will work with it
    • E.g., Dataverse for institutional repositories

Repeating in the US

  • Code is available and open source
  • So technically it wouldn't be a problem to repeat

Active data

  • Big problem upcoming is internal discoverability related to research data management
  • Version control
  • Looking at SEAD/Clowder type approach, OSF has interesting functionality, HubZero, GitLab





Action items

  •