SBDB


The SBGrid DataBank (https://data.sbgrid.org) supports computational accessibility of its public datasets from identifiers in two main ways: 


  • The download URLs for a given dataset are directly related to the DOI of the dataset.  For example, doi:10.15785/SBGRID/179 is available through rsync://data.sbgrid.org/10.15785/SBGRID/179, rsync://sbgrid.icm.uu.se/10.15785/SBGRID/179, rsync://sbgrid.pasteur.edu/10.15785/SBGRID/179, and rsync://sbgrid.ncpss.org/10.15785/SBGRID/179.  In this example, the host portion of each URL corresponds to a DAA member distributing the data files.  This approach has the advantage of simplicity, but the disadvantage that it assumes knowledge of the DAA hosts providing a dataset and of how a given repository relates identifiers to access URLs.
  • The dataset landing page / DOI target URL supports proof-of-concept content-type negotiation (modeled after http://citation.crosscite.org/docs.html as of ~Mar 10 2017) for BibTeX (`application/x-bibtex`), DataCite XML (`application/vnd.datacite.datacite+xml`), and DATS JSON (`application/vnd.biocaddie.dats+json`; see https://github.com/biocaddie/WG3-MetadataSpecifications; the content type for DATS was improvised because the implementer wasn't able to find a specification for it).  The `accessModalities` key (under `distributions`) provides the access URLs for a given dataset.  Supporting JSON-LD (through API access or content negotiation) was investigated, but deferred until the validation tools for JSON-LD improve.
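
The two access routes above can be sketched in Python.  This is a minimal illustration, not the SBDB's own tooling: the host list is copied from the doi:10.15785/SBGRID/179 example and may not apply to other datasets, the function names are hypothetical, and the shape of the DATS document (a top-level `distributions` list whose entries may carry `accessModalities`) is an assumption based only on the key names mentioned above.

```python
import urllib.request

# Hosts taken from the example above; assumed, not guaranteed, for other datasets.
EXAMPLE_HOSTS = (
    "data.sbgrid.org",
    "sbgrid.icm.uu.se",
    "sbgrid.pasteur.edu",
    "sbgrid.ncpss.org",
)

# Media types listed in the text for content negotiation.
MEDIA_TYPES = {
    "bibtex": "application/x-bibtex",
    "datacite": "application/vnd.datacite.datacite+xml",
    "dats": "application/vnd.biocaddie.dats+json",
}


def rsync_urls(doi, hosts=EXAMPLE_HOSTS):
    """Map a DOI (with or without a 'doi:' prefix) to one rsync URL per host."""
    bare = doi[4:] if doi.lower().startswith("doi:") else doi
    return [f"rsync://{host}/{bare}" for host in hosts]


def metadata_request(landing_url, fmt):
    """Build, but do not send, a request asking the landing page for `fmt`.

    `landing_url` is supplied by the caller (the URL the DOI resolves to);
    it is deliberately not guessed here.
    """
    return urllib.request.Request(landing_url, headers={"Accept": MEDIA_TYPES[fmt]})


def access_modalities(dats):
    """Collect the raw `accessModalities` values from a parsed DATS document.

    Assumes `distributions` is a list of objects; the inner structure of each
    `accessModalities` value is returned untouched rather than interpreted.
    """
    found = []
    for dist in dats.get("distributions", []):
        if "accessModalities" in dist:
            found.append(dist["accessModalities"])
    return found
```

A request built this way could be sent with `urllib.request.urlopen` and, for the DATS media type, parsed with `json.load` before being passed to `access_modalities`.  The first function encodes exactly the host knowledge that the first approach requires, which is the disadvantage noted above.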


 There are several points of interest:

  • Data files are distributed using the `rsync` protocol, rather than the more common `http`/`https` protocol.
  • Datasets are typically available from multiple geographically distributed hosts.
  • No use cases for individual files, as opposed to all files within a dataset, have been identified.  This is partly a consequence of the domain specificity of the repository, but it results in infrastructure that focuses on distributing the complete set of data files (with the assumption that recursive transfer of a given directory will contain the complete set), instead of supporting individual files.
  • The SBDB is collaborating with Dataverse (http://dataverse.org); once the SBDB has migrated, some portion of this description will probably become obsolete.
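
Because datasets are transferred whole over `rsync` and are typically mirrored on several hosts, a client might try each host in turn until a recursive transfer succeeds.  The following is a rough sketch under that assumption; the function names and the fallback policy are illustrative and are not described by the SBDB.  It relies only on the standard `rsync` archive flag (`-a`, which implies recursion) and on `subprocess.run` from the Python standard library.

```python
import subprocess


def rsync_command(url, dest):
    """Build an rsync invocation that recursively copies a dataset directory.

    The trailing slash on the source asks rsync to copy the directory's
    contents into `dest` rather than creating a nested directory.
    """
    return ["rsync", "-av", url.rstrip("/") + "/", dest]


def fetch_dataset(urls, dest, run=subprocess.run):
    """Try each mirror URL in order; return the first one that succeeds.

    `run` is injectable so the logic can be exercised without network
    access.  Returns None if every mirror fails.
    """
    for url in urls:
        result = run(rsync_command(url, dest))
        if result.returncode == 0:
            return url
    return None
```

For example, `fetch_dataset(["rsync://data.sbgrid.org/10.15785/SBGRID/179", "rsync://sbgrid.icm.uu.se/10.15785/SBGRID/179"], "179/")` would attempt the first host and fall back to the second only if the first transfer exits with a non-zero status.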