
Below are brief descriptions of the participating groups that might be useful as you consider your presentation topics.  Please let me know if I've made any mistakes or feel free to correct/add detail as you see fit.

Projects and Descriptions

SciServer

(Kim, Lemson)

SciServer Compute supports interactive and batch access to multiple large public datasets across several domains (including the Sloan Digital Sky Survey) via containers. It provides RStudio/Jupyter/MATLAB interactive environments and a custom job scheduler for containers, each with supporting scripting libraries for SciServer component and data integration. SciServer Compute supports SSO, user-defined group access to shared storage, and access to centralized datasets, some in relational databases.
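
The custom container job scheduler is not described in detail above; as a rough illustration of the general pattern (a queue of job descriptors dispatched to a fixed pool of container slots), here is a minimal sketch in Python. All names here are hypothetical, not SciServer's actual API:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ContainerJob:
    # Hypothetical job descriptor: container image plus the command to run in it
    image: str
    command: list = field(default_factory=list)

class ToyScheduler:
    """FIFO scheduler dispatching jobs to a fixed pool of container slots."""
    def __init__(self, slots: int):
        self.slots = slots
        self.queue = deque()
        self.running = []

    def submit(self, job: ContainerJob):
        self.queue.append(job)

    def dispatch(self):
        # Move queued jobs into free slots; a real scheduler would call the
        # container runtime here (e.g., start a Docker container per job)
        while self.queue and len(self.running) < self.slots:
            self.running.append(self.queue.popleft())
        return list(self.running)

sched = ToyScheduler(slots=2)
sched.submit(ContainerJob("compute-env", ["python", "analysis.py"]))
sched.submit(ContainerJob("compute-env", ["Rscript", "fit.R"]))
sched.submit(ContainerJob("compute-env", ["matlab", "-batch", "run"]))
running = sched.dispatch()
print(len(running), len(sched.queue))  # 2 slots filled, 1 job still queued
```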

See also:

Cyverse

(McEwan, Fronner)

The Cyverse Discovery Environment (DE) uses containers to support customizable, non-interactive workflows for data stored in Cyverse. They are interested in (or working on) supporting interactive access to the Cyverse Data Commons via containers and iRODS. Through Atmosphere, they support provisioning cloud resources on demand for researchers, as well as access to HPC resources through TACC.

TACC has installed Singularity container support on all of its HPC systems and is working with BioContainers to make 2400+ BioConda applications findable and accessible at TACC or on any HPC system that supports Singularity. The goal of these efforts is to support all BioConda packages across all Cyverse infrastructure. This already works using Docker on the Cyverse Condor cluster; a comparable solution for the other HPC systems using Singularity is about 90% complete.
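
BioContainers publishes BioConda packages as images under the quay.io/biocontainers registry, which Singularity can pull directly via its docker:// URI scheme. A small sketch of how a package name might be mapped to a pullable URI; the tag format is illustrative, since the exact tag for a given package/version must be looked up in the registry:

```python
def biocontainers_uri(package: str, tag: str) -> str:
    """Build a Singularity-pullable URI for a BioContainers image.

    The quay.io/biocontainers namespace is real; the tag argument is a
    placeholder for whatever tag the registry actually publishes.
    """
    return f"docker://quay.io/biocontainers/{package}:{tag}"

# Usage with Singularity would look like:
#   singularity exec <uri> samtools --version
uri = biocontainers_uri("samtools", "1.9")
print(uri)  # docker://quay.io/biocontainers/samtools:1.9
```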

See also:

Whole Tale, yt.Hub, RSL, Data Exploration Lab

(Turk, Kowalik)

The yt Hub provides access to very large datasets (both observational and simulation-based) via the integration of Girder and Jupyter Notebook/Lab. The entire data collection is mounted locally on the compute nodes of a Docker Swarm cluster via NFS. However, the physical location of the data is abstracted through a FUSE filesystem, which makes it possible to expose only the user-selected subset of the data inside the container running Jupyter Notebook/Lab.
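
The subset-exposure idea can be illustrated without a real FUSE layer: given a user's selection, a passthrough filesystem answers directory listings with only the paths leading to (or inside) the selection. A minimal, hypothetical sketch of that filtering logic:

```python
from pathlib import PurePosixPath

def visible_entries(selection: set, directory: str, all_entries: list) -> list:
    """Return only those entries of `directory` that lie inside some
    user-selected dataset, or on the path leading to one -- the filtering
    a FUSE passthrough layer would apply to readdir() results."""
    visible = []
    for name in all_entries:
        full = str(PurePosixPath(directory) / name)
        for sel in selection:
            # Keep entries inside a selection, or ancestors leading to one
            if full == sel or full.startswith(sel + "/") or sel.startswith(full + "/"):
                visible.append(name)
                break
    return visible

selection = {"/data/sdss/dr14"}
print(visible_entries(selection, "/data", ["sdss", "gaia"]))      # ['sdss']
print(visible_entries(selection, "/data/sdss", ["dr14", "dr7"]))  # ['dr14']
```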

The yt Hub's basic architecture (Girder plus a remote environment with data selection) is currently being extended as part of the Whole Tale project, which provides (among other things) the ability to launch containerized applications over a wide variety of *remote* datasets (e.g., via DataONE). They are addressing the complexity of exposing data to containers via a variety of underlying mechanisms (POSIX, S3, HTTP, Globus, etc.) through a data management framework. In contrast to the yt Hub, data is provided inside the computing environment on demand using a sync mechanism and a local cache, rather than being served locally. Containers also play a role in the provenance/preservation of scientific workflows and in the publication process.
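
The on-demand sync-and-cache behavior described above can be sketched as a small read-through cache; `fetch_remote` stands in for whichever transport (POSIX, S3, HTTP, Globus) the data management framework selects. All names here are hypothetical:

```python
class ReadThroughCache:
    """Fetch remote objects on first access; serve local copies afterward."""
    def __init__(self, fetch_remote):
        self.fetch_remote = fetch_remote  # transport-specific fetcher
        self.local = {}                   # stand-in for an on-disk cache
        self.fetches = 0                  # count of remote round-trips

    def read(self, path: str) -> bytes:
        if path not in self.local:        # cache miss: sync from remote
            self.local[path] = self.fetch_remote(path)
            self.fetches += 1
        return self.local[path]

remote = {"/dataone/obj1": b"payload"}
cache = ReadThroughCache(remote.__getitem__)
cache.read("/dataone/obj1")
cache.read("/dataone/obj1")   # second read served from the local cache
print(cache.fetches)          # 1
```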

The Renaissance Labs project will leverage this same approach to provide access to the Renaissance Simulations at SDSC – adding the ability to move analysis to HPC resources and adding a custom UI.

TERRA-REF

(LeBauer, Burnette)

Blue Waters


NDS 

(Willis, Lambert, Coakley)

The NDS Labs Workbench is a generic platform for launching containerized environments near remote datasets, built on Kubernetes. Labs Workbench is deployed on OpenStack as a Kubernetes cluster, with GlusterFS providing a shared user filesystem across containers (e.g., home directories). Workbench is used by the TERRA-REF project and, increasingly, for training/education environments (hackathons, workshops, bootcamps, etc.). The DataDNS project is an emerging vision for supporting access to remote computational environments; Workbench is a single, optional component of the DataDNS framework.
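
As an illustrative Kubernetes manifest (not Workbench's actual configuration), a pod mounting a GlusterFS-backed home directory might look like the following; the endpoint and volume names are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: user-environment            # placeholder pod name
spec:
  containers:
  - name: notebook
    image: jupyter/base-notebook
    volumeMounts:
    - name: home
      mountPath: /home/jovyan       # shared home directory in the container
  volumes:
  - name: home
    glusterfs:                      # in-tree GlusterFS volume plugin
      endpoints: glusterfs-cluster  # placeholder Endpoints object
      path: user-home-volume        # placeholder Gluster volume name
```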

See also:

CyberGIS

(Liu, Terstriep)

See also:

SDSC (Zonca)

Deployment of JupyterHub with Docker Swarm and batch spawner support in HPC environments, serving science gateways, research, and training/education.
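
A minimal configuration along those lines, using the real JupyterHub batchspawner package (the Slurm options shown are illustrative, not SDSC's actual settings):

```python
# jupyterhub_config.py -- illustrative fragment only
c = get_config()  # noqa: F821  (provided by JupyterHub at load time)

# Spawn each single-user server as a batch job (e.g., via Slurm)
c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'
c.SlurmSpawner.req_partition = 'compute'   # placeholder partition name
c.SlurmSpawner.req_runtime = '01:00:00'    # per-session walltime limit
```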

See also:
