Container Analysis Environments (Interest Group)


Contact information:

Group email list: https://groups.google.com/forum/#!forum/container-analysis-environments

Participants:

  • Jing Ge (NCSA/KnowEng)
  • Jai Won Kim (JHU/SciServer)
  • Kacper Kowalik (NCSA/Whole Tale/ytHub)
  • Gerard Lemson (JHU/SciServer)
  • Charles McKay (SDSC/NDS)
  • Pete Myers (HMS)
  • Craig Willis (NCSA/TERRA-REF)

Background:

This interest group emerged from breakout meetings at NDS7 concerning container-based architectures for interactive analysis and batch pipelines. The pattern of providing interactive container-based analysis environments (e.g., Jupyter notebooks, RStudio, MATLAB, and other programming environments) near large public datasets has been evolving for several years. Examples can be found in the ytHub, SciServer, Whole Tale, and Labs Workbench projects. Additionally, we are seeing use cases for batch processing, including distributed processing. Examples can be found in the KnowEng and SBGrid projects.

Purpose:

The purpose of this interest group is to share information about container-based architectures to support interactive and batch access to research data.

Activities:

Specific activities may include:

  • Sharing use cases and system architecture information
  • Coordinating workshops

Breakout Group Summary

Brief summary of the discussion at NDS7:

  • SciServer: Uses Docker containers (Jupyter, MATLAB, RStudio) to enable access to several public datasets (astronomy, turbulence, genomics, oceanography), which are mounted as volumes. It provides custom images for specific domains that include the required dependencies. Users are given access to scratch space and a local database. A job submission service is to be released soon.
  • SBGrid: Repository for X-ray diffraction instrument data. Data is distributed across multiple sites. Workflows are being containerized with Singularity to run in a distributed fashion on XSEDE/Comet. Requirements are currently driven by the project rather than by users (no interactive analysis). The use of workflow languages was also discussed.
  • TERRA-REF: Uses Docker via Kubernetes (Labs Workbench) to enable access to a large public reference dataset. Labs Workbench supports both official and user-provided containers, including customized Jupyter/RStudio images and development environments (cloud IDEs, console).
  • KnowEng: Uses containerized pipelines executed via Mesos to support genetics analysis.
  • ytHub: Combines Girder and Jupyter tmpnb to provide access to very large astronomy datasets.
  • The SciDAS project was also mentioned.