
Below are brief descriptions of the participating groups that might be useful as you consider your presentation topics.  Please let me know if I've made any mistakes or feel free to correct/add detail as you see fit.

Projects and descriptions

SciServer

(Kim, Lemson)

SciServer Compute supports interactive and batch access to multiple large public datasets across several domains (including the Sloan Digital Sky Survey) via containers. It provides RStudio/Jupyter/MATLAB interactive environments and a custom job scheduler for containers, each with supporting scripting libraries for SciServer component and data integration. SciServer Compute supports SSO, user-defined group access to shared storage, and access to centralized datasets, some held in relational databases.
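To make the scheduling idea concrete, here is a minimal sketch of a FIFO job queue for container jobs. This is a hypothetical illustration, not SciServer's actual scheduler: the class and field names are invented, and a real implementation would launch containers, enforce quotas, and mount shared storage.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class ContainerJob:
    user: str
    image: str      # e.g. a Jupyter, RStudio, or MATLAB environment image
    command: str
    status: str = "queued"

class FifoScheduler:
    """Toy first-in-first-out scheduler for container jobs."""
    def __init__(self):
        self.queue = deque()
        self.finished = []

    def submit(self, job):
        self.queue.append(job)

    def run_next(self):
        if not self.queue:
            return None
        job = self.queue.popleft()
        # A real scheduler would launch the container here (e.g. via a
        # container runtime API) and track it until completion.
        job.status = "done"
        self.finished.append(job)
        return job
```

A batch system like this would sit behind the interactive front ends, draining user-submitted jobs in arrival order.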


Cyverse

(McEwan, Fronner)

The Cyverse Discovery Environment (DE) uses containers to support customizable, non-interactive workflows for data stored in Cyverse, running on HTCondor. They are also working to support interactive access. Through Atmosphere, Cyverse supports provisioning cloud resources on demand for researchers, with access to HPC resources through TACC.

TACC has installed Singularity container support on all of its HPC systems and is working with BioContainers to make 2,400+ BioConda applications findable and accessible at TACC or on any HPC system that supports Singularity. The end goal of these efforts is to support all BioConda packages across all Cyverse infrastructure. This already works using Docker on the Cyverse Condor cluster and will also provide a solution for other HPC systems using Singularity.
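As a sketch of what "findable and accessible" looks like in practice: BioContainers publishes BioConda builds as images under the `quay.io/biocontainers` namespace, which Singularity can pull directly via `docker://` URIs. The helper below only builds the command line (it does not run anything), and the example tag is hypothetical — real tags encode a version and build string that must be looked up per package.

```python
def biocontainer_uri(package, tag):
    # BioContainers images for BioConda packages live under
    # quay.io/biocontainers; the tag is package-specific.
    return "docker://quay.io/biocontainers/%s:%s" % (package, tag)

def singularity_exec(package, tag, args):
    # Construct (but do not execute) a `singularity exec` command line
    # that runs a tool from a BioContainers image.
    return ["singularity", "exec", biocontainer_uri(package, tag)] + list(args)
```

On a cluster with Singularity installed, the resulting argument list could be handed to a job script or subprocess call.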


Whole Tale, yt.Hub, RSL, Data Exploration Lab

(Turk, Kowalik)

The yt Hub provides access to very large datasets (both observational and simulation-based) via the integration of Girder and Jupyter Notebook/Lab. All available data is mounted locally on the compute nodes of a Docker Swarm cluster via NFS. The physical location of the data is abstracted through a FUSE filesystem, which makes it possible to expose only the subset of data selected by the user inside the container running Jupyter Notebook/Lab.
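The subset-exposure idea can be illustrated without FUSE: the full catalog stays on the server side, and the container's view is restricted to the user's selection. This is a toy stand-in with invented names, not the yt Hub's actual filesystem layer.

```python
class SubsetView:
    """Expose only a user-selected subset of a larger data catalog,
    mimicking what the FUSE layer does for the container's mount."""
    def __init__(self, catalog, selection):
        self.catalog = catalog      # full server-side catalog: path -> bytes
        self.selection = selection  # paths the user selected

    def listdir(self):
        # The container only ever "sees" the selected paths.
        return sorted(p for p in self.catalog if p in self.selection)

    def read(self, path):
        if path not in self.selection:
            # Unselected data is invisible, not merely access-denied.
            raise FileNotFoundError(path)
        return self.catalog[path]
```

A real FUSE implementation would answer kernel filesystem calls the same way: directory listings and reads are filtered against the selection before touching the NFS-mounted data.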

The basic architecture of the yt Hub (Girder plus a remote environment with data selection) is currently being extended as part of the Whole Tale project, which provides (among other things) the ability to launch containerized applications over a wide variety of *remote* datasets (e.g., via DataONE). Whole Tale addresses the complexity of exposing data to containers via a variety of underlying mechanisms (POSIX, S3, HTTP, Globus, etc.) through a data management framework. In contrast to the yt Hub, data is provided inside the computing environment on demand using a sync mechanism and a local cache, rather than being served locally. Containers also play a role in the provenance/preservation of scientific workflows and in the publication process.
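The on-demand-with-local-cache pattern can be sketched as a fetch-through cache. This is an illustration of the general technique, not Whole Tale's implementation; in their framework the `fetch` callable would be backed by one of the underlying transfer mechanisms (POSIX, S3, HTTP, Globus, etc.).

```python
class CachingFetcher:
    """Fetch remote objects on first access; serve repeats from a
    local cache, so data moves only when the container touches it."""
    def __init__(self, fetch):
        self.fetch = fetch      # callable: remote path -> bytes
        self.cache = {}
        self.fetch_count = 0    # how many remote transfers actually happened

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.fetch(path)
            self.fetch_count += 1
        return self.cache[path]
```

The payoff is that a notebook re-reading the same file repeatedly costs one remote transfer, which is what makes remote datasets usable interactively.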

The Renaissance Labs project will leverage this same approach to provide access to the Renaissance Simulations at SDSC – adding the ability to move analysis to HPC resources and adding a custom UI.

TERRA-REF

(LeBauer, Burnette)

The TERRA-REF project provides access to a large reference dataset for plant biology. It supports interactive access to the data via the NDS Labs Workbench (see below), which allows users to deploy customized container environments near the data. Containers are also used for the data processing pipeline, which runs on a combination of VMs and the ROGER cluster.

Blue Waters

The Blue Waters supercomputer now supports containers through NERSC's Shifter. Container support was added for three use cases (LHC/ATLAS, LIGO, LSST/DES). Technical challenges include containerizing applications, handling permissions, storage/IO, and MPI.


LSST/DES

(Kind)

This project is one of the drivers for the Blue Waters Shifter implementation. They also use Docker/Kubernetes for a variety of other services, including DES Labs.


NDS 

(Willis, Lambert, Coakley)

The NDS Labs Workbench is a generic platform for launching containerized environments near remote datasets, leveraging Kubernetes. Labs Workbench is deployed on OpenStack as a Kubernetes cluster with GlusterFS for a shared user filesystem across containers (e.g., home directory). Workbench is used by the TERRA-REF project and increasingly for training/education environments (hackathons, workshops, bootcamps, etc). The DataDNS project is an emerging vision for supporting access to remote computational environments. Workbench is a single optional component of the DataDNS framework.
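To illustrate the "per-user environment near the data" pattern, here is a function that builds a minimal Kubernetes pod manifest with a GlusterFS-backed home volume. The structure follows the Kubernetes v1 Pod spec, but the names, image, and GlusterFS endpoint/path values are placeholders — this is not Workbench's actual manifest.

```python
def user_pod_manifest(user, image):
    """Build an illustrative Kubernetes Pod spec for a per-user
    containerized environment with a shared-filesystem home mount."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "workbench-%s" % user, "labels": {"user": user}},
        "spec": {
            "containers": [{
                "name": "env",
                "image": image,
                "volumeMounts": [
                    {"name": "home", "mountPath": "/home/%s" % user},
                ],
            }],
            "volumes": [{
                "name": "home",
                # Workbench backs home directories with GlusterFS; the
                # endpoints/path values here are hypothetical.
                "glusterfs": {
                    "endpoints": "glusterfs-cluster",
                    "path": "homes/%s" % user,
                },
            }],
        },
    }
```

Serialized to YAML or JSON, a manifest like this is what a platform submits to the Kubernetes API to place a user's environment on the cluster.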


CyberGIS

(Liu, Terstriep)

The CyberGIS project recently developed CyberGIS-Jupyter to integrate cloud-based Jupyter notebooks with HPC resources. The project adopts Jupyter notebooks instead of web GIS as the front-end interface for both developers and users. Advanced GIS capabilities are provided in a pre-configured containerized environment. The system also supports on-demand provisioning to deploy multiple instances of gateway applications.


SDSC

(Zonca)

SDSC deploys JupyterHub with Docker Swarm and batch spawner support in HPC environments, serving science gateways, research, and training/education.
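A batch spawner routes each user's single-user notebook server to the HPC scheduler instead of a local process. The fragment below shows what such a `jupyterhub_config.py` could look like using the open-source batchspawner project's Slurm spawner; the partition name and runtime are assumptions, and this is not SDSC's actual configuration.

```python
# Illustrative jupyterhub_config.py fragment (assumes the batchspawner
# package is installed alongside JupyterHub).
c.JupyterHub.spawner_class = "batchspawner.SlurmSpawner"

# Hypothetical resource requests passed into the generated sbatch script:
c.SlurmSpawner.req_partition = "compute"   # placeholder partition name
c.SlurmSpawner.req_runtime = "02:00:00"    # wall-clock limit for the session
```

With this in place, logging into the hub submits a batch job, and the notebook server starts on whichever compute node the scheduler allocates.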

