
Information about the 2017 Container Analysis Environments Workshop hosted by the Data Exploration Lab and NDS.



Date/Location

Goals

The idea of this workshop emerged from the 7th NDS Workshop as an opportunity for groups leveraging container-based technologies in research computing environments to build community, share information and address common challenges. 

We are proposing the following themes (please feel free to suggest other topics):

  • Users/researchers and why they are using container-based environments (e.g., consistent environments, packaging applications, shifting computation around)
  • Interactive vs non-interactive use cases including approaches to workflow management, orchestration, scalability, and moving analysis to HPC environments
  • Data/storage focusing on models for managing and exposing data to containers
  • Security/permissions/licensing
  • Challenges/what new use cases are we seeing?

The goal will be to approach this from both user and system perspectives, focusing on the science/research cases and how technical solutions are addressing these needs.

Workshop report

  • One output of the workshop will be a report summarizing common use cases, best practices, and challenges supporting container-based analysis and computing environments.

Participating groups (tentative)


Projects (contacts) and description

SciServer

(Kim, Lemson)

SciServer Compute supports interactive and batch access to multiple large public datasets across several domains (including the Sloan Digital Sky Survey) via containers. They support RStudio/Jupyter/MATLAB interactive environments and have developed a custom job scheduler for containers, each with supporting scripting libraries for SciServer component and data integration. SciServer Compute supports SSO with user-defined group access to shared storage and access to centralized datasets, some in relational databases.

Cyverse

(McEwan, Fronner)


Whole Tale, yt.Hub, RSL, Data Exploration Lab

(Turk, Kowalik)

The yt Hub provides access to very large datasets (both observation- and simulation-based) via the integration of Girder and Jupyter Notebook/Lab. All available data is mounted locally on the compute nodes of a Docker Swarm cluster via NFS. However, the physical location of the data is abstracted through a FUSE filesystem, which makes it possible to expose only the subset of data selected by the user inside the container running Jupyter Notebook/Lab.

The basic architecture of the yt Hub (Girder plus a remote environment with data selection) is currently being extended as part of the Whole Tale project, which provides (among other things) the ability to launch containerized applications over a wide variety of *remote* datasets (e.g., via DataOne). The project addresses the complexity of exposing data to containers via a variety of underlying mechanisms (POSIX, S3, HTTP, Globus, etc.) through a data management framework. In contrast to the yt Hub, data is provided inside the computing environment on demand using a sync mechanism and local cache, rather than being served locally. Containers also play a role in the provenance/preservation of scientific workflows and in the publication process.

The Renaissance Labs project will leverage this same approach to provide access to the Renaissance Simulations at SDSC – adding the ability to move analysis to HPC resources and adding a custom UI.
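The on-demand model described above for Whole Tale (fetch data into a local cache on first access, then serve the local copy) can be sketched roughly as follows. This is only an illustration of the general pattern, not Whole Tale's implementation: the `OnDemandCache` class, its `fetch` callback, and the example URI are all hypothetical names introduced here, and a real backend (POSIX, S3, HTTP, Globus) would replace the stand-in fetch function.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import BinaryIO, Callable


class OnDemandCache:
    """Hypothetical sketch: fetch a remote object on first access, then reuse the local copy."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str, Path], None]) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Assumed hook: fetch(uri, dest) writes the object identified by uri to dest.
        self.fetch = fetch

    def _local_path(self, uri: str) -> Path:
        # Key cached files by a hash of the URI so any scheme maps to a safe file name.
        return self.cache_dir / hashlib.sha256(uri.encode("utf-8")).hexdigest()

    def open(self, uri: str) -> BinaryIO:
        path = self._local_path(uri)
        if not path.exists():
            # Download to a temporary name first so a failed fetch
            # never leaves a truncated file under the final name.
            partial = path.with_suffix(".partial")
            self.fetch(uri, partial)
            partial.replace(path)
        return path.open("rb")


if __name__ == "__main__":
    def stand_in_fetch(uri: str, dest: Path) -> None:
        # Stand-in for a real transfer; it just writes a placeholder payload.
        dest.write_bytes(b"payload for " + uri.encode("utf-8"))

    with tempfile.TemporaryDirectory() as tmp:
        cache = OnDemandCache(Path(tmp), stand_in_fetch)
        with cache.open("http://example.org/dataset/file.h5") as handle:
            print(len(handle.read()), "bytes read")
```

Keeping the transfer logic behind a single callback is one way to let the same cache sit in front of several storage mechanisms; whether the real framework is structured this way is an assumption.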

TERRA-REF

(LeBauer, Burnette)

Blue Waters


NDS 

(Willis, Lambert, Coakley)

The NDS Labs Workbench is a generic platform for launching containerized environments near remote datasets, leveraging Kubernetes. Labs Workbench is deployed on OpenStack as a Kubernetes cluster with GlusterFS for a shared user filesystem across containers (e.g., home directory). Workbench is used by the TERRA-REF project and increasingly for training/education environments (hackathons, workshops, bootcamps, etc). The DataDNS project is an emerging vision for supporting access to remote computational environments. Workbench is a single optional component of the DataDNS framework.

CyberGIS

(Liu, Terstriep)


SDSC

(Zonca)

Deployment of JupyterHub with Docker Swarm and batch spawner support in HPC environments in support of science gateways, research, and training/education.


Format:

The format of the workshop will be presentations mixed with discussions, working groups, or deep-dive presentations. The working groups will be self-organized, based on the goals and needs of participants. Each person/group will present for ~20 minutes. Participants will be from a variety of backgrounds. Presentations should cover supported science cases, overall systems architecture, and any challenges/new directions related to supporting container-based analysis and computing environments.

Communication:

Schedule (tentative):

Below is the tentative workshop schedule.  Presentations and times are subject to change. Please contact us with any dietary restrictions.

Monday August 14th

Tentative: The first day we will focus on participant presentations with significant Q&A and discussion time.  Topics that emerge from the presentations will be considered for breakout/deep-dive discussions on Tuesday and Wednesday. 

Time            Description
9:00 - 9:30     Introduction/welcome (NDS/DataExpLab)
9:30 - 11:00    Presentations and discussion: Cyverse, Cyverse/TACC, SciServer
11:00 - 11:15   Break
11:15 - 12:30   Presentations and discussion: yt.Hub, Whole Tale, RSL
12:30 - 1:30    Lunch
1:30 - 3:00     Presentations and discussion: Blue Waters, LSST/DES?
3:00 - 3:15     Break
3:15 - 4:00     Presentations and discussion: TERRA-REF, CyberGIS, NDS Labs
4:00 - 5:00     Discussion/planning

Tuesday August 15th

Tentative: Based on discussions from Monday, we will organize breakout/deep-dive discussions for topics of interest (feel free to suggest).  Additional presentations may be scheduled, depending on participation:


Time            Description
9:00 - 9:30     Discussion/planning
9:30 - 11:30    Breakout groups (break ~10:45)
11:30 - 12:00   Group discussion
1:00 - 3:00     Breakout groups
3:00 - 3:15     Break
3:15 - 4:30     Presentations/discussion
4:30 - 5:00     Planning

Wednesday August 16th

Tentative: We will use the morning to conclude any break-out work and to consolidate group output (presentations, documents, etc) for the workshop report.

Time            Description
9:00 - 9:30     Discussion/planning
9:30 - 11:30    Breakout groups
11:30 - 12:00   Closing