Developer's Guide

The Labs Workbench is currently in beta testing. Documentation is subject to change.

This document refers to the Labs Workbench demonstration instance available at https://www.workbench.nationaldataservice.org.


Introduction

This guide is for users of the Labs Workbench who are interested in:

  • Packaging your own applications for use in Labs Workbench
  • Adding your application to Labs Workbench
  • Developing applications using Labs Workbench

Packaging your application

Preparing your application for use in Labs Workbench requires the following:

  • Creating one or more Docker images
  • Modifying startup scripts to support/use environment variables, particularly for dependencies
  • Identifying ports and volumes
  • Following logging best practices
  • Estimating your resource requirements

Creating a Docker image

The Labs Workbench requires that all services be packaged as Docker containers.

If you are unfamiliar with Docker, start with one of the many available Docker tutorials.

Note that you can use the Docker service provided in the Labs Workbench if you don't have access to Docker locally.

A few things to keep in mind:

  • Separate installation from configuration: installation should occur in your Dockerfile; dynamic configuration should happen during container startup.
  • Use official images where possible
  • Always pin version numbers (avoid "latest")
  • Run dependent services in separate containers. If your service requires MySQL, configure it as a dependency; don't install it in your image (see the example below this list).
  • Always use the VOLUME instruction to specify the location of mutable data, for example the Postgres data directory.
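
For example, a web application that depends on MySQL might be run locally as two containers, with the database files kept in a named volume (the application image, volume name, and password below are hypothetical):

# Run the database in its own container, keeping its data in a named volume
$ docker run -d --name mysql -e MYSQL_ROOT_PASSWORD=changeme -v mysql-data:/var/lib/mysql mysql:5.7

# Run the application in a separate container that connects to the database
$ docker run -d --name myapp --link mysql:mysql -p 8080:8080 myapp:1.0

Within Labs Workbench itself, this wiring between containers is handled for you by the dependency mechanism described below.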

Docker best practices

Here is a summary of points from the Docker Best Practices and the Docker Official Images guidelines.
  • Containers are ephemeral: they can be stopped and destroyed, and a new one built and put in place with an absolute minimum of setup and configuration.
  • One process per container
  • Minimize layers
  • Sort multi-line arguments and split RUN statements for readability
  • Use official images as the basis for your images (Docker recommends Debian because it's small and kept up-to-date)
  • Pin versions to avoid failures. Always pin the version of the main application, if installing via apt-get or similar.
  • Use CMD or ENTRYPOINT for service-based images. 
    • ENTRYPOINT should be used when you want the container to behave exclusively as if it were the executable it's wrapping. 
    • CMD should be used if the user needs flexibility to run any executable they choose when starting the container.
    • You can combine ENTRYPOINT and CMD to specify a default executable with default arguments that can be overridden
    • Understand shell vs exec uses of CMD.  Prefer exec form for signal handling.
    • If the image requires initialization, combine an ENTRYPOINT script with CMD
    • Make sure running the container with "bash" still works (e.g. for debugging)
    • Model your entrypoint.sh on the official Postgres image (see the sketch after this list)
  • Use EXPOSE for ports
  • Use VOLUME for mutable/customizable parts of image
  • Use WORKDIR for clarity
  • Use https where possible, import PGP keys with full fingerprint to check package signing, embed checksums in Dockerfile if PGP signing not available
  • Forked/detached processes: Containers will stop when the process specified in the "docker run" command stops. For services that generally run detached (e.g., httpd, nginx, postgres), the Dockerfile should call a command that runs in the foreground. For example, the Apache httpd has a "-DFOREGROUND" option.
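
To illustrate the ENTRYPOINT/CMD points above, here is a minimal entrypoint.sh sketch loosely modeled on the official Postgres image; "myapp", "myapp-init", and the /data paths are placeholders for your own application:

#!/bin/bash
set -e

# One-time initialization: skip it on restart so existing data is preserved
if [ ! -f /data/.initialized ]; then
    myapp-init --data-dir /data     # hypothetical initialization command
    touch /data/.initialized
fi

# If the container was started with our service command, exec it so it
# runs as PID 1 and receives signals correctly (exec form behavior)
if [ "$1" = "myapp" ]; then
    shift
    exec myapp --data-dir /data "$@"
fi

# Otherwise run whatever was requested, e.g. "docker run <image> bash"
exec "$@"

In the Dockerfile this script would typically be wired up with ENTRYPOINT ["/entrypoint.sh"] and CMD ["myapp"], so the default command can still be overridden (for example with "docker run <image> bash").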

Dependencies

Often our applications depend on other services.  A particular web application might consist of a web server, application server, database, and search engine. Dependencies introduce complexity when running containers. For example:

  • Startup order: a service's dependencies need to be running before the service itself is started.
  • Shared configuration: we want to share configuration between services – such as a username or password.
  • Initialization: during first run, we need to initialize databases, etc.
  • Restartability: if a container is restarted (which it will be), we need to restart without re-initializing.

Labs Workbench supports dependencies through the "dependencies" array in the service specification.  Dependencies are either required or optional.  If required, dependencies are always started before the dependent services. 

For all services associated with a single application, Labs Workbench injects environment variables into the containers during startup. These environment variables follow the Docker Compose model of "SERVICEKEY_PORT_PORTNUM_TCP_ADDR" and "SERVICEKEY_PORT_PORTNUM_TCP_PORT". For example, if your service depends on MongoDB (key=mongo), your container will have the following environment variables at runtime:

$ env | grep ^MONGO
MONGO_PORT_27017_TCP_ADDR=<internal IP>                                                            
MONGO_PORT_27017_TCP_PORT=27017

Your application can always rely on these environment variables to be present.
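
For example, a startup script can use these variables to wait for MongoDB to accept connections before launching the application. A minimal sketch, assuming netcat is available in the image and "myapp" is a placeholder for your own command:

#!/bin/bash
set -e

# Injected by Labs Workbench for the "mongo" dependency
MONGO_HOST="${MONGO_PORT_27017_TCP_ADDR}"
MONGO_PORT="${MONGO_PORT_27017_TCP_PORT}"

# Wait until the database is reachable before starting
until nc -z "${MONGO_HOST}" "${MONGO_PORT}"; do
    echo "Waiting for MongoDB at ${MONGO_HOST}:${MONGO_PORT}..."
    sleep 2
done

exec myapp --mongo-url "mongodb://${MONGO_HOST}:${MONGO_PORT}/mydb"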

Environment Variables

Labs Workbench supports the following types of environment variables:

  • Dependencies: as discussed above, environment variables providing the IP and PORT of all services in a stack are injected into all related containers.
  • System variables: provided for convenience in all containers
  • Custom variables

System variables

Variable            Description
NDSLABS_DOMAIN      Domain of the Workbench cluster
NDSLABS_HOSTNAME    Full host name of your running service
NDSLABS_STACK       Unique ID of your application stack
NDSLABS_HOME        Home directory for your account
NDSLABS_EMAIL       Account email address
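
For example, a startup script might substitute these values into a configuration file template (the template path and placeholder names are hypothetical):

# Fill in Workbench-provided values when generating the application's config
sed -e "s|{{HOSTNAME}}|${NDSLABS_HOSTNAME}|g" \
    -e "s|{{ADMIN_EMAIL}}|${NDSLABS_EMAIL}|g" \
    /app/config.ini.tmpl > /app/config.ini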

Custom variables

Labs Workbench supports custom configuration through the use of environment variables. 
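
For example, a startup script can read a custom variable and fall back to a default when it is not set (the variable name here is hypothetical):

# Use the supplied value if present, otherwise a sensible default
MYAPP_THEME="${MYAPP_THEME:-default}"
echo "Starting with theme: ${MYAPP_THEME}"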

Accounts and permissions

Currently, containers always run as root. However, because some containers run as non-root users, permissions on your account filesystem are kept permissive so that those containers can still read and write your files.

Ports

TODO: Protocols, port numbers, access type, and context path

Volumes

TODO: Service volumes versus customization

Logging

Labs Workbench relies on Docker logging. The Docker logging driver reads log events from stderr and stdout only. If possible, all application log messages should be sent to stdout or stderr.  If your application logs to a file, you will not be able to view it through the Labs Workbench log viewer, but you can always use the console feature to access it.
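
If your application only writes to log files, one common workaround (used, for example, by the official nginx image) is to symlink those files to the container's stdout and stderr; a sketch, assuming a hypothetical log location:

# In the Dockerfile or entrypoint script, redirect file-based logs
# to the container's stdout and stderr
ln -sf /dev/stdout /var/log/myapp/access.log
ln -sf /dev/stderr /var/log/myapp/error.log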

Estimating Resource Requirements

In a future release, the Workbench will provide usage information for each container to help estimate CPU and memory requirements.  In the meantime, the best solution is to run your container outside of the Workbench or in the provided Docker environment and use the "docker stats" command.

$ docker stats 
CONTAINER     CPU %    MEM USAGE / LIMIT     MEM %   NET I/O               BLOCK I/O
201337f8e5c7  0.00%    1.987 MB / 8.376 GB   0.02%   385.2 kB / 1.778 MB   0 B / 0 B

This information can be used to estimate service resource requirements, including min/max CPU and min/max memory.  Memory estimates are more important at this time. Most services will likely work with the default CPU limits.

Readiness Probes

Privileged containers

Kubernetes concepts

Labs Workbench uses Kubernetes internally for container orchestration. While you do not need to understand Kubernetes to use or develop for Labs Workbench, some concepts may be useful for those interested:

  • Each account is a namespace
  • Each application service has a service and a replication controller with a single replica pod
  • If the service is configured to be external, it has an ingress rule
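
For cluster operators with kubectl access, these objects can be inspected directly (replace <account> with an account namespace):

# List the pods, services, replication controllers, and ingress rules
# belonging to a single account's namespace
$ kubectl get pods,svc,rc,ingress --namespace=<account>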

Adding your application to NDS Labs

To submit to the official NDS Labs catalog, your image must:

  • have an explicit tag on the base image (i.e. not "latest")
  • have an explicit tag on the resulting image (i.e. not "latest"); see the example below
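
For example, the image referenced by your spec would be built and pushed with an explicit version tag (the repository name here is hypothetical):

# Build and push with an explicit version tag rather than "latest"
$ docker build -t mygroup/myapp:1.0.0 .
$ docker push mygroup/myapp:1.0.0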

Automated Builds

You can set up Docker Hub to automatically rebuild your images whenever the underlying source changes.

We recommend setting up such automated builds for any custom Docker images referenced by your spec(s).

A fantastic guide for doing so can be found here: https://docs.docker.com/docker-hub/builds/

You simply need to:

  1. Link a Github / Bitbucket account to your Docker Hub account.
  2. Create an Automated Build pointed at the repository containing your Dockerfile.
  3. Give a set of rules outlining which branches / tags should be built and where they should be pushed to upon success.

Limitations of Automated Builds

While extremely convenient for asynchronous development and testing, Docker Hub's automated builds are by no means perfect.

If you desire a more rapid "code-build-test-repeat" workflow, you may want to stick to manual builds until you reach a stable state, as speed seems to be the main problem with automated builds.

Some things that contribute to the slow build times:

  1. The free tier only allows one concurrent build per-user or per-organization (it uses a queue)
  2. It doesn't seem to use the caching mechanism that a local Docker build would (it passes in --no-cache to docker build by default)

Image Tags and Versioning

You will likely want to adopt the semantic versioning pattern that has worked well for other projects.

Below is a set of build rules that can help keep your images up-to-date automatically and versioned appropriately.

Master Branch → "Latest" Tag

New commits pushed to master on GitHub should be built and pushed to latest on Docker Hub.


This way, latest will always reflect your latest pushed code on the master branch.

Git Tag → Version Tag

New tags pushed to GitHub (e.g. 1.0.0) should be built and pushed to the same name on Docker Hub.
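
Creating and pushing such a tag from the command line looks like this:

# Tag the current commit and push the tag to GitHub; with the build rule
# above in place, Docker Hub builds and pushes the matching image tag
$ git tag -a 1.0.0 -m "Release 1.0.0"
$ git push origin 1.0.0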

Benefits of proper versioning:

  1. Tagged images should never change after you create them
  2. Tags allow you to keep track of proper versions

Git Tag → "Stable" Tag

In addition, you may want every new tag pushed to be built and pushed to stable.

This way, the stable tag will always be your latest tagged release on GitHub.

Submit a Pull Request

Our official application specs for Labs Workbench are housed here: https://github.com/nds-org/ndslabs-specs

The following steps walk you through adding your service to the Workbench's official catalog:

Fork the ndslabs-specs repository

This will create a personal copy of the repo that you can clone and modify as you see fit:

  1. Navigate to the link above
  2. Sign into GitHub, if you are not already
  3. In the top-right corner of the page, click Fork

See https://help.github.com/articles/fork-a-repo/ for more information


Clone your forked repository

export gitUsername=YOUR_GIT_USERNAME_HERE
git clone https://github.com/${gitUsername}/ndslabs-specs
cd ndslabs-specs/

Create a new Branch

export specKey=YOUR_SPEC_KEY_HERE
git checkout -b ${specKey}

Create a new folder and add each new spec

For each application (i.e. an entry in the catalog that users can add), create a new folder and place your spec(s) there.

mkdir ${specKey}
cd ${specKey}/
vi ${specKey}.json # Paste contents of JSON spec into this file
git add ${specKey}.json

Repeat this step for all dependent services that users will need to run this application.

Add, Commit, and Push all new specs

git commit -a -m "Added new application: ${specKey}"
git push origin ${specKey}

Create a Pull Request

This will create a request to merge the changes from your branch back into the master branch of the official Labs Workbench catalog:

  1. Navigate to https://github.com/YOUR_GIT_USERNAME_HERE/ndslabs-specs

  2. Click on New pull request at the top-left of the file table

  3. Click the compare across forks link at the top-right. This should display some new drop-down menus in the section directly below it.

  4. In the box on the left, choose:

    1. base fork: nds-org/ndslabs-specs

    2. base branch: master

  5. In the box on the right, choose:
    1. head fork: YOUR_GIT_USERNAME_HERE/ndslabs-specs
    2. compare: YOUR_SPEC_KEY_HERE
  6. Click Create pull request below the drop-down menus to create the new pull request
  7. Enter a name / description for this pull request. Please include in your description any relevant links to technical documentation regarding this service.

See https://help.github.com/articles/creating-a-pull-request/ for more information


Development using Labs Workbench