What is the ThinkChicago Workbench?

The ThinkChicago Workbench is a cloud-based service that provides general-purpose development and data analysis environments to help you explore your ideas with the ThinkChicago data. All applications run as Docker containers on a system hosted by the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.

A few guidelines:

Start only the services that you need. Each account has limited resources. You will likely only need to run one or two of the provided services.
For the larger datasets, don't try to read them all at once. You'll need to work with subsets of the data.
If you have questions or problems, post to Slack (https://thinkchicago.slack.com/).

What data is available?

The ThinkChicago Workbench provides access to the following datasets. Many of these datasets are also available via the City of Chicago Data Portal REST API.

Dataset	Description	API	Size (Format)	Records
2FM Tech Challenge	Fleet and Facilities Management (2FM) vehicle and equipment data.	N/A	534 MB CSV
Array of Things Locations	Locations of Array of Things sensor nodes.	REST API	6.5 KB CSV; 28 KB JSON
Crimes 2001 - present	Incidents of crime since 2001.	REST API	1.5 GB CSV	6.39 million
Divvy Trips	Individual Divvy bike sharing trips, including the origin, destination, and timestamps for each trip.	REST API	2.8 GB CSV	11.5 million
Divvy Bicycle Stations (historical)	Historical availability of bicycles and docks to return bicycles at the Divvy stations.	REST API	14 GB CSV	87 million
Taxi Trips	Taxi trips reported to the City of Chicago.	REST API	42 GB CSV	111 million

Each of these datasets is available in the /shared directory of any running application in Workbench.

Note that some of these files are too large to read into memory in full (see the account limits below). Either work with subsets of the data, for example via commands like head -1000, or use the provided REST APIs.
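
As a rough illustration of working with a subset, the Python sketch below reads only the header and the first rows of a CSV file, much like head -1000 does on the command line. The function name read_head and the file name in the usage comment are hypothetical; check the /shared directory for the actual file names.

```python
import csv
from itertools import islice


def read_head(path, n=1000):
    """Return the header and the first n data rows of a CSV file.

    Reading stops after n records, so memory use stays small even
    when the file itself is many gigabytes.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)     # None if the file is empty
        rows = list(islice(reader, n))  # stops after n records
    return header, rows


# Example usage inside a Workbench application (hypothetical file name):
# header, rows = read_head("/shared/crimes.csv", 1000)
```

The same function works in Jupyter or a Cloud9 Python workspace, since it only uses the standard library.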

F.A.Q.

What applications are available?

Cloud9 development environments for popular languages including Python, Java, PHP, and Node.js
Data analysis environments including Jupyter Notebooks and RStudio
Database and data management software including MySQL, PostgreSQL, and MongoDB

What are the limits on my account?

Each user account is limited to 4 cores, 8 GB of RAM, and 10 GB of storage. This means that you will not be able to read the larger datasets into memory in full or run many applications at once.
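
One way to stay within the memory limit while still scanning a whole file is to process it one row at a time. The sketch below is only an illustration of that idea: it counts how often each value appears in a single column. The function name count_values, the file name, and the column name in the usage comment are hypothetical.

```python
import csv
from collections import Counter


def count_values(path, column):
    """Count how often each value appears in one column of a CSV file.

    Rows are read one at a time, so memory use depends on the number
    of distinct values rather than on the size of the file.
    """
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row[column]] += 1
    return counts


# Example usage (hypothetical file and column names):
# counts = count_values("/shared/crimes.csv", "Primary Type")
# print(counts.most_common(10))
```

A full pass over the largest files may still take a while, so it can help to try the code on a small subset first.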

How do I get data on/off of Workbench?

GitHub: We strongly encourage you to use GitHub (or a similar service) to store all of your source code and data.
File Manager: The Workbench File Manager application can be used to upload/download data.

Where can I find the provided datasets?

The provided datasets are in the /shared directory described above. Your home directory is also available in every running application, under /home/<userid>.

Where can I get further help?

Join the ThinkChicago Slack team at https://thinkchicago.slack.com/ and post a message.
Send an email to ndslabs-support@nationaldataservice.org.

Are there any examples available?

Yes. See https://github.com/nds-org/thinkchicago-examples for simple examples to help you get started. They cover common programming languages for Cloud9 as well as RStudio.