...

Polar Geospatial Center (PGC)

At the University of Minnesota, we chose PGC as an example of a science-driven use case. The Polar Geospatial Center is an NSF/GEO/PLR-funded national center (PLR-1559691) that provides geospatial and logistical support for the NSF-funded polar science community. PGC is also the NSF liaison with the National Geospatial-Intelligence Agency's (NGA) sub-meter commercial imagery program, and it currently holds a global collection of sub-meter imagery of approximately 3.2 PB, comprising 7.8 million scenes covering over 2 billion km², that is growing at a rate of 2-10 TB daily. PGC also works with NGA to coordinate the tasking of three satellites over much of the polar regions to address the science goals of NSF PIs. The recent collaboration between PGC, NSF, and NGA has lowered the cost of sub-meter commercial imagery from tens of thousands of dollars to pennies per square kilometer. The imagery is now provided to PGC at no cost, with the expectation that PGC and NSF provide the infrastructure to retrieve, maintain, and transfer it to the science community. This increased access to a rich dataset fundamentally changes how federally funded research can be done. Almost any researcher would find the imagery useful, for applications as broad as coastline erosion, land use/land cover change, the surface expression of earthquakes, and ice mass balance. PGC provides imagery to NSF-funded researchers working globally in three forms: raw imagery; value-added imagery, including orthorectified images, image mosaics, and digital elevation models; and geospatial web services.

...

  • We plan to investigate how satellite imagery can be efficiently delivered to NCSA Blue Waters for processing, and how the resulting 3D-model data can be stored in a distributed environment.
  • We will investigate how the proposed distributed storage system will enable efficient access by NSF researchers to the imagery and terrain data.

Observational Data from Astronomy

JHU hosts the database for the Sloan Digital Sky Survey (SDSS) project, often called the “Cosmic Genome Project”. The data was one of the first examples of a large, open scientific data set and has been in the public domain for over a decade [Sky02, Sza00]. It is fair to say that the project and its archive have changed astronomy forever. They have shown that a whole community is willing to change its traditional approach and use a virtual telescope, if the data is of high quality and is presented in an intuitive fashion. The continuous evolution and curation of the system over the years has been an intense effort, and it has given us a unique perspective on the challenges involved in the broader problem of operating open archival systems over a decade.

...

  1. Use the calibrated SDSS data to build a new set of false-color images with a custom parametrization and processing. E.g., build a large mosaic in which all objects identified as stars are removed and replaced by blank sky, to create an image of what the extragalactic sky would look like.

  2. Take a significant dynamic subset of the 3 million SDSS spectra, defined by a SQL query, transmit the spectral data to NCSA, and perform a massively parallel machine learning task to classify the given subset of spectra at a much higher resolution. E.g., perform a large-scale PCA, then find the small number of regions in the spectra that contribute the most to the given classification, similar to the so-called Lick indices found heuristically a few decades ago (see the sketch after this list).

  3. Take the data cubes from the SDSS MaNGA instrument (an integral field unit, IFU), which takes a spectrum for every coarse pixel of a given galaxy, together with all multicolor images of the same object, then reconstruct the data cube to the best of both the imaging and spectral resolutions using compressive sensing (formulated below). This is an incredibly compute-intensive task, only possible on a supercomputer.

  4. Take a subset of the simulated LSST data and stream it to multiple locations on demand.
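
As an illustration of use case 2, the following is a minimal sketch, assuming the SQL-selected spectra have already arrived at NCSA as a NumPy array of shape (n_spectra, n_bins); the file name, array layout, and component counts are our own assumptions, not part of the SDSS pipeline.

  # Minimal sketch: PCA over a subset of SDSS spectra (all names hypothetical).
  import numpy as np
  from sklearn.decomposition import PCA

  spectra = np.load("sdss_spectra_subset.npy")       # hypothetical input file
  # Normalize each spectrum so PCA captures spectral shape, not total flux.
  spectra = spectra / np.linalg.norm(spectra, axis=1, keepdims=True)

  pca = PCA(n_components=20)
  coeffs = pca.fit_transform(spectra)                # per-spectrum coefficients

  # Wavelength bins with the largest loadings in the leading components are
  # candidates for Lick-style diagnostic regions.
  leading = np.abs(pca.components_[:5]).sum(axis=0)
  print("most informative bins:", np.sort(np.argsort(leading)[-10:]))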
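
The compressive-sensing reconstruction in use case 3 is commonly posed as a basis-pursuit denoising problem; in generic notation (ours, not taken from the MaNGA pipeline):

  \hat{x} = \arg\min_{x} \|\Psi x\|_1 \quad \text{subject to} \quad \|Ax - b\|_2 \le \epsilon

where x is the full-resolution data cube, b collects the observed IFU and imaging samples, A is the operator mapping the cube to those samples, \Psi is a sparsifying transform, and \epsilon bounds the measurement noise.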

Cross Domain Need for Dataset Publication (NDS, NCSA, SDSC, ...)

MHD Turbulence in Core-Collapse Supernovae
Authors: Philipp Moesta, Christian Ott
Size: 90 TB

...

UIUC/NCSA was a founding member of the Berkeley-Illinois-Maryland (BIMA) millimeter array consortium and of its successor, the Combined Array for Research in Millimeter-wave Astronomy (CARMA). The CARMA consortium included UIUC, UMd, UCB, Caltech, and U. Chicago. NCSA/UIUC serves as the primary archive site and user access center for these two datasets. The CARMA array ceased operations in 2015, primarily due to the advent of the Atacama Large Millimeter Array (ALMA); however, both CARMA and BIMA data remain very useful to the millimeter-interferometry community as both a scientific and a technical repository. The datasets total approximately 50 TB and contain visibility data in custom binary formats, images in the standard FITS format, and auxiliary data in XML and ASCII text-file formats. The data can be processed using community data analysis packages such as MIRIAD and CASA. The point of contact for the archive is Athol Kemball (NCSA).
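
For readers new to these formats, a minimal sketch of inspecting a FITS image product with astropy follows; the file name is a placeholder, and the custom-format visibility data would instead need MIRIAD or CASA.

  # Minimal sketch: open a FITS image product (file name hypothetical).
  from astropy.io import fits

  with fits.open("carma_image.fits") as hdul:
      hdul.info()                          # list the HDUs in the file
      image = hdul[0].data                 # primary image array
      header = hdul[0].header              # observing metadata / WCS keywords
      print(header.get("OBJECT"), image.shape)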

...

This data comes from a suite of high-resolution climate model simulations using the Community Earth System Model (CESM, http://www.cesm.ucar.edu/models/). It consists of three multi-decadal, pre-industrial control simulations featuring different coupling configurations: a) a 25 km atmosphere with specified surface ocean temperatures; b) a 25 km atmosphere coupled to a non-dynamic slab ocean; and c) the 25 km atmosphere coupled to a fully dynamic three-dimensional ocean model with 1-degree horizontal resolution. The output represents the first phase of a comprehensive model sensitivity and climate change experiment focusing on the representation of weather and climate extremes in CESM. The data has broad applications in research areas related to climate change science, Earth system modeling, uncertainty quantification, extreme events, decision support, and risk analysis. The current total output is around 100 TB, in netCDF format. The data includes gridded monthly, daily, and 6-hourly outputs of the key relevant climate/weather variables for the atmosphere, land, ocean, and sea-ice models.
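
As a minimal sketch of how such output is typically consumed, the following opens one netCDF file with xarray; the file and variable names are illustrative assumptions, since actual CESM history files follow the model's own naming conventions.

  # Minimal sketch: inspect a CESM netCDF output file (names hypothetical).
  import xarray as xr

  ds = xr.open_dataset("cesm_atm_monthly.nc")
  print(ds.data_vars)                      # gridded climate variables

  # Example: unweighted global mean of a surface temperature field.
  if "TS" in ds:
      ts_mean = ds["TS"].mean(dim=("lat", "lon"))
      print(ts_mean.values[:12])           # first year of monthly means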

...