Open Source Scientific Gateways for Geospatial Data

Presenter: Luigi Marini

November 10, 2020

Abstract

Whether the data come from sensors in the field, external data providers, lab analyses, remote sensing, or images from drones or field scanners, modern virtual observatories require a wide variety of features to aggregate data from multiple sources and support cross-cutting research. We provide an overview of two open source frameworks developed with these requirements in mind.

The Clowder framework has been developed to bridge the needs of long-tail data and big data. The framework provides multiple ways to extend its functionality to new domains and datasets, recognizing that a large quantity of research data is characterized by custom data formats and analytics. Developers can build custom data pipelines by using existing extractors for data preprocessing and on-demand data analysis, as well as by developing new ones. New web-based visualizations can be developed in plain JavaScript. An extensive web API supports the development of custom clients and ingestion pipelines.

The Geostreams data framework provides data management capabilities and web application interfaces for pre-processing, cleaning, and visualization of geospatial and streaming data.

We will draw from a variety of use cases to highlight how these frameworks can be applied and why certain design decisions were made, including: the Terra-Ref platform, which catalogs the output of the Lemnatec Field Scanalyzer in Arizona, the largest high-throughput phenotyping field-scanning robot in the world; the Intensively Managed Landscapes Critical Zone Observatory, which collects environmental data in the Midwest with the aim of understanding the short-term and long-term resilience of the crucial ecological, hydrological, and climatic services provided by the Critical Zone; the Great Lakes to Gulf Virtual Observatory, which aggregates water quality monitoring data from multiple sources along the Mississippi River and its tributaries, with a focus on excess nutrients and hypoxia in the Gulf of Mexico; and the Permafrost Discovery Gateway, an online platform for archiving, processing, analysis, and visualization of permafrost big imagery products.

Biography

Luigi Marini is a Lead Research Programmer at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. He is the software architect of the Clowder data management framework and the Geostreams data platform. He has 20 years of experience developing software for generic cyberinfrastructure and e-Science in a variety of domains, including earth sciences, data curation, clinical informatics, and digital humanities.