Developing Interactive Parallel Workflows in Python using Parsl

Presenter: Kyle Chard, Fellow and Senior Researcher, University of Chicago and Argonne National Laboratory

Date: October 10, 2018

Slides: https://uofi.box.com/s/va17f5dtsrj9eqn32pjha7sdsaieljxo

Video: https://www.youtube.com/watch?v=vWVN4XbXUgg

Abstract

Python is quickly becoming the predominant programming language used in research. However, it is often challenging to execute Python applications at scale and to develop workflows that integrate a variety of independent Python functions and external applications. Computations that are simple to perform at small scales (e.g., on a laptop) can become prohibitively difficult as data sizes and analysis complexity grow, requiring complex orchestration and management of applications and data as well as customization for specific execution environments. In this webinar we will present Parsl (Parallel Scripting Library), a Python library for programming and executing data-oriented workflows at scale. Parsl is designed to be simple and intuitive: developers annotate Python functions with Parsl directives (to wrap either Python functions or external applications); Parsl then manages the execution of the script, determines dependencies between functions, orchestrates data movement, and executes functions concurrently when their dependencies are met. Parsl separates code from configuration, allowing the same script to be executed seamlessly on laptops, clusters, clouds, grids, and supercomputers.
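The annotate-and-run model described above can be illustrated with a minimal sketch. This is not Parsl's API: it uses only the Python standard library (concurrent.futures), and the decorator name `app` and the functions `square` and `add` are hypothetical names chosen for illustration. The sketch assumes a futures-based design in which calling an annotated function returns immediately, and passing one call's result to another is what expresses the dependency.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Shared thread pool standing in for an execution backend (assumption).
_executor = ThreadPoolExecutor(max_workers=4)


def app(func):
    """Hypothetical decorator: calling the function returns a Future at once."""
    @wraps(func)
    def submit(*args):
        def run():
            # Wait for any upstream futures, then call the real function.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return func(*resolved)
        return _executor.submit(run)
    return submit


@app
def square(x):
    return x * x


@app
def add(a, b):
    return a + b


# square(3) and square(4) do not depend on each other, so they may run
# concurrently; add() proceeds only once both of its inputs are available.
total = add(square(3), square(4))
print(total.result())  # prints 25
```

The script itself never names a thread pool size or a machine in its workflow logic; only the executor line would change to target a different resource, which is the kind of separation between code and configuration the abstract describes.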

We will demonstrate how Parsl can be used to write and execute data-oriented workflows on Blue Waters. We will show how Parsl can be used within a Jupyter notebook to develop scalable parallel workflows and how these workflows can be executed on arbitrary resources with simple configurations. Finally, we will demonstrate how Parsl can automatically stage data using Globus to transparently analyze remotely accessible data.

The webinar is intended for researchers and developers who are interested in interactive and parallel computing, and particularly those with an interest in developing workflows in Python to run on Blue Waters.

Attendees can follow the webinar in a Jupyter notebook or Python script on Blue Waters. A guide to setting up Jupyter notebooks on Blue Waters is available on the Blue Waters website: https://bluewaters.ncsa.illinois.edu/pythonnotebooks. We will also provide a hosted Jupyter environment for those who wish to try Parsl without installing any dependencies locally.

Target audience: Researchers, developers, and scientific teams.
Prerequisites: None.
Training and reference materials:

Biography

Kyle Chard is a Senior Researcher and Fellow in the Computation Institute at the University of Chicago and Argonne National Laboratory. He received his Ph.D. in Computer Science from Victoria University of Wellington in 2011. His research focuses on developing and applying computational and data-intensive approaches to solve scientific problems. He leads the development of Parsl, a parallel scripting library for implementing scalable data-oriented workflows in Python. He is a member of the Globus leadership team, where he co-leads the Globus Labs research group. He also co-leads projects related to scientific reproducibility, elastic and cost-aware use of cloud infrastructure, and research automation.