Presenter: Aaron Saxton, National Center for Supercomputing Applications, University of Illinois
Tuesday, October 6, 2020
There are many ways to parallelize computational workflows on HPC systems like Blue Waters, and each comes with risks and benefits. The holy grail of parallelizing ML workflows is distributed training. In this process the model is copied, the data is partitioned, and both are loaded onto multiple nodes to increase the scale at which we can train. In this talk, we start with the hypothesis that all data can be embedded in some lower-dimensional manifold; this is called the geometric interpretation of data. Since gradient methods are the primary algorithms for optimizing ML models on training data, the geometric interpretation lets us visualize and gain insight into the challenges the optimization faces. In particular, we will explore how these challenges manifest when training at scale. There is no good general theory of ML training, but this talk will give practitioners some intuitive tools to improve their models and push the limits of scaling.
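The "copy the model, partition the data" pattern described above can be sketched in a few lines. This is a hypothetical, single-process simulation (not code from the talk): each "worker" holds a shard of the data, computes a gradient on its shard, and the gradients are averaged before every update, as an all-reduce would do across real HPC nodes.

```python
import numpy as np

# Minimal sketch of data-parallel SGD on a least-squares model.
# All names here (true_w, n_workers, etc.) are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w

n_workers = 4
X_shards = np.array_split(X, n_workers)  # partition the data
y_shards = np.array_split(y, n_workers)

w = np.zeros(4)  # every worker starts from the same model copy
lr = 0.1
for step in range(200):
    # each worker computes the gradient of 0.5*mean((Xw - y)^2) on its shard
    grads = [Xs.T @ (Xs @ w - ys) / len(ys)
             for Xs, ys in zip(X_shards, y_shards)]
    # "all-reduce": average the per-worker gradients, then update the copy
    w -= lr * np.mean(grads, axis=0)
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the parallel run follows the same optimization trajectory as a single-node run; the scaling challenges the talk discusses arise when this equivalence is stretched (large effective batch sizes, asynchrony, communication cost).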
Aaron Saxton is a Data Scientist in the Blue Waters project office at the National Center for Supercomputing Applications (NCSA). His current interests are machine learning, data, and migrating popular data/ML techniques to HPC environments. His career has shifted back and forth between industry and academia. Before joining NCSA he was a data scientist and founding member of the agricultural data company Agrible Inc., where he participated in crop model development and customer-facing deployment. Before that, Aaron worked at Neustar Inc., the University of Kentucky, and SAIC. In the summer of 2014, shortly after joining Neustar, Aaron earned his PhD in Mathematics from the University of Kentucky, studying Partial Differential Equations, Operator Theory, and the Schrödinger equation.