Presenter: Volodymyr Kindratenko, NCSA, University of Illinois
June 7-18, 2021
Materials and Videos: https://bluewaters.ncsa.illinois.edu/NFI/Webinars/cuda
Abstract
This CUDA parallel programming tutorial focuses on developing applications for NVIDIA GPUs. Computational thinking, forms of parallelism, programming model features, mapping computations to parallel hardware, efficient data structures, paradigms for efficient parallel algorithms, and hardware features and limitations will be covered. Specific topics will include: the CUDA parallel execution model, the CUDA memory model, locality, constant cache, shared memory, atomic operations, tiled matrix multiplication, 1D and 2D convolution kernels, reduction trees, parallel scan, histogramming, sparse matrix algorithms, task parallelism, and asynchronous data transfer. Good working knowledge of C/C++ is required.
There will be hands-on exercises using the HAL computing system at the University of Illinois. Upon registering you will receive instructions for gaining access to the HAL system. Please follow these instructions in advance of the first session.
Schedule
Day 1, Monday, June 7 – 8 am-noon Central time / 9 am to 1 pm Eastern time
- Introduction: Slides
- Introduction to Parallel Computing and CUDA: Slides and Video
- CUDA Parallel Execution Model: Slides and Video
- Lab 0
- Lab 1: Vector Addition
- Lab 2: Simple Matrix Multiply
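The Day 1 material centers on the CUDA execution model: a grid of thread blocks, with each thread computing its own global index. A minimal sketch of the kind of kernel Lab 1 builds toward (names such as `vecAdd` and the 256-thread block size are illustrative choices, not taken from the lab handout):

```cuda
#include <cuda_runtime.h>

// Each thread computes one element of C = A + B.
__global__ void vecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)               // guard: the last block may extend past n
        C[i] = A[i] + B[i];
}

// Host-side launch: enough 256-thread blocks to cover n elements.
// vecAdd<<<(n + 255) / 256, 256>>>(d_A, d_B, d_C, n);
```

The boundary check matters because the grid size is rounded up to a whole number of blocks, so some threads may fall past the end of the array.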
Day 2, Wednesday, June 9 – 8 am-noon Central time / 9 am to 1 pm Eastern time
- CUDA Memory Model: Slides and Video
- Locality and Tiled Matrix Multiplication: Slides and Video
- Generalized Tiling and DRAM Bandwidth: Slides and Video
- Lab 3: Tiled Matrix Multiply
Where to learn more about NVIDIA’s Nsight: Nsight training
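Tiling, the core idea of Day 2, stages sub-blocks of the input matrices in shared memory so each global-memory load is reused by every thread in the block. A sketch of a tiled matrix-multiply kernel in the spirit of Lab 3 (the 16x16 tile size and the kernel name are assumptions for illustration):

```cuda
#define TILE 16

// C = A * B for square n x n row-major matrices. Each global load is
// reused TILE times from shared memory, cutting DRAM traffic.
__global__ void tiledMatMul(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        // Cooperatively load one tile of A and one tile of B,
        // padding with zeros at the matrix boundary.
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();                 // wait until the tile is fully loaded

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                 // wait before overwriting the tile
    }
    if (row < n && col < n)
        C[row * n + col] = acc;
}
```

The two `__syncthreads()` calls bracket each tile phase: no thread may start multiplying before the tile is loaded, and none may load the next tile while others still read the current one.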
Day 3, Friday, June 11 – 8 am-noon Central time / 9 am to 1 pm Eastern time
- Convolution concept (1D and 2D); 1D Basic Convolution Kernel, and constant cache: Slides and Video
- 2D Tiled Convolution Kernel and Constant Memory: Slides and Video
- Shared Memory Data Reuse and Memory Bandwidth Benefit Analysis for 1D and 2D Tiled Convolution Kernels: Slides and Video
- Lab 4: 3D Convolution
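Day 3 pairs convolution with the constant cache: the mask is small, read-only, and accessed by every thread, so it belongs in `__constant__` memory. A sketch of the basic 1D kernel discussed on this day (mask width and names are illustrative):

```cuda
#define MASK_WIDTH 5
__constant__ float M[MASK_WIDTH];   // mask in constant memory, cached on-chip

// Basic 1D convolution: each thread computes one output element.
__global__ void conv1D(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    int start = i - MASK_WIDTH / 2;       // center the mask on element i
    for (int j = 0; j < MASK_WIDTH; ++j) {
        int idx = start + j;
        if (idx >= 0 && idx < n)          // "ghost" elements outside count as 0
            acc += in[idx] * M[j];
    }
    out[i] = acc;
}

// Host side: copy the mask into constant memory before launching.
// cudaMemcpyToSymbol(M, h_mask, MASK_WIDTH * sizeof(float));
```

The tiled variants covered in the later lectures additionally stage input elements in shared memory so neighboring threads reuse each other's loads.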
Day 4, Monday, June 14 – 8 am-noon Central time / 9 am to 1 pm Eastern time
- Reduction Tree: Slides and Video
- Parallel Scan (Prefix-Sum) – Kogge-Stone: Slides and Video
- Parallel Scan – Brent-Kung: Slides and Video
- Lab 5.1: List Reduction
- Lab 5.2: Scan
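The reduction tree of Day 4 sums N elements in log2(N) steps by halving the number of active threads each iteration. A sketch of a per-block sum reduction of the kind Lab 5.1 targets (block size and names are assumptions):

```cuda
#define BLOCK_SIZE 256

// Sum reduction within one block: a tree of pairwise additions in
// shared memory. Each step halves the number of active threads.
__global__ void blockReduce(const float *in, float *partial, int n) {
    __shared__ float sdata[BLOCK_SIZE];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (i < n) ? in[i] : 0.0f;   // pad out-of-range slots with 0
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            sdata[tid] += sdata[tid + stride];
        __syncthreads();                   // all adds at this level must finish
    }
    if (tid == 0)                          // one partial sum per block
        partial[blockIdx.x] = sdata[0];
}
```

The per-block partial sums are then reduced again (by a second kernel launch or on the host) to produce the final total; the Kogge-Stone and Brent-Kung scans apply the same tree idea to prefix sums.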
Day 5, Wednesday, June 16 – 8 am-noon Central time / 9 am to 1 pm Eastern time
- Histogramming and Atomic Operations: Slides and Video
- Sparse Matrix: Slides and Video
- Sparse Matrix – part 2: Slides and Video
- Lab 6: Histogramming
- Lab 7: Sparse Matrix Multiply
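Histogramming exposes the output-interference problem of Day 5: many threads may increment the same bin, so updates must be atomic. A common optimization, privatization, gives each block its own shared-memory histogram and merges it into global memory at the end. A sketch (bin count and names are illustrative):

```cuda
#define NUM_BINS 256

// Privatized histogram: each block accumulates into a shared-memory
// copy, then merges into the global histogram with atomicAdd.
__global__ void histogram(const unsigned char *data, unsigned int *bins, int n) {
    __shared__ unsigned int local[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        local[b] = 0;                     // clear the private copy
    __syncthreads();

    // Grid-stride loop: contention is confined to fast shared memory.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&local[data[i]], 1u);
    __syncthreads();

    // One atomic per bin per block into the global result.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&bins[b], local[b]);
}
```

Privatization reduces the number of conflicting atomics on global memory from one per input element to one per bin per block.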
Day 6, Friday, June 18 – 8 am-noon Central time / 9 am to 1 pm Eastern time
- GPU as a part of the PC Architecture: Slides and Video
- Task parallelism and asynchronous data transfer: Slides and Video
- Course wrap-up: Slides and Video
- Lab 8: Streams
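Day 6's theme is overlapping data transfer with computation: operations issued into different CUDA streams can run concurrently, so copies for one chunk can proceed while a kernel processes another. A sketch of the pattern behind Lab 8 (`process` stands in for any per-chunk kernel; chunk counts and sizes are illustrative, and async copies require pinned host memory allocated with `cudaMallocHost`):

```cuda
// Split the input into chunks; issue each chunk's H2D copy, kernel,
// and D2H copy into one of two streams so transfers overlap compute.
cudaStream_t s[2];
for (int i = 0; i < 2; ++i)
    cudaStreamCreate(&s[i]);

for (int c = 0; c < numChunks; ++c) {
    cudaStream_t st = s[c % 2];          // alternate between the streams
    int off = c * chunkSize;
    cudaMemcpyAsync(d_in + off, h_in + off, chunkSize * sizeof(float),
                    cudaMemcpyHostToDevice, st);
    process<<<chunkSize / 256, 256, 0, st>>>(d_in + off, d_out + off, chunkSize);
    cudaMemcpyAsync(h_out + off, d_out + off, chunkSize * sizeof(float),
                    cudaMemcpyDeviceToHost, st);
}
cudaDeviceSynchronize();                 // wait for all streams to drain
```

Within a stream the three operations for a chunk still run in order; the overlap comes from different streams' work executing concurrently on the copy engines and SMs.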
Prerequisites: Good working knowledge of C/C++ is required.
Biography
Dr. Kindratenko is a Senior Research Scientist at the National Center for Supercomputing Applications, an Adjunct Associate Professor in the Department of Electrical and Computer Engineering, and a Research Associate Professor in the Department of Computer Science at the University of Illinois. He received a D.Sc. degree from the University of Antwerp, Belgium, in 1997, having previously graduated from the State Pedagogical University, Kirovograd, Ukraine, in 1993. Dr. Kindratenko's research interests include high-performance computing, special-purpose computing architectures, and machine learning. He serves as a department editor of IEEE Computing in Science and Engineering magazine and an associate editor of the International Journal of Reconfigurable Computing. Dr. Kindratenko's work has been funded by NSF, NASA, ONR, DOE, and industry. He has published over 70 papers in refereed scientific journals and conference proceedings and holds 4 US patents. He is a Senior Member of IEEE and ACM.