# Introduction to Scalable Deep Learning

## Basic Info
This repository contains the material for the course Scalable Deep Learning. It is a five-half-day course with two lectures and two tutorials per day, located in the corresponding folders `lectures` and `tutorials`.

All tutorials are set up as Jupyter notebooks, but only some of them are meant to be executed in a notebook; most rely on adapting code in standalone scripts. We therefore strongly suggest cloning the repository on our supercomputer and executing the scripts there. Jupyter-JSC can be used to view and execute the tutorial code.
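The suggested workflow might look like the following sketch. The repository URL, module names, and project account are placeholders, not the actual values used in the course:

```shell
# Hypothetical workflow sketch -- <repository-url>, the module names,
# and <your-project> are placeholders, not actual course values.
git clone <repository-url> scalable-dl-course
cd scalable-dl-course/tutorials

# On a Slurm-based supercomputer, a tutorial script would typically be
# submitted as a batch job rather than run on the login node:
sbatch --account=<your-project> run_tutorial.sh
```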
## Syllabus Outline
### Day 1

#### Lecture 1
- Intro
- Getting started on a supercomputer
- Content: Stefan Kesselheim

#### Tutorial 1
- First Steps on the Supercomputer
- Content: Stefan Kesselheim, Jan Ebert
- Content supervisor: Stefan Kesselheim

#### Lecture 2
- Supercomputer Architecture and MPI Primer
- Content: Stefan Kesselheim

#### Tutorial 2
- Hello MPI World
- Content: Jan Ebert, Stefan Kesselheim
- Content supervisor: Stefan Kesselheim
### Day 2

#### Lecture 1
- Intro Large-Scale Deep Learning - Motivation, Deep Learning Basics Recap
- Content: Jenia Jitsev

#### Tutorial 1
- Deep Learning Basics Recap
- Content: Roshni Kamath, Jenia Jitsev
- Content supervisor: Jenia Jitsev

#### Lecture 2
- Distributed Training and Data Parallelism with Horovod
- Content: Jenia Jitsev

#### Tutorial 2
- Dataset API and Horovod Data Parallel Training Basics
- Content: Jan Ebert, Stefan Kesselheim, Jenia Jitsev
- Content supervisor: Jenia Jitsev
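The core idea behind the data parallelism covered here: each worker computes gradients on its own shard of the data, and the gradients are then averaged across workers, which is what an allreduce-average in Horovod achieves. A minimal pure-Python sketch of the averaging step, with made-up gradient values:

```python
# Conceptual sketch of data-parallel gradient averaging (the effect of
# an allreduce-average across workers); all values are made up.

def allreduce_average(worker_grads):
    """Average per-worker gradient vectors elementwise."""
    num_workers = len(worker_grads)
    return [sum(g) / num_workers for g in zip(*worker_grads)]

# Four hypothetical workers, each with a 3-parameter gradient
grads = [
    [0.5, -1.0, 0.25],
    [0.0, -0.5, 0.25],
    [1.0, -1.5, 0.25],
    [0.5, -1.0, 0.25],
]
avg = allreduce_average(grads)
print(avg)  # [0.5, -1.0, 0.25]
```

After this step every worker holds the same averaged gradient and applies the same update, so the replicas stay in sync.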
### Day 3

#### Lecture 1
- Distributed Training with Large Data and Scaling Behavior
- Content: Jenia Jitsev

#### Tutorial 1
- Distributed Training - Throughput and Scaling
- Content: Mehdi Cherti, Jenia Jitsev
- Content supervisor: Jenia Jitsev
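Throughput measurements like those in this tutorial are commonly summarized as speedup and parallel efficiency. A small sketch with invented throughput numbers (not measurements from the course):

```python
# Speedup and parallel efficiency from measured training throughput;
# the images/sec figures below are invented for illustration.

def speedup(throughput_n, throughput_1):
    """How many times faster N workers are than one worker."""
    return throughput_n / throughput_1

def efficiency(throughput_n, throughput_1, num_workers):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup(throughput_n, throughput_1) / num_workers

t1 = 500.0   # hypothetical single-GPU throughput, images/sec
t8 = 3600.0  # hypothetical 8-GPU throughput, images/sec
print(speedup(t8, t1))        # 7.2x speedup
print(efficiency(t8, t1, 8))  # 90% parallel efficiency
```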
#### Lecture 2
- Is My Code Fast? Performance Analysis
- Content: Stefan Kesselheim

#### Tutorial 2
- Data Pipelines and Performance Analysis
- Content: Jan Ebert, Stefan Kesselheim
- Content supervisor: Stefan Kesselheim
### Day 4

#### Lecture 1
- Distributed Training with Large Data - Combating Accuracy Loss
- Content: Jenia Jitsev

#### Tutorial 1
- Distributed Training with Large Data - Combating Accuracy Loss
- Content: Mehdi Cherti
- Content supervisor: Jenia Jitsev
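One standard remedy in this setting is the linear learning-rate scaling rule with warmup: scale the base learning rate by the number of workers, but ramp up to it over the first few epochs to avoid early divergence. A sketch, with hyperparameter values chosen only for illustration:

```python
# Linear learning-rate scaling with warmup, a common technique for
# large-batch data-parallel training; all numbers are illustrative.

def warmup_lr(base_lr, num_workers, epoch, warmup_epochs=5):
    """Ramp linearly from base_lr up to base_lr * num_workers."""
    target_lr = base_lr * num_workers
    if epoch >= warmup_epochs:
        return target_lr
    # Linear interpolation: base_lr at epoch 0, target_lr at warmup_epochs
    return base_lr + (target_lr - base_lr) * epoch / warmup_epochs

# With base LR 0.1 and 8 workers, the schedule climbs from 0.1 to 0.8
for epoch in range(7):
    print(epoch, warmup_lr(0.1, 8, epoch))
```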
#### Lecture 2
- Advanced Distributed Training and Large-Scale Deep Learning Outlook
- Content: Jenia Jitsev
### Day 5

#### Lecture 1
- Generative Adversarial Networks (GANs) Basics
- Content: Mehdi Cherti

#### Tutorial 1
- Basic GAN Distributed Training Using Horovod
- Content: Mehdi Cherti
- Content supervisor: Mehdi Cherti

#### Lecture 2
- Advanced Generative Adversarial Networks
- Content: Mehdi Cherti

#### Tutorial 2
- Advanced GAN Distributed Training Using Horovod
- Content: Mehdi Cherti
- Content supervisor: Mehdi Cherti