# Introduction to Scalable Deep Learning

## Basic Info
This repository contains the material for the course Scalable Deep Learning. It is a five-half-day course with two lectures and two tutorials per day, located in the corresponding folders `lectures` and `tutorials`.

All tutorials are set up as Jupyter notebooks, but only some of them are meant to be executed in a notebook; most rely on adapting code in standalone scripts. We therefore strongly suggest cloning the repository on our supercomputer and executing the scripts there. Jupyter-JSC can be used to view and execute the tutorial code.
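The suggested workflow might look like the following sketch. The repository URL, module names, and project account are placeholders, not the actual values used in the course:

```shell
# Hypothetical workflow sketch -- <repository-url>, the module names,
# and <your-project> are placeholders, not actual course values.
git clone <repository-url> scalable-dl-course
cd scalable-dl-course/tutorials

# On a Slurm-based supercomputer, a tutorial script would typically be
# submitted as a batch job rather than run on the login node:
sbatch --account=<your-project> run_tutorial.sh
```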
## Syllabus Outline
### Day 1

#### Lecture 1
- Intro
- Getting started on a supercomputer
- Content: Stefan Kesselheim

#### Tutorial 1
- First Steps on the Supercomputer
- Content: Stefan Kesselheim, Jan Ebert
- Content supervisor: Stefan Kesselheim

#### Lecture 2
- Supercomputer Architecture and MPI Primer
- Content: Stefan Kesselheim

#### Tutorial 2
- Hello MPI World
- Content: Jan Ebert, Stefan Kesselheim
- Content supervisor: Stefan Kesselheim
### Day 2

#### Lecture 1
- Intro Large-Scale Deep Learning - Motivation, Deep Learning Basics Recap
- Content: Jenia Jitsev

#### Tutorial 1
- Deep Learning Basics Recap
- Content: Roshni Kamath, Jenia Jitsev
- Content supervisor: Jenia Jitsev

#### Lecture 2
- Distributed Training and Data Parallelism with Horovod
- Content: Jenia Jitsev

#### Tutorial 2
- Dataset API and Horovod Data Parallel Training Basics
- Content: Jan Ebert, Stefan Kesselheim, Jenia Jitsev
- Content supervisor: Jenia Jitsev
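The core idea behind the data parallelism covered here: each worker computes gradients on its own shard of the data, and the gradients are then averaged across workers, which is what an allreduce-average in Horovod achieves. A minimal pure-Python sketch of the averaging step, with made-up gradient values:

```python
# Conceptual sketch of data-parallel gradient averaging (the effect of
# an allreduce-average across workers); all values are made up.

def allreduce_average(worker_grads):
    """Average per-worker gradient vectors elementwise."""
    num_workers = len(worker_grads)
    return [sum(g) / num_workers for g in zip(*worker_grads)]

# Four hypothetical workers, each with a 3-parameter gradient
grads = [
    [0.5, -1.0, 0.25],
    [0.0, -0.5, 0.25],
    [1.0, -1.5, 0.25],
    [0.5, -1.0, 0.25],
]
avg = allreduce_average(grads)
print(avg)  # [0.5, -1.0, 0.25]
```

After this step every worker holds the same averaged gradient and applies the same update, so the replicas stay in sync.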
### Day 3

#### Lecture 1
- Distributed Training with Large Data and Scaling Behavior
- Content: Jenia Jitsev

#### Tutorial 1
- Distributed Training - Throughput and Scaling
- Content: Mehdi Cherti, Jenia Jitsev
- Content supervisor: Jenia Jitsev
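Throughput measurements like those in this tutorial are commonly summarized as speedup and parallel efficiency. A small sketch with invented throughput numbers (not measurements from the course):

```python
# Speedup and parallel efficiency from measured training throughput;
# the images/sec figures below are invented for illustration.

def speedup(throughput_n, throughput_1):
    """How many times faster N workers are than one worker."""
    return throughput_n / throughput_1

def efficiency(throughput_n, throughput_1, num_workers):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup(throughput_n, throughput_1) / num_workers

t1 = 500.0   # hypothetical single-GPU throughput, images/sec
t8 = 3600.0  # hypothetical 8-GPU throughput, images/sec
print(speedup(t8, t1))        # 7.2x speedup
print(efficiency(t8, t1, 8))  # 90% parallel efficiency
```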
#### Lecture 2
- Is My Code Fast? Performance Analysis
- Content: Stefan Kesselheim

#### Tutorial 2
- Data Pipelines and Performance Analysis
- Content: Jan Ebert, Stefan Kesselheim
- Content supervisor: Stefan Kesselheim
### Day 4

#### Lecture 1
- Distributed Training with Large Data - Combating Accuracy Loss
- Content: Jenia Jitsev

#### Tutorial 1
- Distributed Training with Large Data - Combating Accuracy Loss
- Content: Mehdi Cherti
- Content supervisor: Jenia Jitsev
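One standard remedy in this setting is the linear learning-rate scaling rule with warmup: scale the base learning rate by the number of workers, but ramp up to it over the first few epochs to avoid early divergence. A sketch, with hyperparameter values chosen only for illustration:

```python
# Linear learning-rate scaling with warmup, a common technique for
# large-batch data-parallel training; all numbers are illustrative.

def warmup_lr(base_lr, num_workers, epoch, warmup_epochs=5):
    """Ramp linearly from base_lr up to base_lr * num_workers."""
    target_lr = base_lr * num_workers
    if epoch >= warmup_epochs:
        return target_lr
    # Linear interpolation: base_lr at epoch 0, target_lr at warmup_epochs
    return base_lr + (target_lr - base_lr) * epoch / warmup_epochs

# With base LR 0.1 and 8 workers, the schedule climbs from 0.1 to 0.8
for epoch in range(7):
    print(epoch, warmup_lr(0.1, 8, epoch))
```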
#### Lecture 2
- Advanced Distributed Training and Large-Scale Deep Learning Outlook
- Content: Jenia Jitsev
### Day 5

#### Lecture 1
- Generative Adversarial Networks (GANs) Basics
- Content: Mehdi Cherti

#### Tutorial 1
- Basic GAN Distributed Training Using Horovod
- Content: Mehdi Cherti
- Content supervisor: Mehdi Cherti

#### Lecture 2
- Advanced Generative Adversarial Networks
- Content: Mehdi Cherti

#### Tutorial 2
- Advanced GAN Distributed Training Using Horovod
- Content: Mehdi Cherti
- Content supervisor: Mehdi Cherti