From 713cda0373f6373815136a9ffad48bd755c3323d Mon Sep 17 00:00:00 2001 From: Fahad Khalid <f.khalid@fz-juelich.de> Date: Mon, 2 Sep 2019 13:51:23 +0200 Subject: [PATCH] 1) Removed PyTorch samples. 2) Updated README files. 3) Verified that all the training scripts are up to date with those available in the corresponding framework repos. --- README.md | 51 +++--- caffe/README.md | 2 +- horovod/README.md | 19 +- horovod/keras/mnist.py | 2 +- horovod/pytorch/.submit_job_jureca.sh | 22 --- horovod/pytorch/.submit_job_juwels.sh | 22 --- horovod/pytorch/mnist.py | 200 ---------------------- horovod/pytorch/run_on_localMachine.sh | 8 - horovod/pytorch/submit_job_juron.sh | 20 --- horovod/pytorch/synthetic_benchmark.py | 110 ------------ horovod/tensorflow/mnist.py | 2 +- horovod/tensorflow/synthetic_benchmark.py | 4 +- keras/README.md | 2 +- pytorch/.submit_job_juwels.sh | 20 --- pytorch/README.md | 13 -- pytorch/mnist.py | 151 ---------------- pytorch/run_on_localMachine.sh | 4 - pytorch/submit_job_jureca.sh | 20 --- pytorch/submit_job_juron.sh | 18 -- tensorflow/README.md | 2 +- 20 files changed, 38 insertions(+), 654 deletions(-) delete mode 100755 horovod/pytorch/.submit_job_jureca.sh delete mode 100755 horovod/pytorch/.submit_job_juwels.sh delete mode 100644 horovod/pytorch/mnist.py delete mode 100644 horovod/pytorch/run_on_localMachine.sh delete mode 100644 horovod/pytorch/submit_job_juron.sh delete mode 100644 horovod/pytorch/synthetic_benchmark.py delete mode 100755 pytorch/.submit_job_juwels.sh delete mode 100644 pytorch/README.md delete mode 100644 pytorch/mnist.py delete mode 100644 pytorch/run_on_localMachine.sh delete mode 100755 pytorch/submit_job_jureca.sh delete mode 100644 pytorch/submit_job_juron.sh diff --git a/README.md b/README.md index 9938dad..78c4874 100644 --- a/README.md +++ b/README.md @@ -1,23 +1,26 @@ # Getting started with Deep Learning on Supercomputers This repository is intended to serve as a tutorial for anyone interested in utilizing the supercomputers -available at the JSC for deep learning based projects. It is assumed that the reader is proficient in one or -more of the following frameworks: +available at the Jülich Supercomputing Centre (JSC) for deep learning based projects. It is assumed that +the reader is proficient in one or more of the following frameworks: * [Tensorflow](https://www.tensorflow.org/) * [Keras](https://keras.io/) -* [PyTorch](https://pytorch.org/) -* [Caffe](http://caffe.berkeleyvision.org/) * [Horovod](https://github.com/horovod/horovod) +* [Caffe](http://caffe.berkeleyvision.org/) (limited support) **Note:** This tutorial is by no means intended as an introduction to deep learning, or to any of the above mentioned frameworks. If you are interested in educational resources for beginners, please visit [this](https://gitlab.version.fz-juelich.de/MLDL_FZJ/MLDL_FZJ_Wiki/wikis/Education) page. -## Announcements +### Announcements 1. Tensorflow and Keras examples (with and without Horovod) are now fully functional on JUWELS as well. 2. Python 2 support has been removed from the tutorial for all frameworks except Caffe. +3. Even though PyTorch is available as a system-wide module on the JSC supercomputers, all PyTorch +examples have been removed from this tutorial. This is because the tutorial +developers are not currently working with PyTorch, and are therefore not in a position to provide +support for PyTorch-related issues. 
# Table of contents <!-- TOC --> @@ -93,20 +96,23 @@ Otherwise please join the `PADC` and `CPADC` projects. ### 4.1 JURECA and JUWELS -Following are the steps required to login (more information -[here](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JURECA/UserInfo/QuickIntroduction.html?nn=1803700)). +Following are the steps required to log in (more information: +[JURECA](https://apps.fz-juelich.de/jsc/hps/jureca/access.html#access), +[JUWELS](https://apps.fz-juelich.de/jsc/hps/juwels/access.html#access)). -1. Use SSH to login: +1. Log in via SSH using one of the following commands, depending on your target system: - `ssh <username>@jureca.fz-juelich.de` + `ssh <username>@jureca.fz-juelich.de` or `ssh <username>@juwels.fz-juelich.de` 2. Upon successful login, activate your project environment: `jutil env activate -p <name of compute project> -A <name of budget>` - **Note:** To view a list of all project and budget names available to you, please use the following command: `jutil user projects -o columns`. - Under the column titled "project", all names that start with the prefix "c" are compute projects, and + **Note:** To view a list of all project and budget names available to you, please use the following command: + `jutil user projects -o columns`. Each name under the column titled "project" has a corresponding type under the + column titled "project-type". All projects with "project-type" "C" are compute projects, and can be used in the `<name of compute project>` field for the command above. The `<name of budget>` field should then - contain the corresponding name under the "budgets" column. Please click [here](http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/NewUsageModel/NewUsageModel_node.html) + contain the corresponding name under the "budgets" column. Please click [here]( + http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/NewUsageModel/NewUsageModel_node.html) for more information. 3. Change to the project directory: @@ -207,7 +213,7 @@ configuration. `bsub < submit_job_juron_python3.sh` -Please note that unlike JURECA, JURON uses LSF for job submission, which is why a different +Please note that unlike JURECA and JUWELS, JURON uses LSF for job submission, which is why a different syntax is required for job configuration and submission. Moreover, email notifications are not supported on JURON. For more information on how to use LSF on JURON, use the following command: @@ -218,18 +224,20 @@ configuration. ## 7. Python 2 support -All the code samples are compatible with both Python 2 and Python 3. However, not all frameworks on all -machines are available for Python 2 (yet); in certain cases these are only available for Python 3. We have -included separate job submission scripts for Python 2 and Python 3. In cases where Python 2 is not -supported, only the job submission script for Python 3 is available. We will try our best to make -all frameworks available with Python 2 as well, but this will not be a priority as the official support -for Python 2 will be discontinued in the year 2020. +As the official support for Python 2 will be discontinued in 2020, we decided to encourage our +users to make the switch to Python 3 now. This also enables us to provide better support for +Python 3 based modules, as we no longer have to spend time maintaining Python 2 modules. + +The only exception is Caffe, as on JURECA it is available with Python 2 only. 
Please note, however, that +unlike on JURON, on JURECA Caffe is only available in Stage 2018b, i.e., one of the previous stages. +We do not intend to provide support for Caffe from Stage 2019a onward, as +Caffe is no longer under active development. ## 8. Distributed training [Horovod](https://github.com/horovod/horovod) provides a simple and efficient solution for training artificial neural networks on multiple GPUs across multiple nodes in a cluster. It can -be used with Tensorflow, Keras, and PyTorch (some other frameworks are supported as well, but +be used with Tensorflow and Keras (some other frameworks are supported as well, but not Caffe). In this repository, the `horovod` directory contains further sub-directories; one for each compatible framework that has been tested. E.g., there is a `keras` sub-directory that contains samples that utilize distributed training with Keras and Horovod (more information is available @@ -251,4 +259,5 @@ directory-local `README.md` for further information. * **Created by:** Fahad Khalid (SLNS/HPCNS, JSC) * **Installation of modules on JURON:** Andreas Herten (HPCNS, JSC) * **Installation of modules on JURECA:** Damian Alvarez (JSC), Rajalekshmi Deepu (SLNS/HPCNS, JSC) -* **Initial review/suggestions/testing:** Kai Krajsek (SLNS/HPCNS, JSC), Tabea Kirchner (SLNS/HPCNS, JSC) +* **Review/suggestions/testing:** Kai Krajsek (SLNS/HPCNS, JSC), Tabea Kirchner (SLNS/HPCNS, JSC), +Susanne Wenzel (INM-1) diff --git a/caffe/README.md b/caffe/README.md index dffc7af..941c3d6 100644 --- a/caffe/README.md +++ b/caffe/README.md @@ -38,6 +38,6 @@ results in the generation of a learning curve plot in the current directory. Working with custom C++ layers requires recompiling Caffe with the custom code. As this is not possible with a system-wide installation, we have decided not to include an example of this use case. Nevertheless, if you must work with custom -C++ layers and require assistance, please send an email to the mailing list +C++ layers and require assistance, please send an email to the JULAIN mailing list (more information [here](https://lists.fz-juelich.de/mailman/listinfo/ml)). diff --git a/horovod/README.md b/horovod/README.md index a06e588..3d63a23 100644 --- a/horovod/README.md +++ b/horovod/README.md @@ -2,7 +2,7 @@ All source code samples were taken from the Horovod examples repository [here](https://github.com/uber/horovod/tree/master/examples) -(last checked: February 19, 2019). The samples that work with MNIST data have been +(last checked: September 02, 2019). The samples that work with MNIST data have been slightly modified. Our changes are limited to, * The data loading mechanism @@ -22,23 +22,6 @@ for distributed training. 2. `mnist_advanced.py`: This sample is primarily the same as `mnist.py`. However, a few more advanced Horovod features are used. -## PyTorch samples - -**Note:** PyTorch samples currently DO NOT work on JURECA and JUWELS. These -do however work on JURON. - -The following PyTorch samples are included: - -1. `mnist.py`: Demonstrates distributed training using Horovod with PyTorch. A -simple convolutional neural network is trained on the MNIST dataset. -2. `synthetic_benchmark.py`: A benchmark that can be used to measure performance -of PyTorch with Horovod without using any external dataset. - -**Note:** The job scripts for JURECA and JUWELS are prefixed with `.` for these samples, so that -these scripts do not appear in the directory listing. 
The reason for doing this is -that our testing revealed issues with multi-node training. As soon as the issue has -been resolved, we'll make the scripts available. - ## Tensorflow samples The following Tensorflow samples are included: diff --git a/horovod/keras/mnist.py b/horovod/keras/mnist.py index 85dd944..e31aa8a 100644 --- a/horovod/keras/mnist.py +++ b/horovod/keras/mnist.py @@ -104,7 +104,7 @@ model.fit(x_train, y_train, batch_size=batch_size, callbacks=callbacks, epochs=epochs, - verbose=1, + verbose=1 if hvd.rank() == 0 else 0, validation_data=(x_test, y_test)) score = model.evaluate(x_test, y_test, verbose=0) print('Test loss:', score[0]) diff --git a/horovod/pytorch/.submit_job_jureca.sh b/horovod/pytorch/.submit_job_jureca.sh deleted file mode 100755 index 1afd801..0000000 --- a/horovod/pytorch/.submit_job_jureca.sh +++ /dev/null @@ -1,22 +0,0 @@ -#!/usr/bin/env bash - -# Slurm job configuration -#SBATCH --nodes=2 -#SBATCH --ntasks=4 -#SBATCH --ntasks-per-node=2 -#SBATCH --output=output_%j.out -#SBATCH --error=error_%j.er -#SBATCH --time=00:10:00 -#SBATCH --job-name=HOROVOD_PYTORCH_MNIST -#SBATCH --gres=gpu:2 --partition=develgpus -#SBATCH --mail-type=ALL - -# Load the required modules -module load GCC/7.3.0 -module load MVAPICH2/2.3-GDR -module load PyTorch/1.0.0-GPU-Python-3.6.6 -module load torchvision/0.2.1-GPU-Python-3.6.6 -module load Horovod/0.15.2-GPU-Python-3.6.6 - -# Run the program -srun python -u mnist.py diff --git a/horovod/pytorch/.submit_job_juwels.sh b/horovod/pytorch/.submit_job_juwels.sh deleted file mode 100755 index 5070055..0000000 --- a/horovod/pytorch/.submit_job_juwels.sh +++ /dev/null @@ -1,22 +0,0 @@ -#!/usr/bin/env bash - -# Slurm job configuration -#SBATCH --nodes=2 -#SBATCH --ntasks=8 -#SBATCH --ntasks-per-node=4 -#SBATCH --output=output_%j.out -#SBATCH --error=error_%j.er -#SBATCH --time=00:10:00 -#SBATCH --job-name=HOROVOD_PYTORCH_MNIST -#SBATCH --gres=gpu:4 --partition=develgpus -#SBATCH --mail-type=ALL - -# Load the required modules -module load GCC/8.3.0 -module load MVAPICH2/2.3.1-GDR -module load PyTorch/1.1.0-GPU-Python-3.6.8 -module load torchvision/0.3.0-GPU-Python-3.6.8 -module load Horovod/0.16.2-GPU-Python-3.6.8 - -# Run the program -srun python -u mnist.py diff --git a/horovod/pytorch/mnist.py b/horovod/pytorch/mnist.py deleted file mode 100644 index 4f43193..0000000 --- a/horovod/pytorch/mnist.py +++ /dev/null @@ -1,200 +0,0 @@ -from __future__ import print_function -import os -import sys -import shutil -import argparse -import torch.nn as nn -import torch.nn.functional as F -import torch.optim as optim -from torchvision import datasets, transforms -import torch.utils.data.distributed -import horovod.torch as hvd - -# Training settings -parser = argparse.ArgumentParser(description='PyTorch MNIST Example') -parser.add_argument('--batch-size', type=int, default=64, metavar='N', - help='input batch size for training (default: 64)') -parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N', - help='input batch size for testing (default: 1000)') -parser.add_argument('--epochs', type=int, default=10, metavar='N', - help='number of epochs to train (default: 10)') -parser.add_argument('--lr', type=float, default=0.01, metavar='LR', - help='learning rate (default: 0.01)') -parser.add_argument('--momentum', type=float, default=0.5, metavar='M', - help='SGD momentum (default: 0.5)') -parser.add_argument('--no-cuda', action='store_true', default=False, - help='disables CUDA training') -parser.add_argument('--seed', type=int, 
default=42, metavar='S', - help='random seed (default: 42)') -parser.add_argument('--log-interval', type=int, default=10, metavar='N', - help='how many batches to wait before logging training status') -parser.add_argument('--fp16-allreduce', action='store_true', default=False, - help='use fp16 compression during allreduce') -args = parser.parse_args() -args.cuda = not args.no_cuda and torch.cuda.is_available() - -# [HPCNS] Import the DataValidator, which can then be used to -# validate and load the path to the already downloaded dataset. -sys.path.insert(0, '../../utils') -from data_utils import DataValidator - -# [HPCNS] Name of the dataset file -data_file = 'mnist/pytorch/data' - -# [HPCNS] Path to the directory containing the dataset file -data_dir = DataValidator.validated_data_dir(data_file) - -# Horovod: initialize library. -hvd.init() -torch.manual_seed(args.seed) - -if args.cuda: - # Horovod: pin GPU to local rank. - torch.cuda.set_device(hvd.local_rank()) - torch.cuda.manual_seed(args.seed) - -# Horovod: limit # of CPU threads to be used per worker. -torch.set_num_threads(1) - -# [HPCNS] Fully qualified dataset file name -dataset_file = os.path.join(data_dir, data_file) - -# [HPCNS] Dataset filename for this rank -dataset_root_for_rank = 'MNIST-data-{}'.format(hvd.rank()) -dataset_for_rank = dataset_root_for_rank + '/MNIST' - -# [HPCNS] If the path already exists, remove it -if os.path.exists(dataset_for_rank): - shutil.rmtree(dataset_for_rank) - -# [HPCNS] Make a copy of the dataset for this rank -shutil.copytree(dataset_file, dataset_for_rank) - -kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {} -train_dataset = \ - datasets.MNIST(dataset_root_for_rank, train=True, download=False, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])) -# Horovod: use DistributedSampler to partition the training data. -train_sampler = torch.utils.data.distributed.DistributedSampler( - train_dataset, num_replicas=hvd.size(), rank=hvd.rank()) -train_loader = torch.utils.data.DataLoader( - train_dataset, batch_size=args.batch_size, sampler=train_sampler, **kwargs) - -test_dataset = \ - datasets.MNIST(dataset_root_for_rank, train=False, download=False, transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])) -# Horovod: use DistributedSampler to partition the test data. -test_sampler = torch.utils.data.distributed.DistributedSampler( - test_dataset, num_replicas=hvd.size(), rank=hvd.rank()) -test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=args.test_batch_size, - sampler=test_sampler, **kwargs) - - -class Net(nn.Module): - def __init__(self): - super(Net, self).__init__() - self.conv1 = nn.Conv2d(1, 10, kernel_size=5) - self.conv2 = nn.Conv2d(10, 20, kernel_size=5) - self.conv2_drop = nn.Dropout2d() - self.fc1 = nn.Linear(320, 50) - self.fc2 = nn.Linear(50, 10) - - def forward(self, x): - x = F.relu(F.max_pool2d(self.conv1(x), 2)) - x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2)) - x = x.view(-1, 320) - x = F.relu(self.fc1(x)) - x = F.dropout(x, training=self.training) - x = self.fc2(x) - return F.log_softmax(x) - - -model = Net() - -if args.cuda: - # Move model to GPU. - model.cuda() - -# Horovod: scale learning rate by the number of GPUs. -optimizer = optim.SGD(model.parameters(), lr=args.lr * hvd.size(), - momentum=args.momentum) - -# Horovod: broadcast parameters & optimizer state. 
-hvd.broadcast_parameters(model.state_dict(), root_rank=0) -hvd.broadcast_optimizer_state(optimizer, root_rank=0) - -# Horovod: (optional) compression algorithm. -compression = hvd.Compression.fp16 if args.fp16_allreduce else hvd.Compression.none - -# Horovod: wrap optimizer with DistributedOptimizer. -optimizer = hvd.DistributedOptimizer(optimizer, - named_parameters=model.named_parameters(), - compression=compression) - - -def train(epoch): - model.train() - # Horovod: set epoch to sampler for shuffling. - train_sampler.set_epoch(epoch) - for batch_idx, (data, target) in enumerate(train_loader): - if args.cuda: - data, target = data.cuda(), target.cuda() - optimizer.zero_grad() - output = model(data) - loss = F.nll_loss(output, target) - loss.backward() - optimizer.step() - if batch_idx % args.log_interval == 0: - # Horovod: use train_sampler to determine the number of examples in - # this worker's partition. - print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format( - epoch, batch_idx * len(data), len(train_sampler), - 100. * batch_idx / len(train_loader), loss.item())) - - -def metric_average(val, name): - tensor = torch.tensor(val) - avg_tensor = hvd.allreduce(tensor, name=name) - return avg_tensor.item() - - -def test(): - model.eval() - test_loss = 0. - test_accuracy = 0. - for data, target in test_loader: - if args.cuda: - data, target = data.cuda(), target.cuda() - output = model(data) - # sum up batch loss - test_loss += F.nll_loss(output, target, size_average=False).item() - # get the index of the max log-probability - pred = output.data.max(1, keepdim=True)[1] - test_accuracy += pred.eq(target.data.view_as(pred)).cpu().float().sum() - - # Horovod: use test_sampler to determine the number of examples in - # this worker's partition. - test_loss /= len(test_sampler) - test_accuracy /= len(test_sampler) - - # Horovod: average metric values across workers. - test_loss = metric_average(test_loss, 'avg_loss') - test_accuracy = metric_average(test_accuracy, 'avg_accuracy') - - # Horovod: print output only on first rank. - if hvd.rank() == 0: - print('\nTest set: Average loss: {:.4f}, Accuracy: {:.2f}%\n'.format( - test_loss, 100. 
* test_accuracy)) - - -for epoch in range(1, args.epochs + 1): - train(epoch) - test() - -# [HPCNS] Remove the copied dataset -shutil.rmtree(dataset_root_for_rank) diff --git a/horovod/pytorch/run_on_localMachine.sh b/horovod/pytorch/run_on_localMachine.sh deleted file mode 100644 index 9c9afb4..0000000 --- a/horovod/pytorch/run_on_localMachine.sh +++ /dev/null @@ -1,8 +0,0 @@ -#!/usr/bin/env bash - -# Run the program -mpirun -np 1 -H localhost:1 \ - -bind-to none -map-by slot \ - -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \ - -mca pml ob1 -mca btl ^openib \ - python -u mnist.py diff --git a/horovod/pytorch/submit_job_juron.sh b/horovod/pytorch/submit_job_juron.sh deleted file mode 100644 index 126c939..0000000 --- a/horovod/pytorch/submit_job_juron.sh +++ /dev/null @@ -1,20 +0,0 @@ -#!/usr/bin/env bash - -#BSUB -q normal -#BSUB -W 10 -#BSUB -n 4 -#BSUB -R "span[ptile=2]" -#BSUB -gpu "num=2" -#BSUB -e "error.%J.er" -#BSUB -o "output_%J.out" -#BSUB -J PYTORCH_HOROVOD_MNIST - -# Load the required modules -module load python/3.6.1 -module load pytorch/1.0.1-gcc_5.4.0-cuda_10.0.130 -module load torchvision/0.2.1 -module load horovod/0.15.2 - -# Run the program -mpirun -bind-to none -map-by slot -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH \ - -x PATH -mca pml ob1 -mca btl ^openib python -u mnist.py diff --git a/horovod/pytorch/synthetic_benchmark.py b/horovod/pytorch/synthetic_benchmark.py deleted file mode 100644 index e7a177f..0000000 --- a/horovod/pytorch/synthetic_benchmark.py +++ /dev/null @@ -1,110 +0,0 @@ -from __future__ import print_function - -import argparse -import torch.backends.cudnn as cudnn -import torch.nn.functional as F -import torch.optim as optim -import torch.utils.data.distributed -from torchvision import models -import horovod.torch as hvd -import timeit -import numpy as np - -# Benchmark settings -parser = argparse.ArgumentParser(description='PyTorch Synthetic Benchmark', - formatter_class=argparse.ArgumentDefaultsHelpFormatter) -parser.add_argument('--fp16-allreduce', action='store_true', default=False, - help='use fp16 compression during allreduce') - -parser.add_argument('--model', type=str, default='resnet50', - help='model to benchmark') -parser.add_argument('--batch-size', type=int, default=32, - help='input batch size') - -parser.add_argument('--num-warmup-batches', type=int, default=10, - help='number of warm-up batches that don\'t count towards benchmark') -parser.add_argument('--num-batches-per-iter', type=int, default=10, - help='number of batches per benchmark iteration') -parser.add_argument('--num-iters', type=int, default=10, - help='number of benchmark iterations') - -parser.add_argument('--no-cuda', action='store_true', default=False, - help='disables CUDA training') - -args = parser.parse_args() -args.cuda = not args.no_cuda and torch.cuda.is_available() - -hvd.init() - -if args.cuda: - # Horovod: pin GPU to local rank. - torch.cuda.set_device(hvd.local_rank()) - -cudnn.benchmark = True - -# Set up standard model. -model = getattr(models, args.model)() - -if args.cuda: - # Move model to GPU. - model.cuda() - -optimizer = optim.SGD(model.parameters(), lr=0.01) - -# Horovod: (optional) compression algorithm. -compression = hvd.Compression.fp16 if args.fp16_allreduce else hvd.Compression.none - -# Horovod: wrap optimizer with DistributedOptimizer. -optimizer = hvd.DistributedOptimizer(optimizer, - named_parameters=model.named_parameters(), - compression=compression) - -# Horovod: broadcast parameters & optimizer state. 
-hvd.broadcast_parameters(model.state_dict(), root_rank=0) -hvd.broadcast_optimizer_state(optimizer, root_rank=0) - -# Set up fixed fake data -data = torch.randn(args.batch_size, 3, 224, 224) -target = torch.LongTensor(args.batch_size).random_() % 1000 -if args.cuda: - data, target = data.cuda(), target.cuda() - - -def benchmark_step(): - optimizer.zero_grad() - output = model(data) - loss = F.cross_entropy(output, target) - loss.backward() - optimizer.step() - - -def log(s, nl=True): - if hvd.rank() != 0: - return - print(s, end='\n' if nl else '') - - -log('Model: %s' % args.model) -log('Batch size: %d' % args.batch_size) -device = 'GPU' if args.cuda else 'CPU' -log('Number of %ss: %d' % (device, hvd.size())) - -# Warm-up -log('Running warmup...') -timeit.timeit(benchmark_step, number=args.num_warmup_batches) - -# Benchmark -log('Running benchmark...') -img_secs = [] -for x in range(args.num_iters): - time = timeit.timeit(benchmark_step, number=args.num_batches_per_iter) - img_sec = args.batch_size * args.num_batches_per_iter / time - log('Iter #%d: %.1f img/sec per %s' % (x, img_sec, device)) - img_secs.append(img_sec) - -# Results -img_sec_mean = np.mean(img_secs) -img_sec_conf = 1.96 * np.std(img_secs) -log('Img/sec per %s: %.1f +-%.1f' % (device, img_sec_mean, img_sec_conf)) -log('Total img/sec on %d %s(s): %.1f +-%.1f' % - (hvd.size(), device, hvd.size() * img_sec_mean, hvd.size() * img_sec_conf)) diff --git a/horovod/tensorflow/mnist.py b/horovod/tensorflow/mnist.py index da37944..8099f1c 100644 --- a/horovod/tensorflow/mnist.py +++ b/horovod/tensorflow/mnist.py @@ -118,7 +118,7 @@ def main(_): predict, loss = conv_model(image, label, tf.estimator.ModeKeys.TRAIN) # Horovod: adjust learning rate based on number of GPUs. - opt = tf.train.RMSPropOptimizer(0.001 * hvd.size()) + opt = tf.train.AdamOptimizer(0.001 * hvd.size()) # Horovod: add Horovod Distributed Optimizer. opt = hvd.DistributedOptimizer(opt) diff --git a/horovod/tensorflow/synthetic_benchmark.py b/horovod/tensorflow/synthetic_benchmark.py index abbdd20..ee401a5 100644 --- a/horovod/tensorflow/synthetic_benchmark.py +++ b/horovod/tensorflow/synthetic_benchmark.py @@ -69,8 +69,8 @@ target = tf.random_uniform([args.batch_size, 1], minval=0, maxval=999, dtype=tf. def loss_function(): - logits = model(data, training=True) - return tf.losses.sparse_softmax_cross_entropy(target, logits) + probs = model(data, training=True) + return tf.losses.sparse_softmax_cross_entropy(target, probs) def log(s, nl=True): diff --git a/keras/README.md b/keras/README.md index 598f4e1..4e8462d 100644 --- a/keras/README.md +++ b/keras/README.md @@ -3,7 +3,7 @@ The `mnist.py` sample is a slightly modified version of `mnist_cnn.py` available in the Keras examples repository [here](https://github.com/keras-team/keras/tree/master/examples) -(last checked: February 19, 2019). Our changes are +(last checked: September 02, 2019). 
Our changes are limited to, * The data loading mechanism diff --git a/pytorch/.submit_job_juwels.sh b/pytorch/.submit_job_juwels.sh deleted file mode 100755 index 15f53ac..0000000 --- a/pytorch/.submit_job_juwels.sh +++ /dev/null @@ -1,20 +0,0 @@ -#!/usr/bin/env bash - -# Slurm job configuration -#SBATCH --nodes=1 -#SBATCH --ntasks=1 -#SBATCH --ntasks-per-node=1 -#SBATCH --output=output_%j.out -#SBATCH --error=error_%j.er -#SBATCH --time=00:10:00 -#SBATCH --job-name=PYTORCH_MNIST -#SBATCH --gres=gpu:1 --partition=develgpus -#SBATCH --mail-type=ALL - -# Load the required modules -module load GCC/8.3.0 -module load PyTorch/1.1.0-GPU-Python-3.6.8 -module load torchvision/0.3.0-GPU-Python-3.6.8 - -# Run the program -srun python -u mnist.py diff --git a/pytorch/README.md b/pytorch/README.md deleted file mode 100644 index ac1ac2f..0000000 --- a/pytorch/README.md +++ /dev/null @@ -1,13 +0,0 @@ -# Notes - -The `mnist.py` sample is a slightly modified version of `main.py` -available in the PyTorch examples repository -[here](https://github.com/pytorch/examples/tree/master/mnist) -(last checked: February 19, 2019). Our changes are -limited to, - -* The data loading mechanism -* A bit of code cleanup -* A few additional comments pertaining to our custom data loading mechanism - -**Note:** All newly added statements follow a comment beginning with `[HPCNS]`. \ No newline at end of file diff --git a/pytorch/mnist.py b/pytorch/mnist.py deleted file mode 100644 index 19bcac0..0000000 --- a/pytorch/mnist.py +++ /dev/null @@ -1,151 +0,0 @@ -from __future__ import print_function - -import os -import sys -import shutil -import argparse -import torch -import torch.nn as nn -import torch.nn.functional as F -import torch.optim as optim -from torchvision import datasets, transforms - -# [HPCNS] Import the DataValidator, which can then be used to -# validate and load the path to the already downloaded dataset. -sys.path.insert(0, '../utils') -from data_utils import DataValidator - - -class Net(nn.Module): - def __init__(self): - super(Net, self).__init__() - self.conv1 = nn.Conv2d(1, 20, 5, 1) - self.conv2 = nn.Conv2d(20, 50, 5, 1) - self.fc1 = nn.Linear(4 * 4 * 50, 500) - self.fc2 = nn.Linear(500, 10) - - def forward(self, x): - x = F.relu(self.conv1(x)) - x = F.max_pool2d(x, 2, 2) - x = F.relu(self.conv2(x)) - x = F.max_pool2d(x, 2, 2) - x = x.view(-1, 4 * 4 * 50) - x = F.relu(self.fc1(x)) - x = self.fc2(x) - return F.log_softmax(x, dim=1) - - -def train(args, model, device, train_loader, optimizer, epoch): - model.train() - for batch_idx, (data, target) in enumerate(train_loader): - data, target = data.to(device), target.to(device) - optimizer.zero_grad() - output = model(data) - loss = F.nll_loss(output, target) - loss.backward() - optimizer.step() - if batch_idx % args.log_interval == 0: - print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format( - epoch, batch_idx * len(data), len(train_loader.dataset), - 100. 
* batch_idx / len(train_loader), loss.item())) - - -def test(args, model, device, test_loader): - model.eval() - test_loss = 0 - correct = 0 - with torch.no_grad(): - for data, target in test_loader: - data, target = data.to(device), target.to(device) - output = model(data) - test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss - pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability - correct += pred.eq(target.view_as(pred)).sum().item() - - test_loss /= len(test_loader.dataset) - - print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format( - test_loss, correct, len(test_loader.dataset), - 100. * correct / len(test_loader.dataset))) - - -def main(): - # Training settings - parser = argparse.ArgumentParser(description='PyTorch MNIST Example') - parser.add_argument('--batch-size', type=int, default=64, metavar='N', - help='input batch size for training (default: 64)') - parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N', - help='input batch size for testing (default: 1000)') - parser.add_argument('--epochs', type=int, default=10, metavar='N', - help='number of epochs to train (default: 10)') - parser.add_argument('--lr', type=float, default=0.01, metavar='LR', - help='learning rate (default: 0.01)') - parser.add_argument('--momentum', type=float, default=0.5, metavar='M', - help='SGD momentum (default: 0.5)') - parser.add_argument('--no-cuda', action='store_true', default=False, - help='disables CUDA training') - parser.add_argument('--seed', type=int, default=1, metavar='S', - help='random seed (default: 1)') - parser.add_argument('--log-interval', type=int, default=10, metavar='N', - help='how many batches to wait before logging training status') - - parser.add_argument('--save-model', action='store_true', default=False, - help='For Saving the current Model') - args = parser.parse_args() - use_cuda = not args.no_cuda and torch.cuda.is_available() - - torch.manual_seed(args.seed) - - device = torch.device("cuda" if use_cuda else "cpu") - - # [HPCNS] Name of the dataset file - data_file = 'mnist/pytorch/data' - - # [HPCNS] Path to the directory containing the dataset file - data_dir = DataValidator.validated_data_dir(data_file) - - # [HPCNS] Fully qualified dataset file name - dataset_file = os.path.join(data_dir, data_file) - - # [HPCNS] A copy of the dataset in the current directory - dataset_copy = 'MNIST' - - # [HPCNS] If the path already exists, remove it - if os.path.exists(dataset_copy): - shutil.rmtree(dataset_copy) - - # [HPCNS] Make a copy of the dataset, as the torch data loader used - # below expects the dataset in the current directory - shutil.copytree(dataset_file, dataset_copy) - - kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {} - train_loader = torch.utils.data.DataLoader( - datasets.MNIST('', train=True, download=False, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=args.batch_size, shuffle=True, **kwargs) - test_loader = torch.utils.data.DataLoader( - datasets.MNIST('', train=False, download=False, transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=args.test_batch_size, shuffle=True, **kwargs) - - model = Net().to(device) - optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum) - - for epoch in range(1, args.epochs + 1): - train(args, model, device, train_loader, optimizer, epoch) - 
test(args, model, device, test_loader) - - if (args.save_model): - torch.save(model.state_dict(), "mnist_cnn.pt") - - # [HPCNS] Remove the copied dataset - shutil.rmtree(dataset_copy) - - -if __name__ == '__main__': - main() diff --git a/pytorch/run_on_localMachine.sh b/pytorch/run_on_localMachine.sh deleted file mode 100644 index 9c5737c..0000000 --- a/pytorch/run_on_localMachine.sh +++ /dev/null @@ -1,4 +0,0 @@ -#!/usr/bin/env bash - -# Run the program -python -u mnist.py \ No newline at end of file diff --git a/pytorch/submit_job_jureca.sh b/pytorch/submit_job_jureca.sh deleted file mode 100755 index 15f53ac..0000000 --- a/pytorch/submit_job_jureca.sh +++ /dev/null @@ -1,20 +0,0 @@ -#!/usr/bin/env bash - -# Slurm job configuration -#SBATCH --nodes=1 -#SBATCH --ntasks=1 -#SBATCH --ntasks-per-node=1 -#SBATCH --output=output_%j.out -#SBATCH --error=error_%j.er -#SBATCH --time=00:10:00 -#SBATCH --job-name=PYTORCH_MNIST -#SBATCH --gres=gpu:1 --partition=develgpus -#SBATCH --mail-type=ALL - -# Load the required modules -module load GCC/8.3.0 -module load PyTorch/1.1.0-GPU-Python-3.6.8 -module load torchvision/0.3.0-GPU-Python-3.6.8 - -# Run the program -srun python -u mnist.py diff --git a/pytorch/submit_job_juron.sh b/pytorch/submit_job_juron.sh deleted file mode 100644 index 061139f..0000000 --- a/pytorch/submit_job_juron.sh +++ /dev/null @@ -1,18 +0,0 @@ -#!/usr/bin/env bash - -#BSUB -q normal -#BSUB -W 10 -#BSUB -n 1 -#BSUB -R "span[ptile=1]" -#BSUB -gpu "num=1" -#BSUB -e "error.%J.er" -#BSUB -o "output_%J.out" -#BSUB -J PYTORCH_MNIST - -# Load the required modules -module load python/3.6.1 -module load pytorch/1.0.1-gcc_5.4.0-cuda_10.0.130 -module load torchvision/0.2.1 - -# Run the program -python -u mnist.py \ No newline at end of file diff --git a/tensorflow/README.md b/tensorflow/README.md index cbf4854..3bf439c 100644 --- a/tensorflow/README.md +++ b/tensorflow/README.md @@ -3,7 +3,7 @@ The `mnist.py` sample is a slightly modified version of `convolutional.py` available in the Tensorflow models repository [here](https://github.com/tensorflow/models/blob/master/tutorials/image/mnist) -(last checked: February 19, 2019). Our changes are +(last checked: September 02, 2019). Our changes are limited to, * The data loading mechanism -- GitLab
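For quick reference, below is a minimal sketch of the Horovod-with-Keras pattern that the retained `horovod/keras` samples follow, and which the `verbose=1 if hvd.rank() == 0 else 0` change in `horovod/keras/mnist.py` above ties into. The toy model and random stand-in data are placeholders and not part of the repository; the Horovod calls assume the `horovod.keras` API with a TensorFlow 1.x backend, as provided by the modules loaded in the job scripts.

```python
# Minimal sketch (not part of the repository) of distributed Keras training
# with Horovod. The tiny model and random data below are hypothetical; the
# real samples build a CNN and load MNIST via the repository's DataValidator.
import numpy as np
import keras
import tensorflow as tf
import horovod.keras as hvd

hvd.init()  # one Horovod process per GPU, started via mpirun/srun

# Pin each worker to the GPU matching its local rank (TF 1.x style session).
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.visible_device_list = str(hvd.local_rank())
keras.backend.set_session(tf.Session(config=config))

# Stand-in data; the actual samples use the pre-downloaded MNIST dataset.
x_train = np.random.rand(512, 784).astype('float32')
y_train = keras.utils.to_categorical(np.random.randint(0, 10, 512), 10)

model = keras.models.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dense(10, activation='softmax'),
])

# Scale the learning rate by the number of workers, then wrap the optimizer
# so gradients are averaged across workers via allreduce.
opt = keras.optimizers.Adadelta(1.0 * hvd.size())
opt = hvd.DistributedOptimizer(opt)
model.compile(loss='categorical_crossentropy', optimizer=opt,
              metrics=['accuracy'])

callbacks = [
    # Broadcast initial weights from rank 0 so all workers start identically.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

model.fit(x_train, y_train,
          batch_size=64,
          epochs=1,
          callbacks=callbacks,
          # Print training progress on rank 0 only, mirroring the
          # verbose change applied in horovod/keras/mnist.py.
          verbose=1 if hvd.rank() == 0 else 0)
```

Launched with one process per GPU (e.g. `srun python -u mnist.py` from the Slurm scripts, or via `mpirun` as in the local run scripts), each worker trains with the wrapped optimizer while Horovod averages the gradients across workers after every batch.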