Commit f74b48e6 authored by Fahad Khalid

Merge branch 'issue_3' into 'master'

Issue 3

Closes #3

See merge request !3
parents 114bbc18 02a35e3f
@@ -15,6 +15,9 @@ visit [this](https://gitlab.version.fz-juelich.de/MLDL_FZJ/MLDL_FZJ_Wiki/wikis/E
 ### Announcements
+* **November 28, 2019:** Slides and code samples for the "Deep Learning on Supercomputers" talk given
+as part of the [Introduction to the programming and usage of the supercomputer resources at Jülich](https://www.fz-juelich.de/SharedDocs/Termine/IAS/JSC/EN/courses/2019/supercomputer-2019-11.html?nn=944302)
+course are now available in the `course_material` directory.
 * **November 22, 2019:** Samples for Caffe are no longer supported on JURECA due to system-wide
 MVAPICH2 module changes.
 * **November 18, 2019:** The `horovod_data_distributed` directory has been added that contains code
......
# Slides and code samples
The slides and code samples in the sub-directories correspond to the introductory examples presented during the
"Deep Learning on Supercomputers" talk, which is given as
part of the [Introduction to the programming and usage of the supercomputer resources at Jülich](https://www.fz-juelich.de/SharedDocs/Termine/IAS/JSC/EN/courses/2019/supercomputer-2019-11.html?nn=944302)
course.
**Note:** These code samples are NOT designed to work on our supercomputers. To see why, read `datasets/README.md`.
To run code samples on the supercomputers, please follow the main tutorial.
\ No newline at end of file
# Copyright (c) 2019 Forschungszentrum Juelich GmbH.
# This code is licensed under MIT license (see the LICENSE file for details).
# This code is derived from Horovod, which is licensed under the Apache License,
# Version 2.0 (see the NOTICE file for details).
"""
This program is an adaptation of the following code sample:
https://github.com/horovod/horovod/blob/master/examples/keras_mnist.py.
The program creates and trains a shallow ANN for handwritten digit
classification using the MNIST dataset.

The Horovod framework is used for seamless distributed training. In this
example epochs are distributed across the Horovod ranks, not data.

To run this sample use the following command on your
workstation/laptop equipped with a GPU:

    mpirun -np 1 python -u mnist_epoch_distributed.py

If you have more than one GPU on your system, you can increase the
number of ranks accordingly.

The code has been tested with Python 3.7.5, tensorflow-gpu 1.13.1, and
horovod 0.16.2.

Note: This code will NOT work on the supercomputers.
"""
import math
import tensorflow as tf
import horovod.tensorflow.keras as hvd
from tensorflow.python.keras import backend as K
# Horovod: initialize Horovod.
hvd.init()
# Horovod: pin GPU to be used to process local rank (one GPU per process)
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())
K.set_session(tf.Session(config=config))
# Reference to the MNIST dataset
mnist = tf.keras.datasets.mnist
# Load the MNIST dataset, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Normalize input samples
x_train, x_test = x_train / 255.0, x_test / 255.0
# Define the model, i.e., the network
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
# Optimizer
optimizer = tf.keras.optimizers.Adam()
# Decorate the optimizer with Horovod's distributed optimizer
optimizer = hvd.DistributedOptimizer(optimizer)
# Horovod: adjust number of epochs based on number of GPUs.
epochs = int(math.ceil(4.0 / hvd.size()))
# Compile the model
model.compile(
    optimizer=optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# Training callbacks
callbacks = [
    # Horovod: broadcast initial variable states from rank 0 to all other processes.
    # This is necessary to ensure consistent initialization of all workers when
    # training is started with random weights or restored from a checkpoint.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0)
]
# Train the model using the training set
model.fit(
    x=x_train,
    y=y_train,
    batch_size=32,
    epochs=epochs,
    verbose=1 if hvd.rank() == 0 else 0,
    callbacks=callbacks
)
# Run the test on the root rank only
if hvd.rank() == 0:
    # Test the model on the test set
    score = model.evaluate(x=x_test, y=y_test, verbose=0)
    print(f'Test loss: {score[0]}')
    print(f'Test accuracy: {score[1]}')
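The epoch split in `mnist_epoch_distributed.py` can be checked without Horovod or a GPU. The sketch below reproduces only the arithmetic of `int(math.ceil(4.0 / hvd.size()))`; the helper name `epochs_per_rank` and its arguments are hypothetical and stand in for the hard-coded `4.0` and for `hvd.size()`.

```python
import math


def epochs_per_rank(total_epochs: int, num_ranks: int) -> int:
    """Epochs each rank runs when total_epochs is split across ranks.

    Hypothetical helper: same formula as the script, with the baseline
    epoch count and the rank count passed in as plain integers.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    return int(math.ceil(total_epochs / num_ranks))


if __name__ == "__main__":
    for ranks in (1, 2, 3, 4, 8):
        per_rank = epochs_per_rank(4, ranks)
        print(f"ranks={ranks}: {per_rank} epoch(s) per rank, "
              f"{per_rank * ranks} epoch passes in total")
```

Because the result is rounded up, rank counts that do not divide 4 evenly run more passes in total than the single-process baseline: three ranks run two epochs each, and eight ranks still run one epoch each.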
# Copyright (c) 2019 Forschungszentrum Juelich GmbH.
# This code is licensed under MIT license (see the LICENSE file for details).
# This code is derived from Tensorflow tutorials, which is licensed under the Apache License,
# Version 2.0 (see the NOTICE file for details).
"""
This program is an adaptation of the code sample available at
https://www.tensorflow.org/tutorials/. The program creates
and trains a shallow ANN for handwritten digit classification
using the MNIST dataset.

To run this sample use the following command on your
workstation/laptop equipped with a GPU:

    python -u mnist.py

The code has been tested with Python 3.7.5 and tensorflow-gpu 1.13.1.

Note: This code will NOT work on the supercomputers.
"""
import tensorflow as tf
# Reference to the MNIST data object
mnist = tf.keras.datasets.mnist
# Load the MNIST dataset, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Normalize input samples
x_train, x_test = x_train / 255.0, x_test / 255.0
# Define the model, i.e., the network
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
# Optimizer
optimizer = tf.keras.optimizers.Adam()
# No. of epochs
epochs = 4
# Compile the model
model.compile(
    optimizer=optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# Train the model using the training set
model.fit(
    x=x_train,
    y=y_train,
    batch_size=32,
    epochs=epochs,
    verbose=1
)
# Test the model using the test set
score = model.evaluate(x=x_test, y=y_test, verbose=0)
print(f'Test loss: {score[0]}')
print(f'Test accuracy: {score[1]}')
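To get a feel for how small the "shallow ANN" in both samples is, the trainable parameter count can be worked out by hand. The sketch below assumes the standard 28x28 MNIST image size; the helper `dense_params` is a hypothetical name and not part of the samples.

```python
def dense_params(inputs: int, units: int) -> int:
    """Weights plus biases of a fully connected (Dense) layer."""
    return inputs * units + units


# Flatten turns each 28x28 image into a vector and has no parameters.
flattened = 28 * 28

hidden = dense_params(flattened, 512)  # Dense(512, relu)
output = dense_params(512, 10)         # Dense(10, softmax)
total = hidden + output

print(f"hidden layer: {hidden} parameters")
print(f"output layer: {output} parameters")
print(f"total:        {total} parameters")
```

Once the model has been built by `model.fit`, calling `model.summary()` should report the same total.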
File added
@@ -22,9 +22,6 @@ class DataValidator:
     """
-    def __init__(self):
-        """ No-op constructor. """
     @staticmethod
     def validated_data_dir(filename):
         """
@@ -32,15 +29,9 @@ class DataValidator:
         recognized input data directory locations. If the check is passed,
         returns the fully qualified path to the input data directory.
-        Parameters
-        ----------
-        filename:
-            Name of the data file to be checked
+        :param filename: Name of the data file to be checked.
-        Returns
-        -------
-        string:
-            Fully qualified path to the input data directory
+        :return: str. Fully qualified path to the input data directory.
         """
......
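The body of `validated_data_dir` is not shown in this diff, so the following is only an illustrative sketch of the behaviour its docstring describes, written with the reST-style field list that the commit switches to. The function name `find_data_dir`, the `candidate_dirs` parameter, and the choice of `FileNotFoundError` are assumptions, not the repository's actual implementation.

```python
import os
import tempfile


def find_data_dir(filename, candidate_dirs):
    """
    Returns the first candidate directory that contains the given file.

    :param filename: Name of the data file to be checked.
    :param candidate_dirs: Directories to search, in order (hypothetical).

    :return: str. Fully qualified path to the input data directory.
    :raises FileNotFoundError: If no candidate directory contains the file.
    """
    for directory in candidate_dirs:
        if os.path.isfile(os.path.join(directory, filename)):
            return os.path.abspath(directory)
    raise FileNotFoundError(
        f"{filename} not found in any of: {list(candidate_dirs)}"
    )


if __name__ == "__main__":
    # Demonstrate the lookup with a throwaway directory and an empty file.
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "mnist.npz"), "w").close()
        print(find_data_dir("mnist.npz", ["/path/that/does/not/exist", tmp]))
```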