Commit eb916a0e authored by Fahad Khalid

Minor updates to the main readme.

parent 6864128b
@@ -15,9 +15,9 @@ visit [this](https://gitlab.version.fz-juelich.de/MLDL_FZJ/MLDL_FZJ_Wiki/wikis/E
 ### Announcements
-1. Tensorflow and Keras examples (with and without Horovod) are now fully functional on JUWELS as well.
-2. Python 2 support has been removed from the tutorial for all frameworks except Caffe.
-3. Even though PyTorch is available as a system-wide module on the JSC supercomputers, all PyTorch
+* Tensorflow and Keras examples (with and without Horovod) are now fully functional on JUWELS as well.
+* Python 2 support has been removed from the tutorial for all frameworks except Caffe.
+* Even though PyTorch is available as a system-wide module on the JSC supercomputers, all PyTorch
 examples have been removed from this tutorial. This is because the tutorial developers are not
 currently working with PyTorch, and are therefore not in a position to provide support for
 PyTorch-related issues.
@@ -63,7 +63,7 @@ It is worth mentioning that all the code samples were taken from the correspondi
 official samples/tutorials repository, as practitioners are likely familiar with these (links
 to the original code samples are included in the directory-local `README.md`). However, the
 original examples are designed to automatically download the required dataset in a
-framework-defined directory. This is not a feasible option as compute nodes on the supercomputers
+framework-defined directory. This is not a feasible option while working with supercomputers, as compute nodes
 do not have access to the Internet. Therefore, the samples have been slightly modified to load data from
 the `datasets` directory included in this repository; specific code changes, at least for now,
 have been marked by comments prefixed with the `[HPCNS]` tag. For more information, see the `README.md`
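The hunk above describes redirecting each sample's data loading to the repository-local `datasets` directory. As a rough illustration, here is a minimal, hypothetical sketch of such a modification for a Keras MNIST sample (this is not the repository's actual code; the `DATASETS_DIR` layout and the `mnist.npz` file name are assumptions based on the standard Keras archive format):

```python
import os
import numpy as np

# [HPCNS] Load the dataset from the repository-local `datasets` directory
# instead of letting the framework download it, since compute nodes have
# no Internet access. Directory layout and file name are assumed here.
DATASETS_DIR = os.path.join(
    os.path.dirname(os.path.abspath(__file__)), 'datasets', 'mnist')

def load_mnist():
    # [HPCNS] `mnist.npz` uses the same key names as `keras.datasets.mnist`.
    with np.load(os.path.join(DATASETS_DIR, 'mnist.npz')) as data:
        return ((data['x_train'], data['y_train']),
                (data['x_test'], data['y_test']))

(x_train, y_train), (x_test, y_test) = load_mnist()
```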
@@ -211,7 +211,7 @@ configuration.
 `cd keras`
 3. Submit the job to run the sample:
-`bsub < submit_job_juron_python3.sh`
+`bsub < submit_job_juron.sh`
 Please note that unlike JURECA and JUWELS, JURON uses LSF for job submission, which is why a different
 syntax is required for job configuration and submission. Moreover, email notifications are not
@@ -244,7 +244,7 @@ contains samples that utilize distributed training with Keras and Horovod (more
 in the directory-local `README.md`).
 Please note that Horovod currently only supports a distribution strategy where the entire model is
-replicated on all GPUs. It is the data that is distributed across the GPUs. If you are interested
+replicated on every GPU. It is the data that is distributed across the GPUs. If you are interested
 in model-parallel training, where the model itself can be split and distributed, a different
 solution is required. We hope to add a sample for model-parallel training at a later time.
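To make the replicated-model, distributed-data strategy above concrete, here is a minimal sketch of the usual Horovod/Keras pattern (generic Horovod usage, not code from this repository; the model architecture and optimizer settings are placeholders):

```python
import keras
import horovod.keras as hvd

# Each MPI rank runs this same script and drives one GPU.
hvd.init()

# Data parallelism: the full model is built (replicated) on every rank.
model = keras.models.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dense(10, activation='softmax'),
])

# Scale the learning rate by the worker count, then wrap the optimizer so
# gradients are averaged across all ranks at each step.
optimizer = hvd.DistributedOptimizer(keras.optimizers.SGD(lr=0.01 * hvd.size()))
model.compile(optimizer=optimizer, loss='categorical_crossentropy',
              metrics=['accuracy'])

# Start all replicas from identical weights by broadcasting from rank 0.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
```

Each rank would then call `model.fit` on its own shard of the training data; launching one rank per GPU yields the replicated-model, distributed-data setup described above, whereas splitting the model itself across devices would require a different solution.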