diff --git a/README.md b/README.md
index 78c4874c8d6e133bccf5953cc894144a491deb9f..e2a5043ece8ea8ebc83b23abe7cce69b20dd62e2 100644
--- a/README.md
+++ b/README.md
@@ -15,9 +15,9 @@ visit [this](https://gitlab.version.fz-juelich.de/MLDL_FZJ/MLDL_FZJ_Wiki/wikis/E
 
 ### Announcements
 
-1. Tensorflow and Keras examples (with and without Horovod) are now fully functional on JUWELS as well.
-2. Python 2 support has been removed from the tutorial for all frameworks except Caffe.
-3. Even though PyTorch is available as as system-wide module on the JSC supercomputers, all PyTorch
+* TensorFlow and Keras examples (with and without Horovod) are now fully functional on JUWELS as well.
+* Python 2 support has been removed from the tutorial for all frameworks except Caffe.
+* Even though PyTorch is available as a system-wide module on the JSC supercomputers, all PyTorch
 examples have been removed from this tutorial. This is due to the fact that the tutorial developers
 are not currently working with PyTorch, and are therefore not in a position to provide support for
 PyTorch related issues.
@@ -63,7 +63,7 @@ It is worth mentioning that all the code samples were taken from the correspondi
 official samples/tutorials repository, as practitioners are likely familiar with these (links to
 the original code samples are included in the directory-local `README.md`). However, the original
 examples are designed to automatically download the required dataset in a
-framework-defined directory. This is not a feasible option as compute nodes on the supercomputers
+framework-defined directory. This is not a feasible option while working with supercomputers, as compute nodes
 do not have access to the Internet. Therefore, the samples have been slightly modified to load data
 from the `datasets` directory included in this repository; specific code changes, at least for now,
 have been marked by comments prefixed with the `[HPCNS]` tag. For more information see the `README.md`
@@ -211,7 +211,7 @@ configuration.
     `cd keras`
 
 3. Submit the job to run the sample:
-    `bsub < submit_job_juron_python3.sh`
+    `bsub < submit_job_juron.sh`
 
 Please note that unlike JURECA and JUWELS, JURON uses LSF for job submission, which is why a different
 syntax is required for job configuration and submission. Moreover, email notifications are not
@@ -244,7 +244,7 @@ contains samples that utilize distributed training with Keras and Horovod (more
 in the directory-local `README.md`).
 
 Please note that Horovod currently only supports a distribution strategy where the entire model is
-replicated on all GPUs. It is the data that is distributed across the GPUs. If you are interested
+replicated on every GPU. It is the data that is distributed across the GPUs. If you are interested
 in model-parallel training, where the model itself can be split and distributed, a different
 solution is required. We hope to add a sample for model-parallel training at a later time.