HPC4NS / dl_on_supercomputers
Commit eb916a0e, authored 5 years ago by Fahad Khalid
Minor updates to the main readme.
Parent: 6864128b
Showing 1 changed file: README.md, with 6 additions and 6 deletions.
@@ -15,9 +15,9 @@ visit [this](https://gitlab.version.fz-juelich.de/MLDL_FZJ/MLDL_FZJ_Wiki/wikis/E
 ### Announcements
-1. Tensorflow and Keras examples (with and without Horovod) are now fully functional on JUWELS as well.
-2. Python 2 support has been removed from the tutorial for all frameworks except Caffe.
-3. Even though PyTorch is available as a system-wide module on the JSC supercomputers, all PyTorch
+* Tensorflow and Keras examples (with and without Horovod) are now fully functional on JUWELS as well.
+* Python 2 support has been removed from the tutorial for all frameworks except Caffe.
+* Even though PyTorch is available as a system-wide module on the JSC supercomputers, all PyTorch
 examples have been removed from this tutorial. This is due to the fact that the tutorial
 developers are not currently working with PyTorch, and are therefore not in a position to provide
 support for PyTorch related issues.
@@ -63,7 +63,7 @@ It is worth mentioning that all the code samples were taken from the correspondi
 official samples/tutorials repository, as practitioners are likely familiar with these (links
 to the original code samples are included in the directory-local `README.md`). However, the
 original examples are designed to automatically download the required dataset in a
-framework-defined directory. This is not a feasible option as compute nodes on the supercomputers
+framework-defined directory. This is not a feasible option while working with supercomputers as compute nodes
 do not have access to the Internet. Therefore, the samples have been slightly modified to load data from
 the `datasets` directory included in this repository; specific code changes, at least for now,
 have been marked by comments prefixed with the `[HPCNS]` tag. For more information see the `README.md`
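The hunk above describes samples that load data from the repository-local `datasets` directory instead of downloading it. A minimal sketch of that idea follows; the helper name `local_dataset_path` and the file layout under `datasets` are assumptions made for illustration and are not taken from the repository.

```python
from pathlib import Path


# [HPCNS] Hypothetical helper (not part of the tutorial code): resolve a
# dataset inside the repository-local `datasets` directory instead of
# letting the framework download it.
def local_dataset_path(repo_root, relative_name):
    path = Path(repo_root) / "datasets" / relative_name
    if not path.exists():
        # Compute nodes have no Internet access, so fail early with a clear
        # message rather than falling back to a download.
        raise FileNotFoundError(f"Dataset not found: {path}")
    return path
```

A sample would then pass the returned path to the framework's own loading routine. Which routine that is depends on the framework and is documented in the directory-local `README.md` files.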
@@ -211,7 +211,7 @@ configuration.
 `cd keras`
 3. Submit the job to run the sample:
-`bsub < submit_job_juron_python3.sh`
+`bsub < submit_job_juron.sh`
 Please note that unlike JURECA and JUWELS, JURON uses LSF for job submission, which is why a different
 syntax is required for job configuration and submission. Moreover, email notifications are not
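To illustrate the difference in submission syntax mentioned in this hunk, here is a small sketch that builds the submission command per scheduler. It assumes that JURECA and JUWELS use Slurm (`sbatch`), which the hunk itself does not state. The function names are hypothetical.

```python
import subprocess


def build_submit_command(scheduler, script):
    """Return (argv, stdin_path) for submitting `script` to `scheduler`."""
    if scheduler == "lsf":
        # LSF reads the job script from standard input: bsub < script
        return ["bsub"], script
    if scheduler == "slurm":
        # Assumption: Slurm systems take the script as an argument.
        return ["sbatch", script], None
    raise ValueError(f"Unknown scheduler: {scheduler}")


def submit(scheduler, script):
    """Run the submission command; needs the scheduler CLI on the PATH."""
    argv, stdin_path = build_submit_command(scheduler, script)
    if stdin_path is None:
        return subprocess.run(argv, check=True)
    with open(stdin_path, "rb") as handle:
        return subprocess.run(argv, stdin=handle, check=True)
```

On JURON, `submit("lsf", "submit_job_juron.sh")` would be equivalent to the `bsub < submit_job_juron.sh` command shown in the hunk.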
@@ -244,7 +244,7 @@ contains samples that utilize distributed training with Keras and Horovod (more
 in the directory-local `README.md`).
 Please note that Horovod currently only supports a distribution strategy where the entire model is
-replicated on all GPUs. It is the data that is distributed across the GPUs. If you are interested
+replicated on every GPU. It is the data that is distributed across the GPUs. If you are interested
 in model-parallel training, where the model itself can be split and distributed, a different
 solution is required. We hope to add a sample for model-parallel training at a later time.
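The data-parallel strategy described in this hunk (same model on every GPU, data split across GPUs) can be sketched in plain Python. This is a conceptual illustration only and does not use Horovod's API: the one-parameter model, the function names, and the simple averaging that stands in for the gradient allreduce are all assumptions.

```python
def gradient(w, xs, ys):
    """Gradient of the mean squared error for the model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_gradient(w, xs, ys, num_workers):
    # Every worker holds the same parameter w (full model replication)
    # but only sees its own shard of the data.
    shard_grads = []
    for rank in range(num_workers):
        shard_x = xs[rank::num_workers]
        shard_y = ys[rank::num_workers]
        shard_grads.append(gradient(w, shard_x, shard_y))
    # Averaging the per-worker gradients stands in for the allreduce step.
    # With equally sized shards this equals the full-batch gradient.
    return sum(shard_grads) / num_workers


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = gradient(0.5, xs, ys)
parallel = data_parallel_gradient(0.5, xs, ys, num_workers=2)
```

Because each worker applies the same averaged gradient, the replicas stay identical after every update. Model-parallel training, by contrast, would split `w` itself across workers, which this sketch does not cover.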