Felix issue106 hpc modules for juwels
Merged
Ghost User requested to merge felix_issue106_HPC_modules_for_JUWELS into develop 5 years ago
Setup for HPC systems JUWELS and HDFML
Edited 5 years ago by Ghost User
Compare: develop (base) and latest version cf534648 (57 commits, 5 years ago)
20 files, +600 −41
HPC_setup/create_runscripts_HPC.sh
File mode 0 → 100755, +131 −0
#!/bin/bash -x
# __author__ = Felix Kleinert
# __date__ = '2020-04-30'
# This script creates run scripts for JUWELS or HDFML
# When you call this script directly you can use
# $1 which has to be `juwels' or `hdfml'.
# $2 which is the path where the run scripts should be stored
if [[ $1 != '' ]]; then
  hpcsys=$1
else
  if [[ $HOSTNAME == *"juwels"* ]]; then
    hpcsys="juwels"
  elif [[ $HOSTNAME == *"hdfml"* ]]; then
    hpcsys="hdfml"
  else
    echo "Unknown hpc host \`$HOSTNAME\`. Pass 'juwels' or 'hdfml' as first argument."
    exit 1
  fi
fi
if [[ $2 != '' ]]; then
  cur=$2
else
  cur=$PWD
fi
echo "############################################################"
echo "#                                                          #"
echo "#                user interaction required                 #"
echo "#                                                          #"
echo "############################################################"
echo
echo "This script creates the HPC batch scripts to run mlt on compute nodes on JUWELS or hdfml."
echo "You can modify the created run scripts afterwards if needed."
echo
echo
echo "Creating run script for $hpcsys:"
echo
budget=''
while [[ $budget == '' ]]
do
  echo
  read -p "Enter project budget for --account flag: " budget
done
email=`jutil user show -o json | grep email | cut -f2 -d':' | cut -f1 -d',' | cut -f2 -d'"'`
echo
read -p "Enter e-mail address for --mail-user (default: ${email}): " new_email
if [[ -z "$new_email" ]]; then
  new_email=$email
fi
# create HPC_logging dir
hpclogging="HPC_logging/"
mkdir -p ${cur}/${hpclogging}
# ordering for looping:
# "partition nGPUs timing"
if [[ $hpcsys = "juwels" ]]; then
  for i in "develgpus 2 02:00:00" "gpus 4 08:00:00"; do
    set -- $i
    cat <<EOT > ${cur}/run_${hpcsys}_$1.bash
#!/bin/bash -x
#SBATCH --account=${budget}
#SBATCH --nodes=1
#SBATCH --output=${hpclogging}mlt-out.%j
#SBATCH --error=${hpclogging}mlt-err.%j
#SBATCH --time=$3
#SBATCH --partition=$1
#SBATCH --gres=gpu:$2
#SBATCH --mail-type=ALL
#SBATCH --mail-user=${new_email}
source HPC_setup/mlt_modules_${hpcsys}.sh
source venv_${hpcsys}/bin/activate
timestamp=\`date +"%Y-%m-%d_%H%M-%S"\`
export PYTHONPATH=\${PWD}/venv_${hpcsys}/lib/python3.6/site-packages:\${PYTHONPATH}
srun python run.py --experiment_date=\$timestamp
EOT
    echo "Created runscript: run_${hpcsys}_$1.bash"
  done
elif [[ $hpcsys = "hdfml" ]]; then
  cat <<EOT > ${cur}/run_${hpcsys}_batch.bash
#!/bin/bash -x
#SBATCH --account=${budget}
#SBATCH --nodes=1
#SBATCH --output=${hpclogging}mlt-out.%j
#SBATCH --error=${hpclogging}mlt-err.%j
#SBATCH --time=08:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=${new_email}
source HPC_setup/mlt_modules_${hpcsys}.sh
source venv_${hpcsys}/bin/activate
timestamp=\`date +"%Y-%m-%d_%H%M-%S"\`
export PYTHONPATH=\${PWD}/venv_${hpcsys}/lib/python3.6/site-packages:\${PYTHONPATH}
srun python run.py --experiment_date=\$timestamp
EOT
fi
echo
echo "You have to run the following command on a login node to download data:"
echo "\`python run.py'"
echo
echo "Please execute the following command to check if the setup went well:"
if [[ ${hpcsys} = 'juwels' ]]; then
  echo "\`sbatch run_juwels_develgpus.bash'"
else
  echo "\`sbatch run_hdfml_batch.bash'"
fi
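
Review note: the JUWELS branch of this script relies on two bash idioms that are easy to misread in the diff. `set -- $i` deliberately word-splits each "partition nGPUs timing" string into `$1`, `$2` and `$3`, and the unquoted heredoc expands those values immediately while `\$` and the escaped backticks stay literal so that they are only evaluated when the generated job runs. The following is a minimal standalone sketch of that pattern, not the script itself; the function name `render_header` and the path `venv_demo` are made up for illustration, and the output goes to stdout instead of a file.

```shell
#!/bin/bash
# Sketch only: shows `set --` unpacking plus deferred expansion in a heredoc.
# `render_header` and `venv_demo` are hypothetical names, not part of MLAir.

render_header() {
  # $1 = partition, $2 = number of GPUs, $3 = wall-clock limit
  cat <<EOT
#!/bin/bash -x
#SBATCH --partition=$1
#SBATCH --gres=gpu:$2
#SBATCH --time=$3
timestamp=\`date +"%Y-%m-%d_%H%M-%S"\`
export PYTHONPATH=\${PWD}/venv_demo/lib:\${PYTHONPATH}
EOT
}

for i in "develgpus 2 02:00:00" "gpus 4 08:00:00"; do
  # Unquoted on purpose: word splitting turns the string into $1 $2 $3.
  # This overwrites the caller's positional parameters, which is why the
  # original script copies its own $1 and $2 into variables beforehand.
  set -- $i
  echo "--- header for partition $1 ---"
  render_header "$1" "$2" "$3"
done
```

In the rendered text, the `#SBATCH` lines carry the concrete values from the loop, whereas the `timestamp=` and `PYTHONPATH` lines still contain the backticks and `${...}` references, so the date and working directory are resolved at job start rather than at generation time.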