Supercomputing Environment Template using Python Virtual Environments
TLDR
This repo contains an example of how to easily create a consistent working environment that can be used in the "normal" supercomputer workflow as well as in Jupyter-JSC.
There are two flavors of this type of repo:
- The "normal" version based on environment modules and a python virtual environment
- The Singularity version, where a Singularity container replaces the module environment. Check out the singularity branch.
The user creates a working environment as a fork of this repo, develops it along with the requirements of their project, and finally archives or publishes it jointly with the results of the research project.
HowTo Environment Modules
- Clone the repo into a folder with the name of the environment module: git clone /path/to/repo ./my_favourite_environment
- Edit modules.sh to select the modules to load prior to creating a virtual environment
- Edit requirements.txt to select the packages to be installed via pip
- Create the environment
  - execute setup.sh to create the virtual environment
  - execute create_kernel.sh to create a kernel for Jupyter-JSC
- Execute source activate.sh to enter the environment
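The creation steps above can be sketched as a small shell helper. This is only an illustration: build_environment is a hypothetical function that is not part of the repo, and it assumes the folder has already been cloned and that modules.sh and requirements.txt have been edited.

```shell
# Sketch: run the two creation steps inside an already cloned environment folder.
# build_environment is a hypothetical helper, not part of the repo.
build_environment() {
    env_dir="$1"
    (
        # Work inside the environment folder without changing the caller's directory.
        cd "$env_dir" || exit 1
        # modules.sh and requirements.txt should be edited before this point.
        bash ./setup.sh && bash ./create_kernel.sh
    )
}

# Usage (paths are placeholders):
#   git clone /path/to/repo ./my_favourite_environment
#   build_environment ./my_favourite_environment
#   source ./my_favourite_environment/activate.sh
```

The subshell keeps the caller's working directory unchanged, and create_kernel.sh only runs if setup.sh succeeded.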
HowTo Singularity
- Get your favourite base image to use as a starting point. Example: singularity pull docker://nvcr.io/nvidia/pytorch:22.10-py3
- Clone the repo into a folder with the name of the environment module: git clone /path/to/repo ./my_favourite_environment
- Edit config.sh to select the Singularity container.
- If required, edit setup.sh to make changes to the container. This is possible because a persistent overlay file system is supported.
- Edit requirements.txt to select the packages to be installed via pip
- Create the environment
  - execute setup.sh to create the virtual environment
  - execute create_kernel.sh to create a kernel for Jupyter-JSC
- Make sure you execute source activate.sh each time you execute something within the container.
Description
This project contains a lightweight set of scripts to easily create Python working environments on typical supercomputer setups, including creating Jupyter Kernels.
Environment Modules
On supercomputers, a basic environment based on Environment Modules is typically provided. This setup is carefully curated and optimized, including compilers, MPI version etc. Extra Python packages can be installed with pip into user space. This, however, does not create a reproducible environment that can be used by other users as well.
Conceptually, virtual environments make it easy to create project-based environments. These scripts streamline the creation and usage of such environments and make it easy for users to share a setup and to put it under version control together with the main code.
Furthermore, in the typical compute setup of a scientific project, one or more packages may be in active development. In such setups, the intention is to include these packages as submodules and integrate them into the workflow. This can mean, for example, that a compilation step is added to the setup step and that the appropriate environment variables are set in the activation step.
Singularity
A useful alternative to environment modules is a container-based environment. On JUWELS Booster and JURECA DC, we highly recommend using NVIDIA's pre-built Docker containers from the NGC: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch . Singularity can import Docker images from public container registries with a command like singularity pull docker://nvcr.io/nvidia/pytorch:22.10-py3 . This will automatically create a .sif file that you can use as a starting point.
Details
The setup is configured in the bash script config.sh. The user can define a name for the venv and the directory where the venv files are stored. These default to the directory name of the containing folder and the "." folder of the scripts. Please edit this file if you want a custom name and location for the venv.
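The defaults described above could be derived roughly as follows. This is a sketch under assumptions: the function set_venv_defaults and the variable names ENV_NAME and ENV_DIR are hypothetical and may differ from what config.sh actually uses.

```shell
# Hypothetical sketch of config.sh-style defaults; names are assumptions.
set_venv_defaults() {
    script_dir="$1"   # folder that contains the scripts
    # The venv name defaults to the name of the containing folder.
    ENV_NAME="${ENV_NAME:-$(basename "$script_dir")}"
    # The venv files are stored in the folder of the scripts by default.
    ENV_DIR="${ENV_DIR:-$script_dir}"
}
```

Because the `${VAR:-default}` form only fills in unset or empty values, a custom name or location set beforehand is left untouched.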
The modules on top of which the venv should be built are defined in modules.sh. Please edit the file to your needs.
The file requirements.txt contains a list of packages to be installed during the setup process. Add required packages to this file to reproducibly add them to the venv.
The script setup.sh creates the venv according to the config given in config.sh. Please edit this file to add a setup step for submodules (e.g. compilation of libraries). If only plain venvs are used, this file can remain unchanged.
The script activate.sh sets the environment variables such that the venv can be used. Please edit this file to add environment variables for submodules. Note that the script must be sourced to take effect. Example:
source <path_to_venv>/activate.sh
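As an illustration of an environment variable one might add for a submodule, the following sketch prepends a directory to PYTHONPATH without creating duplicates. It assumes the submodule is a Python package that is made importable via PYTHONPATH; the helper name add_to_pythonpath and the example path are hypothetical and not part of activate.sh.

```shell
# Hypothetical helper for activate.sh: prepend a submodule directory to PYTHONPATH.
add_to_pythonpath() {
    case ":${PYTHONPATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) PYTHONPATH="$1${PYTHONPATH:+:$PYTHONPATH}" ;;
    esac
    export PYTHONPATH
}

# Usage (placeholder path):
#   add_to_pythonpath "/path/to/environment/my_submodule"
```

The duplicate check matters because activate.sh is sourced, possibly several times in the same shell, and each call would otherwise grow the variable.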
The script create_kernel.sh will create a kernel json file in the user's home directory that can be found by Jupyter, and a helper script in the virtual environment folder.
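To make the two artifacts concrete, here is a minimal sketch of writing a kernel spec that points at a helper script. The function write_kernel_spec, the launcher name kernel.sh, and the argument layout are assumptions and not taken from create_kernel.sh. The directory ~/.local/share/jupyter/kernels is the usual per-user kernel location on Linux.

```shell
# Hypothetical sketch: write a Jupyter kernel spec that starts a launcher script.
write_kernel_spec() {
    # $1 = kernel name, $2 = kernels directory, $3 = path to the launcher script
    mkdir -p "$2/$1" || return 1
    cat > "$2/$1/kernel.json" <<EOF
{
  "argv": ["$3", "-f", "{connection_file}"],
  "display_name": "$1",
  "language": "python"
}
EOF
}

# Usage (placeholder paths):
#   write_kernel_spec my_favourite_environment \
#       "$HOME/.local/share/jupyter/kernels" \
#       "/path/to/venv/kernel.sh"
```

In such a layout the launcher script would typically source activate.sh and then start the kernel with the arguments it receives, so that the kernel runs with the same modules and venv as an interactive session.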