diff --git a/README.md b/README.md
index ca52aa45b4bfe9f94b350fa8556076bb10686dca..64efcba4d36c920d28476853d761039f7bffa6e9 100644
--- a/README.md
+++ b/README.md
@@ -87,9 +87,10 @@ If you would like to install PyTorch, you can find the PyTorch NVIDIA container
 you can proceed to build the base NVIDIA image using the following command:
 ```bash
-bash scripts/build_base_nvidia_image.sh -A {account} -p {partition} -V {Nvidia_container_version} -N {base_image_name}
+bash scripts/build_base_nvidia_image.sh -A {account} -G {group_name} -P {partition} -V {Nvidia_container_version} -N {base_image_name}
 ```
 - `account`: The account name for using supercomputers.
+- `group_name`: The group name for using supercomputers (you can find the name of the Group on JuDoor). Sometimes account and group name are the same.
 - `partition`: The partition name on the supercomputer. Please use a partition that has internet access (e.g., `devel`, `develgpus`, `develbooster`, etc.).
 - `Nvidia_container_version`: The Nvidia container (e.g., `nvcr.io/nvidia/pytorch:23.05-py3`).
 - `base_image_name`: The name of the base image with a `.sif` extension (e.g., `pytorch-23-05-py3.sif`).
@@ -98,14 +99,14 @@ bash scripts/build_base_nvidia_image.sh -A {account} -p {partition} -V {Nvidia_c
 Here is the example how to build the base image:
 ```bash
-bash scripts/build_base_nvidia_image.sh -A atmlaml -P devel -V nvcr.io/nvidia/pytorch:23.05-py3 -N pytorch-23-05-py3.sif
+bash scripts/build_base_nvidia_image.sh -A atmlaml -G atmlaml -P devel -V nvcr.io/nvidia/pytorch:23.05-py3 -N pytorch-23-05-py3.sif
 ```
 After running scripts/build_base_nvidia_image.sh, a SLURM job will be launched, and a report will be generated in the `slurm_report` path. Please make sure to review this report to confirm that your image was created without any errors.
 The above command will create an singularity image in the following path:
-`/p/project1/{account}/{$USER}/apptainer/pytorch-23-05-py3.sif`
+`/p/project1/{account}/$USER/apptainer/pytorch-23-05-py3.sif`
@@ -116,9 +117,10 @@ Note: Since you are using Nvidia container, PyTorch and other dependencies is al
 If there is a `torch` or `torchvision` inside of `requirements.txt`, please remove it in order to avoid conflict between the Nvidia PyTorch inside of the conatiner.
 ```bash
-bash scripts/build_customized_nvidia_image.sh -A {account} -P {partition} -S {absolute_path_base_image} -R {absolute_path_requirements.txt} -F {final_image_name}
+bash scripts/build_customized_nvidia_image.sh -A {account} -G {group_name} -P {partition} -S {absolute_path_base_image} -R {absolute_path_requirements.txt} -F {final_image_name}
 ```
 - `account`: The account name for using supercomputers.
+- `group_name`: The group name for using supercomputers (you can find the name of the Group on JuDoor). Sometimes account and group name are the same.
 - `partition`: The partition name on the supercomputer. Please use a partition that has internet access (e.g., `devel`, `develgpus`, `develbooster`, etc.).
 - `absolute_path_base_image`: The absolute path of the base image that was created before.
 - `absolute_path_requirements.txt`: The absolute path of the requirements.txt that all containes all pip dependencies.
@@ -129,7 +131,7 @@ Here is the example how to build the final image:
 ```bash
-bash scripts/build_customized_nvidia_image.sh -A atmlaml -P devel -S /p/project1/atmlaml/{$USER}/apptainer/pytorch-23-05-py3.sif -R {path_to_requirements}/requirements.txt -F pytorch-23-05-py3-final.sif
+bash scripts/build_customized_nvidia_image.sh -A atmlaml -G atmlaml -P devel -S /p/project1/atmlaml/{$USER}/apptainer/pytorch-23-05-py3.sif -R {path_to_requirements}/requirements.txt -F pytorch-23-05-py3-final.sif
 ```
 Check slurm report in `slurm_report` to make sure the singularity image has been created sucessfully.
@@ -137,7 +139,7 @@ Athe end of this report, if you see `Hello world. This is my Python version` can
 You can find the final singularity image in the following path:
-`/p/project1/{account}/{$USER}/apptainer/pytorch-23-05-py3-final.sif`
+`/p/project1/{account}/$USER/apptainer/pytorch-23-05-py3-final.sif`
 Congratulations!!! You have successfully created a singularity image. To use this image for your application, you can find an example submission script in `example_submission_script.sh`.
diff --git a/scripts/build_base_nvidia_image.sh b/scripts/build_base_nvidia_image.sh
index c9bba2560d52e4f5b57eca92c2e4548d2ab90ee0..57b25d57d35ae4191648a97f8d6343f0306a6f79 100644
--- a/scripts/build_base_nvidia_image.sh
+++ b/scripts/build_base_nvidia_image.sh
@@ -5,25 +5,27 @@ ACCOUNT_NAME=""
 PARTITION_NAME=""
 NVIDIA_CONTAINER_VERSION=""
 SINGULARITY_IMG_NAME=""
+GROUP_NAME=""
 # Parse options
-while getopts "A:P:V:N:" opt; do
+while getopts "A:G:P:V:N:" opt; do
     case $opt in
         A) ACCOUNT_NAME=$OPTARG ;;
+        G) GROUP_NAME=$OPTARG ;;
         P) PARTITION_NAME=$OPTARG ;;
         V) NVIDIA_CONTAINER_VERSION=$OPTARG ;;
         N) SINGULARITY_IMG_NAME=$OPTARG ;;
         *)
-            echo "Usage: $0 -A <account_name> -P <partition_name> -V <nvidia_container_version> -N <singularity_base_img_name>"
+            echo "Usage: $0 -A <account_name> -G <group_name> -P <partition_name> -V <nvidia_container_version> -N <singularity_base_img_name>"
             exit 1
             ;;
     esac
 done
 # Check if required arguments are provided
-if [ -z "$ACCOUNT_NAME" ] || [ -z "$PARTITION_NAME" ] || [ -z "$NVIDIA_CONTAINER_VERSION" ] || [ -z "$SINGULARITY_IMG_NAME" ]; then
-    echo "Usage: $0 -A <account_name> -P <partition_name> -V <nvidia_container_version> -N <singularity_base_img_name>"
+if [ -z "$ACCOUNT_NAME" ] || [ -z "$GROUP_NAME" ] || [ -z "$PARTITION_NAME" ] || [ -z "$NVIDIA_CONTAINER_VERSION" ] || [ -z "$SINGULARITY_IMG_NAME" ]; then
+    echo "Usage: $0 -A <account_name> -G <group_name> -P <partition_name> -V <nvidia_container_version> -N <singularity_base_img_name>"
     exit 1
 fi
@@ -47,7 +49,7 @@ module load Stages
 module try-load GCC Apptainer-Tools
 # Define the directory path
-DIR="/p/project1/${ACCOUNT_NAME}/${USER}/apptainer"
+DIR="/p/project1/${GROUP_NAME}/${USER}/apptainer"
 # Ensure directory exists
 if [ ! -d "\$DIR" ]; then
@@ -59,7 +61,7 @@ else
 fi
 # Create temporary directory for Apptainer build
-TMP=\$(mktemp -p /p/scratch/${ACCOUNT_NAME}/apptainer -d) || { echo "Failed to create temporary directory"; exit 1; }
+TMP=\$(mktemp -p /p/scratch/${GROUP_NAME}/apptainer -d) || { echo "Failed to create temporary directory"; exit 1; }
 export APPTAINER_TMPDIR=\$TMP
 # Create the Singularity definition file in TMP