Data Extraction: mpi4py import error
@langguth1 I did a final round of testing in order to provide the code for the GMD paper. I tested the data extraction step with the new environment for the container and ran into some package issues, as described below.
I submitted the job on HDF-ML with the following runscript:
#!/bin/bash -x
## Controlling Batch-job
#SBATCH --account=deepacf
#SBATCH --nodes=1
#SBATCH --ntasks=13
##SBATCH --ntasks-per-node=13
#SBATCH --cpus-per-task=1
#SBATCH --output=../HPC_scripts/data_extraction_era5-out.%j
#SBATCH --error=../HPC_scripts/data_extraction_era5-err.%j
#SBATCH --time=04:20:00
#SBATCH --partition=batch
#SBATCH --gres=gpu:0
#SBATCH --mail-type=ALL
#SBATCH --mail-user=b.gong@fz-juelich.de
jutil env activate -p deepacf
# Name of virtual environment
VIRT_ENV_NAME=venv_hdfml
# Load modules
source ../env_setup/modules_preprocess+extract.sh
# Activate virtual environment if needed (and possible)
if [ -z "${VIRTUAL_ENV}" ]; then
  if [[ -f ../virtual_envs/${VIRT_ENV_NAME}/bin/activate ]]; then
    echo "Activating virtual environment..."
    source ../virtual_envs/${VIRT_ENV_NAME}/bin/activate
  else
    echo "ERROR: Requested virtual environment ${VIRT_ENV_NAME} not found..."
    exit 1
  fi
fi
# Declare path-variables (dest_dir will be set and configured automatically via generate_runscript.py)
source_dir=/p/fastdata/slmet/slmet111/met_data/ecmwf/era5/grib/
destination_dir=/p/project/deepacf/deeprain/video_prediction_shared_folder/extractedData
varmap_file=../data_preprocess/era5_varmapping.json
years=("2016")
# Run data extraction
for year in "${years[@]}"; do
  echo "Perform ERA5-data extraction for year ${year}"
  srun python3 ../main_scripts/main_data_extraction.py --source_dir ${source_dir} --target_dir ${destination_dir} \
       --year ${year} --varslist_path ${varmap_file}
done
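For context, the job is submitted with sbatch; the file name below is only a placeholder, not necessarily the name of the actual runscript generated by generate_runscript.py:

# placeholder name for the runscript above
sbatch data_extraction_era5.sh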
First, I got the error "No module named mpi4py", as shown below:
Traceback (most recent call last):
File "../main_scripts/main_data_extraction.py", line 10, in <module>
from mpi4py import MPI
ImportError: No module named mpi4py
(the same traceback is repeated for the other failing tasks)
srun: error: hdfmlc01: tasks 1,3,5-12: Terminated
srun: error: hdfmlc01: tasks 0,2,4: Exited with exit code 1
srun: Force Terminated job step 31215.0
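As a quick sanity check (a sketch only, not part of the workflow), the import failure can be reproduced with a single task, assuming the modules are loaded and the virtual environment is activated as in the runscript:

source ../env_setup/modules_preprocess+extract.sh
source ../virtual_envs/venv_hdfml/bin/activate
# single-task test: fails with the same ImportError if mpi4py is not on the import path
srun --ntasks=1 python3 -c "from mpi4py import MPI; print(MPI.Get_library_version())"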
Then I checked the site-packages of my environment ('/p/home/jusers/gong1/hdfml/bing/ambs_container/video_prediction_tools/virtual_envs/venv_hdfml/lib/python3.8'): mpi4py is installed there, and we also load the module for mpi4py in 'modules_preprocess+extract.sh'. This is somehow weird. I then manually loaded the modules, activated the environment and checked the PYTHONPATH, and realized that the path to the site-packages is not in the PYTHONPATH. I suppose this path is missing when you set PYTHONPATH in modules_preprocess+extract.sh.
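The kind of check I mean looks roughly like this (a sketch; the paths are the ones used above):

# after loading the modules and activating the venv:
echo ${PYTHONPATH}
# show the effective import path seen by Python
python3 -c "import sys; print('\n'.join(sys.path))"
# show where mpi4py would be imported from (fails if it cannot be found at all)
python3 -c "import mpi4py; print(mpi4py.__file__)"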
After that I tried again, and the error changed to failing to import MPI for 'from mpi4py import MPI' in main_data_extraction.py. I think this is because in line 70 of modules_preprocess+extract.sh the PYTHONPATH is set up from scratch, which drops the system PYTHONPATH pointing to the HPC system packages. To solve that, I replaced line 70 with 'export PYTHONPATH=/usr/local/lib/python3.8/dist-packages/:$PYTHONPATH' in order to keep the system package path.
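To make the change explicit (the original content of line 70 is not reproduced here, so the "before" line is only an assumed sketch of a line that overwrites PYTHONPATH):

# modules_preprocess+extract.sh, around line 70
# before (assumed): PYTHONPATH is set from scratch, dropping the system packages
# export PYTHONPATH=<venv site-packages only>
# after: prepend the system dist-packages and keep whatever PYTHONPATH already contains
export PYTHONPATH=/usr/local/lib/python3.8/dist-packages/:$PYTHONPATH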
I resubmitted my job and everything is fine now. Please check whether the revision is OK. Once I get your confirmation, I will merge it back into the develop branch. Thanks!