Add packages from JSC modules to requirements_noHPC.txt
Currently, data extraction and preprocessing step 1 depend on modules that are provided on JSC's HPC systems.
These modules are not available on other systems (i.e. external users without access to our HPC systems cannot use them), but they supply Python packages that both steps need. The missing packages must therefore be added to requirements_noHPC.txt.
To identify those packages, it helps to run the command pip list -v after activating the virtual environment and loading the modules. This command lists the location of every installed Python package in the filesystem. Packages located under /p/software/juwels/stages/<...> stem from the modules, e.g. /p/software/juwels/stages/2020/SciPy-Stack/2021-gcccoremkl-10.3.0-2021.2.0-Python-3.8.5/lib/python3.8/site-packages/numpy-1.19.1-py3.8-linux-x86_64.egg for numpy==1.19.1. Of particular relevance are the Python packages that are imported in the Python scripts used for data extraction and preprocessing. Since the number of import statements is not that large, it is probably best to collect them manually.
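As a possible aid for the pip list step, the following minimal sketch filters the package listing by install location. It assumes the listing was produced with pip list -v --format=json, which should include a "location" field per package in verbose mode. The function name packages_under_prefix and the second sample entry (somepkg and its path) are hypothetical and only serve as illustration.

```python
import json


def packages_under_prefix(pip_json, prefix):
    """Return {name: version} for all packages installed below `prefix`.

    `pip_json` is assumed to be the text printed by
    `pip list -v --format=json` (a JSON list of dicts).
    """
    found = {}
    for entry in json.loads(pip_json):
        # Entries without a location are treated as "not from the modules".
        location = entry.get("location", "")
        if location.startswith(prefix):
            found[entry["name"]] = entry["version"]
    return found


# Sample listing: the numpy entry uses the path quoted in this issue,
# the second entry is a made-up package living in a virtual environment.
SAMPLE = json.dumps([
    {
        "name": "numpy",
        "version": "1.19.1",
        "location": "/p/software/juwels/stages/2020/SciPy-Stack/"
                    "2021-gcccoremkl-10.3.0-2021.2.0-Python-3.8.5/"
                    "lib/python3.8/site-packages/"
                    "numpy-1.19.1-py3.8-linux-x86_64.egg",
    },
    {
        "name": "somepkg",  # hypothetical package
        "version": "1.0.0",
        "location": "/home/user/venv/lib/python3.8/site-packages",
    },
])

MODULE_PREFIX = "/p/software/juwels/stages/"

if __name__ == "__main__":
    for name, version in sorted(packages_under_prefix(SAMPLE, MODULE_PREFIX).items()):
        print(f"{name}=={version}")
```

The resulting list still has to be cross-checked against the import statements of the scripts, since not every package shipped with a module is actually used.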
Note, however, that there might be conflicts with the existing requirements related to the container environment. For instance, the TFcontainer includes numpy==1.17.3, while numpy==1.19.1 is used with the SciPy-Stack module (see above). In such cases, we should stick to the versions used in conjunction with the container, i.e. don't add numpy==1.19.1 to the requirements, but keep numpy==1.17.3. Another discrepancy relates to mpi4py, which runs under version 3.0.3 with the HPC-system modules, whereas 3.0.1 is used in the container. Thus, the latter should go into requirements_noHPC.txt.
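The rule "the container version wins" could be sketched as follows. This is only an illustration under stated assumptions: the function name merge_requirements is hypothetical, package names are matched exactly (no case or dash/underscore normalisation), and somepkg is a made-up package that stands for anything provided by the modules but absent from the container.

```python
def merge_requirements(module_pkgs, container_pins):
    """Build requirement lines for packages found in the HPC modules.

    Whenever the container already pins a package, the container version
    takes precedence over the version found in the modules.
    """
    lines = []
    for name in sorted(module_pkgs, key=str.lower):
        # Fall back to the module version only if the container has no pin.
        version = container_pins.get(name, module_pkgs[name])
        lines.append(f"{name}=={version}")
    return lines


if __name__ == "__main__":
    module_pkgs = {"mpi4py": "3.0.3", "somepkg": "1.0.0"}  # somepkg is hypothetical
    container_pins = {"mpi4py": "3.0.1"}
    for line in merge_requirements(module_pkgs, container_pins):
        print(line)
```

Keeping the container pins as the single source of truth avoids two competing versions of the same package ending up in requirements_noHPC.txt.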
Finally, the integrity of the resulting environment should be tested on zam347, and runscripts for this machine should be set up.