revert many changes, as the multiprocessing approach will not work on the HPC system with very large data due to memory issues