Issues with Horovod on the new modular version #

@langguth1, as I mentioned, Horovod on the new modular version does not work on multiple nodes: there is no error, but training just stops. I suspect this is related to the function that saves the worker GPU information to a JSON file. I simply commented that call out, and the script then runs.
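
In case it is useful, here is a minimal sketch of an alternative to commenting the call out. It rests on an unconfirmed assumption: that the hang comes from every worker writing the same JSON file or waiting on each other while doing so. The function name `save_worker_gpu_info` and its parameters are hypothetical stand-ins, not the actual names in the code base. The idea is that only rank 0 writes the file and no collective call is involved, so no worker can block another.

```python
import json
import os
import tempfile


def save_worker_gpu_info(gpu_info, path, rank):
    """Hypothetical stand-in for the function that saves worker GPU info.

    Only rank 0 writes the file. Other ranks return immediately and make
    no collective call, so no worker waits on another one.
    Returns True if this process wrote the file, False otherwise.
    """
    if rank != 0:
        return False

    # Write to a temporary file in the same directory, then rename it,
    # so a reader never sees a partially written JSON file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(gpu_info, f, indent=2)
    os.replace(tmp_path, path)
    return True
```

In the training script, `rank` would come from `hvd.rank()` after `hvd.init()`. If the file really needs information from all workers, this sketch is not enough, since it only stores what rank 0 already knows.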