Implementation of gradient accumulation for SAVP
To allow a larger effective batch size during SAVP training, which will hopefully stabilize the training, gradient accumulation (with Horovod) will be implemented in this working branch.
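As a rough illustration of the idea, the sketch below shows gradient accumulation on a toy one-parameter regression in plain Python. It is not the SAVP or Horovod implementation: the function names, the data, and the model `y = w * x` are all hypothetical and chosen only to show that averaging the gradients of equally sized micro-batches reproduces the gradient of the full batch. In a Horovod setup, the accumulated gradient would additionally be averaged across workers; that step is omitted here.

```python
# Hypothetical toy example: none of these names come from the SAVP code base.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average the gradients of equally sized micro-batches."""
    assert len(xs) % micro_batch_size == 0, "micro-batches must be equally sized"
    num_steps = len(xs) // micro_batch_size
    acc = 0.0
    for i in range(num_steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        # Accumulate the micro-batch gradient; no weight update happens yet.
        acc += grad_mse(w, xs[lo:hi], ys[lo:hi])
    # One averaged gradient is returned per (simulated) optimizer step.
    return acc / num_steps


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w = 0.5

full = grad_mse(w, xs, ys)                               # batch of 8 at once
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)  # 4 micro-batches of 2
print(abs(full - accum) < 1e-9)
```

The equivalence holds only when the micro-batches have the same size and the loss is a mean over samples; layers whose behavior depends on the batch itself (for example batch normalization) would still see the smaller micro-batch.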