Implement AdaSum for parallelized training of convLSTM
So far, the Adam optimizer has been used for parallelized training of the convLSTM architecture. However, the performance of the network depends strongly on the number of GPUs involved and degrades steadily as the GPU count grows. In this branch, AdaSum will be integrated and tested as an alternative gradient-combination scheme for convLSTM.
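
Assuming the current pipeline uses Horovod with TensorFlow/Keras (not confirmed here), switching to AdaSum is mostly a change to the `DistributedOptimizer` wrapper: the local update rule stays Adam, while `op=hvd.Adasum` replaces the default averaging all-reduce. A minimal sketch under those assumptions:

```python
import horovod.tensorflow.keras as hvd
import tensorflow as tf

hvd.init()

# Pin each process to a single GPU (standard Horovod setup).
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

# Base optimizer stays Adam; AdaSum only changes how gradients are
# combined across workers. Learning rate here is a placeholder.
base_opt = tf.keras.optimizers.Adam(learning_rate=1e-3)

# op=hvd.Adasum swaps the default averaging reduction for AdaSum.
# Note: the usual "scale LR by hvd.size()" rule does not carry over
# unchanged to AdaSum; see the Horovod docs for the recommended scaling.
opt = hvd.DistributedOptimizer(base_opt, op=hvd.Adasum)

# The wrapped optimizer then plugs into the existing convLSTM training,
# e.g. model.compile(optimizer=opt, loss=...).
```

This keeps the rest of the training loop untouched, which should make a like-for-like comparison against the current Adam-only setup straightforward across different GPU counts.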