
Repository graph

Branches:
  • add_automatic_checkpoint_and_restart
  • main (default, protected)
Commits (newest first):
  • Update README for most recent submission script [main]
  • Add 176B training script
  • Handle patch application fail better
  • Update preprocess_data.sbatch, ntasks to 1
  • Remove DeepSpeed commit hash specification
  • Fix error propagations
  • Fix path to patch file
  • Fix quitting upon success
  • Improve error handling
  • Do not `exit` from `source`d scripts
  • Fix DeepSpeed setup by specifying commit hash
  • Update remaining paths with new project name
  • Use patch file from repository
  • extended README for using StartLongJobs,bash,
  • changed paths to opengptx-elm and added StartLongRun.bash to start multiple runs, changes in tr1-13...sbatch, s.t. paths are only set if not already set
  • Merge branch 'add_automatic_checkpoint_and_restart' of https://gitlab.jsc.fz-juelich.de/opengptx/bigscience-code into add_automatic_checkpoint_and_restart [add_automatic_checkpoint_and_restart]
  • extended README for using StartLongJobs,bash,
  • Delete test
  • lower runtime set in default
  • forgot to uncomment
  • forgot to uncomment after testing
  • added specific login node in tensorboard port forwarding suggestion
  • bugfixes
  • actually calling sbatch
  • changed paths to opengptx-elm and added StartLongRun.bash to start multiple runs, changes in tr1-13...sbatch, s.t. paths are only set if not already set
  • change project name
  • Prefer SYSTEMNAME variable to /etc/FZJ/systemname
  • Quit upon execution-location errors
  • Explain activating working environment
  • Ignore error from empty Git stash
  • Use dynamic temp directory for building DeepSpeed
  • Uninstall DeepSpeed without asking
  • Fix return value
  • Explain partitions
  • Quit when any variable is not set
  • Link to Megatron-DeepSpeed repository
  • Initial commit