Commit 20c34d01 authored by Chelsea Maria John

Merge branch 'master' into 'main'

README

See merge request !4
parents a3a79111 414efb7e
Branches main
@@ -6,13 +6,12 @@ the forked [Meta OPT codebase](https://github.com/chelseajohn/metaseq.git).
 ## Getting Started
 ### Set up
-Assuming you have already set up your environment on the
-supercomputer. If you have not, please see the ["Getting started at
-JSC"](https://gitlab.jsc.fz-juelich.de/opengptx/infos-public/-/blob/main/documentation/getting_started_at_JSC.md)
-guide. Then
+Please see the ["Getting started at
+JSC"](https://gitlab.jsc.fz-juelich.de/opengptx/infos-public/-/blob/main/documentation/getting_started_at_JSC.md)
+guide and set up your environment on the JUWELS supercomputer, if you have not yet. Then
 - Clone this repository
-- make required changes in `variables.bash`
+- make required location changes in `variables.bash`
 - execute
 ```
 nice bash setup.bash
@@ -24,7 +23,7 @@ Make required changes in the `jobscript.sh` like adjusting the `#SBATCH` variabl
 ```
 sbatch jobscript.sh
 ```
-**WARNING** : PyTorch >= 1.11 will complain about not being able to handle some address families and tell you that sockets are invalid. This does **not** hinder the code from scaling according to the number of total GPUs.
+**WARNING** : PyTorch >= 1.11 will throw warnings about client socket initializations and `(errno: 97 - Address family not supported by protocol)`. This so far has **not** hindered the code from scaling to the total number of GPUs assigned.
 ### Launch tensorboard for the run
@@ -44,7 +43,7 @@ tensorboard serve --logdir="INSERT_TENSORBOARD_LOGDIR" --bind_all
 ## Interactive Usage
-To work interactively, please activate the environment like this:
+To work interactively, please activate the environment using the following command:
 ```
 source activate.bash
@@ -59,7 +58,10 @@ environment, and set the variables specified in `variables.bash`.
 - JUWELS Cluster
 - JUWELS Booster
-Supported means tested and the correct CUDA compute architecture will
-be selected. Other machines can easily be supported by adjusting
-`activate.bash`.
+Other machines can easily be supported by adjusting `activate.bash` and setting the correct CUDA architecture.
+## Tested Models
+Test runs of 15-30 minutes were performed on the following models to train from scratch using the [OSCAR](https://huggingface.co/bigscience/misc-test-data/tree/main/stas) dataset.
+- 125m model
+- 30b model
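
The README's note about supporting other machines by adjusting `activate.bash` and setting the correct CUDA architecture could look roughly like the sketch below. It is only an illustration and is not taken from the repository: the function name `select_cuda_arch` is hypothetical, and it assumes that the machine name is available in `$SYSTEMNAME` and that the build honours PyTorch's `TORCH_CUDA_ARCH_LIST` variable. The values shown are the compute capabilities of the V100 GPUs in JUWELS Cluster (7.0) and the A100 GPUs in JUWELS Booster (8.0).

```shell
# Hypothetical helper, not part of activate.bash: map a machine name
# to the CUDA compute capability of its GPUs.
select_cuda_arch() {
    case "$1" in
        juwels)        echo "7.0" ;;  # JUWELS Cluster, V100
        juwelsbooster) echo "8.0" ;;  # JUWELS Booster, A100
        *)
            echo "unknown machine: $1" >&2
            return 1
            ;;
    esac
}

# Assumption: SYSTEMNAME holds the machine name; fall back to the Booster.
machine="${SYSTEMNAME:-juwelsbooster}"
if arch="$(select_cuda_arch "$machine")"; then
    export TORCH_CUDA_ARCH_LIST="$arch"
fi
```

Supporting another machine would then mean adding one more `case` branch with that machine's compute capability.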