Commit 20c34d01 authored by Chelsea Maria John

Merge branch 'master' into 'main'

README

See merge request !4
parents a3a79111 414efb7e
@@ -6,13 +6,12 @@ the forked [Meta OPT codebase](https://github.com/chelseajohn/metaseq.git).
## Getting Started
### Set up
-Assuming you have already set up your environment on the
-supercomputer. If you have not, please see the ["Getting started at
+Please see the ["Getting started at
 JSC"](https://gitlab.jsc.fz-juelich.de/opengptx/infos-public/-/blob/main/documentation/getting_started_at_JSC.md)
-guide. Then
+guide and set up your environment on the JUWELS supercomputer if you have not done so yet. Then
 - Clone this repository
-- make required changes in `variables.bash`
+- make required location changes in `variables.bash`
 - execute
```
nice bash setup.bash
@@ -24,7 +23,7 @@ Make required changes in the `jobscript.sh` like adjusting the `#SBATCH` variabl
```
sbatch jobscript.sh
```
-**WARNING** : PyTorch >= 1.11 will complain about not being able to handle some address families and tell you that sockets are invalid. This does **not** hinder the code from scaling according to the number of total GPUs.
+**WARNING** : PyTorch >= 1.11 will emit warnings about client socket initialization and `(errno: 97 - Address family not supported by protocol)`. So far, this has **not** hindered the code from scaling to the total number of GPUs assigned.
### Launch tensorboard for the run
@@ -44,7 +43,7 @@ tensorboard serve --logdir="INSERT_TENSORBOARD_LOGDIR" --bind_all
## Interactive Usage
-To work interactively, please activate the environment like this:
+To work interactively, please activate the environment using the following command:
```
source activate.bash
@@ -59,7 +58,10 @@ environment, and set the variables specified in `variables.bash`.
- JUWELS Cluster
- JUWELS Booster
-Supported means tested and the correct CUDA compute architecture will
-be selected. Other machines can easily be supported by adjusting
-`activate.bash`.
+Other machines can easily be supported by adjusting `activate.bash` and setting the correct CUDA architecture.
+## Tested Models
+Test runs of 15-30 minutes were performed, training the following models from scratch on the [OSCAR](https://huggingface.co/bigscience/misc-test-data/tree/main/stas) dataset.
+- 125m model
+- 30b model