Additionally, there are up-to-date `sbatch` scripts in `/p/scratch/ccstdl/ebert1/dalle`:

```sh
cp /p/scratch/ccstdl/ebert1/dalle/*.sbatch ~/$USER/DALLE-pytorch
```

## Starting a Training Job

## Train DALLE-pytorch
### Choose a Variational Autoencoder

The VAE is responsible for representing images efficiently via pretraining. You can use the discrete VAE released by OpenAI, one of the VQGAN flavors from Heidelberg, or train your own discrete VAE from scratch.

To download the discrete VAE released by OpenAI, do a dry run on the login node while you still have internet access. Since there are no GPUs on login nodes, the process will fail shortly after automatically downloading the discrete VAE:
```sh
python train_dalle.py --image_text_folder DALLE-pytorch/images
```
The same goes for downloading the (smaller) 1024-token VQGAN trained on ImageNet by Heidelberg: just append the `--taming` argument and do another dry run on a login node. As long as the download progress bar finishes, it is okay if some errors appear at the end.
```sh
python train_dalle.py --image_text_folder DALLE-pytorch/images --taming # autodownloads the 1024-token ImageNet VQGAN
```
You can also use arbitrary checkpoints matching the VQGAN architecture if you have a `.yaml` and `.ckpt` in the format described at https://github.com/CompVis/taming-transformers. For instance:
```sh
HOME_PATH=lastname1

wget --continue http://batbot.tv/ai/models/imagenet_16384_slim.ckpt -O /p/scratch/ccstdl/${HOME_PATH}/vqgan_models/imagenet_16384_slim.ckpt

wget --continue http://batbot.tv/ai/models/imagenet_16384.yaml -O /p/scratch/ccstdl/${HOME_PATH}/vqgan_models/imagenet_16384.yaml
```
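Since a partial or failed download only surfaces later as a cryptic error inside a queued job, it can save a queue round-trip to confirm that both files of a checkpoint/config pair actually arrived before submitting. A minimal sketch (the `check_vqgan` helper and the `/tmp` demo paths are illustrative, not part of DALLE-pytorch; on the cluster, point it at the files under `/p/scratch/ccstdl/${HOME_PATH}/vqgan_models`):

```shell
# Illustrative helper (not part of the repo): confirm that both the
# checkpoint and the config file exist before submitting a job.
check_vqgan() {
  if [ -f "$1" ] && [ -f "$2" ]; then
    echo "pair present"
  else
    echo "missing file(s)"
  fi
}

# Demo against throwaway files in /tmp.
mkdir -p /tmp/vqgan_models
touch /tmp/vqgan_models/imagenet_16384_slim.ckpt /tmp/vqgan_models/imagenet_16384.yaml
check_vqgan /tmp/vqgan_models/imagenet_16384_slim.ckpt /tmp/vqgan_models/imagenet_16384.yaml
```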
_Please see https://github.com/CompVis/taming-transformers if any mirrors fail._
The most recent addition from Heidelberg as of this writing is the Gumbel VQGAN trained on Open Images:
```sh
wget --continue http://batbot.tv/ai/models/gumbel_f8_8192.ckpt -O /p/scratch/ccstdl/${HOME_PATH}/vqgan_models/gumbel_f8_8192.ckpt

wget --continue https://heibox.uni-heidelberg.de/seafhttp/files/a5b2f0d5-bccd-4421-a9a5-864df8659560/model.yaml -O /p/scratch/ccstdl/${HOME_PATH}/vqgan_models/gumbel_f8_8192.yaml
```
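A custom checkpoint like this is then passed to `train_dalle.py` alongside `--taming`. A sketch of the assembled command, assuming the `--vqgan_model_path` and `--vqgan_config_path` flags (confirm the exact flag names with `python train_dalle.py --help` on your checkout):

```shell
# Assemble the training command for a custom VQGAN checkpoint.
# The --vqgan_model_path/--vqgan_config_path flag names are assumptions;
# verify them against `python train_dalle.py --help` for your version.
HOME_PATH=lastname1
MODEL=/p/scratch/ccstdl/${HOME_PATH}/vqgan_models/gumbel_f8_8192.ckpt
CONFIG=/p/scratch/ccstdl/${HOME_PATH}/vqgan_models/gumbel_f8_8192.yaml
CMD="python train_dalle.py --image_text_folder DALLE-pytorch/images --taming --vqgan_model_path $MODEL --vqgan_config_path $CONFIG"
echo "$CMD"   # inspect here; run it inside an sbatch script on a GPU node
```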
## Running jobs

Depending on the supercomputer you are on, you have to change the `--partition` in the `sbatch` script you want to use. The `sinfo` command lists all partitions of the machine you are on; look for names like `develgpus` or `develbooster`.
Once the partitions are configured correctly, you should be able to start a DALLE-pytorch training job with `sbatch <script.sbatch>`. These scripts use an example dataset, also located at `/p/scratch/ccstdl/ebert1/dalle`.