For simplicity, we disable WandB in the `sbatch` scripts (see below), since we can't upload its results from the compute nodes. If you want to keep using WandB, look into its offline mode.
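If you would rather keep WandB running, its offline mode stores run data locally for a later `wandb sync` from a machine with internet access. One way to enable it (the `WANDB_MODE` environment variable is a documented wandb feature, but check the behavior of your installed version):

```shell
# Enable WandB offline mode for this shell session: runs are logged locally
# (under ./wandb) and can be uploaded later with `wandb sync` from a login node.
export WANDB_MODE=offline
```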
We do not have internet access from the compute nodes, so we cannot download checkpoints while our script is running. You can find common material in `/p/scratch/ccstdl/ebert1/dalle`, including downloaded checkpoints. We link these so DALLE-pytorch can find them at the expected location:
```sh
mkdir -p ~/.cache/dalle
ln -s /p/scratch/ccstdl/ebert1/dalle/checkpoints/* ~/.cache/dalle
```
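The glob in the `ln -s` line above is expanded by the shell, so one symlink is created per checkpoint file. A self-contained sketch with throwaway directories (the `src`/`dst` paths and file names here are made up for illustration, not the real scratch and cache directories):

```shell
# Throwaway stand-ins for the scratch checkpoint dir and ~/.cache/dalle
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/vae.ckpt" "$src/vqgan.ckpt"

# The shell expands "$src"/* to both files; ln creates one link per file in $dst
ln -s "$src"/* "$dst"
ls "$dst"   # vae.ckpt  vqgan.ckpt (both symlinks)
```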
Additionally, there are up-to-date `sbatch` scripts in `/p/scratch/ccstdl/ebert1/dalle`. Copy these into your local DALLE-pytorch clone:
```sh
cp /p/scratch/ccstdl/ebert1/dalle/*.sbatch ~/$USER/DALLE-pytorch
```

### Choose a Variational Autoencoder
The VAE is responsible for representing images efficiently and is pretrained separately. You can use the discrete VAE released by OpenAI, one of the VQGAN flavors from Heidelberg, or train your own discrete VAE from scratch.
To download the discrete VAE released by OpenAI, do a dry run on a login node while you still have internet access. Since there are no GPUs on the login nodes, the process will fail after automatically downloading the discrete VAE:
```sh
python train_dalle.py --image_text_folder DALLE-pytorch/images
```
The same goes for downloading the (smaller) 1024-token VQGAN trained on ImageNet by Heidelberg. Just append the `--taming` argument and do another dry run on a login node. As long as you see the download progress finish, it's okay if there are some errors at the end:
```sh
python train_dalle.py --image_text_folder DALLE-pytorch/images --taming # autodownloads the 1024 imagenet vqgan
```
You can also use arbitrary checkpoints matching the VQGAN architecture if you have a `.yaml` and `.ckpt` in the format described in https://github.com/CompVis/taming-transformers. For instance: