diff --git a/01-deep-learning-on-supercomputers.md b/01-deep-learning-on-supercomputers.md
index 1be393f13e14819d854f720e808aebff943bda86..7749cc6854e7d806ab84dad9482fbbc494377995 100644
--- a/01-deep-learning-on-supercomputers.md
+++ b/01-deep-learning-on-supercomputers.md
@@ -490,9 +490,10 @@ learn.fine_tune(6)
 - Add this to requirements.txt:
 - ```python
 fastai
-accelerate
 deepspeed
+git+https://github.com/huggingface/accelerate@rdzv-endpoint
 ```
+- (the last one will become `accelerate` later this week)
 - Run `./setup.sh`
 - `source activate.sh`
 - Done! You installed everything you need
@@ -625,16 +626,21 @@ epoch     train_loss  valid_loss  accuracy  top_k_accuracy  time
 5         1.554356    1.450976    0.502798  0.914547        00:08
 real    1m19.979s
 ```
+
 ---
 
 ## Some insights
 
-- Distributed run suffered a bit on the accuracy and loss in exchange for speed 🏎️
-- Data parallel is a simple and effective way to distribute DL workload
+- Distributed run suffered a bit on the accuracy 🎯 and loss 😩
+  - In exchange for speed 🏎️
+- It's more than 4x faster because the library is multi-threaded (and now we use 48 threads)
+- I/O is automatically parallelized / sharded by Fast.AI library
+- Data parallel is a simple and effective way to distribute DL workload 💪
 - This is really just a primer - there's much more to that
 - I/O plays a HUGE role on Supercomputers, for example
 
 ---
+
 ## Multi-node
 
 - Simply change `#SBATCH --nodes=2` on the submission file!
diff --git a/public/01-deep-learning-on-supercomputers.html b/public/01-deep-learning-on-supercomputers.html
index 045d5b7083cf80d8107970b19bc58aa0752d4c82..316b378bac70943833cdafb33b284789f93cbe18 100644
--- a/public/01-deep-learning-on-supercomputers.html
+++ b/public/01-deep-learning-on-supercomputers.html
@@ -692,8 +692,10 @@ class="fragment"><code>git clone https://gitlab.jsc.fz-juelich.de/kesselheim1/sc
 <li class="fragment">Add this to requirements.txt:</li>
 <li class="fragment"><div class="sourceCode" id="cb5"><pre
 class="sourceCode python"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>fastai</span>
-<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>accelerate</span>
-<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>deepspeed</span></code></pre></div></li>
+<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>deepspeed</span>
+<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>git<span class="op">+</span>https:<span class="op">//</span>github.com<span class="op">/</span>huggingface<span class="op">/</span>accelerate<span class="op">@</span>rdzv<span class="op">-</span>endpoint</span></code></pre></div></li>
+<li class="fragment">(the last one will become <code>accelerate</code>
+later this week)</li>
 <li class="fragment">Run <code>./setup.sh</code></li>
 <li class="fragment"><code>source activate.sh</code></li>
 <li class="fragment">Done! You installed everything you need</li>
@@ -812,40 +814,40 @@ class="sourceCode bash"><code class="sourceCode bash"><span id="cb14-1"><a href=
 <span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a><span class="ex">3</span>           1.754019    1.687136    0.404883  0.872330        00:08  </span>
 <span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a><span class="ex">4</span>           1.643759    1.499526    0.482706  0.906409        00:08  </span>
 <span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a><span class="ex">5</span>           1.554356    1.450976    0.502798  0.914547        00:08  </span>
-<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a><span class="ex">real</span>    1m19.979s</span></code></pre></div>
-<hr /></li>
+<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a><span class="ex">real</span>    1m19.979s</span></code></pre></div></li>
 </ul>
 </section>
 <section id="some-insights" class="slide level2">
 <h2>Some insights</h2>
 <ul>
-<li class="fragment">Distributed run suffered a bit on the accuracy and
-loss in exchange for speed 🏎️</li>
+<li class="fragment">Distributed run suffered a bit on the accuracy 🎯
+and loss 😩
+<ul>
+<li class="fragment">In exchange for speed 🏎️</li>
+</ul></li>
+<li class="fragment">It’s more than 4x faster because the library is
+multi-threaded (and now we use 48 threads)</li>
+<li class="fragment">I/O is automatically parallelized / sharded by
+Fast.AI library</li>
 <li class="fragment">Data parallel is a simple and effective way to
-distribute DL workload</li>
+distribute DL workload 💪</li>
 <li class="fragment">This is really just a primer - there’s much more
 to that</li>
 <li class="fragment">I/O plays a HUGE role on Supercomputers, for
 example</li>
 </ul>
-<table style="width:6%;">
-<colgroup>
-<col style="width: 5%" />
-</colgroup>
-<tbody>
-<tr class="odd">
-<td>## Multi-node</td>
-</tr>
-<tr class="even">
-<td>- Simply change <code>#SBATCH --nodes=2</code> on the submission
-file! - THAT’S IT</td>
-</tr>
-</tbody>
-</table>
 </section>
 <section id="multi-node" class="slide level2">
 <h2>Multi-node</h2>
 <ul>
+<li class="fragment">Simply change <code>#SBATCH --nodes=2</code> on the
+submission file!</li>
+<li class="fragment">THAT’S IT</li>
+</ul>
+</section>
+<section id="multi-node-1" class="slide level2">
+<h2>Multi-node</h2>
+<ul>
 <li class="fragment"><div class="sourceCode" id="cb15"><pre
 class="sourceCode bash"><code class="sourceCode bash"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ex">epoch</span>     train_loss  valid_loss  accuracy  top_k_accuracy  time  </span>
 <span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="ex">0</span>           2.230926    2.414113    0.170986  0.654726        00:10  </span>
diff --git a/src/distrib.slurm b/src/distrib.slurm
index f9471bf4fdb5f855ee25076ffff895d643600c13..02b63c28b547c8766cd5d075545083ac386a5823 100644
--- a/src/distrib.slurm
+++ b/src/distrib.slurm
@@ -1,6 +1,6 @@
 #!/bin/bash
 #SBATCH --account=training2306
-#SBATCH --nodes=2
+#SBATCH --nodes=1
 #SBATCH --job-name=ai-multi-gpu
 #SBATCH --ntasks-per-node=1
 #SBATCH --cpus-per-task=48
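
For context, here is a minimal sketch of the kind of training script that `src/distrib.slurm` submits. It is an illustration under assumptions (the dataset, model, and metrics below are placeholders, not the course repo's actual code): the point is that fastai's `distrib_ctx` wrapper lets the same `learn.fine_tune(6)` call timed in the slides run data-parallel across however many ranks the launcher starts.

```python
# Illustrative sketch, not the patched repo's actual script: wrapping
# fine-tuning in fastai's distributed context so the run scales from one
# process to one process per GPU (e.g. launched from a SLURM job).
from fastai.vision.all import *
from fastai.distributed import *

# Placeholder dataset and model; the course material defines its own.
path = untar_data(URLs.IMAGENETTE_160)
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(160))
learn = vision_learner(dls, resnet34, metrics=[accuracy, top_k_accuracy])

# distrib_ctx() is a no-op in a single process; when multiple ranks are
# launched it wraps the model in DistributedDataParallel, so each rank
# trains on its own shard of every batch.
with learn.distrib_ctx():
    learn.fine_tune(6)
```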