Commit ad0e525f authored by Alexandre Strube's avatar Alexandre Strube

save as 1 node on the example

parent 4d8f63ba
Pipeline #140815 passed
@@ -490,9 +490,10 @@ learn.fine_tune(6)
- Add this to requirements.txt:
- ```python
fastai
accelerate
deepspeed
git+https://github.com/huggingface/accelerate@rdzv-endpoint
```
- (the last one will become `accelerate` later this week)
- Run `./setup.sh`
- `source activate.sh`
- Done! You installed everything you need
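After `setup.sh` has run, a quick import sanity check can save a debugging round later. A minimal sketch (the helper `check_packages` is ours, not part of the course material):

```python
import importlib.util

def check_packages(names):
    """Return the subset of `names` that the import system cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# The three entries from requirements.txt; the git install of the
# accelerate fork still provides the `accelerate` module.
missing = check_packages(["fastai", "accelerate", "deepspeed"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("Environment looks complete")
```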
@@ -625,16 +626,21 @@ epoch train_loss valid_loss accuracy top_k_accuracy time
5 1.554356 1.450976 0.502798 0.914547 00:08
real 1m19.979s
```
---
## Some insights
- Distributed run suffered a bit in accuracy and loss, in exchange for speed 🏎️
- Data parallel is a simple and effective way to distribute DL workload
- The distributed run's accuracy 🎯 and loss 😩 suffered a bit
- In exchange for speed 🏎️
- It's more than 4x faster because the library is multi-threaded (and now we use 48 threads)
- I/O is automatically parallelized / sharded by the Fast.AI library
- Data parallel is a simple and effective way to distribute DL workload 💪
- This is really just a primer - there's much more to it
- I/O plays a HUGE role on supercomputers, for example
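The core idea behind the data-parallel speedup can be sketched in a few lines: each worker computes a gradient on its own shard of the batch, the gradients are averaged (the "all-reduce"), and every worker applies the identical update. A toy illustration with a least-squares loss, in pure Python rather than the fastai/accelerate machinery used in the course:

```python
# Toy data parallelism: averaging per-shard gradients over equal-sized
# shards is equivalent to one full-batch gradient step.
def grad(w, xs, ys):
    # d/dw of mean((w*x - y)^2) over this shard
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_step(w, xs, ys, workers=2, lr=0.1):
    shard = len(xs) // workers
    grads = [grad(w, xs[i * shard:(i + 1) * shard],
                     ys[i * shard:(i + 1) * shard])
             for i in range(workers)]      # each worker: local gradient
    g = sum(grads) / workers               # "all-reduce": average gradients
    return w - lr * g                      # same update on every worker

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]                  # ground truth: y = 2x
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, xs, ys)
print(round(w, 3))                         # converges to 2.0
```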
---
## Multi-node
- Simply change to `#SBATCH --nodes=2` in the submission file!
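The reason a one-line change is enough: the launcher derives the total number of workers from what SLURM allocates. A hedged sketch of that arithmetic (`SLURM_NNODES` and `SLURM_GPUS_ON_NODE` are standard SLURM exports; the fallback values below are purely illustrative):

```python
import os

# With data parallelism, total workers = nodes × GPUs per node.
# Inside a SLURM job these variables are set by the scheduler.
nodes = int(os.environ.get("SLURM_NNODES", "2"))
gpus_per_node = int(os.environ.get("SLURM_GPUS_ON_NODE", "4"))
world_size = nodes * gpus_per_node
print(f"{nodes} node(s) × {gpus_per_node} GPU(s) = world size {world_size}")
```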
......
@@ -692,8 +692,10 @@ class="fragment"><code>git clone https://gitlab.jsc.fz-juelich.de/kesselheim1/sc
<li class="fragment">Add this to requirements.txt:</li>
<li class="fragment"><div class="sourceCode" id="cb5"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>fastai</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>accelerate</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>deepspeed</span></code></pre></div></li>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>deepspeed</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>git<span class="op">+</span>https:<span class="op">//</span>github.com<span class="op">/</span>huggingface<span class="op">/</span>accelerate<span class="op">@</span>rdzv<span class="op">-</span>endpoint</span></code></pre></div></li>
<li class="fragment">(the last one will become <code>accelerate</code>
later this week)</li>
<li class="fragment">Run <code>./setup.sh</code></li>
<li class="fragment"><code>source activate.sh</code></li>
<li class="fragment">Done! You installed everything you need</li>
@@ -812,40 +814,40 @@ class="sourceCode bash"><code class="sourceCode bash"><span id="cb14-1"><a href=
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a><span class="ex">3</span> 1.754019 1.687136 0.404883 0.872330 00:08 </span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a><span class="ex">4</span> 1.643759 1.499526 0.482706 0.906409 00:08 </span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a><span class="ex">5</span> 1.554356 1.450976 0.502798 0.914547 00:08 </span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a><span class="ex">real</span> 1m19.979s</span></code></pre></div>
<hr /></li>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a><span class="ex">real</span> 1m19.979s</span></code></pre></div></li>
</ul>
</section>
<section id="some-insights" class="slide level2">
<h2>Some insights</h2>
<ul>
<li class="fragment">The distributed run’s accuracy and loss suffered a
bit, in exchange for speed 🏎️</li>
<li class="fragment">The distributed run’s accuracy 🎯 and loss 😩
suffered a bit
<ul>
<li class="fragment">In exchange for speed 🏎️</li>
</ul></li>
<li class="fragment">It’s more than 4x faster because the library is
multi-threaded (and now we use 48 threads)</li>
<li class="fragment">I/O is automatically parallelized / sharded by the
Fast.AI library</li>
<li class="fragment">Data parallel is a simple and effective way to
distribute DL workload</li>
distribute DL workload 💪</li>
<li class="fragment">This is really just a primer - there’s much more to
it</li>
<li class="fragment">I/O plays a HUGE role on supercomputers, for
example</li>
</ul>
<table style="width:6%;">
<colgroup>
<col style="width: 5%" />
</colgroup>
<tbody>
<tr class="odd">
<td>## Multi-node</td>
</tr>
<tr class="even">
<td>- Simply change <code>#SBATCH --nodes=2</code> on the submission
file! - THAT’S IT</td>
</tr>
</tbody>
</table>
</section>
<section id="multi-node" class="slide level2">
<h2>Multi-node</h2>
<ul>
<li class="fragment">Simply change to <code>#SBATCH --nodes=2</code> in the
submission file!</li>
<li class="fragment">THAT’S IT</li>
</ul>
</section>
<section id="multi-node-1" class="slide level2">
<h2>Multi-node</h2>
<ul>
<li class="fragment"><div class="sourceCode" id="cb15"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ex">epoch</span> train_loss valid_loss accuracy top_k_accuracy time </span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="ex">0</span> 2.230926 2.414113 0.170986 0.654726 00:10 </span>
......
#!/bin/bash
#SBATCH --account=training2306
#SBATCH --nodes=2
#SBATCH --nodes=1
#SBATCH --job-name=ai-multi-gpu
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=48
......