Commit ad0e525f authored by Alexandre Strube's avatar Alexandre Strube

save as 1 node on the example

parent 4d8f63ba
Pipeline #140815 passed
@@ -490,9 +490,10 @@ learn.fine_tune(6)
- Add this to requirements.txt:
- ```python
fastai
accelerate
deepspeed
git+https://github.com/huggingface/accelerate@rdzv-endpoint
```
- (the last one will become `accelerate` later this week)
- Run `./setup.sh`
- `source activate.sh`
- Done! You installed everything you need
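After `setup.sh` has run, a quick import sanity check can save a debugging round later. A minimal sketch (the helper `check_packages` is ours, not part of the course material):

```python
import importlib.util

def check_packages(names):
    """Return the subset of `names` that the import system cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# The three entries from requirements.txt; the git install of the
# accelerate fork still provides the `accelerate` module.
missing = check_packages(["fastai", "accelerate", "deepspeed"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("Environment looks complete")
```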
@@ -625,16 +626,21 @@ epoch train_loss valid_loss accuracy top_k_accuracy time
5 1.554356 1.450976 0.502798 0.914547 00:08
real 1m19.979s
```
---
## Some insights
- Distributed run suffered a bit in accuracy and loss, in exchange for speed 🏎️
- Data parallel is a simple and effective way to distribute DL workload
- The distributed run's accuracy 🎯 and loss 😩 suffered a bit
- In exchange for speed 🏎️
- It's more than 4x faster because the library is multi-threaded (and now we use 48 threads)
- I/O is automatically parallelized / sharded by the Fast.AI library
- Data parallel is a simple and effective way to distribute DL workload 💪
- This is really just a primer - there's much more to it
- I/O plays a HUGE role on supercomputers, for example
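The core idea behind the data-parallel speedup can be sketched in a few lines: each worker computes a gradient on its own shard of the batch, the gradients are averaged (the "all-reduce"), and every worker applies the identical update. A toy illustration with a least-squares loss, in pure Python rather than the fastai/accelerate machinery used in the course:

```python
# Toy data parallelism: averaging per-shard gradients over equal-sized
# shards is equivalent to one full-batch gradient step.
def grad(w, xs, ys):
    # d/dw of mean((w*x - y)^2) over this shard
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_step(w, xs, ys, workers=2, lr=0.1):
    shard = len(xs) // workers
    grads = [grad(w, xs[i * shard:(i + 1) * shard],
                     ys[i * shard:(i + 1) * shard])
             for i in range(workers)]      # each worker: local gradient
    g = sum(grads) / workers               # "all-reduce": average gradients
    return w - lr * g                      # same update on every worker

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]                  # ground truth: y = 2x
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, xs, ys)
print(round(w, 3))                         # converges to 2.0
```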
---
## Multi-node
- Simply change to `#SBATCH --nodes=2` in the submission file!
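The reason a one-line change is enough: the launcher derives the total number of workers from what SLURM allocates. A hedged sketch of that arithmetic (`SLURM_NNODES` and `SLURM_GPUS_ON_NODE` are standard SLURM exports; the fallback values below are purely illustrative):

```python
import os

# With data parallelism, total workers = nodes × GPUs per node.
# Inside a SLURM job these variables are set by the scheduler.
nodes = int(os.environ.get("SLURM_NNODES", "2"))
gpus_per_node = int(os.environ.get("SLURM_GPUS_ON_NODE", "4"))
world_size = nodes * gpus_per_node
print(f"{nodes} node(s) × {gpus_per_node} GPU(s) = world size {world_size}")
```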
......
@@ -692,8 +692,10 @@ class="fragment"><code>git clone https://gitlab.jsc.fz-juelich.de/kesselheim1/sc
<li class="fragment">Add this to requirements.txt:</li>
<li class="fragment"><div class="sourceCode" id="cb5"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>fastai</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>accelerate</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>deepspeed</span></code></pre></div></li>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>deepspeed</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>git<span class="op">+</span>https:<span class="op">//</span>github.com<span class="op">/</span>huggingface<span class="op">/</span>accelerate<span class="op">@</span>rdzv<span class="op">-</span>endpoint</span></code></pre></div></li>
<li class="fragment">(the last one will become <code>accelerate</code>
later this week)</li>
<li class="fragment">Run <code>./setup.sh</code></li>
<li class="fragment"><code>source activate.sh</code></li>
<li class="fragment">Done! You installed everything you need</li>
@@ -812,40 +814,40 @@ class="sourceCode bash"><code class="sourceCode bash"><span id="cb14-1"><a href=
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a><span class="ex">3</span> 1.754019 1.687136 0.404883 0.872330 00:08 </span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a><span class="ex">4</span> 1.643759 1.499526 0.482706 0.906409 00:08 </span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a><span class="ex">5</span> 1.554356 1.450976 0.502798 0.914547 00:08 </span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a><span class="ex">real</span> 1m19.979s</span></code></pre></div>
<hr /></li>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a><span class="ex">real</span> 1m19.979s</span></code></pre></div></li>
</ul>
</section>
<section id="some-insights" class="slide level2">
<h2>Some insights</h2>
<ul>
<li class="fragment">The distributed run’s accuracy and loss suffered a
bit, in exchange for speed 🏎️</li>
<li class="fragment">The distributed run’s accuracy 🎯 and loss 😩
suffered a bit
<ul>
<li class="fragment">In exchange for speed 🏎️</li>
</ul></li>
<li class="fragment">It’s more than 4x faster because the library is
multi-threaded (and now we use 48 threads)</li>
<li class="fragment">I/O is automatically parallelized / sharded by the
Fast.AI library</li>
<li class="fragment">Data parallel is a simple and effective way to
distribute DL workload</li>
distribute DL workload 💪</li>
<li class="fragment">This is really just a primer - there’s much more to
it</li>
<li class="fragment">I/O plays a HUGE role on supercomputers, for
example</li>
</ul>
<table style="width:6%;">
<colgroup>
<col style="width: 5%" />
</colgroup>
<tbody>
<tr class="odd">
<td>## Multi-node</td>
</tr>
<tr class="even">
<td>- Simply change <code>#SBATCH --nodes=2</code> on the submission
file! - THAT’S IT</td>
</tr>
</tbody>
</table>
</section>
<section id="multi-node" class="slide level2">
<h2>Multi-node</h2>
<ul>
<li class="fragment">Simply change to <code>#SBATCH --nodes=2</code> in the
submission file!</li>
<li class="fragment">THAT’S IT</li>
</ul>
</section>
<section id="multi-node-1" class="slide level2">
<h2>Multi-node</h2>
<ul>
<li class="fragment"><div class="sourceCode" id="cb15"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ex">epoch</span> train_loss valid_loss accuracy top_k_accuracy time </span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="ex">0</span> 2.230926 2.414113 0.170986 0.654726 00:10 </span>
......
#!/bin/bash
#SBATCH --account=training2306
#SBATCH --nodes=2
#SBATCH --nodes=1
#SBATCH --job-name=ai-multi-gpu
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=48
......