Commit 39fe358a authored by Alexandre Strube

big models

parent ad0e525f
Pipeline #140818 passed
@@ -652,25 +652,28 @@ real 1m19.979s
 - ```bash
 epoch train_loss valid_loss accuracy top_k_accuracy time
-0 2.230926 2.414113 0.170986 0.654726 00:10
+0 2.242036 2.192690 0.201728 0.681148 00:10
 epoch train_loss valid_loss accuracy top_k_accuracy time
-0 1.986611 1.993477 0.298018 0.790142 00:06
-1 1.954962 2.180505 0.249238 0.765498 00:06
-2 1.915481 2.004775 0.301829 0.803354 00:06
-3 1.853237 1.827811 0.364583 0.837906 00:06
-4 1.783993 1.779548 0.391768 0.847307 00:06
-5 1.718417 1.642507 0.422002 0.884909 00:06
+0 2.035004 2.084082 0.246189 0.748984 00:05
+1 1.981432 2.054528 0.247205 0.764482 00:05
+2 1.942930 1.918441 0.316057 0.821138 00:05
+3 1.898426 1.832725 0.370173 0.839431 00:05
+4 1.859066 1.781805 0.375508 0.858740 00:05
+5 1.820968 1.743448 0.394055 0.864583 00:05
+real 1m15.651s
 ```
 ---
 ## Some insights
-- It's faster per epoch, but not by much (6 seconds vs 8 seconds)
+- It's faster per epoch, but not by much (5 seconds vs 8 seconds)
 - Accuracy and loss suffered
 - This is a very simple model, so it's not surprising
+- It fits into 4gb, we "stretched" it to a 320gb system
 - You need bigger models to really exercise the gpu and scaling
-- There's a lot more to that
+- There's a lot more to that, but for now, let's focus on medium/big sized models
+- For Gigantic and Humongous-sized models, there's a DL scaling course at JSC!
 ---
@@ -850,27 +850,36 @@ submission file!</li>
 <ul>
 <li class="fragment"><div class="sourceCode" id="cb15"><pre
 class="sourceCode bash"><code class="sourceCode bash"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ex">epoch</span> train_loss valid_loss accuracy top_k_accuracy time </span>
-<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="ex">0</span> 2.230926 2.414113 0.170986 0.654726 00:10 </span>
+<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="ex">0</span> 2.242036 2.192690 0.201728 0.681148 00:10 </span>
 <span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a><span class="ex">epoch</span> train_loss valid_loss accuracy top_k_accuracy time </span>
-<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="ex">0</span> 1.986611 1.993477 0.298018 0.790142 00:06 </span>
-<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a><span class="ex">1</span> 1.954962 2.180505 0.249238 0.765498 00:06 </span>
-<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a><span class="ex">2</span> 1.915481 2.004775 0.301829 0.803354 00:06 </span>
-<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a><span class="ex">3</span> 1.853237 1.827811 0.364583 0.837906 00:06 </span>
-<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a><span class="ex">4</span> 1.783993 1.779548 0.391768 0.847307 00:06 </span>
-<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a><span class="ex">5</span> 1.718417 1.642507 0.422002 0.884909 00:06 </span></code></pre></div></li>
+<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="ex">0</span> 2.035004 2.084082 0.246189 0.748984 00:05 </span>
+<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a><span class="ex">1</span> 1.981432 2.054528 0.247205 0.764482 00:05 </span>
+<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a><span class="ex">2</span> 1.942930 1.918441 0.316057 0.821138 00:05 </span>
+<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a><span class="ex">3</span> 1.898426 1.832725 0.370173 0.839431 00:05 </span>
+<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a><span class="ex">4</span> 1.859066 1.781805 0.375508 0.858740 00:05 </span>
+<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a><span class="ex">5</span> 1.820968 1.743448 0.394055 0.864583 00:05</span>
+<span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a><span class="ex">real</span> 1m15.651s </span></code></pre></div></li>
 </ul>
 </section>
 <section id="some-insights-1" class="slide level2">
 <h2>Some insights</h2>
 <ul>
-<li class="fragment">It’s faster per epoch, but not by much (6 seconds
+<li class="fragment">It’s faster per epoch, but not by much (5 seconds
 vs 8 seconds)</li>
 <li class="fragment">Accuracy and loss suffered</li>
-<li class="fragment">This is a very simple model, so it’s not
-surprising</li>
+<li class="fragment">This is a very simple model, so it’s not surprising
+<ul>
+<li class="fragment">It fits into 4gb, we “stretched” it to a 320gb
+system</li>
+</ul></li>
 <li class="fragment">You need bigger models to really exercise the gpu
 and scaling</li>
-<li class="fragment">There’s a lot more to that</li>
+<li class="fragment">There’s a lot more to that, but for now, let’s
+focus on medium/big sized models
+<ul>
+<li class="fragment">For Gigantic and Humongous-sized models, there’s a
+DL scaling course at JSC!</li>
+</ul></li>
 </ul>
 </section>
 <section id="thats-all-folks" class="slide level2">
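The updated "Some insights" slide rests on two pieces of arithmetic: per-epoch time (5 s against 8 s) and how little of the available memory the model uses (4 GB out of 320 GB). The Python sketch below reproduces both from the updated log in this diff. It is only an illustration under stated assumptions: the 8-second figure is taken as the comparison baseline quoted in the slide (that run is not part of this diff), the single-row first table is treated as a separate warm-up phase (the shape fastai's `fine_tune` prints) and left out of the average, and `parse_epoch_seconds`, `NEW_LOG` and `BASELINE_SECONDS` are hypothetical names introduced here.

```python
# Sketch: derive the slide's speedup and memory-use figures from the updated log.
# Assumptions (not from the commit): 8 s/epoch is the baseline quoted in the slide,
# and parse_epoch_seconds / NEW_LOG / BASELINE_SECONDS are hypothetical names.

NEW_LOG = """
epoch train_loss valid_loss accuracy top_k_accuracy time
0 2.242036 2.192690 0.201728 0.681148 00:10
epoch train_loss valid_loss accuracy top_k_accuracy time
0 2.035004 2.084082 0.246189 0.748984 00:05
1 1.981432 2.054528 0.247205 0.764482 00:05
2 1.942930 1.918441 0.316057 0.821138 00:05
3 1.898426 1.832725 0.370173 0.839431 00:05
4 1.859066 1.781805 0.375508 0.858740 00:05
5 1.820968 1.743448 0.394055 0.864583 00:05
real 1m15.651s
"""


def parse_epoch_seconds(log: str) -> list[int]:
    """Return the 'time' column (mm:ss) of every epoch row, in seconds."""
    seconds = []
    for line in log.strip().splitlines():
        fields = line.split()
        # Epoch rows start with an integer index; the 'epoch ...' header rows
        # and the shell's 'real ...' timing line are skipped.
        if fields and fields[0].isdigit():
            minutes, secs = fields[-1].split(":")
            seconds.append(int(minutes) * 60 + int(secs))
    return seconds


epoch_times = parse_epoch_seconds(NEW_LOG)
# Assumption: the one-row first table is a warm-up phase, so average only the six later epochs.
later_epochs = epoch_times[1:]
per_epoch = sum(later_epochs) / len(later_epochs)

BASELINE_SECONDS = 8.0  # the "8 seconds" the slide compares against
speedup = BASELINE_SECONDS / per_epoch

# "It fits into 4gb, we stretched it to a 320gb system"
memory_fraction = 4 / 320

print(f"mean epoch time: {per_epoch:.1f} s")
print(f"speedup vs baseline: {speedup:.2f}x")
print(f"share of available memory used: {memory_fraction:.2%}")
```

With these inputs the sketch reports a mean of 5.0 s per epoch, a 1.60x speedup and about 1.25% of the memory in use, which is the gap the slide points to when it says bigger models are needed to really exercise the GPUs.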