big models

Commit 39fe358a authored May 31, 2023 by Alexandre Strube
--- a/01-deep-learning-on-supercomputers.md
+++ b/01-deep-learning-on-supercomputers.md
@@ -652,25 +652,28 @@ real	1m19.979s
 - ```bash
 epoch     train_loss  valid_loss  accuracy  top_k_accuracy  time    
-0         2.230926    2.414113    0.170986  0.654726        00:10                       
+0         2.242036    2.192690    0.201728  0.681148        00:10                      
 epoch     train_loss  valid_loss  accuracy  top_k_accuracy  time    
-0         1.986611    1.993477    0.298018  0.790142        00:06                       
+0         2.035004    2.084082    0.246189  0.748984        00:05                      
-1         1.954962    2.180505    0.249238  0.765498        00:06                       
+1         1.981432    2.054528    0.247205  0.764482        00:05                      
-2         1.915481    2.004775    0.301829  0.803354        00:06                       
+2         1.942930    1.918441    0.316057  0.821138        00:05                      
-3         1.853237    1.827811    0.364583  0.837906        00:06                       
+3         1.898426    1.832725    0.370173  0.839431        00:05                      
-4         1.783993    1.779548    0.391768  0.847307        00:06                       
+4         1.859066    1.781805    0.375508  0.858740        00:05                      
-5         1.718417    1.642507    0.422002  0.884909        00:06  
+5         1.820968    1.743448    0.394055  0.864583        00:05
+real	1m15.651s    
 ```
 ---
 ## Some insights
- It's faster per epoch, but not by much (6 seconds vs 8 seconds)
+- It's faster per epoch, but not by much (5 seconds vs 8 seconds)
 - Accuracy and loss suffered
 - This is a very simple model, so it's not surprising
+    - It fits into 4 GB; we "stretched" it to a 320 GB system
 - You need bigger models to really exercise the gpu and scaling
- There's a lot more to that
+- There's a lot more to that, but for now, let's focus on medium/big sized models
+    - For Gigantic and Humongous-sized models, there's a DL scaling course at JSC!
 ---

--- a/public/01-deep-learning-on-supercomputers.html
+++ b/public/01-deep-learning-on-supercomputers.html
@@ -850,27 +850,36 @@ submission file!</li>
 <ul>
 <li class="fragment"><div class="sourceCode" id="cb15"><pre
 class="sourceCode bash"><code class="sourceCode bash"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ex">epoch</span>     train_loss  valid_loss  accuracy  top_k_accuracy  time    </span>
-<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="ex">0</span>         2.230926    2.414113    0.170986  0.654726        00:10                       </span>
+<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="ex">0</span>         2.242036    2.192690    0.201728  0.681148        00:10                      </span>
 <span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a><span class="ex">epoch</span>     train_loss  valid_loss  accuracy  top_k_accuracy  time    </span>
-<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="ex">0</span>         1.986611    1.993477    0.298018  0.790142        00:06                       </span>
+<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="ex">0</span>         2.035004    2.084082    0.246189  0.748984        00:05                      </span>
-<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a><span class="ex">1</span>         1.954962    2.180505    0.249238  0.765498        00:06                       </span>
+<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a><span class="ex">1</span>         1.981432    2.054528    0.247205  0.764482        00:05                      </span>
-<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a><span class="ex">2</span>         1.915481    2.004775    0.301829  0.803354        00:06                       </span>
+<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a><span class="ex">2</span>         1.942930    1.918441    0.316057  0.821138        00:05                      </span>
-<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a><span class="ex">3</span>         1.853237    1.827811    0.364583  0.837906        00:06                       </span>
+<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a><span class="ex">3</span>         1.898426    1.832725    0.370173  0.839431        00:05                      </span>
-<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a><span class="ex">4</span>         1.783993    1.779548    0.391768  0.847307        00:06                       </span>
+<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a><span class="ex">4</span>         1.859066    1.781805    0.375508  0.858740        00:05                      </span>
-<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a><span class="ex">5</span>         1.718417    1.642507    0.422002  0.884909        00:06  </span></code></pre></div></li>
+<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a><span class="ex">5</span>         1.820968    1.743448    0.394055  0.864583        00:05</span>
+<span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a><span class="ex">real</span>    1m15.651s    </span></code></pre></div></li>
 </ul>
 </section>
 <section id="some-insights-1" class="slide level2">
 <h2>Some insights</h2>
 <ul>
-<li class="fragment">It’s faster per epoch, but not by much (6 seconds
+<li class="fragment">It’s faster per epoch, but not by much (5 seconds
 vs 8 seconds)</li>
 <li class="fragment">Accuracy and loss suffered</li>
-<li class="fragment">This is a very simple model, so it’s not
+<li class="fragment">This is a very simple model, so it’s not surprising
-surprising</li>
+<ul>
+<li class="fragment">It fits into 4 GB; we “stretched” it to a 320 GB
+system</li>
+</ul></li>
 <li class="fragment">You need bigger models to really exercise the gpu
 and scaling</li>
-<li class="fragment">There’s a lot more to that</li>
+<li class="fragment">There’s a lot more to that, but for now, let’s
+focus on medium/big sized models
+<ul>
+<li class="fragment">For Gigantic and Humongous-sized models, there’s a
+DL scaling course at JSC!</li>
+</ul></li>
 </ul>
 </section>
 <section id="thats-all-folks" class="slide level2">