diff --git a/index.md b/index.md
index 42d81d33ed66940f5356174e6fcacd7c5978bf57..9e099df537c625b9fed6c928d5d083b0813fa796 100644
--- a/index.md
+++ b/index.md
@@ -200,11 +200,13 @@ date: December 4th, 2024
 
 # Non-transformer architectures
 
-- Last I checked, Jamba 1.5 was the best one
+- First one was Mamba in February
+- Jamba from AI21 in March
+- Last I checked, LFM 40B MoE from Liquid/MIT was the best one (from 30 September)
 - Performs well on benchmarks
 - What about real examples?
 - Some mathematical discussions about it being turing-complete (probably not)
-- Other examples are Hymba from NVIDIA, Liquid from MIT
+- Another example is Hymba from NVIDIA (tiny 1.5B model)
 
 ---
 
diff --git a/public/index.html b/public/index.html
index a046cd1bc965424a7bb09ab89537e0c2bac700c4..e7d0b8c9a471a0381daf151e88fc59dccee18990 100644
--- a/public/index.html
+++ b/public/index.html
@@ -489,13 +489,16 @@ across continents and 96% when training only in the USA</li>
 <section id="non-transformer-architectures" class="slide level1">
 <h1>Non-transformer architectures</h1>
 <ul>
-<li class="fragment">Last I checked, Jamba 1.5 was the best one</li>
+<li class="fragment">First one was Mamba in February</li>
+<li class="fragment">Jamba from AI21 in March</li>
+<li class="fragment">Last I checked, LFM 40B MoE from Liquid/MIT was the
+best one (from 30 September)</li>
 <li class="fragment">Performs well on benchmarks</li>
 <li class="fragment">What about real examples?</li>
 <li class="fragment">Some mathematical discussions about it being
 turing-complete (probably not)</li>
-<li class="fragment">Other examples are Hymba from NVIDIA, Liquid from
-MIT</li>
+<li class="fragment">Another example is Hymba from NVIDIA (tiny 1.5B
+model)</li>
 </ul>
 </section>
 <section id="eu-ai-act" class="slide level1">