diff --git a/index.md b/index.md
index 0792556f2742773b631c26296f3bcf7b37c1bd94..ff5411319e49666ec6e6c895e74e120c6c382fba 100644
--- a/index.md
+++ b/index.md
@@ -1,8 +1,8 @@
 ---
 author: Alexandre Strube
-title: LLMs in 2025 ![](images/blablador.png){ width=550px }
+title: LLMs in (March) 2025<BR> ![](images/blablador-ng.svg){ width=350px }
 subtitle: Open source is the new black
-date: January 30th, 2025
+date: March 26th, 2025
 ---

 ## Website
@@ -55,17 +55,21 @@ date: January 30th, 2025

 ## Huawei

+- Probably the most sanctioned company in the world
 - Selling AI Chips
 - Ascend 910C: claimed to be on par with Nvidia's H100
 - (In practice, their chips are closer to A100)
 - Already made by SMIC (previous models were made in Taiwan by TSMC)
+- Has LLMs since 2023 on Huawei Cloud
+  - Can't download, but can fine-tune and download the fine-tuned models
+- Servers in EU comply with EU regulations/AI Act

 ---

 # China Telecom

 - Already has a 1 trillion parameter model using Huawei chips
-- TeleChat2-115b
+- TeleChat2-115b 
 - Meanwhile, Deutsche Telekom gives me 16mbps in Jülich, like I had in 2005 in Brazil 😥
 - Using **copper**, like the ancient aztecs or whatever 🗿𓂀𓋹𓁈𓃠𓆃𓅓𓆣
@@ -77,18 +81,22 @@ date: January 30th, 2025

 # Baichuan AI

+- Was the first model on Blablador after Llama/Vicuna in 2023
 - Has a 1 trillion parameter model for a year already
+- Baichuan 2 is Llama 2-level, Baichuan 4 (closed) is ChatGPT-4 level (China-only)

 ---

 # Baidu

-- Doing a lot of AI reserch
-- Ernie 4 from Oct 2023 (version 3 is open)
+- Used to be China's AI leader, now playing catch-up
+  - Fell behind BECAUSE THEY ARE CLOSED SOURCE
+- Doing a lot of AI research
+- Ernie 4.5 from March 2025 (version 3 is open)
 - They swear they'll open them on June 30th(?)
 - Ernie-Health
 - Miaoda (no-code dev)
 - I-Rag and Wenxin Yige (text to image)
-- [https://research.baidu.com/Blog/index-view?id=187](https://research.baidu.com/Blog/index-view?id=187)
+- Ernie X1 costs 50% of DeepSeek's price, with comparable reasoning
 - 100,000+ GPU cluster**s**

 ---
@@ -101,7 +109,7 @@ date: January 30th, 2025
 - Yi Lightning is #12 on LmArena as of 26.01.2025
 - Yi VL, multimodal in June (biggest vision model available, 34b)
 - Yi Coder was the best code model until 09.2024
-- Went quiet after that (probably busy making money)
+- Went quiet after that (probably busy making money, last update 11.2024)
 - [https://01.ai](https://01.ai)

 ---
@@ -121,14 +129,14 @@ date: January 30th, 2025

 ## Alibaba

-- Qwen2 series
 - Qwen2.5-1M: 1m tokens for context
-- Qwen2-VL: 72b parameters video model: can chat through camera, play games, control your phone etc
-- Qwq and QVQ vision models
+- Qwen2.5-VL (21.03.2025): 32b-parameter vision-language model; can chat through camera, play games, control your phone etc
 - BLABLADOR: alias-code for Qwen2.5-coder
 - Better than Llama
 - Open weights on HuggingFace, modelscope and free inference too
 - 28.01.2025: Qwen 2.5 MAX released (only on their website and api)
+- QwQ-32B is a reasoning model, available on Blablador (#13 on LM Arena)
+- LHM: from a picture to an animated 3D model in seconds

 ---
@@ -271,8 +279,8 @@ date: January 30th, 2025

 ## THEY DON'T STOP AND I CAN'T KEEP UP WITH THIS

-- New model relased monday afternoon
-- DeepSeek Janus Pro 7B
+- New model released in January
+- DeepSeek Janus Pro 7B on 01.02
 - It not only understands multi-modalities, but also generates them
 - Best model understanding images, best model generating images
 - [https://huggingface.co/deepseek-ai/Janus-Pro-7B](https://huggingface.co/deepseek-ai/Janus-Pro-7B)
 - ![](images/janus-8.jpg)
@@ -280,170 +288,122 @@ date: January 30th, 2025

 ---

-# What about 🇪🇺
-
----
-
-- A 
lot of people tell me: "HuggingFace is French and is a great success"
-- HuggingFace was based in NYC from the beginning
-
----
+## THEY DON'T STOP AND I CAN'T KEEP UP WITH THIS

+- DeepSeekMath 7B (05.02.2024!!!) created "Group Relative Policy Optimization" (GRPO) for advanced math reasoning
+- Now GRPO is widely used to improve Reinforcement Learning on general LLMs (toy sketch a few slides ahead)
+- DeepSeek-V3-0324 came out Monday, doesn't even have a README yet (post-training update?)

 ---

-## 🇪🇺
+# The LLM ecosystem: 🇨🇦

-- Last Thursday, the French government released their LLM.
+- Cohere AI
+  - Aya: family of models, from 8 to 32b parameters, multi-modal, multilingual etc
+    - Strube's opinion: they were not so good in 2024
+  - C4AI Command A: 111b model
+    - Text-only, never tried

 ---

-
+# The Open LLM ecosystem: 🇺🇸

---

-
+- Google: Gemini is closed and state-of-the-art, Gemma is open and good
+- Microsoft has tiny, bad ones (but I wouldn't bet against them - getting better)
+  - They are putting money on X's Grok
+- Twitter/X has Grok-3 for paying customers, Grok-1 is enormous and "old" (from March 2024)
+  - Colossus supercomputer has 200k GPUs, aiming for 1M
+  - Grok is #1 on the LM Arena leaderboard
+- Apple is going their own way + using ChatGPT
+  - Falling behind? Who knows?

 ---

-

---
+# The LLM ecosystem: 🇺🇸

-(Side note: No model in blablador has ever said that the "square root of goat is one")
+- Meta has Llama, and somehow it's making money out of it
+  - Training Llama 4 with lessons learned from DeepSeek
+- Anthropic is receiving billions from Amazon, MS, Pentagon etc but Claude is completely closed
+- Amazon released its "Nova" models in February. 100% closed, interesting tiering (similar to Blablador)
+- Nvidia has a bunch of interesting stuff, like accelerated versions of popular models, and small speech/translation models among others

 ---

-## 🇪🇺
-
-- What Mistral doin'?
-- What Helmholtz AI doin'?
-
----
-
-

---

-# FUTURE
+# The "open" LLM ecosystem: 🇺🇸

---

-# CALL FOR ACTION
-
----
+- Outside of Academia, there's [OLMo](https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e7359222) from AllenAI
+  - Has training code, weights and data, all open
+- [Intellect-1](https://www.primeintellect.ai/blog/intellect-1-release) was trained collaboratively
+  - Up to 112 H100 GPUs simultaneously
+  - They claim overall compute utilization of 83% across continents and 96% when training only in the USA
+  - Fully open
+- [Nous Research](http://nousresearch.com) Hermes 3, announced in January 2025
+- Fine-tuned from Llama 3.1 on synthetic data
+- Training live, on the internet

 ---

-## Call for action
-- DeepSeek's techniques are *"democratizing AI"*®
-- Training has become 20x faster
-- WE CAN COMPETE
-- OTOH, so does everyone else
+![](images/nous-distro.png){width=600px}

 ---

-## Call for action
+# The LLM ecosystem: 🇪🇺

-- European AI hardware?
-- If we don't have it, fine, we buy 🇺🇸 and 🇨🇳
-- But we need software
-- (Strube's opinion: we can do it)
-- LAION is a good start
-- TrustLLM is a good start
-- OpenGPT-X is... well, it's open
+- Mistral.ai just came out with a new small model
+- Fraunhofer/OpenGPT-X/JSC has Teuken-7b
+- DE/NL/NO/DK: TrustLLM
+- SE/ES/CZ/NL/FI/NO/IT: OpenEuroLLM

 ---

-## What can I do? 
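+
+## GRPO in one slide
+
+- A toy sketch of the idea (my illustration, not DeepSeek's actual code): sample a *group* of answers per prompt, score them, and use each answer's group-normalized score as its advantage; no separate critic model needed
+
+```python
+import torch
+
+def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
+    """rewards: (num_prompts, group_size), one scalar reward per sampled answer."""
+    mean = rewards.mean(dim=1, keepdim=True)  # per-group baseline
+    std = rewards.std(dim=1, keepdim=True)    # per-group spread
+    # Advantage: how much better is this answer than its siblings in the group?
+    return (rewards - mean) / (std + eps)
+
+# 2 prompts, 4 sampled answers each (hypothetical rewards)
+print(grpo_advantages(torch.tensor([[1.0, 0.0, 0.0, 1.0],
+                                    [0.2, 0.9, 0.4, 0.5]])))
+```
+
+---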
+# 🇪🇺

---

+- A lot of people tell me: "HuggingFace is French and is a great success"
+- HuggingFace was based in NYC from the beginning

---

+![](images/deepseek-vs-eu.jpg)

---

+## 🇪🇺

+- Beginning of February, the French government released their LLM.

---

+![](images/two-days-later.jpg)

---

+![](images/message-lucie.jpg)

---

+![](images/message-lucie-2.png)

---

+(Side note: No model in blablador has ever said that the "square root of goat is one")

---

 ## Potato

 ![](images/IMG_6561.jpg){ width=350px }

---

-## Open source
-
-- Outside of Academia, there's [OLMo](https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e7359222) from AllenAI
-  - Has training code, weights and data, all open
-- [Intellect-1](https://www.primeintellect.ai/blog/intellect-1-release) was trained collaboratively
-  - Up to 112 H100 GPUs simultaneously
-  - They claim overall compute utilization of 83% across continents and 96% when training only in the USA
-  - Fully open
-
----
-
-# Nous Research
-
-- Announced yesterday
-- Training live, on the interent
-- Previous models were Llama finetuned on synthetic data
-
-![](images/nous-distro.png){width=600px}

---

 # Non-transformer architectures

-- First one was Mamba in February
+- First one was Mamba in December 2023
 - Jamba from AI21 in March
 - Last I checked, LFM 40B MoE from Liquid/MIT was the best one (from 30.09)
 - Performs well on benchmarks
 - Not so much on (my) real world tests
 - Some mathematical discussions about it being turing-complete (probably not)
 - Other example is Hymba from NVIDIA (tiny model, 1.5B)

---

-# EU AI Act
-
-- "GPAI models present systemic risks when the cumulative amount of compute used for its training is greater than 10^25 floating point operations (FLOPs)"
-- Moore's law has something to say about this
-- TrustLLM and OpenGPT-X will have to comply
-  - To be used commercially
-- Research-only models are exempt
-- Bureaucratic shot in the foot: the EU AI Act will make it harder for EU models to compete internationally
-
----

 # Take the slides with you

-![https://go.fzj.de/2025-hai-retreat](images/2025-hai-retreat-Details.png){ width=500px }
+![https://go.fzj.de/2025-eum](images/2025-eum.png){ width=500px }

---

diff --git a/public/images/2025-eum.png b/public/images/2025-eum.png
new file mode 100644
index 0000000000000000000000000000000000000000..e97109c8f016bc5825da81836cfd79cd2b0d1e63
Binary files /dev/null and b/public/images/2025-eum.png differ
diff --git a/public/images/2025-hai-retreat-Details.png b/public/images/2025-hai-retreat-Details.png
deleted file mode 100644
index 
feda6edcaf0c6a0524f7449175cda7efc42ac282..0000000000000000000000000000000000000000 Binary files a/public/images/2025-hai-retreat-Details.png and /dev/null differ diff --git a/public/images/blablador-ng.svg b/public/images/blablador-ng.svg index a05a8dbda456fe54bb593af9ee3a0b0d95f47f78..6713227d46526bc4e8e9ff60dde42199d34a911f 100644 --- a/public/images/blablador-ng.svg +++ b/public/images/blablador-ng.svg @@ -1,30 +1,32 @@ -<?xml version="1.0" encoding="UTF-8"?> +<?xml version="1.0" encoding="utf-8"?> <svg id="Layer_2" data-name="Layer 2" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 626.98 779.24"> <defs> <style> .cls-1 { fill: #ffefbf; } + /* Dark mode: invert SVG colors */ + @media (prefers-color-scheme: dark) { + svg { + filter: invert(1); + } + } </style> </defs> - <g id="Layer_1-2" data-name="Layer 1"> - <g> - <path class="cls-1" d="M496.92,521.49H131.03c-61.28,0-111.13-49.85-111.13-111.13V131.03c0-61.28,49.85-111.13,111.13-111.13h364.93c61.28,0,111.13,49.85,111.13,111.13v474.13l-110.17-83.66Z"/> - <circle cx="151.59" cy="178.57" r="50"/> - <circle cx="363.49" cy="178.57" r="50"/> - <path d="M495.96,0H131.03C58.66,0,0,58.66,0,131.03v279.33c0,72.36,58.66,131.03,131.03,131.03h359.19l136.76,103.85V131.03C626.98,58.66,568.32,0,495.96,0ZM587.2,409.52v155.55l-72.92-55.37-6.29-4.78V177.78h-39.79v323.82H131.03c-50.31,0-91.24-40.93-91.24-91.24V131.03c0-50.31,40.93-91.24,91.24-91.24h364.93c50.31,0,91.24,40.93,91.24,91.24v278.5Z"/> - <polygon points="191.23 289.2 191.23 328.99 235.04 328.99 235.04 386 269.68 386 269.68 328.99 313.49 328.99 313.49 289.2 191.23 289.2"/> - <g> - <path d="M28.96,677.94h38.8c21.88,0,31.57,11.57,31.57,25.38,0,12.56-6.87,17.9-12.16,20.16,7.35,3.07,15.61,10.32,15.61,24.11,0,19.24-14.47,30.21-35.72,30.21H28.96v-99.86ZM64.26,717.22c10.49,0,15.33-3.33,15.33-11.82s-5.62-11.95-14.31-11.95h-16.35v23.77h15.33ZM48.93,762.43h16.63c9.24,0,16.46-4.45,16.46-14.8,0-9.72-5.7-15-18-15h-15.1v29.8Z"/> - <path d="M113.92,777.8v-105.99h19.81v105.99h-19.81Z"/> - <path d="M208.35,758.26c0,5.1.33,14.19,1.04,19.54h-17.35c-.57-2.26-1.02-6.8-1.15-9.12-3.14,6.12-10.22,10.43-21.12,10.43-18.66,0-25.27-11.63-25.27-23.03,0-13.58,9.4-24.77,35.56-24.77h8.87v-5.48c0-6.17-1.91-10.95-10.76-10.95s-10.88,4.48-11.58,10.02h-19.17c.8-11.91,7.94-23.99,31.06-23.99,17.82,0,29.88,6.14,29.88,24.81v32.54ZM189.58,743.98h-9.17c-13.53,0-16.05,5.5-16.05,10.55s3.07,9.56,10.61,9.56c11.83,0,14.62-7.87,14.62-18.53v-1.59Z"/> - <path d="M224.08,671.82h19.81v38.75c3.21-4.78,9.1-9.65,20.85-9.65,19.47,0,29.12,16.74,29.12,36.73,0,24.23-10.78,41.47-31.91,41.47-10.89,0-15.78-4.59-18.31-8.63-.11,2.45-.27,4.85-.57,7.32h-19.28c.15-5.04.29-15.38.29-22.99v-82.99ZM273.86,738.45c0-12.39-4.59-21.34-15.5-21.34-11.85,0-15.55,7.93-15.55,22.45,0,15.57,3.97,23.61,15.48,23.61,10.65,0,15.57-9.52,15.57-24.73Z"/> - <path d="M305.27,777.8v-105.99h19.81v105.99h-19.81Z"/> - <path d="M399.69,758.26c0,5.1.33,14.19,1.04,19.54h-17.35c-.57-2.26-1.02-6.8-1.15-9.12-3.14,6.12-10.22,10.43-21.12,10.43-18.66,0-25.27-11.63-25.27-23.03,0-13.58,9.4-24.77,35.56-24.77h8.87v-5.48c0-6.17-1.91-10.95-10.76-10.95s-10.88,4.48-11.58,10.02h-19.17c.8-11.91,7.94-23.99,31.06-23.99,17.82,0,29.88,6.14,29.88,24.81v32.54ZM380.93,743.98h-9.17c-13.53,0-16.05,5.5-16.05,10.55s3.07,9.56,10.61,9.56c11.83,0,14.62-7.87,14.62-18.53v-1.59Z"/> - <path 
d="M480.59,671.82v84.96c0,8.8.13,16.62.29,21.02h-19.29c-.27-2.02-.68-6.45-.68-8.91-3.57,6.57-9.19,10.35-21.18,10.35-19.22,0-29.17-15.7-29.17-39.01s12.14-39.32,32.48-39.32c11.23,0,16.07,3.85,17.74,6.87v-35.96h19.81ZM430.81,739.54c0,14.13,4.76,23.65,15.46,23.65,12.92,0,15.43-8.77,15.43-23.5,0-16.15-3.23-22.6-14.81-22.6-9.76,0-16.08,7.42-16.08,22.45Z"/> - <path d="M564.07,739.03c0,22.01-10.52,40.2-36.16,40.2s-35.5-19.39-35.5-39.78c0-18.79,10.64-38.54,36.51-38.54,24.3,0,35.15,17.67,35.15,38.12ZM512.53,738.96c0,16.09,5.56,24.75,16.12,24.75,9.92,0,15.42-8.6,15.42-24.54,0-14.96-5.16-22.73-15.97-22.73-10.03,0-15.57,8.74-15.57,22.52Z"/> - <path d="M575.31,727.91c0-12.39-.02-20.36-.29-25.54h19.29c.4,3.26.53,8.05.53,13.23,2.54-6.33,9.16-14.39,22.83-14.43v20.02c-15.55-.25-22.56,5.01-22.56,22.33v34.28h-19.81v-49.89Z"/> - </g> - </g> + <path class="cls-1" d="M496.92,521.49H131.03c-61.28,0-111.13-49.85-111.13-111.13V131.03c0-61.28,49.85-111.13,111.13-111.13h364.93c61.28,0,111.13,49.85,111.13,111.13v474.13l-110.17-83.66Z" style="fill: rgb(255, 255, 255);"/> + <circle cx="151.59" cy="178.57" r="50"/> + <circle cx="363.49" cy="178.57" r="50"/> + <path d="M495.96,0H131.03C58.66,0,0,58.66,0,131.03v279.33c0,72.36,58.66,131.03,131.03,131.03h359.19l136.76,103.85V131.03C626.98,58.66,568.32,0,495.96,0ZM587.2,409.52v155.55l-72.92-55.37-6.29-4.78V177.78h-39.79v323.82H131.03c-50.31,0-91.24-40.93-91.24-91.24V131.03c0-50.31,40.93-91.24,91.24-91.24h364.93c50.31,0,91.24,40.93,91.24,91.24v278.5Z"/> + <polygon points="191.23 289.2 191.23 328.99 235.04 328.99 235.04 386 269.68 386 269.68 328.99 313.49 328.99 313.49 289.2 191.23 289.2"/> + <g> + <path d="M28.96,677.94h38.8c21.88,0,31.57,11.57,31.57,25.38,0,12.56-6.87,17.9-12.16,20.16,7.35,3.07,15.61,10.32,15.61,24.11,0,19.24-14.47,30.21-35.72,30.21H28.96v-99.86ZM64.26,717.22c10.49,0,15.33-3.33,15.33-11.82s-5.62-11.95-14.31-11.95h-16.35v23.77h15.33ZM48.93,762.43h16.63c9.24,0,16.46-4.45,16.46-14.8,0-9.72-5.7-15-18-15h-15.1v29.8Z"/> + <path d="M113.92,777.8v-105.99h19.81v105.99h-19.81Z"/> + <path d="M208.35,758.26c0,5.1.33,14.19,1.04,19.54h-17.35c-.57-2.26-1.02-6.8-1.15-9.12-3.14,6.12-10.22,10.43-21.12,10.43-18.66,0-25.27-11.63-25.27-23.03,0-13.58,9.4-24.77,35.56-24.77h8.87v-5.48c0-6.17-1.91-10.95-10.76-10.95s-10.88,4.48-11.58,10.02h-19.17c.8-11.91,7.94-23.99,31.06-23.99,17.82,0,29.88,6.14,29.88,24.81v32.54ZM189.58,743.98h-9.17c-13.53,0-16.05,5.5-16.05,10.55s3.07,9.56,10.61,9.56c11.83,0,14.62-7.87,14.62-18.53v-1.59Z"/> + <path d="M224.08,671.82h19.81v38.75c3.21-4.78,9.1-9.65,20.85-9.65,19.47,0,29.12,16.74,29.12,36.73,0,24.23-10.78,41.47-31.91,41.47-10.89,0-15.78-4.59-18.31-8.63-.11,2.45-.27,4.85-.57,7.32h-19.28c.15-5.04.29-15.38.29-22.99v-82.99ZM273.86,738.45c0-12.39-4.59-21.34-15.5-21.34-11.85,0-15.55,7.93-15.55,22.45,0,15.57,3.97,23.61,15.48,23.61,10.65,0,15.57-9.52,15.57-24.73Z"/> + <path d="M305.27,777.8v-105.99h19.81v105.99h-19.81Z"/> + <path d="M399.69,758.26c0,5.1.33,14.19,1.04,19.54h-17.35c-.57-2.26-1.02-6.8-1.15-9.12-3.14,6.12-10.22,10.43-21.12,10.43-18.66,0-25.27-11.63-25.27-23.03,0-13.58,9.4-24.77,35.56-24.77h8.87v-5.48c0-6.17-1.91-10.95-10.76-10.95s-10.88,4.48-11.58,10.02h-19.17c.8-11.91,7.94-23.99,31.06-23.99,17.82,0,29.88,6.14,29.88,24.81v32.54ZM380.93,743.98h-9.17c-13.53,0-16.05,5.5-16.05,10.55s3.07,9.56,10.61,9.56c11.83,0,14.62-7.87,14.62-18.53v-1.59Z"/> + <path 
d="M480.59,671.82v84.96c0,8.8.13,16.62.29,21.02h-19.29c-.27-2.02-.68-6.45-.68-8.91-3.57,6.57-9.19,10.35-21.18,10.35-19.22,0-29.17-15.7-29.17-39.01s12.14-39.32,32.48-39.32c11.23,0,16.07,3.85,17.74,6.87v-35.96h19.81ZM430.81,739.54c0,14.13,4.76,23.65,15.46,23.65,12.92,0,15.43-8.77,15.43-23.5,0-16.15-3.23-22.6-14.81-22.6-9.76,0-16.08,7.42-16.08,22.45Z"/> + <path d="M564.07,739.03c0,22.01-10.52,40.2-36.16,40.2s-35.5-19.39-35.5-39.78c0-18.79,10.64-38.54,36.51-38.54,24.3,0,35.15,17.67,35.15,38.12ZM512.53,738.96c0,16.09,5.56,24.75,16.12,24.75,9.92,0,15.42-8.6,15.42-24.54,0-14.96-5.16-22.73-15.97-22.73-10.03,0-15.57,8.74-15.57,22.52Z"/> + <path d="M575.31,727.91c0-12.39-.02-20.36-.29-25.54h19.29c.4,3.26.53,8.05.53,13.23,2.54-6.33,9.16-14.39,22.83-14.43v20.02c-15.55-.25-22.56,5.01-22.56,22.33v34.28h-19.81v-49.89Z"/> </g> </svg> \ No newline at end of file diff --git a/public/index.html b/public/index.html index 7ebae60d366d7515e094b83d718113b2b7448bd9..b9023a32d4d1ce4d84dca2fc0bbefd57af0dc953 100644 --- a/public/index.html +++ b/public/index.html @@ -4,7 +4,7 @@ <meta charset="utf-8"> <meta name="generator" content="pandoc"> <meta name="author" content="Alexandre Strube"> - <title>LLMs in 2025 </title> + <title>LLMs in (March)2025 </title> <meta name="apple-mobile-web-app-capable" content="yes"> <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent"> <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui"> @@ -160,11 +160,11 @@ <div class="slides"> <section id="title-slide"> - <h1 class="title">LLMs in 2025 <img data-src="images/blablador.png" -width="550" /></h1> + <h1 class="title">LLMs in (March)2025<BR> <img +data-src="images/blablador-ng.svg" width="350" /></h1> <p class="subtitle">Open source is the new black</p> <p class="author">Alexandre Strube</p> - <p class="date">January 30th, 2025</p> + <p class="date">March 26th, 2025</p> </section> <section class="slide level1"> @@ -233,12 +233,21 @@ University)</figcaption> <h2 id="huawei">Huawei</h2> <ul> +<li class="fragment">Probably the most sanctioned company in the +world</li> <li class="fragment">Selling AI Chips</li> <li class="fragment">Ascend 910C: claimed to be on par with Nvidia’s H100</li> <li class="fragment">(In practice, their chips are closed to A100)</li> <li class="fragment">Already made by SMIC (previous models were made in Taiwan by TSMC)</li> +<li class="fragment">Has LLMs since 2023 on Huawei Cloud +<ul> +<li class="fragment">Can’t download, but can fine-tune and download the +finetuned models</li> +</ul></li> +<li class="fragment">Servers in EU comply with EU regulations/AI +act</li> </ul> </section> <section id="china-telecom" class="slide level1"> @@ -260,20 +269,29 @@ aztecs or whatever 🗿𓂀𓋹𓁈𓃠𓆃𓅓𓆣</li> <section id="baichuan-ai" class="slide level1"> <h1>Baichuan AI</h1> <ul> +<li class="fragment">Was the first model on Blablador after Llama/Vicuna +in 2023</li> <li class="fragment">Has a 1 trillion parameter model for a year already</li> +<li class="fragment">Baichuan 2 is Llama 2-level, Baichuan 4 (closed) is +ChatGPT-4 level (china-only)</li> </ul> </section> <section id="baidu" class="slide level1"> <h1>Baidu</h1> <ul> -<li class="fragment">Doing a lot of AI reserch</li> -<li class="fragment">Ernie 4 from Oct 2023 (version 3 is open)</li> +<li class="fragment">Used to be their AI leader, playing catch-up +<ul> +<li class="fragment">Fell behind BECAUSE THEY ARE CLOSED SOURCE</li> +</ul></li> +<li class="fragment">Doing a lot of AI 
research</li>
+<li class="fragment">Ernie 4.5 from March 2025 (version 3 is open) -
+They swear they’ll open them on June 30th(?)</li>
 <li class="fragment">Ernie-Health</li>
 <li class="fragment">Miaoda (no-code dev)</li>
 <li class="fragment">I-Rag and Wenxin Yige (text to image)</li>
-<li class="fragment"><a
-href="https://research.baidu.com/Blog/index-view?id=187">https://research.baidu.com/Blog/index-view?id=187</a></li>
+<li class="fragment">Ernie X1 costs 50% of DeepSeek’s price, with
+comparable reasoning</li>
 <li class="fragment">100,000+ GPU cluster<strong>s</strong></li>
 </ul>
 </section>
@@ -291,8 +309,8 @@ benchmarks</li>
 <li class="fragment">Yi VL, multimodal in June (biggest vision model
 available, 34b)</li>
 <li class="fragment">Yi Coder was the best code model until
 09.2024</li>
-<li class="fragment">Went quiet after that (probably busy making
-money)</li>
+<li class="fragment">Went quiet after that (probably busy making money,
+last update 11.2024)</li>
 <li class="fragment"><a href="https://01.ai">https://01.ai</a></li>
 </ul>
 </section>
@@ -314,17 +332,18 @@ Llama, Grok, Qwen etc)</li>

 <h2 id="alibaba">Alibaba</h2>
 <ul>
-<li class="fragment">Qwen2 series</li>
 <li class="fragment">Qwen2.5-1M: 1m tokens for context</li>
-<li class="fragment">Qwen2-VL: 72b parameters video model: can chat
-through camera, play games, control your phone etc</li>
-<li class="fragment">Qwq and QVQ vision models</li>
+<li class="fragment">Qwen2.5-VL (21.03.2025): 32b-parameter
+vision-language model; can chat through camera, play games, control
+your phone etc</li>
 <li class="fragment">BLABLADOR: alias-code for Qwen2.5-coder</li>
 <li class="fragment">Better than Llama</li>
 <li class="fragment">Open weights on HuggingFace, modelscope and free
 inference too</li>
 <li class="fragment">28.01.2025: Qwen 2.5 MAX released (only on their
 website and api)</li>
+<li class="fragment">QwQ-32B is a reasoning model, available on
+Blablador (#13 on LM Arena)</li>
+<li class="fragment">LHM: from a picture to an animated 3D model in
+seconds</li>
 </ul>
 </section>
 <section class="slide level1">
@@ -501,8 +520,8 @@ fathom</li>
 <h2 id="they-dont-stop-and-i-cant-keep-up-with-this">THEY DON’T STOP
 AND I CAN’T KEEP UP WITH THIS</h2>
 <ul>
-<li class="fragment">New model relased monday afternoon</li>
-<li class="fragment">DeepSeek Janus Pro 7B</li>
+<li class="fragment">New model released in January</li>
+<li class="fragment">DeepSeek Janus Pro 7B on 01.02</li>
 <li class="fragment">It not only understands multi-modalities, but also
 generates them</li>
 <li class="fragment">Best model understanding images, best model
@@ -512,208 +531,165 @@ href="https://huggingface.co/deepseek-ai/Janus-Pro-7B">https://huggingface.co/de
 <li class="fragment"><img data-src="images/janus-8.jpg" /></li>
 </ul>
 </section>
-<section id="what-about" class="slide level1">
-<h1>What about 🇪🇺</h1>
-</section>
 <section class="slide level1">
+<h2 id="they-dont-stop-and-i-cant-keep-up-with-this-1">THEY DON’T STOP
+AND I CAN’T KEEP UP WITH THIS</h2>
 <ul>
-<li class="fragment">A lot of people tell me: “HuggingFace is French and
-is a great success”</li>
-<li class="fragment">HuggingFace was based in NYC from the
-beginning</li>
+<li class="fragment">DeepSeekMath 7B (05.02.2024!!!) 
created “Group
+Relative Policy Optimization” (GRPO) for advanced math reasoning</li>
+<li class="fragment">Now GRPO is widely used to improve Reinforcement
+Learning on general LLMs</li>
+<li class="fragment">DeepSeek-V3-0324 came out Monday, doesn’t even have
+a README yet (post-training update?)</li>
 </ul>
 </section>
-<section class="slide level1">
-
-<p><img data-src="images/deepseek-vs-eu.jpg" /></p>
-</section>
-<section class="slide level1">
-
-<h2 id="section-1">🇪🇺</h2>
+<section id="the-llm-ecosystem" class="slide level1">
+<h1>The LLM ecosystem: 🇨🇦</h1>
+<ul>
+<li class="fragment">Cohere AI
 <ul>
-<li class="fragment">Last Thursday, the French government released their
-LLM.</li>
+<li class="fragment">Aya: family of models, from 8 to 32b parameters,
+multi-modal, multilingual etc
+<ul>
+<li class="fragment">Strube’s opinion: they were not so good in
+2024</li>
+</ul></li>
+<li class="fragment">C4AI Command A: 111b model
+<ul>
+<li class="fragment">Text-only, never tried</li>
+</ul></li>
+</ul></li>
 </ul>
 </section>
-<section class="slide level1">
-
-<p><img data-src="images/two-days-later.jpg" /></p>
-</section>
-<section class="slide level1">
-
-<p><img data-src="images/message-lucie.jpg" /></p>
-</section>
-<section class="slide level1">
-
-<p><img data-src="images/message-lucie-2.png" /></p>
+<section id="the-open-llm-ecosystem" class="slide level1">
+<h1>The Open LLM ecosystem: 🇺🇸</h1>
+<ul>
+<li class="fragment">Google: Gemini is closed and state-of-the-art,
+Gemma is open and good</li>
+<li class="fragment">Microsoft has tiny, bad ones (but I wouldn’t bet
+against them - getting better)
+<ul>
+<li class="fragment">They are putting money on X’s Grok</li>
+</ul></li>
+<li class="fragment">Twitter/X has Grok-3 for paying customers, Grok-1
+is enormous and “old” (from March 2024)
+<ul>
+<li class="fragment">Colossus supercomputer has 200k GPUs, aiming for
+1M</li>
+<li class="fragment">Grok is #1 on the LM Arena leaderboard</li>
+</ul></li>
+<li class="fragment">Apple is going their own way + using ChatGPT
+<ul>
+<li class="fragment">Falling behind? Who knows?</li>
+</ul></li>
+</ul>
 </section>
-<section class="slide level1">
-
-<p>(Side note: No model in blablador has ever said that the “square root
-of goat is one”)</p>
+<section id="the-llm-ecosystem-1" class="slide level1">
+<h1>The LLM ecosystem: 🇺🇸</h1>
+<ul>
+<li class="fragment">Meta has Llama, and somehow it’s making money out
+of it
+<ul>
+<li class="fragment">Training Llama 4 with lessons learned from
+DeepSeek</li>
+</ul></li>
+<li class="fragment">Anthropic is receiving billions from Amazon, MS,
+Pentagon etc but Claude is completely closed</li>
+<li class="fragment">Amazon released its “Nova” models in February. 
100%
+closed, interesting tiering (similar to Blablador)</li>
+<li class="fragment">Nvidia has a bunch of interesting stuff, like
+accelerated versions of popular models, and small speech/translation
+models among others</li>
+</ul>
 </section>
-<section class="slide level1">
-
-<h2 id="section-2">🇪🇺</h2>
+<section id="the-open-llm-ecosystem-1" class="slide level1">
+<h1>The “open” LLM ecosystem: 🇺🇸</h1>
+<ul>
+<li class="fragment">Outside of Academia, there’s <a
+href="https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e7359222">OLMo</a>
+from AllenAI
+<ul>
+<li class="fragment">Has training code, weights and data, all open</li>
+</ul></li>
+<li class="fragment"><a
+href="https://www.primeintellect.ai/blog/intellect-1-release">Intellect-1</a>
+was trained collaboratively
 <ul>
-<li class="fragment">What Mistral doin’?</li>
-<li class="fragment">What Helmholtz AI doin’?</li>
+<li class="fragment">Up to 112 H100 GPUs simultaneously</li>
+<li class="fragment">They claim overall compute utilization of 83%
+across continents and 96% when training only in the USA</li>
+<li class="fragment">Fully open</li>
+</ul></li>
+<li class="fragment"><a href="http://nousresearch.com">Nous Research</a>
+Hermes 3, announced in January 2025</li>
+<li class="fragment">Fine-tuned from Llama 3.1 on synthetic data</li>
+<li class="fragment">Training live, on the internet</li>
 </ul>
 </section>
 <section class="slide level1">

-<p><video data-src="images/IMG_6426.mov" controls=""><a
-href="images/IMG_6426.mov">Video</a></video></p>
-</section>
-<section id="future" class="slide level1">
-<h1>FUTURE</h1>
-</section>
-<section id="call-for-action" class="slide level1">
-<h1>CALL FOR ACTION</h1>
+<p><img data-src="images/nous-distro.png" width="600" /></p>
 </section>
-<section class="slide level1">
-
-<h2 id="call-for-action-1">Call for action</h2>
+<section id="the-llm-ecosystem-2" class="slide level1">
+<h1>The LLM ecosystem: 🇪🇺</h1>
 <ul>
-<li class="fragment">DeepSeek’s techniques are <em>“democratizing
-AI”</em>®</li>
-<li class="fragment">Training has become 20x faster</li>
-<li class="fragment">WE CAN COMPETE</li>
-<li class="fragment">OTOH, so does everyone else</li>
+<li class="fragment">Mistral.ai just came out with a new small
+model</li>
+<li class="fragment">Fraunhofer/OpenGPT-X/JSC has Teuken-7b</li>
+<li class="fragment">DE/NL/NO/DK: TrustLLM</li>
+<li class="fragment">SE/ES/CZ/NL/FI/NO/IT: OpenEuroLLM</li>
 </ul>
 </section>
+<section id="section-1" class="slide level1">
+<h1>🇪🇺</h1>
+</section>
 <section class="slide level1">

-<h2 id="call-for-action-2">Call for action</h2>
 <ul>
-<li class="fragment">European AI hardware?</li>
-<li class="fragment">If we don’t have it, fine, we buy 🇺🇸 and 🇨🇳</li>
-<li class="fragment">But we need software</li>
-<li class="fragment">(Strube’s opinion: we can do it)</li>
-<li class="fragment">LAION is a good start</li>
-<li class="fragment">TrustLLM is a good start</li>
-<li class="fragment">OpenGPT-X is… well, it’s open</li>
+<li class="fragment">A lot of people tell me: “HuggingFace is French and
+is a great success”</li>
+<li class="fragment">HuggingFace was based in NYC from the
+beginning</li>
 </ul>
 </section>
 <section class="slide level1">

-<h2 id="what-can-i-do">What can I do?</h2>
-</section>
-<section
-id="blablador-the-brand-as-an-umbrella-for-all-inference-at-jsc"
-class="slide level1">
-<h1>Blablador, the brand, as an umbrella for <em>ALL</em> inference at
-JSC</h1>
+<p><img data-src="images/deepseek-vs-eu.jpg" /></p>
 </section>
 <section class="slide level1">

-<h2 
id="jsc-inference-umbrella">JSC Inference umbrella</h2> +<h2 id="section-2">🇪🇺</h2> <ul> -<li class="fragment">Blablador as LLM is the first step</li> -<li class="fragment">Grow to include other types of models for Science -and Industry</li> +<li class="fragment">Beginning of February, the French government +released their LLM.</li> </ul> </section> <section class="slide level1"> -<h2 id="use-cases-so-far">Use cases so far</h2> -<ul> -<li class="fragment">LLMs for science (e.g. OpenGPT-X, TrustLLM, -CosmoSage)</li> -<li class="fragment"><a -href="https://www.nas.nasa.gov/SC24/research/project27.php">Prithvi-EO-2.0</a>: -Geospatial FM for EO (NASA, IBM, JSC). The 300M and 600M models will be -released today</li> -<li class="fragment">Terramind: Multi-Modal Geospatial FM for EO (ESA -Phi-Lab, JSC, DLR, IBM, KP Labs). Model will be released in spring -2025</li> -<li class="fragment"><a -href="https://arxiv.org/abs/2410.10841">Helio</a> NASA’s FM model for -Heliophysics, for understating complex solar phenomena (first study -design of the model is available)</li> -<li class="fragment">JSC/ESA/DLR’s upcomping model <a -href="https://eo4society.esa.int/projects/fast-eo/">FAST-EO</a></li> -<li class="fragment">Health: Radiology with Aachen Uniklinik</li> -<li class="fragment">JSC/Cern/ECMWF’s <a -href="https://www.atmorep.org">Atmorep</a></li> -<li class="fragment">Open models: -<ul> -<li class="fragment">Pango weather</li> -<li class="fragment">Graphcast</li> -</ul></li> -<li class="fragment">With privacy!</li> -</ul> +<p><img data-src="images/two-days-later.jpg" /></p> </section> <section class="slide level1"> -<figure> -<img data-src="images/IMG_6551.jpg" alt="What do we have to do?" /> -<figcaption aria-hidden="true">What do we have to do?</figcaption> -</figure> -</section> -<section id="blablador-is-growing" class="slide level1"> -<h1>Blablador is growing</h1> -<ul> -<li class="fragment">I liked the brand because it’s fun</li> -<li class="fragment">It’s a dog</li> -<li class="fragment">And it’s hard as hell to reproduce the image</li> -</ul> -</section> -<section id="blablador-is-growing-1" class="slide level1"> -<h1>Blablador is growing</h1> -<ul> -<li class="fragment">We need something more professional</li> -<li class="fragment">To add to project proposals</li> -</ul> -</section> -<section id="behold" class="slide level1"> -<h1>BEHOLD</h1> +<p><img data-src="images/message-lucie.jpg" /></p> </section> <section class="slide level1"> -<p><img data-src="images/blablador-ng.svg" /></p> +<p><img data-src="images/message-lucie-2.png" /></p> </section> <section class="slide level1"> -<h2 id="potato">Potato</h2> -<p><img data-src="images/IMG_6561.jpg" width="350" /></p> +<p>(Side note: No model in blablador has ever said that the “square root +of goat is one”)</p> </section> <section class="slide level1"> -<h2 id="open-source">Open source</h2> -<ul> -<li class="fragment">Outside of Academia, there’s <a -href="https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e7359222">OLMo</a> -from AllenAI -<ul> -<li class="fragment">Has training code, weights and data, all open</li> -</ul></li> -<li class="fragment"><a -href="https://www.primeintellect.ai/blog/intellect-1-release">Intellect-1</a> -was trained collaboratively -<ul> -<li class="fragment">Up to 112 H100 GPUs simultaneously</li> -<li class="fragment">They claim overall compute utilization of 83% -across continents and 96% when training only in the USA</li> -<li class="fragment">Fully open</li> -</ul></li> -</ul> -</section> -<section 
id="nous-research" class="slide level1"> -<h1>Nous Research</h1> -<ul> -<li class="fragment">Announced yesterday</li> -<li class="fragment">Training live, on the interent</li> -<li class="fragment">Previous models were Llama finetuned on synthetic -data</li> -</ul> -<p><img data-src="images/nous-distro.png" width="600" /></p> +<h2 id="potato">Potato</h2> +<p><img data-src="images/IMG_6561.jpg" width="350" /></p> </section> <section id="non-transformer-architectures" class="slide level1"> <h1>Non-transformer architectures</h1> <ul> -<li class="fragment">First one was Mamba in February</li> +<li class="fragment">First one was Mamba in August 2024</li> <li class="fragment">Jamba from A21 in March</li> <li class="fragment">Last I checked, LFM 40B MoE from Liquid/MIT was the best one (from 30.09)</li> @@ -725,29 +701,12 @@ turing-complete (probably not)</li> 1.5B)</li> </ul> </section> -<section id="eu-ai-act" class="slide level1"> -<h1>EU AI Act</h1> -<ul> -<li class="fragment">“GPAI models present systemic risks when the -cumulative amount of compute used for its training is greater than 10^25 -floating point operations (FLOPs)”</li> -<li class="fragment">Moore’s law has something to say about this</li> -<li class="fragment">TrustLLM and OpenGPT-X will have to comply -<ul> -<li class="fragment">To be used commercially</li> -</ul></li> -<li class="fragment">Research-only models are exempt</li> -<li class="fragment">Bureaucratic shot in the foot: the EU AI Act will -make it harder for EU models to compete internationally</li> -</ul> -</section> <section id="take-the-slides-with-you" class="slide level1"> <h1>Take the slides with you</h1> <figure> -<img data-src="images/2025-hai-retreat-Details.png" width="500" -alt="https://go.fzj.de/2025-hai-retreat" /> -<figcaption -aria-hidden="true">https://go.fzj.de/2025-hai-retreat</figcaption> +<img data-src="images/2025-eum.png" width="500" +alt="https://go.fzj.de/2025-eum" /> +<figcaption aria-hidden="true">https://go.fzj.de/2025-eum</figcaption> </figure> </section> <section class="slide level1">