JSC’s Inference Infrastructure

Alexandre Strube

March 21, 2025

Take the slides with you

https://go.fzj.de/2025-03-course-hitec

Website

https://helmholtz-blablador.fz-juelich.de
  • Play around! 🐶

OUTLINE

  • Past
  • Present
  • Future

Blablador

  • /ˈblæblæˌdɔɹ/
  • Bla-bla-bla 🗣️ + Labrador 🐕‍🦺
  • A stage for deploying and testing large language models
  • Models change constantly (rankings keep improving; some are good, some awful)
  • A mix of small, fast models and large, slower ones; the lineup changes constantly to keep up with the state of the art
  • It is a web server, an API server, a model runner, and training code.

Blablador

Past: Why?

  • AI is becoming basic infrastructure
  • Which historically is Open Source
  • We (as in scientists) train a lot but deploy little: here are your code/weights, bye!
  • Little experience with dealing with LLMs
  • From the tools point of view, this is a FAST moving target 🎯💨
  • Acquire local experience with issues like
    • data loading,
    • quantization,
    • distribution,
    • fine-tuning LLMs for specific tasks,
    • inference speed,
    • deployment

Why, part 2

  • Projects like OpenGPT-X, TrustLLM need a place to run
  • The usual: we want to be ready when the time comes
    • The time is now!
  • TL;DR: BECAUSE WE CAN! 🚀

Privacy is our selling point

  • No data collection at all. I don’t keep ANY data whatsoever!
    • You can use it AND keep your data private
    • No records means privacy (and GDPR is happy)

Deployment as a service

  • Scientists from FZJ can deploy their models on their own hardware and point to blablador
  • This solves a bunch of headaches for researchers:
    • Authentication
    • Web server
    • Firewall
    • Availability
    • Etc
  • If you have a model and want to deploy it, contact me!

OpenAI-compatible API

  • Users can use the openai-python package from OpenAI itself
  • All services which can use OpenAI’s API can use Blablador’s API (VSCode’s Continue.dev, LangChain, etc)
  • The API is not yet rate-limited or monitored (other than unique visits/day)
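
Because the endpoint is OpenAI-compatible, a client only needs a different base URL and key. A minimal sketch (the model alias "alias-fast" is an assumption; query /v1/models for the names actually offered):

```python
# Sketch: talking to Blablador through any OpenAI-style client.
# The model name "alias-fast" is an assumption, not a confirmed alias.
BLABLADOR_BASE = "https://api.helmholtz-blablador.fz-juelich.de/v1"

def chat_payload(model, prompt):
    """Build an OpenAI-style chat-completion payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

# With the official client (pip install openai), the only difference
# from using OpenAI itself is api_key and base_url:
#
#   from openai import OpenAI
#   client = OpenAI(api_key="glpat-...", base_url=BLABLADOR_BASE)
#   reply = client.chat.completions.create(**chat_payload("alias-fast", "Hi!"))
#   print(reply.choices[0].message.content)

print(chat_payload("alias-fast", "Hi!")["model"])  # -> alias-fast
```

This is exactly why Continue.dev, LangChain, and friends work unmodified: they all speak this payload shape already.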

PRESENT

Usage

  • Web UI only
  • API usage wasn’t recorded until we moved to a new host; devs are still migrating
  • Some healthy usage by tools, e.g. the B2DROP assistant: around 400 requests/day (counted as a single IP from B2DROP’s server)

The LLM ecosystem

  • Fastest changing field in the history of mankind
  • For open models: if it isn’t on Hugging Face, it doesn’t exist (except for 🇨🇳)
  • The “open” ecosystem is dominated by a few big players: Meta, Mistral.AI, Google, DeepSeek, QwenLM, Baidu
  • Meta has some of the best open frontier-level models; but no training data
  • Mistral is a close second - also no training data
  • Google has the best small models, and some of the best closed ones
  • AllenAI is catching up and it’s open - small player

The LLM ecosystem: 🇺🇸

  • Microsoft has tiny, bad ones (but I wouldn’t bet against them - getting better)
    • They are putting money on X’s Grok
  • Twitter/X has Grok-3 for paying customers; Grok-1 is enormous and “old” (from March 2024)
    • The Colossus supercomputer has 200k GPUs, aiming for 1M
    • Grok is #1 on the leaderboard
  • Apple is going their own way + using ChatGPT
    • Falling behind? Who knows?
  • Anthropic is receiving billions from Amazon, MS, Pentagon etc but Claude is completely closed
  • Amazon released its “Nova” models yesterday. 100% closed, interesting tiering (similar to blablador)
  • Nvidia has a bunch of interesting stuff, like accelerated versions of popular models and small speech/translation models, among others

The LLM ecosystem 🇨🇳

  • DeepSeek made waves with R1, even though it’s essentially DeepSeek-V3 (December 2024) + reasoning
    • Different from “Open”AI, they published the method, benefitting everyone
  • Baidu has Ernie and Wenxin, free to use, but closed and can’t be deployed
    • 100,000+ GPU clusters
  • AliBaba’s QwenAI has QwQ-32B-Preview (available on Blablador)
  • Tencent has a lot of text-to-3D, image-to-3D, and an open 389B LLM (Hunyuan-Large)

The LLM ecosystem 🇨🇳

  • Baichuan has many open models, and a 1T closed model
  • ByteDance’s Doubao is closed and good
  • 01.AI (or Yi), founded by Kai-Fu Lee (ex-Apple, SGI, Microsoft, Google), has good open models

The LLM ecosystem 🇪🇺

  • Mistral.ai just released a new small model
  • Fraunhofer/OpenGPT-X/JSC has Teuken-7b
  • DE/NL/NO/DK: TrustLLM
  • SE/ES/CZ/NL/FI/NO/IT: OpenEuroLLM

The LLM ecosystem

  • Evaluation is HARD
    • Benchmarks prove little
    • Absolute majority of new models are fine-tuned versions of existing ones…
    • On the benchmarks themselves!
      • Is this cheating?

Evaluating LLM models

  • LMSys’ arena: https://lmarena.ai
  • Doesn’t differentiate between open and closed
  • Open source gets buried under all private ones

Open Source?

  • Very few models are really open source
  • Most are “open source” in the sense that you can download the weights
  • Either the code isn’t available, or the data

Open Source

  • German academia: OpenGPT-X
    • Trained in Jülich and Dresden
    • For German businesses and academia
    • Yet unclear if training data will be open
  • EU: TrustLLM
    • Trained in Jülich
    • “Trustworthy” LLM
    • Says it will be fully open
  • Laion from FZJ (and others) is also open source
    • Provides datasets, audio, image, video, text encoder models
  • OpenEuroLLM: “Truly open” (there’s nothing out yet)

Open source

  • Outside of Academia, there’s OLMo from AllenAI
    • Has training code, weights and data, all open
  • Intellect-1 was trained collaboratively
    • Up to 112 H100 GPUs simultaneously
    • They claim overall compute utilization of 83% across continents and 96% when training only in the USA
    • Fully open

Nous Research

  • Announced yesterday
  • Training live, on the internet
  • Previous models were Llama fine-tuned on synthetic data

Non-transformer architectures

  • First one was Mamba in February
  • Jamba from A21 in March
  • Last I checked, LFM 40B MoE from Liquid/MIT was the best one (from 30.09)
  • Performs well on benchmarks
  • What about real examples?
  • Some mathematical discussion about whether it is Turing-complete (probably not)
  • Another example is Hymba from NVIDIA (a tiny 1.5B model)

EU AI Act

  • “GPAI models present systemic risks when the cumulative amount of compute used for its training is greater than 10^25 floating point operations (FLOPs)”
  • Moore’s law has something to say about this
  • TrustLLM and OpenGPT-X will have to comply
    • To be used commercially
  • Research-only models are exempt
  • Bureaucratic shot in the foot: the EU AI Act will make it harder for EU models to compete internationally
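
For a sense of scale, a rough back-of-envelope of the 10^25 FLOP threshold. The sustained per-GPU throughput of 10^15 FLOP/s is an assumed round number (roughly an H100-class accelerator), not a measured figure:

```python
# Back-of-envelope: how long 1,000 GPUs take to hit the EU AI Act's
# 1e25 FLOP threshold, assuming 1e15 FLOP/s sustained per GPU.
THRESHOLD_FLOP = 1e25
gpus = 1_000
flop_per_gpu_s = 1e15  # assumed sustained throughput

seconds = THRESHOLD_FLOP / (gpus * flop_per_gpu_s)
days = seconds / 86_400
print(f"about {days:.0f} days")  # -> about 116 days
```

A few months on a thousand GPUs already crosses the line today, and (as Moore's law suggests) the bar only gets easier to clear over time.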

Juelich Supercomputing Centre

Juwels BOOSTER

Humbler beginnings…

  • Small cluster inherited from other projects
  • Started small, with three nodes
  • Runs Slurm, has a parallel FS/Storage, uses EasyBuild
Haicluster

User demand is growing, we need more hardware

  • Currently around 300+ unique users/day on the website
  • API usage is higher, growing and heavier

Big models need big hardware

  • Llama 3.1 405B runs on WestAI nodes
    • Launched with UNICORE
  • QwQ runs there experimentally
Jureca-DC

It’s being used in the wild!

https://indico.desy.de/event/38849/contributions/162118/
Dmitriy Kostunin’s talk at LIPS24
Controlling a radiotelescope with AI agents
cosmosage is deployed on Blablador!

It’s being used in the wild!

  • Someone reverse-engineered the API and created a python package

It’s being used in the wild!

https://git.geomar.de/everardo-gonzalez/blablador-python-bindings

It’s being used in the wild!

  • GEOMAR created a chatbot for their website
  • Scanned their material and created embeddings (RAG)
  • Calls Blablador’s API with embeddings and gets answers
  • Product of the NFDI hackathon DataXplorers // and beyond
  • It’s called TL;DR (Too Long; Didn’t Read) - Source code
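
The pipeline above can be sketched in a few lines. This is a toy illustration of the RAG pattern, not GEOMAR's actual code: the "embedding" here is a plain bag-of-words counter, where a real setup would use an embedding model; the documents are made up:

```python
# Toy RAG sketch (illustrative only): embed documents, retrieve the
# one closest to a question, then send question + context to the API.
from collections import Counter
from math import sqrt

def embed(text):
    # Bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "the ocean floor is mapped with sonar",
    "plankton blooms follow nutrient upwelling",
]
index = [(doc, embed(doc)) for doc in docs]

def retrieve(question):
    q = embed(question)
    return max(index, key=lambda pair: cosine(q, pair[1]))[0]

context = retrieve("how is the ocean floor mapped")
print(context)  # -> the ocean floor is mapped with sonar
# The retrieved context is then prepended to the prompt sent to Blablador.
```

The hard part in production is not the retrieval loop but doing it without leaking user documents, which is where Blablador's no-data-collection stance helps.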

It’s being used in the wild!

https://zenodo.org/records/10376144

It’s being used in the wild!

  • EUDAT is a collection of data management services
  • Has an instance of NextCloud (File Share, Office, Calendar etc)
  • It’s integrating AI deeply into its services, backed by Blablador!

It’s being used in the wild!

  • FZJ’s IEK7 (Stratosphere) is also using Blablador on their IEK7Cloud

It’s being used in the wild!

https://github.com/haesleinhuepf/bia-bob

API Access

Demo: API

  • Go to /v1/models
  • Click on Try it out
  • Click on Execute
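
The same call, sketched with Python's standard library. The token value is a placeholder and the request is built but not sent here:

```python
# Build (but do not send) the GET /v1/models request from the demo.
# Sending it with urllib.request.urlopen(req) returns the JSON model list.
import urllib.request

def models_request(token):
    return urllib.request.Request(
        "https://api.helmholtz-blablador.fz-juelich.de/v1/models",
        headers={"Authorization": f"Bearer {token}"},
    )

req = models_request("MY_TOKEN_GOES_HERE")
print(req.full_url)
```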

Demo: cURL

  • curl --header "Authorization: Bearer MY_TOKEN_GOES_HERE"   https://api.helmholtz-blablador.fz-juelich.de/v1/models

Demo: VScode + Continue.dev

  • Yes. It DOES run with Emacs too. Ask your favorite Emacs expert.
  • Yes, vim too!
  • Add the Continue.dev extension to VSCode
  • In Continue, choose to add a model, then choose “Other OpenAI-compatible API”
  • Click on “Open config.json” at the end

Demo: VScode + Continue.dev

  • Inside config.json, add to the "models" section:

    {
      "model": "AUTODETECT",
      "title": "Blablador",
      "apiKey": "glpat-YOURKEYHERE",
      "apiBase": "https://api.helmholtz-blablador.fz-juelich.de/v1",
      "provider": "openai"
    }
  • Try with the other models you got from the API!

Demo: VScode + Continue.dev

  • Select some code in a python file
  • Type Control-I (cmd-I on Mac) to edit the code, or Control-L to “talk” to blablador about this code
  • Ask Blablador to explain this code!
  • Can also fix, add tests, etc

What can you do with it?

FUTURE

Pickle is hungry

Vision for the (near) future

Blablador, the brand, as an umbrella for ALL inference at JSC

JSC Inference umbrella

  • Blablador as LLM is the first step
  • Grow to include other types of models for Science and Industry

Use cases so far

  • LLMs for science (e.g. OpenGPT-X, TrustLLM, CosmoSage)
  • Prithvi-EO-2.0: Geospatial FM for EO (NASA, IBM, JSC). The 300M and 600M models will be released today
  • Terramind: Multi-Modal Geospatial FM for EO (ESA Phi-Lab, JSC, DLR, IBM, KP Labs). Model will be released in spring 2025
  • Helio: NASA’s FM for heliophysics, for understanding complex solar phenomena (the first study design of the model is available)
  • JSC/ESA/DLR’s upcoming model FAST-EO
  • Health: Radiology with Aachen Uniklinik
  • JSC/CERN/ECMWF’s AtmoRep
  • Open models:
    • Pangu-Weather
    • GraphCast
  • With privacy!

What do we have to do?

Todo

  • Multi-modality: video, audio, text, images
  • Auto-RAG with privacy:
    • Easy to do badly. Hard to do securely.
  • Everything from the previous slide

Potato

Questions?

No dogs have been harmed for this presentation

Extra slides

LLMOps resource

A No-BS Database of How Companies Actually Deploy LLMs in Production: 300+ Technical Case Studies, Including Self-Hosted LLMs, at https://www.zenml.io/llmops-database

“I think the complexity of Python package management holds down AI application development more than is widely appreciated. AI faces multiple bottlenecks — we need more GPUs, better algorithms, cleaner data in large quantities. But when I look at the day-to-day work of application builders, there’s one additional bottleneck that I think is underappreciated: The time spent wrestling with version management is an inefficiency I hope we can reduce.”

Andrew Ng, 28.02.2024

“Building on top of open source can mean hours wrestling with package dependencies, or sometimes even juggling multiple virtual environments or using multiple versions of Python in one application. This is annoying but manageable for experienced developers, but creates a lot of friction for new AI developers entering our field without a background in computer science or software engineering.”

Andrew Ng, 28.02.2024

Like the slides? Want to use them?

Gitlab link to source code of the slides (needs JUDOOR account)

https://gitlab.jsc.fz-juelich.de/strube1/2025-02-course-helmholtz-munich