Niklas Muennighoff (@Muennighoff) / X

Niklas Muennighoff

265 posts

@Muennighoff

Researching AI/LLMs @Stanford @cursor_ai

Joined May 2020

Pinned
Niklas Muennighoff
@Muennighoff
Feb 11, 2025
Last week we released s1 - our simple recipe for sample-efficient reasoning & test-time scaling. We’re releasing 𝐬𝟏.𝟏 trained on the 𝐬𝐚𝐦𝐞 𝟏𝐊 𝐪𝐮𝐞𝐬𝐭𝐢𝐨𝐧𝐬 but performing much better by using r1 instead of Gemini traces. 60% on AIME25 I. Details in 🧵1/9
Niklas Muennighoff
@Muennighoff
Feb 3, 2025
DeepSeek r1 is exciting but misses OpenAI’s test-time scaling plot and needs lots of data. We introduce s1 reproducing o1-preview scaling & performance with just 1K samples & a simple test-time intervention. 📜arxiv.org/abs/2501.19393
158K
Niklas Muennighoff
@Muennighoff
Sep 23, 2024
Excited to start a PhD in AI @Stanford today🌲 Grateful for help from many people! In the LLM era, many rightly questioned me doing a PhD, but the points in @karpathy's great PhD Guide still hold, I think. Regardless, feel free to reach out if you have extra H100s😁 (or to collab!)
125K
Niklas Muennighoff
@Muennighoff
Sep 4, 2024
Releasing OLMoE - the first good Mixture-of-Experts LLM that's 100% open-source - 1B active, 7B total params for 5T tokens - Best small LLM & matches more costly ones like Gemma, Llama - Open Model/Data/Code/Logs + lots of analysis & experiments 📜arxiv.org/abs/2409.02060 🧵1/9
203K
Niklas Muennighoff
@Muennighoff
May 26, 2023
How to keep scaling Large Language Models when data runs out? 🎢 We train 400 models with up to 9B params & 900B tokens to create an extension of Chinchilla scaling laws for repeated data. Results are interesting… 🧐 📜: arxiv.org/abs/2305.16264 1/7
313K
Niklas Muennighoff
@Muennighoff
Aug 26, 2025
Can AI solve open problems in math, physics, coding, medical sciences & beyond? We collected unsolved questions (UQ) & tested frontier LLMs. Some solutions passed expert validation…
86K
Niklas Muennighoff
@Muennighoff
Feb 16, 2024
Introducing GRIT🦾to unify text embedding 🔢& generation 📝. GritLM is open SoTA on embedding (MTEB) & generative tasks (BBH etc.) – Both in 1 model. See 🧵for how GRIT🦾 makes RAG >60% faster & more 📜arxiv.org/abs/2402.09906 💻github.com/ContextualAI/g… 1/12
109K
Niklas Muennighoff
@Muennighoff
Nov 13, 2023
How to train LLMs for low-resource languages?🌏 Building on “Scaling Data-Constrained Language Models”, we train Finnish LLMs🇫🇮 w/ just 38B tokens by repeating for 8 epochs. Val loss is smooth📉 📜arxiv.org/abs/2311.05640 (EMNLP23) Led by @UniTurku 💙 w/ @AMD @huggingface ❤️
118K
Niklas Muennighoff
@Muennighoff
Aug 15, 2023
How to instruction tune Code LLMs w/o #GPT4 data? Releasing 🐙🤖OctoCoder & OctoGeeX: 46.2 on HumanEval🌟SoTA🌟of commercial LLMs 🐙📚CommitPack: 4TB of Git Commits 🐙🎒HumanEvalPack: HumanEval extended to 3 tasks & 6 lang 📜arxiv.org/abs/2308.07124 💻github.com/bigcode-projec… 1/9
130K
Niklas Muennighoff
@Muennighoff
Nov 4, 2022
Crosslingual Generalization through Multitask Finetuning 🌸 Demo: huggingface.co/bigscience/blo… 📜 arxiv.org/abs/2211.01786 💻github.com/bigscience-wor… We present BLOOMZ & mT0, a family of models w/ up to 176B params that follow human instructions in >100 languages zero-shot. 1/7
Niklas Muennighoff
@Muennighoff
Jul 30, 2024
Launching the 1st Arena for Embedding Models: MTEB Arena🏟️ Vote @ hf.co/spaces/mteb/ar… ⚔️ 15 Models: @OpenAI @Google @cohere @Voyage_AI_ @JinaAI_ @SFResearch @nomic_ai E5 GritLM BGE.. 3 Tasks: Retrieval/Clustering/STS Deep dive with me on embeddings & the arena👇 🧵1/13
59K
Niklas Muennighoff
@Muennighoff
Jul 19, 2024
Information Retrieval (IR) is entering a new era where docs are retrieved based on intensive reasoning: Lvl 1: Keywords - BM25.. Lvl 2: Semantics - SBERT.. Lvl 3: 🧠 Reasoning - ❓ We release BRIGHT ✨to test reasoning-intensive retrieval; Lvl 1/2 methods underperform. 🧶 1/2
Hongjin Su
@hongjin_su
Jul 19, 2024
Retrieval benchmarks saturated? Introducing BRIGHT✨, a realistic and challenging benchmark that requires intensive reasoning to retrieve relevant documents. 🧠📚 Key features: 🔍Reasoning-intensive: Low keyword and semantic overlap between queries and documents. Intensive
31K
Niklas Muennighoff
@Muennighoff
Mar 4, 2025
Had a great time giving a talk on s1 at Microsoft GenAI! I enjoy talks most when they're not a monologue but rather a back-and-forth with new ideas that go beyond the paper. This was one of those thanks to an amazing audience with hard questions😅
21K
Niklas Muennighoff
@Muennighoff
May 13, 2025
Very excited to join @KnightHennessy scholars at Stanford🌲 Loved discussing the big goals other scholars are after — from driving Moore’s Law in biotech to preserving culture via 3D imaging. Personally, most excited about AI that can one day help us cure all diseases :)
KnightHennessy
@KnightHennessy
May 13, 2025
Meet the 2025 cohort of Knight-Hennessy scholars! These 84 scholars will join a diverse community of changemakers at Stanford to build lifelong friendships, deepen leadership skills, & collaborate with peers to address complex challenges facing the world. knight-hennessy.stanford.edu/news/knight-he…
15K