Pang Wei Koh (@pangweikoh) 's Twitter Profile
Pang Wei Koh

@pangweikoh

Assistant professor at @uwcse and visiting research scientist at @allen_ai. Formerly @StanfordAILab @GoogleAI @Coursera. 🇸🇬

ID: 1273467805283659777

Link: https://koh.pw · Joined: 18-06-2020 04:09:26

323 Tweets

3.3K Followers

900 Following

Ai2 (@allen_ai) 's Twitter Profile Photo

With fresh support of $75M from U.S. National Science Foundation and $77M from @NVIDIA, we’re set to scale our open model ecosystem, bolster the infrastructure behind it, and fast‑track reproducible AI research to unlock the next wave of scientific discovery. 💡

Yi Tay (@yitayml) 's Twitter Profile Photo

Had a really wonderful time hosting Jeff Dean, Quoc Le, benoit schillings and Denny Zhou in Singapore for the Google DeepMind Gemini Singapore 🇸🇬 event last week! 🔥 The event went super well imo, the vibes were on-point and an overwhelming number of people told me directly

Valentin Hofmann (@vjhofmann) 's Twitter Profile Photo

📢 New #COLM2025 paper 📢

Standard benchmarks give every LLM the same questions. This is like testing 5th graders and college seniors with *one* exam! 🥴

Meet Fluid Benchmarking, a capability-adaptive eval method delivering lower variance, higher validity, and reduced cost.

🧵
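The tweet doesn't spell out the selection mechanism, but the core idea of a capability-adaptive eval can be sketched as an adaptive-testing loop: repeatedly ask the unanswered item whose difficulty is closest to the current ability estimate, then nudge the estimate based on correctness. Everything below (the items, the update rule, the function names) is a hypothetical simplification for illustration, not the actual Fluid Benchmarking algorithm:

```python
def run_adaptive_eval(items, answer_fn, n_questions=5):
    """Toy capability-adaptive eval: pick the unasked item whose
    difficulty is nearest the current ability estimate, then move
    the estimate up or down depending on correctness.
    (Illustrative sketch only -- not Fluid Benchmarking itself.)"""
    ability = 0.0   # running ability estimate (logit-like scale)
    step = 1.0      # update size, damped as the estimate stabilizes
    asked = []
    for _ in range(n_questions):
        remaining = [it for it in items if it not in asked]
        if not remaining:
            break
        # most informative item ~ difficulty closest to current ability
        item = min(remaining, key=lambda it: abs(it["difficulty"] - ability))
        asked.append(item)
        ability += step if answer_fn(item) else -step
        step *= 0.7
    return ability

items = [{"id": i, "difficulty": d} for i, d in enumerate([-2, -1, 0, 1, 2])]
# a toy "model" that answers correctly iff the item is easier than 0.5
ability = run_adaptive_eval(items, lambda it: it["difficulty"] < 0.5)
```

Because each question targets the current estimate, a weak model never burns budget on questions it will certainly fail, and a strong model skips the trivial ones, which is where the variance and cost reductions come from.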
Ai2 (@allen_ai) 's Twitter Profile Photo

Introducing OlmoEarth 🌍, state-of-the-art AI foundation models paired with ready-to-use open infrastructure to turn Earth data into clear, up-to-date insights within hours—not years.

Zhiyuan Zeng (@zhiyuanzeng_) 's Twitter Profile Photo

RL is bounded by finite data😣?
Introducing RLVE: RL with Adaptive Verifiable Environments

We scale RL with data procedurally generated from 400 envs dynamically adapting to the trained model

💡find supervision signals right at the LM capability frontier + scale them

🔗in🧵
Pang Wei Koh (@pangweikoh) 's Twitter Profile Photo

Two ideas here for scaling up RL for reasoning: 1. Procedurally generating (verifiable) problems lets us adapt difficulty to the model, making training more efficient 2. Teaching the model to reason by hand (e.g., sort numbers w/o code) generalizes to realistic reasoning tasks!
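Idea 1 can be illustrated with a minimal sketch (hypothetical names; not the actual RLVE environments): a problem generator with a difficulty knob paired with a programmatic verifier, so the reward signal never depends on a fixed dataset and the curriculum can track the model's capability frontier:

```python
import random

def make_sorting_problem(difficulty, seed=None):
    """Procedurally generate a verifiable sorting task; `difficulty`
    controls the list length. (Illustrative sketch, not RLVE itself.)"""
    rng = random.Random(seed)
    nums = [rng.randint(0, 99) for _ in range(3 + difficulty)]
    prompt = f"Sort these numbers in ascending order: {nums}"
    return prompt, nums

def verify(nums, answer):
    """Reward is 1.0 iff the answer is exactly the sorted input."""
    return 1.0 if answer == sorted(nums) else 0.0

prompt, nums = make_sorting_problem(difficulty=2, seed=0)
reward = verify(nums, sorted(nums))  # a perfect "policy" earns 1.0
```

If the model starts solving `difficulty=2` reliably, the generator can simply emit `difficulty=3` problems, keeping supervision right at the frontier.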

Rulin Shao (@rulinshao) 's Twitter Profile Photo

Ancient Chinese wisdom from Confucius says 因材施教: adjust your way of teaching according to the student's abilities. Check out Zhiyuan Zeng's amazing work on applying this wisdom to RL!

Tong Chen @ ICLR (@tomchen0) 's Twitter Profile Photo

OpenAI's blog (openai.com/index/why-lang…) points out that today’s language models hallucinate because training and evaluation reward guessing instead of admitting uncertainty. This raises a natural question: can we reduce hallucination without hurting utility?🤔

On-policy RL with
Rulin Shao (@rulinshao) 's Twitter Profile Photo

🔥Thrilled to introduce DR Tulu-8B, an open long-form Deep Research model that matches OpenAI DR 💪Yes, just 8B! 🚀

The secret? We present Reinforcement Learning with Evolving Rubrics (RLER) for long-form non-verifiable DR tasks! Our rubrics:
- co-evolve with the policy model
-
Pang Wei Koh (@pangweikoh) 's Twitter Profile Photo

We trained an open deep research model!🔍 The hard part is training signal -- deep research tasks are long-form with so many dimensions to what makes a good answer. We solve this thru RL with question-specific rubrics that co-evolve with the policy model. Check it out below!
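The RLER details are in the paper, but the basic shape of a rubric-based reward for long-form answers can be sketched as follows. The rubric items, weights, and the keyword-based `judge` below are hypothetical stand-ins (in practice the judge would be an LLM, and the rubrics would evolve with the policy):

```python
def rubric_reward(report, rubric, judge):
    """Score a long-form answer against weighted rubric criteria.
    `judge(report, criterion)` returns True/False.
    (Hypothetical sketch, not the actual RLER code.)"""
    total = sum(w for _, w in rubric)
    earned = sum(w for crit, w in rubric if judge(report, crit))
    return earned / total if total else 0.0

rubric = [
    ("cites at least one primary source", 2.0),
    ("states the main finding explicitly", 1.0),
]
# toy judge: keyword check standing in for an LLM judge
judge = lambda report, crit: (
    "source" in report if "source" in crit else "finding" in report
)
score = rubric_reward("The finding is X, per a primary source.", rubric, judge)
```

A scalar in [0, 1] per question is enough to plug into a standard policy-gradient loop, which is what makes non-verifiable long-form tasks trainable at all.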

Pang Wei Koh (@pangweikoh) 's Twitter Profile Photo

Unexpected benefit: We've been building tools to help doctors determine treatments for rare genetic diseases, which involves lots of searching -- a natural deep research task! Surprisingly, our 8B model generalizes well and can even match/outperform OpenAI DR on this OOD eval.

John Hewitt (@johnhewtt) 's Twitter Profile Photo

Come do a PhD with me at Columbia! My lab tackles basic problems in alignment, interpretability, safety, and capabilities of language systems. If you love adventuring in model internals and behaviors---to understand and improve---let's do it together! pic: a run in central park

Come do a PhD with me at Columbia!

My lab tackles basic problems in alignment, interpretability, safety, and capabilities of language systems. If you love adventuring in model internals and behaviors---to understand and improve---let's do it together!

pic: a run in central park
Ai2 (@allen_ai) 's Twitter Profile Photo

Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵

Hanna Hajishirzi (@hannahajishirzi) 's Twitter Profile Photo

Introducing Olmo 3 and our entire model flow to build Olmo 3-Think and Olmo3-Instruct. Strong results, big improvements. Massive shoutout to the team who made it happen. Lots of exciting new things come with this release:

Scott Geng (@scottgeng00) 's Twitter Profile Photo

Super excited to release Olmo 3 🦕🐄!

Wild to see my Delta Learning research go all the way from theory-land to becoming a core piece of the world’s best fully open model.

It's a good day to be a researcher 🥳
Serina Chang (@serinachang5) 's Twitter Profile Photo

📢 Come work with me at UC Berkeley Berkeley AI Research! I’m recruiting PhD students in UC Berkeley EECS and UC Joint Computational Precision Health Program. I work on AI for social good, simulating humans with AI, human-AI interaction, and applications in public health & social science. serinachang5.github.io

Michael Noukhovitch, gonna be @ICLR 2025 (@mnoukhov) 's Twitter Profile Photo

Because Olmo 3 is fully open, we decontaminate our evals from our pretraining and midtraining data. Stella Li proves this with spurious rewards: RL trained on a random reward signal can't improve on the evals, unlike some previous setups

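A common way to decontaminate evals is n-gram overlap filtering between training documents and eval items. The sketch below is a toy version of that idea (not Ai2's exact pipeline; the n-gram size and tokenization are assumptions):

```python
def ngrams(text, n=8):
    """Set of word n-grams in a lowercased, whitespace-tokenized text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(train_doc, eval_item, n=8):
    """Flag a training document if it shares any n-gram with an eval item.
    (Toy n-gram decontamination, not the actual Olmo pipeline.)"""
    return bool(ngrams(train_doc, n) & ngrams(eval_item, n))

eval_q = "what is the capital of france and why is it historically important"
doc_bad = "trivia: what is the capital of france and why is it historically important?"
doc_ok = "paris has been the capital of france since the middle ages"
```

With contaminated documents filtered out this way, a model truly cannot have memorized the eval answers, which is what makes the spurious-reward check meaningful: random-reward RL has nothing latent to surface.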
Luca Soldaini ✈️ ICLR 25 (@soldni) 's Twitter Profile Photo

Thread of appreciation for a few of the students and interns that made Olmo 3 special (just the ones i was fortunate to work with! all Ai2 interns are great!!) 🧵