Zico Kolter (@zicokolter)'s Twitter Profile
Zico Kolter

@zicokolter

Professor and Head of Machine Learning Department at @CarnegieMellon. Board member @OpenAI. Chief Technical Advisor @GraySwanAI. Chief Expert @BoschGlobal.

ID: 841499391508779008

Link: http://zicokolter.com
Joined: 14-03-2017 04:01:04

617 Tweets

21.21K Followers

645 Following

Fahim Tajwar (@fahimtajwar10)

Interacting with the external world and reacting based on outcomes are crucial capabilities of agentic systems, but existing LLMs’ ability to do so is limited. Introducing Paprika 🌶️, our work on making LLMs general decision makers that can solve new tasks zero-shot. 🧵 1/n

Pratyush Maini (@pratyushmaini)

1/ Being in academia is such a privilege: You get to collaborate with insanely talented & passionate students on their journey to upskill themselves. Very excited to share *OpenUnlearning*: a unified, easily extensible framework for unlearning led by Anmol Mekala and Vineeth 🧵

Christina Baek (@_christinabaek)

Are current reasoning models optimal for test-time scaling? 🌠 No! Models make the same incorrect guess over and over again. We show that you can fix this problem w/o any crazy tricks 💫 – just do weight ensembling (WiSE-FT) for big gains on math! 1/N

CMU School of Computer Science (@scsatcmu)

Huge thank you to NVIDIA Data Center for gifting a brand new #NVIDIADGX B200 to CMU’s Catalyst Research Group! This AI supercomputing system will afford Catalyst the ability to run and test their work on a world-class unified AI platform.

Christina Baek (@_christinabaek)

When we train models to do QA, are we robustly improving context dependency? No! In our ICLR Oral (Fri 11 AM), we show that if the base model knows the facts already, it shortcuts and learns to ignore the context completely! Visit us to learn more about knowledge conflicts 😀

Yutong (Kelly) He (@electronickale)

✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵

Pratyush Maini (@pratyushmaini)

Looking forward to giving a talk this Friday at OpenAI with Zhili Feng on some of our privacy & memorization research + how it applies to production LLMs! We've been gaining momentum on detecting, quantifying & erasing memorization; excited to explore its real-world impact!

Runtian Zhai (@runtianzhai)

A shorter version of the first three chapters of my thesis is accepted by ICML 2025. It provides a quick start for those interested in learning about the contexture theory. Check it out: arxiv.org/abs/2505.01557

Zhengyang Geng (@zhengyanggeng)

Excited to share our work with my amazing collaborators, Goodeat, Xingjian Bai, Zico Kolter, and Kaiming. In a word, we show an “identity learning” approach for generative modeling, by relating the instantaneous/average velocity in an identity. The resulting model,

YixuanEvenXu (@yixuanevenxu)

✨ Did you know that NOT using all generated rollouts in GRPO can boost your reasoning LLM? Meet PODS! We down-sample rollouts and train on just a fraction, delivering notable gains over vanilla GRPO. (1/7)

Vaishnavh Nagarajan (@_vaishnavh)

Wrote my first blog post! I wanted to share a powerful yet under-recognized way to develop emotional maturity as a researcher: making it a habit to read about the ✨past ✨ and learn from it to make sense of the present

Maksym Andriushchenko @ ICLR (@maksym_andr)

🚨Excited to release OS-Harm! 🚨 The safety of computer use agents has been largely overlooked. We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm: 1. deliberate user misuse, 2. prompt injections, 3. model misbehavior.

Yiding Jiang (@yidingjiang)

A mental model I find useful: all data acquisition (web scrapes, synthetic data, RL rollouts, etc.) is really an exploration problem 🔍. This perspective has some interesting implications for where AI is heading. Wrote down some thoughts: yidingjiang.github.io/blog/post/expl…