Andrea Zanette (@zanette_ai)'s Twitter Profile
Andrea Zanette

@zanette_ai

Assistant professor at CMU@ECE

ID: 1761961649593049088

Joined: 26-02-2024 03:49:32

16 Tweets

499 Followers

437 Following

Fahim Tajwar (@fahimtajwar10)

Interacting with the external world and reacting based on outcomes are crucial capabilities of agentic systems, but existing LLMs’ ability to do so is limited.

Introducing Paprika 🌶️, our work on making LLMs general decision makers that can solve new tasks zero-shot.

🧵 1/n
Yifei Zhou (@yifeizhou02)

📢LLM and RL folks! 📢 No good RL algorithm for credit assignment for multi-turn LLM agents on reasoning-heavy tasks? Do not even have a good benchmark for studying it?

In SWEET-RL, we give you both (a vibe coding benchmark and SWEET algorithm). A thread 🧵(1/n)
Jacob Springer (@jacspringer)

Training with more data = better LLMs, right? 🚨

False! Scaling language models by adding more pre-training data can decrease your performance after post-training!

Introducing "catastrophic overtraining." 🥁🧵+arXiv 👇

1/9
Yuda Song @ ICLR 2025 (@yus167)

At the main conference on Saturday, I will present our paper, a scientific study of every aspect of LLM self-improvement: pre-training, post-training, and test-time inference. Check out my previous threads for more details. x.com/yus167/status/…

Yuda Song @ ICLR 2025 (@yus167)

On Sunday at the FM-Wild Workshop, I will present our recent work with Zhaoyi Zhou and Andrea Zanette on accelerating LLM evaluation with a statistically principled method! Come to the workshop or check out arxiv.org/abs/2502.10563.

Intology (@intologyai)

🇸🇬✈️Come check out Zochi's work at #ICLR2025 — and a big congrats for their first citation 😉🎉

We thank the workshop organizers for approving the work & inviting our reps to present on Zochi's behalf.

Locations, times, & more details below

🧵👇
Intology (@intologyai)

The 1st fully AI-generated scientific discovery to pass the highest level of peer review – the main track of an A* conference (ACL 2025). Zochi, the 1st PhD-level agent. Beta open.

Andy Zhou (@zhouandy_)

Announcing the first fully AI-generated scientific discovery to pass the highest level of peer review – the main track of an A* conference (ACL 2025). Several groups have shown AI-generated work at workshops, but main conference acceptance is a far higher bar. While workshops…

Fahim Tajwar (@fahimtajwar10)

RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers?

Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training!

🧵 1/n
alphaXiv (@askalphaxiv)

This is pretty remarkable – AI systems learning to self-improve

We're seeing a wave of research where AI isn't just learning from human feedback, it's starting to figure out how to improve itself using its own internal signals

A subtle but profound shift.
alphaXiv (@askalphaxiv)

"Can Large Reasoning Models Self-Train?" A brilliant paper from CMU showing LLMs can improve at math reasoning WITHOUT human labels - just learning from their own consistency. Early results rival models trained on ground-truth answers.

"Can Large Reasoning Models Self-Train?"

A brilliant paper from CMU showing LLMs can improve at math reasoning WITHOUT human labels - just learning from their own consistency.

Early results rival models trained on ground-truth answers.
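
As a concrete illustration of the consistency idea in that tweet, here is a minimal toy sketch in Python (not the SRT implementation from the paper): sample several answers to the same prompt, take the majority answer as a pseudo-label, and reward each sample by agreement with it. The helper name and the 0/1 reward rule are illustrative assumptions.

from collections import Counter

def self_consistency_rewards(answers):
    """Assign each sampled answer a 0/1 pseudo-reward based on whether it
    matches the majority answer across samples. No ground-truth label is used."""
    majority_answer, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority_answer else 0.0 for a in answers]

# Hypothetical usage: extract final answers from N sampled completions for one
# math prompt, then use agreement with the majority as the RL reward signal.
sampled_answers = ["42", "42", "17", "42"]
print(self_consistency_rewards(sampled_answers))  # [1.0, 1.0, 0.0, 1.0]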
Murtaza Dalal (@mihdalal)

This is really great work by Fahim and co. Moving out of the regime where we have ground-truth rewards is critical for the next level of RL scaling in LLMs.

Yifei Zhou (@yifeizhou02)

SCA is the first self-improvement RL framework for general multi-turn tool-use agents. It does so by first generating its own verifiers for its own synthetic tasks. Stay tuned for more details!
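
Below is a self-contained toy sketch of the pattern that tweet describes (the model proposing a synthetic task together with a programmatic verifier, attempting it, and taking the verifier's pass/fail output as the reward). The task format, helper names, and reward rule are all illustrative assumptions, not the SCA method or API.

import random

def propose_task_and_verifier(rng):
    """Toy stand-in for the model writing its own task and its own checker.
    Here the 'task' is simply to output a multiple of a chosen number."""
    divisor = rng.choice([2, 3, 5])
    task = f"Output an integer divisible by {divisor}"
    verifier = lambda answer: isinstance(answer, int) and answer % divisor == 0
    return task, verifier

def run_agent(task, rng):
    """Toy stand-in for the multi-turn tool-use agent attempting the task."""
    return rng.randint(0, 20)

def self_generated_training_step(rng):
    task, verifier = propose_task_and_verifier(rng)  # model generates task + verifier
    answer = run_agent(task, rng)                    # model attempts its own task
    reward = 1.0 if verifier(answer) else 0.0        # self-generated verifier provides the reward
    return task, answer, reward                      # this tuple would feed a standard RL update

print(self_generated_training_step(random.Random(0)))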