Andrea Zanette (@zanette_ai)'s Twitter Profile
Andrea Zanette

@zanette_ai

Assistant professor at CMU@ECE

ID: 1761961649593049088

Joined: 26-02-2024 03:49:32

16 Tweets

499 Followers

437 Following

Fahim Tajwar (@fahimtajwar10)

Interacting with the external world and reacting based on outcomes are crucial capabilities of agentic systems, but existing LLMs’ ability to do so is limited.

Introducing Paprika 🌶️, our work on making LLMs general decision makers that can solve new tasks zero-shot.

🧵 1/n
Yifei Zhou (@yifeizhou02)

📢LLM and RL folks! 📢 No good RL algorithm for credit assignment for multi-turn LLM agents on reasoning-heavy tasks? Do not even have a good benchmark for studying it?

In SWEET-RL, we give you both (a vibe coding benchmark and SWEET algorithm). A thread 🧵(1/n)
Jacob Springer (@jacspringer)

Training with more data = better LLMs, right? 🚨

False! Scaling language models by adding more pre-training data can decrease your performance after post-training!

Introducing "catastrophic overtraining." 🥁🧵+arXiv 👇

1/9
Yuda Song @ ICLR 2025 (@yus167)

At the main conference on Saturday, I will present our paper, a scientific study of every aspect of LLM self-improvement: pre-training, post-training, and test-time inference. Check out my previous threads for more details. x.com/yus167/status/…

Yuda Song @ ICLR 2025 (@yus167)

On Sunday at the FM-Wild Workshop, I will present our recent work with Zhaoyi Zhou and Andrea Zanette on accelerating LLM evaluation with a statistically principled method! Come to the workshop or check out arxiv.org/abs/2502.10563.

Intology (@intologyai)

🇸🇬✈️Come check out Zochi's work at #ICLR2025 — and a big congrats for their first citation 😉🎉

We thank the workshop organizers for approving the work & inviting our reps to present on Zochi's behalf.

Locations, times, & more details below

🧵👇
Intology (@intologyai)

The 1st fully AI-generated scientific discovery to pass the highest level of peer review – the main track of an A* conference (ACL 2025). Zochi, the 1st PhD-level agent. Beta open.

Andy Zhou (@zhouandy_)

Announcing the first fully AI-generated scientific discovery to pass the highest level of peer review – the main track of an A* conference (ACL 2025). Several groups have shown AI-generated work at workshops, but main conference acceptance is a far higher bar. While workshops…

Fahim Tajwar (@fahimtajwar10)

RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers?

Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training!

🧵 1/n
alphaXiv (@askalphaxiv)

This is pretty remarkable – AI systems learning to self-improve

We're seeing a wave of research where AI isn't just learning from human feedback, it's starting to figure out how to improve itself using its own internal signals

A subtle but profound shift.
alphaXiv (@askalphaxiv)

"Can Large Reasoning Models Self-Train?" A brilliant paper from CMU showing LLMs can improve at math reasoning WITHOUT human labels - just learning from their own consistency. Early results rival models trained on ground-truth answers.

"Can Large Reasoning Models Self-Train?"

A brilliant paper from CMU showing LLMs can improve at math reasoning WITHOUT human labels - just learning from their own consistency.

Early results rival models trained on ground-truth answers.
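
As a concrete illustration of the consistency idea in that tweet, here is a minimal toy sketch in Python (not the SRT implementation from the paper): sample several answers to the same prompt, take the majority answer as a pseudo-label, and reward each sample by agreement with it. The helper name and the 0/1 reward rule are illustrative assumptions.

from collections import Counter

def self_consistency_rewards(answers):
    """Assign each sampled answer a 0/1 pseudo-reward based on whether it
    matches the majority answer across samples. No ground-truth label is used."""
    majority_answer, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority_answer else 0.0 for a in answers]

# Hypothetical usage: extract final answers from N sampled completions for one
# math prompt, then use agreement with the majority as the RL reward signal.
sampled_answers = ["42", "42", "17", "42"]
print(self_consistency_rewards(sampled_answers))  # [1.0, 1.0, 0.0, 1.0]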
Murtaza Dalal (@mihdalal)

This is really great work by Fahim and co. Moving out of the regime where we have ground-truth rewards is critical for the next level of RL scaling in LLMs.

Yifei Zhou (@yifeizhou02)

SCA is the first self-improvement RL framework for general multi-turn tool-use agents. It does so by first generating its own verifiers for its own synthetic tasks. Stay tuned for more details!
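
Below is a self-contained toy sketch of the pattern that tweet describes (the model proposing a synthetic task together with a programmatic verifier, attempting it, and taking the verifier's pass/fail output as the reward). The task format, helper names, and reward rule are all illustrative assumptions, not the SCA method or API.

import random

def propose_task_and_verifier(rng):
    """Toy stand-in for the model writing its own task and its own checker.
    Here the 'task' is simply to output a multiple of a chosen number."""
    divisor = rng.choice([2, 3, 5])
    task = f"Output an integer divisible by {divisor}"
    verifier = lambda answer: isinstance(answer, int) and answer % divisor == 0
    return task, verifier

def run_agent(task, rng):
    """Toy stand-in for the multi-turn tool-use agent attempting the task."""
    return rng.randint(0, 20)

def self_generated_training_step(rng):
    task, verifier = propose_task_and_verifier(rng)  # model generates task + verifier
    answer = run_agent(task, rng)                    # model attempts its own task
    reward = 1.0 if verifier(answer) else 0.0        # self-generated verifier provides the reward
    return task, answer, reward                      # this tuple would feed a standard RL update

print(self_generated_training_step(random.Random(0)))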