Jing Xu (@jingxu_ml)'s Twitter Profile
Jing Xu

@jingxu_ml

LLM alignment, reasoning @FAIR; PhD from @Penn

jxmsml.github.io

ID: 1255518200298639361

Joined: 29-04-2020 15:24:22

60 Tweets

223 Followers

300 Following

Jason Weston (@jaseweston)

🚨New paper!🚨
Meta-Rewarding LMs
- LM is actor, judge & meta-judge
- Learns to reward actions better by judging its own judgments (assigning *meta-rewards*)
- Improves acting & judging over time without human labels
... beats Self-Rewarding LMs
arxiv.org/abs/2407.19594
🧵(1/6)
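To make the loop above concrete, here is a minimal Python sketch of one meta-rewarding iteration in the spirit of the tweet (actor → judge → meta-judge). This is not the paper's code: sample_responses, judge_response, and meta_judge are hypothetical stand-ins for LLM calls, and the pairing scheme is a simplified illustration.

```python
from typing import Callable, List, Tuple

def meta_rewarding_iteration(
    prompts: List[str],
    sample_responses: Callable[[str, int], List[str]],        # actor: (prompt, k) -> k responses
    judge_response: Callable[[str, str], Tuple[float, str]],  # judge: (prompt, response) -> (score, rationale)
    meta_judge: Callable[[str, str, str], int],               # meta-judge: (response, judgment_a, judgment_b) -> 0 or 1
    k: int = 4,
) -> Tuple[list, list]:
    """One self-improvement round: build preference pairs for the actor
    (from judge scores) and for the judge (from meta-judge comparisons)."""
    actor_pairs, judge_pairs = [], []
    for prompt in prompts:
        responses = sample_responses(prompt, k)
        # Judge every response; each judgment is a (score, rationale) pair.
        scored = sorted(
            ((judge_response(prompt, r), r) for r in responses),
            key=lambda item: item[0][0],
        )
        (_, worst), (_, best) = scored[0], scored[-1]
        actor_pairs.append((prompt, best, worst))            # DPO-style pair for the actor
        # Compare two independent judgments of the same response; the winner
        # becomes the "chosen" judgment, i.e. a meta-reward for the judge.
        (_, judgment_a), response = scored[-1]
        _, judgment_b = judge_response(prompt, response)
        if meta_judge(response, judgment_a, judgment_b) == 0:
            judge_pairs.append((response, judgment_a, judgment_b))
        else:
            judge_pairs.append((response, judgment_b, judgment_a))
    return actor_pairs, judge_pairs
```

The point the sketch tries to capture is that the same model produces responses, judges them, and then judges its own judgments, so both the actor pairs and the judge pairs can feed preference optimization without human labels.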
Jason Weston (@jaseweston)

🚨 Self-Consistency Preference Optimization (ScPO)🚨
- New self-training method without human labels - learn to make the model more consistent!
- Works well for reasoning tasks where RMs fail to evaluate correctness.
- Close to performance of supervised methods *without* labels,
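A rough sketch of how self-consistency votes might be turned into preference data, assuming the recipe is roughly "sample k solutions, prefer the most consistent final answer over the least consistent one, and weight by the vote margin". The helper callables sample_solutions and extract_answer, and the exact pairing and weighting, are illustrative assumptions rather than the ScPO implementation.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

def self_consistency_pair(
    problem: str,
    sample_solutions: Callable[[str, int], List[str]],  # model: (problem, k) -> k sampled CoT solutions
    extract_answer: Callable[[str], str],                # pulls the final answer out of a solution
    k: int = 16,
) -> Optional[Tuple[str, str, float]]:
    """Build one (chosen, rejected, weight) triple by preferring the most
    self-consistent final answer over the least consistent one."""
    solutions = sample_solutions(problem, k)
    votes = Counter(extract_answer(s) for s in solutions)
    if len(votes) < 2:
        return None                        # all samples agree: no informative pair
    ranked = votes.most_common()
    top_answer, top_count = ranked[0]
    low_answer, low_count = ranked[-1]
    chosen = next(s for s in solutions if extract_answer(s) == top_answer)
    rejected = next(s for s in solutions if extract_answer(s) == low_answer)
    weight = (top_count - low_count) / k   # vote margin can weight the preference loss
    return chosen, rejected, weight
```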
Jiao Sun (@sunjiao123sun_)

Mitigating racial bias from LLMs is a lot easier than removing it from humans! 

Can’t believe this happened at the best AI conference @NeurIPSConf

We have ethical reviews for authors, but missed it for invited speakers? 😡
Jason Weston (@jaseweston)

💀 Introducing RIP: Rejecting Instruction Preferences💀

A method to *curate* high quality data, or *create* high quality synthetic data.

Large performance gains across benchmarks (AlpacaEval2, Arena-Hard, WildBench).

Paper 📄: arxiv.org/abs/2501.18578
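For intuition, a hedged sketch of what a RIP-style prompt filter could look like: sample several responses per prompt, score them with a reward model, and keep only prompts whose response-level reward statistics pass quality checks. The particular statistics and thresholds below (minimum rejected-response reward, maximum chosen-rejected gap) are assumptions for illustration; see the paper for the actual criteria.

```python
from typing import Callable, List

def rip_style_filter(
    prompts: List[str],
    sample_responses: Callable[[str, int], List[str]],  # policy: (prompt, k) -> k responses
    reward: Callable[[str, str], float],                 # reward model: (prompt, response) -> score
    k: int = 8,
    min_rejected_reward: float = 0.0,   # hypothetical threshold
    max_reward_gap: float = 5.0,        # hypothetical threshold
) -> List[str]:
    """Return the subset of prompts judged high quality by reward statistics
    of their sampled responses; the rest are rejected from training."""
    kept = []
    for prompt in prompts:
        rewards = sorted(reward(prompt, r) for r in sample_responses(prompt, k))
        rejected_r, chosen_r = rewards[0], rewards[-1]
        if rejected_r >= min_rejected_reward and (chosen_r - rejected_r) <= max_reward_gap:
            kept.append(prompt)
    return kept
```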
Jing Xu (@jingxu_ml)

New data selection & synthetic data creation method can dramatically improve model performance by filtering out 77% of training examples!

Jason Weston (@jaseweston)

Olga Golovneva Tianhao Wu Weizhe Yuan Jing Xu @ICML2025 Sainbayar Sukhbaatar Ping (Iris) Yu 4/ 💀 We show RIP works across various data (WildChat, HelpSteer, Self-RIP), LLMs (Llama 3.1 8B or 3.3 70B) & reward models. We were surprised by how well RIP works given how simple it is. Read the paper for more & hope you don't "reject" this research!🪦 📄arxiv.org/abs/2501.18578

Zichen Liu @ ICLR2025 (@zzlccc)

🚨There May Not be Aha Moment in R1-Zero-like Training: oatllm.notion.site/oat-zero A common belief about the recent R1-Zero-like training is that self-reflections *emerge* as a result of RL training. We carefully investigated and showed the opposite. 🧵

Anthropic (@anthropicai)

A few researchers at Anthropic have, over the past year, had a part-time obsession with a peculiar problem. Can Claude play Pokémon? A thread:

Jason Weston (@jaseweston)

🚨 New Paper 🚨
An Overview of Large Language Models for Statisticians
📝: arxiv.org/abs/2502.17814

- Dual perspectives on Statistics ➕ LLMs: Stat for LLM & LLM for Stat
- Stat for LLM: How statistical methods can improve LLM uncertainty quantification, interpretability,
Jason Weston (@jaseweston)

Google friends & ex-colleagues -- Google scholar seems pretty broken😔. Our most cited paper from last year "Self-Rewarding LLMs" has disappeared! Scholar has clustered it with another paper (SPIN) and it isn't in the search results. This is bad for PhD student & first author

Archiki Prasad (@archikiprasad)

🎉 Excited to share that my internship work, ScPO, on self-training LLMs to improve reasoning without human labels, has been accepted to #ICML2025! Many thanks to my awesome collaborators at AI at Meta and @uncnlp🌞Looking forward to presenting ScPO in Vancouver 🇨🇦

Jason Weston (@jaseweston)

🚨Announcing RAM 2 workshop @ COLM25 - call for papers🚨 
- 10 years on, we present the sequel to the classic RAM🐏 (Reasoning, Attention, Memory) workshop that took place in 2015 at the cusp of major change in the area. Now in 2025 we reflect on what's happened and discuss the
Jason Weston (@jaseweston)

🌉 Bridging Offline & Online RL for LLMs 🌉
📝: arxiv.org/abs/2506.21495
New paper shows on verifiable & non-verifiable tasks:
- Online DPO & GRPO give similar performance.
- Semi-online (iterative) DPO with sync every s steps (more efficient!) works very well also.
- Offline DPO
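A minimal sketch of the semi-online schedule mentioned above, assuming it amounts to re-syncing the rollout (data-generating) model with the current policy every s optimizer steps, which interpolates between fully offline (never sync) and fully online (sync every step). All function names are placeholders, not the paper's code.

```python
from typing import Callable, List

def semi_online_dpo_loop(
    num_steps: int,
    s: int,                                           # sync period: 1 ≈ fully online, very large ≈ offline
    sync_rollout_to_policy: Callable[[], None],       # copy current policy weights into the rollout model
    generate_preference_batch: Callable[[], object],  # sample preference pairs with the (stale) rollout model
    dpo_step: Callable[[object], float],              # one DPO optimizer step; returns the loss
) -> List[float]:
    losses = []
    for step in range(num_steps):
        if step % s == 0:
            sync_rollout_to_policy()                  # periodic refresh instead of syncing every step
        batch = generate_preference_batch()
        losses.append(dpo_step(batch))
    return losses
```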
Jing Xu (@jingxu_ml)

Heading to ICML to present our work Rejecting Instruction Preferences (RIP) for better data curation and synthesis on Wed 07/16 (4:30pm - 7:00pm)! Excited to connect with folks interested in synthetic data, reasoning, RL and anything in general @FAIR. #ICML2025

Jason Weston (@jaseweston)

🤖Introducing: CoT-Self-Instruct 🤖
📝: arxiv.org/abs/2507.23751
- Builds high-quality synthetic data via reasoning CoT + quality filtering
- Gains on reasoning tasks: MATH500, AMC23, AIME24 & GPQA-💎
- Outperforms existing train data s1k & OpenMathReasoning
- Gains on
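A speculative sketch of a CoT-Self-Instruct-style data pipeline, assuming the recipe is roughly "reason over a few seed tasks with chain-of-thought, propose a new task, and keep it only if it passes a quality filter". The prompt wording and the generate and quality_filter callables are illustrative assumptions rather than the paper's implementation.

```python
import random
from typing import Callable, List

def cot_self_instruct_sketch(
    seed_prompts: List[str],
    generate: Callable[[str], str],         # LLM call: prompt text -> completion
    quality_filter: Callable[[str], bool],  # e.g., an answer-consistency or reward-based check
    n_new: int = 100,
    n_seeds_per_call: int = 2,              # must not exceed len(seed_prompts)
) -> List[str]:
    """Generate synthetic tasks from seed tasks via CoT prompting plus filtering."""
    synthetic = []
    while len(synthetic) < n_new:
        seeds = random.sample(seed_prompts, n_seeds_per_call)
        prompt = (
            "Here are example tasks:\n"
            + "\n".join(f"- {s}" for s in seeds)
            + "\nThink step by step about what makes these tasks useful, "
              "then write one new task of similar difficulty.\nNew task:"
        )
        candidate = generate(prompt)
        if quality_filter(candidate):       # discard low-quality generations
            synthetic.append(candidate)
    return synthetic
```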