Xiangyu Qi (@xiangyuqi_pton) 's Twitter Profile
Xiangyu Qi

@xiangyuqi_pton

Research @openai | PhD @Princeton | Prev @GoogleAI @GoogleDeepMind

ID: 1210827897419681792

Link: http://xiangyuqi.com | Joined: 28-12-2019 07:40:50

827 Tweets

1.1K Followers

923 Following

OpenAI Developers (@openaidevs) 's Twitter Profile Photo

Remember reinforcement fine-tuning? We’ve been working away at it since last December, and it’s available today with OpenAI o4-mini! RFT uses chain-of-thought reasoning and task-specific grading to improve model performance—especially useful for complex domains. Take
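
As a rough sketch of the idea behind task-specific grading (a conceptual illustration, not OpenAI's RFT grader schema; the ICD-10 coding task and function name are hypothetical), a grader simply maps a model answer and a reference to a score the fine-tuning loop can optimize:

    # Hypothetical task-specific grader with partial credit.
    import re

    def grade_icd10(model_answer: str, reference_code: str) -> float:
        """Return a score in [0, 1] for a predicted ICD-10 code."""
        match = re.search(r"[A-Z]\d{2}(\.\d+)?", model_answer)
        if match is None:
            return 0.0
        predicted = match.group(0)
        if predicted == reference_code:
            return 1.0            # exact code
        if predicted[:3] == reference_code[:3]:
            return 0.5            # right category, wrong subcode
        return 0.0

    print(grade_icd10("Most likely code: E11.9", "E11.9"))  # 1.0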

OpenAI Developers (@openaidevs) 's Twitter Profile Photo

You can now connect GitHub repos to deep research in ChatGPT. 🐙 Ask a question and the deep research agent will read and search the repo’s source code and PRs, returning a detailed report with citations. Hit deep research → GitHub to get started.

Kenneth Li (@ke_li_2021) 's Twitter Profile Photo

🧵1/ Everyone says toxic data = bad models. But what if more toxic data could help us build less toxic models? Our new paper explores this paradox. Here’s what we found 👇

Peter Henderson (@peterhndrsn) 's Twitter Profile Photo

House Energy and Commerce reconciliation text has language preempting all state AI regulations: "no state or political subdivision may enforce any law or regulation regulating artificial intelligence models, artificial intelligence systems, or automated decision systems during

Princeton University (@princeton) 's Twitter Profile Photo

Princeton engineers have identified a universal weakness in AI chatbots that allows users to bypass safety guardrails and elicit directions for malicious uses, from creating nerve gas to hacking government databases. bit.ly/3SzRto7

Peter Henderson (@peterhndrsn) 's Twitter Profile Photo

There are so many hallucinated citations in court nowadays that I'm starting to put together a tracker. Check it out and feel free to send along any that I've missed. New tabs coming for more categories of AI+Law cases!

Flavio Adamo (@flavioad) 's Twitter Profile Photo

I asked Codex to convert a legacy project from Python 2.7 to 3.11 and from Django 1.x to 5.0. It literally took 12 minutes. If you know, that’s usually weeks of pain. This is actually insane.

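For a sense of why that's normally weeks of work, here's a toy before/after of the kind of mechanical change a Python 2.7 -> 3.11 migration touches everywhere (illustrative only, not from the project in the tweet):

    # Python 3.11 version of code that would have looked quite different under 2.7.

    def summarize(counts: dict[str, int]) -> None:
        # Python 2: counts.iteritems() and statement-form print without parentheses
        for name, n in counts.items():      # .iteritems() no longer exists in Python 3
            print(f"{name}: {n}")           # print is a function; f-strings since 3.6

    def mean(total: int, k: int) -> float:
        # Python 2: total / k was floor division for ints
        return total / k                    # true division in Python 3; use // for floor

    summarize({"users": 3, "posts": 12})
    print(mean(7, 2))  # 3.5, not 3
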
Anthony Peng (@realanthonypeng) 's Twitter Profile Photo

🚨 New work: We rethink how we finetune safer LLMs — not by filtering after the generation, but by tracking safety risk token by token during training. We repurpose guardrail models like 🛡️ Llama Guard and Granite Guardian to score evolving risk across each response 📉 — giving

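A minimal sketch of the idea as described (the guard_score function below is a stand-in for a guardrail model call, not the actual Llama Guard / Granite Guardian interface): score each growing prefix of a response, then use the per-token risk curve to reweight the fine-tuning loss.

    # Sketch: token-level risk tracking during fine-tuning (guard_score is a hypothetical
    # stand-in, not the real Llama Guard / Granite Guardian API).

    def guard_score(prefix_text: str) -> float:
        """Stand-in guardrail model: risk in [0, 1] for the text so far."""
        risky_terms = ("explosive", "bypass the filter")
        return 1.0 if any(t in prefix_text.lower() for t in risky_terms) else 0.0

    def token_risk_profile(response_tokens: list[str]) -> list[float]:
        """Score every prefix of the response, giving an evolving risk curve."""
        scores, prefix = [], ""
        for tok in response_tokens:
            prefix += tok
            scores.append(guard_score(prefix))
        return scores

    def reweight_loss(per_token_loss: list[float], risks: list[float]) -> float:
        """Down-weight (here: zero out) the loss on tokens flagged as risky."""
        return sum(l * (1.0 - r) for l, r in zip(per_token_loss, risks))

    tokens = ["Sure, ", "here is ", "how to ", "bypass the filter"]
    risks = token_risk_profile(tokens)
    print(risks)                                       # [0.0, 0.0, 0.0, 1.0]
    print(reweight_loss([0.5, 0.4, 0.6, 0.9], risks))  # 1.5: the risky token adds nothing
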
Xuandong Zhao (@xuandongzhao) 's Twitter Profile Photo

🚀 Excited to share the most inspiring work I’ve been part of this year: "Learning to Reason without External Rewards" TL;DR: We show that LLMs can learn complex reasoning without access to ground-truth answers, simply by optimizing their own internal sense of confidence. 1/n

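A minimal sketch of what an internal-confidence reward could look like, under my assumption (for illustration, not necessarily the paper's exact definition) that confidence means the model's next-token distributions are sharply peaked; the method plugs a signal like this into the RL loop in place of a verifier or ground-truth reward.

    # Sketch: the model's own confidence as an intrinsic reward (no ground-truth answers).
    # Confidence proxy = mean negative entropy of next-token distributions (an assumption).
    import math

    def confidence_reward(token_distributions: list[dict[str, float]]) -> float:
        """Higher when the model's next-token distributions are peaked (low entropy)."""
        neg_entropies = [
            sum(p * math.log(p) for p in dist.values() if p > 0)  # = -entropy
            for dist in token_distributions
        ]
        return sum(neg_entropies) / len(neg_entropies)

    confident = [{"4": 0.97, "5": 0.03}]   # sharply peaked distribution
    uncertain = [{"4": 0.50, "5": 0.50}]   # flat distribution
    print(confidence_reward(confident) > confidence_reward(uncertain))  # True
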
Pin-Yu Chen (@pinyuchentw) 's Twitter Profile Photo

Your LLM Guard Model is secretly a reliable LLM-finetuning-guardrail! IBM Granite Guardian and LLAMA Guard are particularly suited to tracking harmful levels of fine-tuning data at the token level and making training adjustments during fine-tuning. Paper: arxiv.org/abs/2505.17196

Stella Li (@stellalisy) 's Twitter Profile Photo

🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…

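For readers wondering what "random" or "incorrect" rewards mean concretely, here is a minimal sketch (mine, not the authors' code) of the reward functions being compared; any of them can be dropped into the same RLVR-style loop in place of the verifier.

    # Sketch of the compared reward signals (illustrative, not the authors' implementation).
    import random

    def ground_truth_reward(answer: str, reference: str) -> float:
        return 1.0 if answer.strip() == reference.strip() else 0.0

    def incorrect_reward(answer: str, reference: str) -> float:
        return 1.0 - ground_truth_reward(answer, reference)   # rewards only wrong answers

    def random_reward(answer: str, reference: str) -> float:
        return float(random.random() < 0.5)                    # ignores the answer entirely

    # The RL update itself is unchanged: reward_fn(model_answer, reference) feeds the
    # advantage estimate, and the policy is optimized as usual.
    print(ground_truth_reward("17", "17"), incorrect_reward("42", "17"), random_reward("42", "17"))
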
Peter Henderson (@peterhndrsn) 's Twitter Profile Photo

The next ~1-4 years will be taking the 2017-2020 years of Deep RL and scaling up: exploration, generalization, long-horizon tasks, credit assignment, continual learning, multi-agent interaction! Lots of cool work to be done! 🎮🤖 But we shouldn't forget big lessons from back

Peter Henderson (@peterhndrsn) 's Twitter Profile Photo

Our tracker of “hallucinated” or nonexistent citations in real-world legal contexts has reached over 140 cases across the world. There’s a notable spike in the last 6 months. 📈📈📈

Ahmad Beirami @ ICLR 2025 (@abeirami) 's Twitter Profile Photo

So far, RL in LLMs has been "RL as a distillation method". - RL helped us distill great verifiers (e.g., code) in the model. - When models are better at verification than generation, we used RL to distill those abilities back to the model. That's about to change with agents!