Yulong Chen (@yulongchen1010)'s Twitter Profile
Yulong Chen

@yulongchen1010

Postdoc @cambridgenlp and @FitzwilliamColl | Interned @MSFTResearch ×3 | he/him 🌈 cylnlp.github.io

ID: 924621477873405953

Joined: 29-10-2017 12:58:33

510 Tweets

1.1K Followers

1.1K Following

Sasha Rush (@srush_nlp)

Simons Institute Workshop: "Future of LLMs and Transformers": 21 talks Monday - Friday next week. simons.berkeley.edu/workshops/futu…

Fei Liu @ #ICLR2025 (@feiliu_nlp)

Curious how LLMs tackle planning tasks, such as travel and computer use? Our new survey #PlanGenLLMs (arxiv.org/abs/2502.11221) builds on classic work by Kartam and Wilkins (1990) and examines 6 key metrics to compare today's top planning systems. 

Your next agentic workflow
Andrew Lampinen (@andrewlampinen)

How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. Thread: 1/

Fei Liu @ #ICLR2025 (@feiliu_nlp)

Just got back from the #ICLR2025 trip, grateful for the conversations, questions, and inspiring talks. Thought I'd share a few reflections from the conference (not exhaustive, just things that stuck with me). 1. Can reasoning learned from code/math generalize to all problems?

Andreas Vlachos (@vlachos_nlp)

The call for papers for the 8th FEVER workshop at #ACL is out: fever.ai/workshop.html The deadline is May 19th! And if you have a paper already reviewed in ARR, you can commit it until June 9th!

Sheng Zhang (@sheng_zh)

🧠Excited to present X-Reasoner — a 7B vision-language model post-trained for reasoning purely on general-domain text, without any images or domain-specific data. 
X-Reasoner achieves the state of the art 🏆 on challenging multimodal tasks (e.g., 43.0 on MMMU-Pro) and medical
Kenneth Li (@ke_li_2021)

🧵1/
Everyone says toxic data = bad models.
But what if more toxic data could help us build less toxic models?
Our new paper explores this paradox. Here’s what we found 👇
Andreas Vlachos (@vlachos_nlp)

And here are the results of the shared task: fever.ai/task.html which focused on systems able to fact-check a claim with evidence from a search engine, using open-weight LLMs on a single GPU in under a minute! Submit papers and come to FEVER to find out more!

Caiqi Zhang (@caiqizh)

🔥 We teach LLMs to say how confident they are on-the-fly during long-form generation.

🤩No sampling. No slow post-hoc methods. Not limited to short-form QA!

‼️Just output confidence in a single decoding pass.

✅Better calibration!
🚀 20× faster runtime.

arXiv:2505.23912
👇
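To make the single-pass idea concrete, here is a minimal sketch of one way inline confidence could surface at decoding time. The <conf=…> tag syntax and the parsing below are illustrative assumptions on my part, not the paper's actual scheme; the point is that confidence estimates fall out of one ordinary decoding pass, with no sampling and no post-hoc scorer.

import re

# Hypothetical output format: during one greedy decoding pass, the model
# is assumed to append a "<conf=0.87>" tag after each claim it generates.
generated = (
    "Marie Curie won two Nobel Prizes. <conf=0.97> "
    "She was born in Warsaw in 1867. <conf=0.91> "
    "Her second Nobel Prize was in Physics. <conf=0.34>"
)

# Recover (claim, confidence) pairs from the decoded string alone:
# no sampling, no second model, no extra forward passes.
pairs = re.findall(r"(.+?)\s*<conf=([01](?:\.\d+)?)>", generated)
for claim, conf in pairs:
    flag = "LOW" if float(conf) < 0.5 else "ok"
    print(f"{float(conf):.2f} [{flag}] {claim.strip()}")

Note that the one factually wrong claim in the toy output (the second prize was in Chemistry, not Physics) carries the low tag, which is exactly the behavior good calibration should produce.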
Zhaochen Su (@suzhaochen0110)

To further boost the "think with images" community, we've systematically summarized the latest research in our new repository: github.com/zhaochen0110/A…
🧠🖼️Let's make LVLMs see & think!
A comprehensive survey paper will be released soon! Stay tuned.
Ruizhe Li (@liruizhe94)

🤔Is it possible to accurately and effectively attribute a RAG response to the relevant context without finetuning or training a surrogate model?

💡We propose an inference-time method called ARC-JSD using JSD for RAG context attribution, which only needs O(sent_num + 1)🚀
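For intuition, here is a hedged sketch of what a JSD-based, inference-time attribution loop could look like. I am reading "O(sent_num + 1)" as one model call with the full context plus one per ablated sentence; the helper below is a toy stand-in for the real model, and all names are mine rather than the paper's.

import zlib
import numpy as np
from scipy.spatial.distance import jensenshannon

def answer_token_probs(sentences, question):
    """Stand-in for one LLM forward pass: the next-token distribution for
    the answer given (context, question). A hash-seeded softmax keeps the
    sketch runnable; in practice this is the actual model call."""
    seed = zlib.crc32((" ".join(sentences) + question).encode())
    logits = np.random.default_rng(seed).normal(size=50)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def arc_jsd_attribution(sentences, question):
    full = answer_token_probs(sentences, question)   # 1 pass, full context
    scores = []
    for i in range(len(sentences)):                  # + sent_num ablation passes
        ablated = answer_token_probs(sentences[:i] + sentences[i + 1:], question)
        # jensenshannon() returns the JS distance (sqrt of the divergence);
        # squaring recovers the divergence, and either ranks identically.
        scores.append(jensenshannon(full, ablated) ** 2)
    # The sentence whose removal shifts the answer distribution the most
    # is attributed as the evidence the response relied on.
    return int(np.argmax(scores)), scores

context = ["Paris is the capital of France.",
           "The Eiffel Tower is 330 m tall.",
           "Cats sleep for most of the day."]
best, scores = arc_jsd_attribution(context, "How tall is the Eiffel Tower?")
print(best, [round(s, 4) for s in scores])

With the toy stand-in the scores are of course meaningless; the scaffolding is what matters: sent_num + 1 calls, no training, no surrogate model.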
Qingxiu Dong (@qx_dong)

⏰ We introduce Reinforcement Pre-Training (RPT🍒) — reframing next-token prediction as a reasoning task using RLVR

✅ General-purpose reasoning
📑 Scalable RL on web corpus
📈 Stronger pre-training + RLVR results
🚀 Allows allocating more compute to specific tokens
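As a rough sketch of how such a verifiable reward might be wired up (the "Prediction:" output format is my assumption for illustration; the actual prompt and reward in the paper are likely more involved): the model reasons freely, commits to a next-token guess, and the reward is simply whether that guess matches the corpus.

def rpt_reward(model_output: str, gold_next_token: str) -> float:
    """Verifiable reward for reasoning-as-next-token-prediction: the model's
    free-form reasoning must end in an explicit guess, which is checked
    against the actual next token from the pre-training corpus."""
    marker = "Prediction:"
    if marker not in model_output:
        return 0.0                        # malformed rollout earns nothing
    guess = model_output.rsplit(marker, 1)[1].strip()
    return 1.0 if guess == gold_next_token else 0.0

# Toy rollout: the corpus text "The capital of France is" continues with "Paris".
rollout = ("The sentence names a country and asks for its capital city, "
           "so the most likely continuation is the city name. Prediction: Paris")
print(rpt_reward(rollout, "Paris"))       # 1.0

Because the reward is checkable against the corpus itself, every token of web text becomes a verifiable RL example, which is what makes the recipe scalable.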
Xiao Liang (@mastervito0601)

🙋‍♂️ Can RL training address model weaknesses without external distillation?

🚀 Please check our latest work on RL for LLM reasoning!

💯 TL;DR: We propose augmenting RL training with synthetic problems targeting the model's reasoning weaknesses.

📊Qwen2.5-32B: 42.9 → SwS-32B: 68.4
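A minimal sketch of the loop I take the TL;DR to describe (the names and the failure-rate weighting below are my assumptions, not the paper's recipe): track where rollouts fail during RL training, then sample synthetic problems from the weak categories in proportion to their failure rates.

import random
from collections import defaultdict

def weakness_weighted_batch(eval_log, synth_pools, k=8, seed=0):
    """eval_log: (category, passed) pairs from recent RL rollouts.
    synth_pools: category -> list of synthetic problems for that skill."""
    fails, totals = defaultdict(int), defaultdict(int)
    for category, passed in eval_log:
        totals[category] += 1
        fails[category] += (not passed)
    # Failure rate doubles as the sampling weight: the weaker the model
    # is on a category, the more synthetic problems it draws from there.
    weights = {c: fails[c] / totals[c] for c in totals}
    rng = random.Random(seed)
    cats = list(weights)
    picks = rng.choices(cats, weights=[weights[c] for c in cats], k=k)
    return [rng.choice(synth_pools[c]) for c in picks]

log = [("geometry", False), ("geometry", False), ("algebra", True),
       ("algebra", False), ("counting", True), ("counting", True)]
pools = {"geometry": ["geo-1", "geo-2"], "algebra": ["alg-1"], "counting": ["cnt-1"]}
print(weakness_weighted_batch(log, pools, k=4))  # mostly geometry, no counting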
Ming Zhong (@mingzhong_)

Thrilled to share our new reasoning model, Polaris✨! The 4B version achieves a score of 79.4 on AIME 2025, surpassing Claude 4 Opus (75.5). We’re releasing the full RL recipe, data, and weights 🔓 — see all the details below

Jianhao (Elliott) Yan (@yan_elliott)

Thrilled to announce our paper has been accepted for an Oral Presentation at ACL 2025 (8% of accepted papers)! 🎉 arxiv.org/pdf/2410.09338

We dive deep into why current #LLM editing methods often fail robustness tests and propose a solution. #ACL2025 #LLMs #ModelEditing

1/X
Ari Holtzman (@universeinanegg)

Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. 🧵and new paper: arxiv.org/abs/2507.00163

Yulong Chen (@yulongchen1010)

Can LLMs learn a new language using only a grammar book and a dictionary, the way adult human L2 learners do? Check out our in-progress paper! The Gold Medals in an Empty Room: Diagnosing Metalinguistic Reasoning in LLMs with Camlang arxiv.org/pdf/2509.00425