Yulong Chen (@yulongchen1010)'s Twitter Profile
Yulong Chen

@yulongchen1010

Postdoc @cambridgenlp and @FitzwilliamColl | Interned @MSFTResearch ×3 | he/him 🌈 cylnlp.github.io

ID: 924621477873405953

Joined: 29-10-2017 12:58:33

510 Tweets

1.1K Followers

1.1K Following

Sasha Rush (@srush_nlp)

Simons Institute Workshop: "Future of LLMs and Transformers": 21 talks Monday - Friday next week. simons.berkeley.edu/workshops/futu…

Fei Liu @ #ICLR2025 (@feiliu_nlp)

Curious how LLMs tackle planning tasks, such as travel and computer use? Our new survey #PlanGenLLMs (arxiv.org/abs/2502.11221) builds on classic work by Kartam and Wilkins (1990) and examines 6 key metrics to compare today's top planning systems. 

Your next agentic workflow
Andrew Lampinen (@andrewlampinen)

How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. Thread: 1/

Fei Liu @ #ICLR2025 (@feiliu_nlp)

Just got back from the #ICLR2025 trip, grateful for the conversations, questions, and inspiring talks. Thought I'd share a few reflections from the conference (not exhaustive, just things that stuck with me). 1. Can reasoning learned from code/math generalize to all problems?

Andreas Vlachos (@vlachos_nlp)

The call for papers for the 8th FEVER workshop at #ACL is out: fever.ai/workshop.html The deadline is May 19th! And if you have a paper already reviewed in ARR, you can commit it until June 9th!

Sheng Zhang (@sheng_zh)

🧠Excited to present X-Reasoner — a 7B vision-language model post-trained for reasoning purely on general-domain text, without any images or domain-specific data. 
X-Reasoner achieves the state of the art 🏆 on challenging multimodal tasks (e.g., 43.0 on MMMU-Pro) and medical
Kenneth Li (@ke_li_2021)

🧵1/
Everyone says toxic data = bad models.
But what if more toxic data could help us build less toxic models?
Our new paper explores this paradox. Here’s what we found 👇
Andreas Vlachos (@vlachos_nlp)

And here are the results of the shared task: fever.ai/task.html which focused on systems able to fact-check a claim with evidence from a search engine, using open-weight LLMs on a single GPU in under a minute! Submit papers and come to FEVER to find out more!

Caiqi Zhang (@caiqizh)

🔥 We teach LLMs to say how confident they are on-the-fly during long-form generation.

🤩No sampling. No slow post-hoc methods. Not limited to short-form QA!

‼️Just output confidence in a single decoding pass.

✅Better calibration!
🚀 20× faster runtime.

arXiv:2505.23912
👇
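To make the single-pass idea concrete, here is a minimal sketch of one way inline confidence could surface at decoding time. The <conf=…> tag syntax and the parsing below are illustrative assumptions on my part, not the paper's actual scheme; the point is that confidence estimates fall out of one ordinary decoding pass, with no sampling and no post-hoc scorer.

import re

# Hypothetical output format: during one greedy decoding pass, the model
# is assumed to append a "<conf=0.87>" tag after each claim it generates.
generated = (
    "Marie Curie won two Nobel Prizes. <conf=0.97> "
    "She was born in Warsaw in 1867. <conf=0.91> "
    "Her second Nobel Prize was in Physics. <conf=0.34>"
)

# Recover (claim, confidence) pairs from the decoded string alone:
# no sampling, no second model, no extra forward passes.
pairs = re.findall(r"(.+?)\s*<conf=([01](?:\.\d+)?)>", generated)
for claim, conf in pairs:
    flag = "LOW" if float(conf) < 0.5 else "ok"
    print(f"{float(conf):.2f} [{flag}] {claim.strip()}")

Note that the one factually wrong claim in the toy output (the second prize was in Chemistry, not Physics) carries the low tag, which is exactly the behavior good calibration should produce.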
Zhaochen Su (@suzhaochen0110)

To further boost the "think with images" community, we've systematically summarized the latest research in our new repository: github.com/zhaochen0110/A…
🧠🖼️Let's make LVLMs see & think!
A comprehensive survey paper will be released soon! Stay tuned.
Ruizhe Li (@liruizhe94)

🤔Is it possible to accurately and effectively attribute a RAG response to the relevant context without finetuning or training a surrogate model?

💡We propose an inference-time method called ARC-JSD using JSD for RAG context attribution, which only needs O(sent_num + 1)🚀
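For intuition, here is a hedged sketch of what a JSD-based, inference-time attribution loop could look like. I am reading "O(sent_num + 1)" as one model call with the full context plus one per ablated sentence; the helper below is a toy stand-in for the real model, and all names are mine rather than the paper's.

import zlib
import numpy as np
from scipy.spatial.distance import jensenshannon

def answer_token_probs(sentences, question):
    """Stand-in for one LLM forward pass: the next-token distribution for
    the answer given (context, question). A hash-seeded softmax keeps the
    sketch runnable; in practice this is the actual model call."""
    seed = zlib.crc32((" ".join(sentences) + question).encode())
    logits = np.random.default_rng(seed).normal(size=50)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def arc_jsd_attribution(sentences, question):
    full = answer_token_probs(sentences, question)   # 1 pass, full context
    scores = []
    for i in range(len(sentences)):                  # + sent_num ablation passes
        ablated = answer_token_probs(sentences[:i] + sentences[i + 1:], question)
        # jensenshannon() returns the JS distance (sqrt of the divergence);
        # squaring recovers the divergence, and either ranks identically.
        scores.append(jensenshannon(full, ablated) ** 2)
    # The sentence whose removal shifts the answer distribution the most
    # is attributed as the evidence the response relied on.
    return int(np.argmax(scores)), scores

context = ["Paris is the capital of France.",
           "The Eiffel Tower is 330 m tall.",
           "Cats sleep for most of the day."]
best, scores = arc_jsd_attribution(context, "How tall is the Eiffel Tower?")
print(best, [round(s, 4) for s in scores])

With the toy stand-in the scores are of course meaningless; the scaffolding is what matters: sent_num + 1 calls, no training, no surrogate model.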
Qingxiu Dong (@qx_dong)

⏰ We introduce Reinforcement Pre-Training (RPT🍒) — reframing next-token prediction as a reasoning task using RLVR

✅ General-purpose reasoning
📑 Scalable RL on web corpus
📈 Stronger pre-training + RLVR results
🚀 Allows allocating more compute to specific tokens
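As a rough sketch of how such a verifiable reward might be wired up (the "Prediction:" output format is my assumption for illustration; the actual prompt and reward in the paper are likely more involved): the model reasons freely, commits to a next-token guess, and the reward is simply whether that guess matches the corpus.

def rpt_reward(model_output: str, gold_next_token: str) -> float:
    """Verifiable reward for reasoning-as-next-token-prediction: the model's
    free-form reasoning must end in an explicit guess, which is checked
    against the actual next token from the pre-training corpus."""
    marker = "Prediction:"
    if marker not in model_output:
        return 0.0                        # malformed rollout earns nothing
    guess = model_output.rsplit(marker, 1)[1].strip()
    return 1.0 if guess == gold_next_token else 0.0

# Toy rollout: the corpus text "The capital of France is" continues with "Paris".
rollout = ("The sentence names a country and asks for its capital city, "
           "so the most likely continuation is the city name. Prediction: Paris")
print(rpt_reward(rollout, "Paris"))       # 1.0

Because the reward is checkable against the corpus itself, every token of web text becomes a verifiable RL example, which is what makes the recipe scalable.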
Xiao Liang (@mastervito0601)

🙋‍♂️ Can RL training address model weaknesses without external distillation?

🚀 Please check our latest work on RL for LLM reasoning!

💯 TL;DR: We propose augmenting RL training with synthetic problems targeting the model's reasoning weaknesses.

📊Qwen2.5-32B: 42.9 → SwS-32B: 68.4
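A minimal sketch of the loop I take the TL;DR to describe (the names and the failure-rate weighting below are my assumptions, not the paper's recipe): track where rollouts fail during RL training, then sample synthetic problems from the weak categories in proportion to their failure rates.

import random
from collections import defaultdict

def weakness_weighted_batch(eval_log, synth_pools, k=8, seed=0):
    """eval_log: (category, passed) pairs from recent RL rollouts.
    synth_pools: category -> list of synthetic problems for that skill."""
    fails, totals = defaultdict(int), defaultdict(int)
    for category, passed in eval_log:
        totals[category] += 1
        fails[category] += (not passed)
    # Failure rate doubles as the sampling weight: the weaker the model
    # is on a category, the more synthetic problems it draws from there.
    weights = {c: fails[c] / totals[c] for c in totals}
    rng = random.Random(seed)
    cats = list(weights)
    picks = rng.choices(cats, weights=[weights[c] for c in cats], k=k)
    return [rng.choice(synth_pools[c]) for c in picks]

log = [("geometry", False), ("geometry", False), ("algebra", True),
       ("algebra", False), ("counting", True), ("counting", True)]
pools = {"geometry": ["geo-1", "geo-2"], "algebra": ["alg-1"], "counting": ["cnt-1"]}
print(weakness_weighted_batch(log, pools, k=4))  # mostly geometry, no counting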
Ming Zhong (@mingzhong_)

Thrilled to share our new reasoning model, Polaris✨! The 4B version achieves a score of 79.4 on AIME 2025, surpassing Claude 4 Opus (75.5). We’re releasing the full RL recipe, data, and weights 🔓 — see all the details below

Jianhao (Elliott) Yan (@yan_elliott)

Thrilled to announce our paper has been accepted for an Oral Presentation at ACL 2025 (8% of accepted papers)! 🎉 arxiv.org/pdf/2410.09338

We dive deep into why current #LLM editing methods often fail robustness tests and propose a solution. #ACL2025 #LLMs #ModelEditing

1/X
Ari Holtzman (@universeinanegg)

Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. 🧵and new paper: arxiv.org/abs/2507.00163

Yulong Chen (@yulongchen1010)

Can LLMs learn a new language using only a grammar book and a dictionary, the way adult human L2 learners do? Check out our in-progress paper! The Gold Medals in an Empty Room: Diagnosing Metalinguistic Reasoning in LLMs with Camlang arxiv.org/pdf/2509.00425