Shangbin Feng (@shangbinfeng) Twitter Tweets • TwiCopy

Ruiqi Zhong

6 months ago

Last day of PhD!

I pioneered using LLMs to explain datasets and models. It's used by the interpretability team at OpenAI and the societal impacts team at Anthropic.

Tutorial here. It's a great direction, and someone should carry the torch :)

Thesis available, if you wanna read my acknowledgement section =P

523 likes · 27 replies · 37 retweets

Weijia Shi

@weijiashi2

6 months ago

Curious about what affects the scaling behaviors of foundation models in neuroscience? Check out Linxing Preston Jiang's work below.

32 likes · 0 replies · 4 retweets

youngjoongkwon

@youngjoongkwon

6 months ago

Starting this Fall, I’ll be joining the Computer Science Department at Emory University as an assistant professor. My group will focus on 3D computer vision and graphics, working on human digitization, 3D/4D reconstruction, manipulation, generative AI, AR/VR, and telepresence.

275 likes · 25 replies · 9 retweets

Wenhao Yu

@wyu_nd

6 months ago

🚀 We're releasing MMLongBench, a benchmark for evaluating long-context VLMs.
📊 13,331 examples across 5 tasks:
– Visual RAG
– Many-shot ICL
– Needle-in-a-haystack
– VL summarization
– Long-document VQA
📏 Lengths: 8K / 16K / 32K / 64K / 128K
🔍 Benchmarking both thoroughly & effectively!

126 likes · 3 replies · 24 retweets

Ruiqi Zhong

@zhongruiqi

6 months ago

Personal update: I just started my first day at Thinking Machines Lab, and I will continue working on Human+AI collaboration. Super excited about joining the team!!

715 likes · 21 replies · 6 retweets

Songlin Yang

@songlinyang4

6 months ago

📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381

424 likes · 9 replies · 79 retweets

Vidhisha Balachandran

@vidhisha_b

5 months ago

📌 You can now find all the evaluation logs (and reasoning traces for common benchmarks!) from our inference-time scaling report and the Phi-4 reasoning report at huggingface.co/datasets/micro…. The evaluation code can be found at Eureka ML Insights: github.com/microsoft/eure….

56 likes · 1 reply · 12 retweets

Akari Asai

@akariasai

5 months ago

‘Bold,’ ‘positive’ and ‘unparalleled’: Allen School Ph.D. graduates Ashish Sharma and Sewon Min recognized with ACM Doctoral Dissertation Awards news.cs.washington.edu/2025/06/04/all…

Massive congrats to Ashish Sharma and Sewon Min, a huge win for UW NLP and the broader NLP community! 🙌

178 likes · 5 replies · 17 retweets

Jihan Yao

@jihan_yao

5 months ago

We introduce MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation
✅ Reliable: 94.3% agreement with human judgment
✅ Comprehensive: 4 modality combinations × 49 tasks × 937 instructions
🔍 Results and Takeaways:
> GPT-Image-1 from OpenAI

29 likes · 2 replies · 17 retweets

Ludwig Schmidt

@lschmidt3

5 months ago

Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.

1.1K likes · 20 replies · 208 retweets

Shangbin Feng

@shangbinfeng

5 months ago

Check out our work on LLMs and scientific knowledge updates!

54 likes · 0 replies · 11 retweets

Nathan Lambert

@natolambert

5 months ago

Best AI paper title of the week. Wish it was a backronym.

222 likes · 4 replies · 25 retweets

Weijia Shi

@weijiashi2

5 months ago

Excited to be at #CVPR2025 this week! I’ll be talking about tool-augmented multimodal reasoning in Thursday’s tutorial. Come say hi if you’re around🍻 ⏰ 1:30–5:00 PM CDT, June 12 📍 Room 107 B, CVPR venue

89 likes · 1 reply · 7 retweets

Mickel Liu

@mickel_liu

5 months ago

🤔 Conventional LM safety alignment is reactive: find vulnerabilities → patch → repeat.
🌟 We propose online multi-agent RL training, where an Attacker and a Defender self-play to co-evolve, finding diverse attacks and improving safety by up to 72% vs. RLHF 🧵

96 likes · 5 replies · 22 retweets

Sarah Wiegreffe (on faculty job market!)

@sarahwiegreffe

5 months ago

A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor in the University of Maryland's Department of Computer Science this August. I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)

564 likes · 70 replies · 48 retweets

Leo Liu

@zeyuliu10

5 months ago

LLMs trained to memorize new facts can’t use those facts well.🤔 We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit!💡 Our approach, PropMEND, extends MEND with a new objective for propagation.

109 likes · 3 replies · 40 retweets

Andriy Burkov

@burkov

4 months ago

GPT-5 will most likely be just one chatbot using a dispatching classifier that decides which model should process each incoming message. Users will no longer have to puzzle over 7 models to choose from. No serious "intelligence" improvement is expected.

348 likes · 49 replies · 25 retweets

CLS

@chengleisi

4 months ago

Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.

553 likes · 10 replies · 162 retweets

Heng Ji

@hengjinlp

4 months ago

I’m looking for a new postdoc to start this fall working on AI for Science/Science-Inspired AI (focusing on chemistry and bioengineering domains for now). Please drop me a CV if interested.

65 likes · 1 reply · 17 retweets

Mickel Liu

@mickel_liu

4 months ago

Excited to share our latest LLM self-play research! We had LLMs challenge themselves in competitive language games, showing improvements across math, logic, and reasoning benchmarks. More evidence that online RL unlocks incredible potential! 🚀

23 likes · 1 reply · 4 retweets