Shangbin Feng (@shangbinfeng) 's Twitter Profile
Shangbin Feng

@shangbinfeng

PhD student @uwcse @uwnlp. Multi-LLM collaboration, social NLP, networks and structures. #水文学家

ID: 1408713582498385926

http://bunsenfeng.github.io · 26-06-2021 09:07:53

905 Tweets

3.3K Followers

2.2K Following

Ruiqi Zhong (@zhongruiqi)

Last day of PhD! I pioneered using LLMs to explain dataset&model. It's used by interp at OpenAI and societal impact Anthropic. Tutorial here. It's a great direction & someone should carry the torch :) Thesis available, if you wanna read my acknowledgement section=P

youngjoongkwon (@youngjoongkwon)

Starting this Fall, I’ll be joining the Computer Science Department at Emory University as an assistant professor. My group will focus on 3D computer vision and graphics, working on human digitization, 3D/4D reconstruction, manipulation, generative AI, AR/VR, and telepresence.

Wenhao Yu (@wyu_nd)

🚀 We release MMLongBench: Benchmark for evaluating long-context VLMs.
📊 13,331 examples across 5 tasks:
– Visual RAG
– Many-shot ICL
– Needle-in-a-haystack
– VL Summarization
– Long-document VQA
📏 Lengths: 8 / 16 / 32 / 64 / 128K
🔍 Benchmarking both thoroughly & effectively!

Ruiqi Zhong (@zhongruiqi)

Personal Update: I just started my first day at Thinking Machines Lab, and I will continue working on Human+AI collaboration. Super excited about joining the team!!

Songlin Yang (@songlinyang4)

📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381

Vidhisha Balachandran (@vidhisha_b)

📌 You can now find all the evaluation logs (and reasoning traces for common benchmarks!) from our inference-time scaling report and the Phi-4 reasoning report at  huggingface.co/datasets/micro…. The evaluation code can be found at Eureka ML Insights: github.com/microsoft/eure….

Akari Asai (@akariasai)

‘Bold,’ ‘positive’ and ‘unparalleled’: Allen School Ph.D. graduates Ashish Sharma and Sewon Min recognized with ACM Doctoral Dissertation Awards news.cs.washington.edu/2025/06/04/all… Massive congrats to Ashish Sharma and Sewon Min - huge win for UW NLP and the broader NLP community! 🙌

Jihan Yao (@jihan_yao)

We introduce MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation ✅ Reliable: 94.3% agreement with human judgment ✅ Comprehensive: 4 modality combination × 49 tasks × 937 instructions 🔍Results and Takeaways: > GPT-Image-1 from OpenAI

Ludwig Schmidt (@lschmidt3)

Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.

Weijia Shi (@weijiashi2)

Excited to be at #CVPR2025 this week! I’ll be talking about tool-augmented multimodal reasoning in Thursday’s tutorial. Come say hi if you’re around🍻 ⏰ 1:30–5:00 PM CDT, June 12 📍 Room 107 B, CVPR venue

Mickel Liu (@mickel_liu)

🤔Conventional LM safety alignment is reactive: find vulnerabilities→patch→repeat 🌟We propose online multi-agent RL training where Attacker & Defender self-play to co-evolve, finding diverse attacks and improving safety by up to 72% vs. RLHF 🧵

Sarah Wiegreffe (on faculty job market!) (@sarahwiegreffe)

A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at the University of Maryland UMD Department of Computer Science this August. I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)

Andriy Burkov (@burkov)

GPT-5 will most likely be just one chatbot using a dispatching classifier that decides which model should process each incoming message. The user will no longer puzzle over 7 models to choose from. No serious "intelligence" improvement is expected.

CLS (@chengleisi)

Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.

Heng Ji (@hengjinlp)

I’m looking for a new postdoc to start this fall working on AI for Science/Science-Inspired AI (focusing on chemistry and bioengineering domains for now). Please drop me a CV if interested.

Mickel Liu (@mickel_liu)

Excited to share our latest LLM self-play research! We had LLMs challenge themselves in competitive language games, showing improvements across math, logic, and reasoning benchmarks. More evidence that online RL unlocks incredible potential! 🚀