Sharon Y. Li (@sharonyixuanli)'s Twitter Profile
Sharon Y. Li

@sharonyixuanli

Assistant Professor @WisconsinCS. Formerly postdoc @Stanford, Ph.D. @Cornell. Making AI safe and reliable for the open world.

ID: 1107711818997395458

Website: https://pages.cs.wisc.edu/~sharonli · Joined: 18-03-2019 18:34:12

707 Tweets

9.9K Followers

757 Following

Sean Xuefeng Du (@xuefeng_du)'s Twitter Profile Photo

📣 Announcing two calls for postdocs and research assistants / interns in my lab at NTU Singapore!  

1. The NTU AI-for-X Postdoctoral Fellowship is accepting applications from postdocs who will be jointly supervised by AI faculty and a project mentor in their own research field (X) at NTU. It
Sharon Y. Li (@sharonyixuanli)'s Twitter Profile Photo

Multi-Agent Debate (MAD) has been hyped as a collaborative reasoning paradigm — but let me drop the bomb: majority voting, without any debate, often performs on par with MAD.

This is what we formally prove in our #NeurIPS2025 Spotlight paper: “Debate or Vote: Which Yields
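
For a concrete picture of the no-debate baseline, here is a minimal sketch (my own illustration, not the paper's code; query_model is a hypothetical stand-in for any LLM call): majority voting simply samples several independent answers and keeps the plurality.

from collections import Counter

def majority_vote(answers):
    # Pick the most common answer among independent agent outputs;
    # ties are broken by Counter's internal ordering.
    return Counter(answers).most_common(1)[0][0]

# Hypothetical usage: query the same question n_agents times, with no
# debate rounds between agents, then aggregate by simple plurality.
# answers = [query_model(question) for _ in range(n_agents)]
# final = majority_vote(answers)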
Sharon Y. Li (@sharonyixuanli)'s Twitter Profile Photo

Excited to share our #NeurIPS2025 paper: Visual Instruction Bottleneck Tuning (Vittle)

Multimodal LLMs do great in-distribution, but often break in the wild. Scaling data or models helps, but it’s costly.

💡 Our work is inspired by the Information Bottleneck (IB) principle,
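
For reference, the classical IB objective the tweet alludes to (the general principle only; Vittle's exact formulation may differ) seeks a representation Z of the input X that stays predictive of the target Y while discarding everything else:

\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)

where I(·;·) is mutual information and β trades off compression of X against prediction of Y.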
Sharon Y. Li (@sharonyixuanli)'s Twitter Profile Photo

I will be giving a talk at UPenn CIS Seminar next Tuesday, October 7.

More info below
events.seas.upenn.edu/event/14856/

thanks Weijie Su for hosting!
Sharon Y. Li (@sharonyixuanli)'s Twitter Profile Photo

Collecting large human preference data is expensive—the biggest bottleneck in reward modeling.

In our #NeurIPS2025 paper, we introduce latent-space synthesis for preference data, which is 18× faster and uses a network that’s 16,000× smaller (0.5M vs 8B parameters) than
Sharon Y. Li (@sharonyixuanli)'s Twitter Profile Photo

Your LVLM says: “There’s a cat on the table.”
But… there’s no cat in the image. Not even a whisker.

This is object hallucination — one of the most persistent reliability failures in multi-modal language models. 

Our new #NeurIPS2025 paper introduces GLSim, a simple but
Sharon Y. Li (@sharonyixuanli)'s Twitter Profile Photo

We hear increasing discussion about aligning LLMs with “diverse human values.”
But what’s the actual price of pluralism? 🧮

In our #NeurIPS2025 paper (with Shawn Im), we move this debate from the philosophical to the measurable — presenting the first theoretical scaling law
Sharon Y. Li (@sharonyixuanli)'s Twitter Profile Photo

Check out our recent work led by Leitian Tao with the AI at Meta team on using hybrid RL for mathematical reasoning tasks. 🔥 Hybrid RL offers a promising way to go beyond purely verifiable rewards — combining the reliability of verifier signals with the richness of learned feedback.
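
A rough sketch of what such a hybrid reward could look like (my own illustration under assumed names, not the paper's method; alpha is a hypothetical mixing weight):

def hybrid_reward(verifier_correct, rm_score, alpha=0.5):
    # Verifiable signal: 1.0 if the final answer passes an automatic check
    # (e.g., exact match or a symbolic verifier), else 0.0.
    r_verify = 1.0 if verifier_correct else 0.0
    # Blend the sparse-but-reliable verifier signal with the denser
    # learned reward-model score.
    return alpha * r_verify + (1.0 - alpha) * rm_score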

Sharon Y. Li (@sharonyixuanli)'s Twitter Profile Photo

Took on the challenge of putting together three different keynote talks for the upcoming #ICCV2025 workshops... and here are the titles:

🔍 Explainability Meets Reliability in Large Vision-Language Models — eXCV Workshop (excv-workshop.github.io), October 19, 10:15–10:45, Honolulu

Sharon Y. Li (@sharonyixuanli)'s Twitter Profile Photo

Human preference data is noisy: inconsistent labels, annotator bias, etc. No matter how fancy the post-training algorithm is, bad data can sink your model. 

🔥 Min Hsuan (Samuel) Yeh and I are thrilled to release PrefCleanBench — a systematic benchmark for evaluating data cleaning
Hugo Larochelle (@hugo_larochelle)'s Twitter Profile Photo

We at TMLR are proud to announce that selected papers will now be eligible for an opportunity to present at the joint NeurIPS/ICML/ICLR Journal-to-Conference (J2C) Track: medium.com/@TmlrOrg/tmlr-…