Zirui "Colin" Wang (@zwcolin) 's Twitter Profile
Zirui "Colin" Wang

@zwcolin

Incoming CS PhD @Berkeley_EECS; MSCS @princeton_nlp; '25 @siebelscholars; prev @HDSIUCSD; I work on multimodal foundation models; He/Him.

ID: 2986434572

Link: http://ziruiw.net | Joined: 17-01-2015 04:18:40

122 Tweets

1.1K Followers

528 Following

Xi Ye (@xiye_nlp) 's Twitter Profile Photo

🔔 I'm recruiting multiple fully funded MSc/PhD students at the University of Alberta for Fall 2025! Join my lab working on NLP, especially reasoning and interpretability (see my website for more details about my research). Apply by December 15!

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

🚨 I'll be presenting CharXiv this Friday morning at #neurips and Sunday at the MAR workshop.

I'm 🤗 to connect with new friends and chat about developing/enhancing multimodal models (text-to-image, VLMs, etc.) and their evaluations! Let's meet up at the conference :)
Danqi Chen (@danqi_chen) 's Twitter Profile Photo

I've just arrived in Vancouver and am excited to join the final stretch of #NeurIPS2024!

This morning, we are presenting 3 papers 11am-2pm:
- Edge pruning for finding Transformer circuits (#3111, spotlight) Adithya Bhaskar
- SimPO (#3410) Yu Meng @ ICLR'25, Mengzhou Xia
- CharXiv (#5303)
Jing-Jing Li (@drjingjing2026) 's Twitter Profile Photo

1/3 Today, an anecdote shared by an invited speaker at #NeurIPS2024 left many Chinese scholars, myself included, feeling uncomfortable. As a community, I believe we should take a moment to reflect on why such remarks in public discourse can be offensive and harmful.
Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

I'll present CharXiv at tomorrow's Multimodal Algorithmic Reasoning workshop for a spotlight talk at 11:45am, followed by a poster session at 2:15pm in West Building Exhibit Hall A.

If you are interested in or working on developing/evaluating multimodal models, let's connect there!
Kaiqu Liang (@kaiqu_liang) 's Twitter Profile Photo

Think your RLHF-trained AI is aligned with your goals?

โš ๏ธ We found that RLHF can induce significant misalignment when humans provide feedback by predicting future outcomes ๐Ÿค”, creating incentives for LLM deception ๐Ÿ˜ฑ

Introducing ✨RLHS (Hindsight Simulation)✨: By simulating…
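
A rough sketch in Python of the contrast as I read it from this thread, under toy assumptions: the simulate_outcome helper, the scoring rules, and the example responses below are hypothetical stand-ins, not the actual RLHS implementation.

# Hedged sketch: standard RLHF-style feedback rates a response on how good the
# rater *predicts* the outcome will be; RLHS-style feedback first simulates the
# downstream outcome and rates with hindsight. All helpers are hypothetical.

def simulate_outcome(response: str) -> str:
    """Stand-in for rolling the interaction forward (did the advice actually work?)."""
    return "good" if "may not fit" in response else "bad"

def foresight_feedback(response: str) -> float:
    """Rater guesses the outcome from the response alone, so confident,
    persuasive answers can score well even when they mislead."""
    return 1.0 if "guarantee" in response else 0.5

def hindsight_feedback(response: str) -> float:
    """Rater sees the simulated outcome first, then scores the response."""
    return 1.0 if simulate_outcome(response) == "good" else 0.0

responses = [
    "I guarantee this product solves your problem",   # persuasive but misleading
    "Honestly, this product may not fit your needs",  # truthful but less flashy
]
for r in responses:
    print(f"{r!r:50} foresight={foresight_feedback(r)} hindsight={hindsight_feedback(r)}")
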
Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

While DeepSeek R1 has been flexing 💪🏻, how are VLMs progressing in reasoning?

โš ๏ธ Major Shift: the latest ๐จ๐ฉ๐ž๐ง-๐ฐ๐ž๐ข๐ ๐ก๐ญ Qwen2.5-VL has beaten the first GPT-4o and is now on par with the latest ChatGPT-4o! ๐Ÿ˜ฒ

But what about o1-like models? Can they enhance…
Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

Six years ago I was a high school senior, and my dream was to get into Berkeley for CS. I got rejected. I appealed. Still No.

But that setback only made me stronger. I never let go of that dream. And now? I made it.

Finally, time to visit the campus and get to know everyone!
Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

It seems that models can figure out the correct rules with RL. I created a synthetic game to run GRPO on VLMs over the weekend, and I didn't realize I wrote down the wrong rule in the instruction 🤦🏻‍♂️. With ~200 steps the model learns the corner cases where the wrong rule can…
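
For context on what "run GRPO" involves here, a minimal sketch of a GRPO-style step under stated assumptions: the toy game, reward_fn, and random policy below are hypothetical placeholders, not the actual setup from the tweet.

# Hedged sketch of one GRPO-style step: sample a group of rollouts per prompt,
# score them, and compute group-relative (mean/std-normalized) advantages.
import random
import statistics

def reward_fn(action: str, correct_action: str) -> float:
    # In the tweet's setup the reward comes from a synthetic game whose written
    # rule was accidentally wrong; here it's just exact-match for illustration.
    return 1.0 if action == correct_action else 0.0

def grpo_advantages(rewards: list) -> list:
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against an all-equal group
    return [(r - mean) / std for r in rewards]

# Toy "policy": picks an action at random; a real run samples from the VLM
# conditioned on the game screenshot + instruction, then repeats for ~200 steps.
ACTIONS = ["move_left", "move_right", "stay"]
policy = lambda prompt: random.choice(ACTIONS)

group = [policy("<screenshot + (wrong) rule text>") for _ in range(8)]
rewards = [reward_fn(a, correct_action="move_right") for a in group]
advantages = grpo_advantages(rewards)
print(list(zip(group, rewards, [round(a, 2) for a in advantages])))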

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

Life update: I'll be joining Berkeley EECS as a PhD student starting in fall 2025, playing around with multimodal models and LLMs, being part of Sky Lab & BAIR, and enjoying the unreal™️ weather 🏖️ CA has to offer!
lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo

News: Search Arena is now LIVE! 🌐🔍
✅ Test web-augmented LLM systems on real-time, real-world tasks: retrieval, writing, debugging & more.
✅ Perplexity, Gemini, OpenAI go head-to-head.
✅ Crowd-powered evals. Leaderboard 🏆 coming soon…
⚡ Try it now at lmarena.ai!

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

i've been working on my master's thesis and finally got something worth mentioning for the broader impact of the research work i did last year --

it's not another benchmark but an eval that people and devs care about

and i'm ready to build more of them :p
Alex Zhang (@a1zhang) 's Twitter Profile Photo

Claude can play Pokemon, but can it play DOOM? With a simple agent, we let VLMs play it, and found Sonnet 3.7 got the furthest, finding the blue room! Our VideoGameBench (twenty games from the 90s) and agent are open source so you can try it yourself now --> 🧵
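
As a rough illustration of what a "simple agent" loop like this usually looks like (screenshot in, key press out), here's a hedged sketch; GameEnv and query_vlm are hypothetical stand-ins, not VideoGameBench's actual API, so check the open-source repo for the real interface.

# Hedged sketch of a minimal VLM game-agent loop: grab a frame, ask the VLM for
# a key press, apply it, repeat. GameEnv and query_vlm are hypothetical stand-ins.
from dataclasses import dataclass, field

VALID_KEYS = ["up", "down", "left", "right", "fire", "use"]

@dataclass
class GameEnv:
    """Placeholder for a game backend exposing screenshots and key presses."""
    pressed: list = field(default_factory=list)

    def screenshot(self) -> bytes:
        return b"<png bytes of the current frame>"

    def press(self, key: str) -> None:
        self.pressed.append(key)

def query_vlm(frame: bytes, history: list) -> str:
    """Placeholder for a VLM call (e.g. Sonnet 3.7) prompted to answer with one key."""
    return "right"

def run_episode(env: GameEnv, max_steps: int = 50) -> list:
    history = []
    for _ in range(max_steps):
        action = query_vlm(env.screenshot(), history)
        if action not in VALID_KEYS:  # models sometimes reply in prose; fall back
            action = "use"
        env.press(action)
        history.append(action)
    return history

print(run_episode(GameEnv())[:5])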

lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo

We're excited to invite everyone to a new Beta version of LMArena! 🎉 For months, we've been poring over community feedback to improve the site: fixing errors/bugs, improving our UI layout, and more. To keep supporting the development and continual improvement of this…

Tianle (Tim) Li (@litianleli) 's Twitter Profile Photo

🚨 Arena-Hard-v2.0 is here! 🚨

Major Improvements:
- Better Automatic Judges (Gemini-2.5 & GPT-4.1) 🦾
- 500 Fresh Prompts from LMArena 🗿
- Tougher Baselines 🏋️
- Multilingual (30+ Langs) 🌎
- Plus Eval for Creative Writing ✍️

Test your model on the hardest prompts from LMArena!
Xindi Wu (@cindy_x_wu) 's Twitter Profile Photo

Introducing COMPACT: COMPositional Atomic-to-complex Visual Capability Tuning, a data-efficient approach to improve multimodal models on complex visual tasks without scaling data volume. 📦

arxiv.org/abs/2504.21850

1/10
MLPC Group (@mlpcucsd) 's Twitter Profile Photo

We're thrilled that our lab's work on "Deeply-Supervised Nets" has received the Test-of-Time Award at AISTATS 2025! 🏆 This prestigious award honors papers published 10 years ago that have had a lasting and significant impact on the field of artificial intelligence and…