Mu Cai (@mucai7)'s Twitter Profile
Mu Cai

@mucai7

Research Scientist @GoogleDeepMind, Multimodal Large Language Models ex: Ph.D. @WisconsinCS | @MSFTResearch

ID: 1126468933676986368

Link: https://pages.cs.wisc.edu/~mucai/
Joined: 09-05-2019 12:48:17

197 Tweets

2.2K Followers

754 Following

Google DeepMind (@googledeepmind)

Gemini 2.5 Flash just dropped. ⚡ As a hybrid reasoning model, you can control how much it ‘thinks’ depending on your 💰 - making it ideal for tasks like building chat apps, extracting data and more. Try an early version in Google AI Studio → ai.dev

Mu Cai (@mucai7)

Totally agree. Models like #OpenAI's #o3 and #o4mini still cannot figure out basic geometry problems. If the visual perception is wrong, then the "reasoning" part is meaningless. Huge room for improvement!

Mu Cai (@mucai7)

#OpenAI's #o3 and #o4mini once again demonstrate the power of visual prompting in ViP-LLaVA (CVPR 2024): vip-llava.github.io. In 2023, we proved that drawing hints visually is more effective than elaborating in text, especially for object-level understanding. Go for VisualThinking!

Xiang Li (@xiangli54505720)

Hi everyone! I hope you had a great time in Singapore 🇸🇬. Though I could not be there in person, I'm excited to share our poster schedule at #ICLR2025. Feel free to stop by, check out our work, and bring any questions you have to Kanchana Ranasinghe.

Mu Cai (@mucai7)

I am excited to announce that I am not at #ICLR presenting Matryoshka Multimodal Models matryoshka-mm.github.io. 😀 But rather, I am online in the Bay Area. Ping me if you have any questions or ideas w.r.t. the paper! Feel free to read the poster at Hall 3 + Hall 2B #86 this morning!

Google DeepMind (@googledeepmind)

We’re releasing an updated Gemini 2.5 Pro (I/O edition) to make it even better at coding. 🚀 You can build richer web apps, games, simulations and more - all with one prompt. In Google Gemini App, here's how it transformed images of nature into code to represent unique patterns 🌱

Mu Cai (@mucai7)

Thank you Yong Jae Lee! Without the support from you and our group members, it would have been impossible for me to produce such work. I'll miss the days working in our group.

Google DeepMind (@googledeepmind)

Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery. It’s able to: 🔘 Design faster matrix multiplication algorithms 🔘 Find new solutions to open math problems 🔘 Make data centers, chip design and AI training more efficient across Google. 🧵

Pushmeet Kohli (@pushmeet)

Excited to announce AlphaEvolve: a powerful AI coding agent developed by our team in Google DeepMind that is able to discover impactful new algorithms for important problems in Maths and Computing by combining the creativity of large language models with automated evaluators.

Logan Kilpatrick (@officiallogank)

Google's progress in AI since last year:
- The world's strongest models, on the Pareto frontier
- Gemini app: has over 400M monthly active users
- We now process 480T tokens a month, up 50x YoY
- Over 7M developers have built with the Gemini API (4x)

Much more to come still!

Feng Yao (@fengyao1909)

🔥 "Vibe coding" is everywhere, but is it really care-free? We introduce ReaL, an RL framework that trains LLMs with automated program analysis feedback, enabling "vibe coding" to be not just fast, but vulnerability-free & production-ready 🛡️

Kangwook Lee (@kangwook_lee)

As a video gaming company, Krafton AI has secretly been cooking something big with NVIDIA AI for a while! 🥳 We introduce Orak, the first comprehensive video gaming benchmark for LLMs! arxiv.org/abs/2506.03610
