Michela Paganini (@wondermicky)'s Twitter Profile
Michela Paganini

@wondermicky

Staff Research Scientist @DeepMind | LLMs, Evals & Model Understanding | Previously: @facebookAI | @Yale Physics PhD | @CERN | @BerkeleyLab | @UCBerkeley

ID: 112717746

http://mickypaganini.github.io · Joined 09-02-2010 13:37:28

4.4K Tweets

6.6K Followers

1.1K Following

Kaggle (@kaggle)

Introducing FACTS Grounding, a new benchmark we’re launching with Google DeepMind to evaluate LLMs’ factual accuracy on over 1,700 tasks. 🧠📐

Kaggle (@kaggle)

We’ve partnered with Google DeepMind to publish a leaderboard of models on this new factuality benchmark. Check it out at: kaggle.com/facts-leaderbo…

Smoke-away (@smokeawayyy)

> Hey did you see that new model from Google? Which one? > Veo 2 Flash 1217 (exp) preview. oh I missed that. Is it on gemini dot google dot com, aistudio dot google dot com, or labs dot google dot com?

Logan Kilpatrick (@officiallogank)

A new benchmark for evaluating the factuality of LLMs, powered by Kaggle. Submit your models and let's make the world more grounded : ) deepmind.google/discover/blog/…

Jack Rae (@jack_w_rae)

We released Gemini 2.0 Flash Thinking today! ⚡️🤔 It's a small step towards improved reasoning via inference-time compute, built on top of our small and mighty 2.0 Flash!

I Can't Believe It's Not Better! (@icbinbworkshop)

Our #ICLR2025 workshop is looking for submissions on unexpected outcomes and hard-earned lessons in #DeepLearning. Submission deadline: 03 February 2025. Workshop dates: 27 or 28 April 2025. Location: Singapore. More info: shorturl.at/OpQns

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

Breaking news from Text-to-Image Arena! 🖼️✨ Google DeepMind’s Imagen 3 debuts at #1, surpassing Recraft-v3 with a remarkable +70-point lead! Congrats to the Google Imagen team for setting a new bar! Try the best text2image at LMArena and cast your vote! More analysis👇

Lora Aroyo (@laroyo)

📢 Join Google DeepMind's DEER team & shape the future of #ResponsibleAI! We're hiring a #ResearchScientist (12-month fixed-term contract) to tackle fairness in multimodal AI. Make a real-world impact! Apply by Feb 7th: boards.greenhouse.io/deepmind/jobs/… Alicia Parrish

Sian Gooding (@siangooding)

New paper alert from Google DeepMind! 🚨 We've put LLMs to the test as writing co-pilots – how good are they really at helping us write? LLMs are increasingly used for open-ended tasks like writing assistance, but how do we assess their effectiveness? 🤔 arxiv.org/abs/2503.19711

Sian Gooding (@siangooding)

🚨 I’m hosting a Student Researcher at Google DeepMind! Join us on the Autonomous Assistants team (led by Edward Grefenstette) to explore multi-agent communication: how agents learn to interact, coordinate, and solve tasks together. DM me for details!

Mislav Balunović (@mbalunovic)

Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve a non-trivial number of points (24.4%). The speed of progress is really mind-blowing.

Logan Kilpatrick (@officiallogank)

Gemini 2.5 Pro (05-06) is SOTA at most video understanding tasks (by a large margin) 📽️. Lots of work by the Gemini multimodal team to make this happen, excited to see developers push this capability in new ways. More details below!

Google DeepMind (@googledeepmind)

We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO

Michela Paganini (@wondermicky)

It’s finally #GoogleIO time and I’m so glad to see tons of cool innovations shared with the community. My personal favorites:
- Gemini text diffusion ⚡️
- AI Mode in Search (so useful to me!) 🔎
- Glasses + Android XR 🥽
- New Veo & Imagen 🎥🎨
- Thought summary in API 💭

Jack Rae (@jack_w_rae)

The Gemini Diffusion release feels like a landmark moment. For text generation, autoregressive models have always outperformed diffusion models from a quality perspective. It wasn't clear that the gap could ever be closed. The team behind this have kept laser focused, broken