Machel Reid (@machelreid)'s Twitter Profile
Machel Reid

@machelreid

research scientist @googledeepmind ♊️

ID: 807327556072402945

Website: http://machelreid.github.io | Joined: 09-12-2016 20:54:23

964 Tweets

2.2K Followers

1.1K Following

Logan Kilpatrick (@officiallogank)

Say hello to Grounding with Google Search, available in the Gemini API + Google AI Studio! You can now access real-time, fresh, up-to-date information from Google Search when building with Gemini by enabling the Grounding tool. developers.googleblog.com/en/gemini-api-…
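
A minimal sketch of what enabling the Grounding tool might look like with the google-genai Python SDK. The model id, the placeholder API key, and the exact config field names are assumptions based on the public Gemini API docs, not details from the tweet.

```python
# Sketch: enabling Google Search grounding via the google-genai Python SDK.
# Model id and config shape are assumptions; adjust to the model/SDK version
# you are actually using.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed model id
    contents="What happened in AI news this week?",
    config=types.GenerateContentConfig(
        # The Grounding tool lets the model pull fresh results from Google Search.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

print(response.text)
```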

Noam Shazeer (@noamshazeer)

We’ve been *thinking* about how to improve model reasoning and explainability. Introducing Gemini 2.0 Flash Thinking, an experimental model trained to think out loud, leading to stronger reasoning performance. Excited to get this first model into the hands of developers to try.

Jack Rae (@jack_w_rae)

We released Gemini 2.0 Flash Thinking today! ⚡️🤔 It's a small step towards improved reasoning via inference-time compute, built on top of our small and mighty 2.0 Flash!

Graham Neubig (@gneubig)

I'm pretty amazed by how easy it is to localize frontend apps with AI agents. I had a meeting with people in Japan, and wanted to create a prototype of our app in Japanese. In about 8 hours, OpenHands generated 6200 lines of code and now our app is localized in 10 languages.

Sebastian Ruder (@seb_ruder)

A new year, a new challenge. I recently joined AI at Meta to improve evaluation and benchmarking of LLMs. I'm excited to push on making LLMs more useful and accessible, via open-sourcing data/models and real-world applications. I'll continue to be based in Berlin.

koray kavukcuoglu (@koraykv)

1/ Today we are releasing Gemini 2.5 Pro Experimental, our newest Gemini model with integrated “thinking” and significant performance gains. Very proud of the whole team! 🧵

Oriol Vinyals (@oriolvinyalsml)

Introducing Gemini 2.5 Pro Experimental! 🎉 Our newest Gemini model has stellar performance across math and science benchmarks. It’s an incredible model for coding and complex reasoning, and it’s #1 on the lmarena.ai leaderboard by a drastic 40 ELO margin. Only a handful of

Logan Kilpatrick (@officiallogank)

Gemini 2.5 Flash is here, our first unified reasoning model with thinking budgets. 🔥 It’s on the Pareto frontier and punches above its price and size!! developers.googleblog.com/en/start-build…

Google AI Developers (@googleaidevs)

⚡Gemini 2.5 Flash is now in Preview. Available on Google AI Studio, it’s our first fully hybrid reasoning model that lets you toggle thinking or set budgets for the optimal quality/cost/latency mix. Maintain 2.0 Flash speed + improved perf even when thinking is off. →
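
A rough sketch of the toggle described here, assuming the google-genai SDK's ThinkingConfig; the preview model id is an assumption and may differ from what is currently served.

```python
# Sketch: toggling thinking on Gemini 2.5 Flash via a thinking budget.
# Field names follow the google-genai SDK; the model id is an assumed
# preview identifier.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

prompt = "Explain the trade-off between latency and answer quality for reasoning models."

# Thinking off: a budget of 0 keeps roughly 2.0-Flash-style speed and cost.
fast = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # assumed preview model id
    contents=prompt,
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)

# Thinking on: allow up to 8k tokens of internal reasoning before answering.
thoughtful = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=prompt,
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=8192),
    ),
)

print(fast.text)
print(thoughtful.text)
```

The budget is the knob for the quality/cost/latency mix the tweet mentions: set it low (or to zero) for cheap, fast responses, and raise it when the task warrants more deliberation.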

Melvin Johnson (@melvinjohnsonp)

Excited to introduce Gemini 2.5 Flash, our most cost-efficient thinking model. We are once again at the frontier here. Pretty good, well-rounded performance.

Demis Hassabis (@demishassabis)

We've just given our most powerful workhorse model a big upgrade to Gemini 2.5 Flash. You can try it now in preview on ai.dev - yet another Gemini data point on the cost-performance Pareto frontier!

Nathan Lambert (@natolambert)

Gemini 2.5 Pro shipping a granular thinking budget that works could actually be a pretty big deal. The glorious slider that comes before the model just knows how much thinking to do. Helps limit overthinking, helps collect good data on "how hard users think the model is"

Sundar Pichai (@sundarpichai)

Our latest Gemini 2.5 Pro update is now in preview. It’s better at coding, reasoning, science + math, shows improved performance across key benchmarks (AIDER Polyglot, GPQA, HLE to name a few), and leads lmarena.ai with a 24pt Elo score jump since the previous version. We also

Melvin Johnson (@melvinjohnsonp)

Our latest update to Gemini 2.5 Pro is here. It's SoTA on GPQA Diamond, AIDER and HLE. The team has also worked hard to improve the model on style, persona and creativity. We're excited to see what you build with it. Please let us know any feedback as we're eternally cooking.

Paul Gauthier (@paulgauthier)

Gemini 2.5 Pro 06-05 has set a new SOTA on the aider polyglot coding benchmark, scoring 83% with 32k thinking tokens. The default thinking mode, where Gemini self-determines the thinking budget, scored 79%. Full leaderboard: aider.chat/docs/leaderboa…
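
Aider runs its own benchmark harness, so the following is only an illustration of the two settings being compared: a fixed 32k thinking budget versus leaving the budget unset so Gemini self-determines it. It assumes the google-genai SDK's ThinkingConfig and a preview model id.

```python
# Sketch: the two thinking settings compared in the aider polyglot results,
# expressed as plain Gemini API calls. Not aider's actual harness; the model id
# and the placeholder task are assumptions.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

task = "Rewrite this function without the nested loops: ..."  # placeholder task

# Fixed budget: up to 32k tokens of thinking (the configuration that scored 83%).
fixed = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=32768),
)

# Default mode: no explicit budget, the model decides how much to think (scored 79%).
default = types.GenerateContentConfig()

for cfg in (fixed, default):
    response = client.models.generate_content(
        model="gemini-2.5-pro-preview-06-05",  # assumed preview model id
        contents=task,
        config=cfg,
    )
    print(response.text)
```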
