Nando de Freitas (@nandodf) 's Twitter Profile
Nando de Freitas

@nandodf

VP microsoft.ai understanding & harnessing intelligence responsibly. Past: NPI, AlphaGo tuning, Gato, ReST, AlphaCode, Lyria, Imagen 3, Veo, r-Gemma, Genie ...

ID: 29843511

linkhttps://scholar.google.com/citations?user=nzEluBwAAAAJ&hl=en calendar_today08-04-2009 22:41:09

11,11K Tweet

102,102K Followers

751 Following

Sander Dieleman (@sedielem) 's Twitter Profile Photo

I'm always quite skeptical of work that addresses a long-standing problem with a relatively simple tweak, but this looks promising: wrap the softmax numerator in ReLU(x - 1), and the denom terms in abs(x - 1) to get rid of attention sinks. Would be nice if it holds up at scale!

机器之心 JIQIZHIXIN (@synced_global) 's Twitter Profile Photo

🔥 State-Space Meets Diffusion: A New Era for Video World Models This paper tackles a core bottleneck in video-based world modeling: long-term memory. While video diffusion models are great at short-term frame prediction, their memory fades fast—especially when modeling long

🔥 State-Space Meets Diffusion: A New Era for Video World Models 

This paper tackles a core bottleneck in video-based world modeling: long-term memory. While video diffusion models are great at short-term frame prediction, their memory fades fast—especially when modeling long
Alex Vacca (@itsalexvacca) 's Twitter Profile Photo

Anthropic's CEO claims AI hallucinates less than humans. Bold statement. So I decided to test it by feeding the same FAKE theories to ChatGPT, Claude, and Gemini to see which one calls me out first. The results shocked me 🧵

Anthropic's CEO claims AI hallucinates less than humans.

Bold statement.

So I decided to test it by feeding the same FAKE theories to ChatGPT, Claude, and Gemini to see which one calls me out first.

The results shocked me 🧵
Mustafa Suleyman (@mustafasuleyman) 's Twitter Profile Photo

Good models admit when they don't know. Great models ask for help figuring it out. The ability to say "I got this far but need xyz to finish" or "I'm stuck at xyz" isn't just better than a refusal (or bad answer), it's the best way to earn user trust.

Kling AI (@kling_ai) 's Twitter Profile Photo

🚀 Big News, Creators! The KLING 2.1 Lineup Just Dropped! Introducing the KLING 2.1 Lineup: - KLING 2.1: ✨ Now in Standard (720p) and Professional (1080p) modes 💸 20 Credits/5s (Standard), or 35 Credits/5s (Pro) - KLING 2.1 Master: ⚡ Superb dynamics & prompt adherence 📺 Now

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 DeepSeek-R1-0528 is here! 🔹 Improved benchmark performance 🔹 Enhanced front-end capabilities 🔹 Reduced hallucinations 🔹 Supports JSON output & function calling ✅ Try it now: chat.deepseek.com 🔌 No change to API usage — docs here: api-docs.deepseek.com/guides/reasoni… 🔗

Artificial Analysis (@artificialanlys) 's Twitter Profile Photo

DeepSeek’s R1 leaps over xAI, Meta and Anthropic to be tied as the world’s #2 AI Lab and the undisputed open-weights leader DeepSeek R1 0528 has jumped from 60 to 68 in the Artificial Analysis Intelligence Index, our index of 7 leading evaluations that we run independently

DeepSeek’s R1 leaps over xAI, Meta and Anthropic to be tied as the world’s #2 AI Lab and the undisputed open-weights leader

DeepSeek R1 0528 has jumped from 60 to 68 in the Artificial Analysis Intelligence Index, our index of 7 leading evaluations that we run independently
Matei Zaharia (@matei_zaharia) 's Twitter Profile Photo

Apache Spark 4.0 is out with some huge improvements across the board. SQL’s much more powerful, Spark Connect makes it easier to run apps, new languages and more. It’s amazing to see the community still growing fast and releasing over 5000 patches in 4.0. databricks.com/blog/introduci…

hardmaru (@hardmaru) 's Twitter Profile Photo

New Paper! Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents A longstanding goal of AI research has been the creation of AI that can learn indefinitely. One path toward that goal is an AI that improves itself by rewriting its own code, including any code

Ricky T. Q. Chen (@rickytqchen) 's Twitter Profile Photo

FUDOKI: A Multimodal Model Purely Based on Discrete Flow Matching Really nice work. Uses embedding distances to define corruption process, and a single unified bidirectional Transformer + Discrete Flow model for both image and text generation. No special mask tokens involved!

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

EXP-Bench: Can AI Conduct AI Research Experiments? "EXP-Bench challenges AI agents to formulate hypotheses, design and implement experimental procedures, execute them, and analyze results." "EXP-Bench curated 461 AI research tasks from 51 top-tier AI research papers."

EXP-Bench: Can AI Conduct AI Research Experiments?

"EXP-Bench challenges AI agents to formulate hypotheses, design and  implement experimental procedures, execute them, and analyze results."

"EXP-Bench curated 461 AI research tasks from 51 top-tier AI research papers."
Ling Yang (@lingyang_pku) 's Twitter Profile Photo

We’re excited to release MMaDA-8B-MixCoT publicly — a model with strong instruction-following ability and highly stable, complex CoT reasoning/generation performance. 💻 Code & details: github.com/Gen-Verse/MMaDA 📦 Model weights: huggingface.co/Gen-Verse/MMaD…

merve (@mervenoyann) 's Twitter Profile Photo

ColQwen2 just landed to Hugging Face transformers main 😍 use state-of-the-art visual document retrieval model ColQwen2 for your PDF retrieval or RAG pipelines 🎉 link to notebook and model on the next one ⤵️

ColQwen2 just landed to <a href="/huggingface/">Hugging Face</a> transformers main 😍 
use state-of-the-art visual document retrieval model ColQwen2 for your PDF retrieval or RAG pipelines 🎉

link to notebook and model on the next one ⤵️
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

Accelerating Diffusion LLMs via Adaptive Parallel Decoding "We therefore introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel." "Notably, Dream with ADP surpasses the speed of autoregressive Qwen 7B and

Accelerating Diffusion LLMs via Adaptive Parallel Decoding

"We therefore introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel."

"Notably, Dream with ADP surpasses the speed of autoregressive Qwen 7B and