Nando de Freitas (@nandodf) Twitter Tweets • TwiCopy

Nando de Freitas

@nandodf

VP microsoft.ai understanding & harnessing intelligence responsibly. Past: NPI, AlphaGo tuning, Gato, ReST, AlphaCode, Lyria, Imagen 3, Veo, r-Gemma, Genie ...

ID: 29843511

Link: https://scholar.google.com/citations?user=nzEluBwAAAAJ&hl=en · Joined: 08-04-2009 22:41:09

11,11K Tweet

102,102K Followers

751 Following

Sander Dieleman

@sedielem

6 months ago

I'm always quite skeptical of work that addresses a long-standing problem with a relatively simple tweak, but this looks promising: wrap the softmax numerator in ReLU(x - 1), and the denom terms in abs(x - 1) to get rid of attention sinks. Would be nice if it holds up at scale!
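If "x" is read as the exponentiated score exp(x_i), the tweak replaces each softmax numerator exp(x_i) with ReLU(exp(x_i) - 1) and each denominator term with |exp(x_j) - 1|. Below is a minimal pure-Python sketch for one row of attention scores, under our own assumptions. The tweet does not give the name `rectified_softmax`, the `eps` guard, or the max-shift used for numerical stability; we added all three, and none of them should be taken as the method from the underlying paper.

```python
import math
from typing import List


def rectified_softmax(scores: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical sketch: apply the tweeted tweak to one row of attention scores.

    softmax:  w_i = exp(x_i) / sum_j exp(x_j)
    tweaked:  w_i = relu(exp(x_i) - 1) / sum_j |exp(x_j) - 1|
    """
    if not scores:
        return []
    # Shift by c = max(max(scores), 0). Multiplying numerator and denominator
    # by exp(-c) leaves the ratio unchanged, turning "exp(x) - 1" into
    # "exp(x - c) - exp(-c)"; neither term can overflow since both are <= 1.
    c = max(max(scores), 0.0)
    offset = math.exp(-c)
    shifted = [math.exp(x - c) - offset for x in scores]
    numer = [max(s, 0.0) for s in shifted]          # ReLU on each numerator term
    denom = sum(abs(s) for s in shifted) + eps      # abs on each denominator term
    # eps (an assumed guard) avoids 0/0 when every score is exactly 0.
    return [n / denom for n in numer]


if __name__ == "__main__":
    print(rectified_softmax([1.0, -1.0]))  # weights sum to less than 1
```

Negative scores add to the denominator but are zeroed in the numerator, so a row's weights can sum to less than 1. For scores [1.0, -1.0] they sum to about 0.73. A head is therefore not forced to place its probability mass on some token, which is presumably why the tweak is expected to remove attention sinks.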

机器之心 JIQIZHIXIN

@synced_global

5 months ago

🔥 State-Space Meets Diffusion: A New Era for Video World Models This paper tackles a core bottleneck in video-based world modeling: long-term memory. While video diffusion models are great at short-term frame prediction, their memory fades fast—especially when modeling long

Alex Vacca

@itsalexvacca

5 months ago

Anthropic's CEO claims AI hallucinates less than humans. Bold statement. So I decided to test it by feeding the same FAKE theories to ChatGPT, Claude, and Gemini to see which one calls me out first. The results shocked me 🧵

Mistral AI

@mistralai

5 months ago

Introducing Codestral Embed, the new state-of-the-art embedding model for code.

Mustafa Suleyman

@mustafasuleyman

5 months ago

Good models admit when they don't know. Great models ask for help figuring it out. The ability to say "I got this far but need xyz to finish" or "I'm stuck at xyz" isn't just better than a refusal (or bad answer), it's the best way to earn user trust.

Kling AI

@kling_ai

5 months ago

🚀 Big News, Creators! The KLING 2.1 Lineup Just Dropped! Introducing the KLING 2.1 Lineup: - KLING 2.1: ✨ Now in Standard (720p) and Professional (1080p) modes 💸 20 Credits/5s (Standard), or 35 Credits/5s (Pro) - KLING 2.1 Master: ⚡ Superb dynamics & prompt adherence 📺 Now

DeepSeek

@deepseek_ai

5 months ago

🚀 DeepSeek-R1-0528 is here! 🔹 Improved benchmark performance 🔹 Enhanced front-end capabilities 🔹 Reduced hallucinations 🔹 Supports JSON output & function calling ✅ Try it now: chat.deepseek.com 🔌 No change to API usage — docs here: api-docs.deepseek.com/guides/reasoni… 🔗

Artificial Analysis

@artificialanlys

5 months ago

DeepSeek’s R1 leaps over xAI, Meta and Anthropic to be tied as the world’s #2 AI Lab and the undisputed open-weights leader DeepSeek R1 0528 has jumped from 60 to 68 in the Artificial Analysis Intelligence Index, our index of 7 leading evaluations that we run independently

Nando de Freitas

@nandodf

5 months ago

x.com/i/article/1928…

Matei Zaharia

@matei_zaharia

5 months ago

Apache Spark 4.0 is out with some huge improvements across the board. SQL’s much more powerful, Spark Connect makes it easier to run apps, new languages and more. It’s amazing to see the community still growing fast and releasing over 5000 patches in 4.0. databricks.com/blog/introduci…

Karim Beguir

@kbeguir

5 months ago

So proud of the InstaDeep dream team open-sourcing mlip, our ML for interatomic potentials framework! ⚛️✨

hardmaru

@hardmaru

5 months ago

New Paper! Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents A longstanding goal of AI research has been the creation of AI that can learn indefinitely. One path toward that goal is an AI that improves itself by rewriting its own code, including any code

Nando de Freitas

@nandodf

5 months ago

x.com/i/article/1928…

Ricky T. Q. Chen

@rickytqchen

5 months ago

FUDOKI: A Multimodal Model Purely Based on Discrete Flow Matching Really nice work. Uses embedding distances to define corruption process, and a single unified bidirectional Transformer + Discrete Flow model for both image and text generation. No special mask tokens involved!

Nando de Freitas

@nandodf

5 months ago

x.com/i/article/1929…

Tanishq Mathew Abraham, Ph.D.

@iscienceluvr

5 months ago

EXP-Bench: Can AI Conduct AI Research Experiments? "EXP-Bench challenges AI agents to formulate hypotheses, design and implement experimental procedures, execute them, and analyze results." "EXP-Bench curated 461 AI research tasks from 51 top-tier AI research papers."

Ling Yang

@lingyang_pku

5 months ago

We’re excited to release MMaDA-8B-MixCoT publicly — a model with strong instruction-following ability and highly stable, complex CoT reasoning/generation performance. 💻 Code & details: github.com/Gen-Verse/MMaDA 📦 Model weights: huggingface.co/Gen-Verse/MMaD…

Ben Anson

@benaibean

5 months ago

Is it possible to _derive_ an attention scheme with effective zero-shot generalisation? The answer turns out to be yes! To achieve this, we began by thinking about desirable properties for attention over long contexts, and we distilled 2 key conditions:

merve

@mervenoyann

5 months ago

ColQwen2 just landed to Hugging Face transformers main 😍 use state-of-the-art visual document retrieval model ColQwen2 for your PDF retrieval or RAG pipelines 🎉 link to notebook and model on the next one ⤵️

Tanishq Mathew Abraham, Ph.D.

@iscienceluvr

5 months ago

Accelerating Diffusion LLMs via Adaptive Parallel Decoding "We therefore introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel." "Notably, Dream with ADP surpasses the speed of autoregressive Qwen 7B and
