Yacine Jernite (@yjernite) Twitter Tweets • TwiCopy

Hynek Kydlíček

2 months ago

We are releasing 📄 FinePDFs: the largest PDF dataset, spanning over half a billion documents!
- Long context: Documents are 2x longer than web text
- 3T tokens from high-demand domains like legal and science
- Heavily improves over SoTA when mixed with FW-EDU & DCLM web corpora

716 likes · 24 replies · 118 retweets

elie

@eliebakouch

2 months ago

These kinds of evals are very interesting, and I wish there were a lot more. I'm not in the team that thinks a high score is actually good (because it goes against instruction following), but it's great for monitoring and understanding the different flavors of post-trained models. It's

39 likes · 2 replies · 2 retweets

Deb Raji

@rajiinio

2 months ago

This reply from Karen is spot on. AI discourse has effectively devolved into mindless chatter because of the anchoring to a shared mythology - everyone (boosters, doomers, even some critics) is endlessly debating & dissecting a version of the technology that doesn't even exist.

41 likes · 1 reply · 6 retweets

Lucie-Aimée Kaffee

@frimelle

2 months ago

I’m thrilled that our INTIMA benchmark, developed to study how AI models handle companionship-like interactions, was featured in Forbes last week. Nothing is quite as amazing as seeing your work not only used but also picked up by journalists to reach a wider audience. The

5 likes · 2 replies · 1 retweet

David Louapre

@dlouapre

2 months ago

🚀 Life update: I’ve joined 🤗Hugging Face as AI Scientist & Educator, starting a new track on **Mechanistic Interpretability of LLMs** 🧠🤖 Over the past 7 years at Ubisoft 🎮, I explored how AI, science & gameplay intersect. I worked on cutting-edge LLM-powered NPCs,

481 likes · 32 replies · 39 retweets

MMitchell

@mmitchell_ai

2 months ago

🤖 As AI-generated content is shared in movies/TV/across the web, there's one simple low-hanging fruit 🍇 to help know what's real: Visible watermarks. With others at Hugging Face, I've made sure it's trivially easy to add this disclosure to images, video, chatbot text. See how:

22 likes · 4 replies · 9 retweets

Andi Marafioti

@andimarafioti

2 months ago

SmolDocling just got a HUGE improvement, meet GraniteDocling!🚀 Improved performance in all the ways that matter: multilingual, more reliable, but still tiny at 258M params!🤏 It's lightning fast, processing a page in 0.35 sec on a consumer GPU using < 500MB VRAM⚡

11 likes · 1 reply · 2 retweets

Clémentine Fourrier 🍊

@clefourrier

2 months ago

Updated the evaluation guidebook with a new deep dive! 2025 panorama of all the important and next-level evaluations that you need to know to build *actually impactful and useful* models! (Assistant tasks, games, forecasting, and more) Tell me wyt! :) github.com/huggingface/ev…

163 likes · 3 replies · 26 retweets

Lewis Tunstall

@_lewtun

2 months ago

By far the most concise and informative guide on post-training evals I've seen in a long time - highly recommended reading!

14 likes · 0 replies · 3 retweets

Stella Biderman

@blancheminerva

a month ago

Clémentine is continuing to do some of the most important work on evals in the world <3

29 likes · 1 reply · 2 retweets

Kyunghyun Cho

@kchonyc

a month ago

when you give up on this nebulous idea and illusion of prestige, you will finally find peace and freedom. submit to TMLR and JMLR.

522 likes · 10 replies · 29 retweets

Shayne Longpre

@shayneredford

a month ago

Check out The Washington Post's awesome audit of Sora by Nitasha Tiku! washingtonpost.com/technology/int… They cite the Data Provenance Initiative and quote our Joanna!

10 likes · 1 reply · 5 retweets

Lucie-Aimée Kaffee

@frimelle

a month ago

Reuters just reported that Meta will soon use generative AI interactions to target ads across Facebook and Instagram. That’s exactly the kind of shift we explore in our blogpost: 👉 Advertisement, Privacy, and Intimacy: Lessons from Social Media for Conversational AI with

7 likes · 1 reply · 3 retweets

Alexia Jolicoeur-Martineau

@jm_alexia

a month ago

New paper 📜: Tiny Recursion Model (TRM) is a recursive reasoning approach with a tiny 7M-parameter neural network that obtains 45% on ARC-AGI-1 and 8% on ARC-AGI-2, beating most LLMs. Blog: alexiajm.github.io/2025/09/29/tin… Code: github.com/SamsungSAILMon… Paper: arxiv.org/abs/2510.04871

1.1K likes · 49 replies · 220 retweets

Loubna Ben Allal

@loubnabenallal1

a month ago

Come say hi and get some copies of the SmolLM3 Blueprint! 🤗

124 likes · 3 replies · 8 retweets

clem 🤗

@clementdelangue

a month ago

Very cool paper! You can discuss with the author here: huggingface.co/papers/2510.04…

228 likes · 8 replies · 32 retweets

Brigitte 🤗

@brigittetousi

a month ago

The Hugging Face Science team is in Montreal for COLM 2025 🍁! Rumour has it, limited-edition tees are up for grabs for top Hub contributors, ranging from smolLM to XL. 😉

38 likes · 4 replies · 6 retweets

clem 🤗

@clementdelangue

23 days ago

So proud to see Reachy Mini named one of the Best Inventions of 2025 by TIME! Huge credit to the Pollen Robotics and Hugging Face teams, turning a concept into thousands of units sold and shipped in under 6 months. We might not be as slick as some other robotics companies (we

252 likes · 13 replies · 40 retweets

Giada Pistilli

@giadapistilli

12 days ago

I spoke with MIT Technology Review about one of the hardest design questions in conversational AI: should an AI ever be allowed to hang up on a human? In the piece by James O'Donnell, we discussed how cutting users off can be harmful when strong emotional bonds or dependencies have formed.

7 likes · 1 reply · 2 retweets

clem 🤗

@clementdelangue

12 days ago

We just released the beta version of the open-source software for Reachy Mini! It means that anyone, thanks to the amazing Google DeepMind mujoco simulation platform, can start building Hugging Face spaces, datasets and models, even if you haven't received your robot yet.

490 likes · 14 replies · 65 retweets