Daniel Cer (@daniel_m_cer) 's Twitter Profile
Daniel Cer

@daniel_m_cer

Research Scientist at @GoogleAI, @googleresearch.

ID: 512589465

Link: https://scholar.google.com/citations?user=BrT1NW8AAAAJ&hl=en · Joined: 03-03-2012 00:08:38

156 Tweets

475 Followers

743 Following

Jeff Dean (@jeffdean) 's Twitter Profile Photo

What a way to celebrate one year of incredible Gemini progress -- #1🥇 across the board on overall ranking, as well as on hard prompts, coding, math, instruction following, and more, including with style control on.

Thanks to the hard work of everyone in the Gemini team and
Logan Kilpatrick (@officiallogank) 's Twitter Profile Photo

Today we are rolling out an experimental Gemini Embedding model for developers with:
– SOTA performance on MTEB (Multilingual)
– Input context length of 8K tokens (up from 3K)
– Output of 3K dimensions
– Support for over 100 languages (up from 50)
More details in 🧵

Logan Kilpatrick (@officiallogank) 's Twitter Profile Photo

The new model, gemini-embedding-exp-03-07, is our most capable text embedding model yet, surpassing our previous embedding model, text-embedding-004.

It’s top ranked on the Massive Text Embedding Benchmark (MTEB) Multilingual leaderboard!
Logan Kilpatrick (@officiallogank) 's Twitter Profile Photo

You can access the new experimental Gemini Embeddings model through the Gemini API right now; we plan to follow up with a production-ready version in the months to come: developers.googleblog.com/en/gemini-embe…
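The announcement links out for details rather than showing code, so here is a minimal sketch of calling the `embedContent` REST endpoint for the experimental model. The endpoint path, payload shape, and response field names follow my reading of the public Gemini API docs at the time of writing; treat them as assumptions and check the current documentation.

```python
# Hedged sketch: one embedContent call against the experimental model.
import json
import os
import urllib.request

MODEL = "gemini-embedding-exp-03-07"  # experimental model from the announcement
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:embedContent"
)

def build_request(text: str) -> dict:
    """Build the JSON body for a single embedContent call."""
    return {
        "model": f"models/{MODEL}",
        "content": {"parts": [{"text": text}]},
    }

def embed(text: str, api_key: str) -> list[float]:
    """POST the request and return the embedding vector."""
    body = json.dumps(build_request(text)).encode("utf-8")
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]["values"]

if __name__ == "__main__":
    # Only hit the network if a key is configured; otherwise just show the payload.
    print(json.dumps(build_request("Neural embeddings for retrieval"), indent=2))
    if os.environ.get("GEMINI_API_KEY"):
        vec = embed("Neural embeddings for retrieval", os.environ["GEMINI_API_KEY"])
        print(len(vec))  # the announcement says the output is 3K dimensions
```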

Jinhyuk Lee (@leejnhk) 's Twitter Profile Photo

🎉 Gemini Embedding is LIVE! 🎉
Try our state-of-the-art text embedding model for FREE on Vertex AI (text-embedding-large-exp-03-07; 120 QPM) & AI Studio (gemini-embedding-exp-03-07)!
➡️ APIs: bit.ly/gem-embed-vert…, bit.ly/gem-embed-aist…
➡️ Report: bit.ly/gem-embed-paper

Nandan Thakur (@beirmug) 's Twitter Profile Photo

Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics, rather than grounded in solving real user problems. 🧵Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during an internship at Databricks 🧱

Omar Khattab (@lateinteraction) 's Twitter Profile Photo

Google folks continue to do awesome late-interaction work. Compared to vanilla ColBERT, a version of this new CRISP approach “achieves an 11x reduction in the number of vectors—with only a 3.6% quality loss”.

Daniel Cer (@daniel_m_cer) 's Twitter Profile Photo

CRISP: Clustering Multi-Vector Representations for Denoising and Pruning arxiv.org/abs/2505.11471 Multi-vector embeddings outperform single-vector embeddings on search/retrieval tasks, but their representations are prohibitively more costly (e.g., ColBERT’s one embedding per token).
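The core move can be illustrated with a toy sketch: cluster a passage's per-token embeddings and keep only the cluster centroids as the pruned multi-vector representation. This is a simplified illustration of the clustering idea, with made-up function names and sizes, not the authors' code.

```python
# Hedged sketch: shrink a multi-vector representation by clustering it.
import numpy as np

def kmeans(vectors: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain k-means; returns k centroids for the given token vectors."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        # Assign each token vector to its nearest centroid.
        d = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = vectors[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def prune_multivector(token_embs: np.ndarray, k: int) -> np.ndarray:
    """Replace per-token embeddings with k normalized cluster centroids."""
    c = kmeans(token_embs, k)
    return c / np.linalg.norm(c, axis=1, keepdims=True)

# 64 token embeddings -> 8 centroid vectors: an 8x storage reduction.
tokens = np.random.default_rng(1).normal(size=(64, 32))
pruned = prune_multivector(tokens, k=8)
print(pruned.shape)  # (8, 32)
```

Retrieval then scores queries against the centroids instead of every token vector, trading a small amount of quality for a much smaller index.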

Nandan Thakur (@beirmug) 's Twitter Profile Photo

Did you know that fine-tuning retrievers & re-rankers on large but unclean training datasets can harm their performance? 😡

In our new preprint, we re-examine the quality of popular IR training data by pruning datasets and identifying and relabeling 𝐟𝐚𝐥𝐬𝐞-𝐧𝐞𝐠𝐚𝐭𝐢𝐯𝐞𝐬! 🏷️
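The relabeling idea can be sketched in a few lines: if a "negative" passage scores close to (or above) the labeled positive under a strong scorer such as a cross-encoder, flag it as a likely false negative rather than training against it. The function name, the margin value, and the use of precomputed scores are illustrative assumptions, not the preprint's pipeline.

```python
# Hedged sketch: flag suspected false negatives by score margin.
def relabel_false_negatives(pos_score, neg_scores, margin=0.05):
    """Split negatives into kept negatives and suspected false negatives."""
    kept, suspected = [], []
    for i, s in enumerate(neg_scores):
        # A "negative" scoring within `margin` of the positive is suspect.
        (suspected if s >= pos_score - margin else kept).append(i)
    return kept, suspected

kept, suspected = relabel_false_negatives(0.9, [0.2, 0.88, 0.95, 0.4])
print(kept, suspected)  # [0, 3] [1, 2]
```

Suspected items can then be dropped or relabeled as positives before fine-tuning.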
Younggyo Seo (@younggyoseo) 's Twitter Profile Photo

Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with open-source code to run your own humanoid RL experiments in no time! Thread below 🧵

David Bau (@davidbau) 's Twitter Profile Photo

Dear MAGA friends, I have been worrying about STEM in the US a lot, because right now the Senate is writing new laws that cut 75% of the STEM budget in the US. Sorry for the long post, but the issue is really important, and I want to share what I know about it. The entire

Sumit (@_reachsumit) 's Twitter Profile Photo

R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning Introduces RL to teach LLMs adaptive search intensity scaling, i.e. dynamically adjusting retrieval depth based on problem complexity. 📝arxiv.org/abs/2505.23794 👨🏽‍💻github.com/Yuan-Li-FNLP/R…
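"Adaptive search intensity scaling" can be sketched as a retrieval loop that keeps fetching more evidence until a confidence signal clears a threshold or a budget runs out. The `retrieve` and `confidence` callables below are illustrative stubs standing in for R3-RAG's learned policy, not the paper's actual components.

```python
# Hedged sketch: retrieve in rounds until confidence clears a threshold.
def adaptive_retrieve(query, retrieve, confidence, max_rounds=4, threshold=0.8):
    """Accumulate retrieved docs round by round; stop early when confident."""
    docs = []
    for round_ in range(1, max_rounds + 1):
        docs += retrieve(query, round_)          # deepen the search this round
        if confidence(query, docs) >= threshold:  # enough evidence -> stop
            break
    return docs

# Toy stubs: each round adds one doc; confidence grows with evidence.
docs = adaptive_retrieve(
    "q",
    retrieve=lambda q, r: [f"doc{r}"],
    confidence=lambda q, d: 0.3 * len(d),
)
print(docs)  # ['doc1', 'doc2', 'doc3']
```

Easy questions stop after one round; hard ones spend the full retrieval budget.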

Manuel Faysse (@manuelfaysse) 's Twitter Profile Photo

🚨 Context matters for effective retrieval—but most embedding models cannot leverage crucial information outside of the passage they embed. Our new paper "Context Is Gold to Find the Gold Passage" explores how context-aware embeddings can be trained to boost performance! 🧵(1/N)

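One simple way to give an embedder information from outside the passage is to concatenate document-level context (a title, section heading, or neighboring text) with the passage before encoding it. The separator and function below are illustrative assumptions about the input format, not the paper's training recipe.

```python
# Hedged sketch: build a context-augmented input for an embedding model.
def contextualize(passage: str, doc_context: str, sep: str = "\n\n") -> str:
    """Prepend document context to the passage text fed to the embedder."""
    return f"{doc_context}{sep}{passage}"

text = contextualize(
    "Output rose 12% year over year.",        # the passage to embed
    "Annual report, section 3: Mining output" # disambiguating context
)
print(text.splitlines()[0])  # the context now precedes the passage
```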
Tony Wu (@tonywu_71) 's Twitter Profile Photo

🚀 ColQwen2 just dropped in Transformers! 🤗

Say goodbye to brittle OCR pipelines — now you can retrieve documents directly in the visual space with just a few lines of code. Perfect for your visual RAG workflows.

Smarter, simpler, faster. Let's dive in! 👇 (1/N 🧵)
Raphaël Sourty (@raphaelsrty) 's Twitter Profile Photo

I'm thrilled to announce the release of FastPlaid! 🚀🚀 FastPlaid is a high-performance engine for multi-vector search, built from the ground up in Rust (with the help of Torch C++) ⚡️ You can view FastPlaid as the counterpart of Faiss for multi-vector search.

Benjamin Clavié (@bclavie) 's Twitter Profile Photo

Multimodal RAG: Just use ColPali/DSE then pass your screenshots to the LLM

This is the dream, but how well do LLMs read text contained in images?

We wanted to know, so we tried a simple thing: do results change on evals when using screenshots rather than text as input? Yes.
David Wan (@meetdavidwan) 's Twitter Profile Photo

Excited to share our new work, CLaMR! 🚀

We tackle multimodal content retrieval by jointly considering video, speech, OCR, and metadata. CLaMR learns to dynamically pick the right modality for your query, boosting retrieval by 25 nDCG@10 over single-modality retrieval! 🧐
Sumit (@_reachsumit) 's Twitter Profile Photo

Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval Introduces a bi-encoder approach that performs fine-grained token-wise interaction at both spatial and temporal levels using modified MaxSim operations and dual sigmoid loss. 📝arxiv.org/abs/2503.19009
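The MaxSim operation that ColBERT-style models (including Video-ColBERT's spatial and temporal variants) build on is compact enough to show directly: each query token takes its best dot-product match among the document (or frame) vectors, and the per-token maxima are summed. The toy vectors below are for illustration only.

```python
# Hedged sketch: the basic MaxSim late-interaction score.
import numpy as np

def maxsim(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """Sum over query tokens of the max dot product with any doc vector."""
    sim = query_embs @ doc_embs.T        # (n_query, n_doc) token similarities
    return float(sim.max(axis=1).sum())  # best doc match per query token

q = np.eye(3)                            # 3 query tokens in a 3-dim toy space
d = np.array([[1.0, 0, 0], [0, 0.5, 0]]) # 2 document vectors
print(maxsim(q, d))  # 1.0 + 0.5 + 0.0 = 1.5
```

Video-ColBERT's contribution, per the abstract, is applying modified versions of this interaction at both the spatial (within-frame) and temporal (across-frame) levels.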

Google Research (@googleresearch) 's Twitter Profile Photo

Neural embedding models have become a cornerstone of modern information retrieval. Today we introduce MUVERA, a state-of-the-art retrieval algorithm that reduces complex multi-vector retrieval back to single-vector maximum inner product search. More → goo.gle/4k8YRlN

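The reduction MUVERA describes, a fixed-dimensional encoding (FDE), can be sketched in a toy form: hash each vector into a bucket with random hyperplanes (SimHash), aggregate vectors per bucket, and concatenate the buckets into one flat vector. A single inner product between query and document FDEs then approximates the multi-vector (Chamfer/MaxSim-style) score. This is a heavily simplified version of the paper's construction; the parameters and the sum-vs-mean aggregation choice are illustrative assumptions.

```python
# Hedged sketch: a toy fixed-dimensional encoding for multi-vector sets.
import numpy as np

def fde(vectors: np.ndarray, planes: np.ndarray, mean: bool) -> np.ndarray:
    """Concatenate per-bucket aggregates of `vectors` into one flat vector."""
    b, dim = planes.shape[0], vectors.shape[1]
    bits = (vectors @ planes.T > 0).astype(int)  # (n, b) SimHash sign bits
    bucket = bits @ (1 << np.arange(b))          # bucket id per vector
    out = np.zeros((2 ** b, dim))
    counts = np.zeros(2 ** b)
    for v, k in zip(vectors, bucket):
        out[k] += v
        counts[k] += 1
    if mean:                                     # document side: mean per bucket
        nz = counts > 0
        out[nz] /= counts[nz][:, None]
    return out.ravel()                           # (2^b * dim,)

rng = np.random.default_rng(0)
planes = rng.normal(size=(3, 16))                # 3 hyperplanes -> 8 buckets
doc = rng.normal(size=(40, 16))                  # 40 doc token embeddings
query = rng.normal(size=(8, 16))                 # 8 query token embeddings
score = float(fde(query, planes, mean=False) @ fde(doc, planes, mean=True))
print(fde(doc, planes, mean=True).shape)  # (128,) -> a single-vector MIPS problem
```

Once every document is a single 128-dim FDE, off-the-shelf single-vector MIPS infrastructure can serve the multi-vector index.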