Avi Caciularu (@clu_avi) 's Twitter Profile
Avi Caciularu

@clu_avi

Research Scientist @GoogleAI | previously ML & NLP PhD student @biunlp, intern at @allen_ai, @Microsoft, @AIatMeta.

ID: 61239647

Link: https://aviclu.github.io · Joined: 29-07-2009 16:52:44

255 Tweets

520 Followers

452 Following

Royi Rassin (@royirassin) 's Twitter Profile Photo

How diverse are the outputs of text-to-image models and how can we measure that? In our new work, we propose a measure based on LLMs and Visual-QA (VQA), and show NONE of the 12 models we experiment with are diverse. 🧵 1/11

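The tweet above describes measuring diversity with an LLM plus Visual-QA. As a purely hypothetical sketch (not the paper's actual metric), one way such a measure could work: generate several images per prompt, ask a VQA model the same attribute question about each, and score diversity as the normalized entropy of the answers. The `vqa_answer` stub and the example answers below are invented for illustration.

```python
from collections import Counter
import math

def vqa_answer(image, question):
    """Placeholder for a real VQA model call; returns an answer string."""
    raise NotImplementedError

def answer_entropy(answers):
    """Normalized Shannon entropy of VQA answers (0.0 = identical, 1.0 = all distinct)."""
    counts = Counter(answers)
    total = len(answers)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    max_h = math.log2(total) if total > 1 else 1.0
    return h / max_h if max_h > 0 else 0.0

# e.g. answers from 4 generated images for "a photo of a doctor":
print(answer_entropy(["man", "man", "man", "woman"]))  # ≈ 0.406
```

A score near 0 would flag a model that renders the same attribute for every sample of a prompt; see the thread for the paper's real measure and results.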
Sasha Goldshtein (@goldshtn) 's Twitter Profile Photo

I am hiring a Senior SWE to work on Gemini post-training, improving Gemini factuality. Factuality is a top blocker for LLM adoption and a critical priority for Gemini. Prior experience with LLM training and evaluation is a major advantage. Apply here: google.com/about/careers/…

lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo


Massive News from Chatbot Arena🔥

Google DeepMind's latest Gemini (Exp 1114), tested with 6K+ community votes over the past week, now ranks joint #1 overall with an impressive 40+ score leap — matching 4o-latest and surpassing o1-preview! It also claims #1 on Vision
AK (@_akhaliq) 's Twitter Profile Photo


Google just released gemini-exp-1121
 
- significant gains on coding performance 
- stronger reasoning capabilities 
- improved visual understanding  

Now available on Anychat
Jeff Dean (@jeffdean) 's Twitter Profile Photo


What a way to celebrate one year of incredible Gemini progress -- #1🥇across the board on overall ranking, as well as on hard prompts, coding, math, instruction following, and more, including with style control on.

Thanks to the hard work of everyone in the Gemini team and
Yonatan Bitton (@yonatanbitton) 's Twitter Profile Photo


🚨 Happening NOW at #NeurIPS2024 with nitzan guetta!
🎭 #VisualRiddles: A Commonsense and World Knowledge Challenge for Vision-Language Models.
📍 East Ballroom C, Creative AI Track
🔍 visual-riddles.github.io
Sasha Goldshtein (@goldshtn) 's Twitter Profile Photo

Today we published FACTS Grounding, a benchmark and leaderboard for evaluating the factuality of LLMs when grounding to the input context. The leaderboard is on Kaggle and we plan to maintain it and track progress. deepmind.google/discover/blog/… kaggle.com/facts-leaderbo…

Mor Geva (@megamor2) 's Twitter Profile Photo

How can we interpret LLM features at scale? 🤔 Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs! We propose efficient output-centric methods that better predict how steering a feature will affect model outputs. New

Ori Yoran (@oriyoran) 's Twitter Profile Photo

New #ICLR2024 paper! The KoLMogorov Test: can CodeLMs compress data by code generation? The optimal compression for a sequence is the shortest program that generates it. Empirically, LMs struggle even on simple sequences, but can be trained to outperform current methods! 🧵1/7
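The premise above is Kolmogorov complexity: the best compression of a sequence is the shortest program that regenerates it. A minimal illustrative sketch (not the paper's benchmark or method), using a Python expression as the "program":

```python
# 1000 characters of raw data with an obvious regularity.
seq = "01" * 500

# A short "program" (a Python expression) that regenerates the sequence exactly.
program = '"01" * 500'

assert eval(program) == seq  # the program reproduces the data
ratio = len(program) / len(seq)
print(f"program: {len(program)} chars vs data: {len(seq)} chars (ratio {ratio:.3f})")
```

The paper asks whether code LMs can find such generating programs for real sequences; per the thread, they struggle even on simple ones.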

omer goldman (@omernlp) 's Twitter Profile Photo


Wanna check how well a model can share knowledge between languages? Of course you do! 🤩

But can you do it without access to the model’s weights? Now you can with ECLeKTic 🤯
Gabrielle Kaili-May Liu (@pybeebee) 's Twitter Profile Photo


🔥 Excited to share MetaFaith: Understanding and Improving Faithful Natural Language Uncertainty Expression in LLMs🔥

How can we make LLMs talk about uncertainty in a way that truly reflects what they internally "know"?
Check out our new preprint to find out!
Details in 🧵(1/n):
Eran Hirsch (@hirscheran) 's Twitter Profile Photo


🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)!

LAQuer provides more granular attribution for LLM generations: users can just highlight any output fact (top), and get attribution for that input snippet (bottom). This reduces the amount of text the user has to read by 2
Arie Cattan (@ariecattan) 's Twitter Profile Photo


🚨 RAG is a popular approach but what happens when the retrieved sources provide conflicting information?🤔

We're excited to introduce our paper: 
“DRAGged into CONFLICTS: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs”🚀

A thread 🧵👇
Sundar Pichai (@sundarpichai) 's Twitter Profile Photo


Gemini 2.5 Pro + 2.5 Flash are now stable and generally available. Plus, get a preview of Gemini 2.5 Flash-Lite, our fastest + most cost-efficient 2.5 model yet. 🔦

Exciting steps as we expand our 2.5 series of hybrid reasoning models that deliver amazing performance at the
Arman Cohan (@armancohan) 's Twitter Profile Photo

Excited for the release of SciArena with Ai2! LLMs are now an integral part of research workflows, and SciArena helps measure progress on scientific literature tasks. Also check out the preprint for many more results and analyses. Led by: Yilun Zhao, Kaiyan Zhang 📄 paper:

Nathan Lambert (@natolambert) 's Twitter Profile Photo

This new benchmark created by Valentina Pyatkin should be the new default replacing IFEval. Some of the best frontier models get <50% and it comes with separate training prompts so people don’t effectively train on test. Wild gap from o3 to Gemini 2.5 pro of like 30 points.

Gabrielle Kaili-May Liu (@pybeebee) 's Twitter Profile Photo

I will be presenting our work 𝗠𝗗𝗖𝘂𝗿𝗲 at #ACL2025NLP in Vienna this week! 🇦🇹
Come by if you’re interested in multi-doc reasoning and/or scalable creation of high-quality post-training data!
📍 Poster Session 4 @ Hall 4/5
🗓️ Wed, July 30 | 11-12:30
🔗 aclanthology.org/2025.acl-long.…