Marco Guerini (@m_guerini) 's Twitter Profile
Marco Guerini

@m_guerini

Head of @LanD_FBK research group at @FBK_research | NLP for Social Good.

ID: 1284417614

Website: http://www.marcoguerini.eu | Joined: 20-03-2013 22:19:23

3.3K Tweets

1.1K Followers

987 Following

Blanca C-F (@blanca_c_fi) 's Twitter Profile Photo

I am happy to share that our paper "Truth Knows No Language: Evaluating Truthfulness Beyond English" has been accepted to the #ACL2025 main conference. arxiv.org/abs/2502.09387…

EleutherAI (@aieleuther) 's Twitter Profile Photo

Can you train performant language models without using unlicensed text?

We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 and 2.
Ruben Hassid (@rubenhssd) 's Twitter Profile Photo

BREAKING: Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all.

They just memorize patterns really well.

Here's what Apple discovered:

(hint: we're not as close to AGI as the hype suggests)
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

"Bad" data might be the secret sauce for "good" AI models.

To make LLMs behave, let them first see how not to behave.

This paper challenges the common wisdom in LLM pre-training that cleaner data always means better models, particularly concerning toxicity.

Including toxic
kalomaze (@kalomaze) 's Twitter Profile Photo

simple "LLM as a judge" protip: if you prompt for something like "provide answers to the TRUE/FALSE rubric questions in order, followed by a one-sentence justification", this will be worse than having the justification come *before* the TRUE/FALSE marker
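
A quick way to see the difference (hypothetical judge-prompt templates, not text from the thread): when the label comes first, the model commits to TRUE/FALSE before writing any reasoning; when the justification comes first, the final label can condition on the reasoning it has already generated.

```python
# Two hypothetical judge-prompt orderings; only the position of the
# one-sentence justification relative to the TRUE/FALSE label differs.

LABEL_FIRST = (
    "Evaluate the answer against the rubric.\n"
    "For each rubric question, output TRUE or FALSE in order, "
    "followed by a one-sentence justification."
)

JUSTIFICATION_FIRST = (
    "Evaluate the answer against the rubric.\n"
    "For each rubric question, write a one-sentence justification first, "
    "then output TRUE or FALSE on the next line."
)
```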

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

LLMs often struggle to balance memorizing training data with generalizing to new rules.

This paper trains Transformer models of different sizes on simple tasks to show smaller models generalize but fail to memorize, while larger models memorize but fail to generalize, especially
Gary Marcus (@garymarcus) 's Twitter Profile Photo

Terence Tao reporting on the same problem we have seen endlessly in other domains: LLMs produce output that "looks" correct but is often deeply wrong and even stupid on careful inspection. I know of no domain in which this is NOT the case. And yet people seem surprised over

Andriy Burkov (@burkov) 's Twitter Profile Photo

Several folks reached out to me about this Apple paper on reasoning models. I have been saying this for the last 2.5 years: language models don't think, and they don't solve problems. They print sequences of words that make a *human reader believe* that there's a thinking or

Bao Pham (@baophamhq) 's Twitter Profile Photo

Diffusion models create novel images, but they can also memorize samples from the training set. How do they blend stored features to synthesize novel patterns? Our new work shows that diffusion models behave like Dense Associative Memory: in the low training data regime (number
Kyle Corbitt (@corbtt) 's Twitter Profile Photo

GRPO quirk that contradicted my intuition:

If you train on a group with rewards [0, 0, 0, 1]

And then you train on another group with rewards [0.99, 0.99, 0.99, 1]

Because of how GRPO normalizes within groups, the last trajectory will be equally reinforced in both cases!
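
A minimal sketch of the normalization behind this (standard group-relative advantage, reward minus group mean divided by group standard deviation; exact estimators and epsilon handling vary across implementations):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (reward - group mean) / group std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# One trajectory clearly beats the rest...
print(grpo_advantages([0.0, 0.0, 0.0, 1.0]))      # last entry ~ +1.73
# ...and one barely beats already near-perfect peers: same advantage.
print(grpo_advantages([0.99, 0.99, 0.99, 1.0]))   # last entry ~ +1.73
```

Both groups give the last trajectory an identical normalized advantage of about +1.73, which is the counter-intuitive part the tweet points at.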

CSHAM Tutorial (ACL2025) (@csham_tutorial) 's Twitter Profile Photo

Let's kick off our tutorial presentation by introducing our main sponsor: Hatedemics!

This European project leverages AI to combat hate speech and misinformation. We're very grateful for the support!

We can't wait to see you all in Vienna! #ACL2025NLP
hatedemics.eu
Avi Chawla (@_avichawla) 's Twitter Profile Photo

Finally! A RAG over code solution that actually works (open-source). Naive chunking used in RAG isn't suited for code. This is because codebases have long-range dependencies, cross-file references, etc., that independent text chunks just can't capture. Graph-Code is a
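
One way to picture the graph idea (an illustrative sketch; the function names and data layout below are assumptions, not the actual Graph-Code API): index symbols as nodes, cross-file references as edges, and expand any matched symbol into its dependency neighborhood before ranking, so retrieval follows the references that independent chunks would miss.

```python
import networkx as nx

def build_reference_graph(symbols):
    """symbols: iterable of (symbol_name, file_path, referenced_symbols) tuples."""
    g = nx.DiGraph()
    for name, path, refs in symbols:
        g.add_node(name, file=path)
        for ref in refs:
            g.add_edge(name, ref)  # `name` references `ref`, possibly in another file
    return g

def expand_matches(graph, seed_symbols, hops=2):
    """Grow an initial lexical/embedding match into its dependency neighborhood."""
    hits = set(seed_symbols)
    for s in seed_symbols:
        if s in graph:
            hits |= set(nx.single_source_shortest_path_length(graph, s, cutoff=hops))
    return hits

# Toy usage: a query matching `parse_config` also pulls in what it depends on.
g = build_reference_graph([
    ("parse_config", "config.py", ["read_yaml"]),
    ("read_yaml", "io_utils.py", []),
    ("main", "app.py", ["parse_config"]),
])
print(expand_matches(g, ["parse_config"]))  # {'parse_config', 'read_yaml'}
```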

Gary Marcus (@garymarcus) 's Twitter Profile Photo

For reasons unknown to me, the AI Safety community has put almost all of its eggs into scaling + system prompts + RL. Judging by how many problems we are seeing now (see below) with models built per that formula, shouldn't we be desperately trying to find alternatives?

elvis (@omarsar0) 's Twitter Profile Photo

This paper is impressive!

It introduces a clever way of keeping memory use constant regardless of task length.

Great use of RL for AI agents to efficiently use memory and reasoning.

Here are my full notes:
LanguageCrawler (@languagecrawler) 's Twitter Profile Photo

AI using AI as a source of information has made Google search full of errors and therefore useless. Google eliminated the most useful tool (especially for language workers) which was the number of hits for a phrase or a group of words. Instead, the AI search engine just makes up

jack morris (@jxmnop) 's Twitter Profile Photo

i just reviewed five papers for NeurIPS and it was an awful experience:

- first paper was clearly LLM-generated. it was too short, the references didn't work, had no experiments or theory at all, and a ton of obvious mistakes. the more i read the less it made sense

- two were

David Hall (@dlwh) 's Twitter Profile Photo

So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public and private, that the spikes were bad. (This is a thread about how we fixed the spikes. Bear with me. )
Marco Guerini (@m_guerini) 's Twitter Profile Photo

This: "Google eliminated the most useful tool (especially for language workers) which was the number of hits for a phrase or a group of words." But also the rest of it.

Marco Guerini (@m_guerini) 's Twitter Profile Photo

I have been in Israel often in the past, especially many years ago for part of my PhD (Haifa). I had exactly this feeling, even as a stranger. You have to be there to understand. A melting pot of all walks of life and religions.

Gary Marcus (@garymarcus) 's Twitter Profile Photo

BREAKING: Explosive new paper from MIT/Harvard/UChicago.

Things just got worse, a lot worse, for LLMs and the myth that they can understand and reason.

The paper documents a pattern they called Potemkins, a kind of reasoning inconsistency (see figure below). They show that