Marco Guerini (@m_guerini) 's Twitter Profile
Marco Guerini

@m_guerini

Head of @LanD_FBK research group at @FBK_research | NLP for Social Good.

ID: 1284417614

Website: http://www.marcoguerini.eu | Joined: 20-03-2013 22:19:23

3.3K Tweets

1.1K Followers

987 Following

Blanca C-F (@blanca_c_fi) 's Twitter Profile Photo

I am happy to share that our paper "Truth Knows No Language: Evaluating Truthfulness Beyond English" has been accepted to the #ACL2025 main conference. arxiv.org/abs/2502.09387…

EleutherAI (@aieleuther) 's Twitter Profile Photo

Can you train performant language models without using unlicensed text?

We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 and 2.
Ruben Hassid (@rubenhssd) 's Twitter Profile Photo

BREAKING: Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all.

They just memorize patterns really well.

Here's what Apple discovered:

(hint: we're not as close to AGI as the hype suggests)
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

"Bad" data might be the secret sauce for "good" AI models.

To make LLMs behave, let them first see how not to behave.

This paper challenges the common wisdom in LLM pre-training that cleaner data always means better models, particularly concerning toxicity.

Including toxic
kalomaze (@kalomaze) 's Twitter Profile Photo

simple "LLM as a judge" protip: if you prompt for something like "provide answers to the TRUE/FALSE rubric questions in order, followed by a one-sentence justification", this will be worse than having the justification come *before* the TRUE/FALSE marker
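
A quick way to see the difference (hypothetical judge-prompt templates, not text from the thread): when the label comes first, the model commits to TRUE/FALSE before writing any reasoning; when the justification comes first, the final label can condition on the reasoning it has already generated.

```python
# Two hypothetical judge-prompt orderings; only the position of the
# one-sentence justification relative to the TRUE/FALSE label differs.

LABEL_FIRST = (
    "Evaluate the answer against the rubric.\n"
    "For each rubric question, output TRUE or FALSE in order, "
    "followed by a one-sentence justification."
)

JUSTIFICATION_FIRST = (
    "Evaluate the answer against the rubric.\n"
    "For each rubric question, write a one-sentence justification first, "
    "then output TRUE or FALSE on the next line."
)
```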

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

LLMs often struggle to balance memorizing training data with generalizing to new rules.

This paper trains Transformer models of different sizes on simple tasks to show smaller models generalize but fail to memorize, while larger models memorize but fail to generalize, especially
Gary Marcus (@garymarcus) 's Twitter Profile Photo

Terence Tao reporting on the same problem we have seen endlessly in other domains: LLMs produce output that "looks" correct but is often deeply wrong and even stupid on careful inspection. I know of no domain in which this is NOT the case. And yet people seem surprised over

Andriy Burkov (@burkov) 's Twitter Profile Photo

Several folks reached out to me about this Apple paper on reasoning models. I have been saying this for the last 2.5 years: language models don't think, and they don't solve problems. They print sequences of words that make a *human reader believe* that there's a thinking or

Bao Pham (@baophamhq) 's Twitter Profile Photo

Diffusion models create novel images, but they can also memorize samples from the training set. How do they blend stored features to synthesize novel patterns? Our new work shows that diffusion models behave like Dense Associative Memory: in the low training data regime (number
Kyle Corbitt (@corbtt) 's Twitter Profile Photo

GRPO quirk that contradicted my intuition:

If you train on a group with rewards [0, 0, 0, 1]

And then you train on another group with rewards [0.99, 0.99, 0.99, 1]

Because of how GRPO normalizes within groups, the last trajectory will be equally reinforced in both cases!
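
A minimal sketch of the normalization behind this (standard group-relative advantage, reward minus group mean divided by group standard deviation; exact estimators and epsilon handling vary across implementations):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (reward - group mean) / group std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# One trajectory clearly beats the rest...
print(grpo_advantages([0.0, 0.0, 0.0, 1.0]))      # last entry ~ +1.73
# ...and one barely beats already near-perfect peers: same advantage.
print(grpo_advantages([0.99, 0.99, 0.99, 1.0]))   # last entry ~ +1.73
```

Both groups give the last trajectory an identical normalized advantage of about +1.73, which is the counter-intuitive part the tweet points at.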

CSHAM Tutorial (ACL2025) (@csham_tutorial) 's Twitter Profile Photo

Let's kick off our tutorial presentation by introducing our main sponsor: Hatedemics!

This European project leverages AI to combat hate speech and misinformation. We're very grateful for the support!

We can't wait to see you all in Vienna! #ACL2025NLP
hatedemics.eu
Avi Chawla (@_avichawla) 's Twitter Profile Photo

Finally! A RAG over code solution that actually works (open-source). Naive chunking used in RAG isn't suited for code. This is because codebases have long-range dependencies, cross-file references, etc., that independent text chunks just can't capture. Graph-Code is a
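
One way to picture the graph idea (an illustrative sketch; the function names and data layout below are assumptions, not the actual Graph-Code API): index symbols as nodes, cross-file references as edges, and expand any matched symbol into its dependency neighborhood before ranking, so retrieval follows the references that independent chunks would miss.

```python
import networkx as nx

def build_reference_graph(symbols):
    """symbols: iterable of (symbol_name, file_path, referenced_symbols) tuples."""
    g = nx.DiGraph()
    for name, path, refs in symbols:
        g.add_node(name, file=path)
        for ref in refs:
            g.add_edge(name, ref)  # `name` references `ref`, possibly in another file
    return g

def expand_matches(graph, seed_symbols, hops=2):
    """Grow an initial lexical/embedding match into its dependency neighborhood."""
    hits = set(seed_symbols)
    for s in seed_symbols:
        if s in graph:
            hits |= set(nx.single_source_shortest_path_length(graph, s, cutoff=hops))
    return hits

# Toy usage: a query matching `parse_config` also pulls in what it depends on.
g = build_reference_graph([
    ("parse_config", "config.py", ["read_yaml"]),
    ("read_yaml", "io_utils.py", []),
    ("main", "app.py", ["parse_config"]),
])
print(expand_matches(g, ["parse_config"]))  # {'parse_config', 'read_yaml'}
```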

Gary Marcus (@garymarcus) 's Twitter Profile Photo

For reasons unknown to me, the AI Safety community has put almost all of its eggs into scaling + system prompts + RL. Judging by how many problems we are seeing now (see below) with models built per that formula, shouldn't we be desperately trying to find alternatives?

elvis (@omarsar0) 's Twitter Profile Photo

This paper is impressive!

It introduces a clever way of keeping memory use constant regardless of task length.

Great use of RL for AI agents to efficiently use memory and reasoning.

Here are my full notes:
LanguageCrawler (@languagecrawler) 's Twitter Profile Photo

AI using AI as a source of information has made Google search full of errors and therefore useless. Google eliminated the most useful tool (especially for language workers) which was the number of hits for a phrase or a group of words. Instead, the AI search engine just makes up

jack morris (@jxmnop) 's Twitter Profile Photo

i just reviewed five papers for NeurIPS and it was an awful experience:

- first paper was clearly LLM-generated. it was too short, the references didn't work, had no experiments or theory at all, and a ton of obvious mistakes. the more i read the less it made sense

- two were

David Hall (@dlwh) 's Twitter Profile Photo

So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public and private, that the spikes were bad. (This is a thread about how we fixed the spikes. Bear with me. )
Marco Guerini (@m_guerini) 's Twitter Profile Photo

This: "Google eliminated the most useful tool (especially for language workers) which was the number of hits for a phrase or a group of words." But also the rest of it.

Marco Guerini (@m_guerini) 's Twitter Profile Photo

I have been in Israel often in the past, especially many years ago for part of my PhD (Haifa). I had exactly this feeling, even as a stranger. You have to be there to understand. A melting pot of all walks of life and religions.

Gary Marcus (@garymarcus) 's Twitter Profile Photo

BREAKING: Explosive new paper from MIT/Harvard/UChicago.

Things just got worse, a lot worse, for LLMs and the myth that they can understand and reason.

The paper documents a pattern they called Potemkins, a kind of reasoning inconsistency (see figure below). They show that