Tessa Barton (@tessybarton)'s Twitter Profile
Tessa Barton

@tessybarton

Inventor of GPU purse. AI Research Scientist. Prev: @MosaicML x @Databricks, @NYTimes.

ID: 384239260

http://gpupurse.com · Joined 03-10-2011 10:05:22

338 Tweets

3.3K Followers

1.1K Following

Tessa Barton (@tessybarton)'s Twitter Profile Photo

It warms my heart to see the underappreciated, compassionate work FarmKind is doing for farm animals. I grew up on a farm and I have a soft spot for the animals who feed us.

Ying Sheng (@ying11231)'s Twitter Profile Photo

Deterministic inference, here you are. True on-policy RL is on the way. Although we are mostly using off-policy, having a deterministic mode will make many things easier!
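For reference, a minimal sketch of what a "deterministic mode" can mean at the framework level, assuming a causal LM that maps token IDs to [batch, seq, vocab] logits. The helper `deterministic_generate` is hypothetical and is not the inference engine this tweet refers to; it just illustrates fixing seeds, disabling nondeterministic kernels, and using greedy decoding so repeated runs produce identical outputs.

```python
import os
import torch

# Illustrative only: some CUDA ops require this workspace setting when
# deterministic algorithms are enforced.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

def deterministic_generate(model, input_ids, max_new_tokens=32, seed=0):
    """Hypothetical helper: bitwise-repeatable greedy decoding in plain PyTorch."""
    torch.manual_seed(seed)                   # fix RNG (matters if sampling is added later)
    torch.use_deterministic_algorithms(True)  # error out on nondeterministic kernels
    torch.backends.cudnn.benchmark = False    # avoid autotune-dependent kernel choices
    tokens = input_ids
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(tokens)[:, -1, :]            # assumes [batch, seq, vocab] logits
            next_tok = logits.argmax(-1, keepdim=True)  # greedy: same input -> same output
            tokens = torch.cat([tokens, next_tok], dim=-1)
    return tokens
```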

Cody Blakeney (@code_star)'s Twitter Profile Photo

In all seriousness, it's really cool to see the gauntlet become a standard in evaluating base models. Tessa Barton, Jeremy Dohmann, Mansheej Paul, Abhi Venigalla, and I worked really hard thinking carefully about how to design aggregations that gave meaningful signal across model scales.

Leo Gao (@nabla_theta)'s Twitter Profile Photo

Excited to share our latest work on untangling language models by training them with extremely sparse weights! We can isolate tiny circuits inside the model responsible for various simple behaviors and understand them unprecedentedly well. openai.com/index/understa…

Tristan Hume (@trishume)'s Twitter Profile Photo

Every time we train a great new model I need to frantically try to write a new take home that the model can’t defeat so we can still hire post-release. This one was tough, many drafts based on real problems fell before Claude Code’s “ultrathink” and needed to be scrapped.

Jack Lindsey (@jack_w_lindsey)'s Twitter Profile Photo

Looking at the model’s internal feature activations, we noticed two things. (1) The model appeared to be internally aware that it was “holding back its true thoughts” and providing a fake summary. (2) The model seemed to interpret the results as a prompt injection attack. (3/7)

Brian Huang ✈️ ICLR (@brianryhuang)'s Twitter Profile Photo

Astasia Myers IMO he was more so talking about the marginal returns on increasing compute and how to allocate compute. (Napkin-math illustration, bear with me.) Scaling pretraining and scaling RL is not over, but the gains from scaling aren't addressing fundamental failures in models (he mainly

MBZUAI (@mbzuai)'s Twitter Profile Photo

Today, we are releasing a new version of K2 (K2-V2), a 360-open LLM built from scratch as a superior base for reasoning adaptation, while still excelling at core LLM capabilities like conversation, knowledge retrieval, and long-context understanding.

K2 fills a major gap: highly
Chelsea Finn (@chelseabfinn)'s Twitter Profile Photo

I'm giving two talks at NeurIPS tomorrow!
- iterative improvement of generative models, incl. π0.6* (9:40 am, SPIGM workshop, with Yoonho Lee @NeurIPS)
- long-horizon memory & autonomy (10:30 am, EWM workshop)

Nathan Lambert (@natolambert)'s Twitter Profile Photo

Good researchers obsess over evals
The story of Olmo 3 (post-training), told through evals
NeurIPS Talk tomorrow.
Upper Level Room 2, 10:35AM.