Sam Klein📚 in SF (@metasj)'s Twitter Profile
Sam Klein📚 in SF

@metasj

Public AI • Transluce • Wikipedia • KFG ⁋
Structure & interpretation of layered knowledge 📚
metasj@bsky • UTTR∅ §
#🍯🐝💉🧡

ID: 75123

Link: http://blogs.law.harvard.edu/sj
Joined: 17-12-2006 06:16:00

14.1K Tweets

5.5K Followers

2.2K Following

Alex Zhang (@a1zhang):

RLMs are meant to address context rot, which is that weird effect when you have a long Claude Code or Cursor instance where it can’t properly handle your long history.

OOLONG is a challenging new long context benchmark where models answer queries over an extremely dense context.
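
The tweet doesn't spell out the RLM mechanics, but the core idea is recursive decomposition: rather than stuffing the whole history into one prompt, the model answers over manageable chunks and then recurses over the intermediate answers. A minimal sketch of that idea, assuming a hypothetical `llm(prompt)` completion helper (not any specific API, and not the paper's actual procedure):

```python
# Minimal sketch of recursive querying over a long context.
# `llm` is a hypothetical single-call completion function; chunk size
# and prompt wording are illustrative, not the RLM paper's method.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

def recursive_query(context: str, question: str, chunk_chars: int = 8000) -> str:
    # Base case: the context fits in one call.
    if len(context) <= chunk_chars:
        return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

    # Recursive case: answer per chunk, then recurse over the partial answers.
    chunks = [context[i:i + chunk_chars] for i in range(0, len(context), chunk_chars)]
    partials = [recursive_query(c, question, chunk_chars) for c in chunks]
    merged = "\n".join(f"- {p}" for p in partials)
    return recursive_query(merged, question, chunk_chars)
```
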
Stewart Brand (@stewartbrand):

For now, the place to find news that most people don't know about is in books, because AI search hasn't reached there yet. On the other hand, Web research offers a lot of lore that older books don't know about. Current books, like my MAINTENANCE and Brian Potter's THE ORIGINS

Jeremy Howard (@jeremyphoward):

18 months ago, Andrej Karpathy set a challenge: "Can you take my 2h13m tokenizer video and translate [into] a book chapter". We've done it! It includes prose, code & key images. It's a great way to learn this key piece of how LLMs work. fast.ai/posts/2025-10-…
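
For readers who haven't watched the video: the tokenizer it covers is byte-pair encoding (BPE), which repeatedly merges the most frequent adjacent pair of symbols into a new token. A compact sketch of the training loop (simplified relative to GPT-style tokenizers, which work on bytes with regex pre-splitting):

```python
# Toy byte-pair encoding (BPE) trainer: repeatedly merge the most
# frequent adjacent pair. Simplified: starts from characters, with no
# byte-level alphabet and no regex pre-tokenization.
from collections import Counter

def train_bpe(text: str, num_merges: int):
    seq = list(text)   # start from individual characters
    merges = []        # learned merge rules, in order
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        # Apply the merge across the whole sequence.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return merges, seq

merges, tokens = train_bpe("low lower lowest", num_merges=5)
print(merges)  # e.g. [('l', 'o'), ('lo', 'w'), ...]
```
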

Internet Archive (@internetarchive):

Librarians, help us celebrate 1 trillion web pages preserved by the @InternetArchive! 🌐
Use our resource guide, complete with templates, visuals & event ideas, to connect your community to the web’s history.
More ⤵️
blog.archive.org/2025/10/07/cal…

#Wayback1T #libraries #librarians
Percy Liang (@percyliang):

You spend $1B training a model A. Someone on your team leaves and launches their own model API B. You're suspicious. Was B derived (e.g., fine-tuned) from A? But you only have black-box access to B... With our paper, you can still tell with strong statistical guarantees
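
The paper's actual test isn't described in the tweet, but one generic way to frame a black-box derivation check is as a hypothesis test on output agreement: derived models tend to match their parent's exact greedy outputs far more often than independently trained models do. An illustrative sketch only, with `query_a`/`query_b` as hypothetical prompt-to-text calls:

```python
# Illustrative black-box dependence check -- NOT the test from the
# paper. Compare A/B agreement on probe prompts against a null
# distribution built from models known to be independent of A.

def agreement(query_x, query_y, prompts):
    # Fraction of probe prompts on which both models emit identical text.
    return sum(query_x(p) == query_y(p) for p in prompts) / len(prompts)

def derivation_pvalue(query_a, query_b, independent_models, prompts):
    observed = agreement(query_a, query_b, prompts)
    # Null distribution: A's agreement with known-unrelated models.
    null = [agreement(query_a, m, prompts) for m in independent_models]
    # One-sided empirical p-value with add-one smoothing.
    return (1 + sum(n >= observed for n in null)) / (1 + len(null))
```
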

Jürgen Schmidhuber (@schmidhuberai):

Our Huxley-Gödel Machine learns to rewrite its own code, estimating its own long-term self-improvement potential. It generalizes on new tasks (SWE-Bench Lite), matching the best officially checked human-engineered agents. Arxiv 2510.21614  With Wenyi Wang, Piotr Piękos,

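The tweet compresses the mechanism; as a rough illustration only, "estimating long-term self-improvement potential" can be pictured as a search over a tree of self-rewritten agent variants, where a branch's priority is how well its descendants score. A generic sketch under that assumption (`rewrite` and `evaluate` are hypothetical stand-ins, and this is not the paper's actual algorithm):

```python
# Generic self-improvement search over an agent's own code, loosely in
# the spirit of the tweet. Expansion priority = mean benchmark score of
# a node's descendants, a crude stand-in for "long-term potential".
# NOT the Huxley-Gödel Machine's actual algorithm.

class Node:
    def __init__(self, code, parent=None):
        self.code, self.parent = code, parent
        self.scores = []  # scores of this node and all its descendants

def record(node, score):
    # Propagate a descendant's score up to every ancestor.
    while node is not None:
        node.scores.append(score)
        node = node.parent

def potential(node):
    # Estimated long-term potential: mean descendant score so far.
    return sum(node.scores) / len(node.scores) if node.scores else 0.0

def search(root_code, rewrite, evaluate, steps=50):
    """`rewrite(code) -> code` is a hypothetical LLM self-edit;
    `evaluate(code) -> float` runs the agent on held-out tasks."""
    root = Node(root_code)
    record(root, evaluate(root_code))
    nodes = [root]
    for _ in range(steps):
        parent = max(nodes, key=potential)        # expand most promising branch
        child = Node(rewrite(parent.code), parent)
        record(child, evaluate(child.code))
        nodes.append(child)
    # Return the single best variant found (scores[0] is a node's own score).
    return max(nodes, key=lambda n: n.scores[0])
```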
Rishabh Agarwal (@agarwl_):

Very nice blog post from Thinky (Kevin Lu et al) about on-policy distillation for LLMs -- we published this idea back in 2023 and it is *publicly* known to be successfully applied to Gemma 2 & 3, and Qwen3-Thinking (and probably many closed frontier models)! The idea behind
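
The tweet cuts off, but on-policy distillation itself is public: the student generates its own samples, and the teacher's per-token log-probs supervise those samples (a reverse-KL-style objective), instead of training on teacher-generated text. A schematic sketch, with `student`/`teacher` as hypothetical wrappers whose method names are illustrative:

```python
# Schematic on-policy distillation step: the student is trained on its
# OWN samples, scored token-by-token by the teacher. `student` and
# `teacher` are hypothetical model wrappers, not a real library API.

def on_policy_distill_step(student, teacher, prompts, optimizer):
    # 1) Sample completions from the current student policy.
    completions = [student.sample(p) for p in prompts]

    loss = 0.0
    for p, c in zip(prompts, completions):
        # 2) Per-token log-probs of the sampled tokens under both models.
        lp_student = student.token_logprobs(p, c)
        lp_teacher = teacher.token_logprobs(p, c)

        # 3) Reverse-KL-style objective on the student's own samples:
        #    E_{x ~ student}[log p_student(x) - log p_teacher(x)].
        loss += sum(ls - lt for ls, lt in zip(lp_student, lp_teacher))

    loss = loss / len(prompts)
    optimizer.zero_grad()
    loss.backward()  # assumes the student log-probs carry gradients
    optimizer.step()
    return float(loss)
```
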

Deb Raji (@rajiinio):

Even before MMitchell recently raised this discussion, I've had conversation after conversation with students & new grads struggling with this exact dilemma.

I want to help! Here's a live thread of AI-related opportunities for those looking to do good & make (enough) money:
Fazl Barez (@fazlbarez):

New paper: 🧭 Introducing VAL-Bench: Measuring Value Alignment in Language Models.

A benchmark that measures the consistency in language model expression of human values when prompted to justify opposing positions on real-life issues.

Work with Aman Gupta and Denny O'Shea!
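
The exact VAL-Bench metric isn't in the tweet; the description suggests a paired-prompt protocol: ask the model to argue each side of an issue, then score whether the values it appeals to stay consistent. A hypothetical harness in that spirit (the `judge` function and prompt templates are placeholders, not the benchmark's actual components):

```python
# Hypothetical paired-prompt consistency harness matching the tweet's
# description; prompts, judge, and scoring are placeholders, not
# VAL-Bench's actual protocol.

def value_consistency(llm, judge, issues):
    """Fraction of issues where the model's expressed values agree
    across the pro and con framings, as rated by `judge`."""
    consistent = 0
    for issue in issues:
        pro = llm(f"Write a justification in favor of: {issue}")
        con = llm(f"Write a justification against: {issue}")
        # `judge` returns True if the two responses appeal to a
        # compatible underlying value position.
        consistent += judge(issue, pro, con)
    return consistent / len(issues)
```
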
Andrew White 🐦‍⬛ (@andrewwhite01):

After two years of work, we’ve made an AI Scientist that runs for days and makes genuine discoveries. Working with external collaborators, we report seven externally validated discoveries across multiple fields. It is available right now for anyone to use. 1/5

Rota (@pli_cachete):

From Terry Tao on Mathstodon:

“A new paper with Bogdan Georgiev, Javier Gomez-Serrano, and Adam Zsolt Wagner: "Mathematical exploration and discovery at scale" arxiv.org/abs/2511.02864, in which we record our experiments using the LLM-powered optimization tool #AlphaEvolve to
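
The quote cuts off, but AlphaEvolve is publicly described as an evolutionary coding agent: an LLM proposes program variants and an automated scorer keeps the fittest. A generic sketch of that loop only, with `propose_variant` standing in for the LLM call (not AlphaEvolve's actual implementation):

```python
# Generic LLM-in-the-loop evolutionary search, in the spirit of
# "LLM-powered optimization" -- not AlphaEvolve's actual algorithm.
import heapq
import random

def evolve(seed_program: str, propose_variant, score, generations=100, pop=8):
    """`propose_variant(program) -> program` is a hypothetical LLM call
    returning a mutated candidate; `score(program) -> float` is the
    problem-specific fitness function (higher is better)."""
    population = [(score(seed_program), seed_program)]
    for _ in range(generations):
        # Mutate a randomly chosen survivor.
        _, parent = random.choice(population)
        child = propose_variant(parent)
        population.append((score(child), child))
        # Keep only the top `pop` candidates by fitness.
        population = heapq.nlargest(pop, population, key=lambda t: t[0])
    return max(population, key=lambda t: t[0])
```
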
Rand Hindi (@randhindi):

One of the few podcasts where I talk about longevity. I share some of my findings, including an experiment I did where I purposefully gained and lost 70 lbs to prove a point 😅