Ken Liu (@kenziyuliu)'s Twitter Profile
Ken Liu

@kenziyuliu

PhD @StanfordAILab @StanfordNLP w/ @percyliang @sanmikoyejo. Prev @GoogleDeepMind, @SCSatCMU @Sydney_Uni 🇦🇺

ID: 820474984225062912

Link: https://ai.stanford.edu/~kzliu · Joined: 15-01-2017 03:37:34

385 Tweets

1.1K Followers

867 Following

Allen Nie (🇺🇦☮️) (@allen_a_nie):

I've been lucky enough to see an early draft of this. It has a surprising RL angle! The RL community has long suspected that the Decision Transformer might be doing “trajectory stitching,” but I haven’t seen empirical evidence yet. Ken’s paper shows how subsequences can be

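As a toy illustration of the "trajectory stitching" idea the tweet alludes to (the states, rewards, and helper below are invented for illustration, not taken from the paper): two suboptimal trajectories that pass through a shared state can be spliced into a path better than either one alone.

```python
# Two logged trajectories as lists of (state, reward) steps.
traj_a = [("s0", 0), ("s1", 0), ("s2", 0), ("goal_a", 1)]   # from s0, low-reward goal
traj_b = [("s9", 0), ("s2", 0), ("s3", 0), ("goal_b", 10)]  # better goal, different start

def stitch(prefix_traj, suffix_traj):
    """Splice prefix_traj up to a shared state with suffix_traj from that state on."""
    prefix_states = [s for s, _ in prefix_traj]
    for i, (state, _) in enumerate(suffix_traj):
        if state in prefix_states:
            cut = prefix_states.index(state)
            return prefix_traj[:cut] + suffix_traj[i:]
    return None  # no shared state, nothing to stitch

print(stitch(traj_a, traj_b))
# [('s0', 0), ('s1', 0), ('s2', 0), ('s3', 0), ('goal_b', 10)]
# From start state s0 the dataset's best trajectory earns 1, but the stitched
# path earns 10 -- behaviour a sequence model can only show by recombining
# subsequences across trajectories.
```
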
Saurabh Shah (@saurabh_shah2):

Really really interesting work. Adds to the pile of evidence that train-test decontamination is super hard, and we are probably not doing a good job of this in general.
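
One hedged sketch of why decontamination is so hard (the strings and the 8-gram threshold below are made-up stand-ins for a common heuristic, not anyone's actual pipeline): exact n-gram overlap catches verbatim copies of a test item but misses a light paraphrase.

```python
# Exact 8-gram overlap check, a common decontamination heuristic.
def ngrams(text, n=8):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contaminated(train_doc, test_doc, n=8):
    return bool(ngrams(train_doc, n) & ngrams(test_doc, n))

test_item   = "what is the capital of france the answer is paris of course"
verbatim    = "what is the capital of france the answer is paris of course"
paraphrase  = "france's capital -- the answer, naturally, is paris"

print(contaminated(verbatim, test_item))    # True: the copy is caught
print(contaminated(paraphrase, test_item))  # False: same content slips through
```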

Stanford NLP Group (@stanfordnlp):

Want to learn the engineering details of building state-of-the-art Large Language Models (LLMs)? Not finding much info in OpenAI’s non-technical reports? Percy Liang and Tatsunori Hashimoto are here to help with CS336: Language Modeling from Scratch, now rolling out to YouTube.

Xiangyu Qi (@xiangyuqi_pton):

Thrilled to know that our paper, `Safety Alignment Should be Made More Than Just a Few Tokens Deep`, received the ICLR 2025 Outstanding Paper Award. We sincerely thank the ICLR committee for awarding one of this year's Outstanding Paper Awards to AI Safety / Adversarial ML.

John Yang (@jyangballin):

40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.

David Hall (@dlwh):

Come read about all the mistakes I made along the way to beating Llama 3.1 8B on 14/19 benchmarks. We trained from scratch, made plenty of wrong turns, and learned a lot.

Percy Liang (@percyliang):

AI agents have the potential to significantly alter the cybersecurity landscape. To help us understand this change, we are excited to release BountyBench, the first framework to capture offensive & defensive cyber-capabilities in evolving real-world systems.

Epoch AI (@epochairesearch):

Is AI already superhuman at FrontierMath? To answer this question, we ran a competition at MIT, pitting eight teams of mathematicians against o4-mini-medium. Result: o4-mini beat all but two teams. And while AIs aren't yet clearly superhuman, they probably will be soon.

Aryaman Arora (@aryaman2020):

new paper! 🫡 why are state space models (SSMs) worse than Transformers at recall over their context? this is a question about the mechanisms underlying model behaviour: therefore, we propose using mechanistic evaluations to answer it!

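For context, here is a minimal sketch of the associative-recall probe commonly used to study in-context recall; the exact task format is an assumption, not necessarily the paper's setup. The model sees key-value pairs, then a query key, and must emit the matching value.

```python
import random

def make_recall_example(n_pairs=8, vocab=tuple("abcdefghijklmnop")):
    """Build one synthetic recall prompt: 'k1 v1 k2 v2 ... query' -> answer."""
    keys = random.sample(vocab, n_pairs)
    values = [random.choice("0123456789") for _ in keys]
    query = random.choice(keys)
    answer = values[keys.index(query)]
    prompt = " ".join(f"{k} {v}" for k, v in zip(keys, values)) + f" {query}"
    return prompt, answer

prompt, answer = make_recall_example()
print(prompt, "->", answer)  # e.g. "c 4 f 7 a 1 ... c" -> "4"
# A Transformer can attend back to the key token directly; an SSM must have
# compressed the pair into its fixed-size state, which is where the recall
# gap shows up.
```
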
DeepSeek (@deepseek_ai):

🚀 DeepSeek-R1-0528 is here!

🔹 Improved benchmark performance
🔹 Enhanced front-end capabilities
🔹 Reduced hallucinations
🔹 Supports JSON output & function calling

✅ Try it now: chat.deepseek.com

🔌 No change to API usage — docs here: api-docs.deepseek.com/guides/reasoni…
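
A minimal sketch of calling the model, assuming DeepSeek's documented OpenAI-compatible endpoint and the `deepseek-reasoner` model name; see api-docs.deepseek.com for the authoritative usage.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",          # placeholder
    base_url="https://api.deepseek.com",      # OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user",
               "content": 'Return the product of 6 and 7 as JSON with key "answer".'}],
    response_format={"type": "json_object"},  # the JSON-output mode
)
print(resp.choices[0].message.content)
```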

Qinan Yu (@qinan_yu):

🎀 fine-grained, interpretable representation steering for LMs! meet RePS — Reference-free Preference Steering! 1⃣ outperforms existing methods on 2B-27B LMs, nearly matching prompting 2⃣ supports both steering and suppression (beat system prompts!) 3⃣ jailbreak-proof (1/n)

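RePS itself isn't reproduced here; as background, this is a generic sketch of what representation steering means mechanically (adding a direction to one layer's hidden states at inference), with stand-in model, layer, and factor choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; RePS targets 2B-27B LMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer = model.transformer.h[6]                   # layer to steer
steer = torch.randn(model.config.n_embd) * 0.05  # placeholder direction
alpha = 4.0                                      # steering factor

def add_direction(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * steer              # shift every position
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = layer.register_forward_hook(add_direction)
out = model.generate(**tok("The weather today", return_tensors="pt"),
                     max_new_tokens=20)
print(tok.decode(out[0]))
handle.remove()  # suppression would subtract the direction instead
```
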
Zhengxuan Wu (@zhengxuanzenwu):

we present a new representation steering training objective to rival prompting! and you also get:

- a fun trick: you can mitigate the side-effects of randomly selecting steering factors by simply training with them.
- a long appendix with our core dumps on steering
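
A sketch of one reading of that trick (an assumption, not the paper's code): sample the steering factor at random during training, so the learned direction tolerates whatever factor you pick at inference time.

```python
import torch

d_model = 768
steer = torch.nn.Parameter(torch.zeros(d_model))  # learned steering direction
opt = torch.optim.Adam([steer], lr=1e-3)

def training_step(hidden, target_hidden):
    alpha = torch.empty(1).uniform_(1.0, 10.0)    # random factor each step
    steered = hidden + alpha * steer
    loss = torch.nn.functional.mse_loss(steered, target_hidden)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage: push random "activations" toward a fixed shifted target.
hidden = torch.randn(4, d_model)
target = hidden + 3.0 * torch.randn(d_model)
for _ in range(3):
    print(training_step(hidden, target))
```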