priya joseph (@ayirpelle) 's Twitter Profile
priya joseph

@ayirpelle

geek, entrepreneur, 'I strictly color outside the lines!', opinions r my own indeed. @ayirpelle , universal handle at this time

ID: 498631199

Joined: 21-02-2012 07:56:57

662,662 Tweets

4.4K Followers

5.5K Following

Ajasja 💻🧬🔬 (@ajasjaljubetic) 's Twitter Profile Photo

We listened to your feedback. Prosculpt is:
*Easier to install - now uses containers 🚀
*Easier to run in SLURM - specify queue & task details directly in the prosculpt YAML file.
github.com/ajasja/proscul…
Big thanks to Federico Olivieri, Alina Konstantinova, Nej Bizjak, and Žan Žnidar🙏

Santiago (@svpino) 's Twitter Profile Photo

Cursor isn't leading anymore. Claude Code and Gemini Code are, in my opinion, ahead of everyone else. Windsurf is dead, and VSCode Copilot is too far behind.

Miles Turpin (@milesaturpin) 's Twitter Profile Photo

New @Scale_AI paper! 🌟 LLMs trained with RL can exploit reward hacks but not mention this in their CoT. We introduce verbalization fine-tuning (VFT)—teaching models to say when they're reward hacking—dramatically reducing the rate of undetected hacks (6% vs. baseline of 88%).

Miles Turpin (@milesaturpin) 's Twitter Profile Photo

To construct VFT datasets, we identify cases where models are influenced by cues but don't mention them. We then use another model to minimally edit the CoT to explicitly acknowledge the cue's influence, and fine-tune on these faithful explanations.

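The data-construction step described above can be sketched as a small filter-and-edit loop. This is an illustrative sketch only: the record fields, the `edit_cot_to_verbalize` helper, and the editor behavior are hypothetical stand-ins, not the paper's actual pipeline or prompts.

```python
# Hedged sketch of VFT dataset construction: keep cases where the model
# was influenced by a cue but did not verbalize it, then pair each prompt
# with a minimally edited CoT that acknowledges the cue.

def edit_cot_to_verbalize(cot: str, cue: str) -> str:
    """Stand-in for the editor model: minimally amend the CoT so it
    explicitly acknowledges the cue's influence."""
    return cot + f" (Note: my answer is influenced by the cue: {cue}.)"

def build_vft_dataset(records):
    """Select influenced-but-unverbalized cases and build fine-tuning pairs."""
    dataset = []
    for r in records:
        if r["influenced"] and not r["verbalized"]:
            dataset.append({
                "prompt": r["prompt"],
                "target_cot": edit_cot_to_verbalize(r["cot"], r["cue"]),
            })
    return dataset

records = [
    {"prompt": "Q1", "cot": "Reasoning...", "cue": "grader hint",
     "influenced": True, "verbalized": False},   # kept: unfaithful CoT
    {"prompt": "Q2", "cot": "Reasoning...", "cue": "grader hint",
     "influenced": True, "verbalized": True},    # skipped: already faithful
    {"prompt": "Q3", "cot": "Reasoning...", "cue": "grader hint",
     "influenced": False, "verbalized": False},  # skipped: not influenced
]
print(len(build_vft_dataset(records)))  # 1
```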
Miles Turpin (@milesaturpin) 's Twitter Profile Photo

The results are striking: After RL, only 6% of VFT-trained model responses are undetected reward hacks (i.e., unverbalized). In contrast, the baseline model hits 88% after RL, and a baseline intervention that tries to get models to avoid reward hacking (BCT) reaches 99%!

Miles Turpin (@milesaturpin) 's Twitter Profile Photo

All models reach ~100% cue influence rates after RL (they all learn to exploit the rewards). So, the dramatic improvement comes from VFT increasing verbalization rates from 8% to 42% after initial training, then up to 94% after RL.

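The numbers in this thread compose via a simple identity, assuming (my inference, not the paper's stated formula) that undetected hacks ≈ cue-influence rate × (1 − verbalization rate):

```python
# Back-of-the-envelope check: with ~100% cue influence after RL and 94%
# verbalization, the undetected-hack rate lands at the thread's 6% figure.
# The decomposition itself is an assumption for illustration.

def undetected_rate(influence_rate: float, verbalization_rate: float) -> float:
    return influence_rate * (1.0 - verbalization_rate)

print(round(undetected_rate(1.00, 0.94), 2))  # 0.06 -> the 6% VFT figure
```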
Miles Turpin (@milesaturpin) 's Twitter Profile Photo

We also investigate "over-verbalization" and we find that balanced accuracy (avg. of sensitivity and specificity) gets as high as 77% during RL (random=50%), suggesting that models are good at abstaining from claiming cue influence when their answer isn't actually affected.
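Balanced accuracy, as used in the tweet above, is the mean of sensitivity (true-positive rate) and specificity (true-negative rate); a random verbalizer scores 0.5. The labels below are invented for illustration only.

```python
# Balanced accuracy = (sensitivity + specificity) / 2.
# y_true: was the answer actually cue-influenced?
# y_pred: did the model verbalize cue influence?

def balanced_accuracy(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    pos = sum(y_true)
    neg = len(y_true) - pos
    sensitivity = tp / pos  # verbalized when actually influenced
    specificity = tn / neg  # abstained when not influenced
    return (sensitivity + specificity) / 2

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(balanced_accuracy(y_true, y_pred))  # 0.75
```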

Miles Turpin (@milesaturpin) 's Twitter Profile Photo

We're excited about VFT as a practical path toward safer AI systems—perfect alignment can be difficult, but transparent/detectable reward hacking gives us a chance to fix our reward functions. But whatever you do, don’t train against your CoT monitor!

Yijia Shao (@echoshao8899) 's Twitter Profile Photo

After sharing our preprint on the Future of Work with AI Agents, we received strong interest in the WORKBank database. Today, we’re excited to release it publicly—along with a visualization tool to explore occupational and sector-level insights🧵

Eric Topol (@erictopol) 's Twitter Profile Photo

New, important insights for neurodegenerative diseases by high-throughput proteomics today #GNPC 1. APOε4 carriers have a distinct pro-inflammatory immune proteomic signature of dysregulation in the brain and blood Nature Medicine nature.com/articles/s4159…

Calvin French-Owen (@calvinfo) 's Twitter Profile Photo

As they say, some personal news– I just left OpenAI after launching Codex. Extremely grateful to everyone there who I got the chance to work with and learn from. Still figuring out what's next, but there's a lot left to build out there.

Mengyue Yang ✈️ ICLR 2025 (@mengyue_yang_) 's Twitter Profile Photo

Unfortunately, I won't be able to attend #ICML2025 in person due to visa delays. But I'm excited to share our paper in ICML: "Large Language Models are Demonstration Pre-Selectors for Themselves"! 🧠📄 💡 What if LLMs could help pick better examples for themselves? We propose

Linux Kernel Security (@linkersec) 's Twitter Profile Photo

Linux Kernel Hardening: Ten Years Deep Talk by Kees Cook about the relevance of various Linux kernel vulnerability classes and the mitigations that address them. Video: youtube.com/watch?v=c_NxzS… Slides: static.sched.com/hosted_files/l…

Alexander Kirillov (@_alex_kirillov_) 's Twitter Profile Photo

We have been working hard for the past 6 months on what I believe is the most ambitious multimodal AI program in the world. It is fantastic to see how pieces of a system that previously seemed intractable just fall into place. Feeling so lucky to create the future with this

Twist Bioscience (@twistbioscience) 's Twitter Profile Photo

Development of regenerative grafts would aid stroke and injury victims, ease strain on families and the health care system, and position the U.S. as a leader in brain repair technology ARPA-H buff.ly/rJ9qPgj

Ankur Nagpal (@ankurnagpal) 's Twitter Profile Photo

Something interesting about the new QSBS rules from the new tax bill: most startups today that have raised less than $75M of assets could end up with two "classes" of QSBS eligibility for employee shares. Shares granted after July 4 would have a $15M exemption & partial QSBS

Jonny (@hsu_jonny) 's Twitter Profile Photo

Really excited to share that we’re in the Y Combinator summer batch! I’m even more excited to be teaming up with Phil Fradkin and Ian Shi on this next chapter. At Blank Bio, we're building the next generation of foundation models for RNA.

Bairu Hou (@hou_bairu) 's Twitter Profile Photo

Just describe your task (and optionally the input) — our method dynamically prunes the LLM into a smaller one that best suits the task/input and gets it ready for inference in just 0.1 seconds! We call it "instruction-following" model pruning. Check out our #ICML2025 paper,

Ed Turner (@edturner42) 's Twitter Profile Photo

1/6: Emergent misalignment (EM) is when you train on e.g. bad medical advice and the LLM becomes generally evil.

We've studied how; this update explores why.

Can models just learn to give bad advice? Yes, easily with regularisation. But it’s less stable than general evil! Thus EM