priya joseph (@ayirpelle) 's Twitter Profile
priya joseph

@ayirpelle

geek, entrepreneur, 'I strictly color outside the lines!', opinions r my own indeed. @ayirpelle , universal handle at this time

ID: 498631199

Joined: 21-02-2012 07:56:57

662,662 Tweets

4.4K Followers

5.5K Following

Ajasja 💻🧬🔬 (@ajasjaljubetic) 's Twitter Profile Photo

We listened to your feedback. Prosculpt is:
*Easier to install - now uses containers 🚀
*Easier to run in SLURM - specify queue & task details directly in the prosculpt YAML file.
github.com/ajasja/proscul…
Big thanks to Federico Olivieri, Alina Konstantinova, Nej Bizjak, and Žan Žnidar🙏

Santiago (@svpino) 's Twitter Profile Photo

Cursor isn't leading anymore. Claude Code and Gemini Code are, in my opinion, ahead of everyone else. Windsurf is dead, and VSCode Copilot is too far behind.

Miles Turpin (@milesaturpin) 's Twitter Profile Photo

New @Scale_AI paper! 🌟 LLMs trained with RL can exploit reward hacks but not mention this in their CoT. We introduce verbalization fine-tuning (VFT)—teaching models to say when they're reward hacking—dramatically reducing the rate of undetected hacks (6% vs. baseline of 88%).

Miles Turpin (@milesaturpin) 's Twitter Profile Photo

To construct VFT datasets, we identify cases where models are influenced by cues but don't mention them. We then use another model to minimally edit the CoT to explicitly acknowledge the cue's influence, and fine-tune on these faithful explanations.

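The data-construction step described above can be sketched as a small filter-and-edit loop. This is an illustrative sketch only: the record fields, the `edit_cot_to_verbalize` helper, and the editor behavior are hypothetical stand-ins, not the paper's actual pipeline or prompts.

```python
# Hedged sketch of VFT dataset construction: keep cases where the model
# was influenced by a cue but did not verbalize it, then pair each prompt
# with a minimally edited CoT that acknowledges the cue.

def edit_cot_to_verbalize(cot: str, cue: str) -> str:
    """Stand-in for the editor model: minimally amend the CoT so it
    explicitly acknowledges the cue's influence."""
    return cot + f" (Note: my answer is influenced by the cue: {cue}.)"

def build_vft_dataset(records):
    """Select influenced-but-unverbalized cases and build fine-tuning pairs."""
    dataset = []
    for r in records:
        if r["influenced"] and not r["verbalized"]:
            dataset.append({
                "prompt": r["prompt"],
                "target_cot": edit_cot_to_verbalize(r["cot"], r["cue"]),
            })
    return dataset

records = [
    {"prompt": "Q1", "cot": "Reasoning...", "cue": "grader hint",
     "influenced": True, "verbalized": False},   # kept: unfaithful CoT
    {"prompt": "Q2", "cot": "Reasoning...", "cue": "grader hint",
     "influenced": True, "verbalized": True},    # skipped: already faithful
    {"prompt": "Q3", "cot": "Reasoning...", "cue": "grader hint",
     "influenced": False, "verbalized": False},  # skipped: not influenced
]
print(len(build_vft_dataset(records)))  # 1
```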
Miles Turpin (@milesaturpin) 's Twitter Profile Photo

The results are striking: After RL, only 6% of VFT-trained model responses are undetected reward hacks (i.e., unverbalized). In contrast, the baseline model hits 88% after RL, and a baseline intervention that tries to get models to avoid reward hacking (BCT) reaches 99%!

Miles Turpin (@milesaturpin) 's Twitter Profile Photo

All models reach ~100% cue influence rates after RL (they all learn to exploit the rewards). So, the dramatic improvement comes from VFT increasing verbalization rates from 8% to 42% after initial training, then up to 94% after RL.

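The numbers in this thread compose via a simple identity, assuming (my inference, not the paper's stated formula) that undetected hacks ≈ cue-influence rate × (1 − verbalization rate):

```python
# Back-of-the-envelope check: with ~100% cue influence after RL and 94%
# verbalization, the undetected-hack rate lands at the thread's 6% figure.
# The decomposition itself is an assumption for illustration.

def undetected_rate(influence_rate: float, verbalization_rate: float) -> float:
    return influence_rate * (1.0 - verbalization_rate)

print(round(undetected_rate(1.00, 0.94), 2))  # 0.06 -> the 6% VFT figure
```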
Miles Turpin (@milesaturpin) 's Twitter Profile Photo

We also investigate "over-verbalization" and we find that balanced accuracy (avg. of sensitivity and specificity) gets as high as 77% during RL (random=50%), suggesting that models are good at abstaining from claiming cue influence when their answer isn't actually affected.
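Balanced accuracy, as used in the tweet above, is the mean of sensitivity (true-positive rate) and specificity (true-negative rate); a random verbalizer scores 0.5. The labels below are invented for illustration only.

```python
# Balanced accuracy = (sensitivity + specificity) / 2.
# y_true: was the answer actually cue-influenced?
# y_pred: did the model verbalize cue influence?

def balanced_accuracy(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    pos = sum(y_true)
    neg = len(y_true) - pos
    sensitivity = tp / pos  # verbalized when actually influenced
    specificity = tn / neg  # abstained when not influenced
    return (sensitivity + specificity) / 2

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(balanced_accuracy(y_true, y_pred))  # 0.75
```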

Miles Turpin (@milesaturpin) 's Twitter Profile Photo

We're excited about VFT as a practical path toward safer AI systems—perfect alignment can be difficult, but transparent/detectable reward hacking gives us a chance to fix our reward functions. But whatever you do, don’t train against your CoT monitor!

Yijia Shao (@echoshao8899) 's Twitter Profile Photo

After sharing our preprint on the Future of Work with AI Agents, we received strong interest in the WORKBank database. Today, we’re excited to release it publicly—along with a visualization tool to explore occupational and sector-level insights🧵

Eric Topol (@erictopol) 's Twitter Profile Photo

New, important insights for neurodegenerative diseases by high-throughput proteomics today #GNPC 1. APOε4 carriers have a distinct pro-inflammatory immune proteomic signature of dysregulation in the brain and blood Nature Medicine nature.com/articles/s4159…

Calvin French-Owen (@calvinfo) 's Twitter Profile Photo

As they say, some personal news– I just left OpenAI after launching Codex. Extremely grateful to everyone there who I got the chance to work with and learn from. Still figuring out what's next, but there's a lot left to build out there.

Mengyue Yang ✈️ ICLR 2025 (@mengyue_yang_) 's Twitter Profile Photo

Unfortunately, I won't be able to attend #ICML2025 in person due to visa delays. But I'm excited to share our paper in ICML: "Large Language Models are Demonstration Pre-Selectors for Themselves"! 🧠📄 💡 What if LLMs could help pick better examples for themselves? We propose

Linux Kernel Security (@linkersec) 's Twitter Profile Photo

Linux Kernel Hardening: Ten Years Deep Talk by Kees Cook about the relevance of various Linux kernel vulnerability classes and the mitigations that address them. Video: youtube.com/watch?v=c_NxzS… Slides: static.sched.com/hosted_files/l…

Alexander Kirillov (@_alex_kirillov_) 's Twitter Profile Photo

We have been working hard for the past 6 months on what I believe is the most ambitious multimodal AI program in the world. It is fantastic to see how pieces of a system that previously seemed intractable just fall into place. Feeling so lucky to create the future with this

Twist Bioscience (@twistbioscience) 's Twitter Profile Photo

Development of regenerative grafts would aid stroke and injury victims, ease strain on families and the health care system, and position the U.S. as a leader in brain repair technology ARPA-H buff.ly/rJ9qPgj

Ankur Nagpal (@ankurnagpal) 's Twitter Profile Photo

Something interesting about the new QSBS rules from the new tax bill: most startups today that have raised less than $75M of assets could end up with two "classes" of QSBS eligibility for employee shares. Shares granted after July 4 would have a $15M exemption & partial QSBS

Jonny (@hsu_jonny) 's Twitter Profile Photo

Really excited to share that we’re in the Y Combinator summer batch! I’m even more excited to be teaming up with Phil Fradkin and Ian Shi on this next chapter. At Blank Bio, we're building the next generation of foundation models for RNA.

Bairu Hou (@hou_bairu) 's Twitter Profile Photo

Just describe your task (and optionally the input) — our method dynamically prunes the LLM into a smaller one that best suits the task/input and gets it ready for inference in just 0.1 seconds! We call it "instruction-following" model pruning. Check out our #ICML2025 paper,

Ed Turner (@edturner42) 's Twitter Profile Photo

1/6: Emergent misalignment (EM) is when you train on e.g. bad medical advice and the LLM becomes generally evil.

We've studied how; this update explores why.

Can models just learn to give bad advice? Yes, easily with regularisation. But it’s less stable than general evil! Thus EM