Rishub Jain (@shubadubadub) 's Twitter Profile
Rishub Jain

@shubadubadub

Research Engineer at @GoogleDeepMind, currently working on Safe+Ethical AI

ID: 370444519

linkhttp://rishubjain.github.io calendar_today09-09-2011 01:27:44

74 Tweet

235 Followers

421 Following

David Lindner (@davlindner) 's Twitter Profile Photo

New Google DeepMind safety paper! LLM agents are coming – how do we stop them finding complex plans to hack the reward? Our method, MONA, prevents many such hacks, *even if* humans are unable to detect them! Inspired by myopic optimization but better performance – details in🧵

New Google DeepMind safety paper! LLM agents are coming – how do we stop them finding complex plans to hack the reward?

Our method, MONA, prevents many such hacks, *even if* humans are unable to detect them!

Inspired by myopic optimization but better performance – details in🧵
David Lindner (@davlindner) 's Twitter Profile Photo

Want to join one of the best AI safety teams in the world? We're hiring Google DeepMind! We have open positions for research engineers and research scientists in the AGI Safety & Alignment and Gemini Safety teams. Locations: London, Zurich, New York, Mountain View and SF

Rohin Shah (@rohinmshah) 's Twitter Profile Photo

We're hiring! Join an elite team that sets an AGI safety approach for all of Google -- both through development and implementation of the Frontier Safety Framework (FSF), and through research that enables a future stronger FSF.

We're hiring! Join an elite team that sets an AGI safety approach for all of Google -- both through development and implementation of the Frontier Safety Framework (FSF), and through research that enables a future stronger FSF.
Arthur Conmy (@arthurconmy) 's Twitter Profile Photo

We are hiring Applied Interpretability researchers on the GDM Mech Interp Team!🧵 If interpretability is ever going to be useful, we need it to be applied at the frontier. Come work with Neel Nanda, the Google DeepMind AGI Safety team, and me: apply by 28th February as a

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

AGI could revolutionize many fields - from healthcare to education - but it's crucial that it’s developed responsibly. Today, we’re sharing how we’re thinking about safety and security on the path to AGI. → goo.gle/3R08XcD

Rohin Shah (@rohinmshah) 's Twitter Profile Photo

Just released GDM’s 100+ page approach to AGI safety & security! (Don’t worry, there’s a 10 page summary.) AGI will be transformative. It enables massive benefits, but could also pose risks. Responsible development means proactively preparing for severe harms before they arise.

Just released GDM’s 100+ page approach to AGI safety & security! (Don’t worry, there’s a 10 page summary.)

AGI will be transformative. It enables massive benefits, but could also pose risks. Responsible development means proactively preparing for severe harms before they arise.
Sophia (@sopharicks) 's Twitter Profile Photo

Thanks to Sophie Bridgers and Rishub Jain for sharing with the BuzzRobot community the Google DeepMind framework on how AI and humans can complement each other and create synergy! Watch the lecture on our YouTube channel: youtu.be/IeXaiCvPM_E

Andreas Terzis (@aterzis) 's Twitter Profile Photo

1/3 🚨 AGI agents are venturing into untrusted territories, but current LLMs face vulnerabilities like prompt injections. How do we ensure their safety? šŸ¤”

Saffron Huang (@saffronhuang) 's Twitter Profile Photo

I have a new piece out in Noema Magazine today with sam manning on how and why we should ensure broad ownership in AI. UBI is not the answer to the threat of automation. We need capital-based approaches (human, productive, financial capital) to mitigate economic/political power

Xiangyu Qi (@xiangyuqi_pton) 's Twitter Profile Photo

Thrilled to know that our paper, `Safety Alignment Should be Made More Than Just a Few Tokens Deep`, received the ICLR 2025 Outstanding Paper Award. We sincerely thank the ICLR committee for awarding one of this year's Outstanding Paper Awards to AI Safety / Adversarial ML.