Vijay Karunamurthy (@vjkaruna)'s Twitter Profile
Vijay Karunamurthy

@vjkaruna

Field CTO @scale_AI. Apple, Google, YouTube.

ID: 4413741

12-04-2007 21:44:28

3.3K Tweets

2.2K Followers

492 Following

Tech Entrepreneurs Association of Mumbai (TEAM) (@mumbai_tech_)

Vijay Karunamurthy isn't just in AI; he's shaping it. 🤖 As Field CTO at Scale AI, he's instrumental in advancing AI across industries. This former Google and YouTube engineer (think personalized human interfaces!) co-founded AVOS Systems and now leads the charge in data labeling,

Scale AI (@scale_ai)

New SEAL Leaderboard: EnigmaEval, a benchmark assessing models’ puzzle-solving capabilities. o1 leads. scl.ai/enigma-eval EnigmaEval is part of a new class of extremely challenging benchmarks that expose current models' limitations. The benchmark comprises 1184 puzzles

Vijay Karunamurthy (@vjkaruna)

Great talking to Alex Kantrowitz on the Big Technology substack, on the challenge ahead for building human-level AI for the real economy. bigtechnology.com/p/scale-ai-cto…

Alexandr Wang (@alexandr_wang)

🚨 Gemini 2.5 Pro Exp dropped and it's now #1 across SEAL leaderboards: 🥇 Humanity’s Last Exam 🥇 VISTA (multimodal) 🥇 (tie) Tool Use 🥇 (tie) MultiChallenge (multi-turn) 🥉 (tie) Enigma (puzzles) Congrats to Demis Hassabis Sundar Pichai & team! 🔗 scale.com/leaderboard

Graham Neubig (@gneubig)

We're happy to have Scale AI be a gold sponsor of the CMU agent workshop! The workshop is coming up soon 4/10-11, so please sign up if you're interested in agents and can come to the workshop: cmu-agent-workshop.github.io

Daniel Berrios (@danielxberrios)

🎉 Excited to share more about the work we’ve been doing at Scale AI to help AI labs: ✅ evaluate model performance ✅ analyze weaknesses ✅ drive targeted improvements Thank you to Will Knight WIRED for covering!

Summer Yue (@summeryue0)

If a model lies when pressured—it’s not ready for AGI. The new MASK leaderboard is live. Built on the private split of our open-source honesty benchmark (w/ Center for AI Safety), it tests whether models lie under pressure—even when they know better. 📊 Leaderboard:

Vijay Karunamurthy (@vjkaruna)

“The five-year deal will include creating an AI personalized learning platform and AI teacher assistant.. and work with governments in Asia and Europe could account for a significant piece of sales.” Overview of our work empowering global citizens: cnbc.com/2025/04/16/sca…

Nathaniel Li (@natliml)

We're releasing a multimodal benchmark for troubleshooting complex virology protocols. Expert virologists score only 22.1%, even with internet access and answering questions in their subdomains of expertise. Frontier LLMs score up to 43.8%.

Scale AI (@scale_ai)

Navigating the agent landscape as an enterprise is tricky as it grows more complex. Enterprise agents need more than reasoning—they need precision, feedback loops, and to live inside well-designed products

Siebel School of Computing and Data Science (@illinoiscds)

"Remember what Illinois has already taught you, take those calculated risks, question your assumptions, and pivot without fear when reality doesn't match the plan." - From Steve Chen's 2025 commencement speech ▶️ bit.ly/43Wdjrt
Vijay Karunamurthy (@vjkaruna)

“How do we create the right virtual environments for agents to act in… so we know when to intervene [and] escalatory thresholds.” Great deep-dive with Scale AI's Christina Knight on Lawfare Daily! podcasts.apple.com/us/podcast/the…

Chen Bo Calvin Zhang (@calvincbzhang)

New Scale AI research in collaboration with Anthropic introduces SHADE-Arena, a benchmark to test for AI sabotage. SHADE-Arena evaluates an AI agent's ability to complete a task while secretly pursuing a harmful objective, all while being watched by an AI monitor. 🧵