Vijay Karunamurthy (@vjkaruna) Twitter Tweets

Tech Entrepreneurs Association of Mumbai (TEAM)

7 months ago

Vijay Karunamurthy isn't just in AI; he's shaping it. 🤖 As Field CTO at Scale AI, he's instrumental in advancing AI across industries. This former Google and YouTube engineer (think personalized human interfaces!) co-founded AVOS Systems and now leads the charge in data labeling,

Likes: 3 · Replies: 0 · Reposts: 1

Scale AI

@scale_ai

7 months ago

New SEAL Leaderboard: EnigmaEval, a benchmark assessing models’ puzzle-solving capabilities. o1 leads. scl.ai/enigma-eval EnigmaEval is part of a new class of extremely challenging benchmarks that expose current models' limitations. The benchmark comprises 1184 puzzles

Likes: 22 · Replies: 3 · Reposts: 9

steve chen

@stevechen

7 months ago

Happy 20th birthday, YouTube. How you've grown...!

Likes: 2.2K · Replies: 169 · Reposts: 279

Vijay Karunamurthy

@vjkaruna

6 months ago

Great talking with Summer Yue and Dan Hendrycks about Humanity’s Last Exam, and pushing the frontiers of model evaluation, reasoning and calibration.

Likes: 2 · Replies: 0 · Reposts: 0

Vijay Karunamurthy

@vjkaruna

6 months ago

Great talking to Alex Kantrowitz on the Big Technology substack, on the challenge ahead for building human-level AI for the real economy. bigtechnology.com/p/scale-ai-cto…

Likes: 3 · Replies: 0 · Reposts: 1

Alexandr Wang

@alexandr_wang

5 months ago

🚨 Gemini 2.5 Pro Exp dropped and it's now #1 across SEAL leaderboards: 🥇 Humanity’s Last Exam 🥇 VISTA (multimodal) 🥇 (tie) Tool Use 🥇 (tie) MultiChallenge (multi-turn) 🥉 (tie) Enigma (puzzles) Congrats to Demis Hassabis Sundar Pichai & team! 🔗 scale.com/leaderboard

Likes: 1.1K · Replies: 102 · Reposts: 214

Graham Neubig

@gneubig

5 months ago

We're happy to have Scale AI be a gold sponsor of the CMU agent workshop! The workshop is coming up soon 4/10-11, so please sign up if you're interested in agents and can come to the workshop: cmu-agent-workshop.github.io

Likes: 57 · Replies: 1 · Reposts: 10

Daniel Berrios

@danielxberrios

5 months ago

🎉 Excited to share more about the work we’ve been doing at Scale AI to help AI labs: ✅ evaluate model performance ✅ analyze weaknesses ✅ drive targeted improvements Thank you to Will Knight WIRED for covering!

Likes: 35 · Replies: 5 · Reposts: 7

Vijay Karunamurthy

@vjkaruna

5 months ago

Neat to see a very ‘timely’ idea (optionality and time scales in RL) - published 25 years ago.

Likes: 0 · Replies: 0 · Reposts: 0

Vijay Karunamurthy

@vjkaruna

5 months ago

Excited to be back on campus for the UN AI for Good law track with DLA Piper!

Likes: 2 · Replies: 0 · Reposts: 0

Summer Yue

@summeryue0

5 months ago

If a model lies when pressured—it’s not ready for AGI. The new MASK leaderboard is live. Built on the private split of our open-source honesty benchmark (w/ Center for AI Safety), it tests whether models lie under pressure—even when they know better. 📊 Leaderboard:

Likes: 49 · Replies: 1 · Reposts: 15

Vijay Karunamurthy

@vjkaruna

5 months ago

“The five-year deal will include creating an AI personalized learning platform and AI teacher assistant… and work with governments in Asia and Europe could account for a significant piece of sales.” Overview of our work empowering global citizens: cnbc.com/2025/04/16/sca…

Likes: 1 · Replies: 0 · Reposts: 0

Nathaniel Li

@natliml

4 months ago

We're releasing a multimodal benchmark for troubleshooting complex virology protocols. Expert virologists score only 22.1%, even with internet access and answering questions in their subdomains of expertise. Frontier LLMs score up to 43.8%.

Likes: 13 · Replies: 1 · Reposts: 1

Scale AI

@scale_ai

4 months ago

Navigating the agent landscape as an enterprise is tricky as it grows more complex. Enterprise agents need more than reasoning—they need precision, feedback loops, and to live inside well-designed products

Likes: 14 · Replies: 1 · Reposts: 8

Siebel School of Computing and Data Science

@illinoiscds

3 months ago

"Remember what Illinois has already taught you, take those calculated risks, question your assumptions, and pivot without fear when reality doesn't match the plan." - From Steve Chen's 2025 commencement speech ▶️ bit.ly/43Wdjrt

Likes: 8 · Replies: 0 · Reposts: 2

Vijay Karunamurthy

@vjkaruna

3 months ago

Incredible getting a tour of the Simons Institute for the Theory of Computing at Berkeley this morning - new candidate for a transmon qubit (in a Cal enclosure!)

Likes: 1 · Replies: 0 · Reposts: 0

Vijay Karunamurthy

@vjkaruna

3 months ago

“How do we create the right virtual environments for agents to act in… so we know when to intervene [and] escalatory thresholds.” Great deep-dive with Scale AI’s Christina Knight on Lawfare Daily! podcasts.apple.com/us/podcast/the…

Likes: 2 · Replies: 0 · Reposts: 0

The National Security Institute

@masonnatsec

3 months ago

Fdr and Exec Director Jamil N. Jaffer spoke on a panel at the World Economic Forum Technology Retreat 2025 titled “AI and the Privatization of Sovereignty.” Mod: Cathy Li of Centre for AI Excellence, WEF Co-Panellists: Eileen Donahoe of Global Digital Policy Incubator; Vijay Karunamurthy of Scale AI;

Likes: 2 · Replies: 1 · Reposts: 3

Daniel Levine

@daniel_levine

3 months ago

Thank you Alexandr Wang

Likes: 246 · Replies: 10 · Reposts: 9

Chen Bo Calvin Zhang

@calvincbzhang

2 months ago

New Scale AI research in collaboration with Anthropic introduces SHADE-Arena, a benchmark to test for AI sabotage. SHADE-Arena evaluates an AI agent's ability to complete a task while secretly pursuing a harmful objective, all while being watched by an AI monitor. 🧵

Likes: 53 · Replies: 3 · Reposts: 8