Rebecca Qian (@rebeccatqian)'s Twitter Profile
Rebecca Qian

@rebeccatqian

llm evals @PatronusAI, previously research eng @MetaAI

ID: 1516652840600719363

20-04-2022 05:40:30

184 Tweets

682 Followers

354 Following

Nikhil Abraham (@nikhilabm) 's Twitter Profile Photo

We taught a robot to cook Michelin-quality dishes. Meet Zippy, the robotic chef: 🧑‍🍳 Already cooking for Michelin-star chefs 🧠 Culinary brain pre-trained on 5M+ multi-modal recipes 🍳 Learns new recipes from a single demonstration 🔥 Plug n play for commercial kitchens 1/

Matt Hartman (@matthartman) 's Twitter Profile Photo

If you are building or building *with* open source AI, join the first open-source AI conference, live streamed April 16th with speakers from the top open-source companies in the world

PatronusAI (@patronusai) 's Twitter Profile Photo

Great to see Databricks use our eval benchmark FinanceBench to evaluate their new finetuning method TAO! ⚡ Test-time Adaptive Optimization (TAO) is a new finetuning method for reference-free use cases, i.e. it doesn't need labels to work, in contrast to SFT. It uses test-time

PatronusAI (@patronusai) 's Twitter Profile Photo

We are hosting a legal AI hackathon with Stanford University on Sunday! Thrilled to be sponsoring this event with Thomson Reuters, Bloomberg Law, LlamaIndex 🦙, and more. Come stop by our booth to say hi and see our product in action 🎉 And no, this is not an April Fools joke :) RSVP here:

Rebecca Qian (@rebeccatqian) 's Twitter Profile Photo

New agent benchmark 👀 we all have moments where we remember scenes but can’t recall the movie name, or picture the scenery but can’t remember the location. BLUR evaluates agent abilities to perform tip-of-the-tongue search and reasoning!

Rebecca Qian (@rebeccatqian) 's Twitter Profile Photo

HF leaderboard is live!! I’ve gotten a lot of requests to eval models on BLUR over the past few hours. You can now submit your models 🏆 huggingface.co/spaces/Patronu…

Annie Franco (@anniefranco) 's Twitter Profile Photo

Building good benchmarks is hard, and PatronusAI has released what may be the coolest agent eval yet: ✅ Realistic and objectively useful task ✅ Multilingual, multimodal, and multi-domain ✅ Easy for humans, still challenging for agents

Annie Franco (@anniefranco) 's Twitter Profile Photo

My colleague Chris McConnell and I greatly enjoyed seeing Sky CH. Wang Darshan Deshpande Rebecca Qian Anand Kannappan bring this project to life. We’re excited to finally see it out in the world, and look forward to collaborating on the next one!

Avanika Narayan (@avanika15) 's Twitter Profile Photo

can you chat privately with a cloud llm—*without* sacrificing speed? excited to release minions secure chat: an open-source protocol for end-to-end encrypted llm chat with <1% latency overhead (even @ 30B+ params!). cloud providers can’t peek—messages decrypt only inside a

PatronusAI (@patronusai) 's Twitter Profile Photo

1/ 🔥🔥 Big news: We’re launching Percival, the first AI agent that can evaluate and fix other AI agents! 🤖 Percival is an evaluation agent that doesn’t just detect failures in agent traces — it can fix them. Percival outperformed SOTA LLMs by 2.9x on the TRAIL dataset,

Darshan Deshpande (@getdarshan) 's Twitter Profile Photo

Non-deterministic trajectories need autonomous supervision. Introducing Percival, a SoTA system to detect issues with long context agentic problems and suggest fixes to systems. The time to make a move towards autonomous evaluations is now! 🔥

Clémentine Fourrier 🍊 (@clefourrier) 's Twitter Profile Photo

To make sure your AI agent is not bullshitting you, you need to evaluate its reasoning... but to do so automatically, you need an LLM... 🤔so how do you evaluate the trace evaluator? With TRAIL, which contains: - a full taxonomy of agent errors and most frequent failure cases,

Vincent Liu (@vincentjliu) 's Twitter Profile Photo

The future of robotics isn't in the lab – it's in your hands. Can we teach robots to act in the real world without a single robot demonstration? Introducing EgoZero. Train real-world robot policies from human-first egocentric data. No robots. No teleop. Just Aria glasses and

Ahmad Beirami @ ICLR 2025 (@abeirami) 's Twitter Profile Photo

After three incredible years, today is my last day at Google DeepMind! I am truly grateful to the amazing colleagues who made the journey 1000x more fruitful and enjoyable! I am forever indebted to my collaborators who showed me how to be better at everything via demonstrations.

Victor Sanh (@sanhestpasmoi) 's Twitter Profile Photo

🔥Big exciting news - I've started a new company! 🚀 We are building AI agents that take actions in the real world by orchestrating the movement of physical goods. We're working with our first partners and are now growing the founding engineering team. We're building in NYC,

PatronusAI (@patronusai) 's Twitter Profile Photo

Thank you, Professor Zhou Yu and Berkeley Summit House, for the AI Agents in Action: Industry × Academia Exchange! Rebecca Qian, our CTO, was on a panel with Vinay Rao (Advisor at Anthropic), Shunyu Yao (Research Scientist at OpenAI), Robert Parker (Founder of Perceptix),
