Rebecca Qian (@rebeccatqian) Twitter Tweets • TwiCopy


Nikhil Abraham

We taught a robot to cook Michelin-quality dishes. Meet Zippy, the robotic chef: 🧑‍🍳 Already cooking for Michelin-star chefs 🧠 Culinary brain pre-trained on 5M+ multi-modal recipes 🍳 Learns new recipes from a single demonstration 🔥 Plug n play for commercial kitchens 1/

4.4K likes · 333 replies · 592 retweets

Matt Hartman

@matthartman

4 months ago

If you are building or building *with* open source AI, join the first open-source AI conference, live streamed April 16th with speakers from the top open-source companies in the world

79 likes · 2 replies · 18 retweets

PatronusAI

@patronusai

4 months ago

Great to see Databricks use our eval benchmark FinanceBench to evaluate their new finetuning method TAO! ⚡ Test-time Adaptive Optimization (TAO) is a new finetuning method for reference-free use cases, i.e. it doesn't need labels to work, in contrast to SFT. It uses test-time

20 likes · 1 reply · 3 retweets

PatronusAI

@patronusai

3 months ago

We are hosting a legal AI hackathon with Stanford University on Sunday! Thrilled to be sponsoring this event with Thomson Reuters, Bloomberg Law, LlamaIndex 🦙, and more. Come stop by our booth to say hi and see our product in action 🎉 And no, this is not an April Fools joke :) RSVP here:

6 likes · 0 replies · 2 retweets

Rebecca Qian

@rebeccatqian

3 months ago

New agent benchmark 👀 we all have moments where we remember scenes but can’t recall the movie name, or picture the scenery but can’t remember the location. BLUR evaluates agent abilities to perform tip-of-the-tongue search and reasoning!

1 like · 0 replies · 0 retweets

Rebecca Qian

@rebeccatqian

3 months ago

HF leaderboard is live!! I’ve gotten a lot of requests to eval models on BLUR over the past few hours. You can now submit your models 🏆 huggingface.co/spaces/Patronu…

4 likes · 0 replies · 0 retweets

Annie Franco

@anniefranco

3 months ago

Building good benchmarks is hard, and PatronusAI has released what may be the coolest agent eval yet: ✅ Realistic and objectively useful task ✅ Multilingual, multimodal, and multi-domain ✅ Easy for humans, still challenging for agents

6 likes · 1 reply · 4 retweets

Annie Franco

@anniefranco

3 months ago

My colleague Chris McConnell and I greatly enjoyed seeing Sky CH. Wang, Darshan Deshpande, Rebecca Qian, and Anand Kannappan bring this project to life. We’re excited to finally see it out in the world, and look forward to collaborating on the next one!

4 likes · 1 reply · 2 retweets

Rebecca Qian

@rebeccatqian

2 months ago

Welcome Varun Gangal to PatronusAI 🚀🚀 excited to work on eval research together

5 likes · 1 reply · 0 retweets

Avanika Narayan

@avanika15

2 months ago

can you chat privately with a cloud llm—*without* sacrificing speed? excited to release minions secure chat: an open-source protocol for end-to-end encrypted llm chat with <1% latency overhead (even @ 30B+ params!). cloud providers can’t peek—messages decrypt only inside a

245 likes · 13 replies · 57 retweets

PatronusAI

@patronusai

2 months ago

1/ 🔥🔥 Big news: We’re launching Percival, the first AI agent that can evaluate and fix other AI agents! 🤖 Percival is an evaluation agent that doesn’t just detect failures in agent traces — it can fix them. Percival outperformed SOTA LLMs by 2.9x on the TRAIL dataset,

119 likes · 15 replies · 37 retweets

Darshan Deshpande

@getdarshan

2 months ago

Non-deterministic trajectories need autonomous supervision. Introducing Percival, a SoTA system to detect issues with long context agentic problems and suggest fixes to systems. The time to make a move towards autonomous evaluations is now! 🔥

10 likes · 1 reply · 4 retweets

Clémentine Fourrier 🍊

@clefourrier

2 months ago

To make sure your AI agent is not bullshitting you, you need to evaluate its reasoning... but to do so automatically, you need an LLM... 🤔so how do you evaluate the trace evaluator? With TRAIL, which contains: - a full taxonomy of agent errors and most frequent failure cases,

55 likes · 3 replies · 8 retweets

Clémentine Fourrier 🍊

@clefourrier

2 months ago

Check out the very cool work from our friends PatronusAI 🔥 work here! huggingface.co/spaces/Patronu…

17 likes · 1 reply · 7 retweets

Vincent Liu

@vincentjliu

2 months ago

The future of robotics isn't in the lab – it's in your hands. Can we teach robots to act in the real world without a single robot demonstration? Introducing EgoZero. Train real-world robot policies from human-first egocentric data. No robots. No teleop. Just Aria glasses and

165 likes · 15 replies · 40 retweets

Ahmad Beirami @ ICLR 2025

@abeirami

a month ago

After three incredible years, today is my last day at Google DeepMind! I am truly grateful to the amazing colleagues who made the journey 1000x more fruitful and enjoyable! I am forever indebted to my collaborators who showed me how to be better at everything via demonstrations.

659 likes · 37 replies · 10 retweets

Victor Sanh

@sanhestpasmoi

a month ago

🔥Big exciting news - I've started a new company! 🚀 We are building AI agents that take actions in the real world by orchestrating the movement of physical goods. We're working with our first partners and are now growing the founding engineering team. We're building in NYC,

337 likes · 37 replies · 34 retweets

PatronusAI

@patronusai

22 days ago

Thank you, Professor zhou Yu and Berkeley Summit House, for the AI Agents in Action: Industry × Academia Exchange! Rebecca Qian, our CTO, was on a panel with Vinay Rao (Advisor at Anthropic), Shunyu Yao (Research Scientist at OpenAI), Robert Parker (Founder of Perceptix),

10 likes · 0 replies · 2 retweets
