Krista Opsahl-Ong (@kristahopsalong)'s Twitter Profile
Krista Opsahl-Ong

@kristahopsalong

CS PhD student @Stanford @StanfordAILab || Prev @Google @Microsoft

ID: 455142765

Joined: 04-01-2012 19:46:04

59 Tweets

1.1K Followers

493 Following

Ivan Zhou (@ivanzhouyq)

Great to see a collaboration between Andrew Ng, Databricks and DSPy! 🌟 DSPy is a powerful and thoughtful framework. The way it treats an LLM system as a broad search space and optimizes the entire system is very impressive. It is part of the production workflow in

Krista Opsahl-Ong (@kristahopsalong)

Highly recommend this DeepLearning.AI course by Chen Qian as a way to learn about DSPy & how you can leverage it to build & optimize agents! 🤖⭐ Chen did a phenomenal job crafting these lessons with helpful hands-on tutorials. Let us know what you think! 🧩

Krista Opsahl-Ong (@kristahopsalong)

Agent Bricks is officially launched! 🤖🧱 It's been incredibly fun working on these products with the rest of Databricks Mosaic Research & the Databricks engineering team. Excited to see what folks are able to build with them!

Krista Opsahl-Ong (@kristahopsalong)

I’ll be at #ICML this week! ✈️🇨🇦 Excited to chat with folks about DSPy, automatic prompt optimization methods, compound AI systems, and research roles at Databricks (we’re hiring!). If these topics interest you, feel free to reach out or find me at the Databricks booth!

Omar Khattab (@lateinteraction)

The #SIGIR2025 Best Paper just awarded to the WARP engine for fast late interaction! Congrats to Luca Scheerer🎉 WARP was his ETH Zurich MS thesis, completed while visiting us at @StanfordNLP. Incidentally, it's the fifth Paper Award for a ColBERT paper since 2020!* Luca did an

Brando Miranda (@brandohablando)

🚨 Can your LLM really do math—or is it cramming the test set? 📢 Meet Putnam-AXIOM, an advanced mathematics contamination-resilient benchmark that finally hurts FMs. 1. openreview.net/forum?id=kqj2C… 2. icml.cc/virtual/2025/p… #ICML2025 East Exhibition Hall A-B, #E-2502 🧵1/14

Ahmad Beirami @ ICLR 2025 (@abeirami)

There is a rich set of research questions in design and optimization of agentic workflows with a ton of room for theoretical & algorithmic work! A great starting point to get exposed to them is the MIPRO paper (Krista Opsahl-Ong Omar Khattab et al.) and the DSPy framework.

Jonathan Frankle (@jefrankle)

RLVR isn't just for math and coding! At Databricks, it's impacting products and users across domains. One example: SQL Q&A. We hit the top of the BIRD single-model single-generation leaderboard with our standard TAO+RLVR recipe - the one rolling out in our Agent Bricks product.

Michael Bendersky (@bemikelive)

Since joining Databricks, our research team has been hard at work on Agent Bricks, a new product that helps enterprises develop state-of-the-art domain-specific agents. We are now releasing a research blog about Agent Learning from Human Feedback (ALHF) databricks.com/blog/agent-lea…

Matei Zaharia (@matei_zaharia)

Really excited about ALHF, new work from our research team that lets users give natural language feedback to agents and optimizes them for it. It sort of upends the traditional supervision paradigm where you get a scalar reward, and it makes AI more customizable for non-experts.

Alex Trott (@alexrtrott)

Ever wonder what it'd look like if an LLM Judge and a Reward Model had a baby? So did we, which is why we created PGRM -- the Prompt-Guided Reward Model. TLDR: You get the instructability of an LLM judge + the calibration of an RM in a single speedy package (1/n)

Jonathan Frankle (@jefrankle)

Not that I have a favorite recent project, but... 🧵 LLM judges are the popular way to evaluate generative models. But they have drawbacks. They're: * Generative, so slow and expensive. * Nondeterministic. * Uncalibrated. They don't know how uncertain they are. Meet PGRM!

Ivan Zhou (@ivanzhouyq)

Automated prompt optimization (GEPA) can push open-source models beyond frontier performance on enterprise tasks — at a fraction of the cost! 🔑 Key results from our research Databricks Mosaic Research: 1⃣ gpt-oss-120b + GEPA beats Claude Opus 4.1 on Information Extraction (+2.2 points) —

Matei Zaharia (@matei_zaharia)

Prompt optimization is becoming a powerful technique for improving AI that can even beat SFT! Here are some of our research results with GEPA at Databricks, in difficult Agent Bricks info extraction tasks. We can match the best models at 90x lower cost, or improve them by ~6%.
