Liana (@lianapatel_)'s Twitter Profile
Liana

@lianapatel_

CS PhD student @Stanford, building lotus-data.github.io for fast and easy LLM-powered data processing

ID: 1355177355896115201

Link: https://liana313.github.io

Joined: 29-01-2021 15:34:05

117 Tweets

900 Followers

217 Following

Jure Leskovec (@jure)

🚀 Excited to share POPPER --- Automated Hypothesis Validation with Agentic Sequential Falsifications. Looking forward to seeing how the community builds on this! 🚀💡 🔗 GitHub: github.com/snap-stanford/… 📄 Paper: arxiv.org/abs/2502.09858 Huge thanks to the incredible team behind

Sid Jha (@sid_jha1)

Happy to see many new integrations being built inside LOTUS! We hope that it makes writing LM programs even faster 🚀

NovaSky (@novaskyai)

1/8 🚀 Introducing S*: Test-Time Scaling for Code Generation, start of our releases in the coding domain @NovaSkyAI. S* enables (1) non-reasoning models surpass reasoning models: GPT-4o-mini + S* > o1-preview. (2) open models compete SOTA: R1-Distilled-32B + S* ~= o1 (high).

Simon Guo 🦝 (@simonguozirui)

LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇

Lakshya A Agrawal (@lakshyaaagrawal)

🧵Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs! We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.

Ankush Singal (@andysingal)

Struggling with complex data queries? Enter LOTUS! 🤯 This new engine combines LLMs with databases for powerful, AI-driven insights. Think Text2SQL on steroids! @lianapatel_ @Medium link: medium.com/ai-artistry/lo…

Lakshya A Agrawal (@lakshyaaagrawal)

One of the main goals I had while building out multilspy (aka.ms/multilspy) was that eventually LLMs will be able to tool call LSPs. Happy to see steps in this direction: Checkout MultilspyMCP (playbooks.com/mcp/asimihsan-…), which provides an mcp implementation over multilspy!

Azalia Mirhoseini (@azaliamirh)

In Large Language Monkeys, we showed the scaling laws of inference-time compute with repeated sampling--the power law relationship between the number of repeated attempts and the fraction of problems solved! The following amazing work theoretically proves the necessary and

Melissa Pan (@melissapan)

🚨 Why Do Multi-Agent LLM Systems Fail? ⁉️ 🔥 Introducing MAST: The first multi-agent failure taxonomy - consists of 14 failure modes and 3 categories, generalizes for diverse multi-agent systems and tasks! Paper: arxiv.org/pdf/2503.13657 Code: github.com/multi-agent-sy… 🧵1/n

Azalia Mirhoseini (@azaliamirh)

Excited to release SWiRL: A synthetic data generation and multi-step RL approach for reasoning and tool use! With SWiRL, the model’s capability generalizes to new tasks and tools. For example, a model trained to use a retrieval tool to solve multi-hop knowledge-intensive

Rose (@rose_e_wang)

I defended my PhD from Stanford CS Stanford NLP Group 🌲 w/ Stanford CS first all-female committee!! My dissertation focused on AI methods, evaluations & interventions to improve Education. So much gratitude for the support & love - and SO excited for the next chapter!!!! 🥳

Omar Khattab (@lateinteraction)

So many things in the run-up to DSPy 3. Here's a first, EXPERIMENTAL one: 🚨We're releasing dspy.GRPO, an online RL optimizer for DSPy programs Your DSPy code as-is can be dspy.GRPO'ed. Yes, even compound multi-module programs. Led by Noah Ziems Lakshya A Agrawal dilara.

Qinan Yu (@qinan_yu)

🎀 fine-grained, interpretable representation steering for LMs! meet RePS — Reference-free Preference Steering! 1⃣ outperforms existing methods on 2B-27B LMs, nearly matching prompting 2⃣ supports both steering and suppression (beat system prompts!) 3⃣ jailbreak-proof (1/n)

Jordan Juravsky (@jordanjuravsky)

Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for high-throughput workloads with large and small models. (Joint work with Ayush Chakravarthy, Ryan Ehrlich, Sabri Eyuboglu, Bradley Brown, Joseph Shetaye,
