Liana (@lianapatel_) Twitter Tweets • TwiCopy

Jure Leskovec

🚀 Excited to share POPPER: Automated Hypothesis Validation with Agentic Sequential Falsifications. Looking forward to seeing how the community builds on this! 🚀💡 🔗 GitHub: github.com/snap-stanford/… 📄 Paper: arxiv.org/abs/2502.09858 Huge thanks to the incredible team behind

Likes: 60 · Replies: 8 · Reposts: 12

Sid Jha

@sid_jha1

5 months ago

Happy to see many new integrations being built inside LOTUS! We hope that it makes writing LM programs even faster 🚀

Likes: 9 · Replies: 0 · Reposts: 1

Melissa Pan

@melissapan

5 months ago

Checkout LOTUS 1.1.0 - a bunch of new features added! 🔥🚀

Likes: 19 · Replies: 0 · Reposts: 2

NovaSky

@novaskyai

5 months ago

1/8 🚀 Introducing S*: Test-Time Scaling for Code Generation, the start of our releases in the coding domain from NovaSky (@NovaSkyAI). S* enables (1) non-reasoning models to surpass reasoning models: GPT-4o-mini + S* > o1-preview; (2) open models to compete with SOTA: R1-Distilled-32B + S* ~= o1 (high).

Likes: 205 · Replies: 7 · Reposts: 49

Simon Guo 🦝

@simonguozirui

5 months ago

LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇

Likes: 305 · Replies: 9 · Reposts: 68

Lakshya A Agrawal

@lakshyaaagrawal

4 months ago

🧵Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs! We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.

Likes: 142 · Replies: 3 · Reposts: 42

Ankush Singal

@andysingal

4 months ago

Struggling with complex data queries? Enter LOTUS! 🤯 This new engine combines LLMs with databases for powerful, AI-driven insights. Think Text2SQL on steroids! Liana Medium link: medium.com/ai-artistry/lo…

Likes: 4 · Replies: 1 · Reposts: 2

Lakshya A Agrawal

@lakshyaaagrawal

4 months ago

One of the main goals I had while building out multilspy (aka.ms/multilspy) was that eventually LLMs will be able to tool call LSPs. Happy to see steps in this direction: Checkout MultilspyMCP (playbooks.com/mcp/asimihsan-…), which provides an mcp implementation over multilspy!

Likes: 90 · Replies: 1 · Reposts: 21

Azalia Mirhoseini

@azaliamirh

3 months ago

In Large Language Monkeys, we showed the scaling laws of inference-time compute with repeated sampling--the power law relationship between the number of repeated attempts and the fraction of problems solved! The following amazing work theoretically proves the necessary and

Likes: 170 · Replies: 0 · Reposts: 33

Jared Quincy Davis

@jaredq_

3 months ago

Ember: an inference-time scaling architecture framework 🧵 (1/8)

Likes: 276 · Replies: 10 · Reposts: 106

Melissa Pan

@melissapan

3 months ago

🚨 Why Do Multi-Agent LLM Systems Fail? ⁉️ 🔥 Introducing MAST: The first multi-agent failure taxonomy - consists of 14 failure modes and 3 categories, generalizes for diverse multi-agent systems and tasks! Paper: arxiv.org/pdf/2503.13657 Code: github.com/multi-agent-sy… 🧵1/n

Likes: 186 · Replies: 4 · Reposts: 54

Azalia Mirhoseini

@azaliamirh

3 months ago

Excited to release SWiRL: A synthetic data generation and multi-step RL approach for reasoning and tool use! With SWiRL, the model’s capability generalizes to new tasks and tools. For example, a model trained to use a retrieval tool to solve multi-hop knowledge-intensive

Likes: 390 · Replies: 3 · Reposts: 74

Rose

@rose_e_wang

2 months ago

I defended my PhD from Stanford CS Stanford NLP Group 🌲 w/ Stanford CS first all-female committee!! My dissertation focused on AI methods, evaluations & interventions to improve Education. So much gratitude for the support & love - and SO excited for the next chapter!!!! 🥳

Likes: 748 · Replies: 52 · Reposts: 43

Omar Khattab

@lateinteraction

2 months ago

So many things in the run-up to DSPy 3. Here's a first, EXPERIMENTAL one: 🚨We're releasing dspy.GRPO, an online RL optimizer for DSPy programs Your DSPy code as-is can be dspy.GRPO'ed. Yes, even compound multi-module programs. Led by Noah Ziems Lakshya A Agrawal dilara.

Likes: 571 · Replies: 23 · Reposts: 77

Qinan Yu

@qinan_yu

2 months ago

🎀 fine-grained, interpretable representation steering for LMs! meet RePS — Reference-free Preference Steering! 1⃣ outperforms existing methods on 2B-27B LMs, nearly matching prompting 2⃣ supports both steering and suppression (beat system prompts!) 3⃣ jailbreak-proof (1/n)

Likes: 212 · Replies: 1 · Reposts: 35

Jordan Juravsky

@jordanjuravsky

a month ago

Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for high-throughput workloads with large and small models. (Joint work with Ayush Chakravarthy, Ryan Ehrlich, Sabri Eyuboglu, Bradley Brown, Joseph Shetaye,

Likes: 168 · Replies: 3 · Reposts: 38
