Shivam Garg (@shivamg_13)'s Twitter Profile
Shivam Garg

@shivamg_13

Researcher @MSFTResearch. Previously postdoc @Harvard and PhD student @Stanford.

ID: 128799170

Joined: 02-04-2010 06:33:50

34 Tweets

247 Followers

252 Following

Nikhil Vyas (@vyasnikhil96):

1/n A technical thread on our results in arxiv.org/pdf/2406.17748, connecting the Shampoo optimizer and the optimal Kronecker-product approximation of the Adagrad (or Hessian) preconditioner.
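
Loosely, Shampoo tracks two small factors, L as the sum of G Gᵀ and R as the sum of Gᵀ G, whose Kronecker product stands in for the full Adagrad second-moment matrix over vec(G). The NumPy sketch below is a toy illustration of that relationship only; sizes, names, and the scalar-fitting step are my own assumptions, not the paper's code or its optimality statement:

```python
# Toy sketch, NumPy only. Illustrative names; not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
m, n, T = 4, 3, 200

A = np.zeros((m * n, m * n))   # full Adagrad second-moment accumulator over vec(G)
L = np.zeros((m, m))           # Shampoo left factor:  sum_t G_t @ G_t.T
R = np.zeros((n, n))           # Shampoo right factor: sum_t G_t.T @ G_t

for _ in range(T):
    G = rng.standard_normal((m, n))   # stand-in gradient matrix
    g = G.reshape(-1)                 # row-major vec(G)
    A += np.outer(g, g)
    L += G @ G.T
    R += G.T @ G

K = np.kron(L, R)                     # Kronecker-structured surrogate for A
c = np.vdot(K, A) / np.vdot(K, K)     # best scalar rescaling of K toward A
err = np.linalg.norm(A - c * K) / np.linalg.norm(A)
print(f"relative error of rescaled kron(L, R) vs full Adagrad matrix: {err:.3f}")
```

The thread's actual result is sharper than this demo: it characterizes the optimal Kronecker approximation itself; see the linked PDF.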

Besmira Nushi 💙💛 (@besanushi):

Excited to announce the release of Eureka, an open-source framework for evaluating and understanding large foundation models! 🌟

Eureka offers:
🔍 In-depth analysis of 12 cutting-edge models
🧠 Multimodal & language capability testing beyond single-score reporting and rankings 📈
John Langford (@johnclangford):

New reqs for low- to high-level researcher positions: jobs.careers.microsoft.com/global/en/job/…, jobs.careers.microsoft.com/global/en/job/…, jobs.careers.microsoft.com/global/en/job/…, jobs.careers.microsoft.com/global/en/job/…, with postdocs from Akshay and Miro Dudik x.com/MiroDudik/stat… . Please apply or pass to those who may :-)

Besmira Nushi 💙💛 (@besanushi):

💡Eureka Insight Day 5: This is a meta point about measurements in Eureka ML Insights. Several datasets in Eureka-Bench were procedurally/dynamically generated, including: GeoMeter, Image Understanding, Vision Language Understanding, Kitab, and Toxigen. This means that it is

Dimitris Papailiopoulos (@dimitrispapail):

LLMs Can In-context Learn Multiple Tasks in Superposition

We explore a bizarre LLM superpower that allows them to solve multiple ICL tasks in parallel.

This is related to the view of them as simulators in superposition (cf. j⧉nus, @repligate)

arxiv.org/pdf/2410.05603
1/n
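
To make the setup concrete, here is a toy prompt-construction sketch; the tasks and format are my own illustration, not the paper's exact protocol:

```python
# Toy illustration only: two made-up rules mixed into one few-shot prompt.
task_a = [("cat", "CAT"), ("dog", "DOG")]      # rule A: uppercase
task_b = [("cat", "chat"), ("dog", "chien")]   # rule B: English -> French

def build_superposed_prompt(examples_a, examples_b, query):
    """Interleave demonstrations from both rules, then query without
    saying which rule to apply."""
    lines = []
    for (xa, ya), (xb, yb) in zip(examples_a, examples_b):
        lines.append(f"{xa} -> {ya}")
        lines.append(f"{xb} -> {yb}")
    lines.append(f"{query} ->")
    return "\n".join(lines)

print(build_superposed_prompt(task_a, task_b, "bird"))
# The paper's claim, roughly: the model's distribution over completions of
# "bird ->" puts weight on *both* answers ("BIRD" and "oiseau"), i.e. the
# tasks are computed in parallel rather than one being picked and the
# other discarded.
```
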
Ahmed Awadallah (@ahmedhawadallah):

Developing capable AI agents is one of the most interesting problems to work on in AI right now. Excited to share some of what we are building in the open! #OmniParser, #AutoGen and more to come soon!

AutoGen (@pyautogen):

📢Introducing Magentic-One, a generalist 5-agent multi-agent system for solving open-ended web- and file-based tasks. 🤖🤖🤖🤖🤖

Magentic-One represents a significant step towards agents that can complete tasks that people encounter in their daily lives and can achieve strong
Andrew Ilyas (@andrew_ilyas):

Machine unlearning ("removing" training data from a trained ML model) is a hard, important problem. Datamodel Matching (DMM): a new unlearning paradigm with strong empirical performance! w/ Kristian Georgiev, Roy Rinberg, Sam Park, Shivam Garg, Aleksander Madry, Seth Neel (1/4)
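
The tweet doesn't spell out the mechanism, so the sketch below is only my reading of the datamodel-matching idea, with every name, shape, and number invented for illustration: a linear datamodel predicts the outputs of a model retrained without the forget set, and those predictions become the targets the existing model is fine-tuned toward.

```python
# Hedged sketch of the idea only; names, shapes, and numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
num_eval, num_train = 5, 10
forget = [7, 8, 9]                               # indices to "unlearn"

# A (pretend) linear datamodel: output on each eval point ~= theta @ mask,
# where mask[i] = 1 if training point i was included.
theta = rng.standard_normal((num_eval, num_train)) * 0.1

mask = np.ones(num_train)
mask[forget] = 0.0                               # drop the forget set

# Step 1: predict what a model retrained WITHOUT the forget set would output.
targets = theta @ mask

# Step 2 (not shown): fine-tune the already-trained model so its outputs on
# the eval points match `targets`, instead of actually retraining from scratch.
print(targets)
```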

Ahmed Awadallah (@ahmedhawadallah):

Synthetic data is becoming essential for training and fine-tuning models, but there’s a lot we still need to learn about best practices for generating, evaluating, and using it effectively. To support this research, we’re excited to release **orca-agentinstruct-1M**—a fully

Dimitris Papailiopoulos (@dimitrispapail):

What is reasoning? Do LLMs use it? Does it help? Is o1 really that much better than Sonnet? How do you even measure all that? MSR AI Frontiers is working to figure it all out, and we're looking for interns to work on evals to better understand LLMs. Please apply!! Link below:

Dimitris Papailiopoulos (@dimitrispapail):

How do you train reasoning models? What's the role of verifiers, RL, and synth data generation? How do these fit in multi-agent workflows? To find out, come join us for an internship at MSR AI Frontiers. Link below :D

John Langford (@johnclangford):

A new post: Headroom for AI Development hunch.net/?p=13763046. It's quite interesting to compare biological and silicon capabilities.

Daniel Litt (@littmath):

In this thread I want to share some thoughts about the FrontierMath benchmark, on which, according to OpenAI, some frontier models are scoring ~20%. This is a benchmark consisting of difficult math problems with numerical answers. What does it measure, and what doesn't it measure?

John Langford (@johnclangford):

The Belief State Transformer edwardshu.com/bst-website/ is at ICLR this week. The BST objective efficiently creates compact belief states: summaries of the past sufficient for all future predictions. See the short talk: microsoft.com/en-us/research… and mgostIH for further discussion.
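
As a rough picture of the objective (my sketch, not the paper's architecture: the encoders below are GRU stand-ins for transformers and all sizes are toy), a forward encoder summarizes the prefix, a backward encoder summarizes the suffix, and one head predicts both the next token after the prefix and the previous token before the suffix:

```python
# Toy sketch of a belief-state-style objective; details are assumptions.
import torch
import torch.nn as nn

class ToyBST(nn.Module):
    def __init__(self, vocab, dim):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.fwd = nn.GRU(dim, dim, batch_first=True)  # stand-in forward encoder
        self.bwd = nn.GRU(dim, dim, batch_first=True)  # reads the suffix right-to-left
        self.head = nn.Linear(2 * dim, 2 * vocab)      # joint next/prev-token logits

    def forward(self, prefix, suffix):
        _, f = self.fwd(self.emb(prefix))              # belief state: summary of the past
        _, b = self.bwd(self.emb(suffix.flip(1)))      # summary of the future
        logits = self.head(torch.cat([f[-1], b[-1]], dim=-1))
        return logits.chunk(2, dim=-1)                 # (next_logits, prev_logits)

vocab, dim, i, j = 50, 32, 4, 8
model = ToyBST(vocab, dim)
seq = torch.randint(0, vocab, (16, 12))                # batch of toy sequences
next_logits, prev_logits = model(seq[:, :i], seq[:, j:])
# predict the token right after the prefix and the token right before the suffix
loss = nn.functional.cross_entropy(next_logits, seq[:, i]) \
     + nn.functional.cross_entropy(prev_logits, seq[:, j - 1])
print(loss.item())
```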

Dimitris Papailiopoulos (@dimitrispapail):

We’ve been cooking... a new open weights 14B Phi-4 reasoning model, SFT’d on ~1.4M carefully curated reasoning demonstrations from o3-mini and RL’d for a tiny bit. This model is a little beast.

AutoGen (@pyautogen):

🚀 Introducing Magentic-UI, an experimental human-centered web agent from Microsoft Research. It automates your web tasks while keeping you in control 🧠🤝 through co-planning, co-tasking, action guards, and plan learning. 🔓 Fully open-source. We can't wait for you to try it. 🔗

Andrew Ilyas (@andrew_ilyas):

“How will my model behave if I change the training data?”

Recent(-ish) work w/ Logan Engstrom (@logan_engstrom): we nearly *perfectly* predict ML model behavior as a function of training data, saturating benchmarks for this problem (called "data attribution").
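
A toy version of that prediction problem (assumptions mine: a linear datamodel and a synthetic additive ground truth standing in for a real trained model):

```python
# Toy stand-in: "model behavior" is a noisy additive function of which
# training points were included, and we fit a linear datamodel to it.
import numpy as np

rng = np.random.default_rng(0)
N, trials = 20, 300                      # training points, resampled runs

true_influence = rng.standard_normal(N) * 0.5          # synthetic ground truth

masks = (rng.random((trials, N)) < 0.5).astype(float)  # random 50% subsets
outputs = masks @ true_influence + rng.normal(0, 0.1, trials)  # observed behavior

# Fit the datamodel by least squares: outputs ~= masks @ theta
theta, *_ = np.linalg.lstsq(masks, outputs, rcond=None)

new_mask = (rng.random(N) < 0.5).astype(float)          # an unseen subset
print(f"predicted {new_mask @ theta:.3f} vs actual {new_mask @ true_influence:.3f}")
```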