Shivam Garg (@shivamg_13)'s Twitter Profile
Shivam Garg

@shivamg_13

Researcher @MSFTResearch. Previously postdoc @Harvard and PhD student @Stanford.

ID: 128799170

Joined: 02-04-2010 06:33:50

34 Tweets

247 Followers

252 Following

Nikhil Vyas (@vyasnikhil96):

1/n A technical thread on our results in arxiv.org/pdf/2406.17748, connecting the Shampoo optimizer and the optimal Kronecker-product approximation of the Adagrad (or Hessian) preconditioner.
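
Loosely, Shampoo tracks two small factors, L as the sum of G Gᵀ and R as the sum of Gᵀ G, whose Kronecker product stands in for the full Adagrad second-moment matrix over vec(G). The NumPy sketch below is a toy illustration of that relationship only; sizes, names, and the scalar-fitting step are my own assumptions, not the paper's code or its optimality statement:

```python
# Toy sketch, NumPy only. Illustrative names; not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
m, n, T = 4, 3, 200

A = np.zeros((m * n, m * n))   # full Adagrad second-moment accumulator over vec(G)
L = np.zeros((m, m))           # Shampoo left factor:  sum_t G_t @ G_t.T
R = np.zeros((n, n))           # Shampoo right factor: sum_t G_t.T @ G_t

for _ in range(T):
    G = rng.standard_normal((m, n))   # stand-in gradient matrix
    g = G.reshape(-1)                 # row-major vec(G)
    A += np.outer(g, g)
    L += G @ G.T
    R += G.T @ G

K = np.kron(L, R)                     # Kronecker-structured surrogate for A
c = np.vdot(K, A) / np.vdot(K, K)     # best scalar rescaling of K toward A
err = np.linalg.norm(A - c * K) / np.linalg.norm(A)
print(f"relative error of rescaled kron(L, R) vs full Adagrad matrix: {err:.3f}")
```

The thread's actual result is sharper than this demo: it characterizes the optimal Kronecker approximation itself; see the linked PDF.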

Besmira Nushi 💙💛 (@besanushi):

Excited to announce the release of Eureka, an open-source framework for evaluating and understanding large foundation models! 🌟

Eureka offers:
🔍 In-depth analysis of 12 cutting-edge models
🧠 Multimodal & language capability testing beyond single-score reporting and rankings 📈
John Langford (@johnclangford):

New reqs for low- to high-level researcher positions: jobs.careers.microsoft.com/global/en/job/…, jobs.careers.microsoft.com/global/en/job/…, jobs.careers.microsoft.com/global/en/job/…, jobs.careers.microsoft.com/global/en/job/…, with postdocs from Akshay and Miro Dudik x.com/MiroDudik/stat… . Please apply or pass to those who may :-)

Besmira Nushi 💙💛 (@besanushi):

💡Eureka Insight Day 5: This is a meta point about measurements in Eureka ML Insights. Several datasets in Eureka-Bench were procedurally/dynamically generated, including: GeoMeter, Image Understanding, Vision Language Understanding, Kitab, and Toxigen. This means that it is

Dimitris Papailiopoulos (@dimitrispapail):

LLMs Can In-context Learn Multiple Tasks in Superposition

We explore a bizarre LLM superpower that allows them to solve multiple ICL tasks in parallel.

This is related to the view of them as simulators in superposition (cf. j⧉nus, @repligate)

arxiv.org/pdf/2410.05603
1/n
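
To make the setup concrete, here is a toy prompt-construction sketch; the tasks and format are my own illustration, not the paper's exact protocol:

```python
# Toy illustration only: two made-up rules mixed into one few-shot prompt.
task_a = [("cat", "CAT"), ("dog", "DOG")]      # rule A: uppercase
task_b = [("cat", "chat"), ("dog", "chien")]   # rule B: English -> French

def build_superposed_prompt(examples_a, examples_b, query):
    """Interleave demonstrations from both rules, then query without
    saying which rule to apply."""
    lines = []
    for (xa, ya), (xb, yb) in zip(examples_a, examples_b):
        lines.append(f"{xa} -> {ya}")
        lines.append(f"{xb} -> {yb}")
    lines.append(f"{query} ->")
    return "\n".join(lines)

print(build_superposed_prompt(task_a, task_b, "bird"))
# The paper's claim, roughly: the model's distribution over completions of
# "bird ->" puts weight on *both* answers ("BIRD" and "oiseau"), i.e. the
# tasks are computed in parallel rather than one being picked and the
# other discarded.
```
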
Ahmed Awadallah (@ahmedhawadallah):

Developing capable AI agents is one of the most interesting problems to work on in AI right now. Excited to share some of what we are building in the open! #OmniParser, #AutoGen and more to come soon!

AutoGen (@pyautogen):

📢Introducing Magentic-One, a generalist 5-agent multi-agent system for solving open-ended web- and file-based tasks. 🤖🤖🤖🤖🤖

Magentic-One represents a significant step towards agents that can complete tasks that people encounter in their daily lives and can achieve strong
Andrew Ilyas (@andrew_ilyas):

Machine unlearning ("removing" training data from a trained ML model) is a hard, important problem. Datamodel Matching (DMM): a new unlearning paradigm with strong empirical performance! w/ Kristian Georgiev, Roy Rinberg, Sam Park, Shivam Garg, Aleksander Madry, Seth Neel (1/4)
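
The tweet doesn't spell out the mechanism, so the sketch below is only my reading of the datamodel-matching idea, with every name, shape, and number invented for illustration: a linear datamodel predicts the outputs of a model retrained without the forget set, and those predictions become the targets the existing model is fine-tuned toward.

```python
# Hedged sketch of the idea only; names, shapes, and numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
num_eval, num_train = 5, 10
forget = [7, 8, 9]                               # indices to "unlearn"

# A (pretend) linear datamodel: output on each eval point ~= theta @ mask,
# where mask[i] = 1 if training point i was included.
theta = rng.standard_normal((num_eval, num_train)) * 0.1

mask = np.ones(num_train)
mask[forget] = 0.0                               # drop the forget set

# Step 1: predict what a model retrained WITHOUT the forget set would output.
targets = theta @ mask

# Step 2 (not shown): fine-tune the already-trained model so its outputs on
# the eval points match `targets`, instead of actually retraining from scratch.
print(targets)
```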

Ahmed Awadallah (@ahmedhawadallah):

Synthetic data is becoming essential for training and fine-tuning models, but there’s a lot we still need to learn about best practices for generating, evaluating, and using it effectively. To support this research, we’re excited to release **orca-agentinstruct-1M**—a fully

Dimitris Papailiopoulos (@dimitrispapail):

What is reasoning? Do LLMs use it? Does it help? Is o1 really that much better than Sonnet? How do you even measure all that? MSR AI Frontiers is working to figure it all out, and we're looking for interns to work on evals to better understand LLMs. Please apply!! Link below:

Dimitris Papailiopoulos (@dimitrispapail):

How do you train reasoning models? What's the role of verifiers, RL, and synth data generation? How do these fit in multi-agent workflows? To find out, come join us for an internship at MSR AI Frontiers. Link below :D

John Langford (@johnclangford):

A new post: Headroom for AI Development hunch.net/?p=13763046. It's quite interesting to compare biological and silicon capabilities.

Daniel Litt (@littmath):

In this thread I want to share some thoughts about the FrontierMath benchmark, on which, according to OpenAI, some frontier models are scoring ~20%. This is a benchmark consisting of difficult math problems with numerical answers. What does it measure, and what doesn't it measure?

John Langford (@johnclangford):

The Belief State Transformer edwardshu.com/bst-website/ is at ICLR this week. The BST objective efficiently creates compact belief states: summaries of the past sufficient for all future predictions. See the short talk: microsoft.com/en-us/research… and mgostIH for further discussion.
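
As a rough picture of the objective (my sketch, not the paper's architecture: the encoders below are GRU stand-ins for transformers and all sizes are toy), a forward encoder summarizes the prefix, a backward encoder summarizes the suffix, and one head predicts both the next token after the prefix and the previous token before the suffix:

```python
# Toy sketch of a belief-state-style objective; details are assumptions.
import torch
import torch.nn as nn

class ToyBST(nn.Module):
    def __init__(self, vocab, dim):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.fwd = nn.GRU(dim, dim, batch_first=True)  # stand-in forward encoder
        self.bwd = nn.GRU(dim, dim, batch_first=True)  # reads the suffix right-to-left
        self.head = nn.Linear(2 * dim, 2 * vocab)      # joint next/prev-token logits

    def forward(self, prefix, suffix):
        _, f = self.fwd(self.emb(prefix))              # belief state: summary of the past
        _, b = self.bwd(self.emb(suffix.flip(1)))      # summary of the future
        logits = self.head(torch.cat([f[-1], b[-1]], dim=-1))
        return logits.chunk(2, dim=-1)                 # (next_logits, prev_logits)

vocab, dim, i, j = 50, 32, 4, 8
model = ToyBST(vocab, dim)
seq = torch.randint(0, vocab, (16, 12))                # batch of toy sequences
next_logits, prev_logits = model(seq[:, :i], seq[:, j:])
# predict the token right after the prefix and the token right before the suffix
loss = nn.functional.cross_entropy(next_logits, seq[:, i]) \
     + nn.functional.cross_entropy(prev_logits, seq[:, j - 1])
print(loss.item())
```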

Dimitris Papailiopoulos (@dimitrispapail):

We’ve been cooking... a new open weights 14B Phi-4 reasoning model, SFT’d on ~1.4M carefully curated reasoning demonstrations from o3-mini and RL’d for a tiny bit. This model is a little beast.

AutoGen (@pyautogen):

🚀 Introducing Magentic-UI, an experimental human-centered web agent from Microsoft Research. It automates your web tasks while keeping you in control 🧠🤝 through co-planning, co-tasking, action guards, and plan learning. 🔓 Fully open-source. We can't wait for you to try it. 🔗

Andrew Ilyas (@andrew_ilyas):

“How will my model behave if I change the training data?”

Recent(-ish) work w/ Logan Engstrom (@logan_engstrom): we nearly *perfectly* predict ML model behavior as a function of training data, saturating benchmarks for this problem (called "data attribution").
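
A toy version of that prediction problem (assumptions mine: a linear datamodel and a synthetic additive ground truth standing in for a real trained model):

```python
# Toy stand-in: "model behavior" is a noisy additive function of which
# training points were included, and we fit a linear datamodel to it.
import numpy as np

rng = np.random.default_rng(0)
N, trials = 20, 300                      # training points, resampled runs

true_influence = rng.standard_normal(N) * 0.5          # synthetic ground truth

masks = (rng.random((trials, N)) < 0.5).astype(float)  # random 50% subsets
outputs = masks @ true_influence + rng.normal(0, 0.1, trials)  # observed behavior

# Fit the datamodel by least squares: outputs ~= masks @ theta
theta, *_ = np.linalg.lstsq(masks, outputs, rcond=None)

new_mask = (rng.random(N) < 0.5).astype(float)          # an unseen subset
print(f"predicted {new_mask @ theta:.3f} vs actual {new_mask @ true_influence:.3f}")
```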