Harsha Nori (@harshanori)'s Twitter Profile

Director, Research Engineering at Microsoft:

microsoft.com/en-us/research…

ID: 975109962

Joined: 28-11-2012 01:29:53

76 Tweets

568 Followers

209 Following

Microsoft Research (@msftresearch)

Discover the best prompting techniques for the OpenAI o1-preview model, which achieved unprecedented accuracy on medical benchmarks. msft.it/6019WNc35
Paul Calcraft (@paul_cal)

Microsoft test-time compute scaling tests on o1: few % point bumps on medical reasoning for 15-30% more reasoning tokens. But overall enhanced pareto frontier. Shown across prompting & models for perf vs $

o1 has nearly saturated this bench set & needs so much less scaffolding
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

Reasoning models like o1/o3 will dramatically improve medical use-cases of LLMs.

Here's some evidence for that. Microsoft tested o1-preview on a bunch of medical QA tasks. Not only did it achieve new SOTA, but spending more time thinking with more reasoning tokens improved the
Harsha Nori (@harshanori)

Absolutely fantastic work by Karan Singhal and co-authors at OpenAI. Comprehensive and thoughtful evaluations for language model applications in healthcare. Open source evaluation work like this really moves the entire field forward.

Harsha Nori (@harshanori)

Was a fantastic collaboration on bringing guidance to the full family of OpenAI models. The most comprehensive structured outputs meet the world's best models 🫶 github.com/guidance-ai

Shoutout to Michal Moskal and Andrew Braunstein. cc Michelle Pokrass, Nikunj Handa, Eric Horvitz, Kevin Scott

Derya Unutmaz, MD (@deryatr_)

MAI-DxO appears to be a remarkable advance in medical AI!

“> MAI-DxO boosted performance of every model tested on those 304 cases
> 85.5% solve rate vs. 20% by a group of physicians
> Its higher accuracy came with LOWER overall testing costs than lone LLMs or physicians”

Satya Nadella (@satyanadella)

Excited to share two advances that bring us closer to real-world impact in healthcare AI:

SDBench introduces a new benchmark that transforms 304 NEJM cases into interactive diagnostic simulations. AI must ask questions, order tests, and weigh costs, mirroring the complexity of