
Dean Carignan
@deancarignan
Chief of Staff for @Microsoft's Chief Scientific Officer; exploring responsible practices in AI, Data Science, ML Ops. Ex: @MSFTReseach @Mckinsey, @Worldbank
ID: 77415138
26-09-2009 06:35:47
396 Tweets
1.1K Followers
1.1K Following

Should we trust LLM evaluations on publicly available benchmarks?🤔 Our latest work studies the overfitting of few-shot learning with GPT-4. With Harsha Nori, Vanessa Rodrigues, Besmira Nushi 💙💛, and Rich Caruana. Paper: arxiv.org/abs/2404.06209 More details👇 [1/N]
![Sebastian Bordt (@sbordt) on Twitter photo](https://pbs.twimg.com/media/GLifyinWUAAUa63.png)

It is important in machine learning to recognize that patterns of error can change with model updates: new errors can show up even when overall model accuracy increases. Besmira Nushi 💙💛 Microsoft Research
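
A minimal sketch of the point above, assuming we have labels and predictions from two model versions on the same examples. The function name `compare_errors` and the toy data are hypothetical and used only for illustration; they are not taken from the referenced work.

```python
def compare_errors(labels, old_preds, new_preds):
    """Compare two model versions on the same labeled examples."""
    n = len(labels)
    # Indices each version gets wrong.
    old_wrong = {i for i in range(n) if old_preds[i] != labels[i]}
    new_wrong = {i for i in range(n) if new_preds[i] != labels[i]}
    return {
        "old_accuracy": (n - len(old_wrong)) / n,
        "new_accuracy": (n - len(new_wrong)) / n,
        # Errors the update fixed, and errors it introduced.
        "fixed": sorted(old_wrong - new_wrong),
        "introduced": sorted(new_wrong - old_wrong),
    }


# Toy data (made up): the updated model is more accurate overall,
# yet it fails on an example the old model handled correctly.
labels    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
old_preds = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]
new_preds = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

report = compare_errors(labels, old_preds, new_preds)
print(report)
```

Here accuracy rises from 0.7 to 0.9, but the `introduced` list is not empty, which is exactly the kind of regression an aggregate accuracy number hides.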

Apply the strategies of billion-dollar companies to your own business & life. Ed Mylett interviews #Microsoft Chief #Innovation Officers JoAnn Garbin and Dean Carignan, coauthors of the new book The Insider's Guide to #InnovationatMicrosoft Post Hill Press podcasts.apple.com/us/podcast/the…

How “Pasteur’s quadrant” enlightens the #invention-#innovation challenge. Big Think excerpts the new book The Insider's Guide to #InnovationatMicrosoft by #Microsoft Chief #Innovation Officers JoAnn Garbin and Dean Carignan from Post Hill Press bigthink.com/business/how-p…

Check out our new tutorial on Magentic-UI by Maya Murad. Learn about Magentic's human-in-the-loop features, including: 🧑‍🤝‍🧑 Co-planning, 🤝 Co-tasking, 🛡️ Action Guards, 🧠 Plan Learning, 🔀 Parallel Task Execution 👇
