Valerie Chen (@valeriechen_) 's Twitter Profile
Valerie Chen

@valeriechen_

phd student @mldcmu @SCSatCMU + visitor @NYUDataScience | building @CopilotArena | previously @MSFTResearch @yale @CMU_Robotics @IBMResearch

ID: 1374055043230535685

Link: https://valeriechen.github.io/ | Joined: 22-03-2021 17:47:10

279 Tweets

1.1K Followers

480 Following

Valerie Chen (@valeriechen_) 's Twitter Profile Photo

The presentation is happening today at the Programming and Software Use session (G401)! More details about the paper below👇

Valerie Chen (@valeriechen_) 's Twitter Profile Photo

Can we use LLMs to generate high-quality *and* original text for creative tasks? We explore where existing models fall on these two axes and try to understand what techniques can push the frontier of novel LLM outputs. Check out Vishakh Padmakumar's thread for more details 👇

Valerie Chen (@valeriechen_) 's Twitter Profile Photo

Who is winning the race to claim the LLMs for SWE market? We share our thoughts based on our Copilot Arena work. See article below for current sentiments and what lies ahead 👇

Hussein Mozannar (@hsseinmzannar) 's Twitter Profile Photo

Excited to release my first lead project Magentic-UI at Microsoft Research, an OS web agent application designed for efficient human-agent interaction. CUA agents are cool but they're not so useful yet; Magentic-UI helps us study how to get value from them. github.com/microsoft/mage…

Ameet Talwalkar (@atalwalkar) 's Twitter Profile Photo

I’m excited to share new work from Datadog AI Research! We just released Toto, a new SOTA (by a wide margin!) time series foundation model, and BOOM, the largest benchmark of observability metrics. Both are available under the Apache 2.0 license. 🧵

NYU Center for Data Science (@nyudatascience) 's Twitter Profile Photo

CDS PhD student Vishakh Padmakumar, with co-authors John (Yueh-Han) Chen, Jane Pan, Valerie Chen, and CDS Associate Professor He He, has published new research on the trade-off between originality and quality in LLM outputs. Read more: nyudatascience.medium.com/in-ai-generate…

Copilot Arena (@copilotarena) 's Twitter Profile Photo

New result: Qwen-2.5-Coder jumps from 13th to joint 1st place with fill-in-the-middle (FiM)! Congrats to Qwen 🥳 Also check out lmarena.ai's new UI 🖥️✨

elvis (@omarsar0) 's Twitter Profile Photo

Coding Agents 🤝 Multimodal Browsing Can AI agents generalize beyond their intended scope? Great paper on how you can build generalist agents with superior performance over specialized agents. What models and tools work the best? Here are my notes:

Valerie Chen (@valeriechen_) 's Twitter Profile Photo

Exciting new work led by Aditya Soni showing how a few tools can enable agents to solve diverse tasks — from software engineering 🧑‍💻 to information seeking 🔍. Even more exciting to see some of these contributions integrated into OpenHands👐! Check out 🧵for more details✨

All Hands AI (@allhands_ai) 's Twitter Profile Photo

The paper about this versatile agent, OpenHands-Versa, was led by Aditya Soni at CMU, and you can read much more about the methodology: - His summary: x.com/Aditya_Soni_8/… - The paper: arxiv.org/abs/2506.03011 - Our blog: all-hands.dev/blog/building-…

Graham Neubig (@gneubig) 's Twitter Profile Photo

Huge shout-out to Aditya Soni at CMU, whose amazing work on his paper laid the foundation for accuracy improvements on many tasks: x.com/Aditya_Soni_8/… And Juan at All Hands AI, who set up VersaBench to do such a diverse variety of benchmarking.

Xingyao Wang (@xingyaow_) 's Twitter Profile Photo

Very excited about OpenHands Versa! With it, OpenHands just got even more versatile — I asked it today to update my website with this paper: "Can you add this to my paper list for this year? arxiv.org/abs/2506.03011" Details and prompts in 🧵

Aditya Soni (@aditya_soni_8) 's Twitter Profile Photo

Excited about the results! OpenHands-Versa ranks #1 both in terms of accuracy and cost 🚀 The cost savings are primarily due to context condensation in OpenHands-Versa: it suffices to retain the most recent browsing observation instead of all previous browsing observations.