Yuxiang (Jimmy) Wu (@yuxiangjwu)'s Twitter Profile
Yuxiang (Jimmy) Wu

@yuxiangjwu

Co-founder & CTO @WecoAI | Building AI that builds AI | UCL PhD | LLM, NLP, ML | previously @allen_ai @MetaAI @MSFTResearch

ID: 2883271903

Joined: 30-10-2014 11:23:55

223 Tweets

1.1K Followers

1.1K Following

Yuandong Tian (@tydsh)'s Twitter Profile Photo

Great to hear that OpenAI has confirmed that the startup Weco AI, co-founded by my former intern Zhengyao Jiang, has the best Machine Learning Engineer Agent in the world 😀 on their MLE-Bench. Congrats!

Yuxiang (Jimmy) Wu (@yuxiangjwu)'s Twitter Profile Photo

We’re hiring a full-time Frontend Engineer! Join us to build next-gen apps that bring AI to life through natural language. If you’re passionate about product and AI, apply now! #hiring #frontend #AI linkedin.com/jobs/view/4069…

Laura Ruis (@lauraruis)'s Twitter Profile Photo

How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this: Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢 🧵⬇️

METR (@metr_evals)'s Twitter Profile Photo

How close are current AI agents to automating AI R&D? Our new ML research engineering benchmark (RE-Bench) addresses this question by directly comparing frontier models such as Claude 3.5 Sonnet and o1-preview with 50+ human experts on 7 challenging research engineering tasks.

Zhengyao Jiang (@zhengyaojiang)'s Twitter Profile Photo

AIDE was built for tabular Machine Learning and optimized for GPT-4. It surprised me by generalizing to new models (o1) and deep learning tasks in OpenAI's MLE-Bench. RE-Bench now shows it scaling to cutting-edge AI research; this is mind-blowing!

Sohee Yang (@soheeyang_)'s Twitter Profile Photo

🚨 New Paper 🚨 Can LLMs perform latent multi-hop reasoning without exploiting shortcuts? We find the answer is yes – they can recall and compose facts not seen together in training or guessing the answer, but success greatly depends on the type of the bridge entity (80%+ for

Jiao Sun (@sunjiao123sun_)'s Twitter Profile Photo

Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference, NeurIPS. We have ethical reviews for authors, but missed it for invited speakers? 😡

Yuxiang (Jimmy) Wu (@yuxiangjwu)'s Twitter Profile Photo

Over the past few months, the Weco AI team has been hard at work building Weco AI Functions—a platform that simplifies adding and optimizing AI features with just a function call. My favorite part? Effortless A/B testing and versioning. You can compare multiple LLMs

Yuandong Tian (@tydsh)'s Twitter Profile Photo

Nice experience 😀. Define a function with natural language, and the function call is available to you immediately anywhere. "What you think immediately becomes what you get" 🚀🚀
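
The pattern described above can be sketched in a few lines. This is a hypothetical illustration only: none of the names below (`ai_function`, `_stub_llm`) come from the actual Weco AI Functions API, and the model call is replaced by an offline stub so the sketch runs standalone.

```python
# Hypothetical sketch of "define a function with natural language".
# NOT the real Weco AI API: names here are invented for illustration,
# and _stub_llm stands in for what would be an LLM call.

def _stub_llm(spec, inputs):
    # Stand-in for a model call: just echoes the task description and
    # the inputs the model would receive.
    return {"task": spec, "inputs": inputs}

def ai_function(spec):
    """Turn a natural-language description into a callable.

    In a real system, `spec` would be sent to an LLM together with the
    keyword arguments at call time; here the stub backend fakes that.
    """
    def call(**inputs):
        return _stub_llm(spec, inputs)
    call.spec = spec  # keep the description inspectable
    return call

# The description *is* the implementation surface:
summarize = ai_function("Summarize the given text in one sentence.")
result = summarize(text="LLMs can follow natural-language function specs.")
```

In this sketch, versioning or A/B testing different models would amount to swapping the backend behind `ai_function` without changing any call sites.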

Machine Learning Street Talk (@mlstreettalk)'s Twitter Profile Photo

We spoke with Laura Ruis from Cohere For AI and UCL about her paper "Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models" where she demonstrated an interesting gap between retrieval and reasoning queries in LLMs indicating the presence of synthesised

Zhengyao Jiang (@zhengyaojiang)'s Twitter Profile Photo

The entire internet era has been about transmitting information. AI will take computer science to the next level by generating new knowledge through trial and error. LLMs following the scientific method can already automate R&D. Check out the AIDE paper!

Yuxiang (Jimmy) Wu (@yuxiangjwu)'s Twitter Profile Photo

I used to spend weeks in trial-and-error loops building deep learning models, until we built AIDE to handle that work for us. Now I can tackle more than 20 ML problems at once and train 1,000+ models in parallel. It’s incredibly empowering! See how we’re rethinking machine

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

Love this project: nanoGPT -> recursive self-improvement benchmark. Good old nanoGPT keeps on giving and surprising :)

- First I wrote it as a small little repo to teach people the basics of training GPTs.
- Then it became a target and baseline for my port to direct C/CUDA