Zhengyao Jiang (@zhengyaojiang) 's Twitter Profile
Zhengyao Jiang

@zhengyaojiang

Cofounder @WecoAI. Building AI agents that build AI. PhD in Machine Learning, UCL @UCL_DARK @ai_ucl. (Zheng=j-uhng, j as in job; yao=y-aoww)

ID: 4074787285

Link: http://zhengyaojiang.github.io · Joined: 31-10-2015 01:52:12

328 Tweets

3.3K Followers

352 Following

Chen Sun 🤖🧠🇨🇦 (@chensun92) 's Twitter Profile Photo

Our team at Google DeepMind is looking to hire a talented new Research Scientist!

Our group (under Ed H. Chi) aims to push the frontier of AI-human interactions through personalization of LLMs and deeply understanding the open-ended nature of user intentions.

Beneath this lies
Jakob Foerster (@j_foerst) 's Twitter Profile Photo

🚨job alert🚨Foerster Lab for AI Research is looking for a postdoc, deadline is 1st of September (my.corehr.com/pls/uoxrecruit…) By many reports we are on the global Pareto frontier of talent density, agency and ambition. Join us!

Edward Grefenstette (@egrefen) 's Twitter Profile Photo

🚨 Research Scientist Hiring alert! 🚨 Applications close THIS FRIDAY for Google DeepMind research scientist roles to work on autonomous assistants and human-facing agentic capabilities, self-improvement, and open-endedness.

Minqi Jiang (@minqijiang) 's Twitter Profile Photo

Weco is one of the most under-the-radar teams building frontier agents. Very excited to see they now have the resources to take their vision for recursively-improving ASI to the next level. Congrats Zhengyao Jiang, Yuxiang (Jimmy) Wu, Dhruv Srikanth, and co!

Jack Parker-Holder (@jparkerholder) 's Twitter Profile Photo

Genie 3 feels like a watershed moment for world models 🌐: we can now generate multi-minute, real-time interactive simulations of any imaginable world. This could be the key missing piece for embodied AGI… and it can also create beautiful beaches with my dog, playable real time

Tim Rocktäschel (@_rockt) 's Twitter Profile Photo

Harder, Better, Faster, Stronger, Real-time! We are excited to reveal Genie 3, our most capable real-time foundational world model. Fantastic cross-team effort led by Jack Parker-Holder and Shlomi Fruchter. Below are some interactive worlds and capabilities that were highlights for me

Zhengyao Jiang (@zhengyaojiang) 's Twitter Profile Photo

Honestly pretty blown away. You control a realistic avatar with your keyboard in real-time, all powered by a single neural network. The mind-blowing bit: the real world's computation happens at the atomic level and is unimaginably expensive to simulate fully (just imagine storing

Zhengyao Jiang (@zhengyaojiang) 's Twitter Profile Photo

Quick takes on GPT-5 on MLE-Bench:

- It's not based on the full set or the lite set but rather the "30 most interesting" competitions.
    - Not very good scientific practice.
- Medal rates for all models are quite low (below 10%). It seems they didn't apply any agentic
Tom Johnson (@tomjohndesign) 's Twitter Profile Photo

This is how I feel about vibe coding.

Any project I try that has any kind of complication has this immediate burst of progress. Things are amazing and it feels like a superpower. Then... as I add more complexity, things crash to a halt.

The only projects that I think I can
Zhengyao Jiang (@zhengyaojiang) 's Twitter Profile Photo

Not surprised that CoT fails when problem depth grows. RL post-training will lengthen chains, but it won’t make infinite-horizon reasoners. Humans get there by caching conclusions in persistent memory (notes but primarily synapses). Memory is the frontier.
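The caching idea in the tweet above has a direct programming analogue: memoization, where each intermediate conclusion is stored once and reused instead of being re-derived from scratch. A toy sketch (Fibonacci stands in for a deep reasoning chain; without the cache the naive recursion would redo exponentially many steps):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    # Each cached value is a "conclusion in persistent memory":
    # later calls look it up instead of replaying the whole chain.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(10))  # 55
```

With the cache, `fib(n)` runs in linear time; the uncached version is infeasible well before the chain gets deep, which is the rough analogy to a reasoner that cannot persist intermediate results.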

elvis (@omarsar0) 's Twitter Profile Photo

M3-Agent: A Multimodal Agent with Long-Term Memory Impressive application of multimodal agents. Lots of great insights throughout the paper. Here are my notes with key insights:

M3-Agent: A Multimodal Agent with Long-Term Memory

Impressive application of multimodal agents. 

Lots of great insights throughout the paper.

Here are my notes with key insights:
Zhengyao Jiang (@zhengyaojiang) 's Twitter Profile Photo

People used Polymarket to get a prior on which frontier lab would ship the best model next.

Turns out we can just ask an LLM now.

For many unresolved events, querying an AI already gives you a forecast that’s as informative as prediction markets.
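One way to make "as informative as prediction markets" testable is to score both forecasters on events that have since resolved, e.g. with the Brier score (mean squared error between stated probabilities and 0/1 outcomes; lower is better). A minimal sketch, where the outcomes and probabilities are entirely made up for illustration:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical resolved events: 1 = happened, 0 = did not.
outcomes = [1, 0, 1, 1]

# Made-up probabilities from a prediction market vs. an LLM asked the same questions.
market_probs = [0.70, 0.20, 0.60, 0.80]
llm_probs = [0.65, 0.25, 0.55, 0.85]

print(round(brier_score(market_probs, outcomes), 4))  # 0.0825
print(round(brier_score(llm_probs, outcomes), 4))     # 0.1025
```

Run over enough resolved questions, comparable Brier scores would be the quantitative version of the claim; the numbers above are placeholders, not real forecasts.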
Zhengyao Jiang (@zhengyaojiang) 's Twitter Profile Photo

A simple inference-time heuristic pushes GPT-OSS-120B to 99.9% on AIME’25 (GPT-5-pro level). My interpretation: pretraining + SFT + KL divergence heavily anchor the policy to “autocomplete style” reasoning; the model struggles to abandon unproductive rollouts once it has started. So