Kaiyu Yang (@kaiyuyang4)'s Twitter Profile
Kaiyu Yang

@kaiyuyang4

Research Scientist at @Meta Fundamental AI Research (FAIR). Previously: Postdoc @Caltech, PhD @PrincetonCS, Undergrad @Tsinghua_Uni.

ID: 1134665912227946497

Website: https://yangky11.github.io/
Joined: 01-06-2019 03:40:09

240 Tweets

3.3K Followers

2.2K Following

Alex Gu @ iclr (@minimario1729)

📢 Excited to share our new paper: Challenges and Paths Towards AI for SWE We discuss: 🛠️ 6 sub-tasks needed for SWE 🤖 9 challenges of today's AI in SWE 🔮 9 future directions to address the challenges w/ collaborators from MIT, Berkeley, Cornell, Stanford, and UPenn ⬇️ (1/n)

Yong Lin (@yong18850571)

We are excited to announce the release of Goedel-Pset (huggingface.co/datasets/Goede…), the largest Lean statement dataset, which contains 1.73 million samples. Goedel-Pset is 10 times larger than Lean Workbook. We hope this resource will facilitate further research within the

Lean (@leanprover)

Fascinating talk by Thomas Hubert on AlphaProof at IMO 2024! Combining Lean's formal verification with DeepMind's RL techniques led to solving one of the hardest problems that stumped most humans. Watch: youtube.com/watch?v=TFBzP7… #LeanLang #AlphaProof

Tom Zahavy (@tzahavy)

I am looking to hire a student researcher to work with AlphaProof on a project at the intersection of AI, math, computation, and creativity. Background in AI for math, and/or Lean is desired. If interested, please get in touch. The position will be based in London.

Jia Li (@jiali52524397)

We believe formal math is the future. 🔥Introducing Kimina-Prover Preview, a Numina & Kimi.ai collaboration, the first large formal reasoning model for Lean 4, achieving 80.78% miniF2F. github.com/MoonshotAI/Kim…

UC Berkeley RDI (@berkeleyrdi)

What if AI could generate mathematical proofs that can be verified rigorously by machines? 🤖🧮 In Lecture 9 of Advanced LLM Agents MOOC, Kaiyu Yang (Meta FAIR) explores how large language models merge with formal systems like Lean to deliver fully verifiable math! #FormalMath

Dawn Song (@dawnsongtweets)

🔥 Really excited to announce close to 1,000 teams already registered for #AgentX—building the future of Agentic AI across Entrepreneurship & Research tracks! 🚀 💰 Prize pool now $125K+, with total prizes/resources surpassing $400K! 🏆 Highlights: 💸 $40K CASH AWARDS sponsored

Sean Welleck (@wellecks)

I was honored to give a talk on AI for theorem proving for the Berkeley Advanced LLM Agents course! "Bridging Informal and Formal Mathematical Reasoning with AI" Youtube: youtube.com/live/Gy5Nm17l9… Slides: wellecks.com/data/welleck20… It covers three themes from our recent work: -

Zhaoyu Li (@_zhaoyu_li_)

Come join our AI for Math & Theorem Proving social at #ICLR2025! Looking forward to talking with everyone interested in LLMs for reasoning!

Dawn Song (@dawnsongtweets)

📣 Today 4/21 at 10:10 AM PT, join us for the 11th Advanced LLM Agents MOOC lecture on Program Verification & Generating Verified Code by Swarat Chaudhuri UT Austin. 🌐 Join the thriving community of the LLM Agents MOOC series, with 23K+ registered learners & 10K+ members on

Chi Jin (@chijinml)

Writing math proof in Lean is surprisingly addictive. Watching Terence Tao formalize Lean proofs feels like watching a top-tier gamer playing on Twitch. :-) youtube.com/watch?v=c1ixXM…

George Tsoukalas (@gtsoukal)

DeepSeekProverV2 solves 47/657 problems on PutnamBench! The model represents a substantial advance in theorem proving. The previous best model only solved 10 problems! I'm excited to see DeepSeek's performance on IMO 2025 :)

Pan Lu (@lupantech)

Do LLMs truly understand math proofs, or just guess? 🤔Our new study on #IneqMath dives deep into Olympiad-level inequality proofs & reveals a critical gap: LLMs are often good at finding answers, but struggle with rigorous, sound proofs. ➡️ ineqmath.github.io To tackle

Christian Szegedy (@chrszegedy)

A mathematical paper autoformalized for the first time: amazing work by Morph, presented today at the Big Proof conference by Jared Duker Lichtman and Jesse Michael Han. I am very impressed by the blazing fast progress of the morph team. Especially by Leyan Pan and _.

Dawn Song (@dawnsongtweets)

1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects 💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖
