Kaiyu Yang (@kaiyuyang4) Twitter Tweets • TwiCopy

Alex Gu @ iclr

7 months ago

📢 Excited to share our new paper: Challenges and Paths Towards AI for SWE. We discuss: 🛠️ 6 sub-tasks needed for SWE; 🤖 9 challenges of today's AI in SWE; 🔮 9 future directions to address the challenges. w/ collaborators from MIT, Berkeley, Cornell, Stanford, and UPenn ⬇️ (1/n)

Likes: 123 · Replies: 3 · Reposts: 32

Yong Lin

@yong18850571

7 months ago

We are excited to announce the release of Goedel-Pset (huggingface.co/datasets/Goede…), the largest Lean statement dataset, which contains 1.73 million samples. Goedel-Pset is 10 times larger than Lean Workbook. We hope this resource will facilitate further research within the

Likes: 79 · Replies: 1 · Reposts: 19

Lean

@leanprover

7 months ago

Fascinating talk by Thomas Hubert on AlphaProof at IMO 2024! Combining Lean's formal verification with DeepMind's RL techniques led to solving one of the hardest problems that stumped most humans. Watch: youtube.com/watch?v=TFBzP7… #LeanLang #AlphaProof

Likes: 163 · Replies: 0 · Reposts: 41

Tom Zahavy

@tzahavy

7 months ago

I am looking to hire a student researcher to work with AlphaProof on a project at the intersection of AI, math, computation, and creativity. Background in AI for math, and/or Lean is desired. If interested, please get in touch. The position will be based in London.

Likes: 548 · Replies: 19 · Reposts: 47

Jia Li

@jiali52524397

7 months ago

We believe formal math is the future. 🔥Introducing Kimina-Prover Preview, a Numina & Kimi.ai collaboration, the first large formal reasoning model for Lean 4, achieving 80.78% miniF2F. github.com/MoonshotAI/Kim…

Likes: 759 · Replies: 29 · Reposts: 134

UC Berkeley RDI

@berkeleyrdi

7 months ago

What if AI could generate mathematical proofs that can be verified rigorously by machines? 🤖🧮 In Lecture 9 of Advanced LLM Agents MOOC, Kaiyu Yang (Meta FAIR) explores how large language models merge with formal systems like Lean to deliver fully verifiable math! #FormalMath

Likes: 12 · Replies: 1 · Reposts: 4

Dawn Song

@dawnsongtweets

7 months ago

🔥 Really excited to announce close to 1,000 teams already registered for #AgentX—building the future of Agentic AI across Entrepreneurship & Research tracks! 🚀 💰 Prize pool now $125K+, with total prizes/resources surpassing $400K! 🏆 Highlights: 💸 $40K CASH AWARDS sponsored

Likes: 82 · Replies: 7 · Reposts: 15

Sean Welleck

@wellecks

7 months ago

I was honored to give a talk on AI for theorem proving for the Berkeley Advanced LLM Agents course! "Bridging Informal and Formal Mathematical Reasoning with AI" Youtube: youtube.com/live/Gy5Nm17l9… Slides: wellecks.com/data/welleck20… It covers three themes from our recent work: -

Likes: 228 · Replies: 3 · Reposts: 38

Zhaoyu Li

@_zhaoyu_li_

6 months ago

Come join our AI for Math & Theorem Proving social at #ICLR2025! Looking forward to talking with everyone interested in LLMs for reasoning!

Likes: 63 · Replies: 0 · Reposts: 9

Dawn Song

@dawnsongtweets

6 months ago

📣 Today 4/21 at 10:10 AM PT, join us for the 11th Advanced LLM Agents MOOC lecture on Program Verification & Generating Verified Code by Swarat Chaudhuri UT Austin. 🌐 Join the thriving community of the LLM Agents MOOC series, with 23K+ registered learners & 10K+ members on

Likes: 47 · Replies: 2 · Reposts: 13

Chi Jin

@chijinml

5 months ago

Writing math proofs in Lean is surprisingly addictive. Watching Terence Tao formalize Lean proofs feels like watching a top-tier gamer playing on Twitch. :-) youtube.com/watch?v=c1ixXM…

Likes: 145 · Replies: 1 · Reposts: 16

Dawn Song

@dawnsongtweets

5 months ago

🌟 Excited to announce our esteemed panel of judges for the #AgentX competition by UC Berkeley RDI UC Berkeley. Huge thanks to Xinyun Chen Chi Wang Google DeepMind; Kaiyu Yang Meta; Jay Rodge Zhiding Yu NVIDIA; Somil Aggarwal Schmidt Sciences; Samuel Barry Mistral AI;

Likes: 122 · Replies: 6 · Reposts: 29

George Tsoukalas

@gtsoukal

5 months ago

DeepSeekProverV2 solves 47/657 problems on PutnamBench! The model represents a substantial advance in theorem proving. The previous best model only solved 10 problems! I'm excited to see DeepSeek's performance on IMO 2025 :)

Likes: 28 · Replies: 2 · Reposts: 3

MIT Technology Review

@techreview

5 months ago

What’s next for AI and math trib.al/9cahZIc

Likes: 29 · Replies: 2 · Reposts: 9

Pan Lu

@lupantech

5 months ago

Do LLMs truly understand math proofs, or just guess? 🤔Our new study on #IneqMath dives deep into Olympiad-level inequality proofs & reveals a critical gap: LLMs are often good at finding answers, but struggle with rigorous, sound proofs. ➡️ ineqmath.github.io To tackle

Likes: 180 · Replies: 11 · Reposts: 40

Christian Szegedy

@chrszegedy

5 months ago

A mathematical paper autoformalized for the first time: amazing work by Morph, presented today at the Big Proof conference by Jared Duker Lichtman and Jesse Michael Han. I am very impressed by the blazing fast progress of the morph team. Especially by Leyan Pan and _.

Likes: 236 · Replies: 7 · Reposts: 44

Dawn Song

@dawnsongtweets

4 months ago

1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects 💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖

Likes: 333 · Replies: 13 · Reposts: 108