Zhensu Sun (@v587su) 's Twitter Profile
Zhensu Sun

@v587su

A Ph.D. student @sgSMU. My research focuses on improving coding productivity with AI techniques.

ID: 796228920995500032

Link: https://v587su.github.io/ · Joined: 09-11-2016 05:52:22

116 Tweets

158 Followers

300 Following

Manish Shetty (@slimshetty_) 's Twitter Profile Photo

Want to turn your own GitHub Repos into a playground for 🤖 coding agents?

📢📢 Introducing R2E: Repository to Environment

📈 Scalable, dynamic, real-world repo-level benchmarks
💡 Generate Equivalence Tests Harnesses
🔗 r2e.dev | Accepted @ ICML '24

🧵
Guillaume Lample @ NeurIPS 2024 (@guillaumelample) 's Twitter Profile Photo

Today we are releasing two small models: Mathstral 7B and Codestral Mamba 7B.

On the MATH benchmark, Mathstral 7B obtains 56.6% pass@1, outperforming Minerva 540B by more than 20%. Mathstral scores 68.4% on MATH with majority voting@64, and 74.6% using a reward model.

Codestral
Robert Scoble (@scobleizer) 's Twitter Profile Photo

Wow.

Jando just showed me a prompt humans can't read, but LLMs understand this language better.

The San Francisco AI people are designing a new language. 

In stealth. You are first to see it.
Philipp Schmid (@_philschmid) 's Twitter Profile Photo

"AI is not making any progress"? Look closer. 🙄 GPT-4 level models got 240x cheaper in just 2 years! AI progress isn't linear, and it isn't just about bigger models.

BERT -> DistilBERT
Llama 2 70B -> Llama 3 8B
GPT-4 -> GPT-4o-mini
Llama 3 405B → Llama 4 70B?? 🤔

Models get bigger,
FORGE (@confforge) 's Twitter Profile Photo

🎉 Exciting News! 🎉 We are thrilled to announce that ACM SIGSOFT has officially upgraded FORGE from an ICSE Special Event to an ICSE Co-Located Conference! 🚀

We can't wait to see your submissions for FORGE 2025! See more below 👇

#FORGE #FORGE2025 ICSE

FORGE (@confforge) 's Twitter Profile Photo

🚨 Big Announcement! 🚨 We're thrilled to welcome two distinguished keynote speakers to #FORGE2025!

✨ Prem Devanbu (@UCDavis Professor) 🔗 cs.ucdavis.edu/~devanbu/
✨ Graham Neubig (Carnegie Mellon University Associate Professor) 🔗 phontron.com

Zhensu Sun (@v587su) 's Twitter Profile Photo

Want to save your LLM budget without sacrificing performance? Here's a useful trick: removing non-essential code formatting, like indentations, newlines, and extra whitespaces, cuts input tokens by an average of 24.5%! Check out our full study: arxiv.org/abs/2508.13666
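The trick described in the tweet can be illustrated with a minimal sketch. This is not the paper's actual tool, just a naive illustration: strip leading indentation, blank lines, and runs of extra whitespace before sending code to the model. Note that this simple version is only safe for languages where whitespace is not syntactically significant (e.g. JavaScript, C); Python source would need a syntax-aware pass.

```python
import re

def strip_code_formatting(code: str) -> str:
    """Remove non-essential formatting from code: leading/trailing
    indentation, blank lines, and runs of spaces or tabs.

    Only safe for whitespace-insensitive languages; a real tool
    would parse the code to avoid changing its meaning."""
    lines = []
    for line in code.splitlines():
        line = line.strip()          # drop indentation
        if line:                     # drop blank lines
            # collapse internal runs of spaces/tabs to one space
            lines.append(re.sub(r"[ \t]+", " ", line))
    return "\n".join(lines)

original = """
function add(a, b) {
    return  a +  b;
}
"""
compressed = strip_code_formatting(original)
print(compressed)
```

On this toy input, the compressed form preserves every token the model needs while shedding the whitespace that only helps human readers; the 24.5% average saving reported in the study comes from applying this idea across real repositories.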

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Stripping code formatting cuts LLM token cost without hurting accuracy.

Average input tokens drop by 24.5%, with output quality basically unchanged. 

The core issue is simple: indentation, spaces, and newlines help humans read, but they inflate the tokens that models pay to process.
Zhensu Sun (@v587su) 's Twitter Profile Photo

How could two ICSE reviewers think my paper is novel while the remaining one thinks it's incremental? It doesn't make sense 😮‍💨