Benjamin Thérien (@benjamintherien) 's Twitter Profile
Benjamin Thérien

@benjamintherien

Ph.D. student at UdeM & Mila | Continually pre-training LLMs & creating learned optimizers that generalize

ID: 1060314964718772224

Link: https://github.com/bentherien | Joined: 07-11-2018 23:36:02

174 Tweets

252 Followers

436 Following

Quentin Anthony (@quentinanthon15) 's Twitter Profile Photo

Inspired by “minimal implementation” projects in AI such as Andrej Karpathy’s nanoGPT, I worked to bring this concept to the HPC world! I’ve built a minimal implementation of an MPI library called nanoMPI, which focuses on clarity, simplicity, and easy installation.
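
For context, a minimal MPI program exercises exactly the surface such a library has to keep clear: init, rank discovery, and point-to-point send/recv. The sketch below uses mpi4py, which fronts any standards-conforming MPI backend; treating nanoMPI as that backend is an assumption on my part, not a documented feature.

```python
# Minimal MPI point-to-point example (run with: mpirun -n 2 python demo.py).
# Written with mpi4py against a standards-conforming MPI backend; using it
# with nanoMPI specifically is an assumption, not a documented feature.
from mpi4py import MPI

comm = MPI.COMM_WORLD   # default communicator spanning all ranks
rank = comm.Get_rank()  # this process's id within the communicator

if rank == 0:
    comm.send({"msg": "hello from rank 0"}, dest=1, tag=11)
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    print(f"rank 1 received: {data}")
```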

Majdi Hassan (@majdi_has) 's Twitter Profile Photo

(1/n)🚨You can train a model to solve DFT for any geometry with almost no training data!🚨 Introducing Self-Refining Training for Amortized Density Functional Theory, a variational framework for learning a DFT solver that predicts the ground-state solutions for different
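
Read literally, the setup swaps supervised regression for a variational objective: the model proposes a ground-state solution and the energy functional itself scores it, so sampled geometries alone drive learning. A rough sketch of such a loop, where every name (solver, energy_functional, sample_geometries) is a hypothetical placeholder rather than the paper's API:

```python
import torch

# Hypothetical sketch of variational, label-free training: the loss is the
# (differentiable) energy of the model's predicted state, not a supervised
# error against precomputed DFT solutions. `solver`, `energy_functional`,
# and `sample_geometries` are placeholders, not the paper's actual API.
def train_step(solver, energy_functional, sample_geometries, opt):
    geometries = sample_geometries(batch_size=32)   # no labels needed
    state = solver(geometries)                      # predicted ground state
    energy = energy_functional(state, geometries)   # variational objective
    loss = energy.mean()                            # lower energy = better
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```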

Emiliano Penaloza (@emilianopp_) 's Twitter Profile Photo

Excited that our paper "Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization" was accepted to ICML 2025! We show how Preference Optimization can reduce the impact of noisy concept labels in CBMs. 🧵/9
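
For intuition, a concept bottleneck model first predicts human-interpretable concepts and then a label from those concepts, so a mislabeled concept propagates. A preference-style objective instead asks the model to rank a preferred concept labeling above a dispreferred one. The sketch below uses a generic Bradley-Terry preference loss to illustrate the idea; it is not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

# Generic Bradley-Terry-style preference loss over concept predictions.
# Instead of fitting a single (possibly mislabeled) concept vector, the
# model is trained to score a preferred concept labeling above a
# dispreferred one. Illustrative sketch only, not the paper's loss.
def concept_preference_loss(concept_logits, preferred, dispreferred):
    # log-likelihood of each candidate concept labeling (binary concepts)
    lp_pref = -F.binary_cross_entropy_with_logits(
        concept_logits, preferred, reduction="none").sum(dim=-1)
    lp_disp = -F.binary_cross_entropy_with_logits(
        concept_logits, dispreferred, reduction="none").sum(dim=-1)
    # push the preferred labeling to outscore the dispreferred one
    return -F.logsigmoid(lp_pref - lp_disp).mean()
```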

Luke Rowe (@luke22r) 's Twitter Profile Photo

🚀 Our method, Poutine, was the best-performing entry in the 2025 Waymo Vision-based End-to-End Driving Challenge at #CVPR2025!

Our 3B-parameter VLM Poutine scored 7.99 RFS on the official test set, comfortably ahead of every other entry (see figure).

Benjamin Thérien (@benjamintherien) 's Twitter Profile Photo

Tired of tuning hyperparameters? Introducing PyLO! We’re bringing hyperparameter-free learned optimizers to PyTorch with drop-in torch.optim support and faster step times thanks to our custom CUDA kernels. Check out our code here: github.com/Belilovsky-Lab…
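
If the drop-in claim holds, adopting a learned optimizer should only change the optimizer construction line. A usage sketch, where the pylo import path and LearnedOptimizer class name are guesses rather than the repo's documented API:

```python
import torch

model = torch.nn.Linear(128, 10)

# Before: a hand-tuned optimizer with hyperparameters to sweep.
# opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

# After: a learned optimizer with no hyperparameters to tune.
# `pylo` / `LearnedOptimizer` are assumed names; see the repo for the real API.
from pylo import LearnedOptimizer  # hypothetical import
opt = LearnedOptimizer(model.parameters())

x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = torch.nn.functional.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()  # same training-loop code path as any torch.optim optimizer
```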

Ashwinee Panda (@pandaashwinee) 's Twitter Profile Photo

our paper on continual pre-training (CPT) of MoEs was rejected from #COLM2025 w/ scores of 8/7/7/5. the only reject said "I decide between 5 and 6". we emailed PCs, but just got "We are sorry, but the venue simply does not have the capacity to provide feedback at a more granular level." from Yoav Artzi. 🙁

Massimo Caccia (@masscaccia) 's Twitter Profile Photo

🎉 Our paper “𝐻𝑜𝑤 𝑡𝑜 𝑇𝑟𝑎𝑖𝑛 𝑌𝑜𝑢𝑟 𝐿𝐿𝑀 𝑊𝑒𝑏 𝐴𝑔𝑒𝑛𝑡: 𝐴 𝑆𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐𝑎𝑙 𝐷𝑖𝑎𝑔𝑛𝑜𝑠𝑖𝑠” got an 𝐨𝐫𝐚𝐥 at next week’s 𝗜𝗖𝗠𝗟 𝗪𝗼𝗿𝗸𝘀𝗵𝗼𝗽 𝗼𝗻 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗨𝘀𝗲 𝗔𝗴𝗲𝗻𝘁𝘀! 🖥️🧠

We present the 𝐟𝐢𝐫𝐬𝐭 𝐥𝐚𝐫𝐠𝐞-𝐬𝐜𝐚𝐥𝐞

Andrei Mircea (@mirandrom) 's Twitter Profile Photo

Step 1: Understand how scaling improves LLMs.
Step 2: Directly target underlying mechanism.
Step 3: Improve LLMs independent of scale. Profit.

In our ACL 2025 paper we look at Step 1 in terms of training dynamics.

Project: mirandrom.github.io/zsl 
Paper: arxiv.org/pdf/2506.05447
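
As a toy version of Step 1, a common starting point is fitting how final loss falls with model size, then asking which training dynamics produce that fit. The power-law form and the synthetic numbers below are a generic illustration, not results from the paper:

```python
import numpy as np
from scipy.optimize import curve_fit

np.random.seed(0)

# Toy version of "understand how scaling improves LLMs": fit a saturating
# power law L(N) = a * N**(-alpha) + c to final losses of models of
# different sizes. Synthetic data; not numbers from the paper.
def scaling_law(N, a, alpha, c):
    return a * N ** (-alpha) + c

N = np.array([1e7, 3e7, 1e8, 3e8, 1e9])             # parameter counts
L = scaling_law(N, 400.0, 0.3, 1.8) + 0.01 * np.random.randn(5)

(a, alpha, c), _ = curve_fit(scaling_law, N, L, p0=[100.0, 0.3, 1.0])
print(f"fit: L(N) ≈ {a:.1f} * N^(-{alpha:.3f}) + {c:.2f}")
```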