Songlin Yang (@songlinyang4)'s Twitter Profile
Songlin Yang

@songlinyang4

Ph.D. student @MIT_CSAIL. Working on scalable and principled methods in #ML & #LLM. she/her/hers. In 🐳 and open-sourcing I trust

ID: 1345247812666081280

Link: https://sustcsonglin.github.io/ · Joined: 02-01-2021 05:57:32

1.1K Tweets

6.6K Followers

2.2K Following

Dan Fu (@realdanfu) 's Twitter Profile Photo

Super excited to share Chipmunk 🐿️ - training-free acceleration of diffusion transformers (video, image generation) with dynamic attention & MLP sparsity! Led by Austin Silveria and soham: 3.7x faster video gen, 1.6x faster image gen. Kernels written in TK ⚡️🐱 1/
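
A minimal sketch of what dynamic attention sparsity can look like (my illustration, not Chipmunk's kernels, which are fused ThunderKittens GPU code): keep only the top-k keys per query, chosen per input at runtime rather than with a fixed pattern.

```python
import torch

def topk_sparse_attention(q, k, v, keep: int):
    """Illustrative dynamic top-k attention sparsity (not Chipmunk's method):
    q, k, v have shape (batch, heads, seq, dim). Each query keeps only its
    `keep` highest-scoring keys; everything else is masked before the softmax."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5    # (b, h, s, s)
    # Data-dependent sparsity: which keys matter is decided per input at runtime.
    thresh = scores.topk(keep, dim=-1).values[..., -1:]      # k-th largest score
    scores = scores.masked_fill(scores < thresh, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 256, 64)
out = topk_sparse_attention(q, k, v, keep=32)  # attends to 32 of 256 keys per query
```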

Jiatao Gu (@thoma_gu) 's Twitter Profile Photo

I will be attending #ICLR2025 in person during Apr 24-28, and presenting our research: DART: Denoising Autoregressive Transformer 📌Fri 25 Apr 3 p.m. +08 — 5:30 p.m. +08 This is my first time visiting Singapore, and I am looking forward to chatting with old and new friends!

Dan Fu (@realdanfu) 's Twitter Profile Photo

I’ll be at #ICLR2025! 🛫🇸🇬
- ThunderKittens (spotlight) w/ Benjamin F Spector, Thu 3pm
- I’ll be at the Together AI booth Fri afternoon - we’re hiring aggressively for kernels!

Please reach out if you’d like to chat kernels🌽, TK⚡️🐱, Chipmunk🐿️, or anything performance! DMs open!

Simran Arora (@simran_s_arora) 's Twitter Profile Photo

Another exciting data point on the "Are Transformers the end game?" front. Go chat with Jerry at ICLR about using polynomial networks to learn numerical algorithms to high precision!!

Xinyu Yang (@xinyu2ml) 's Twitter Profile Photo

We will be presenting "APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding", a novel encoding method that enables:
🚀 Pre-caching Contexts for Fast Inference
🐍 Re-using Positions for Long Context

Our poster session is located in Hall 3 and Hall 2B…
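
A toy illustration of the position re-use idea as I read it from the title (my code and numbers, not the authors'): contexts encoded in parallel share one position range, so the maximum position is bounded by the longest single context rather than their sum, and each context's KV cache can be precomputed independently.

```python
# Position bookkeeping for sequential vs. parallel context encoding.
# Token lengths below are made up for illustration.
prompt_len = 16
contexts = {"ctx_a": 100, "ctx_b": 80, "ctx_c": 120}

# Sequential encoding: positions grow with every extra retrieved context.
sequential_positions, offset = {}, prompt_len
for name, n in contexts.items():
    sequential_positions[name] = list(range(offset, offset + n))
    offset += n

# Parallel encoding: every context re-uses the same starting position, so the
# max position is bounded by the longest single context, not their sum, and
# each context's KV cache can be precomputed once and re-used.
parallel_positions = {name: list(range(prompt_len, prompt_len + n))
                      for name, n in contexts.items()}

print(max(p[-1] for p in sequential_positions.values()))  # 315
print(max(p[-1] for p in parallel_positions.values()))    # 135
```
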
Albert Gu (@_albertgu) 's Twitter Profile Photo

We started off investigating applications of SSMs to PDEs but evolved to a broader question of understanding memory in modeling PDEs, finding when combining a sequence model (e.g. S4) with a Markovian neural operator (e.g. FNO) has advantages. Led by CMU students Ricardo and…
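
A schematic of the design question, under my own assumptions (the module names and the GRU stand-in for S4 are mine, not the paper's): a Markovian operator predicts the next PDE state from the current one alone, while wrapping it in a sequence model lets the update depend on the whole trajectory, i.e. memory.

```python
import torch
import torch.nn as nn

class MarkovianStep(nn.Module):
    # Stand-in for a neural operator like FNO: next state from current state only.
    def __init__(self, dim):
        super().__init__()
        self.op = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    def forward(self, u_t):                # (batch, dim)
        return self.op(u_t)

class MemoryAugmentedStep(nn.Module):
    # Stand-in for the hybrid: a sequence model (a GRU here, in place of S4)
    # summarizes the trajectory so the update can depend on the past.
    def __init__(self, dim):
        super().__init__()
        self.seq = nn.GRU(dim, dim, batch_first=True)
        self.op = MarkovianStep(dim)
    def forward(self, u_history):          # (batch, time, dim)
        h, _ = self.seq(u_history)
        return self.op(h[:, -1])           # next state from summarized history

u = torch.randn(4, 10, 64)                 # 10 past states of a discretized PDE
next_state = MemoryAugmentedStep(64)(u)
```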

Michael Poli (@michaelpoli6) 's Twitter Profile Photo

We showcase the first example of a model architecture optimized for smartphones: Hyena Edge.

We used our automated model design framework (STAR, Oral at ICLR 2025) to sift through convolution-based multi-hybrid architectures.

STAR iteratively evolved the population of designs…
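
A generic evolutionary-search skeleton, as an illustration of "iteratively evolving a population of designs"; STAR's actual genome encoding, mutation operators, and objectives are not shown in the thread, so everything below is an assumption.

```python
import random

def evolve(population, fitness, mutate, generations=10, survivors=8):
    """Generic evolutionary loop: score designs, keep the best, mutate to refill.
    A placeholder for how a framework like STAR might iterate over architectures."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:survivors]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(len(population) - survivors)]
    return max(population, key=fitness)

# Toy usage: an "architecture" is a list of layer widths; fitness prefers a small
# total width (a crude stand-in for on-device latency) at a target depth of 6.
def fitness(arch): return -sum(arch) - 100 * abs(len(arch) - 6)
def mutate(arch):
    arch = arch[:]
    arch[random.randrange(len(arch))] = random.choice([64, 128, 256, 512])
    return arch

best = evolve([[256] * 6 for _ in range(16)], fitness, mutate)
```
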
Piotr Nawrot (@p_nawrot) 's Twitter Profile Photo

Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in LLMs.

We performed the most comprehensive study on training-free sparse attention to date.

Here is what we found:
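
For readers outside the area, here is one common training-free pattern from that design space: a causal sliding window plus a few global "sink" tokens. This illustrates the kind of configuration such a study sweeps; it is not necessarily one of the paper's exact setups.

```python
import torch

def window_plus_sinks_mask(seq_len, window=128, sinks=4):
    """Boolean attention mask for a common training-free sparse pattern: each
    query attends to the first `sinks` tokens plus a local causal window."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i
    local = (i - j) < window
    sink = j < sinks
    return causal & (local | sink)

mask = window_plus_sinks_mask(1024)
print(mask.float().mean())  # fraction of the full attention matrix that is kept
```
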
Kaifeng Lyu (@vfleaking) 's Twitter Profile Photo

Excited to present our paper this morning at ICLR 2025, revealing the gap in CoT reasoning between RNNs and Transformers!

Poster Presentation:
🗓 Saturday, April 26, 10:00 AM – 12:30 PM
📍 Hall 2, Poster #640

TianyLin (@tianylin) 's Twitter Profile Photo

Announcing 𝐟𝐥𝐚𝐬𝐡-𝐦𝐮𝐨𝐧: a 🐍 pkg with a customized CUDA kernel that aims to boost the Muon optimizer: github.com/nil0x9/flash-m… 1/n
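
For context on what such a kernel accelerates: Muon's hot loop is a quintic Newton-Schulz iteration that approximately orthogonalizes the momentum matrix. Below is a plain-PyTorch sketch using the coefficients from the public Muon reference implementation, not flash-muon's CUDA code.

```python
import torch

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G via the quintic Newton-Schulz iteration used
    by the Muon optimizer (coefficients from the public reference implementation).
    This matmul-heavy loop is what a fused CUDA kernel would accelerate."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)               # normalize so the iteration converges
    if G.shape[0] > G.shape[1]:
        X = X.T                            # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if G.shape[0] > G.shape[1] else X

update = newton_schulz_orthogonalize(torch.randn(512, 1024))
```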

Liangzu Peng (@pengliangzu) 's Twitter Profile Photo

arxiv.org/abs/2504.17963 Our recent tutorial on mathematical aspects of continual learning, in light of its connection to "adaptive filtering" (1960–now). There are a few reasons why you might like it. #ICLR2025 CoLLAs 2025 ICLR 2025
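
To make the adaptive-filtering connection concrete, here is the textbook least-mean-squares (LMS) filter from that literature: weights adapt one sample at a time from a streaming prediction error, exactly the online-update setting continual learning studies. This is a standard sketch, not material from the tutorial itself.

```python
import numpy as np

def lms_filter(xs, ys, lr=0.05):
    """Textbook least-mean-squares (LMS) adaptive filter (Widrow-Hoff, 1960):
    weights adapt one sample at a time from the streaming prediction error."""
    w = np.zeros(xs.shape[1])
    for x, y in zip(xs, ys):
        err = y - w @ x          # instantaneous prediction error
        w += lr * err * x        # gradient step on 0.5 * err**2, no replay buffer
    return w

rng = np.random.default_rng(0)
xs = rng.normal(size=(2000, 8))
true_w = rng.normal(size=8)
ys = xs @ true_w + 0.01 * rng.normal(size=2000)
print(np.abs(lms_filter(xs, ys) - true_w).max())  # small: tracks the target filter
```
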
DataSig (@data_sig) 's Twitter Profile Photo

Songlin Yang is our Rough Path Interest Group (RPIG) speaker on Wed 30 Apr at 16:00 BST. She will be speaking on the topic of 'Advances in Scalable Linear RNNs: DeltaNet and Its Variants'. Don't miss! To join the RPIG see turing.ac.uk/research/inter…
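
For those new to the topic, DeltaNet's core recurrence (from the published papers) swaps linear attention's purely additive state update for a delta-rule correction. A deliberately sequential reference loop follows; the "scalable" part of the talk concerns chunkwise-parallel kernels (e.g. in flash-linear-attention), not this loop.

```python
import torch

def deltanet_step_loop(q, k, v, beta):
    """Minimal sequential reference for the DeltaNet recurrence
        S_t = S_{t-1} + beta_t * (v_t - S_{t-1} k_t) k_t^T,   o_t = S_t q_t.
    Shapes: q, k are (T, d_k), v is (T, d_v), beta is (T,); keys are assumed
    L2-normalized. For clarity only, not for speed."""
    S = torch.zeros(v.shape[1], k.shape[1])
    outs = []
    for t in range(len(k)):
        S = S + beta[t] * torch.outer(v[t] - S @ k[t], k[t])  # delta-rule update
        outs.append(S @ q[t])
    return torch.stack(outs)

T, d = 16, 32
q, v = torch.randn(T, d), torch.randn(T, d)
k = torch.nn.functional.normalize(torch.randn(T, d), dim=-1)
o = deltanet_step_loop(q, k, v, torch.sigmoid(torch.randn(T)))
```
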
Beidi Chen (@beidichen) 's Twitter Profile Photo

Very excited to try this video AR model, but the 4.5B model takes 17 min to generate a 2 s video on our L40 🤯 Then I dug into the tech report: the "real-time" claim (which is 2.3 s, so 🤔) costs 24 H100s for the 24B model. Orz Sand.ai
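
For scale, a back-of-envelope from the numbers in the tweet (my arithmetic, not Sand.ai's):

```python
# 17 minutes of compute for 2 seconds of video on one L40.
gen_seconds = 17 * 60
video_seconds = 2
print(gen_seconds / video_seconds)  # 510.0, i.e. ~510x slower than real time

# The "real-time" 24B configuration reportedly needs 24 H100s, so closing that
# gap is bought with ~24x more (and much faster) GPUs, not algorithmic speedup.
```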