Songlin Yang (@songlinyang4)'s Twitter Profile
Songlin Yang

@songlinyang4

Ph.D. student @MIT_CSAIL. Working on scalable and principled methods in #ML & #LLM. she/her/hers. In 🐳 and open-sourcing I trust

ID: 1345247812666081280

Link: https://sustcsonglin.github.io/ · Joined: 02-01-2021 05:57:32

1.1K Tweets

6.6K Followers

2.2K Following

Dan Fu (@realdanfu) 's Twitter Profile Photo

Super excited to share Chipmunk 🐿️ - training-free acceleration of diffusion transformers (video, image generation) with dynamic attention & MLP sparsity! Led by Austin Silveria and soham: 3.7x faster video gen, 1.6x faster image gen. Kernels written in TK ⚡️🐱 1/
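
A minimal sketch of what dynamic attention sparsity can look like (my illustration, not Chipmunk's kernels, which are fused ThunderKittens GPU code): keep only the top-k keys per query, chosen per input at runtime rather than with a fixed pattern.

```python
import torch

def topk_sparse_attention(q, k, v, keep: int):
    """Illustrative dynamic top-k attention sparsity (not Chipmunk's method):
    q, k, v have shape (batch, heads, seq, dim). Each query keeps only its
    `keep` highest-scoring keys; everything else is masked before the softmax."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5    # (b, h, s, s)
    # Data-dependent sparsity: which keys matter is decided per input at runtime.
    thresh = scores.topk(keep, dim=-1).values[..., -1:]      # k-th largest score
    scores = scores.masked_fill(scores < thresh, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 256, 64)
out = topk_sparse_attention(q, k, v, keep=32)  # attends to 32 of 256 keys per query
```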

Jiatao Gu (@thoma_gu) 's Twitter Profile Photo

I will be attending #ICLR2025 in person during Apr 24-28, and presenting our research: DART: Denoising Autoregressive Transformer 📌Fri 25 Apr 3 p.m. +08 — 5:30 p.m. +08 This is my first time visiting Singapore, and I am looking forward to chatting with old and new friends!

Dan Fu (@realdanfu) 's Twitter Profile Photo

I’ll be at #ICLR2025! 🛫🇸🇬
- ThunderKittens (spotlight) w/ Benjamin F Spector, Thu 3pm
- I’ll be at the Together AI booth Fri afternoon - we’re hiring aggressively for kernels!

Please reach out if you’d like to chat kernels🌽, TK⚡️🐱, Chipmunk🐿️, or anything performance! DMs open!

Simran Arora (@simran_s_arora) 's Twitter Profile Photo

Another exciting data point on the "Are Transformers the end game?" front. Go chat with Jerry at ICLR about using polynomial networks to learn numerical algorithms to high precision!!

Xinyu Yang (@xinyu2ml) 's Twitter Profile Photo

We will be presenting "APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding", a novel encoding method that enables:
🚀 Pre-caching Contexts for Fast Inference
🐍 Re-using Positions for Long Context

Our poster session is located in Hall 3 and Hall 2B…
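
A toy illustration of the position re-use idea as I read it from the title (my code and numbers, not the authors'): contexts encoded in parallel share one position range, so the maximum position is bounded by the longest single context rather than their sum, and each context's KV cache can be precomputed independently.

```python
# Position bookkeeping for sequential vs. parallel context encoding.
# Token lengths below are made up for illustration.
prompt_len = 16
contexts = {"ctx_a": 100, "ctx_b": 80, "ctx_c": 120}

# Sequential encoding: positions grow with every extra retrieved context.
sequential_positions, offset = {}, prompt_len
for name, n in contexts.items():
    sequential_positions[name] = list(range(offset, offset + n))
    offset += n

# Parallel encoding: every context re-uses the same starting position, so the
# max position is bounded by the longest single context, not their sum, and
# each context's KV cache can be precomputed once and re-used.
parallel_positions = {name: list(range(prompt_len, prompt_len + n))
                      for name, n in contexts.items()}

print(max(p[-1] for p in sequential_positions.values()))  # 315
print(max(p[-1] for p in parallel_positions.values()))    # 135
```
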
Albert Gu (@_albertgu) 's Twitter Profile Photo

We started off investigating applications of SSMs to PDEs but evolved to a broader question of understanding memory in modeling PDEs, finding when combining a sequence model (e.g. S4) with a Markovian neural operator (e.g. FNO) has advantages. Led by CMU students Ricardo and…
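
A schematic of the design question, under my own assumptions (the module names and the GRU stand-in for S4 are mine, not the paper's): a Markovian operator predicts the next PDE state from the current one alone, while wrapping it in a sequence model lets the update depend on the whole trajectory, i.e. memory.

```python
import torch
import torch.nn as nn

class MarkovianStep(nn.Module):
    # Stand-in for a neural operator like FNO: next state from current state only.
    def __init__(self, dim):
        super().__init__()
        self.op = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    def forward(self, u_t):                # (batch, dim)
        return self.op(u_t)

class MemoryAugmentedStep(nn.Module):
    # Stand-in for the hybrid: a sequence model (a GRU here, in place of S4)
    # summarizes the trajectory so the update can depend on the past.
    def __init__(self, dim):
        super().__init__()
        self.seq = nn.GRU(dim, dim, batch_first=True)
        self.op = MarkovianStep(dim)
    def forward(self, u_history):          # (batch, time, dim)
        h, _ = self.seq(u_history)
        return self.op(h[:, -1])           # next state from summarized history

u = torch.randn(4, 10, 64)                 # 10 past states of a discretized PDE
next_state = MemoryAugmentedStep(64)(u)
```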

Michael Poli (@michaelpoli6) 's Twitter Profile Photo

We showcase the first example of a model architecture optimized for smartphones: Hyena Edge.

We used our automated model design framework (STAR, Oral at ICLR 2025) to sift through convolution-based multi-hybrid architectures.

STAR iteratively evolved the population of designs…
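
A generic evolutionary-search skeleton, as an illustration of "iteratively evolving a population of designs"; STAR's actual genome encoding, mutation operators, and objectives are not shown in the thread, so everything below is an assumption.

```python
import random

def evolve(population, fitness, mutate, generations=10, survivors=8):
    """Generic evolutionary loop: score designs, keep the best, mutate to refill.
    A placeholder for how a framework like STAR might iterate over architectures."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:survivors]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(len(population) - survivors)]
    return max(population, key=fitness)

# Toy usage: an "architecture" is a list of layer widths; fitness prefers a small
# total width (a crude stand-in for on-device latency) at a target depth of 6.
def fitness(arch): return -sum(arch) - 100 * abs(len(arch) - 6)
def mutate(arch):
    arch = arch[:]
    arch[random.randrange(len(arch))] = random.choice([64, 128, 256, 512])
    return arch

best = evolve([[256] * 6 for _ in range(16)], fitness, mutate)
```
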
Piotr Nawrot (@p_nawrot) 's Twitter Profile Photo

Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in LLMs.

We performed the most comprehensive study on training-free sparse attention to date.

Here is what we found:
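
For readers outside the area, here is one common training-free pattern from that design space: a causal sliding window plus a few global "sink" tokens. This illustrates the kind of configuration such a study sweeps; it is not necessarily one of the paper's exact setups.

```python
import torch

def window_plus_sinks_mask(seq_len, window=128, sinks=4):
    """Boolean attention mask for a common training-free sparse pattern: each
    query attends to the first `sinks` tokens plus a local causal window."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i
    local = (i - j) < window
    sink = j < sinks
    return causal & (local | sink)

mask = window_plus_sinks_mask(1024)
print(mask.float().mean())  # fraction of the full attention matrix that is kept
```
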
Kaifeng Lyu (@vfleaking) 's Twitter Profile Photo

Excited to present our paper this morning at ICLR 2025, revealing the gap in CoT reasoning between RNNs and Transformers!

Poster Presentation:
🗓 Saturday, April 26, 10:00 AM – 12:30 PM
📍 Hall 2, Poster #640

TianyLin (@tianylin) 's Twitter Profile Photo

Announcing 𝐟𝐥𝐚𝐬𝐡-𝐦𝐮𝐨𝐧: a 🐍 pkg with a customized CUDA kernel that aims to boost the Muon optimizer: github.com/nil0x9/flash-m… 1/n
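
For context on what such a kernel accelerates: Muon's hot loop is a quintic Newton-Schulz iteration that approximately orthogonalizes the momentum matrix. Below is a plain-PyTorch sketch using the coefficients from the public Muon reference implementation, not flash-muon's CUDA code.

```python
import torch

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G via the quintic Newton-Schulz iteration used
    by the Muon optimizer (coefficients from the public reference implementation).
    This matmul-heavy loop is what a fused CUDA kernel would accelerate."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)               # normalize so the iteration converges
    if G.shape[0] > G.shape[1]:
        X = X.T                            # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if G.shape[0] > G.shape[1] else X

update = newton_schulz_orthogonalize(torch.randn(512, 1024))
```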

Liangzu Peng (@pengliangzu) 's Twitter Profile Photo

arxiv.org/abs/2504.17963 Our recent tutorial on mathematical aspects of continual learning, in light of its connection to "adaptive filtering" (1960–now). There are a few reasons why you might like it. #ICLR2025 CoLLAs 2025 ICLR 2025
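
To make the adaptive-filtering connection concrete, here is the textbook least-mean-squares (LMS) filter from that literature: weights adapt one sample at a time from a streaming prediction error, exactly the online-update setting continual learning studies. This is a standard sketch, not material from the tutorial itself.

```python
import numpy as np

def lms_filter(xs, ys, lr=0.05):
    """Textbook least-mean-squares (LMS) adaptive filter (Widrow-Hoff, 1960):
    weights adapt one sample at a time from the streaming prediction error."""
    w = np.zeros(xs.shape[1])
    for x, y in zip(xs, ys):
        err = y - w @ x          # instantaneous prediction error
        w += lr * err * x        # gradient step on 0.5 * err**2, no replay buffer
    return w

rng = np.random.default_rng(0)
xs = rng.normal(size=(2000, 8))
true_w = rng.normal(size=8)
ys = xs @ true_w + 0.01 * rng.normal(size=2000)
print(np.abs(lms_filter(xs, ys) - true_w).max())  # small: tracks the target filter
```
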
DataSig (@data_sig) 's Twitter Profile Photo

Songlin Yang is our Rough Path Interest Group (RPIG) speaker on Wed 30 Apr at 16:00 BST. She will be speaking on the topic of 'Advances in Scalable Linear RNNs: DeltaNet and Its Variants'. Don't miss! To join the RPIG see turing.ac.uk/research/inter…
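
For those new to the topic, DeltaNet's core recurrence (from the published papers) swaps linear attention's purely additive state update for a delta-rule correction. A deliberately sequential reference loop follows; the "scalable" part of the talk concerns chunkwise-parallel kernels (e.g. in flash-linear-attention), not this loop.

```python
import torch

def deltanet_step_loop(q, k, v, beta):
    """Minimal sequential reference for the DeltaNet recurrence
        S_t = S_{t-1} + beta_t * (v_t - S_{t-1} k_t) k_t^T,   o_t = S_t q_t.
    Shapes: q, k are (T, d_k), v is (T, d_v), beta is (T,); keys are assumed
    L2-normalized. For clarity only, not for speed."""
    S = torch.zeros(v.shape[1], k.shape[1])
    outs = []
    for t in range(len(k)):
        S = S + beta[t] * torch.outer(v[t] - S @ k[t], k[t])  # delta-rule update
        outs.append(S @ q[t])
    return torch.stack(outs)

T, d = 16, 32
q, v = torch.randn(T, d), torch.randn(T, d)
k = torch.nn.functional.normalize(torch.randn(T, d), dim=-1)
o = deltanet_step_loop(q, k, v, torch.sigmoid(torch.randn(T)))
```
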
Beidi Chen (@beidichen) 's Twitter Profile Photo

Very excited to try this video AR model, but the 4.5B model takes 17 min to generate a 2 s video on our L40 🤯 Then I dug into the tech report: the "real-time" claim (which is 2.3 s, so 🤔) costs 24 H100s for the 24B model. Orz Sand.ai
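
For scale, a back-of-envelope from the numbers in the tweet (my arithmetic, not Sand.ai's):

```python
# 17 minutes of compute for 2 seconds of video on one L40.
gen_seconds = 17 * 60
video_seconds = 2
print(gen_seconds / video_seconds)  # 510.0, i.e. ~510x slower than real time

# The "real-time" 24B configuration reportedly needs 24 H100s, so closing that
# gap is bought with ~24x more (and much faster) GPUs, not algorithmic speedup.
```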