Vimal Thilak🦉🐒 (@aggieinca)'s Twitter Profile
Vimal Thilak🦉🐒

@aggieinca

Proverbs 17:28. I’m not learned. I’m AGI.

ID: 204454102

Joined: 18-10-2010 18:44:00

2.2K Tweets

514 Followers

493 Following

Mustafa Shukor (@mustafashukor1):

We release a large-scale study to answer the following:
- Is late fusion inherently better than early fusion for multimodal models?
- How do native multimodal models scale compared to LLMs?
- How can sparsity (MoEs) play a detrimental role in handling heterogeneous modalities? 🧵
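For readers new to the distinction the thread draws: early fusion feeds image and text tokens into one shared transformer from the first layer, while late fusion keeps modality-specific encoders and mixes only at the top. A minimal sketch under that reading; all module names and dimensions below are illustrative, not taken from the paper.

```python
# Minimal sketch of early vs. late fusion; names/dims are illustrative.
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 256, 4, 2

def trunk(depth):
    layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
    return nn.TransformerEncoder(layer, depth)

class EarlyFusion(nn.Module):
    """One shared transformer sees image and text tokens jointly from layer 1."""
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Linear(768, d_model)    # patch features -> shared space
        self.txt_embed = nn.Embedding(32000, d_model)
        self.body = trunk(n_layers)

    def forward(self, patches, txt_ids):
        tokens = torch.cat([self.img_proj(patches), self.txt_embed(txt_ids)], dim=1)
        return self.body(tokens)

class LateFusion(nn.Module):
    """Modality-specific encoders; cross-modal attention only in a top fusion block."""
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Linear(768, d_model)
        self.txt_embed = nn.Embedding(32000, d_model)
        self.img_enc, self.txt_enc = trunk(n_layers), trunk(n_layers)
        self.fusion = trunk(1)

    def forward(self, patches, txt_ids):
        img = self.img_enc(self.img_proj(patches))
        txt = self.txt_enc(self.txt_embed(txt_ids))
        return self.fusion(torch.cat([img, txt], dim=1))

img, txt = torch.randn(2, 16, 768), torch.randint(0, 32000, (2, 12))
print(EarlyFusion()(img, txt).shape, LateFusion()(img, txt).shape)
```

Early fusion lets every layer attend across modalities at the cost of longer joint sequences; late fusion confines cross-modal interaction to the small fusion block.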
Alaa El-Nouby (@alaa_nouby):

We have been thinking a lot about how to train truly native multimodal models:

(1) what arch to use (early-fusion, late-fusion, MoEs)?
(2) the impact of data mixtures (interleaved, img-cap, text data)

We took a stab at answering these questions (and more) in this preprint ...
Vimal Thilak🦉🐒 (@aggieinca):

Check out this post with information about research from Apple being presented at ICLR 2025 in 🇸🇬 this week. I will be at ICLR, presenting some of our work (led by Samira Abnar) at the Sparsity in LLMs (SLLM) Workshop. Happy to chat about JEPAs as well!

Jason Ramapuram (@jramapuram):

Stop by poster #596 from 10AM to 12:30PM tomorrow (Fri 25 April) at #ICLR2025 to hear more about Sigmoid Attention!

We just pushed 8 trajectory checkpoints each for two 7B LLMs for Sigmoid Attention and a 1:1 Softmax Attention (trained with a deterministic dataloader for 1T tokens):

-
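For context, a hedged sketch of how sigmoid attention differs from standard softmax attention, following the Sigmoid Attention construction as I understand it: the row-wise softmax normalization is replaced by an elementwise sigmoid with a sequence-length-dependent bias b = -log(n). The bias choice and all shapes below are assumptions for illustration.

```python
# Softmax attention vs. sigmoid attention, in plain numpy for clarity.
# The -log(n) bias follows the paper's recommendation as I understand it.
import numpy as np

def softmax_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)        # rows normalized to sum to 1
    return w @ V

def sigmoid_attention(Q, K, V):
    n = Q.shape[0]                            # sequence length
    scores = Q @ K.T / np.sqrt(Q.shape[-1]) - np.log(n)   # bias b = -log(n)
    w = 1.0 / (1.0 + np.exp(-scores))         # elementwise; no coupling across keys
    return w @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
print(softmax_attention(Q, K, V).shape, sigmoid_attention(Q, K, V).shape)
```

Because the sigmoid is applied per entry, each attention weight is independent of the other keys in the row, which removes the row-wise normalization bottleneck of softmax.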
Harshay Shah (@harshays_):

If you’re at #ICLR2025, go watch Vimal Thilak🦉🐒 give an oral presentation at the @SparseLLMs workshop on scaling laws for pretraining MoE LMs! Had a great time co-leading this project with Samira Abnar & Vimal Thilak🦉🐒 at Apple MLR last summer.

When: Sun Apr 27, 9:30am
Where: Hall 4-07
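For background, the object of study here is the sparse mixture-of-experts (MoE) layer. The sketch below is a generic top-k-routed MoE in PyTorch, not the paper's recipe; the expert count, k, and the dense dispatch loop are illustrative assumptions.

```python
# Generic top-k mixture-of-experts layer; illustrative only.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                           # x: (n_tokens, d_model)
        gates = self.router(x).softmax(dim=-1)      # (n_tokens, n_experts)
        topv, topi = gates.topk(self.k, dim=-1)     # each token picks k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                  # dense loop; real kernels batch this
            for e, expert in enumerate(self.experts):
                sel = topi[:, slot] == e
                if sel.any():
                    out[sel] += topv[sel, slot, None] * expert(x[sel])
        return out

moe = TopKMoE()
print(moe(torch.randn(10, 256)).shape)              # torch.Size([10, 256])
```

The scaling-law question is how quality varies as total parameters grow while only k of n_experts are active per token, i.e. compute stays roughly fixed while capacity grows.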

Randall Balestriero (@randall_balestr):

In less than 24h! Zoom will be open to all! Program below:

9:00am  -> 9:10am   opening
9:10am  -> 9:55am   Phillip Isola
9:55am  -> 10:40am  Thomas Serre
11:00am -> 11:45am  Eero Simoncelli
11:45am -> 12:30pm  Yi Ma
1:30pm  -> 2:15pm   Yann LeCun
2:15pm  -> 3:00pm
Miguel Angel Bautista (@itsbautistam):

We will be presenting this work at ICML25 in Vancouver! Great work by Yuyang Wang leading this project! I’m curious what the diffusion/fm community would want to see this type of model do. (Besides getting better FID on ImageNet 😂)

Shuangfei Zhai (@zhaisf):

Proud to report that TarFlow is accepted to #ICML2025 as a Spotlight 🎉 I’m really looking forward to new ideas and applications enabled by powerful Normalizing Flow models 🚀
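This is not TarFlow itself, just a reminder of what makes normalizing flows attractive: an invertible map whose exact log-likelihood follows from the change-of-variables formula, log p(x) = log p_z(f(x)) + log|det df/dx|. A single affine coupling layer with illustrative dimensions, sketched under those assumptions.

```python
# Illustrative affine-coupling flow: invertible, with exact log-likelihood.
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim=4):
        super().__init__()
        # Predicts a scale s and shift t for the second half from the first half.
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.Tanh(),
                                 nn.Linear(64, dim))

    def forward(self, x):                    # x -> z, plus log|det dz/dx|
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.net(x1).chunk(2, dim=-1)
        return torch.cat([x1, x2 * s.exp() + t], dim=-1), s.sum(dim=-1)

    def inverse(self, z):                    # exact inverse: z -> x
        z1, z2 = z.chunk(2, dim=-1)
        s, t = self.net(z1).chunk(2, dim=-1)
        return torch.cat([z1, (z2 - t) * (-s).exp()], dim=-1)

flow = AffineCoupling()
x = torch.randn(3, 4)
z, logdet = flow(x)
# Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|
log_px = -0.5 * (z ** 2).sum(-1) - 0.5 * z.shape[-1] * math.log(2 * math.pi) + logdet
print(torch.allclose(flow.inverse(z), x, atol=1e-5), log_px.shape)
```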

Vimal Thilak🦉🐒 (@aggieinca):

Yep. What is meant by image-like here? 🤔 The problem, or rather the frustrating aspect, of empirical work is that we have no idea what is optimal: anytime I dare make a claim, hypertuner bros laugh at me and release a new SoTA :).

Pavankumar Vasu (@pavankumarvasu):

Excited to share code & models for FastVLM, our blazing-fast Vision-Language Model appearing at #CVPR2025. Run it on-device with inference code optimized for Apple Silicon using #mlx. Code: github.com/apple/ml-fastv… Updated paper & results coming soon. Stay tuned! 👀

Tianqi Chen (@tqchenml):

ML systems infrastructure (compilers, inference engines, GPU acceleration, and more) is at the heart of the AI revolution. One thing I love about #MLSys2025 is that it brings together a high density of talent and a shared mindset in these directions. Starting next Monday!

Vimal Thilak🦉🐒 (@aggieinca):

Ahmad started a very interesting discussion. I wish we had an OpenReview-like thing to archive these discussions :). TMLR is a good venue. Correctness over subjective criteria (novelty!?) has a better chance of being useful down the road. Also Ecclesiastes 1:9.

Mohammed Adnan (@adnan_ahmad1306):

1/10 🧵
🔍Can weight symmetry provide insights into sparse training and the Lottery Ticket Hypothesis?

🧐We dive deep into this question in our latest paper, "Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry", accepted at #ICML2025
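For background, the standard Lottery Ticket pipeline the paper builds on: train, extract a sparse mask by magnitude pruning, then "rewind" to the original initialization and train the masked subnetwork. The sketch below shows only that baseline recipe; the paper's weight-symmetry alignment is not reproduced here, and all sizes are illustrative.

```python
# Baseline Lottery Ticket recipe: magnitude-prune a trained model for a mask,
# then rewind to the original init. Hypothetical sizes; training loop elided.
import torch
import torch.nn as nn

def magnitude_mask(model, sparsity=0.8):
    """Global magnitude pruning: keep the largest-|w| (1 - sparsity) fraction."""
    flat = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    threshold = flat.quantile(sparsity)
    return [(p.detach().abs() > threshold).float() for p in model.parameters()]

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
init_state = {k: v.clone() for k, v in model.state_dict().items()}

# ... train `model` here ...

mask = magnitude_mask(model, sparsity=0.8)

# Rewind: restore the random init, then apply the ticket mask.
model.load_state_dict(init_state)
with torch.no_grad():
    for p, m in zip(model.parameters(), mask):
        p.mul_(m)
```

The paper's question, as the thread frames it, is whether such a mask can be made to work from a *different* random initialization by exploiting weight-space symmetries, rather than requiring the exact original init.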