Roger Waleffe (@rwaleffe)'s Twitter Profile
Roger Waleffe

@rwaleffe

Computer Sciences PhD student at the University of Wisconsin-Madison

ID: 1276277563342626821

Website: http://rogerwaleffe.com · Joined: 25-06-2020 22:14:38

26 Tweets

77 Followers

25 Following

Theo Rekatsinas (@thodrek):

1/3 Super exciting new result by Roger (Roger Waleffe) on how to find small networks that exhibit the same performance as overparameterized networks! We show that no expensive iterative pruning is needed to find lottery tickets x.com/StatMLPapers/s…

Theo Rekatsinas (@thodrek):

2/3 The secret sauce: Hidden layer activations in wide networks live in small subspaces! Train your wide net for a few epochs, run PCA on the activations, project the weights onto the PCA basis, and continue training to find your new state-of-the-art subnetwork.

Theo Rekatsinas (@thodrek):

3/3 We term these networks Principal Component Networks (PCNs). Practical results: converting wide networks to their equivalent PCNs yields models that outperform deeper networks. For example, we find that a Wide ResNet-50 PCN outperforms ResNet-152 on ImageNet.
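
The recipe in tweet 2/3 (train a wide network briefly, run PCA on its hidden activations, project the downstream weights onto the principal basis, then continue training) can be sketched in a few lines of PyTorch. The toy MLP, layer sizes, component count k, and the uncentered SVD used as PCA below are illustrative assumptions, not the paper's exact construction; the thread's results (e.g., Wide ResNet-50 PCN) apply the idea to convolutional architectures.

```python
# Illustrative sketch only: a toy MLP version of "train briefly -> PCA on hidden
# activations -> project weights onto the PCA basis -> continue training".
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_hidden, d_out, k = 32, 512, 10, 64            # k = retained components (assumed)

wide = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_out))

# 1) "Train for a few epochs" -- stand-in: a few SGD steps on random data.
opt = torch.optim.SGD(wide.parameters(), lr=0.1)
for _ in range(100):
    x = torch.randn(256, d_in)
    y = torch.randint(0, d_out, (256,))
    loss = nn.functional.cross_entropy(wide(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# 2) PCA (via truncated SVD, uncentered for simplicity) on the hidden activations.
with torch.no_grad():
    h = torch.relu(wide[0](torch.randn(4096, d_in)))  # hidden activations, N x d_hidden
    _, _, Vt = torch.linalg.svd(h, full_matrices=False)
    U = Vt[:k].T                                      # d_hidden x k principal basis

# 3) Project the downstream weights onto the basis: W2 h ~= (W2 U)(U^T h).
proj = nn.Linear(d_hidden, k, bias=False)             # computes U^T h
head = nn.Linear(k, d_out)                            # weight W2 U, same bias
with torch.no_grad():
    proj.weight.copy_(U.T)
    head.weight.copy_(wide[2].weight @ U)
    head.bias.copy_(wide[2].bias)

pcn = nn.Sequential(wide[0], nn.ReLU(), proj, head)   # smaller network

# 4) Continue training `pcn` with the same loop as in step 1.
```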

Theo Rekatsinas (@thodrek):

Accepted to #OSDI21: @JasonMohoney & Roger Waleffe show how to train massive graph embeddings on a 𝘀𝗶𝗻𝗴𝗹𝗲 𝗺𝗮𝗰𝗵𝗶𝗻𝗲; don't burn $$$$ on cloud providers. 1/n works on graph learning w/ the amazing Shivaram Venkataraman. Open-sourcing soon. #marius arxiv.org/abs/2101.08358

Theo Rekatsinas (@thodrek):

Scalability is a key factor limiting the use of Graph Neural Networks (GNNs) over large graphs; w/ Roger Waleffe, @JasonMohoney, and Shiv, we introduce Marius++ (arxiv.org/abs/2202.02365), a system for *out-of-core* GNN mini-batch training over billion-scale graphs. (1/5)
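
The tweet does not spell out MariusGNN's internals, so the snippet below only sketches the general out-of-core idea it alludes to: keep node data on disk and pull into memory just the rows a mini-batch touches. The memory-mapped file, sizes, and the plain MLP standing in for a GNN with neighbor sampling are all assumptions for illustration, not the system's actual partition/buffer design (see arxiv.org/abs/2202.02365 for that).

```python
# Generic out-of-core mini-batch sketch: node features live on disk (np.memmap)
# and only the rows needed by each mini-batch are loaded into memory.
import numpy as np
import torch
import torch.nn as nn

num_nodes, feat_dim, num_classes = 100_000, 128, 40  # toy sizes; real graphs are far larger

# Node features kept on disk and memory-mapped; in practice the file is prepared offline.
features = np.memmap("node_feats.bin", dtype=np.float32, mode="w+",
                     shape=(num_nodes, feat_dim))
labels = np.random.randint(0, num_classes, size=num_nodes)

# Stand-in model: a plain MLP on node features (a real GNN would also sample neighbors).
model = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, num_classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    # Pick a mini-batch of target nodes; only their feature rows ever leave the disk.
    batch_nodes = np.sort(np.random.choice(num_nodes, size=1024, replace=False))
    x = torch.from_numpy(np.asarray(features[batch_nodes]))  # sorted reads are disk-friendlier
    y = torch.from_numpy(labels[batch_nodes])
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```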

Immanuel Trummer (@immanueltrummer):

Roger Waleffe shows how to train over billion-scale graphs on a single machine! Join us at 1 PM ET via Zoom! Link: tinyurl.com/2p8uv2j8 Details: itrummer.github.io/cornelldbsemin… Wisconsin DB Group (@wiscdb.bsky.social) UW–Madison Computer Sciences Theo Rekatsinas #ML #AI #Databases #GraphData #CornellDBseminar

PyKEEN (@keenuniverse):

Marius, another amazing KGE (and more) library, is now auto-formatting its code with black as of github.com/marius-team/ma… 🚀 @JasonMohoney, Roger Waleffe, nice job :)

Disseminate: The Computer Science Research Podcast (@disseminatepod):

🚨 "MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks" with Roger Waleffe is available now! 🎧 Listen on Spotify ➡️ open.spotify.com/show/6IQIF9oRS… ☕️ Support the podcast ➡️ buymeacoffee.com/disseminate

Theo Rekatsinas (@thodrek):

Data pruning to reduce pretraining costs is hot, but fancy pruning can take just as long to select data as to train on all of it! Patrik, @Rwaleffe, and @vmageirakos's work at #ICLR2024 tomorrow shows how a simple, low-cost tweak to random sampling outperforms trendy methods!

Bryan Catanzaro (@ctnzr):

An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset:
* 7% attention, the rest is Mamba2
* MMLU jumps from 50 to 53.6%
* Training efficiency is the same
* Inference cost is much less
arxiv.org/pdf/2406.07887
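
As a rough illustration of what "7% attention, the rest is Mamba2" means structurally, the snippet below lays out a layer pattern for an assumed 52-layer stack, spacing a few self-attention layers evenly among Mamba2-style blocks. The depth and placement rule are assumptions for illustration; the linked paper specifies the actual architecture.

```python
# Illustrative layout of a hybrid stack that is ~7% attention and otherwise Mamba2-style.
# The depth (52) and even spacing are assumptions, not the architecture from the paper.
num_layers = 52
num_attn = round(0.07 * num_layers)          # ~7% of layers -> 4 attention layers

# Spread the attention layers evenly through the stack.
attn_positions = {round((i + 0.5) * num_layers / num_attn) for i in range(num_attn)}
pattern = ["attention" if i in attn_positions else "mamba2" for i in range(num_layers)]

print(f"{pattern.count('attention')}/{num_layers} layers are attention")
print(" ".join("A" if p == "attention" else "M" for p in pattern))
```
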
Bryan Catanzaro (@ctnzr):

Nemotron-H: A family of Hybrid Mamba-Transformer LLMs.
* Hybrid architecture means up to 3X faster at the same accuracy
* Trained in FP8
* Great for VLMs
* Weights and instruct versions to come soon.

research.nvidia.com/labs/adlr/nemo…