Roger Waleffe (@rwaleffe)'s Twitter Profile
Roger Waleffe

@rwaleffe

Computer Sciences PhD student at the University of Wisconsin-Madison

ID: 1276277563342626821

Website: http://rogerwaleffe.com · Joined: 25-06-2020 22:14:38

26 Tweets

77 Followers

25 Following

Theo Rekatsinas (@thodrek):

1/3 Super exciting new result by Roger (Roger Waleffe) on how to find small networks that exhibit the same performance as overparameterized networks! We show that no expensive iterative pruning is needed to find lottery tickets x.com/StatMLPapers/s…

Theo Rekatsinas (@thodrek):

2/3 The secret sauce: Hidden layer activations in wide networks live in small subspaces! Train your wide net for a few epochs, run PCA on the activations, project the weights onto the PCA basis, and continue training to find your new state-of-the-art subnetwork.

Theo Rekatsinas (@thodrek):

3/3 We term these networks Principal Component Networks (PCNs). Practical results: converting wide networks to their equivalent PCNs yields models that outperform deeper networks. For example, we find that a Wide ResNet-50 PCN outperforms ResNet-152 on ImageNet.
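
The recipe in tweet 2/3 (train a wide network briefly, run PCA on its hidden activations, project the downstream weights onto the principal basis, then continue training) can be sketched in a few lines of PyTorch. The toy MLP, layer sizes, component count k, and the uncentered SVD used as PCA below are illustrative assumptions, not the paper's exact construction; the thread's results (e.g., Wide ResNet-50 PCN) apply the idea to convolutional architectures.

```python
# Illustrative sketch only: a toy MLP version of "train briefly -> PCA on hidden
# activations -> project weights onto the PCA basis -> continue training".
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_hidden, d_out, k = 32, 512, 10, 64            # k = retained components (assumed)

wide = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_out))

# 1) "Train for a few epochs" -- stand-in: a few SGD steps on random data.
opt = torch.optim.SGD(wide.parameters(), lr=0.1)
for _ in range(100):
    x = torch.randn(256, d_in)
    y = torch.randint(0, d_out, (256,))
    loss = nn.functional.cross_entropy(wide(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# 2) PCA (via truncated SVD, uncentered for simplicity) on the hidden activations.
with torch.no_grad():
    h = torch.relu(wide[0](torch.randn(4096, d_in)))  # hidden activations, N x d_hidden
    _, _, Vt = torch.linalg.svd(h, full_matrices=False)
    U = Vt[:k].T                                      # d_hidden x k principal basis

# 3) Project the downstream weights onto the basis: W2 h ~= (W2 U)(U^T h).
proj = nn.Linear(d_hidden, k, bias=False)             # computes U^T h
head = nn.Linear(k, d_out)                            # weight W2 U, same bias
with torch.no_grad():
    proj.weight.copy_(U.T)
    head.weight.copy_(wide[2].weight @ U)
    head.bias.copy_(wide[2].bias)

pcn = nn.Sequential(wide[0], nn.ReLU(), proj, head)   # smaller network

# 4) Continue training `pcn` with the same loop as in step 1.
```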

Theo Rekatsinas (@thodrek):

Accepted to #OSDI21: @JasonMohoney & Roger Waleffe show how to train massive graph embeddings on a 𝘀𝗶𝗻𝗴𝗹𝗲 𝗺𝗮𝗰𝗵𝗶𝗻𝗲; don't burn $$$$ on cloud providers. 1/n works on graph learning w/ the amazing Shivaram Venkataraman. Open-sourcing soon. #marius arxiv.org/abs/2101.08358

Theo Rekatsinas (@thodrek):

Scalability is a key factor limiting the use of Graph Neural Networks (GNNs) over large graphs; w/ Roger Waleffe, @JasonMohoney, and Shiv, we introduce Marius++ (arxiv.org/abs/2202.02365), a system for *out-of-core* GNN mini-batch training over billion-scale graphs. (1/5)
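
The tweet does not spell out MariusGNN's internals, so the snippet below only sketches the general out-of-core idea it alludes to: keep node data on disk and pull into memory just the rows a mini-batch touches. The memory-mapped file, sizes, and the plain MLP standing in for a GNN with neighbor sampling are all assumptions for illustration, not the system's actual partition/buffer design (see arxiv.org/abs/2202.02365 for that).

```python
# Generic out-of-core mini-batch sketch: node features live on disk (np.memmap)
# and only the rows needed by each mini-batch are loaded into memory.
import numpy as np
import torch
import torch.nn as nn

num_nodes, feat_dim, num_classes = 100_000, 128, 40  # toy sizes; real graphs are far larger

# Node features kept on disk and memory-mapped; in practice the file is prepared offline.
features = np.memmap("node_feats.bin", dtype=np.float32, mode="w+",
                     shape=(num_nodes, feat_dim))
labels = np.random.randint(0, num_classes, size=num_nodes)

# Stand-in model: a plain MLP on node features (a real GNN would also sample neighbors).
model = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, num_classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    # Pick a mini-batch of target nodes; only their feature rows ever leave the disk.
    batch_nodes = np.sort(np.random.choice(num_nodes, size=1024, replace=False))
    x = torch.from_numpy(np.asarray(features[batch_nodes]))  # sorted reads are disk-friendlier
    y = torch.from_numpy(labels[batch_nodes])
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```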

Immanuel Trummer (@immanueltrummer):

Roger Waleffe shows how to train over billion-scale graphs on a single machine! Join us at 1 PM ET via Zoom! Link: tinyurl.com/2p8uv2j8 Details: itrummer.github.io/cornelldbsemin… Wisconsin DB Group (@wiscdb.bsky.social) UW–Madison Computer Sciences Theo Rekatsinas #ML #AI #Databases #GraphData #CornellDBseminar

PyKEEN (@keenuniverse):

Marius, another amazing KGE (and more) library, is now auto-formatting its code with black as of github.com/marius-team/ma… 🚀 @JasonMohoney, Roger Waleffe, nice job :)

Disseminate: The Computer Science Research Podcast (@disseminatepod):

🚨 "MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks" with Roger Waleffe is available now! 🎧 Listen on Spotify ➡️ open.spotify.com/show/6IQIF9oRS… ☕️ Support the podcast ➡️ buymeacoffee.com/disseminate

Theo Rekatsinas (@thodrek):

Data pruning to reduce pretraining costs is hot, but fancy pruning can take just as long to select data as to train on all of it! Patrik, @Rwaleffe, and @vmageirakos's work at #ICLR2024 tomorrow shows how a simple, low-cost tweak to random sampling outperforms trendy methods!

Bryan Catanzaro (@ctnzr):

An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset:
* 7% attention, the rest is Mamba2
* MMLU jumps from 50 to 53.6%
* Training efficiency is the same
* Inference cost is much less
arxiv.org/pdf/2406.07887
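
As a rough illustration of what "7% attention, the rest is Mamba2" means structurally, the snippet below lays out a layer pattern for an assumed 52-layer stack, spacing a few self-attention layers evenly among Mamba2-style blocks. The depth and placement rule are assumptions for illustration; the linked paper specifies the actual architecture.

```python
# Illustrative layout of a hybrid stack that is ~7% attention and otherwise Mamba2-style.
# The depth (52) and even spacing are assumptions, not the architecture from the paper.
num_layers = 52
num_attn = round(0.07 * num_layers)          # ~7% of layers -> 4 attention layers

# Spread the attention layers evenly through the stack.
attn_positions = {round((i + 0.5) * num_layers / num_attn) for i in range(num_attn)}
pattern = ["attention" if i in attn_positions else "mamba2" for i in range(num_layers)]

print(f"{pattern.count('attention')}/{num_layers} layers are attention")
print(" ".join("A" if p == "attention" else "M" for p in pattern))
```
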
Bryan Catanzaro (@ctnzr):

Nemotron-H: A family of Hybrid Mamba-Transformer LLMs.
* Hybrid architecture means up to 3X faster at the same accuracy
* Trained in FP8
* Great for VLMs
* Weights and instruct versions to come soon.

research.nvidia.com/labs/adlr/nemo…