Tony Wu (@quarky93) Twitter Tweets • TwiCopy

1/n What’s the speed limit for GPUs? Why do some teams achieve 10x proving speedups while others hit 100x on the same hardware and algorithms? NVIDIA calls it "Speed of Light Analysis", a topic near and dear to my heart. Let’s unpack it.

Tony Wu

@quarky93

10 months ago

NVIDIA GB10
- 10P + 10E core ARM CPU
- RTX 5070 class GPU
- 128GB Unified Memory

Now put this in a 16-inch laptop with a Sensel haptic touchpad. Take my money 💸

Tony Wu

@quarky93

10 months ago

👀 Initial testing on RTX 5090... Real world data transfer speeds over PCIe 5.0 - 52GB/s It's so fast that on a desktop machine, you start worrying that your system RAM won't be fast enough to keep up. Make sure you turn on XMP/EXPO on your DDR5! #rtx5090
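
To put the 52GB/s figure in context, here is a rough back-of-envelope sketch. It assumes a x16 PCIe 5.0 link (32 GT/s per lane, 128b/130b encoding) and dual-channel DDR5 with 64 bits of data per channel. The DDR5 speeds (4800 and 6000 MT/s) are illustrative values, not numbers from the tweet, and the function names are made up for this example.

```python
# Back-of-envelope bandwidth estimates. All inputs are assumptions,
# except the 52 GB/s measurement quoted in the tweet.

def pcie_peak_gb_per_s(gt_per_s=32.0, lanes=16, encoding=128 / 130):
    """Theoretical one-direction PCIe bandwidth in GB/s, before protocol overhead."""
    return gt_per_s * lanes * encoding / 8  # 8 bits per byte


def ddr5_peak_gb_per_s(mt_per_s, channels=2, bytes_per_transfer=8):
    """Theoretical DRAM bandwidth in GB/s for the given transfer rate."""
    return mt_per_s * channels * bytes_per_transfer / 1000


measured = 52.0  # GB/s, from the tweet
peak = pcie_peak_gb_per_s()

print(f"PCIe 5.0 x16 theoretical peak: {peak:.1f} GB/s")
print(f"Measured / peak:               {measured / peak:.0%}")
for speed in (4800, 6000):  # illustrative DDR5 speeds
    print(f"Dual-channel DDR5-{speed} peak:   {ddr5_peak_gb_per_s(speed):.1f} GB/s")
```

Under these assumptions the link peaks at about 63 GB/s, so 52GB/s is roughly 83% of the theoretical limit. Dual-channel DDR5-4800 peaks at 76.8 GB/s, which leaves limited headroom once the CPU also needs memory bandwidth, and that is why faster memory profiles matter here.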

Tony Wu

@quarky93

10 months ago

My new development workhorse:
- Ryzen 9950X (16-core Zen 5, full AVX512 datapath!)
- 192GB of DDR5
- 100GbE NVIDIA ConnectX-6 network card (running at 25Gb since I want all my PCIe lanes to go to the GPU)
- Sipeed NanoKVM for remote management

Just waiting for my RTX 5090 now

Tony Wu

@quarky93

9 months ago

A pretty nice #locallama machine! Looking forward to what Alex Cheema - e/acc does with this one 🙂.

Note for cryptography, in particular #ZKProofs:
- 128GB unified memory eliminates the CPU<>GPU transfer bottleneck
- GPU about on par with RTX 4070
- However, AMD GPUs are half as efficient

Tony Wu

@quarky93

9 months ago

So this is where the entire supply went 😂

Tony Wu

@quarky93

9 months ago

The amount of open research Ingonyama does is awesome! Check out more Ingo papers here: github.com/ingonyama-zk/p…

Tony Wu

@quarky93

9 months ago

The new Mac Studio is the perfect machine for running Deepseek R1 locally. Now we await Nvidia's answer, if they care to.

Tony Wu

@quarky93

9 months ago

Super excited about the Mojo language.
- Ever wanted to write GPU kernels in Python?
- Easily leverage SIMD without the code turning into a horrible mess?
- Import regular Python libraries into your high performance compiled code without FFI boilerplate?

Ingonyama

@ingo_zk

8 months ago

Hardware-friendliness of HyperPlonk, Part2 hackmd.io/@Ingonyama/Har…

Tony Wu

@quarky93

8 months ago

Is this the first time that AMD will be on the leading edge node before Apple?

Tony Wu

@quarky93

7 months ago

My RTX 5090 finally arrived!! 🤤 1.8 TB/s memory bandwidth!
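
For reference, the 1.8 TB/s figure can be reproduced from the card's published memory configuration, assumed here to be a 512-bit bus with 28 Gbps GDDR7 (these specs are not stated in the tweet). The sketch also derives a simple lower bound on the time needed to stream a buffer once through memory, in the spirit of a speed-of-light estimate. The function names are hypothetical.

```python
# Rough memory-bandwidth arithmetic. The bus width and per-pin data rate
# are assumed from published RTX 5090 specs, not taken from the tweet.

def memory_bandwidth_gb_per_s(bus_width_bits=512, gbps_per_pin=28.0):
    """Peak DRAM bandwidth in GB/s: pins * per-pin rate / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


def min_stream_time_ms(num_bytes, bandwidth_gb_per_s):
    """Lower bound on the time to read num_bytes once at peak bandwidth."""
    return num_bytes / (bandwidth_gb_per_s * 1e9) * 1e3


bw = memory_bandwidth_gb_per_s()
print(f"Peak bandwidth: {bw:.0f} GB/s")

# Example: a 4 GiB buffer cannot be read in less than this, however
# well the kernel is tuned.
buffer_bytes = 4 * 2**30
print(f"Minimum time to stream 4 GiB: {min_stream_time_ms(buffer_bytes, bw):.2f} ms")
```

A kernel that has to touch every byte of its input at least once cannot beat this bound, so comparing measured runtime against it gives a quick sense of how much headroom remains.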

Omer Shlomovits

@omershlomovits

7 months ago

First benchmarks on the NVIDIA 5090 are in. Most meaningful data point: Groth16 on the 5090 is now 1.5× faster than on the 4090. Same code, no GPU-specific tuning. Measured with ICICLE-Snark, the current state-of-the-art Groth16 GPU implementation. H/t Emir Soytürk, Tony Wu

Tony Wu

@quarky93

5 months ago

Amazing! GPU accelerated RTL simulator. Quick, someone wrap Cocotb around this!
