Tony Wu (@quarky93)'s Twitter Profile
Tony Wu

@quarky93

accelerating cryptography @Ingo_zk

ID: 1496065615924137987

Joined 22-02-2022 10:13:46

500 Tweets

239 Followers

346 Following

Niall Emmart (@niall_emmart)

1/n What’s the speed limit for GPUs? Why do some teams achieve 10x proving speedups while others hit 100x on the same hardware and algorithms? NVIDIA calls it "Speed of Light Analysis", a topic near and dear to my heart. Let’s unpack it.

Tony Wu (@quarky93)

NVIDIA GB10
- 10P + 10E core ARM CPU
- RTX 5070 class GPU
- 128GB Unified Memory
Now put this in a 16-inch laptop with a Sensel haptic touchpad. Take my money 💸

Tony Wu (@quarky93)

👀
Initial testing on RTX 5090...
Real world data transfer speeds over PCIe 5.0 - 52GB/s
It's so fast that on a desktop machine, you start worrying that your system RAM won't be fast enough to keep up.
Make sure you turn on XMP/EXPO on your DDR5!
#rtx5090

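As a rough back-of-the-envelope check on the 52GB/s figure, the sketch below compares it with theoretical peaks. It assumes a PCIe 5.0 x16 link and dual-channel DDR5, neither of which is stated in the post, and the DDR5 speed grades are illustrative picks rather than the poster's actual memory configuration.

```python
# Back-of-the-envelope bandwidth check.
# Assumptions (not from the post): x16 link, dual-channel DDR5.

def pcie_gbps(gt_per_s: float, lanes: int, enc_num: int = 128, enc_den: int = 130) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s after 128b/130b line encoding."""
    return gt_per_s * lanes * enc_num / enc_den / 8

def ddr5_gbps(mt_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DDR5 bandwidth in GB/s (64 data bits per channel)."""
    return mt_per_s * bus_bytes * channels / 1000

PCIE5_X16 = pcie_gbps(32.0, 16)   # PCIe 5.0 signals at 32 GT/s per lane
MEASURED = 52.0                   # GB/s, the figure quoted in the post
EFFICIENCY = MEASURED / PCIE5_X16

if __name__ == "__main__":
    print(f"PCIe 5.0 x16 theoretical: {PCIE5_X16:.1f} GB/s")
    print(f"Measured / theoretical:   {EFFICIENCY:.0%}")
    for speed in (4800, 5600, 6000):  # illustrative DDR5 speed grades
        print(f"DDR5-{speed} dual channel: {ddr5_gbps(speed):.1f} GB/s")
```

Under these assumptions the quoted transfer rate is a large fraction of both the link ceiling and the peak bandwidth of slower DDR5 configurations, which is consistent with the advice to enable XMP/EXPO.
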
Tony Wu (@quarky93)

My new development workhorse:
- Ryzen 9950X (16-core Zen 5, full AVX512 datapath!)
- 192GB of DDR5
- 100GbE Nvidia ConnectX-6 network card (running at 25Gb since I want all my PCIe lanes to the GPU)
- Sipeed NanoKVM for remote management
Just waiting for my RTX 5090 now

Tony Wu (@quarky93)

A pretty nice #locallama machine! Looking forward to what Alex Cheema - e/acc does with this one 🙂.
Note for cryptography, in particular #ZKProofs:
- 128GB unified memory eliminates the CPU<>GPU transfer bottleneck
- GPU about on par with RTX 4070
- However, AMD GPUs are half as efficient

Tony Wu (@quarky93)

The new Mac Studio is the perfect machine for running Deepseek R1 locally. Now we await Nvidia's answer, if they care to.

Tony Wu (@quarky93)

Super excited about the Mojo language.
- Ever wanted to write GPU kernels in Python?
- Easily leverage SIMD without the code turning into a horrible mess?
- Import regular Python libraries into your high performance compiled code without FFI boilerplate?

Omer Shlomovits (@omershlomovits)

First benchmarks on the NVIDIA 5090 are in. Most meaningful data point: Groth16 on 5090 is now 1.5× faster compared to 4090. Same code, no GPU-specific tuning. Measured with ICICLE-Snark, the current state-of-the-art Groth16 GPU implementation. H/t Emir Soytürk, Tony Wu