Ross Wightman (@wightmanr) 's Twitter Profile
Ross Wightman

@wightmanr

Computer Vision @ 🤗. Ex-head of Software & Firmware Engineering at a Canadian 🦄. Currently building ML/AI systems, or investing in startups that do it better.

ID: 557902603

Link: http://rwightman.com/ · Joined: 19-04-2012 17:34:53

4.4K Tweets

21.2K Followers

1.1K Following

PyTorch (@pytorch) 's Twitter Profile Photo

Update from the PyTorch maintainers: 2.7 is out now.
🔹 Support for NVIDIA Blackwell (CUDA 12.8)
🔹 Mega Cache
🔹 torch.compile for Function Modes
🔹 FlexAttention updates
🔹 Intel GPU perf boost
🔗 Blog: hubs.la/Q03jBPSL0
📄 Release notes: hubs.la/Q03jBPlW0
#PyTorch
Ross Wightman (@wightmanr) 's Twitter Profile Photo

This sort of thing is such an own goal for the USA, hard to fathom. It also sucks for anyone caught up in it, to have your life suddenly uprooted for no good reason. But as a Canadian, I can't help but be a little hopeful it might lead to some of our best and brightest sticking around or…

Ross Wightman (@wightmanr) 's Twitter Profile Photo

I thought I knew PyTorch but found a bug in some recent code today and learned something new... did you know that these two lines are different? One works as I expected, and one is a sneaky bug...

x[indices, :seq_len] += pos_embed[:, :seq_len]
x[indices, …
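The second line is cut off in the tweet, but the usual trap it hints at is that advanced (fancy) indexing returns a copy, so an in-place op on the indexed result updates the copy, while `+=` on an indexed target is rewritten by Python into a single index assignment that lands in the original. A minimal sketch using NumPy, which shares PyTorch's fancy-indexing copy semantics (shapes and names here are illustrative, not from the original code):

```python
import numpy as np

x = np.zeros((4, 8))
pos_embed = np.ones((1, 8))
indices = np.array([0, 2])
seq_len = 5

# Form 1: augmented assignment on a fancy-indexed target.
# Python rewrites this into a single __setitem__ call, so the
# update really lands in x.
x[indices, :seq_len] += pos_embed[:, :seq_len]
assert x[0, :seq_len].sum() == seq_len

# Form 2: in-place op on the fancy-indexed RESULT.
# x2[indices, :seq_len] is a copy (not a view), so adding in place
# modifies the copy, and x2 is silently left unchanged -- the bug.
x2 = np.zeros((4, 8))
tmp = x2[indices, :seq_len]   # copy, not a view
tmp += pos_embed[:, :seq_len]
assert x2.sum() == 0
```

The same distinction applies in PyTorch between `x[indices] += y` (works) and calling an in-place method like `add_` on the result of advanced indexing (modifies a temporary copy).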

Ross Wightman (@wightmanr) 's Twitter Profile Photo

I decided to see which LLM would pick up the difference here without too much leading...
* Sonnet 3.5 and 3.7 were both wrong, stating that add_ works properly, modifying the original tensor in-place
* 4o, o4-mini-high, and Gemini 2.5 Pro were similar, correct though could have used…
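For context on why the two spellings behave differently at all: Python desugars `x[i] += v` into a `__getitem__`, an add, and then a `__setitem__`, and it is that final `__setitem__` that writes the result back into the original storage; an in-place method call on `x[i]` stops after `__getitem__`, which for advanced indexing returns a copy. A toy class that logs the calls (purely illustrative, not PyTorch code):

```python
class LoggingBuffer:
    """Toy stand-in that records indexing calls, to show how Python
    desugars `x[i] += v` (the mechanism the += form relies on)."""

    def __init__(self):
        self.calls = []
        self.data = {0: 10}

    def __getitem__(self, i):
        self.calls.append(("getitem", i))
        return self.data[i]

    def __setitem__(self, i, value):
        self.calls.append(("setitem", i, value))
        self.data[i] = value


buf = LoggingBuffer()
buf[0] += 5   # desugars to: buf[0] = buf[0] + 5

# Both a read AND a write-back happened; the write-back is what
# makes the += form land in the original storage.
assert buf.calls == [("getitem", 0), ("setitem", 0, 15)]
assert buf.data[0] == 15
```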

Ross Wightman (@wightmanr) 's Twitter Profile Photo

o3 reminds me of a dev I had to fire many, many moons ago after two weeks on the job. Signs of talent, but so full of himself and unable to admit to any wrongdoing. He would lie to your face that he got the job done and made it 10x faster than you asked, when in reality it was a steaming…

Cihang Xie (@cihangxie) 's Twitter Profile Photo

Still relying on OpenAI's CLIP — a model released 4 years ago with limited architecture configurations — for your Multimodal LLMs? 🚧

We're excited to announce OpenVision: a fully open, cost-effective family of advanced vision encoders that match or surpass OpenAI's CLIP and…
Pablo Montalvo (@m_olbap) 's Twitter Profile Photo

Had the pleasure of speaking last week at PyTorch Day France about PyTorch 🔥, the ML community, vLLM, and 🤗 Transformers!

I've pushed my slides to the Hub directly — much easier to share with practitioners 📤.
Vladimir Iglovikov (@viglovikov) 's Twitter Profile Photo

1๏ธโƒฃ / 4๏ธโƒฃ ๐Ÿ“Š GitHub Computer Vision Stars - May 2025 Update githublb.vercel.app/computer-vision Key highlights from the top 0.001% packages (1000 out of 100,000,000): ๐Ÿ”น # 34 transformers by Hugging Face +0 ๐Ÿ”น # 102,OpenCV Live +0 ๐Ÿ”น# 143, Stable Diffusion by Stability AI +0

Mike A. Merrill (@mike_a_merrill) 's Twitter Profile Photo

Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse?

We're excited to introduce Terminal-Bench: an evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr…
Alex Zhang (@a1zhang) 's Twitter Profile Photo

Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? VideoGameBench evaluates VLMs on Game Boy & MS-DOS games given only raw screen input, just like how a human would play. The best model (Gemini) completes just 0.48% of the benchmark! 🧵👇

Ross Wightman (@wightmanr) 's Twitter Profile Photo

Sometimes o3 + canvas mode really goes off the rails... after the third retry to output all of the code instead of erasing everything but one fn, how about I throw a little Arabic in your attention mask refactoring?
Aaron Defazio (@aaron_defazio) 's Twitter Profile Photo

Why do gradients increase near the end of training? 
Read the paper to find out!
We also propose a simple fix to AdamW that keeps gradient norms better behaved throughout training.
arxiv.org/abs/2506.02285
Ludwig Schmidt (@lschmidt3) 's Twitter Profile Photo

Very excited to finally release our paper for OpenThoughts!

After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
Yu Su @#ICLR2025 (@ysu_nlp) 's Twitter Profile Photo

📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world!

We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature.

🧵