DeepSpeed (@deepspeedai)'s Twitter Profile
DeepSpeed

@deepspeedai

Official account for DeepSpeed, a library that enables unprecedented scale and speed for deep learning training + inference.

Japanese: @DeepSpeedAI_JP

ID: 1262854060320755715

Website: https://www.deepspeed.ai/
Joined: 19-05-2020 21:14:20

81 Tweets

3.3K Followers

88 Following

Lewis Tunstall (@_lewtun):

🪁 Today we're releasing the code to train your very own Zephyr models!

We've worked hard to make this as accessible as possible, so you can run:

🏋️‍♂️ Full fine-tuning with @MSFTDeepSpeed ZeRO-3 on A100s
🐭 LoRA or QLoRA fine-tuning on consumer GPUs

Code: github.com/huggingface/al…
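
As a rough illustration of the two paths above, here is a minimal sketch (not the exact alignment-handbook recipe): a ZeRO-3 DeepSpeed config handed to a Hugging Face Trainer for full fine-tuning, plus a PEFT LoRA wrap for consumer GPUs. The model name and LoRA hyperparameters are illustrative assumptions.

    # Minimal sketch, not the alignment-handbook's exact recipe.
    from transformers import AutoModelForCausalLM, TrainingArguments
    from peft import LoraConfig, get_peft_model

    # ZeRO-3: shard parameters, gradients, and optimizer states across GPUs.
    ds_config = {
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,
            "overlap_comm": True,
            "stage3_gather_16bit_weights_on_model_save": True,
        },
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }
    args = TrainingArguments(output_dir="zephyr-sft", bf16=True, deepspeed=ds_config)

    # LoRA path for consumer GPUs: train small adapter matrices instead of all weights.
    model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
    model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
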
Stas Bekman (@stasbekman):

Hear, hear, AMD MI300Xs have started to emerge much sooner than expected. Here is a two-part benchmark report on BLOOM-176B inference using @MSFTDeepSpeed optimized for the AMD MI300X. 1. evp.cloud/post/diving-de… 2. evp.cloud/post/diving-de… This was published in response

Simo Ryu (@cloneofsimo):

So you've had your fun with Andrej Karpathy's minGPT. Now it's time to scale: introducing min-max-gpt, a really small codebase that scales with the help of @MSFTDeepSpeed. No Hugging Face Accelerate, no Transformers. Just DeepSpeed + torch: maximum hackability

github.com/cloneofsimo/mi…
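
The "just deepspeed + torch" claim boils down to a loop like the following. This is an illustrative sketch, not min-max-gpt's actual code; the tiny Linear model and config values are placeholders.

    import torch
    import deepspeed

    model = torch.nn.Linear(1024, 1024)   # stand-in for a GPT model
    ds_config = {
        "train_micro_batch_size_per_gpu": 4,
        "optimizer": {"type": "AdamW", "params": {"lr": 3e-4}},
        "zero_optimization": {"stage": 2},
    }

    # deepspeed.initialize wraps the model in an engine that owns the optimizer,
    # gradient handling, and ZeRO partitioning.
    engine, _, _, _ = deepspeed.initialize(model=model,
                                           model_parameters=model.parameters(),
                                           config=ds_config)

    for step in range(10):
        x = torch.randn(4, 1024, device=engine.device)
        loss = engine(x).pow(2).mean()
        engine.backward(loss)   # replaces loss.backward()
        engine.step()           # replaces optimizer.step()

Launched with the deepspeed CLI (e.g. deepspeed train.py), the same loop runs unchanged from one GPU to many.
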
DeepSpeed (@deepspeedai):

#DeepSpeed joins forces with the University of Sydney to unveil an exciting technique, #FP6. Just supply your FP16 models, and we deliver:
🚀 1.5x performance boost for #LLMs serving on #GPUs
🚀 Innovative (4+2)-bit system design
🚀 Quality-preserving quantization 
link: github.com/microsoft/Deep…
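
The "(4+2)-bit system design" refers to storing each 6-bit weight as a 4-bit part plus a 2-bit part so both streams stay aligned for fast GPU memory access. The snippet below is only a conceptual illustration of that split, not DeepSpeed's FP6 kernels.

    import numpy as np

    def split_6bit(codes):
        # codes: uint8 array of 6-bit values (0..63)
        hi4 = (codes >> 2) & 0x0F   # upper 4 bits
        lo2 = codes & 0x03          # lower 2 bits
        return hi4, lo2

    def merge_6bit(hi4, lo2):
        return (hi4 << 2) | lo2

    codes = np.array([0b101101, 0b010011], dtype=np.uint8)
    hi4, lo2 = split_6bit(codes)
    assert np.array_equal(merge_6bit(hi4, lo2), codes)
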
DeepSpeed (@deepspeedai):

Introducing Universal Checkpointing for boosting training efficiency.
- Change parallelism (PP, SP, TP, ZeRO-DP) or GPU count mid-stream
- Improve resilience by scaling down to healthy nodes💪
- Increase throughput by scaling up to elastic nodes🚀

Blog: rb.gy/aup3pn
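
The typical flow, per the DeepSpeed tutorial, is to convert a saved ZeRO checkpoint into the universal format and then resume under a new parallelism layout. The sketch below hedges on exact script flags and config keys; treat them as assumptions and check the blog for the authoritative names.

    # 1) Convert a regular ZeRO checkpoint into the universal format, e.g.:
    #      python -m deepspeed.checkpoint.ds_to_universal \
    #          --input_folder  ckpt/global_step1000 \
    #          --output_folder ckpt/global_step1000_universal
    #
    # 2) Resume on a different GPU count or parallelism layout by telling the
    #    engine to load the universal checkpoint (assumed config key):
    ds_config = {
        "checkpoint": {"load_universal": True},
        # ... the rest of the usual training config ...
    }
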
DeepSpeed (@deepspeedai):

Introducing DeepNVMe, a suite of optimizations for fast and efficient I/O operations in DL applications. 
- POSIX-style APIs
- Direct HBM/NVMe xfers via NVIDIA GDS
- Cheap Inference scaling via NVMe-Offload
 
Blog: shorturl.at/l7Oue

Microsoft Azure
NVIDIA Data Center
#FMS24
#GPUDirect
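
A hedged sketch of what the POSIX-style API looks like in practice: build the async I/O handle, kick off a read into a pinned buffer, overlap other work, then wait. Constructor arguments and method names follow the DeepNVMe tutorial but should be treated as assumptions here.

    import os, torch
    from deepspeed.ops.op_builder import AsyncIOBuilder

    aio = AsyncIOBuilder().load()
    # (block_size, queue_depth, single_submit, overlap_events, num_threads)
    handle = aio.aio_handle(1024 * 1024, 32, False, False, 1)

    path = "weights.bin"
    buf = torch.empty(os.path.getsize(path), dtype=torch.uint8).pin_memory()

    handle.async_pread(buf, path)   # start the read without blocking
    # ... overlap compute or other I/O here ...
    handle.wait()                   # block until the read completes
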
Comet (@cometml):

💡Check out Comet's latest integration with DeepSpeed, a deep learning optimization library! 🤝 With the @MSFTDeepSpeed + Comet integration, training metrics generated by DeepSpeed are logged automatically. Try the quick-start Colab to get started: colab.research.google.com/github/comet-m…
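
A hedged sketch of what enabling the integration can look like from the DeepSpeed side: a comet section in the monitoring part of the config. The key names here are assumptions; the Colab above is the authoritative quick-start.

    ds_config = {
        "train_micro_batch_size_per_gpu": 4,
        "comet": {
            "enabled": True,
            "project": "deepspeed-demo",   # hypothetical project name
        },
    }
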

DeepSpeed (@deepspeedai):

Announcing that DeepSpeed now runs natively on Windows. This unlocks DeepSpeed optimizations for Windows users and empowers more people and organizations with AI innovations.
- HF Inference & Finetuning
- LoRA
- CPU Offload

Blog: shorturl.at/a7TF8
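
For the HF inference path, the same few lines that work elsewhere now run natively on Windows. A minimal sketch, with the model choice purely illustrative:

    import torch
    import deepspeed
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    # Wrap the model with DeepSpeed's inference engine, optionally swapping in
    # optimized kernels for the original modules.
    engine = deepspeed.init_inference(model, dtype=torch.float16,
                                      replace_with_kernel_inject=True)

    inputs = tok("DeepSpeed on Windows:", return_tensors="pt").to(engine.module.device)
    print(tok.decode(engine.module.generate(**inputs, max_new_tokens=20)[0]))
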
DeepSpeed (@deepspeedai):

Introducing Domino: a novel zero-communication-cost tensor parallelism (TP) training engine for both single-node and multi-node settings.

- Near-complete communication hiding
- Novel multi-node scalable TP solution 

Blog: github.com/microsoft/Deep…
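
The core idea behind the communication hiding is to launch the tensor-parallel collectives asynchronously and keep computing independent work until the result is actually needed. The snippet below illustrates that pattern with plain PyTorch collectives; it is not Domino's implementation.

    import torch
    import torch.distributed as dist

    def overlapped_step(partial_out, next_input, weight):
        work = dist.all_reduce(partial_out, async_op=True)  # start the TP reduction
        hidden = next_input @ weight                        # independent compute proceeds
        work.wait()                                         # synchronize only when needed
        return partial_out, hidden
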
DeepSpeed (@deepspeedai):

🚀Introducing Ulysses-Offload🚀

- Unlock the power of long-context LLM training and finetuning with our latest system optimizations
- Train LLaMA3-8B on a 2M-token context using 4xA100-80GB
- Achieve over 55% MFU

Blog: shorturl.at/Spx6Y
Tutorial: shorturl.at/bAWu5
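
For reference, MFU (model FLOPs utilization) is simply achieved model FLOP/s divided by the hardware's peak FLOP/s. A back-of-the-envelope check with illustrative numbers (A100-80GB dense bf16 peak is about 312 TFLOP/s):

    achieved_tflops_per_gpu = 175        # hypothetical measured value
    peak_bf16_tflops_a100   = 312        # A100-80GB dense bf16 peak
    mfu = achieved_tflops_per_gpu / peak_bf16_tflops_a100
    print(f"MFU = {mfu:.1%}")            # -> 56.1%, i.e. "over 55% MFU"
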
LF AI & Data Foundation (@lfaidatafdn):

🚀 Excited to introduce DeepSpeed, a deep learning optimization library from Microsoft! It simplifies distributed training and inference, making AI scaling more efficient and cost-effective.

Learn more 👉 hubs.la/Q0351DJC0

#DeepSpeed #AI #OpenSource #LFAIData
xr-5 🐀 (@xariusrke):

1/4⚡️nanotron now supports DoMiNo with intra-layer communication overlapping, achieving 60% communication hiding for tensor parallelism (TP) in both the forward and backward passes while maintaining the same training loss.
DeepSpeed (@deepspeedai):

AutoTP + ZeRO Training for HF Models
- Enhance HF post-training with larger models, batches, & contexts
- 4x faster LLAMA3 fine-tuning with TP=2 vs TP=1
- No code changes needed

Blog: tinyurl.com/5n8nfs2w
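
"No code changes needed" means the tensor-parallel degree is set in the DeepSpeed config rather than in the training script. A hedged sketch of such a config; the tensor_parallel / autotp_size keys follow the blog's description and are assumptions here.

    ds_config = {
        "bf16": {"enabled": True},
        "zero_optimization": {"stage": 1},       # ZeRO data parallelism
        "tensor_parallel": {"autotp_size": 2},   # assumed key: TP degree of 2
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }
    # Reuse the existing HF fine-tuning script and pass ds_config through
    # TrainingArguments(deepspeed=ds_config) as usual.
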
DeepSpeed (@deepspeedai):

Introducing 🚀DeepCompile🚀: compiler-based distributed training optimizations.
- Automatic parallelization & profile-guided optimizations
- Enable ZeRO1, ZeRO3, Offloading, etc. via compiler passes 
- 1.2X-7X speedups over manual ZeRO1/ZeRO3/Offloading

tinyurl.com/8cys28xk
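
A hedged sketch of how DeepCompile is switched on: a compile section in the DeepSpeed config, with the compiler passes applied to the initialized engine. The key names and the compile() call follow the blog's description and should be treated as assumptions.

    import deepspeed

    ds_config = {
        "zero_optimization": {"stage": 3},
        "compile": {"deepcompile": True},   # assumed key: enable compiler passes
    }

    # engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config,
    #                                        model_parameters=model.parameters())
    # engine.compile()   # apply profile-guided optimization passes before training
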
DeepSpeed (@deepspeedai):

Come hear all the exciting DeepSpeed updates at the upcoming PyTorch Day France 2025: "DeepSpeed – Efficient Training Scalability for Deep Learning Models" - sched.co/21nyy

PyTorch (@pytorch):

PyTorch Foundation has expanded into an umbrella foundation. vLLM and DeepSpeed have been accepted as hosted projects, advancing community-driven AI across the full lifecycle.

Supporting quotes provided by the following members: AMD, Arm, Amazon Web Services, Google, Huawei,
Stas Bekman (@stasbekman):

My first project at Snowflake AI Research is complete!

I present to you Arctic Long Sequence Training (ALST) 

Paper: arxiv.org/abs/2506.13996
Blog: snowflake.com/en/engineering…

ALST is a set of modular, open-source techniques that enable training on sequences up to 15 million
DeepSpeed (@deepspeedai):

Kudos to Xinyu for giving an excellent presentation of the DeepSpeed Universal Checkpointing (UCP) paper at USENIX ATC 2025.