DeepSeek (@deepseek_ai) 's Twitter Profile
DeepSeek

@deepseek_ai

Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism.

ID: 1714580962569588736

linkhttps://www.deepseek.com/ calendar_today18-10-2023 09:55:45

139 Tweet

975,975K Followers

0 Following

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

๐ŸŽ‰ Excited to see everyoneโ€™s enthusiasm for deploying DeepSeek-R1! Here are our recommended settings for the best experience: โ€ข No system prompt โ€ข Temperature: 0.6 โ€ข Official prompts for search & file upload: bit.ly/4hyH8np โ€ข Guidelines to mitigate model bypass

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

๐Ÿš€ Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: โ€ข Dynamic hierarchical sparse strategy โ€ข Coarse-grained token compression โ€ข Fine-grained token selection ๐Ÿ’ก With

๐Ÿš€ Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
โ€ข Dynamic hierarchical sparse strategy
โ€ข Coarse-grained token compression
โ€ข Fine-grained token selection

๐Ÿ’ก With
DeepSeek (@deepseek_ai) 's Twitter Profile Photo

๐Ÿš€ Day 0: Warming up for #OpenSourceWeek! We're a tiny team DeepSeek exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented,

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

๐Ÿš€ Day 1 of #OpenSourceWeek: FlashMLA Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production. โœ… BF16 support โœ… Paged KV cache (block size 64) โšก 3000 GB/s memory-bound & 580 TFLOPS

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

๐Ÿš€ Day 2 of #OpenSourceWeek: DeepEP Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference. โœ… Efficient and optimized all-to-all communication โœ… Both intranode and internode support with NVLink and RDMA โœ…

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

๐Ÿš€ Day 3 of #OpenSourceWeek: DeepGEMM Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference. โšก Up to 1350+ FP8 TFLOPS on Hopper GPUs โœ… No heavy dependency, as clean as a tutorial โœ… Fully Just-In-Time compiled

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

๐Ÿšจ Off-Peak Discounts Alert! Starting today, enjoy off-peak discounts on the DeepSeek API Platform from 16:30โ€“00:30 UTC daily: ๐Ÿ”น DeepSeek-V3 at 50% off ๐Ÿ”น DeepSeek-R1 at a massive 75% off Maximize your resources smarter โ€” save more during these high-value hours!

๐Ÿšจ Off-Peak Discounts Alert!

Starting today, enjoy off-peak discounts on the DeepSeek API Platform from 16:30โ€“00:30 UTC daily:

๐Ÿ”น DeepSeek-V3 at 50% off
๐Ÿ”น DeepSeek-R1 at a massive 75% off

Maximize your resources smarter โ€” save more during these high-value hours!
DeepSeek (@deepseek_ai) 's Twitter Profile Photo

๐Ÿš€ Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies โœ… DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training. ๐Ÿ”— github.com/deepseek-ai/Duโ€ฆ โœ… EPLB - an expert-parallel load balancer for V3/R1. ๐Ÿ”—

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

๐Ÿš€ Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks. โšก 6.6 TiB/s aggregate read throughput in a 180-node cluster โšก 3.66 TiB/min

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

๐Ÿš€ Day 6 of #OpenSourceWeek: One More Thing โ€“ DeepSeek-V3/R1 Inference System Overview Optimized throughput and latency via: ๐Ÿ”ง Cross-node EP-powered batch scaling ๐Ÿ”„ Computation-communication overlap โš–๏ธ Load balancing Statistics of DeepSeek's Online Service: โšก 73.7k/14.8k

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

๐Ÿš€ DeepSeek-V3-0324 is out now! ๐Ÿ”น Major boost in reasoning performance ๐Ÿ”น Stronger front-end development skills ๐Ÿ”น Smarter tool-use capabilities โœ… For non-complex reasoning tasks, we recommend using V3 โ€” just turn off โ€œDeepThinkโ€ ๐Ÿ”Œ API usage remains unchanged ๐Ÿ“œ Models are

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

๐Ÿš€ DeepSeek-R1-0528 is here! ๐Ÿ”น Improved benchmark performance ๐Ÿ”น Enhanced front-end capabilities ๐Ÿ”น Reduced hallucinations ๐Ÿ”น Supports JSON output & function calling โœ… Try it now: chat.deepseek.com ๐Ÿ”Œ No change to API usage โ€” docs here: api-docs.deepseek.com/guides/reasoniโ€ฆ ๐Ÿ”—