DeepSeek (@deepseek_ai) 's Twitter Profile
DeepSeek

@deepseek_ai

Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism.

ID: 1714580962569588736

linkhttps://www.deepseek.com/ calendar_today18-10-2023 09:55:45

139 Tweet

975,975K Followers

0 Following

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

πŸŽ‰ Excited to see everyone’s enthusiasm for deploying DeepSeek-R1! Here are our recommended settings for the best experience: β€’ No system prompt β€’ Temperature: 0.6 β€’ Official prompts for search & file upload: bit.ly/4hyH8np β€’ Guidelines to mitigate model bypass

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

πŸš€ Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: β€’ Dynamic hierarchical sparse strategy β€’ Coarse-grained token compression β€’ Fine-grained token selection πŸ’‘ With

πŸš€ Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
β€’ Dynamic hierarchical sparse strategy
β€’ Coarse-grained token compression
β€’ Fine-grained token selection

πŸ’‘ With
DeepSeek (@deepseek_ai) 's Twitter Profile Photo

πŸš€ Day 0: Warming up for #OpenSourceWeek! We're a tiny team DeepSeek exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented,

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

πŸš€ Day 1 of #OpenSourceWeek: FlashMLA Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production. βœ… BF16 support βœ… Paged KV cache (block size 64) ⚑ 3000 GB/s memory-bound & 580 TFLOPS

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

πŸš€ Day 2 of #OpenSourceWeek: DeepEP Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference. βœ… Efficient and optimized all-to-all communication βœ… Both intranode and internode support with NVLink and RDMA βœ…

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

πŸš€ Day 3 of #OpenSourceWeek: DeepGEMM Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference. ⚑ Up to 1350+ FP8 TFLOPS on Hopper GPUs βœ… No heavy dependency, as clean as a tutorial βœ… Fully Just-In-Time compiled

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚨 Off-Peak Discounts Alert! Starting today, enjoy off-peak discounts on the DeepSeek API Platform from 16:30–00:30 UTC daily: πŸ”Ή DeepSeek-V3 at 50% off πŸ”Ή DeepSeek-R1 at a massive 75% off Maximize your resources smarter β€” save more during these high-value hours!

🚨 Off-Peak Discounts Alert!

Starting today, enjoy off-peak discounts on the DeepSeek API Platform from 16:30–00:30 UTC daily:

πŸ”Ή DeepSeek-V3 at 50% off
πŸ”Ή DeepSeek-R1 at a massive 75% off

Maximize your resources smarter β€” save more during these high-value hours!
DeepSeek (@deepseek_ai) 's Twitter Profile Photo

πŸš€ Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies βœ… DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training. πŸ”— github.com/deepseek-ai/Du… βœ… EPLB - an expert-parallel load balancer for V3/R1. πŸ”—

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

πŸš€ Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks. ⚑ 6.6 TiB/s aggregate read throughput in a 180-node cluster ⚑ 3.66 TiB/min

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

πŸš€ Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview Optimized throughput and latency via: πŸ”§ Cross-node EP-powered batch scaling πŸ”„ Computation-communication overlap βš–οΈ Load balancing Statistics of DeepSeek's Online Service: ⚑ 73.7k/14.8k

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

πŸš€ DeepSeek-V3-0324 is out now! πŸ”Ή Major boost in reasoning performance πŸ”Ή Stronger front-end development skills πŸ”Ή Smarter tool-use capabilities βœ… For non-complex reasoning tasks, we recommend using V3 β€” just turn off β€œDeepThink” πŸ”Œ API usage remains unchanged πŸ“œ Models are

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

πŸš€ DeepSeek-R1-0528 is here! πŸ”Ή Improved benchmark performance πŸ”Ή Enhanced front-end capabilities πŸ”Ή Reduced hallucinations πŸ”Ή Supports JSON output & function calling βœ… Try it now: chat.deepseek.com πŸ”Œ No change to API usage β€” docs here: api-docs.deepseek.com/guides/reasoni… πŸ”—