TNG Technology Consulting GmbH (@tngtech)'s Twitter Profile
TNG Technology Consulting GmbH

@tngtech

TNG, aka "The Nerd Group", is a consulting partnership focused on high-end information technology, particularly AI. 916 employees, 99.9% academics, ~55% PhDs.

ID: 224374031

Link: http://www.tngtech.com/en · Joined: 08-12-2010 20:58:13

1.1K Tweets

816 Followers

80 Following

meng shao (@shao__meng)'s Twitter Profile Photo

DeepSeek-R1T-Chimera combines the intelligence of DeepSeek R1 with the token efficiency of V3, developed by the TNG Technology Consulting GmbH team.

Key features
- Scale: 685B parameters, an ultra-large-scale model
- Type: Text Generation Transformers
- Architecture: based on the DeepSeek-MoE Transformer architecture

Technical highlights
- A model-merging project combining DeepSeek-R1 and DeepSeek-V3 (0324)
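To make the model-merging idea concrete, here is a deliberately naive Python sketch that linearly interpolates two checkpoints sharing the same architecture. This is not TNG's actual Chimera construction method, only an illustration of same-architecture weight merging; the file names and the 50/50 blend factor are assumptions.

# Hypothetical sketch of naive weight interpolation between two checkpoints that
# share one architecture. NOT the Chimera recipe; paths and alpha are placeholders.
import torch
from safetensors.torch import load_file, save_file

def merge_checkpoints(path_a: str, path_b: str, out_path: str, alpha: float = 0.5) -> None:
    """Blend two state dicts with identical tensor names: out = alpha*A + (1-alpha)*B."""
    state_a = load_file(path_a)
    state_b = load_file(path_b)
    assert state_a.keys() == state_b.keys(), "checkpoints must share tensor names"
    merged = {
        name: alpha * state_a[name].float() + (1.0 - alpha) * state_b[name].float()
        for name in state_a
    }
    save_file(merged, out_path)

# Example call (file names are placeholders):
# merge_checkpoints("r1_shard.safetensors", "v3_shard.safetensors", "merged_shard.safetensors")
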
TNG Technology Consulting GmbH (@tngtech)'s Twitter Profile Photo

Oh man, lucky day 😅

R1T-Chimera is ranked the #2 trending model on OpenRouter

Sure, "trending" is a temporary attention metric, not reflecting total usage.

And the world spins fast: everybody talks about Qwen3 now. Still a nice screenshot with Google AI, Microsoft,
TNG Technology Consulting GmbH (@tngtech)'s Twitter Profile Photo

DeepSeek uploaded a new model on huggingface: DeepSeek-Prover-V2

It seems the architecture is identical to the V3 and R1 models, because:

model_config.py shows no difference, and the safetensors index files are the same.

One minor diff is a new experimental feature in
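A comparison like the one described in the tweet could be scripted roughly as follows. This is a hypothetical sketch, not the check TNG actually ran; the repo IDs and file names are assumptions about how such repos are typically laid out on Hugging Face.

# Hypothetical sketch: download the config and safetensors index of two repos and
# compare them. Repo IDs and file names are assumptions, not taken from the tweet.
import json
from huggingface_hub import hf_hub_download

def load_repo_json(repo_id: str, filename: str) -> dict:
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(path) as f:
        return json.load(f)

def compare_repos(repo_a: str, repo_b: str) -> None:
    for filename in ("config.json", "model.safetensors.index.json"):
        a = load_repo_json(repo_a, filename)
        b = load_repo_json(repo_b, filename)
        print(f"{filename}: {'identical' if a == b else 'differs'}")

# compare_repos("deepseek-ai/DeepSeek-V3", "deepseek-ai/DeepSeek-Prover-V2-671B")
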
Andrej Karpathy (@karpathy)'s Twitter Profile Photo

There's a new paper circulating looking in detail at the LMArena leaderboard: "The Leaderboard Illusion" arxiv.org/abs/2504.20879

I first became a bit suspicious when, at one point a while back, a Gemini model scored #1 way above the second best, but when I tried to switch for a few

TNG Technology Consulting GmbH (@tngtech)'s Twitter Profile Photo

Eight new AMD MI325X GPUs joined our compute cluster of NVIDIA H100s.

The new Supermicro server is an AI machine with a spectacular 2 terabytes of total GPU memory in one ~10 kW node.

ROCm worked right away with full VRAM and GPU utilization, allowing new types of
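A quick way to confirm that a ROCm build of PyTorch sees the new cards and their full VRAM might look like the sketch below. It relies on the fact that PyTorch's ROCm builds expose AMD GPUs through the torch.cuda API (backed by HIP); the exact check TNG performed is not stated in the tweet.

# Hypothetical sketch: sanity-check that a ROCm build of PyTorch sees all GPUs and
# their full memory. Works unchanged on NVIDIA nodes, since the API is shared.
import torch

def report_gpus() -> None:
    if not torch.cuda.is_available():
        print("no GPUs visible")
        return
    print("HIP version:", torch.version.hip)  # set on ROCm builds, None on CUDA builds
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.0f} GiB")

report_gpus()
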
TNG Technology Consulting GmbH (@tngtech)'s Twitter Profile Photo

Hello #USA 🇺🇸 
TNG Technology Consulting USA Inc. is now incorporated in #Austin, #Texas. Thanks to our existing clients in #SiliconValley and #NewYork. We look forward to meeting more interesting people, fast companies and hard #IT problems to solve.
TNG Technology Consulting GmbH (@tngtech)'s Twitter Profile Photo

More evidence for the effectiveness of the Chimera construction method:

Taking DeepSeek's R1-0528 release, we started benchmarking new Chimera variants on AIME-24 and SimpleQA.

R1-0528 significantly improves AIME performance from 79.8 to 91.4 while doubling the amount of output