Zhewei Yao (@yao_zhewei)'s Twitter Profile
Zhewei Yao

@yao_zhewei

Working on AI at @snowflakedb, @MSFTDeepSpeed core-contributor, @UCBerkeley Ph.D.

ID: 1240722392902590464

Joined: 19-03-2020 19:30:51

24 Tweets

104 Followers

91 Following

Shishir Patil (@shishirpatil_)'s Twitter Profile Photo

Excited to welcome Snowflake-Arctic on the Berkeley Function Calling Leaderboard ❄️

How does Snowflake-arctic-instruct, an Apache-2.0-licensed, 480B-parameter MoE model, perform on invoking functions (aka tools)? Attached is a quick comparison with gpt-4-0125-preview (yellow).
Together AI (@togethercompute)'s Twitter Profile Photo

With over 20K downloads per month, community engagement with the RedPajama-V2 dataset has been incredible. 

The 30 trillion tokens of data have been used to train leading models like the recently released Snowflake Arctic LLM.

We've compiled a list of FAQs for using it here:
Aurick Qiao (@aurickq)'s Twitter Profile Photo

We are excited to share SwiftKV, our recent work at Snowflake AI Research! SwiftKV reduces the pre-fill compute for enterprise LLM inference by up to 2x, resulting in higher serving throughput for input-heavy workloads. 🧵
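A rough way to see where a ~2x prefill reduction can come from (a toy bookkeeping sketch, not the actual SwiftKV implementation; the layer count, the cutoff point, and the relative KV-projection cost are assumptions made up for illustration): layers past a cutoff only produce KV-cache entries from an earlier layer's hidden states instead of running their full blocks over the prompt.

```python
from typing import Optional

# Toy compute model of prefill. In a standard transformer, every layer runs its
# full block (attention + MLP) over the prompt. In a SwiftKV-style prefill,
# layers past a cutoff only run their KV projections, so most of their
# prefill compute is skipped. Costs below are relative, assumed values.
FULL_BLOCK_COST = 1.0  # cost of one full layer over the prompt
KV_PROJ_COST = 0.1     # cost of only the KV projection (assumed)

def prefill_cost(num_layers: int, kv_cutoff: Optional[int] = None) -> float:
    """Relative prefill compute; kv_cutoff=None means standard prefill."""
    if kv_cutoff is None:
        return num_layers * FULL_BLOCK_COST
    full = kv_cutoff * FULL_BLOCK_COST                  # layers run normally
    kv_only = (num_layers - kv_cutoff) * KV_PROJ_COST   # layers only produce KV
    return full + kv_only

baseline = prefill_cost(32)               # 32 full layers -> 32.0
swiftkv = prefill_cost(32, kv_cutoff=16)  # 16 + 16 * 0.1 -> 17.6
print(f"speedup ~ {baseline / swiftkv:.2f}x")
```

With the cutoff at half the layers and a small assumed KV-projection cost, the toy arithmetic lands near the up-to-2x prefill reduction the tweet cites.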
Fangyu Lei (@fangyu_lei)'s Twitter Profile Photo

Wow, congratulations 🎉! A team achieved a performance of 24.68% on Spider 2.0-Snow. Are there any better methods out there? 🧐
spider2-sql.github.io
Snowflake (@snowflakedb)'s Twitter Profile Photo

Introducing Snowflake-Llama models with SwiftKV optimizations!

SwiftKV optimizations, developed and integrated into vLLM, improve LLM inference throughput to lower cost. Snowflake-derived models, based on Meta’s Llama 3.3 70B and Llama 3.1 405B base models, are now available
Stas Bekman (@stasbekman)'s Twitter Profile Photo

Do you want ArcticTraining at SnowflakeDB to add the ability to post-train DeepSeek V3/R1 models with DPO using just a few GPU nodes? Please vote here and tell others about it: github.com/snowflakedb/Ar… ArcticTraining is an open-source, easy-to-use post-training framework

Canwen Xu (@xucanwen)'s Twitter Profile Photo

Snowflake's new Arctic Text2SQL model ❄️ sets a new standard for natural language to SQL accuracy! 🚀 Using execution-guided CoT & DPO, it outperforms top models. 💪 Dive into the details: snowflake.com/en/engineering… 📄 #Text2SQL #AI #MachineLearning 🧠
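The linked post has the real training details; the core of an execution-guided reward, though, is easy to sketch (a hypothetical minimal version against SQLite — the schema, helper name, and 0/1 reward scheme here are illustrative assumptions, not the Arctic Text2SQL setup):

```python
import sqlite3

def execution_reward(pred_sql: str, gold_sql: str, db: sqlite3.Connection) -> float:
    """Reward 1.0 iff the predicted SQL executes and returns the same rows
    (order-insensitive) as the gold SQL; 0.0 otherwise."""
    try:
        pred_rows = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # unexecutable SQL earns no reward
    gold_rows = db.execute(gold_sql).fetchall()
    return 1.0 if sorted(pred_rows) == sorted(gold_rows) else 0.0

# Tiny in-memory example database
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (name TEXT, score INT)")
db.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("b", 2)])

print(execution_reward("SELECT name FROM t WHERE score > 1",
                       "SELECT name FROM t WHERE score > 1", db))  # 1.0
print(execution_reward("SELECT nam FROM t",
                       "SELECT name FROM t", db))                  # 0.0
```

Grading by execution result rather than string match is what lets RL/DPO credit any query that is semantically right, even when it is written differently from the reference.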

Aurick Qiao (@aurickq)'s Twitter Profile Photo

Excited to share our work on Speculative Decoding at Snowflake AI Research!

🚀 4x faster LLM inference for coding agents like OpenHands from All Hands AI

💬 2.4x faster LLM inference for interactive chat

💻 Open-source via Arctic Inference as a plugin for vLLM

🧵
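For readers new to the idea, the generic accept/reject scheme behind speculative decoding can be sketched in a few lines (a toy with fixed stand-in distributions — the actual Arctic Inference speedups come from its own drafting strategies, not from this minimal version):

```python
import random

random.seed(0)
VOCAB = [0, 1, 2, 3]

# Stand-ins for a cheap draft model and an expensive target model: each returns
# a next-token distribution (fixed here purely for illustration).
def draft_probs(ctx):  return [0.7, 0.1, 0.1, 0.1]
def target_probs(ctx): return [0.5, 0.3, 0.1, 0.1]

def sample(probs):
    return random.choices(VOCAB, weights=probs, k=1)[0]

def speculative_step(ctx, k=4):
    """Draft up to k tokens cheaply, then accept/reject each against the
    target model so the result is distributed as if sampled from the target."""
    out = list(ctx)
    for _ in range(k):
        p_d = draft_probs(out)
        tok = sample(p_d)          # cheap proposal
        p_t = target_probs(out)
        if random.random() < min(1.0, p_t[tok] / p_d[tok]):
            out.append(tok)        # accepted: draft token kept verbatim
        else:
            # rejected: resample from the residual max(0, p_t - p_d), renormalized
            resid = [max(0.0, t - d) for t, d in zip(p_t, p_d)]
            norm = sum(resid)
            out.append(sample([r / norm for r in resid]))
            break                  # a rejection ends the speculation run
    return out

print(speculative_step([], k=4))
```

The speedup comes from the target model verifying all k drafted tokens in one batched forward pass instead of k sequential ones; the accept/reject rule keeps the output distribution exactly the target's.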
Zhewei Yao (@yao_zhewei)'s Twitter Profile Photo

🚀 Big news! Our collab w/ Snowflake, UCSD & UMD topped the BIRD leaderboard — beating prior SOTA by 2.8% in Text-to-SQL reasoning! RL was tough, but worth it.
📢 Best model coming soon.
#AI #LLM #TextToSQL #ReinforcementLearning #Snowflake #UCSD #UMD #NLP #BIRDLeaderboard
Aurick Qiao (@aurickq)'s Twitter Profile Photo

Very proud of our recent work at Snowflake AI Research, which spans from the systems layer to the application layer. Check out this article from VentureBeat which highlights two of our major initiatives: LLM inference performance and Text-to-SQL!

Yusuf Ozuysal (@yusufozuysal)'s Twitter Profile Photo

How do faster inference (up to 16x for embedding models!) and better Text2SQL through RL sound? A jam-packed launch from Snowflake AI research team detailing the technologies bundled in our ArcticInference framework and also diving deeper into how the model at the top of the

Łukasz Borchmann (@lukaszborchmann)'s Twitter Profile Photo

Is complex reward engineering essential for state-of-the-art text-to-SQL? We've just released Arctic-Text2SQL-R1, which answers this question with a clear "no."

See 🧵 and snowflake.com/en/engineering… 

#Text2SQL #LLM
Jeff Rasley (@jeffra45)'s Twitter Profile Photo

🧵1/ New release from Snowflake AI Research: Shift Parallelism is a new LLM inference technique built on top of vLLM, released through ArcticInference. It dramatically improves latency while preserving high throughput. Here’s what it looks like in action 👇

Stas Bekman (@stasbekman)'s Twitter Profile Photo

A deep dive into activation memory offloading:

Activation checkpointing helps save a ton of GPU memory, but those checkpoint tensors are still huge when long sequence lengths are used. Why not offload them to CPU memory? The attached memory profiler diagram shows the memory
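The bookkeeping behind that picture can be sketched as a toy (pure-Python simulation of memory residency, not the profiler run from the tweet; the per-checkpoint size and layer count are made-up placeholders): checkpoints move to a CPU store right after the forward pass and come back one at a time during the backward pass.

```python
# Simulated residency of checkpointed activations. Without offloading, every
# layer's saved input stays on the GPU until its backward step; with
# offloading, each checkpoint is parked in CPU memory and fetched just-in-time.
SEQ_LEN_BYTES = 8 * 1024**2  # assumed size of one checkpointed activation

def train_step(num_layers: int, offload: bool) -> int:
    """Return peak bytes of checkpoints resident in (simulated) GPU memory."""
    gpu, cpu, peak = {}, {}, 0
    # forward: save one checkpoint per layer
    for layer in range(num_layers):
        gpu[layer] = SEQ_LEN_BYTES
        peak = max(peak, sum(gpu.values()))
        if offload:
            cpu[layer] = gpu.pop(layer)   # move checkpoint to CPU memory
    # backward: fetch each checkpoint, recompute from it, then free it
    for layer in reversed(range(num_layers)):
        if offload:
            gpu[layer] = cpu.pop(layer)   # bring back just-in-time
        peak = max(peak, sum(gpu.values()))
        del gpu[layer]                    # recomputation done, free it
    return peak

print(train_step(48, offload=False) // 1024**2, "MiB peak")  # all 48 resident
print(train_step(48, offload=True) // 1024**2, "MiB peak")   # one at a time
```

In real PyTorch code this pattern is available via `torch.autograd.graph.save_on_cpu` (built on saved-tensors hooks); the toy just shows why peak checkpoint memory drops from "all layers" to "roughly one layer" at the cost of host-device transfers.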