Zhewei Yao (@yao_zhewei)'s Twitter Profile
Zhewei Yao

@yao_zhewei

Working on AI at @snowflakedb, @MSFTDeepSpeed core-contributor, @UCBerkeley Ph.D.

ID: 1240722392902590464

Joined: 19-03-2020 19:30:51

24 Tweets

104 Followers

91 Following

Shishir Patil (@shishirpatil_)'s Twitter Profile Photo

Excited to welcome Snowflake-Arctic on the Berkeley Function Calling Leaderboard ❄️

How does Snowflake-arctic-instruct, an Apache-2.0-licensed, 480B-parameter MoE model, perform on invoking functions (aka tools)? Attached is a quick comparison with gpt-4-0125-preview (yellow).
Together AI (@togethercompute)'s Twitter Profile Photo

With over 20K downloads per month, community engagement with the RedPajama-V2 dataset has been incredible. 

The 30 trillion tokens of data have been used to train leading models like the recently released Snowflake Arctic LLM.

We've compiled a list of FAQs for using it here:
Aurick Qiao (@aurickq)'s Twitter Profile Photo

We are excited to share SwiftKV, our recent work at Snowflake AI Research! SwiftKV reduces the pre-fill compute for enterprise LLM inference by up to 2x, resulting in higher serving throughput for input-heavy workloads. 🧵
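A rough way to see where a ~2x prefill reduction can come from (a toy bookkeeping sketch, not the actual SwiftKV implementation; the layer count, the cutoff point, and the relative KV-projection cost are assumptions made up for illustration): layers past a cutoff only produce KV-cache entries from an earlier layer's hidden states instead of running their full blocks over the prompt.

```python
from typing import Optional

# Toy compute model of prefill. In a standard transformer, every layer runs its
# full block (attention + MLP) over the prompt. In a SwiftKV-style prefill,
# layers past a cutoff only run their KV projections, so most of their
# prefill compute is skipped. Costs below are relative, assumed values.
FULL_BLOCK_COST = 1.0  # cost of one full layer over the prompt
KV_PROJ_COST = 0.1     # cost of only the KV projection (assumed)

def prefill_cost(num_layers: int, kv_cutoff: Optional[int] = None) -> float:
    """Relative prefill compute; kv_cutoff=None means standard prefill."""
    if kv_cutoff is None:
        return num_layers * FULL_BLOCK_COST
    full = kv_cutoff * FULL_BLOCK_COST                  # layers run normally
    kv_only = (num_layers - kv_cutoff) * KV_PROJ_COST   # layers only produce KV
    return full + kv_only

baseline = prefill_cost(32)               # 32 full layers -> 32.0
swiftkv = prefill_cost(32, kv_cutoff=16)  # 16 + 16 * 0.1 -> 17.6
print(f"speedup ~ {baseline / swiftkv:.2f}x")
```

With the cutoff at half the layers and a small assumed KV-projection cost, the toy arithmetic lands near the up-to-2x prefill reduction the tweet cites.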
Fangyu Lei (@fangyu_lei)'s Twitter Profile Photo

Wow, congratulations 🎉! A team achieved a performance of 24.68% on Spider 2.0-Snow. Are there any better methods out there? 🧐
spider2-sql.github.io
Snowflake (@snowflakedb)'s Twitter Profile Photo

Introducing Snowflake-Llama models with SwiftKV optimizations!

SwiftKV optimizations, developed and integrated into vLLM, improve LLM inference throughput to lower cost. Snowflake-derived models, based on Meta’s Llama 3.3 70B and Llama 3.1 405B base models, are now available
Stas Bekman (@stasbekman)'s Twitter Profile Photo

Do you want ArcticTraining at SnowflakeDB to add the ability to post-train DeepSeek V3/R1 models with DPO using just a few GPU nodes? Please vote here and tell others about it: github.com/snowflakedb/Ar… ArcticTraining is an open-source, easy-to-use post-training framework

Canwen Xu (@xucanwen)'s Twitter Profile Photo

Snowflake's new Arctic Text2SQL model ❄️ sets a new standard for natural language to SQL accuracy! 🚀 Using execution-guided CoT & DPO, it outperforms top models. 💪 Dive into the details: snowflake.com/en/engineering… 📄 #Text2SQL #AI #MachineLearning 🧠
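The linked post has the real training details; the core of an execution-guided reward, though, is easy to sketch (a hypothetical minimal version against SQLite — the schema, helper name, and 0/1 reward scheme here are illustrative assumptions, not the Arctic Text2SQL setup):

```python
import sqlite3

def execution_reward(pred_sql: str, gold_sql: str, db: sqlite3.Connection) -> float:
    """Reward 1.0 iff the predicted SQL executes and returns the same rows
    (order-insensitive) as the gold SQL; 0.0 otherwise."""
    try:
        pred_rows = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # unexecutable SQL earns no reward
    gold_rows = db.execute(gold_sql).fetchall()
    return 1.0 if sorted(pred_rows) == sorted(gold_rows) else 0.0

# Tiny in-memory example database
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (name TEXT, score INT)")
db.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("b", 2)])

print(execution_reward("SELECT name FROM t WHERE score > 1",
                       "SELECT name FROM t WHERE score > 1", db))  # 1.0
print(execution_reward("SELECT nam FROM t",
                       "SELECT name FROM t", db))                  # 0.0
```

Grading by execution result rather than string match is what lets RL/DPO credit any query that is semantically right, even when it is written differently from the reference.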

Aurick Qiao (@aurickq)'s Twitter Profile Photo

Excited to share our work on Speculative Decoding at Snowflake AI Research!

🚀 4x faster LLM inference for coding agents like OpenHands from All Hands AI

💬 2.4x faster LLM inference for interactive chat

💻 Open-source via Arctic Inference as a plugin for vLLM

🧵
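For readers new to the idea, the generic accept/reject scheme behind speculative decoding can be sketched in a few lines (a toy with fixed stand-in distributions — the actual Arctic Inference speedups come from its own drafting strategies, not from this minimal version):

```python
import random

random.seed(0)
VOCAB = [0, 1, 2, 3]

# Stand-ins for a cheap draft model and an expensive target model: each returns
# a next-token distribution (fixed here purely for illustration).
def draft_probs(ctx):  return [0.7, 0.1, 0.1, 0.1]
def target_probs(ctx): return [0.5, 0.3, 0.1, 0.1]

def sample(probs):
    return random.choices(VOCAB, weights=probs, k=1)[0]

def speculative_step(ctx, k=4):
    """Draft up to k tokens cheaply, then accept/reject each against the
    target model so the result is distributed as if sampled from the target."""
    out = list(ctx)
    for _ in range(k):
        p_d = draft_probs(out)
        tok = sample(p_d)          # cheap proposal
        p_t = target_probs(out)
        if random.random() < min(1.0, p_t[tok] / p_d[tok]):
            out.append(tok)        # accepted: draft token kept verbatim
        else:
            # rejected: resample from the residual max(0, p_t - p_d), renormalized
            resid = [max(0.0, t - d) for t, d in zip(p_t, p_d)]
            norm = sum(resid)
            out.append(sample([r / norm for r in resid]))
            break                  # a rejection ends the speculation run
    return out

print(speculative_step([], k=4))
```

The speedup comes from the target model verifying all k drafted tokens in one batched forward pass instead of k sequential ones; the accept/reject rule keeps the output distribution exactly the target's.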
Zhewei Yao (@yao_zhewei)'s Twitter Profile Photo

🚀 Big news! Our collab w/ Snowflake, UCSD & UMD topped the BIRD leaderboard — beating prior SOTA by 2.8% in Text-to-SQL reasoning! RL was tough, but worth it.
📢 Best model coming soon.
#AI #LLM #TextToSQL #ReinforcementLearning #Snowflake #UCSD #UMD #NLP #BIRDLeaderboard
Aurick Qiao (@aurickq)'s Twitter Profile Photo

Very proud of our recent work at Snowflake AI Research, which spans from the systems layer to the application layer. Check out this article from VentureBeat which highlights two of our major initiatives: LLM inference performance and Text-to-SQL!

Yusuf Ozuysal (@yusufozuysal)'s Twitter Profile Photo

How do faster inference (up to 16x for embedding models!) and better Text2SQL through RL sound? A jam-packed launch from Snowflake AI research team detailing the technologies bundled in our ArcticInference framework and also diving deeper into how the model at the top of the

Łukasz Borchmann (@lukaszborchmann)'s Twitter Profile Photo

Is complex reward engineering essential for state-of-the-art text-to-SQL? We've just released Arctic-Text2SQL-R1, which answers this question with a clear "no."

See 🧵 and snowflake.com/en/engineering… 

#Text2SQL #LLM
Jeff Rasley (@jeffra45)'s Twitter Profile Photo

🧵1/ New release from Snowflake AI Research: Shift Parallelism is a new LLM inference technique built on top of vLLM, released through ArcticInference. It dramatically improves latency while preserving high throughput. Here’s what it looks like in action 👇

Stas Bekman (@stasbekman)'s Twitter Profile Photo

A deep dive into activation memory offloading:

Activation checkpointing helps save a ton of GPU memory, but those checkpoint tensors are still huge when long sequence lengths are used. Why not offload them to CPU memory? The attached memory profiler diagram shows the memory
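The bookkeeping behind that picture can be sketched as a toy (pure-Python simulation of memory residency, not the profiler run from the tweet; the per-checkpoint size and layer count are made-up placeholders): checkpoints move to a CPU store right after the forward pass and come back one at a time during the backward pass.

```python
# Simulated residency of checkpointed activations. Without offloading, every
# layer's saved input stays on the GPU until its backward step; with
# offloading, each checkpoint is parked in CPU memory and fetched just-in-time.
SEQ_LEN_BYTES = 8 * 1024**2  # assumed size of one checkpointed activation

def train_step(num_layers: int, offload: bool) -> int:
    """Return peak bytes of checkpoints resident in (simulated) GPU memory."""
    gpu, cpu, peak = {}, {}, 0
    # forward: save one checkpoint per layer
    for layer in range(num_layers):
        gpu[layer] = SEQ_LEN_BYTES
        peak = max(peak, sum(gpu.values()))
        if offload:
            cpu[layer] = gpu.pop(layer)   # move checkpoint to CPU memory
    # backward: fetch each checkpoint, recompute from it, then free it
    for layer in reversed(range(num_layers)):
        if offload:
            gpu[layer] = cpu.pop(layer)   # bring back just-in-time
        peak = max(peak, sum(gpu.values()))
        del gpu[layer]                    # recomputation done, free it
    return peak

print(train_step(48, offload=False) // 1024**2, "MiB peak")  # all 48 resident
print(train_step(48, offload=True) // 1024**2, "MiB peak")   # one at a time
```

In real PyTorch code this pattern is available via `torch.autograd.graph.save_on_cpu` (built on saved-tensors hooks); the toy just shows why peak checkpoint memory drops from "all layers" to "roughly one layer" at the cost of host-device transfers.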