Ashish Vaswani (@ashvaswani)'s Twitter Profile
Ashish Vaswani

@ashvaswani

ID: 874887507174981633

Joined: 14-06-2017 07:13:31

80 Tweets

22.22K Followers

1.1K Following

Ashish Vaswani (@ashvaswani):

We are hiring our first product engineers. Come join an incredible team advancing human-AI interaction! Please check our jobs page for all the open positions: jobs.ashbyhq.com/essentialai

Essential AI (@essential_ai):

🗞️ We just launched our new landing page and dropped a fresh blog post on how LLMs learn to reflect and revise their thinking: In order to advance reasoning, it's vital to measure and understand its constituents, such as reflection. More to come - essential.ai

Ashish Vaswani (@ashvaswani):

Please check out our thorough study on the advantages of Muon. Second-order optimization is a promising path to more efficient LLM pretraining.

Yash Vanjani (@yashvanjani):

Super excited to share how we achieved significant performance gains in the Muon optimizer for pre-training LLMs at Essential AI! This was a great team effort. Special thanks to Philip Monk, whose core ideas were instrumental in driving this improvement, and to Ashish Vaswani

Daniel Campos (@spacemanidol):

I am trying out this Thought-Boi Thing. Give it a read. The Hidden Cost of Augmentation: Every Tool You Use Changes You. open.substack.com/pub/spacemanid…

Aurko Roy (@happylemon56775):

Excited to share what I worked on during my time at Meta.

- We introduce a Triton-accelerated Transformer with *2-simplicial attention*—a tri-linear generalization of dot-product attention

- We show how to adapt RoPE to tri-linear forms

- We show 2-simplicial attention scales

clem 🤗 (@clementdelangue):

We just released the best 3B model, 100% open-source, open dataset, architecture details, exact data mixtures and full training recipe including pre-training, mid-training, post-training, and synthetic data generation for everyone to train their own. Let's go open-source AI!