Akshat Shrivastava (@akshats07)'s Twitter Profile
Akshat Shrivastava

@akshats07

Co-founder & CTO @perceptroninc; ex Research Scientist @MetaAI (FAIR, AR, Assistant)

ID: 1932483559

Link: http://akshatsh.github.io · Joined: 04-10-2013 00:00:34

138 Tweets

736 Followers

314 Following

Akshat Shrivastava (@akshats07)'s Twitter Profile Photo

Physical world modeling introduces a set of challenges around designing the right interaction space for our model and building the right/scalable data strategy. Reach out to [email protected] if you're interested!

Jeremy Dohmann (@jecdohmann)'s Twitter Profile Photo

I’m very excited to announce that I’ll be joining Perceptron AI (perceptron.inc) as a researcher and founding member of the technical staff. I’ll be working with Akshat Shrivastava and Armen Aghajanyan to create the world’s first visual language foundation models specifically

Armen Aghajanyan (@armenagha)'s Twitter Profile Photo

There is an unprecedented level of cope around DeepSeek, and very little signal on X around R1. I recommend unfollowing anyone spreading conspiracy theories around R1/DeepSeek in general. (1/9)

Apoorv Khandelwal (@apoorvkh)'s Twitter Profile Photo

I started a blog! First post is everything I know about setting up (fast, reproducible, error-proof) Python project environments using the latest tools. These methods have saved me a lot of grief. Also a short guide to CUDA in the appendix :) blog.apoorvkh.com/posts/project-…

Akshat Shrivastava (@akshats07)'s Twitter Profile Photo

MoEs have been a key driver of improved performance for LLMs when memory is abundant, but what happens when we get to resource-constrained devices? Check out our latest work, led by Patrick Huber, exploring the design decisions that make MoEs optimal for on-device deployment!
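For context, here is a minimal NumPy sketch of top-k routing in a sparse MoE layer (all shapes and names are illustrative assumptions, not the paper's setup). It shows the memory tension the tweet points at: every expert's weights must stay resident, even though each token only activates a few of them.

```python
import numpy as np

# Minimal sketch of top-k routing in a sparse MoE layer (illustrative only).
rng = np.random.default_rng(0)

d_model, d_ff, n_experts, top_k = 64, 256, 8, 2
tokens = rng.standard_normal((16, d_model))          # 16 tokens in a batch

# Router: a linear layer scoring each token against each expert.
router_w = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)
# Experts: each is a small 2-layer FFN. On-device, these weights are the
# memory bottleneck: n_experts * (2 * d_model * d_ff) parameters must be
# held, even though each token uses only top_k experts.
experts = [
    (rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model),
     rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff))
    for _ in range(n_experts)
]

logits = tokens @ router_w
# Softmax over experts, then keep the top-k per token.
probs = np.exp(logits - logits.max(-1, keepdims=True))
probs /= probs.sum(-1, keepdims=True)
topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]

out = np.zeros_like(tokens)
for t in range(tokens.shape[0]):
    for e in topk_idx[t]:
        w_in, w_out = experts[e]
        h = np.maximum(tokens[t] @ w_in, 0.0)        # ReLU FFN expert
        out[t] += probs[t, e] * (h @ w_out)

print(out.shape)  # (16, 64): each token ran only top_k of the n_experts FFNs
```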

Aritra R G (@arig23498)'s Twitter Profile Photo

Bringing Efficiency to LLMs with Fine-Tuning

LayerSkip, introduced in the 2024 paper by Mostafa Elhoushi et al. (arXiv:2404.16710), is a brilliant technique to accelerate large language model (LLM) inference without compromising accuracy. By training models with layer dropout and
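As a rough illustration of the layer-dropout half of the recipe (LayerSkip also adds an early-exit loss, not shown here), a toy sketch follows; the schedule and rates are assumptions, not the values from arXiv:2404.16710. Later layers get skipped with higher probability during training, pushing earlier layers to produce representations usable for early exit.

```python
import numpy as np

# Toy sketch of depth-dependent layer dropout (LayerSkip-style).
rng = np.random.default_rng(0)

n_layers, d = 12, 32
x = rng.standard_normal(d)
layers = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]

p_max = 0.2  # maximum skip probability, reached at the last layer
for i, w in enumerate(layers):
    p_skip = p_max * i / (n_layers - 1)   # skip rate grows with depth
    if rng.random() < p_skip:
        continue                          # drop this layer for this step
    x = x + np.tanh(w @ x)                # residual block standing in for a
                                          # transformer layer

print(x[:4])
```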

Maciej Kilian (@kilian_maciej)'s Twitter Profile Photo

fun debugging journey w/ Akshat Shrivastava: be careful around FP8 w/ activation checkpointing

activation checkpointing works under the assumption that different calls of forward give similar results, which we move away from the more we quantize. when you re-quantize in activation

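A toy repro of the failure mode (everything below is an illustrative assumption, not the actual training stack): with delayed/dynamic scaling, the quantization scale is stateful, so the recompute that activation checkpointing performs in the backward pass need not reproduce the activations the original forward produced.

```python
import numpy as np

# Activation checkpointing assumes re-running forward returns the same
# activations. A stateful quantizer (running-amax "delayed scaling", as in
# FP8 recipes) breaks that: the second call quantizes with a different scale.
rng = np.random.default_rng(0)

class FakeFP8:
    """Crude stand-in for delayed-scaling FP8 quantization (int grid proxy)."""
    def __init__(self):
        self.amax = 1.0                       # running amax "history"
    def quantize(self, x):
        scale = 127.0 / self.amax
        q = np.round(x * scale) / scale
        self.amax = max(self.amax, float(np.abs(x).max()))  # state update!
        return q

q = FakeFP8()
x = rng.standard_normal(1024) * 3.0

first = q.quantize(x)    # what the original forward produced
second = q.quantize(x)   # what the checkpointed recompute would produce
print(np.abs(first - second).max())  # nonzero: recompute != original forward
```
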
Akshat Shrivastava (@akshats07)'s Twitter Profile Photo

When Maciej Kilian and I first started talking about alignment and parameterization, he introduced several of the ideas presented in this blog post. As we continue to scale foundation models (esp. multimodal), and with data-aware, scale-aware parameterization becoming more prevalent,
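The tweet doesn't spell out the scheme, but one common instance of scale-aware parameterization is muP-style width scaling; a hypothetical sketch (the 1/fan_in init and 1/width learning-rate rule here are assumptions about the general idea, not the blog post's method):

```python
import numpy as np

# muP-style sketch: init hidden weights at variance 1/fan_in and shrink the
# per-layer learning rate with width, so a base LR tuned on a small model
# transfers to wider ones.
def make_layer(fan_in, fan_out, base_lr, rng):
    w = rng.standard_normal((fan_out, fan_in)) / np.sqrt(fan_in)
    lr = base_lr / fan_in            # width-aware per-layer learning rate
    return w, lr

rng = np.random.default_rng(0)
for width in (128, 512, 2048):
    w, lr = make_layer(width, width, base_lr=1.0, rng=rng)
    print(width, round(lr, 6))       # LR shrinks as the layer gets wider
```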

Maciej Kilian (@kilian_maciej)'s Twitter Profile Photo

very cool. we found similar results in diffusion model training where EMA on model weights & const LR is more common. section 5.3 arxiv.org/pdf/2405.13218

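For readers unfamiliar with the setup referenced above: a minimal sketch of keeping an EMA of model weights under a constant learning rate (the toy quadratic objective and constants are assumptions for illustration):

```python
import numpy as np

# Constant-LR SGD with an EMA of the weights: the averaging absorbs the
# gradient noise that LR decay would otherwise have to suppress.
rng = np.random.default_rng(0)

w = rng.standard_normal(8)          # "model" weights
ema = w.copy()                      # EMA copy used for evaluation/sampling
lr, decay = 1e-2, 0.999
target = np.ones(8)

for step in range(5000):
    grad = 2 * (w - target) + 0.5 * rng.standard_normal(8)  # noisy gradient
    w -= lr * grad                  # constant learning rate throughout
    ema = decay * ema + (1 - decay) * w

print(np.abs(w - target).mean(), np.abs(ema - target).mean())
# The EMA weights typically land much closer to the optimum than the raw
# iterates, which keep bouncing at a noise floor set by the constant LR.
```
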
Charlie Hou (@hou_char)'s Twitter Profile Photo

[#ICML2025] Have you ever wanted to train LLMs on distributed private data but were blocked by model size or privacy constraints 😔? Here’s a solution: Introducing 🌸POPri (Policy Optimization for Private Data)! Poster 🗓️ today at 4:30pm PT, 📍East Exhibition Hall A-B E-1006

Jeremy Dohmann (@jecdohmann)'s Twitter Profile Photo

I'm excited to be at ICML this week :-) Perceptron AI is co-sponsoring the Assessing World Models workshop this Friday. Come see some great talks from Jacob Andreas, Naomi Saphra, and more; topics include mechanistic interpretability, intuitive physics, LLMs for generating scientific