Akshat Shrivastava (@akshats07)'s Twitter Profile
Akshat Shrivastava

@akshats07

Co-founder & CTO @perceptroninc; ex Research Scientist @MetaAI (FAIR, AR, Assistant)

ID: 1932483559

Link: http://akshatsh.github.io · Joined: 04-10-2013 00:00:34

138 Tweets

736 Followers

314 Following

Akshat Shrivastava (@akshats07)'s Twitter Profile Photo

Physical world modeling introduces a set of challenges around designing the right interaction space for our model and building the right/scalable data strategy. Reach out to [email protected] if you're interested!

Jeremy Dohmann (@jecdohmann)'s Twitter Profile Photo

I’m very excited to announce that I’ll be joining Perceptron AI (perceptron.inc) as a researcher and founding member of the technical staff. I’ll be working with Akshat Shrivastava and Armen Aghajanyan to create the world’s first visual language foundation models specifically

Armen Aghajanyan (@armenagha)'s Twitter Profile Photo

There is an unprecedented level of cope around DeepSeek, and very little signal on X around R1. I recommend unfollowing anyone spreading conspiracy theories around R1/DeepSeek in general. (1/9)

Apoorv Khandelwal (@apoorvkh)'s Twitter Profile Photo

I started a blog! First post is everything I know about setting up (fast, reproducible, error-proof) Python project environments using the latest tools. These methods have saved me a lot of grief. Also a short guide to CUDA in the appendix :) blog.apoorvkh.com/posts/project-…

Akshat Shrivastava (@akshats07)'s Twitter Profile Photo

MoEs have been a key driver of improved performance for LLMs when memory is abundant, but what happens when we get to resource-constrained devices? Check out our latest work, led by Patrick Huber, exploring the design decisions that make MoEs optimal for on-device deployment!
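For context, here is a minimal NumPy sketch of top-k routing in a sparse MoE layer (all shapes and names are illustrative assumptions, not the paper's setup). It shows the memory tension the tweet points at: every expert's weights must stay resident, even though each token only activates a few of them.

```python
import numpy as np

# Minimal sketch of top-k routing in a sparse MoE layer (illustrative only).
rng = np.random.default_rng(0)

d_model, d_ff, n_experts, top_k = 64, 256, 8, 2
tokens = rng.standard_normal((16, d_model))          # 16 tokens in a batch

# Router: a linear layer scoring each token against each expert.
router_w = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)
# Experts: each is a small 2-layer FFN. On-device, these weights are the
# memory bottleneck: n_experts * (2 * d_model * d_ff) parameters must be
# held, even though each token uses only top_k experts.
experts = [
    (rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model),
     rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff))
    for _ in range(n_experts)
]

logits = tokens @ router_w
# Softmax over experts, then keep the top-k per token.
probs = np.exp(logits - logits.max(-1, keepdims=True))
probs /= probs.sum(-1, keepdims=True)
topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]

out = np.zeros_like(tokens)
for t in range(tokens.shape[0]):
    for e in topk_idx[t]:
        w_in, w_out = experts[e]
        h = np.maximum(tokens[t] @ w_in, 0.0)        # ReLU FFN expert
        out[t] += probs[t, e] * (h @ w_out)

print(out.shape)  # (16, 64): each token ran only top_k of the n_experts FFNs
```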

Aritra R G (@arig23498)'s Twitter Profile Photo

Bringing Efficiency to LLMs with Fine-Tuning

LayerSkip, introduced in the 2024 paper by Mostafa Elhoushi et al. (arXiv:2404.16710), is a brilliant technique to accelerate large language model (LLM) inference without compromising accuracy. By training models with layer dropout and
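As a rough illustration of the layer-dropout half of the recipe (LayerSkip also adds an early-exit loss, not shown here), a toy sketch follows; the schedule and rates are assumptions, not the values from arXiv:2404.16710. Later layers get skipped with higher probability during training, pushing earlier layers to produce representations usable for early exit.

```python
import numpy as np

# Toy sketch of depth-dependent layer dropout (LayerSkip-style).
rng = np.random.default_rng(0)

n_layers, d = 12, 32
x = rng.standard_normal(d)
layers = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]

p_max = 0.2  # maximum skip probability, reached at the last layer
for i, w in enumerate(layers):
    p_skip = p_max * i / (n_layers - 1)   # skip rate grows with depth
    if rng.random() < p_skip:
        continue                          # drop this layer for this step
    x = x + np.tanh(w @ x)                # residual block standing in for a
                                          # transformer layer

print(x[:4])
```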

Maciej Kilian (@kilian_maciej)'s Twitter Profile Photo

fun debugging journey w/ Akshat Shrivastava: be careful around FP8 w/ activation checkpointing

activation checkpointing works under the assumption that different calls of forward give similar results, which we move away from the more we quantize. when you re-quantize in activation

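A toy repro of the failure mode (everything below is an illustrative assumption, not the actual training stack): with delayed/dynamic scaling, the quantization scale is stateful, so the recompute that activation checkpointing performs in the backward pass need not reproduce the activations the original forward produced.

```python
import numpy as np

# Activation checkpointing assumes re-running forward returns the same
# activations. A stateful quantizer (running-amax "delayed scaling", as in
# FP8 recipes) breaks that: the second call quantizes with a different scale.
rng = np.random.default_rng(0)

class FakeFP8:
    """Crude stand-in for delayed-scaling FP8 quantization (int grid proxy)."""
    def __init__(self):
        self.amax = 1.0                       # running amax "history"
    def quantize(self, x):
        scale = 127.0 / self.amax
        q = np.round(x * scale) / scale
        self.amax = max(self.amax, float(np.abs(x).max()))  # state update!
        return q

q = FakeFP8()
x = rng.standard_normal(1024) * 3.0

first = q.quantize(x)    # what the original forward produced
second = q.quantize(x)   # what the checkpointed recompute would produce
print(np.abs(first - second).max())  # nonzero: recompute != original forward
```
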
Akshat Shrivastava (@akshats07)'s Twitter Profile Photo

When Maciej Kilian and I first started talking about alignment and parameterization, he introduced several of the ideas presented in this blog post. As we continue to scale foundation models (esp. multimodal), and with data-aware, scale-aware parameterization becoming more prevalent,
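The tweet doesn't spell out the scheme, but one common instance of scale-aware parameterization is muP-style width scaling; a hypothetical sketch (the 1/fan_in init and 1/width learning-rate rule here are assumptions about the general idea, not the blog post's method):

```python
import numpy as np

# muP-style sketch: init hidden weights at variance 1/fan_in and shrink the
# per-layer learning rate with width, so a base LR tuned on a small model
# transfers to wider ones.
def make_layer(fan_in, fan_out, base_lr, rng):
    w = rng.standard_normal((fan_out, fan_in)) / np.sqrt(fan_in)
    lr = base_lr / fan_in            # width-aware per-layer learning rate
    return w, lr

rng = np.random.default_rng(0)
for width in (128, 512, 2048):
    w, lr = make_layer(width, width, base_lr=1.0, rng=rng)
    print(width, round(lr, 6))       # LR shrinks as the layer gets wider
```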

Maciej Kilian (@kilian_maciej)'s Twitter Profile Photo

very cool. we found similar results in diffusion model training where EMA on model weights & const LR is more common. section 5.3 arxiv.org/pdf/2405.13218

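For readers unfamiliar with the setup referenced above: a minimal sketch of keeping an EMA of model weights under a constant learning rate (the toy quadratic objective and constants are assumptions for illustration):

```python
import numpy as np

# Constant-LR SGD with an EMA of the weights: the averaging absorbs the
# gradient noise that LR decay would otherwise have to suppress.
rng = np.random.default_rng(0)

w = rng.standard_normal(8)          # "model" weights
ema = w.copy()                      # EMA copy used for evaluation/sampling
lr, decay = 1e-2, 0.999
target = np.ones(8)

for step in range(5000):
    grad = 2 * (w - target) + 0.5 * rng.standard_normal(8)  # noisy gradient
    w -= lr * grad                  # constant learning rate throughout
    ema = decay * ema + (1 - decay) * w

print(np.abs(w - target).mean(), np.abs(ema - target).mean())
# The EMA weights typically land much closer to the optimum than the raw
# iterates, which keep bouncing at a noise floor set by the constant LR.
```
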
Charlie Hou (@hou_char)'s Twitter Profile Photo

[#ICML2025] Have you ever wanted to train LLMs on distributed private data but were blocked by model size or privacy constraints 😔? Here’s a solution: Introducing 🌸POPri (Policy Optimization for Private Data)! Poster 🗓️ today at 4:30pm PT, 📍East Exhibition Hall A-B E-1006

Jeremy Dohmann (@jecdohmann)'s Twitter Profile Photo

I'm excited to be at ICML this week :-) Perceptron AI is co-sponsoring the Assessing World Models workshop this Friday. Come see some great talks from Jacob Andreas, Naomi Saphra, and more; topics include mechanistic interpretability, intuitive physics, LLMs for generating scientific