Vaishnavh Nagarajan (@_vaishnavh)'s Twitter Profile
Vaishnavh Nagarajan

@_vaishnavh

Research scientist at Google || Prev: CS PhD at Carnegie Mellon || foundations of AI

Shifting to vaishnavh@ on the blue app

ID: 874847778119266304

Link: http://vaishnavh.github.io · Joined: 14-06-2017 04:35:38

480 Tweets

2.2K Followers

582 Following

Zhengyang Geng (@zhengyanggeng)'s Twitter Profile Photo

Excited to share our work with my amazing collaborators, Goodeat, Xingjian Bai, Zico Kolter, and Kaiming.

In a word, we show an “identity learning” approach for generative modeling, by relating the instantaneous/average velocity in an identity. The resulting model,
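For context, the kind of identity meant can be sketched (a hedged reconstruction, assuming the average velocity $u$ over $[r, t]$ is defined as the time average of the instantaneous velocity $v$; the paper's exact formulation may differ):

$$(t-r)\,u(z_t, r, t) = \int_r^t v(z_\tau, \tau)\,d\tau \;\Rightarrow\; u(z_t, r, t) = v(z_t, t) - (t-r)\,\frac{d}{dt}\,u(z_t, r, t),$$

obtained by differentiating both sides with respect to $t$. A network trained to satisfy the right-hand identity can learn the average velocity directly, without integrating the instantaneous velocity at sampling time.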
Vaishnavh Nagarajan (@_vaishnavh)'s Twitter Profile Photo

📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue:

→ LLMs are limited in creativity since they learn to predict the next token

→ creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
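To make "seed-conditioning" concrete, here is a minimal hedged sketch (my own illustration, not the paper's exact recipe; the `<seed>` tag format and the `seed_conditioned_prompt` helper are hypothetical): randomness is injected by prepending a random seed string to the prompt, so diversity can come from the seed rather than from temperature sampling.

```python
# A hedged sketch of seed-conditioning (hypothetical format, not the
# paper's exact recipe): prepend a random seed string to the input so
# the model can be decoded greedily yet still produce diverse outputs,
# with the randomness carried by the seed instead of by sampling.
import random

def seed_conditioned_prompt(task_prompt: str, seed_len: int = 8) -> str:
    seed = "".join(random.choice("0123456789") for _ in range(seed_len))
    return f"<seed>{seed}</seed> {task_prompt}"

# Each call draws a fresh seed, so even deterministic (greedy) decoding
# can vary across calls.
for _ in range(3):
    print(seed_conditioned_prompt("Write a novel short story opening:"))
```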
Vaishnavh Nagarajan (@_vaishnavh)'s Twitter Profile Photo

Wrote my first blog post! I wanted to share a powerful yet under-recognized way to develop emotional maturity as a researcher:

making it a habit to read about the ✨past✨ and learn from it to make sense of the present
Tanya Marwah (@__tm__157)'s Twitter Profile Photo

This is the first step in a direction that I am very excited about! Using LLMs to solve scientific computing problems and potentially discover faster (or new) algorithms. #AI4Science #ML4PDEs

We show that LLMs can write PDE solver code, choose appropriate algorithms, and produce

Tanya Marwah (@__tm__157)'s Twitter Profile Photo

Yep, what Junhong Shen said. I started working on ML for PDEs during my PhD. And the first three years were just reading books and appreciating the beauty and the difficulty of the subject!

Eugene Vinitsky 🍒🦋 (@eugenevinitsky)'s Twitter Profile Photo

We now know RL agents can zero-shot crush driving benchmarks. Can we put them on a car and replace the planning stack? We're hiring a postdoc at NYU to find out! Email me if interested and please help us get the word out.

Chenghao Yang (@chrome1996)'s Twitter Profile Photo

Have you noticed…

🔍 Aligned LLM generations feel less diverse?
🎯 Base models are decoding-sensitive?
🤔 Generations get more predictable as they progress?
🌲 Tree search fails mid-generation (esp. for reasoning)?

We trace these mysteries to LLM probability concentration, and

Lucas Beyer (bl16) (@giffmana)'s Twitter Profile Photo

Interesting alternative to multi-token prediction, though the figure is a bit unintuitive. Instead of attaching a head for each +d'th prediction, pass a dummy input token for each extra prediction through the model. This is A LOT more expensive, e.g. doing 2-step prediction
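The mechanism Lucas describes is easy to sketch. Below is a minimal, hedged PyTorch illustration (my own construction, not code from any paper mentioned here): `DummyTokenPredictor`, `n_future`, and the learned dummy embeddings are all assumed names, and real implementations differ in masking and training details.

```python
# A minimal sketch (an assumption of the scheme described above, not the
# authors' code): instead of one output head per extra step, append learned
# "dummy" embeddings after the context and read each extra prediction from
# the hidden state at the corresponding dummy position. Every dummy token
# goes through the full backbone, which is why this costs more than a head.
import torch
import torch.nn as nn

class DummyTokenPredictor(nn.Module):
    def __init__(self, vocab=100, d_model=64, n_future=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        # one learned embedding per extra future position (assumption)
        self.dummies = nn.Parameter(torch.randn(n_future, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab)
        self.n_future = n_future

    def forward(self, tokens):  # tokens: (batch, T)
        B, T = tokens.shape
        x = torch.cat([self.embed(tokens),
                       self.dummies.expand(B, -1, -1)], dim=1)
        # causal mask over the full length so dummy i sees the context
        # and the earlier dummies, mirroring autoregressive prediction
        mask = nn.Transformer.generate_square_subsequent_mask(T + self.n_future)
        h = self.backbone(x, mask=mask)
        # logits at the dummy positions predict steps T+1 .. T+n_future
        return self.lm_head(h[:, T:, :])  # (batch, n_future, vocab)

logits = DummyTokenPredictor()(torch.randint(0, 100, (1, 8)))
print(logits.shape)  # torch.Size([1, 2, 100])
```

Note that attention now runs over T + n_future positions per forward pass, which is exactly the extra cost Lucas points out relative to attaching one cheap head per future step.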

Vaishnavh Nagarajan (@_vaishnavh)'s Twitter Profile Photo

Lucas Beyer (bl16) I was about to add that the first instance of a dummy-token-based multi-token approach was in this paper that called it "parallel prediction", until I just noticed the author list! Way ahead of its time!

arxiv.org/abs/2306.07915