Vaishnavh Nagarajan (@_vaishnavh)'s Twitter Profile
Vaishnavh Nagarajan

@_vaishnavh

Research scientist at Google || Prev: CS PhD at Carnegie Mellon || foundations of AI

Shifting to vaishnavh@ on the blue app

ID: 874847778119266304

Link: http://vaishnavh.github.io · Joined: 14-06-2017 04:35:38

480 Tweets

2.2K Followers

582 Following

Zhengyang Geng (@zhengyanggeng)'s Twitter Profile Photo

Excited to share our work with my amazing collaborators, Goodeat, Xingjian Bai, Zico Kolter, and Kaiming.

In a word, we show an “identity learning” approach for generative modeling, by relating the instantaneous/average velocity in an identity. The resulting model,
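For context, the kind of identity meant can be sketched (a hedged reconstruction, assuming the average velocity $u$ over $[r, t]$ is defined as the time average of the instantaneous velocity $v$; the paper's exact formulation may differ):

$$(t-r)\,u(z_t, r, t) = \int_r^t v(z_\tau, \tau)\,d\tau \;\Rightarrow\; u(z_t, r, t) = v(z_t, t) - (t-r)\,\frac{d}{dt}\,u(z_t, r, t),$$

obtained by differentiating both sides with respect to $t$. A network trained to satisfy the right-hand identity can learn the average velocity directly, without integrating the instantaneous velocity at sampling time.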
Vaishnavh Nagarajan (@_vaishnavh)'s Twitter Profile Photo

📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue:

→ LLMs are limited in creativity since they learn to predict the next token

→ creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
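To make "seed-conditioning" concrete, here is a minimal hedged sketch (my own illustration, not the paper's exact recipe; the `<seed>` tag format and the `seed_conditioned_prompt` helper are hypothetical): randomness is injected by prepending a random seed string to the prompt, so diversity can come from the seed rather than from temperature sampling.

```python
# A hedged sketch of seed-conditioning (hypothetical format, not the
# paper's exact recipe): prepend a random seed string to the input so
# the model can be decoded greedily yet still produce diverse outputs,
# with the randomness carried by the seed instead of by sampling.
import random

def seed_conditioned_prompt(task_prompt: str, seed_len: int = 8) -> str:
    seed = "".join(random.choice("0123456789") for _ in range(seed_len))
    return f"<seed>{seed}</seed> {task_prompt}"

# Each call draws a fresh seed, so even deterministic (greedy) decoding
# can vary across calls.
for _ in range(3):
    print(seed_conditioned_prompt("Write a novel short story opening:"))
```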
Vaishnavh Nagarajan (@_vaishnavh)'s Twitter Profile Photo

Wrote my first blog post! I wanted to share a powerful yet under-recognized way to develop emotional maturity as a researcher:

making it a habit to read about the ✨past✨ and learn from it to make sense of the present
Tanya Marwah (@__tm__157)'s Twitter Profile Photo

This is the first step in a direction that I am very excited about! Using LLMs to solve scientific computing problems and potentially discover faster (or new) algorithms. #AI4Science #ML4PDEs

We show that LLMs can write PDE solver code, choose appropriate algorithms, and produce

Tanya Marwah (@__tm__157)'s Twitter Profile Photo

Yep, what Junhong Shen said. I started working on ML for PDEs during my PhD. And the first three years were just reading books and appreciating the beauty and the difficulty of the subject!

Eugene Vinitsky 🍒🦋 (@eugenevinitsky)'s Twitter Profile Photo

We now know RL agents can zero-shot crush driving benchmarks. Can we put them on a car and replace the planning stack? We're hiring a postdoc at NYU to find out! Email me if interested and please help us get the word out.

Chenghao Yang (@chrome1996)'s Twitter Profile Photo

Have you noticed…

🔍 Aligned LLM generations feel less diverse?
🎯 Base models are decoding-sensitive?
🤔 Generations get more predictable as they progress?
🌲 Tree search fails mid-generation (esp. for reasoning)?

We trace these mysteries to LLM probability concentration, and

Lucas Beyer (bl16) (@giffmana)'s Twitter Profile Photo

Interesting alternative to multi-token prediction, though the figure is a bit unintuitive. Instead of attaching a head for each +d'th prediction, pass a dummy input token for each extra prediction through the model. This is A LOT more expensive, e.g. doing 2-step prediction
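The mechanism Lucas describes is easy to sketch. Below is a minimal, hedged PyTorch illustration (my own construction, not code from any paper mentioned here): `DummyTokenPredictor`, `n_future`, and the learned dummy embeddings are all assumed names, and real implementations differ in masking and training details.

```python
# A minimal sketch (an assumption of the scheme described above, not the
# authors' code): instead of one output head per extra step, append learned
# "dummy" embeddings after the context and read each extra prediction from
# the hidden state at the corresponding dummy position. Every dummy token
# goes through the full backbone, which is why this costs more than a head.
import torch
import torch.nn as nn

class DummyTokenPredictor(nn.Module):
    def __init__(self, vocab=100, d_model=64, n_future=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        # one learned embedding per extra future position (assumption)
        self.dummies = nn.Parameter(torch.randn(n_future, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab)
        self.n_future = n_future

    def forward(self, tokens):  # tokens: (batch, T)
        B, T = tokens.shape
        x = torch.cat([self.embed(tokens),
                       self.dummies.expand(B, -1, -1)], dim=1)
        # causal mask over the full length so dummy i sees the context
        # and the earlier dummies, mirroring autoregressive prediction
        mask = nn.Transformer.generate_square_subsequent_mask(T + self.n_future)
        h = self.backbone(x, mask=mask)
        # logits at the dummy positions predict steps T+1 .. T+n_future
        return self.lm_head(h[:, T:, :])  # (batch, n_future, vocab)

logits = DummyTokenPredictor()(torch.randint(0, 100, (1, 8)))
print(logits.shape)  # torch.Size([1, 2, 100])
```

Note that attention now runs over T + n_future positions per forward pass, which is exactly the extra cost Lucas points out relative to attaching one cheap head per future step.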

Vaishnavh Nagarajan (@_vaishnavh)'s Twitter Profile Photo

Lucas Beyer (bl16) I was about to add that the first instance of a dummy-token-based multi-token approach was in this paper that called it "parallel prediction", until I just noticed the author list! Way ahead of its time!

arxiv.org/abs/2306.07915