Mansheej Paul (@mansiege) 's Twitter Profile
Mansheej Paul

@mansiege

mansheej.github.io

ID: 914392220651810816

Joined: 01-10-2017 07:31:08

171 Tweets

596 Followers

694 Following

Dan Biderman (@dan_biderman) 's Twitter Profile Photo

✨Paper out in final form: exciting results from our semi-supervised pose estimation package, Lightning Pose, which is now adopted by a number of great neuroscience labs. Please give it a whirl: github.com/danbider/light…

Cody Blakeney (@code_star) 's Twitter Profile Photo

If you want to learn more about how the Llama 3 team used annealing to assess data quality, check out our paper! At ICML? Go chat with Mansheej Paul about it!

Matei Zaharia (@matei_zaharia) 's Twitter Profile Photo

Awesome to see so much open science shared in the Llama 3.1 paper, including a shoutout to Cody Blakeney and Mansheej Paul's work. There are also great details on RLHF and other aspects of Llama 3.1.

Mansheej Paul (@mansiege) 's Twitter Profile Photo

Pretraining data ablations are expensive: how can we measure data quality fast and cheap? If you're at ICML, come find out at the ES-FoMo poster session today in Lehar 2 at 1 pm: icml.cc/virtual/2024/w…
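
For context, here is a minimal PyTorch sketch of the annealing-style ablation the paper (arxiv.org/abs/2406.03476) points to: resume a partially trained checkpoint, decay the learning rate to zero over a short run on a candidate data mixture, and compare benchmark scores against an identical anneal on a baseline mixture. This is illustrative only; `model` is assumed to be a Hugging Face-style causal LM and `mixture` an iterable of batches.

```python
import torch

def anneal_on(model, mixture, steps=1000, peak_lr=1e-4):
    # Resume from an existing pretraining checkpoint and decay the LR
    # linearly to zero over a short run on the candidate mixture.
    opt = torch.optim.AdamW(model.parameters(), lr=peak_lr)
    sched = torch.optim.lr_scheduler.LinearLR(
        opt, start_factor=1.0, end_factor=0.0, total_iters=steps)
    for _, batch in zip(range(steps), mixture):
        loss = model(**batch).loss  # HF-style batch with labels assumed
        loss.backward()
        opt.step()
        sched.step()
        opt.zero_grad()
    return model

# Data quality is then read off as the eval delta between
# anneal_on(ckpt, candidate_mixture) and anneal_on(ckpt, baseline_mixture).
```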

Prithviraj (Raj) Ammanabrolu (@rajammanabrolu) 's Twitter Profile Photo

LLM as a judge works well by burning extra inference compute on chain of thought and self-critiques. Reward models work well due to Bradley-Terry-style objectives being a good fit for most current preference datasets.

Now you can have the best of both worlds!

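As a reference point, a minimal sketch of the Bradley-Terry pairwise objective the tweet alludes to; tensor names are illustrative:

```python
import torch.nn.functional as F

def bradley_terry_loss(r_chosen, r_rejected):
    # Bradley-Terry models P(chosen beats rejected) = sigmoid(r_c - r_r);
    # minimizing this loss maximizes the log-likelihood of the preferences.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# r_chosen / r_rejected: scalar scores the reward model assigns to the
# preferred and dispreferred responses for the same prompt.
```
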
Mansheej Paul (@mansiege) 's Twitter Profile Photo

Code and models for our latest work, Critique-out-Loud (CLoud) reward models, are now released! Check out our paper (arxiv.org/abs/2408.11791) for more details on using reward models to reason before predicting a reward score.
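
A hedged sketch of the two-stage pattern the paper describes (critique first, then score); the checkpoint name and the standalone reward head below are placeholders for illustration, not the released CLoud API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-8B"  # placeholder base checkpoint

tok = AutoTokenizer.from_pretrained(MODEL)
lm = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
reward_head = torch.nn.Linear(lm.config.hidden_size, 1)  # trained in practice; random here

def cloud_score(prompt: str, response: str) -> float:
    # Stage 1: the model critiques the response "out loud".
    crit_in = tok(f"{prompt}\n{response}\nCritique:", return_tensors="pt")
    crit_ids = lm.generate(**crit_in, max_new_tokens=128)
    critique = tok.decode(crit_ids[0], skip_special_tokens=True)
    # Stage 2: a scalar reward is read off the last hidden state,
    # now conditioned on the self-generated critique.
    full = tok(critique, return_tensors="pt")
    h = lm(**full).hidden_states[-1][:, -1]  # final token's hidden state
    return reward_head(h).item()
```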

Zack Ankner (@zackankner) 's Twitter Profile Photo

Agreed ;)

But in all seriousness, it's cool to see everyone converging on reward models that perform explicit reasoning by critiquing out loud. Super excited to see how people build on top of these works.

Cody Blakeney (@code_star) 's Twitter Profile Photo

If you want to read more about the curriculum training used in OLMo 2, check out our (Mansheej Paul, Brett Larsen, Sean Owen) paper!

Congrats on the release to everyone at AI2! (but especially Luca Soldaini 🎀 and Kyle Lo <3 data)

arxiv.org/abs/2406.03476

Zack Ankner (@zackankner) 's Twitter Profile Photo

Critique out loud reward models made it into the Kimi k1.5 technical report! Super cool to see someone scale it up to 800k inputs, and to see how much it improved reward modeling!

Core Francisco Park (@corefpark) 's Twitter Profile Photo

💥New Paper!
Algorithmic Phases of In-Context Learning:

We show that transformers learn a superposition of different algorithmic solutions depending on the data diversity, training time and context length!

1/n

Dan Biderman (@dan_biderman) 's Twitter Profile Photo

How can we use small LLMs to shift more AI workloads onto our laptops and phones? In our paper and open-source code, we pair on-device LLMs (ollama) with frontier LLMs in the cloud (@openai, @together), to solve token-intensive workloads on your 💻 at 17.5% of the cloud cost
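
A minimal sketch of the local/cloud split the tweet describes: the on-device model burns the bulk of the tokens reading the long context, while the frontier model only sees a short distillate. The `ollama` and `openai` Python clients and the model names are assumptions for illustration; the actual protocol lives in the paper and repo.

```python
import ollama
from openai import OpenAI

cloud = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, long_document: str) -> str:
    # Local model handles the token-heavy pass over the document (on-device, free).
    local = ollama.chat(
        model="llama3.2",  # placeholder on-device model
        messages=[{
            "role": "user",
            "content": f"{long_document}\n\nExtract the facts relevant to: {question}",
        }],
    )
    notes = local["message"]["content"]
    # Cloud model pays only for the distilled notes, not the full document.
    remote = cloud.chat.completions.create(
        model="gpt-4o-mini",  # placeholder frontier model
        messages=[{"role": "user", "content": f"Notes:\n{notes}\n\nAnswer: {question}"}],
    )
    return remote.choices[0].message.content
```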

Davis Blalock (@davisblalock) 's Twitter Profile Photo

Deep learning training is a mathematical dumpster fire.

But it turns out that if you *fix* the math, everything kinda just works…fp8 training, hyperparameter transfer, training stability, and more. [1/n]

Misha Laskin (@mishalaskin) 's Twitter Profile Photo

Engineers spend 70% of their time understanding code, not writing it. That’s why we built Asimov at Reflection AI. The best-in-class code research agent, built for teams and organizations.