Mansheej Paul (@mansiege) 's Twitter Profile
Mansheej Paul

@mansiege

mansheej.github.io

ID: 914392220651810816

Joined: 01-10-2017 07:31:08

171 Tweets

596 Followers

694 Following

Dan Biderman (@dan_biderman) 's Twitter Profile Photo

✨Paper out in final form: exciting results from our semi-supervised pose estimation package, Lightning Pose, which is now adopted by a number of great neuroscience labs. Please give it a whirl: github.com/danbider/light…

Cody Blakeney (@code_star) 's Twitter Profile Photo

If you want to learn more about how the Llama 3 team used annealing to assess data quality, check out our paper! At ICML? Go chat with Mansheej Paul about it!

Matei Zaharia (@matei_zaharia) 's Twitter Profile Photo

Awesome to see so much open science shared in the Llama 3.1 paper, including a shoutout to Cody Blakeney and Mansheej Paul's work. There are also great details on RLHF and other aspects of Llama 3.1.

Mansheej Paul (@mansiege) 's Twitter Profile Photo

Pretraining data ablations are expensive: how can we measure data quality fast and cheap? If you're at ICML, come find out at the ES-FoMo poster session today in Lehar 2 at 1 pm: icml.cc/virtual/2024/w…
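
For context, here is a minimal PyTorch sketch of the annealing-style ablation the paper (arxiv.org/abs/2406.03476) points to: resume a partially trained checkpoint, decay the learning rate to zero over a short run on a candidate data mixture, and compare benchmark scores against an identical anneal on a baseline mixture. This is illustrative only; `model` is assumed to be a Hugging Face-style causal LM and `mixture` an iterable of batches.

```python
import torch

def anneal_on(model, mixture, steps=1000, peak_lr=1e-4):
    # Resume from an existing pretraining checkpoint and decay the LR
    # linearly to zero over a short run on the candidate mixture.
    opt = torch.optim.AdamW(model.parameters(), lr=peak_lr)
    sched = torch.optim.lr_scheduler.LinearLR(
        opt, start_factor=1.0, end_factor=0.0, total_iters=steps)
    for _, batch in zip(range(steps), mixture):
        loss = model(**batch).loss  # HF-style batch with labels assumed
        loss.backward()
        opt.step()
        sched.step()
        opt.zero_grad()
    return model

# Data quality is then read off as the eval delta between
# anneal_on(ckpt, candidate_mixture) and anneal_on(ckpt, baseline_mixture).
```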

Prithviraj (Raj) Ammanabrolu (@rajammanabrolu) 's Twitter Profile Photo

LLM as a judge works well by burning extra inference compute on chain of thought and self-critiques. Reward models work well due to Bradley-Terry-style objectives being a good fit for most current preference datasets.

Now you can have the best of both worlds!

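As a reference point, a minimal sketch of the Bradley-Terry pairwise objective the tweet alludes to; tensor names are illustrative:

```python
import torch.nn.functional as F

def bradley_terry_loss(r_chosen, r_rejected):
    # Bradley-Terry models P(chosen beats rejected) = sigmoid(r_c - r_r);
    # minimizing this loss maximizes the log-likelihood of the preferences.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# r_chosen / r_rejected: scalar scores the reward model assigns to the
# preferred and dispreferred responses for the same prompt.
```
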
Mansheej Paul (@mansiege) 's Twitter Profile Photo

Code and models for our latest work, Critique-out-Loud (CLoud) reward models, are now released! Check out our paper (arxiv.org/abs/2408.11791) for more details on using reward models to reason before predicting a reward score.
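
A hedged sketch of the two-stage pattern the paper describes (critique first, then score); the checkpoint name and the standalone reward head below are placeholders for illustration, not the released CLoud API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-8B"  # placeholder base checkpoint

tok = AutoTokenizer.from_pretrained(MODEL)
lm = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
reward_head = torch.nn.Linear(lm.config.hidden_size, 1)  # trained in practice; random here

def cloud_score(prompt: str, response: str) -> float:
    # Stage 1: the model critiques the response "out loud".
    crit_in = tok(f"{prompt}\n{response}\nCritique:", return_tensors="pt")
    crit_ids = lm.generate(**crit_in, max_new_tokens=128)
    critique = tok.decode(crit_ids[0], skip_special_tokens=True)
    # Stage 2: a scalar reward is read off the last hidden state,
    # now conditioned on the self-generated critique.
    full = tok(critique, return_tensors="pt")
    h = lm(**full).hidden_states[-1][:, -1]  # final token's hidden state
    return reward_head(h).item()
```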

Zack Ankner (@zackankner) 's Twitter Profile Photo

Agreed ;)

But in all seriousness, it's cool to see everyone converging on reward models that perform explicit reasoning by critiquing out loud. Super excited to see how people build on top of these works.

Cody Blakeney (@code_star) 's Twitter Profile Photo

If you want to read more about the curriculum training used in OLMo 2, check out our (Mansheej Paul, Brett Larsen, Sean Owen) paper!

Congrats on the release to everyone at AI2! (but especially Luca Soldaini 🎀 and Kyle Lo <3 data)

arxiv.org/abs/2406.03476

Zack Ankner (@zackankner) 's Twitter Profile Photo

Critique out loud reward models made it into the Kimi k1.5 technical report! Super cool to see someone scale it up to 800k inputs, and to see how much it improved reward modeling!

Core Francisco Park (@corefpark) 's Twitter Profile Photo

💥New Paper!
Algorithmic Phases of In-Context Learning:

We show that transformers learn a superposition of different algorithmic solutions depending on the data diversity, training time and context length!

1/n

Dan Biderman (@dan_biderman) 's Twitter Profile Photo

How can we use small LLMs to shift more AI workloads onto our laptops and phones? In our paper and open-source code, we pair on-device LLMs (ollama) with frontier LLMs in the cloud (@openai, @together), to solve token-intensive workloads on your 💻 at 17.5% of the cloud cost
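
A minimal sketch of the local/cloud split the tweet describes: the on-device model burns the bulk of the tokens reading the long context, while the frontier model only sees a short distillate. The `ollama` and `openai` Python clients and the model names are assumptions for illustration; the actual protocol lives in the paper and repo.

```python
import ollama
from openai import OpenAI

cloud = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, long_document: str) -> str:
    # Local model handles the token-heavy pass over the document (on-device, free).
    local = ollama.chat(
        model="llama3.2",  # placeholder on-device model
        messages=[{
            "role": "user",
            "content": f"{long_document}\n\nExtract the facts relevant to: {question}",
        }],
    )
    notes = local["message"]["content"]
    # Cloud model pays only for the distilled notes, not the full document.
    remote = cloud.chat.completions.create(
        model="gpt-4o-mini",  # placeholder frontier model
        messages=[{"role": "user", "content": f"Notes:\n{notes}\n\nAnswer: {question}"}],
    )
    return remote.choices[0].message.content
```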

Davis Blalock (@davisblalock) 's Twitter Profile Photo

Deep learning training is a mathematical dumpster fire.

But it turns out that if you *fix* the math, everything kinda just works…fp8 training, hyperparameter transfer, training stability, and more. [1/n]

Misha Laskin (@mishalaskin) 's Twitter Profile Photo

Engineers spend 70% of their time understanding code, not writing it. That’s why we built Asimov at Reflection AI. The best-in-class code research agent, built for teams and organizations.