
AllenNLP
@ai2_allennlp
The AllenNLP team works on language-centered AI that equitably serves humanity. We deliver high-impact research and open-source tools to accelerate progress.
ID: 1026903001431138304
https://allenai.org/allennlp 07-08-2018 18:48:49
264 Tweets
14.14K Followers
38 Following



My teammates Costa Huang and Hamish Ivison have uploaded intermediate checkpoints for our recent RL models at Ai2. Hopefully this helps seed some research into how RL finetuning is impacting the weights! As we move towards full reasoner models, we'll continue this. Models with it: OLMo




Heading to NAACL? With "verification being the key to AI" you should go to the poster session Friday, 9-10:30am to chat with my star colleagues Valentina Pyatkin + Jacob Morrison about RewardBench (and really RewardBench 2, evaluation, and reward models in post-training).






Super excited that our second reward model evaluation is out. It's substantially harder, much cleaner, and well correlated with downstream PPO/BoN sampling. Happy hillclimbing! Huge congrats to Saumya Malik, who led the project with a total commitment to excellence.







