Aryaman Arora (@aryaman2020)'s Twitter Profile
Aryaman Arora

@aryaman2020

member of technical staff @stanfordnlp

ID: 1070455434304200705

Link: http://aryaman.io/ · Joined: 05-12-2018 23:10:38

12.12K Tweets

6.6K Followers

2.2K Following

Justus Mattern (@matternjustus):

Excited to launch SYNTHETIC-2! Beyond the amazing inference infra work that went into this, we've also made a bunch of changes to our data curation. Unlike SYNTHETIC-1, where we mainly focused on SFT, we've taken particular care to make this dataset useful for RL;

Jon Saad-Falcon (@jonsaadfalcon):

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning

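The thread describes Weaver only at a high level. As a rough illustration of the idea it names, combining multiple weak verifiers to select among candidate answers, here is a minimal sketch; the function name, the weighted-sum aggregation, and the toy verifiers are all assumptions, not Weaver's actual API or method:

```python
# Minimal sketch (not Weaver's actual implementation): score each candidate
# answer with several weak verifiers and pick the best weighted combination.

def select_answer(candidates, verifiers, weights):
    """Return the candidate with the highest weighted verifier score.

    candidates: list of generated answers (str)
    verifiers:  list of functions mapping an answer -> score in [0, 1]
                (stand-ins for reward models and LM judges)
    weights:    per-verifier weights (assumption: fit on a small labeled set)
    """
    def combined_score(answer):
        return sum(w * v(answer) for v, w in zip(verifiers, weights))
    return max(candidates, key=combined_score)

# Toy usage with stand-in verifiers:
verifiers = [lambda a: float("42" in a), lambda a: min(len(a) / 10, 1.0)]
print(select_answer(["the answer is 42", "no idea"], verifiers, [0.7, 0.3]))
```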
Aryaman Arora (@aryaman2020):

i was cautiously optimistic about CLTs/sparse feature circuits but further reflection makes me feel the error nodes (which are non-linear + example-specific) make them totally cooked

CLS (@chengleisi):

Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.

Neil Rathi (@neil_rathi):

new paper 🌟 interpretation of uncertainty expressions like "i think" differs cross-linguistically. we show that (1) llms are sensitive to these differences but (2) humans overrely on their outputs across languages
