Zihao Wang (@wzihao12)'s Twitter Profile
Zihao Wang

@wzihao12

PhD student at UChicago Stat

ID: 1499749716535521281

Website: https://zihao12.github.io/ · Joined: 04-03-2022 14:13:12

37 Tweets

173 Followers

339 Following

Zihao Wang (@wzihao12):

Excited to be at #NeurIPS2023 and looking forward to meeting new and old friends! Interested in concept control for text-to-image models? Find our poster tomorrow (Tue) at 10:45 AM: arxiv.org/abs/2302.03693, w/ Lin Gui, Jeffrey Negrea, and Victor Veitch 🔸

AK (@_akhaliq):

Transforming and Combining Rewards for Aligning Large Language Models

paper page: huggingface.co/papers/2402.00…

A common approach for aligning language models to human preferences is to first learn a reward model from preference data, and then use this reward model to update the language model.
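
For readers skimming the archive, a toy sketch of what "transforming and combining" rewards can look like in code. The transform and the summation below are illustrative assumptions (a log-sigmoid centered at a per-prompt reference score), not necessarily the paper's exact construction.

```python
import math

def log_sigmoid(x):
    # log sigmoid(x) = -log(1 + exp(-x)); fine for the modest reward values here
    return -math.log1p(math.exp(-x))

def combine_rewards(raw_rewards, reference_rewards):
    """Transform each raw reward relative to a reference score, then sum,
    so a response must do well on every property to score highly.
    Illustrative sketch only; inputs and the exact transform are assumptions."""
    return sum(
        log_sigmoid(r - ref) for r, ref in zip(raw_rewards, reference_rewards)
    )

# Example: two reward models (say, helpfulness and harmlessness),
# each compared against a per-prompt reference score.
combined = combine_rewards([1.2, -0.3], [0.5, 0.0])
```
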
Victor Veitch 🔸 (@victorveitch):

LLM best-of-n sampling works great in practice---but why? Turns out: it's the best possible policy for maximizing win rate over the base model! Then: we use this to get a truly sweet alignment scheme: easy tweaks, huge gains. w/ Lin Gui, Cristina Garbacea: arxiv.org/abs/2406.00832
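
Best-of-n sampling itself fits in a few lines; a minimal sketch, assuming hypothetical `base_model.sample` and `reward_model.score` interfaces (stand-ins for whatever generation and reward-scoring calls your stack provides):

```python
def best_of_n(prompt, base_model, reward_model, n=16):
    """Best-of-n sampling: draw n completions from the (unchanged) base model
    and keep the one the reward model scores highest."""
    candidates = [base_model.sample(prompt) for _ in range(n)]
    scores = [reward_model.score(prompt, c) for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best]
```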

Victor Veitch 🔸 (@victorveitch):

Fundamentally, high-level concepts group into categorical variables---mammal, reptile, fish, bird---with a semantic hierarchy---poodle is a dog is a mammal is an animal. How do LLMs internally represent this structure? arxiv.org/abs/2406.01506

Yibo Jiang (@yibophd):

Are LLMs just doing next token predictions? It is believed that if an LLM can accurately predict the next tokens in a Wikipedia entry, it essentially "learns" the information.

But do pre-trained LLMs actually need to understand context sentences to solve this task? The answer is…
Zihao Wang (@wzihao12):

Excited to present my paper at #ICML2024 on transforming and combining reward models for RLHF! Join me on Wed, July 24, 11:30 a.m. - 1 p.m. CEST at Hall C 4-9 #2710.

David Reber (@davidpreber):

🧵 RATE: Score Reward Models with Imperfect Rewrites of Rewrites

1/ How do you measure whether a reward model incentivizes helpfulness without accidentally measuring length, complexity, etc?

Rewrites of rewrites give good counterfactuals, without needing to list all confounders!
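
A rough sketch of the rewrite-of-rewrite contrast, assuming a hypothetical `rewrite(text, flip=...)` helper (e.g. an LLM prompted to flip a single attribute such as verbosity while keeping everything else fixed) and a `reward(prompt, response)` scorer. This is one reading of the thread, not the paper's exact estimator:

```python
def rate_contrast(prompt, response, reward, rewrite):
    """One RATE-style contrast for a single example (rough sketch).

    Comparing a rewrite against a rewrite-of-a-rewrite means both sides carry
    the same rewriting artifacts, so their reward difference isolates the
    effect of the flipped attribute rather than the effect of rewriting."""
    rewritten = rewrite(response, flip=True)          # attribute flipped
    rewritten_back = rewrite(rewritten, flip=False)   # attribute restored
    return reward(prompt, rewritten) - reward(prompt, rewritten_back)
```
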
Victor Veitch 🔸 (@victorveitch):

Semantics in language is naturally hierarchical, but attempts to interpret LLMs often ignore this. 

Turns out: baking semantic hierarchy into sparse autoencoders can give big jumps in interpretability and efficiency. 

Thread + bonus musings on the value of SAEs:
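
For context, a plain sparse autoencoder over model activations looks roughly like the sketch below (PyTorch, standard L1 sparsity); the hierarchy-aware variant discussed in the thread builds on this baseline rather than being reproduced here. The layer sizes and loss coefficient are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Plain sparse autoencoder over model activations (the usual baseline)."""

    def __init__(self, d_model, d_features):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts):
        codes = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(codes)             # reconstruction of the input
        return recon, codes

def sae_loss(recon, acts, codes, l1_coef=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse codes.
    return ((recon - acts) ** 2).mean() + l1_coef * codes.abs().mean()
```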