Zihao Wang (@wzihao12)'s Twitter Profile
Zihao Wang

@wzihao12

PhD student at UChicago Stat

ID: 1499749716535521281

Website: https://zihao12.github.io/ · Joined: 04-03-2022 14:13:12

37 Tweets

173 Followers

339 Following

Zihao Wang (@wzihao12):

Excited to be at #NeurIPS2023 and looking forward to meeting new and old friends! Interested in concept control for text-to-image models? Find our poster tomorrow (Tue) at 10:45 AM: arxiv.org/abs/2302.03693, w/ Lin Gui, Jeffrey Negrea, and Victor Veitch 🔸

AK (@_akhaliq):

Transforming and Combining Rewards for Aligning Large Language Models

paper page: huggingface.co/papers/2402.00…

A common approach for aligning language models to human preferences is to first learn a reward model from preference data, and then use this reward model to update the language model.
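
For readers skimming the archive, a toy sketch of what "transforming and combining" rewards can look like in code. The transform and the summation below are illustrative assumptions (a log-sigmoid centered at a per-prompt reference score), not necessarily the paper's exact construction.

```python
import math

def log_sigmoid(x):
    # log sigmoid(x) = -log(1 + exp(-x)); fine for the modest reward values here
    return -math.log1p(math.exp(-x))

def combine_rewards(raw_rewards, reference_rewards):
    """Transform each raw reward relative to a reference score, then sum,
    so a response must do well on every property to score highly.
    Illustrative sketch only; inputs and the exact transform are assumptions."""
    return sum(
        log_sigmoid(r - ref) for r, ref in zip(raw_rewards, reference_rewards)
    )

# Example: two reward models (say, helpfulness and harmlessness),
# each compared against a per-prompt reference score.
combined = combine_rewards([1.2, -0.3], [0.5, 0.0])
```
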
Victor Veitch 🔸 (@victorveitch):

LLM best-of-n sampling works great in practice---but why? Turns out: it's the best possible policy for maximizing win rate over the base model! Then: we use this to get a truly sweet alignment scheme: easy tweaks, huge gains. w/ Lin Gui, Cristina Garbacea: arxiv.org/abs/2406.00832
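
Best-of-n sampling itself fits in a few lines; a minimal sketch, assuming hypothetical `base_model.sample` and `reward_model.score` interfaces (stand-ins for whatever generation and reward-scoring calls your stack provides):

```python
def best_of_n(prompt, base_model, reward_model, n=16):
    """Best-of-n sampling: draw n completions from the (unchanged) base model
    and keep the one the reward model scores highest."""
    candidates = [base_model.sample(prompt) for _ in range(n)]
    scores = [reward_model.score(prompt, c) for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best]
```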

Victor Veitch 🔸 (@victorveitch):

Fundamentally, high-level concepts group into categorical variables---mammal, reptile, fish, bird---with a semantic hierarchy---poodle is a dog is a mammal is an animal. How do LLMs internally represent this structure? arxiv.org/abs/2406.01506

Yibo Jiang (@yibophd):

Are LLMs just doing next token predictions? It is believed that if an LLM can accurately predict the next tokens in a Wikipedia entry, it essentially "learns" the information.

But do pre-trained LLMs actually need to understand context sentences to solve this task? The answer is…
Zihao Wang (@wzihao12):

Excited to present my paper at #ICML2024 on transforming and combining reward models for RLHF! Join me on Wed, July 24, 11:30 a.m. - 1 p.m. CEST at Hall C 4-9 #2710.

David Reber (@davidpreber):

🧵 RATE: Score Reward Models with Imperfect Rewrites of Rewrites

1/ How do you measure whether a reward model incentivizes helpfulness without accidentally measuring length, complexity, etc?

Rewrites of rewrites give good counterfactuals, without needing to list all confounders!
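
A rough sketch of the rewrite-of-rewrite contrast, assuming a hypothetical `rewrite(text, flip=...)` helper (e.g. an LLM prompted to flip a single attribute such as verbosity while keeping everything else fixed) and a `reward(prompt, response)` scorer. This is one reading of the thread, not the paper's exact estimator:

```python
def rate_contrast(prompt, response, reward, rewrite):
    """One RATE-style contrast for a single example (rough sketch).

    Comparing a rewrite against a rewrite-of-a-rewrite means both sides carry
    the same rewriting artifacts, so their reward difference isolates the
    effect of the flipped attribute rather than the effect of rewriting."""
    rewritten = rewrite(response, flip=True)          # attribute flipped
    rewritten_back = rewrite(rewritten, flip=False)   # attribute restored
    return reward(prompt, rewritten) - reward(prompt, rewritten_back)
```
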
Victor Veitch 🔸 (@victorveitch):

Semantics in language is naturally hierarchical, but attempts to interpret LLMs often ignore this. 

Turns out: baking semantic hierarchy into sparse autoencoders can give big jumps in interpretability and efficiency. 

Thread + bonus musings on the value of SAEs:
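
For context, a plain sparse autoencoder over model activations looks roughly like the sketch below (PyTorch, standard L1 sparsity); the hierarchy-aware variant discussed in the thread builds on this baseline rather than being reproduced here. The layer sizes and loss coefficient are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Plain sparse autoencoder over model activations (the usual baseline)."""

    def __init__(self, d_model, d_features):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts):
        codes = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(codes)             # reconstruction of the input
        return recon, codes

def sae_loss(recon, acts, codes, l1_coef=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse codes.
    return ((recon - acts) ** 2).mean() + l1_coef * codes.abs().mean()
```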