He He (@hhexiy) 's Twitter Profile
He He

@hhexiy

NLP researcher. Assistant Professor at NYU CS & CDS.

ID: 806193181972779008

Link: http://hhexiy.github.io · Joined: 06-12-2016 17:46:47

123 Tweets

6.6K Followers

382 Following

Ofir Press (@ofirpress) 's Twitter Profile Photo

I'm on the academic job market! I develop autonomous systems for: programming, research-level question answering, finding sec vulnerabilities & other useful+challenging tasks. I do this by building frontier-pushing benchmarks and agents that do well on them. See you at NeurIPS!

Chuanyang Jin (@chuanyang_jin) 's Twitter Profile Photo

❓Most reward models are trained using binary judgments—can they effectively capture diverse preferences? 💡Short answer: No, particularly when the training samples are subjective.

Vishakh Padmakumar (@vishakh_pk) 's Twitter Profile Photo

Had a lot of fun poking holes at how LLMs capture diverse preferences with Chuanyang Jin, Hannah Rose Kirk and He He 🧐! Not all is lost though, a simple regularizing term can help prevent overfitting to binary judgments. Check out our paper SoLaR @ NeurIPS2024 to find out more 😉

Richard Pang (@yzpang_) 's Twitter Profile Photo

🚨🔔Foundational graph search task as testbed: for some distribution, transformers can learn to search (100% acc). We interpreted their algo!! But as graph size ↑, transformers struggle. Scaling up # params does not help; CoT does not help. 1.5 years of learning in 10 pages!

Hannah Rose Kirk (@hannahrosekirk) 's Twitter Profile Photo

A real honour and career dream that PRISM has won a NeurIPS Conference best paper award! 🌈 One year ago I was sat in a 13,000+ person audience at NeurIPS '23 having just finished data collection. Safe to say I've gone from feeling #stressed to very #blessed 😁

He He (@hhexiy) 's Twitter Profile Photo

Unbelievable. This quote is blatantly false and unnecessary for the argument. And she surely had expected the backlash with the patronizing NOTE. This is racism, not "cultural generalization". NeurIPS Conference

Manos Koukoumidis (@koukoumidis) 's Twitter Profile Photo

If AI isn’t truly open, it will fail us. We can’t lock our greatest invention yet inside a black box just so that a few can freely monetize it. AI needs its Linux moment, and so we started working towards it. This can only succeed if we all work together! #oumi #opensource

Jane Pan (@janepan_) 's Twitter Profile Photo

When benchmarks talk, do LLMs listen? Our new paper shows that evaluating code LLMs with interactive feedback significantly affects model performance compared to standard static benchmarks! Work w/ Ryan Shar, Jacob Pfau, Ameet Talwalkar, He He, and Valerie Chen! [1/6]

NYU Center for Data Science (@nyudatascience) 's Twitter Profile Photo

CDS is hiring a Clinical Professor of Data Science. Teach ML, programming, and specialized courses in our 60 5th Ave building. Renewable contracts with promotion opportunities. Apply by April 1, 2025. For details, see: apply.interfolio.com/155349 #MachineLearning #ML #AIjobs

Naomi Saphra hiring a lab 🧈🪰 (@nsaphra) 's Twitter Profile Photo

Life update: I'm starting as faculty at Boston University in 2026! BU has SCHEMES for LM interpretability & analysis, so I couldn't be more pumped to join a burgeoning supergroup w/ Najoung Kim 🫠 Aaron Mueller. Looking for my first students, so apply and reach out!

Yulin Chen (@yulinchen99) 's Twitter Profile Photo

Reasoning models overthink, generating multiple answers during reasoning. Is it because they can’t tell which ones are right? No! We find that while reasoning models encode strong correctness signals during chain-of-thought, they may not use them optimally. 🧵 below

Jane Pan (@janepan_) 's Twitter Profile Photo

Do reasoning models know when their answers are right?🤔 Really excited about this work led by Anqi and Yulin Chen. Check out this thread below!

Yulin Chen (@yulinchen99) 's Twitter Profile Photo

We're excited by the wide attention from the community—thank you for your support! We've released code, trained probes, and the generated CoT data👇 github.com/AngelaZZZ-611/… Labeled answer data is on its way. Stay tuned!

Jiaxin Wen @ICLR2025 (@jiaxinwen22) 's Twitter Profile Photo

I'll present this paper tomorrow (10:00-12:30 am, poster at Hall 3 #300). Let's chat about reward hacking against real humans, not just proxy rewards.

Vishakh Padmakumar (@vishakh_pk) 's Twitter Profile Photo

What does it mean for #LLM output to be novel? In work w/ John(Yueh-Han) Chen, Jane Pan, Valerie Chen, He He we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵

He He (@hhexiy) 's Twitter Profile Photo

Automating AI research is bottlenecked by verification speed (running experiments takes time). Our new paper explores whether LLMs can tell which ideas will work before executing them, and they appear to have better research intuition than human researchers.

Jiaxin Wen @ICLR2025 (@jiaxinwen22) 's Twitter Profile Photo

New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often competitive or better than using human supervision. Using this approach, we are able to train a Claude 3.5-based assistant that beats its human-supervised counterpart.

Percy Liang (@percyliang) 's Twitter Profile Photo

Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team: Tatsunori Hashimoto, Marcel Rød, Neil Band, Rohith Kuditipudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:

He He (@hhexiy) 's Twitter Profile Photo

Talking to ChatGPT isn’t like talking to a collaborator yet. It doesn’t track what you really want to do—only what you just said. Check out work led by John (Yueh-Han) Chen and @rico_angel that shows how attackers can exploit this, and a simple fix: just look at more context!