Hao Peng (@haopeng_nlp) 's Twitter Profile
Hao Peng

@haopeng_nlp

Assistant Professor at UIUC CS

ID: 1316236065271758849

Joined: 14-10-2020 04:35:22

35 Tweets

573 Followers

100 Following

Xingyao Wang (@xingyaow_) 's Twitter Profile Photo

Large Language Model (LLM) agents promise to free us from mundane tasks, but how should they best interact with our world? Introducing CodeAct, an agent {framework, instruction-tuning dataset, model} that employs executable Python code to unify the actions of LLM agents.
🧵1/
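For readers who want to see the code-as-action idea concretely, here is a minimal sketch (an illustration, not the CodeAct framework itself): the model emits a Python snippet as its action, the environment executes it, and the captured output becomes the next observation.

```python
# Illustrative code-as-action loop (an assumption-based sketch, not CodeAct's implementation).
# The LLM's "action" is a Python snippet; the environment executes it and returns the output.
import contextlib
import io

def execute_action(code: str, namespace: dict) -> str:
    """Run model-generated Python in a shared namespace and capture stdout as the observation."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # unsafe outside a sandbox; shown only for illustration
    except Exception as exc:
        return f"Error: {exc}"
    return buffer.getvalue()

namespace: dict = {}
action = "x = 2 + 3\nprint(x)"           # pretend this snippet came from the LLM
observation = execute_action(action, namespace)
print(repr(observation))                 # '5\n' would be appended to the conversation
```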
Yao Fu (@francis_yao_) 's Twitter Profile Photo

Frontier models all have at least 100k context length, and Gemini 1.5 even has 1M context. What about research and open source?

Introducing Long Context Data Engineering, a data-driven method achieving the first 128k-context open-source model matching GPT-4-level Needle in a Haystack performance.
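For context, a Needle-in-a-Haystack test hides one fact deep inside a long filler document and asks the model to recall it. A simplified sketch (real evaluations sweep context length and needle depth):

```python
# Simplified Needle-in-a-Haystack construction (illustrative; real evals vary length and depth).
def build_haystack(needle: str, filler: str, total_sentences: int, depth: float) -> str:
    sentences = [filler] * total_sentences
    sentences.insert(int(depth * total_sentences), needle)  # bury the needle at a chosen depth
    return " ".join(sentences)

needle = "The secret passcode is 7421."
context = build_haystack(needle, "The grass was green and the sky was clear.", 2000, depth=0.35)
prompt = f"{context}\n\nQuestion: What is the secret passcode?\nAnswer:"
# Feed `prompt` to the long-context model and check whether "7421" appears in its answer.
print(len(prompt.split()), "words in the prompt")
```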
John Yang (@jyangballin) 's Twitter Profile Photo

SWE-agent is our new system for autonomously solving issues in GitHub repos. It gets similar accuracy to Devin on SWE-bench, takes 93 seconds on avg + it's open source!

We designed a new agent-computer interface to make it easy for GPT-4 to edit+run code
github.com/princeton-nlp/…
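A rough sketch of what an agent-computer interface can look like (purely illustrative; SWE-agent's actual command set differs): a small, LLM-friendly set of file-viewing, editing, and shell commands whose outputs are fed back to the model.

```python
# Hypothetical minimal agent-computer interface (ACI); not SWE-agent's real command set.
import subprocess
from pathlib import Path

def open_file(path: str, start: int = 1, n: int = 40) -> str:
    """Show a numbered window of a file so the model can reference exact lines."""
    lines = Path(path).read_text().splitlines()
    return "\n".join(f"{i}: {line}" for i, line in enumerate(lines[start - 1:start - 1 + n], start))

def edit_lines(path: str, start: int, end: int, new_text: str) -> str:
    """Replace lines start..end (1-indexed, inclusive) with new_text."""
    lines = Path(path).read_text().splitlines()
    lines[start - 1:end] = new_text.splitlines()
    Path(path).write_text("\n".join(lines) + "\n")
    return f"edited {path}:{start}-{end}"

def run(cmd: str) -> str:
    """Run a shell command (e.g. the test suite) and return truncated output."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=60)
    return (result.stdout + result.stderr)[-2000:]

# The agent loop would expose these commands to the LLM and append their outputs to its context.
```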
Zhaofeng Wu @ ICLR (@zhaofeng_wu) 's Twitter Profile Photo

Want to train an aligned LM in a new language 🌏 but don’t have preference data for training the reward model (RM)?

💡 Just use a RM for another language: it often works well, sometimes even BETTER than if you had a RM in your target language! 🤯 arxiv.org/abs/2404.12318
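One way to picture the setup (a sketch under assumptions: the model name is a placeholder and a real RM usually expects its own prompt template): score target-language candidates with a reward model trained on another language's preference data, e.g. for best-of-n selection.

```python
# Best-of-n with a cross-lingual reward model (sketch; "english-rm" is a placeholder name).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_name = "some-org/english-rm"                      # RM trained on English preference data
tok = AutoTokenizer.from_pretrained(rm_name)
rm = AutoModelForSequenceClassification.from_pretrained(rm_name)  # assumed scalar reward head

prompt = "Explique la photosynthèse simplement."     # target-language (French) prompt
candidates = ["La photosynthèse est le processus ...", "Réponse courte ..."]

scores = []
for cand in candidates:
    inputs = tok(prompt, cand, return_tensors="pt", truncation=True)
    with torch.no_grad():
        scores.append(rm(**inputs).logits.squeeze().item())

print("selected:", candidates[scores.index(max(scores))])
```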
Yao Fu (@francis_yao_) 's Twitter Profile Photo

From Claude 100K to Gemini 10M, we are in the era of long-context language models. Why and how can a language model utilize information at any input location within a long context? We discover retrieval heads, a special type of attention head responsible for long-context factuality.
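A simplified probe for the idea (my own sketch, much cruder than the paper's detection procedure): a head looks "retrieval-like" if, while the model answers, it concentrates attention mass on the needle's position in the long context.

```python
# Crude retrieval-head score: per-head attention mass on the needle tokens (illustrative only).
import torch

def retrieval_score(attn: torch.Tensor, needle_span: slice) -> torch.Tensor:
    """attn: [num_heads, query_len, key_len] attention weights from one layer.
    Returns, per head, the attention mass on the needle span averaged over query positions."""
    mass_on_needle = attn[:, :, needle_span].sum(dim=-1)  # [num_heads, query_len]
    return mass_on_needle.mean(dim=-1)                    # [num_heads]

attn = torch.rand(8, 4, 100)
attn = attn / attn.sum(dim=-1, keepdim=True)              # fake, normalized attention weights
print(retrieval_score(attn, slice(40, 45)))               # high-scoring heads are candidates
```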
Yue Guo (@yueguo10) 's Twitter Profile Photo

I'm joining the University of Illinois Urbana-Champaign (UIUC) this fall as an Assistant Professor in the iSchool, with an affiliation in Computer Science! My research passion lies at the intersection of NLP and the medical domain. I'm recruiting students for 2025! More info: yueguo-50.github.io.

Yangyi Chen (on job market) (@yangyichen6666) 's Twitter Profile Photo

🎯 Introducing SOLO, a single Transformer architecture for unified vision-language modeling. SOLO accepts both raw image patches (in pixels) and texts as inputs, without using a separate pre-trained vision encoder.

Paper: arxiv.org/abs/2407.06438
Code: github.com/Yangyi-Chen/SO…
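To make the "no separate vision encoder" point concrete, here is a toy sketch (dimensions and layers are made up; this is not SOLO's architecture): raw pixel patches are linearly projected into the same embedding space as text tokens, and the mixed sequence goes through a single Transformer.

```python
# Toy unified vision-language sequence: linear patch projection + text embeddings, one Transformer.
# Sizes are arbitrary; this illustrates the idea, not the SOLO model.
import torch
import torch.nn as nn

d_model, patch, vocab = 512, 16, 32000
patch_proj = nn.Linear(3 * patch * patch, d_model)          # raw pixels -> token embedding
tok_embed = nn.Embedding(vocab, d_model)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2
)

image = torch.rand(1, 3, 64, 64)
patches = image.unfold(2, patch, patch).unfold(3, patch, patch)   # [1, 3, 4, 4, 16, 16]
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, -1, 3 * patch * patch)
text_ids = torch.randint(0, vocab, (1, 10))

sequence = torch.cat([patch_proj(patches), tok_embed(text_ids)], dim=1)  # one mixed sequence
print(backbone(sequence).shape)                                          # [1, 16 + 10, 512]
```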
Hao Peng (@haopeng_nlp) 's Twitter Profile Photo

Language models excel at undergraduate exams, but how do they fare in research? SciCode challenges models with real research coding problems. Even the best models solve less than 5%. Very proud of Minyang Tian and Luyu Gao for leading the charge!

Muhammad Khalifa (@mkhalifaaaa) 's Twitter Profile Photo

What if LLMs could cite the pre-training source(s) supporting their parametric knowledge? Wouldn't this dramatically improve verifiability and trustworthiness? We aimed to answer this during my internship at Ai2.

Paper: arxiv.org/abs/2404.01019
To be presented at #COLM
Thread👇👇

Bingyi Kang (@bingyikang) 's Twitter Profile Photo

Curious whether video generation models (like #SORA) qualify as world models? We conduct a systematic study to answer this question by investigating whether a video generation model is able to learn physical laws. There are three key messages to take home: 1⃣ The model generalises

Zhaofeng Wu @ ICLR (@zhaofeng_wu) 's Twitter Profile Photo

💡We find that models “think” 💭 in English (or in general, their dominant language) when processing distinct non-English or even non-language data types 🤯 like texts in other languages, arithmetic expressions, code, visual inputs, & audio inputs ‼️ 🧵⬇️arxiv.org/abs/2411.04986
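A common way to peek at this is a logit-lens-style probe: decode each intermediate layer's hidden state through the output head and see which tokens it already prefers. A rough sketch (GPT-2 here is just a small stand-in model, and the paper's methodology is more careful):

```python
# Logit-lens-style probe: decode intermediate hidden states through the LM head.
# GPT-2 is a small stand-in; the layer-norm access below is GPT-2-specific.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)

inputs = tok("Le chat est sur la", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

for layer, hidden in enumerate(out.hidden_states):
    h = model.transformer.ln_f(hidden[:, -1, :])     # apply the final layer norm to each layer
    top_token = model.lm_head(h).argmax(dim=-1)
    print(layer, repr(tok.decode(top_token)))        # which token each layer already "prefers"
```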
Akari Asai (@akariasai) 's Twitter Profile Photo

🚨 I’m on the job market this year! 🚨
I’m completing my Allen School Ph.D. (2025), where I identify and tackle key LLM limitations like hallucinations by developing new models—Retrieval-Augmented LMs—to build more reliable real-world AI systems. Learn more in the thread! 🧵
Ofir Press (@ofirpress) 's Twitter Profile Photo

I'm on the academic job market! 
I develop autonomous systems for: programming, research-level question answering, finding sec vulnerabilities & other useful+challenging tasks.
I do this by building frontier-pushing benchmarks and agents that do well on them.
See you at NeurIPS!
Lifan Yuan (@lifan__yuan) 's Twitter Profile Photo

Want to train PRMs, but process labels (annotated manually or automatically) sound too expensive😖?
Introducing Implicit PRM🚀 – get your model free process rewards by training an ORM on cheaper response-level data, with a simple parameterization at no additional cost💰!
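My rough reading of the parameterization (a sketch, not the paper's code): the reward of a prefix is beta times the log-ratio log π(y≤t|x)/π_ref(y≤t|x), so training an ORM in this form on response-level labels makes per-step process rewards fall out as the per-token log-ratio increments.

```python
# Sketch of implicit per-step rewards from a log-ratio reward parameterization
# (my paraphrase of the idea; notation and details follow the paper only loosely).
import torch

def implicit_rewards(logp_policy: torch.Tensor, logp_ref: torch.Tensor, beta: float = 0.1):
    """logp_*: per-token log-probs of the sampled response under policy / reference, shape [T]."""
    log_ratio = logp_policy - logp_ref                       # per-token log pi/pi_ref
    step_rewards = beta * log_ratio                          # process reward credited to step t
    prefix_rewards = beta * torch.cumsum(log_ratio, dim=0)   # reward of each prefix y_<=t
    return step_rewards, prefix_rewards

logp_policy = torch.tensor([-0.2, -1.1, -0.4])
logp_ref = torch.tensor([-0.5, -0.9, -1.0])
steps, prefixes = implicit_rewards(logp_policy, logp_ref)
print(steps, prefixes[-1])   # the final prefix value plays the role of the outcome reward
```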
Sagnik Mukherjee (@saagnikkk) 's Twitter Profile Photo

🚨 Paper Alert: “RL Finetunes Small Subnetworks in Large Language Models”

From DeepSeek V3 Base to DeepSeek R1 Zero, a whopping 86% of parameters were NOT updated during RL training 😮😮
And this isn’t a one-off. The pattern holds across RL algorithms and models.
🧵A Deep Dive
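The headline number is easy to reproduce in spirit: diff a base checkpoint against its RL-finetuned version and count untouched parameters. A sketch (model names are placeholders):

```python
# Fraction of parameters left untouched by RL finetuning (sketch; model names are placeholders).
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("org/base-model")       # placeholder checkpoint
tuned = AutoModelForCausalLM.from_pretrained("org/rl-tuned-model")  # placeholder checkpoint

changed, total = 0, 0
for (name, p_base), (_, p_tuned) in zip(base.named_parameters(), tuned.named_parameters()):
    diff = p_base.data != p_tuned.data     # exact elementwise comparison
    changed += diff.sum().item()
    total += diff.numel()

print(f"updated: {changed / total:.1%}  untouched: {1 - changed / total:.1%}")
```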
Shivam Agarwal (@shivamag12) 's Twitter Profile Photo

Can entropy minimization alone improve LLM performance? And how far can it go without any labeled data? This work answers both: yes, and surprisingly far 🐮

At inference, EM can beat GPT-4o, Claude 3 Opus & Gemini 1.5 Pro on challenging scientific coding without any data or model update.
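One simple way to picture entropy as an unsupervised signal (a sketch; the paper's actual inference-time procedure may differ): compute the average entropy of the model's next-token distributions for each sampled solution and prefer the most confident one.

```python
# Mean token-level predictive entropy as an unsupervised confidence signal (illustrative sketch).
import torch
import torch.nn.functional as F

def mean_token_entropy(logits: torch.Tensor) -> float:
    """logits: [seq_len, vocab_size]; average entropy of the next-token distributions."""
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)
    return entropy.mean().item()

# Stand-in logits for two sampled solutions; in practice these come from the model's forward pass.
candidates = [torch.randn(12, 32000), torch.randn(12, 32000)]
best = min(range(len(candidates)), key=lambda i: mean_token_entropy(candidates[i]))
print("keep candidate", best, "(lowest average entropy)")
```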
Ganqu Cui (@charlesfornlp) 's Twitter Profile Photo

So many works talking about entropy, but what is the **mechanism** of entropy in RL for LLMs? 🤔

Our work gives a principled understanding, as well as two tricks that get entropy **controlled** 🧵