Kevin Feng (@kjfeng_)'s Twitter Profile
Kevin Feng

@kjfeng_

PhD student @hcdeUW @SocFuturesLab; social computing & human-AI interaction; prev @allen_ai @MSFTResearch @PrincetonCITP @PrincetonCS; also kjfeng.me on 🦋

ID: 1007908570543902720

http://kjfeng.me · Joined 16-06-2018 08:51:44

269 Tweets

542 Followers

420 Following

Seth Lazar (@sethlazar)

In 1999, engineers at Georgia Tech proposed the "Aware Home." It was a vision of ubiquitous computing designed to serve its inhabitants, giving them full knowledge and control over their own data. The dream was user empowerment—using data (and ultimately AI) to fill in the gaps

Hussein Mozannar (@hsseinmzannar)

Excited to release my first lead project Magentic-UI at Microsoft Research, an OS web agent application designed for efficient human-agent interaction. CUA agents are cool but they're not so useful yet, Magentic-UI helps us study how to get value from them. github.com/microsoft/mage…

sam manning (@sj_manning)

"Extending 'GPTs Are GPTs' to Firms" is now out in AEA Papers & Proceedings. Lots of talk about AI's impact on employment this week. One way AI will influence labor demand is through firm-level impacts. We built initial descriptive statistics that can shed light on these

Peiling Jiang (@peilingjiang)

AI should not replace browsing, but scale it. #Orca turns the web into your canvas and personal workspace. Work across dozens of pages, delegate to AI agents by your side, and synthesize on the fly. Welcome to 𝗕𝗿𝗼𝘄𝘀𝗶𝗻𝗴 𝗮𝘁 𝗦𝗰𝗮𝗹𝗲 hci.ucsd.edu/orca w/Haijun Xia

Chaitanya Malaviya (@cmalaviya11)

Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses? Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓

Jacqueline He (@jcqln_h)

LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generation of plausible, but unsupported content. We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.

Hussein Mozannar (@hsseinmzannar)

Curious how to build agents that can control a browser? I just wrote up a full tutorial on how to do it completely from scratch and with Magentic-UI. My goal is to demystify browser-use and CUA agents, it's fun to follow along! Link: husseinmozannar.github.io/#/blog/web_age… Jupyter notebook:

Lujain Ibrahim لجين إبراهيم (@lujainmibrahim)

How can we foster meaningful cooperation between the US & China on AI risks? In our #FAccT2025 paper, led by Saad Siddiqui, we review 40+ documents from the 🇺🇸 & 🇨🇳 to identify areas of common ground and provide recommendations for future bilateral dialogues (🔗in next tweet).

Alan Chan (@_achan96_)

New blog post! AI agents are becoming increasingly capable, but will need new protocols and systems in order to work effectively and safely. Who should build such protocols and systems?

Ai2 (@allen_ai)

Today we released SciArena, an open evaluation platform where researchers can compare and vote on foundation models for scientific literature tasks. 👇

Aviv Ovadya 🥦 (@metaviv)

🚨 NEW PAPER: “Democratic AI is Possible: The Democracy Levels Framework Shows How It Might Work” #icml2025 AI is reshaping our world. How should we steer its development? We introduce the Democracy Levels Framework to define concrete milestones toward meaningfully

Kevin Feng (@kjfeng_)

Advanced AI is making its way out of labs and into the real world. Whether it actually yields the many benefits we envision will depend on our ability to democratically govern this technology. Check out our #icml2025 position paper for some ideas to achieve this!

METR (@metr_evals)

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

Séb Krier (@sebkrier)

Good paper by Kevin Feng differentiating agency (capacity to act) and autonomy (range of actions an agent can perform w/o user involvement), with five different levels of autonomy. arxiv.org/pdf/2506.12469 See also Iason Gabriel and Atoosa Kasirzadeh's excellent paper unpacking autonomy,

Kevin Feng (@kjfeng_)

Very nice work by METR. For me, this shows 1) HCI methods & working with real humans are indispensable to understand AI’s real-world impact, and 2) uplift studies are key tools for turning capability threshold setting in AI safety from guesswork to an empirical science.

Quentin Anthony (@quentinanthon15)

I was one of the 16 devs in this study. I wanted to speak on my opinions about the causes and mitigation strategies for dev slowdown. I'll say as a "why listen to you?" hook that I experienced a -38% AI-speedup on my assigned issues. I think transparency helps the community.

Smitha Milli (@smithamilli)

Today we're releasing Community Alignment - the largest open-source dataset of human preferences for LLMs, containing ~200k comparisons from >3000 annotators in 5 countries / languages! There was a lot of research that went into this... 🧵

Arvind Narayanan (@random_walker)

If we compared AI capabilities against humans with no access to tools, such as the internet, we would probably find that AI already outperformed humans at many or most cognitive tasks we perform at work. But of course this is not a helpful comparison and doesn’t tell us much

Ben Murphy (@benjaminmmurphy)

some things that immediately stand out:
- new export controls on the supply chain for semiconductors, not just the finished product
- DARPA doing interpretability work(!)
- new categorical exemptions to NEPA for datacenter construction + federal lands for DCs (like Anthropic