Kevin Feng (@kjfeng_) Twitter Tweets • TwiCopy

Seth Lazar

4 months ago

In 1999, engineers at Georgia Tech proposed the "Aware Home." It was a vision of ubiquitous computing designed to serve its inhabitants, giving them full knowledge and control over their own data. The dream was user empowerment—using data (and ultimately AI) to fill in the gaps

Likes: 19, Replies: 1, Retweets: 3

Hussein Mozannar

@hsseinmzannar

3 months ago

Excited to release my first lead project Magentic-UI at Microsoft Research, an OS web agent application designed for efficient human-agent interaction. CUA agents are cool but they're not so useful yet, Magentic-UI helps us study how to get value from them. github.com/microsoft/mage…

Likes: 55, Replies: 1, Retweets: 9

sam manning

@sj_manning

3 months ago

"Extending 'GPTs Are GPTs' to Firms" is now out in AEA Papers & Proceedings. Lots of talk about AI's impact on employment this week. One way AI will influence labor demand is through firm-level impacts. We built initial descriptive statistics that can shed light on these

Likes: 105, Replies: 4, Retweets: 28

Peiling Jiang

@peilingjiang

3 months ago

AI should not replace browsing, but scale it. #Orca turns the web into your canvas and personal workspace. Work across dozens of pages, delegate to AI agents by your side, and synthesize on the fly. Welcome to 𝗕𝗿𝗼𝘄𝘀𝗶𝗻𝗴 𝗮𝘁 𝗦𝗰𝗮𝗹𝗲 hci.ucsd.edu/orca w/Haijun Xia

Likes: 93, Replies: 2, Retweets: 17

Amy Zhang

@amyxzh

3 months ago

Our lab will be starting up work on building tools for governing AI agent autonomy this next year! 👩‍💻

Likes: 38, Replies: 0, Retweets: 2

Chaitanya Malaviya

@cmalaviya11

3 months ago

Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses? Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓

Likes: 75, Replies: 1, Retweets: 17

Jacqueline He

@jcqln_h

3 months ago

LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generation of plausible, but unsupported content. We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.

Likes: 43, Replies: 1, Retweets: 18

Hussein Mozannar

@hsseinmzannar

3 months ago

Curious how to build agents that can control a browser? I just wrote up a full tutorial on how to do it completely from scratch and with Magentic-UI. My goal is to demystify browser-use and CUA agents, it's fun to follow along! Link: husseinmozannar.github.io/#/blog/web_age… Jupyter notebook:

Likes: 137, Replies: 3, Retweets: 34

Lujain Ibrahim لجين إبراهيم

@lujainmibrahim

2 months ago

How can we foster meaningful cooperation between the US & China on AI risks? In our #FAccT2025 paper, led by Saad Siddiqui, we review 40+ documents from the 🇺🇸 & 🇨🇳 to identify areas of common ground and provide recommendations for future bilateral dialogues (🔗in next tweet).

Likes: 34, Replies: 4, Retweets: 10

Alan Chan

@_achan96_

2 months ago

New blog post! AI agents are becoming increasingly capable, but will need new protocols and systems in order to work effectively and safely. Who should build such protocols and systems?

Likes: 65, Replies: 3, Retweets: 15

Ai2

@allen_ai

2 months ago

Today we released SciArena, an open evaluation platform where researchers can compare and vote on foundation models for scientific literature tasks. 👇

Likes: 95, Replies: 2, Retweets: 12

Aviv Ovadya 🥦

@metaviv

2 months ago

🚨 NEW PAPER: “Democratic AI is Possible: The Democracy Levels Framework Shows How It Might Work” #icml2025 AI is reshaping our world. How should we steer its development? We introduce the Democracy Levels Framework to define concrete milestones toward meaningfully

Likes: 61, Replies: 1, Retweets: 20

Kevin Feng

@kjfeng_

2 months ago

Advanced AI is making its way out of labs and into the real world. Whether it actually yields the many benefits we envision will depend on our ability to democratically govern this technology. Check out our #icml2025 position paper for some ideas to achieve this!

Likes: 7, Replies: 0, Retweets: 0

METR

@metr_evals

2 months ago

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

Likes: 5.5K, Replies: 200, Retweets: 1.1K

Séb Krier

@sebkrier

2 months ago

Good paper by Kevin Feng differentiating agency (capacity to act) and autonomy (range of actions an agent can perform w/o user involvement), with five different levels of autonomy. arxiv.org/pdf/2506.12469 See also Iason Gabriel and Atoosa Kasirzadeh's excellent paper unpacking autonomy,

Likes: 9, Replies: 1, Retweets: 1

Kevin Feng

@kjfeng_

2 months ago

Very nice work by METR. For me, this shows 1) HCI methods & working with real humans are indispensable to understand AI’s real-world impact, and 2) uplift studies are key tools for turning capability threshold setting in AI safety from guesswork to an empirical science.

Likes: 5, Replies: 1, Retweets: 0

Quentin Anthony

@quentinanthon15

2 months ago

I was one of the 16 devs in this study. I wanted to speak on my opinions about the causes and mitigation strategies for dev slowdown. I'll say as a "why listen to you?" hook that I experienced a -38% AI-speedup on my assigned issues. I think transparency helps the community.

Likes: 3.3K, Replies: 98, Retweets: 423

Smitha Milli

@smithamilli

2 months ago

Today we're releasing Community Alignment - the largest open-source dataset of human preferences for LLMs, containing ~200k comparisons from >3000 annotators in 5 countries / languages! There was a lot of research that went into this... 🧵

Likes: 316, Replies: 11, Retweets: 69

Arvind Narayanan

@random_walker

2 months ago

If we compared AI capabilities against humans with no access to tools, such as the internet, we would probably find that AI already outperformed humans at many or most cognitive tasks we perform at work. But of course this is not a helpful comparison and doesn’t tell us much

Likes: 118, Replies: 8, Retweets: 28

Ben Murphy

@benjaminmmurphy

a month ago

some things that immediately stand out: - new export controls on the supply chain for semiconductors, not just the finished product - DARPA doing interpretability work(!) - new categorical exemptions to NEPA for datacenter construction + federal lands for DCs (like Anthropic

Likes: 5, Replies: 0, Retweets: 1