Tom Hartvigsen (@tom_hartvigsen) Twitter Tweets • TwiCopy

Sasha Luccioni, PhD 🦋🌎✨🤗

a year ago

I keep getting asked about my take on these CO2 estimates for the o3 model by the press and members of the community, so I'll interrupt my vacation to comment 🤓 TL;DR- any kind of estimate is a proxy, and instead of wasting our time and energy, we should demand accountability.

172 likes · 5 replies · 42 reposts

Maarten Sap (he/him)

@maartensap

a year ago

CMU LTI is hosting predoc interns this summer, centered around "Language Technologies for All"! Please apply and circulate! lti.cs.cmu.edu/news-and-event…

85 likes · 4 replies · 19 reposts

Liam McCoy, MD MSc

@liamgmccoy

a year ago

What do we want to know about LLMs in research? Out today in Nature Medicine, we take a stab at developing a comprehensive, living taxonomy of LLM use-cases and reporting.

21 likes · 1 reply · 7 reposts

Hannah Kerner

@hannah_kerner

a year ago

#ICML2025 includes a new track on Application-Driven Machine Learning (innovative ML techniques, problems, and datasets driven by the needs of end-users in real-world)! If this fits your work, consider submitting to ICML (dl: Jan 30) and checking the ADML box ✅ in OpenReview ⬇️

130 likes · 0 replies · 36 reposts

Antonios Mamalakis

@antoniosmamala2

a year ago

Dear Climate and AI community! We are hiring 😀 a postdoc to join UVA Environmental Institute at UVA and work with Chirag Agarwal and myself, on using multimodal AI models and explainable AI to attribute extreme precipitation events! Fascinating stuff! Link below. Please RT!

6 likes · 1 reply · 3 reposts

Tom Hartvigsen

@tom_hartvigsen

a year ago

Excited to share that our work on keeping LLMs up-to-date by composing multiple post-training interventions was accepted to #ICLR2025 ICLR 2026! Great work led by Arinbjörn and Kyle O'Brien!

38 likes · 1 reply · 6 reposts

Emily Alsentzer

@emily_alsentzer

a year ago

Medical licensing exams are convenient LLM benchmarks, but they don’t reflect real-world clinical tasks. With LLMs already in EHRs, we need benchmarks that match real-world needs. Let’s partner with hospitals piloting these tools to develop diverse, task-specific evaluations.

55 likes · 3 replies · 13 reposts

Shan Chen

@shan23chen

10 months ago

More SAE papers coming! We dove deeper into the best way to gather SAE features for downstream classification, and into the potential benefits 🧐.

13 likes · 2 replies · 1 repost

Tom Hartvigsen

@tom_hartvigsen

10 months ago

I'm honored to have received a research award from Capital One to support our work developing models that reason about time series data! Thank you! Many exciting new results in this area coming soon :)

45 likes · 3 replies · 2 reposts

Tom Hartvigsen

@tom_hartvigsen

10 months ago

Excited to share new works on knowledge editing for LLMs! Many recent papers find cases where editing LLMs breaks them quickly, but we find the commonly-studied editing methods are needlessly destructive. With some easy-to-use tweaks, we avoid model degradation for WAY longer!

12 likes · 0 replies · 0 reposts

Akshat Gupta

@akshatgupta57

10 months ago

Our work on knowledge editing got an "Outstanding Paper Award" 🏆🏆 at the AAAI KnowFM Workshop!! #AAAI2025 🥳🥳🥳 Congratulations to my amazing co-authors Tom Hartvigsen, Ahmed Alaa, and Gopala Anumanchipalli

44 likes · 1 reply · 6 reposts

Tom Hartvigsen

@tom_hartvigsen

9 months ago

Really excited to share this work!

14 likes · 0 replies · 2 reposts

Tom Hartvigsen

@tom_hartvigsen

8 months ago

Super excited to share our new work on keeping LLMs factually up-to-date for long periods of time🎉

13 likes · 0 replies · 1 repost

Tom Hartvigsen

@tom_hartvigsen

8 months ago

New #ICLR2025 paper to be presented by Sujay Nagaraj! It's the first method to capture how label noise can change over time in sequential classification tasks. Many cool implications, one being towards better models of labelers' behavior over time, even for non time series tasks🎉

7 likes · 0 replies · 0 reposts

Priyanshu Kumar

@kpriyanshu256

8 months ago

Need a multilingual safety detector? 🚨 Introducing PolyGuard 🚨
⚙️ supports 17 languages
⚙️ generates structured output for prompt safety, response safety, and model refusal
🚀 outperforms existing SOTA open and commercial safety detectors by 5.5%
📜 arxiv.org/abs/2504.04377 🧵

18 likes · 1 reply · 7 reposts

Prithviraj (Raj) Ammanabrolu

@rajammanabrolu

8 months ago

Introducing TALES - Text Adventure Learning Environment Suite. A benchmark of a few hundred text envs, from science experiments and embodied cooking to solving murder mysteries. We test over 30 of the best LLM agents and pinpoint failure modes + how to improve. 👨‍💻 pip install tale-suite

65 likes · 2 replies · 19 reposts

Tom Hartvigsen

@tom_hartvigsen

7 months ago

Excited we have some papers accepted to ICML Conference in collaboration with some tremendous folks 🎉 Looking forward to Vancouver to discuss model editing for LLMs/VLMs and improving medical benchmarking!

46 likes · 1 reply · 6 reposts

Explainable Machine Learning

@explainableml

7 months ago

🚨 Happy to announce that one paper, "Understanding the Limits of Lifelong Knowledge Editing in LLMs", is accepted at #icml2025! Congrats to the wonderful authors Lukas Thede, Karsten Roth, Matthias Bethge, Zeynep Akata, and Tom Hartvigsen. 👇 Highlights in the thread

23 likes · 3 replies · 4 reposts

Shan Chen

@shan23chen

7 months ago

Designing a hard but useful benchmark has always been a passion of mine. Here we present MedBrowseComp, a deep research + computer use benchmark that is easy to verify (like BrowseComp from OpenAI) but still very expandable 💊! Project page: moreirap12.github.io/mbc-browse-app/ 1/n

89 likes · 2 replies · 28 reposts

Akshat Gupta

@akshatgupta57

7 months ago

Just did a major revision to our paper on Lifelong Knowledge Editing! 🔍 Key takeaway (+ our new title): "Lifelong Knowledge Editing requires Better Regularization". Fixing this leads to consistent downstream performance! Tom Hartvigsen, Ahmed Alaa, Gopala Anumanchipalli, Berkeley AI Research

23 likes · 1 reply · 6 reposts