Eric Wong (@riceric22)'s Twitter Profile
Eric Wong

@riceric22

Assistant professor at University of Pennsylvania. Machine learning, optimization, robustness & interpretability.

profericwong.bsky.social

ID: 53464710

Link: https://www.cis.upenn.edu/~exwong/
Joined: 03-07-2009 18:48:34

186 Tweets

1.1K Followers

110 Following

Shreya Havaldar (@shreyahavaldar)'s Twitter Profile Photo

I'm attending the SoCal NLP Symposium on Friday to present our work on the lack of cultural awareness in multilingual LMs 🌎 We collaborate w/ psychologists and show LMs don't understand cultural nuances in emotion. This work won #bestpaper at ACL’s WASSA 2023! Paper: aclanthology.org/2023.wassa-1.1…

Shreya Havaldar (@shreyahavaldar)'s Twitter Profile Photo

Linguistic styles (like politeness) are highly subjective across languages, and understanding this subjectivity can help us build culturally-adaptable LMs! In our #EMNLP2023 paper, we present a faithful + interpretable framework to compare styles across languages.

Anton Xue (@antonxue)'s Twitter Profile Photo

I will be presenting our paper on stability guarantees for feature attributions at NeurIPS on Tuesday (Dec 12) at 10:45 am CST! Poster: Great Hall & Hall B1+B2 (level 1), #1625. arXiv: arxiv.org/abs/2307.05902 Blog post: debugml.github.io/multiplicative… (1/3)

Weiqiu You (@youweiqiu)'s Twitter Profile Photo

I will be presenting our work on creating faithful groups of features for attribution at the XAI in Action workshop at NeurIPS today (Dec 16), 4:30–5:30pm. arXiv: arxiv.org/abs/2310.16316 Blog: debugml.github.io/sum-of-parts Come to our poster for a chat!

Eric Wong (@riceric22)'s Twitter Profile Photo

Jailbreak algorithms have costs that aren't accessible to everyone. That's why we created open-source artifacts to benchmark jailbreak attacks & defenses. Submit artifacts and join the leaderboard at jailbreakbench.github.io, or access artifacts with `pip install jailbreakbench`!
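
The kind of metric such a leaderboard aggregates can be sketched in a few lines (an illustration only, not the JailbreakBench API; the record fields here are hypothetical):

```python
# Illustrative sketch, not the JailbreakBench API: score a set of
# attack artifacts by attack success rate (ASR), the headline metric
# a jailbreak leaderboard would aggregate. Field names are made up.

def attack_success_rate(artifacts):
    """Fraction of prompts judged to have jailbroken the target model."""
    if not artifacts:
        return 0.0
    return sum(1 for a in artifacts if a["jailbroken"]) / len(artifacts)

artifacts = [
    {"prompt": "...", "response": "...", "jailbroken": True},
    {"prompt": "...", "response": "...", "jailbroken": False},
    {"prompt": "...", "response": "...", "jailbroken": True},
]
print(attack_success_rate(artifacts))  # 2 of 3 prompts succeeded
```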

Eric Wong (@riceric22)'s Twitter Profile Photo

Why do SAM segments look nice yet perform poorly in downstream tasks? We've been studying evaluation metrics for feature groups that pinpoint the underlying issues. Stop by our #ICLR2024 poster today at 10:45am (#323 Hall B), or Chaehyeon Kim's talk on Friday at 1:15pm (Hall A3).

Eric Wong (@riceric22)'s Twitter Profile Photo

How do you train a "Neuro-GPT" pipeline that uses a neural network and API calls to GPT-4 to make predictions? Use differentiable program summaries of black-box components to learn such Neural Programs, end-to-end and without extra labels! Learn more at debugml.github.io/neural-program…
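
The core idea can be sketched with a toy black box (a sketch under assumptions, not the paper's implementation; `black_box` stands in for a non-differentiable API call such as GPT-4):

```python
# Illustrative sketch of the core idea, not the paper's implementation:
# approximate a non-differentiable black-box component with a smooth
# surrogate so gradients can flow through the pipeline end to end.

def black_box(x):
    # Stand-in for an API call (e.g. to an LLM): we can query it,
    # but we cannot backpropagate through it.
    return round(2.0 * x)  # rounding makes it non-differentiable

# Fit a linear surrogate y ≈ w * x by least squares on query samples.
xs = [i / 10 for i in range(-20, 21)]
ys = [black_box(x) for x in xs]
w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def surrogate_grad(x):
    # Differentiable summary: d/dx (w * x) = w, usable in backprop.
    return w

print(round(w, 2))  # close to the underlying slope of 2.0
```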

Rajeev Alur (@rajeevalur)'s Twitter Profile Photo

We are looking for postdocs to collaborate with Penn Medicine faculty for an exciting new project on Trustworthy AI for clinical decision making

Maksym Andriushchenko @ ICLR (@maksym_andr)'s Twitter Profile Photo

🚨 We are very excited to release JailbreakBench v1.0! 📄 We have substantially extended the version 0.1 that was on arXiv since March: - More attack artifacts (Prompt template with random search in addition to GCG, PAIR, and JailbreakChat): github.com/JailbreakBench…. - More

Eric Wong (@riceric22)'s Twitter Profile Photo

Traditional concept vectors used to explain deep representations fail to compose when combined, i.e. 🐤(small) +🦢(white) =🦩(big & colorful)❌ We propose CCE: a method for extracting *composable* concepts, i.e. 🐤(small) +🦢(white) =🕊️(small & white)✅ debugml.github.io/compositional-…
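
The composability property can be illustrated with a toy 2D attribute space (coordinates and concept directions invented for illustration; this is not the CCE method itself):

```python
# Toy sketch of the composability property, not the CCE method itself:
# birds live in a hypothetical 2D attribute space (size, whiteness),
# and composable concept vectors should add: small + white -> dove.

birds = {
    "chick":    (0.1, 0.2),   # small, yellow
    "swan":     (0.9, 0.9),   # big, white
    "flamingo": (0.9, 0.1),   # big, pink
    "dove":     (0.1, 0.9),   # small, white
}

small = (-0.8, 0.0)   # concept direction: decrease size
white = (0.0, 0.7)    # concept direction: increase whiteness

# Start from a "big, colorful" point and apply both concept edits.
base = (0.9, 0.2)
edited = (base[0] + small[0], base[1] + white[1])

def nearest(v):
    # Decode a point back to the closest known bird (squared distance).
    return min(birds, key=lambda b: sum((a - c) ** 2 for a, c in zip(birds[b], v)))

print(nearest(edited))  # the small, white bird
```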

Eric Wong (@riceric22)'s Twitter Profile Photo

Why can safety rules in LLMs be jailbroken? In LogicBreaks, we study the fundamental mechanism behind rule subversion in LLMs. Our theory explains how one can force LLMs to suppress rules/knowledge and infer absurd facts--and it mirrors real jailbreaks! debugml.github.io/logicbreaks/

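
The rule-subversion phenomenon can be illustrated with a toy forward-chaining inference loop (an analogy only, not the paper's formal model; rule and fact names are invented):

```python
# Minimal sketch of rule subversion in a propositional setting (an
# illustration of the phenomenon, not the paper's formal model):
# forward-chaining inference, where suppressing one rule changes
# which facts get derived.

def infer(facts, rules):
    """Apply Horn rules (premises -> conclusion) to a fixed point."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    ({"user_asks_harmful"}, "refuse"),        # the safety rule
    ({"user_asks_harmful"}, "draft_answer"),
]

print(infer({"user_asks_harmful"}, rules))
# Suppressing the safety rule, as an attack would, drops "refuse":
print(infer({"user_asks_harmful"}, rules[1:]))
```
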
Adam Stein (@adamlsteinl)'s Twitter Profile Photo

I’m presenting our work on the compositionality of concept-based interpretability today at 1:30pm at ICML! Come by poster #2313 in Hall C 4-9 to learn more!

Rajeev Alur (@rajeevalur)'s Twitter Profile Photo

Thank you ARPA-H! Looking forward to exploring challenges in Trustworthy ML in the context of clinical decision making, in collaboration with Mayur Naik, Eric Wong, Qi Long, and clinicians in PSOM.

Eric Wong (@riceric22)'s Twitter Profile Photo

To make explanations understandable to experts, we need features that experts understand. But how to measure this? Introducing FIX: a benchmark for extracting Features Interpretable to eXperts, across domains such as cosmology, surgery, and psychology! brachiolab.github.io/fix/

Adam Stein (@adamlsteinl)'s Twitter Profile Photo

🧠 Foundation models are reshaping reasoning. Do we still need specialized neuro-symbolic (NeSy) training, or can clever prompting now suffice? Our new position paper argues the road to generalizable NeSy should be paved with foundation models. 🔗 arxiv.org/abs/2505.24874 (🧵1/9)

Weiqiu You (@youweiqiu)'s Twitter Profile Photo

Excited to present our poster "Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups" at #ICML2025. See you at East Exhibition Hall A-B #E-1208, Thu 17 Jul, 11 a.m. - 1:30 p.m. PDT. Paper: arxiv.org/abs/2310.16316 Code: github.com/BrachioLab/sop

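
The self-attribution idea can be sketched with a linear toy model (illustrative only, not the paper's architecture; all numbers and group names are made up): the output is computed as an exact sum of per-group scores, so each group's attribution is faithful by construction.

```python
# Toy sketch of the sum-of-parts idea, not the paper's model: the
# prediction decomposes exactly into per-group contributions, so the
# group attributions account for the full output by construction.

weights  = [0.5, -0.2, 0.8, 0.1]
features = [1.0, 2.0, -1.0, 4.0]
groups   = {"group_a": [0, 1], "group_b": [2, 3]}

contributions = {
    name: sum(weights[i] * features[i] for i in idxs)
    for name, idxs in groups.items()
}
prediction = sum(contributions.values())

# Each group's attribution is its exact share of the output.
print(contributions)
print(prediction)
```
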
Adam Stein (@adamlsteinl)'s Twitter Profile Photo

Announcing our NeurIPS paper: Once Upon an Input: Reasoning via Per-Instance Program Synthesis (PIPS) 📝: arxiv.org/abs/2510.22849 Why do LLMs (and LLM agents) still struggle on hard reasoning problems which should be solvable by writing and executing code? We find that the

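
The per-instance idea can be sketched as follows (a toy stand-in, not PIPS itself; here a fixed template replaces the program an LLM would synthesize):

```python
# Illustrative sketch of the per-instance idea, not PIPS itself:
# instead of answering a reasoning problem directly, synthesize a
# small program for each input and execute it. The "synthesizer"
# here is a hard-coded stand-in for an LLM.

def synthesize(instance):
    # Hypothetical stand-in: an LLM would generate this code from the
    # problem statement; here we return a fixed template per task type.
    if instance["task"] == "sum_digits":
        return "result = sum(int(d) for d in str(n))"
    raise ValueError("unknown task")

def solve(instance):
    program = synthesize(instance)
    env = {"n": instance["n"]}
    exec(program, {}, env)  # execute the synthesized program
    return env["result"]

print(solve({"task": "sum_digits", "n": 9876}))  # 9 + 8 + 7 + 6 = 30
```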