Eric Wong (@riceric22)'s Twitter Profile
Eric Wong

@riceric22

Assistant professor at University of Pennsylvania. Machine learning, optimization, robustness & interpretability.

profericwong.bsky.social

ID: 53464710

Link: https://www.cis.upenn.edu/~exwong/
Joined: 03-07-2009 18:48:34

186 Tweets

1.1K Followers

110 Following

Shreya Havaldar (@shreyahavaldar)'s Twitter Profile Photo

I'm attending the SoCal NLP Symposium on Friday to present our work on the lack of cultural awareness in multilingual LMs 🌎 We collaborate w/ psychologists and show LMs don't understand cultural nuances in emotion. This work won #bestpaper at ACL’s WASSA 2023! Paper: aclanthology.org/2023.wassa-1.1…

Shreya Havaldar (@shreyahavaldar)'s Twitter Profile Photo

Linguistic styles (like politeness) are highly subjective across languages, and understanding this subjectivity can help us build culturally-adaptable LMs! In our #EMNLP2023 paper, we present a faithful + interpretable framework to compare styles across languages.

Anton Xue (@antonxue)'s Twitter Profile Photo

I will be presenting our paper on stability guarantees for feature attributions at NeurIPS on Tuesday (Dec 12) at 10:45 am CST! Poster: Great Hall & Hall B1+B2 (level 1), #1625. arXiv: arxiv.org/abs/2307.05902 Blog post: debugml.github.io/multiplicative… (1/3)

Weiqiu You (@youweiqiu)'s Twitter Profile Photo

I will be presenting our work on creating faithful groups of features for attribution at the XAI in Action workshop at NeurIPS today (Dec 16), 4:30–5:30pm. arXiv: arxiv.org/abs/2310.16316 Blog: debugml.github.io/sum-of-parts Come to our poster for a chat!

Eric Wong (@riceric22)'s Twitter Profile Photo

Jailbreak algorithms have costs that aren't accessible to everyone. That's why we created open-source artifacts to benchmark jailbreak attacks & defenses. Submit artifacts and join the leaderboard at jailbreakbench.github.io, or access artifacts with `pip install jailbreakbench`!
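
The kind of metric such a leaderboard aggregates can be sketched in a few lines (an illustration only, not the JailbreakBench API; the record fields here are hypothetical):

```python
# Illustrative sketch, not the JailbreakBench API: score a set of
# attack artifacts by attack success rate (ASR), the headline metric
# a jailbreak leaderboard would aggregate. Field names are made up.

def attack_success_rate(artifacts):
    """Fraction of prompts judged to have jailbroken the target model."""
    if not artifacts:
        return 0.0
    return sum(1 for a in artifacts if a["jailbroken"]) / len(artifacts)

artifacts = [
    {"prompt": "...", "response": "...", "jailbroken": True},
    {"prompt": "...", "response": "...", "jailbroken": False},
    {"prompt": "...", "response": "...", "jailbroken": True},
]
print(attack_success_rate(artifacts))  # 2 of 3 prompts succeeded
```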

Eric Wong (@riceric22)'s Twitter Profile Photo

Why do SAM segments look nice yet perform poorly in downstream tasks? We've been studying evaluation metrics for feature groups that pinpoint the underlying issues. Stop by our #ICLR2024 poster today at 10:45am (#323 Hall B), or Chaehyeon Kim's talk on Friday at 1:15pm (Hall A3).

Eric Wong (@riceric22)'s Twitter Profile Photo

How do you train a "Neuro-GPT" pipeline that uses a neural network and API calls to GPT-4 to make predictions? Use differentiable program summaries of black-box components to learn such Neural Programs, end-to-end and without extra labels! Learn more at debugml.github.io/neural-program…
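
The core idea can be sketched with a toy black box (a sketch under assumptions, not the paper's implementation; `black_box` stands in for a non-differentiable API call such as GPT-4):

```python
# Illustrative sketch of the core idea, not the paper's implementation:
# approximate a non-differentiable black-box component with a smooth
# surrogate so gradients can flow through the pipeline end to end.

def black_box(x):
    # Stand-in for an API call (e.g. to an LLM): we can query it,
    # but we cannot backpropagate through it.
    return round(2.0 * x)  # rounding makes it non-differentiable

# Fit a linear surrogate y ≈ w * x by least squares on query samples.
xs = [i / 10 for i in range(-20, 21)]
ys = [black_box(x) for x in xs]
w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def surrogate_grad(x):
    # Differentiable summary: d/dx (w * x) = w, usable in backprop.
    return w

print(round(w, 2))  # close to the underlying slope of 2.0
```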

Rajeev Alur (@rajeevalur)'s Twitter Profile Photo

We are looking for postdocs to collaborate with Penn Medicine faculty for an exciting new project on Trustworthy AI for clinical decision making

Maksym Andriushchenko @ ICLR (@maksym_andr)'s Twitter Profile Photo

🚨 We are very excited to release JailbreakBench v1.0! 📄 We have substantially extended the version 0.1 that was on arXiv since March: - More attack artifacts (Prompt template with random search in addition to GCG, PAIR, and JailbreakChat): github.com/JailbreakBench…. - More

Eric Wong (@riceric22)'s Twitter Profile Photo

Traditional concept vectors used to explain deep representations fail to compose when combined, i.e. 🐤(small) +🦢(white) =🦩(big & colorful)❌ We propose CCE: a method for extracting *composable* concepts, i.e. 🐤(small) +🦢(white) =🕊️(small & white)✅ debugml.github.io/compositional-…
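
The composability property can be illustrated with a toy 2D attribute space (coordinates and concept directions invented for illustration; this is not the CCE method itself):

```python
# Toy sketch of the composability property, not the CCE method itself:
# birds live in a hypothetical 2D attribute space (size, whiteness),
# and composable concept vectors should add: small + white -> dove.

birds = {
    "chick":    (0.1, 0.2),   # small, yellow
    "swan":     (0.9, 0.9),   # big, white
    "flamingo": (0.9, 0.1),   # big, pink
    "dove":     (0.1, 0.9),   # small, white
}

small = (-0.8, 0.0)   # concept direction: decrease size
white = (0.0, 0.7)    # concept direction: increase whiteness

# Start from a "big, colorful" point and apply both concept edits.
base = (0.9, 0.2)
edited = (base[0] + small[0], base[1] + white[1])

def nearest(v):
    # Decode a point back to the closest known bird (squared distance).
    return min(birds, key=lambda b: sum((a - c) ** 2 for a, c in zip(birds[b], v)))

print(nearest(edited))  # the small, white bird
```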

Eric Wong (@riceric22)'s Twitter Profile Photo

Why can safety rules in LLMs be jailbroken? In LogicBreaks, we study the fundamental mechanism behind rule subversion in LLMs. Our theory explains how one can force LLMs to suppress rules/knowledge and infer absurd facts--and it mirrors real jailbreaks! debugml.github.io/logicbreaks/

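
The rule-subversion phenomenon can be illustrated with a toy forward-chaining inference loop (an analogy only, not the paper's formal model; rule and fact names are invented):

```python
# Minimal sketch of rule subversion in a propositional setting (an
# illustration of the phenomenon, not the paper's formal model):
# forward-chaining inference, where suppressing one rule changes
# which facts get derived.

def infer(facts, rules):
    """Apply Horn rules (premises -> conclusion) to a fixed point."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    ({"user_asks_harmful"}, "refuse"),        # the safety rule
    ({"user_asks_harmful"}, "draft_answer"),
]

print(infer({"user_asks_harmful"}, rules))
# Suppressing the safety rule, as an attack would, drops "refuse":
print(infer({"user_asks_harmful"}, rules[1:]))
```
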
Adam Stein (@adamlsteinl)'s Twitter Profile Photo

I’m presenting our work on the compositionality of concept-based interpretability today at 1:30pm at ICML! Come by poster #2313 in Hall C 4-9 to learn more!

Rajeev Alur (@rajeevalur)'s Twitter Profile Photo

Thank you ARPA-H! Looking forward to exploring challenges in Trustworthy ML in the context of clinical decision making, in collaboration with Mayur Naik, Eric Wong, Qi Long, and clinicians in PSOM.

Eric Wong (@riceric22)'s Twitter Profile Photo

To make explanations understandable to experts, we need features that experts understand. But how to measure this? Introducing FIX: a benchmark for extracting Features Interpretable to eXperts, across domains such as cosmology, surgery, and psychology! brachiolab.github.io/fix/

Adam Stein (@adamlsteinl)'s Twitter Profile Photo

🧠 Foundation models are reshaping reasoning. Do we still need specialized neuro-symbolic (NeSy) training, or can clever prompting now suffice? Our new position paper argues the road to generalizable NeSy should be paved with foundation models. 🔗 arxiv.org/abs/2505.24874 (🧵1/9)

Weiqiu You (@youweiqiu)'s Twitter Profile Photo

Excited to present our poster "Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups" at #ICML2025. See you at East Exhibition Hall A-B #E-1208, Thu 17 Jul, 11 a.m. - 1:30 p.m. PDT. Paper: arxiv.org/abs/2310.16316 Code: github.com/BrachioLab/sop

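
The self-attribution idea can be sketched with a linear toy model (illustrative only, not the paper's architecture; all numbers and group names are made up): the output is computed as an exact sum of per-group scores, so each group's attribution is faithful by construction.

```python
# Toy sketch of the sum-of-parts idea, not the paper's model: the
# prediction decomposes exactly into per-group contributions, so the
# group attributions account for the full output by construction.

weights  = [0.5, -0.2, 0.8, 0.1]
features = [1.0, 2.0, -1.0, 4.0]
groups   = {"group_a": [0, 1], "group_b": [2, 3]}

contributions = {
    name: sum(weights[i] * features[i] for i in idxs)
    for name, idxs in groups.items()
}
prediction = sum(contributions.values())

# Each group's attribution is its exact share of the output.
print(contributions)
print(prediction)
```
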
Adam Stein (@adamlsteinl)'s Twitter Profile Photo

Announcing our NeurIPS paper: Once Upon an Input: Reasoning via Per-Instance Program Synthesis (PIPS) 📝: arxiv.org/abs/2510.22849 Why do LLMs (and LLM agents) still struggle on hard reasoning problems which should be solvable by writing and executing code? We find that the

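
The per-instance idea can be sketched as follows (a toy stand-in, not PIPS itself; here a fixed template replaces the program an LLM would synthesize):

```python
# Illustrative sketch of the per-instance idea, not PIPS itself:
# instead of answering a reasoning problem directly, synthesize a
# small program for each input and execute it. The "synthesizer"
# here is a hard-coded stand-in for an LLM.

def synthesize(instance):
    # Hypothetical stand-in: an LLM would generate this code from the
    # problem statement; here we return a fixed template per task type.
    if instance["task"] == "sum_digits":
        return "result = sum(int(d) for d in str(n))"
    raise ValueError("unknown task")

def solve(instance):
    program = synthesize(instance)
    env = {"n": instance["n"]}
    exec(program, {}, env)  # execute the synthesized program
    return env["result"]

print(solve({"task": "sum_digits", "n": 9876}))  # 9 + 8 + 7 + 6 = 30
```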