Jimin Mun (@jiminmun_) 's Twitter Profile
Jimin Mun

@jiminmun_

phd student @LTIatCMU
she/her

ID: 1268416753803030530

Link: https://jiminmun.github.io/

Joined: 04-06-2020 05:38:20

51 Tweets

230 Followers

302 Following

Clara Na (@claranahhh) 's Twitter Profile Photo

Building/customizing your own LLM? You'll want to curate training data for it, but how do you know what makes the data good? You can try out recipes👩‍🍳 iterate on vibes✨ but we can't actually test all possible combos of tweaks,,, right?? 🙅‍♂️WRONG! arxiv.org/abs/2410.15661 (1/n) 🧵

Jocelyn Shen (@jocelynjshen) 's Twitter Profile Photo

Will be presenting our work next week at #EMNLP2024 in Computational Social Science + Cultural Analytics session 1 (Nov 12)!! Come say hello ☺️🌴

Shuyan Zhou (@shuyanzhxyc) 's Twitter Profile Photo

My lab at Duke has multiple Ph.D. openings! Our mission is to augment human decision-making by advancing the reasoning, comprehension, and autonomy of modern AI systems. I am attending #emnlp2024, happy to chat about PhD applications, LLM agents, evaluation etc etc!

Simran Khanuja (@simi_97k) 's Twitter Profile Photo

Thank you so much EMNLP 2025 for this wonderful recognition! I’m so honored and humbled 💕 Thanks Graham Neubig for your support throughout! We’ve been working on this for 1.5 years and everyone who has spoken with me in the recent past knows how passionately I feel about this

So Yeon (Tiffany) Min on Industry Job Market (@soyeontiffmin) 's Twitter Profile Photo

🚨🚨 Preprint Alert 🚨🚨 🚀🚀 As AI become agents 🤖, how can we reliably delegate tasks to them, if they cannot communicate their limitations😭 or ask for help or test-time compute 🧑‍🚒 when needed? We present our new pre-print **Self-Regulation and Requesting Interventions**

Chan Young Park (@chan_young_park) 's Twitter Profile Photo

⭐️Looking for a PhD Intern⭐️ Join me this summer at MSR to work on personal AI agents! We're developing innovative models to enhance personalized MS Copilot experiences. I'm seeking candidates with strong modeling skills and experience with LLM (multi-)agents/preference learning

Akhila Yerukola (@akhila_yerukola) 's Twitter Profile Photo

Did you know? Gestures to express universal concepts—like wishing for luck—vary WIDELY across cultures? 🤞means luck in US but deeply offensive in Vietnam 🚨 📣We introduce MC-SIGNS, a test bed to evaluate how LLMs/VLMs/T2I handle such nonverbal cues 📜: arxiv.org/abs/2502.17710

Santiago Cortés-Gómez (@sancortes_95) 's Twitter Profile Photo

Throwback to our work on Decision-Aware Uncertainty Quantification!—excited that it will be presented at ICLR 2025! If you missed it, check it out here:[arxiv.org/abs/2410.01767] x.com/sancortes_95/s…

Danny To Eun Kim (@teknology.bsky.social) (@teknologyy) 's Twitter Profile Photo

🚨New Breakthrough in Tip-of-the-Tongue (TOT) Retrieval Research! We address data limitations and offer a fresh evaluation method for the TOT complex queries. Curious how TREC TOT track test queries are created? Check out this thread🧵 and our paper📄: arxiv.org/abs/2502.17776

Seungone Kim @ NAACL2025 (@seungonekim) 's Twitter Profile Photo

#NLProc New paper on "evaluation-time scaling", a new dimension to leverage test-time compute! We replicate the test-time scaling behaviors observed in generators (e.g., o1, r1, s1) with evaluators by enforcing to generate additional reasoning tokens. arxiv.org/abs/2503.19877

Hyunwoo Kim (@hyunw_kim) 's Twitter Profile Photo

Humans backtrack where we should've made a better decision. How do we do this? We search and simulate alternative paths that might have led to better outcomes. Our🌈RETRO-Search mimics this process, empowering models to achieve SOTA performance AND efficient reasoning in math🌟

Omar Shaikh (@oshaikh13) 's Twitter Profile Photo

Hi! I'm gonna be presenting this at #ICLR2025 during the Thursday poster session (4/24; 3 p.m - 5:30 p.m, Hall 3 + Hall 2B #208). Come by if you want to talk about making ice cream!! (and also human-computer grounding, interacting with LMs, user models, etc.)

Chan Young Park (@chan_young_park) 's Twitter Profile Photo

🚀 Excited to share our #NAACL2025 paper on Language Model Personalization! arxiv.org/abs/2410.16027 Current RLHF methods often overlook *whose* preferences are being optimized. This can cause conflicting signals and models that mainly cater to the “average” or most dominant users

Valentina Pyatkin (@valentina__py) 's Twitter Profile Photo

📢 The SoLaR workshop will be collocated with COLM! Conference on Language Modeling SoLaR is a collaborative forum for researchers working on responsible development, deployment and use of language models. We welcome both technical and sociotechnical submissions, deadline July 5th!

Myra Cheng (@chengmyra1) 's Twitter Profile Photo

Dear ChatGPT, Am I the Asshole? While Reddit users might say yes, your favorite LLM probably won’t. We present Social Sycophancy: a new way to understand and measure sycophancy as how LLMs overly preserve users' self-image.

Stella Li (@stellalisy) 's Twitter Profile Photo

🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: + 28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…