Akhila Yerukola (@akhila_yerukola) 's Twitter Profile
Akhila Yerukola

@akhila_yerukola

PhD student @LTIatCMU; Prev: Senior Research Engineer @Samsung_RA @samsungresearch; Masters @stanfordnlp | she/her

ID: 2339062536

linkhttps://akhila-yerukola.github.io/ calendar_today11-02-2014 20:46:49

330 Tweet

466 Followers

780 Following

Jocelyn Shen (@jocelynjshen) 's Twitter Profile Photo

Excited to share our #HRI2025 paper “Social Robots as Social Proxies for Fostering Connection and Empathy Towards Humanity” 🧵(1/6) 📚Preprint: arxiv.org/abs/2502.00221

Excited to share our #HRI2025 paper “Social Robots as Social Proxies for Fostering Connection and Empathy Towards Humanity” 🧵(1/6)

📚Preprint: arxiv.org/abs/2502.00221
Hao Zhu 朱昊 (@_hao_zhu) 's Twitter Profile Photo

❓Can LLM agents generate personalized persuasive language than human experts while staying truthful? 🏡 We conducted an experiment with human home buyers and the answer is YES! Learning from Zillow real estate listings, our AI Realtor wins over human experts (Elo 1315 v 947)

❓Can LLM agents generate personalized persuasive language than human experts while staying truthful?

🏡 We conducted an experiment with human home buyers and the answer is YES! 

Learning from Zillow real estate listings, our AI Realtor wins over human experts (Elo 1315 v 947)
Danny To Eun Kim (@teknology.bsky.social) (@teknologyy) 's Twitter Profile Photo

🚨New Breakthrough in Tip-of-the-Tongue (TOT) Retrieval Research! We address data limitations and offer a fresh evaluation method for the TOT complex queries. Curious how TREC TOT track test queries are created? Check out this thread🧵 and our paper📄: arxiv.org/abs/2502.17776

Joel Mire (@joel_mire) 's Twitter Profile Photo

Reward models for LMs are meant to align outputs with human preferences—but do they accidentally encode dialect biases? 🤔 Excited to share our paper on biases against African American Language in reward models, accepted to #NAACL2025 Findings! 🎉 arxiv.org/abs/2502.12858 (1/10)

Reward models for LMs are meant to align outputs with human preferences—but do they accidentally encode dialect biases? 🤔

Excited to share our paper on biases against African American Language in reward models, accepted to #NAACL2025 Findings! 🎉

arxiv.org/abs/2502.12858 (1/10)
Yiqing Xie (@yiqingxienlp) 's Twitter Profile Photo

How to construct repo-level coding environments in a scalable way? Checkout RepoST: an automated framework to construct repo-level environments using Sandbox Testing (repost-code-gen.github.io) Models trained with RepoST data can generalize well to other datasets (e.g., RepoEval)

How to construct repo-level coding environments in a scalable way?

Checkout RepoST: an automated framework to construct repo-level environments using Sandbox Testing (repost-code-gen.github.io)

Models trained with RepoST data can generalize well to other datasets (e.g., RepoEval)
Akhila Yerukola (@akhila_yerukola) 's Twitter Profile Photo

These days RAG systems have gotten popular for boosting LLMs—but they're brittle💔. Minor shifts in phrasing (✍️ style, politeness, typos) can wreck the pipeline. Even advanced components don’t fix the issue. Check out this extensive eval by Neel Bhandari and Tianyu (Tiya) Cao!

Akari Asai (@akariasai) 's Twitter Profile Photo

Real user queries often look different from the clean, concise ones in academic benchmarks - ambiguity, full of typos, and much less readable. We show that even strong RAG systems quickly break under these conditions. Awesome project led by Neel Bhandari and Tianyu (Tiya) Cao!!

Akhila Yerukola (@akhila_yerukola) 's Twitter Profile Photo

Check out PolyGuard 🤛 Our state-of-the-art safety moderation tool—now supporting 17 languages! Open source and built to make online spaces safer for everyone 🤩

Devansh Jain (@devanshrjain) 's Twitter Profile Photo

Excited to share PolyGuard 🛡️, our new state-of-the-art multilingual safety detector. PolyGuard supports 17 languages and outperforms all open-source and commercial moderation tools!

Xuhui Zhou (@nlpxuhui) 's Twitter Profile Photo

When you interact with ChatGPT, have you wondered if they would ever "lie" to you? We found that in scenarios where truthfulness conflicts with achieving goals, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR ," reveals all models tested were truthful less than

When you interact with ChatGPT, have you wondered if they would ever "lie" to you? We found that in scenarios where truthfulness conflicts with achieving goals, LLMs often choose deception. Our new #NAACL2025  paper, "AI-LIEDAR ," reveals all models tested were truthful less than
Valentina Pyatkin (@valentina__py) 's Twitter Profile Photo

📢 The SoLaR workshop will be collocated with COLM! Conference on Language Modeling SoLaR is a collaborative forum for researchers working on responsible development, deployment and use of language models. We welcome both technical and sociotechnical submissions, deadline July 5th!

📢 The SoLaR workshop will be collocated with COLM! <a href="/COLM_conf/">Conference on Language Modeling</a> 

SoLaR is a collaborative forum for researchers working on responsible development, deployment and use of language models. 

We welcome both technical and sociotechnical submissions, deadline July 5th!
Haoyi Qiu (@haoyiqiu) 's Twitter Profile Photo

🌏How culturally safe are large vision-language models? 👉LVLMs often miss the mark. We introduce CROSS, a benchmark of 1,284 image-query pairs across 16 countries & 14 languages, revealing how LVLMs violate cultural norms in context. ⚖️ Evaluation via CROSS-EVAL 🧨 Safety

🌏How culturally safe are large vision-language models? 👉LVLMs often miss the mark.

We introduce CROSS, a benchmark of 1,284 image-query pairs across 16 countries &amp; 14 languages, revealing how LVLMs violate cultural norms in context.

⚖️ Evaluation via CROSS-EVAL
🧨 Safety
Sudharshan Suresh (@suddhus) 's Twitter Profile Photo

I'm a featured interview in our latest behind-the-scenes release! We break down the ML and perception that drives the whole-body manipulation behaviors from last year. It starts with a neat demo of Atlas's range-of-motion and our vision foundation models. youtu.be/oe1dke3Cf7I?si…

ACL 2025 (@aclmeeting) 's Twitter Profile Photo

✨ Unlock the power of synthetic data! Explore "Synthetic Data in the Era of LLMs" at #ACL2025NLP. This tutorial will build a shared understanding of recent progress, major methods, applications, and open problems in synthetic data generation for NLP. 2025.aclweb.org/program/tutori…

Devansh Jain (@devanshrjain) 's Twitter Profile Photo

Yong Zheng-Xin (Yong) Cohere Labs Super interesting work! We address these gaps by releasing multilingual safety models (PolyGuard) along with an evaluation benchmark (PolyGuardPrompts) and large-scale training dataset (PolyGuardMix): x.com/kpriyanshu256/…