Gokhan Tur (@tur_gokhan)'s Twitter Profile
Gokhan Tur

@tur_gokhan

UIUC Conversational AI

ID: 3117573194

https://www.linkedin.com/in/gokhan-tur-3294953

30-03-2015 15:38:17

84 Tweets

591 Followers

964 Following

Emre Can Acikgoz (@emrecanacikgoz)

How can LLMs decide when to rely on tools vs. their internal parametric knowledge? 🤖🚀Excited to share our work SMART! 🧠🛠️ Inspired by human metacognition, we enhance LLM self-awareness to reduce tool overuse while boosting performance. Our experiments show that SMARTAgent

Oumi (@oumi_pbc)

Last week, Emre Can Acikgoz at ConvAI@UIUC @IllinoisCDS released CoALM 🚀 fully open-source Conversational Agentic Language Models with CoALM 8B, CoALM 70B, and CoALM 405B. Excelling in both multi-turn dialogue management & function calling, these models were trained using Oumi

Beyza Bozdag @ NAACL’25 (@nbbozdag)

[1/6] Can LLMs out-persuade each other? 🤖🧠💬 Introducing Persuade Me If You Can (PMIYC)—a new framework to evaluate (1) how persuasive LLMs are and (2) how easily they can be persuaded! 🚀 📄Arxiv: arxiv.org/abs/2503.01829 🌐Project Page: beyzabozdag.github.io/PMIYC/

dilek hakkani-tur (@dilekhakkanitur)

While persuasive models are promising for social good, they can also be misused towards harmful behavior. Recent work by Beyza Bozdag and Shuhaib Mehri aims to assess LLM persuasiveness and susceptibility towards persuasion.

Xing Han Lu (@xhluca)

Agents like OpenAI Operator can solve complex computer tasks, but what happens when users use them to cause harm, e.g. automate hate speech and spread misinformation? To find out, we introduce SafeArena (safearena.github.io), a benchmark to assess the capabilities of web

Siva Reddy (@sivareddyg)

LLM alignment doesn't transfer to Web Agents. SafeArena is a simple web environment and testbed to test the safety of agents, built on WebArena. A huge team effort that was highly self-driven 💪 safearena.github.io

SIGdial (@sigdial)

Our paper submission deadline is approaching fast! ✍️ Abstract deadline: 21st April. Paper deadline: 28th April. Come and join us in beautiful Avignon, France, to discuss discourse and dialogue 🎉 We invite submissions of original research (long papers, short papers, and demos)

Siva Reddy (@sivareddyg)

Talking about "DeepSeek-R1 Thoughtology: Let’s <think> about LLM reasoning" Going live at 11am PDT (i.e., 20 mins). Last minute change of plans. You might be able to see live here: youtube.com/watch?v=aO_cTI…

Sumuk (@sumukx)

we're launching 🤗 yourbench today, an open source tool for custom benchmarking and synthetic data generation from ANY of your documents. it's a big step towards improving how model evaluations work. early access link in replies! (1/8)

Gokhan Tur (@tur_gokhan)

Congratulations Sumuk! The core of YourBench is the sophisticated question generation approach for a given document. Check the paper for details. Great collaboration with Hugging Face and ConvAI@UIUC teams.

Gokhan Tur (@tur_gokhan)

This is an important milestone for enabling LLM-based agents. Reward is all you need for Tool Learning! GRPO achieves significant improvements over base and SFT models for BFCL v3, API-Bank and Bamboogle agentic benchmark tasks. Congratulations Emre Can Acikgoz and Cheng Qian

Gokhan Tur (@tur_gokhan)

What do we want from "Conversational Agents" on top of language agents? What is missing in the current Conversational Agent systems? Here is our desideratum with a comprehensive survey of the recent advances in the field. Bonus is a live github collection of new papers organized

Siva Reddy (@sivareddyg)

Incredibly proud of my students Ada Tur and Gaurav Kamath for winning a SAC award at #NAACL2025 for their work on assessing how LLMs model constituent shifts. Humans have a tendency to move heavier constituents towards the end of the sentence. While LLMs unsurprisingly show

Mila - Institut québécois d'IA (@mila_quebec)

Congratulations to Mila members Ada Tur, Gaurav Kamath and Siva Reddy for their SAC award at #NAACL2025! Check out Ada's talk in Session I: Oral/Poster 6. Paper: arxiv.org/abs/2502.05670

Emre Can Acikgoz (@emrecanacikgoz)

🚀Excited to share our new evaluation paper "TD-Eval: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons"! 🔄 🤖⚔️TOD systems have rapidly evolved thanks to LLMs, but traditional metrics remain insufficient in

Beyza Bozdag @ NAACL’25 (@nbbozdag)

Thrilled to announce our new survey that explores the exciting possibilities and troubling risks of computational persuasion in the era of LLMs 🤖💬 📄Arxiv: arxiv.org/pdf/2505.07775 💻 GitHub: github.com/beyzabozdag/Pe…

Beyza Bozdag @ NAACL’25 (@nbbozdag)

Would models know more about Indian food in Hindi and Turkey’s history in Turkish? Does the language of a question affect an LLM’s answer? ✨Yes!✨ Ishika Agarwal and I are excited to announce our newest preprint in which we explore “Language Specific Knowledge (LSK)”.
