Gokhan Tur (@tur_gokhan) Twitter Tweets • TwiCopy

Emre Can Acikgoz

7 months ago

How can LLMs decide when to rely on tools vs. their internal parametric knowledge? 🤖🚀 Excited to share our work SMART! 🧠🛠️ Inspired by human metacognition, we enhance LLM self-awareness to reduce tool overuse while boosting performance. Our experiments show that SMARTAgent…

Likes: 16 • Replies: 0 • Reposts: 3

Oumi

@oumi_pbc

7 months ago

Last week, Emre Can Acikgoz at ConvAI@UIUC @IllinoisCDS released CoALM 🚀, a family of fully open-source Conversational Agentic Language Models: CoALM 8B, CoALM 70B, and CoALM 405B. Excelling in both multi-turn dialogue management & function calling, these models were trained using Oumi…

Likes: 17 • Replies: 0 • Reposts: 7

Beyza Bozdag @ NAACL’25

@nbbozdag

6 months ago

[1/6] Can LLMs out-persuade each other? 🤖🧠💬 Introducing Persuade Me If You Can (PMIYC)—a new framework to evaluate (1) how persuasive LLMs are and (2) how easily they can be persuaded! 🚀 📄Arxiv: arxiv.org/abs/2503.01829 🌐Project Page: beyzabozdag.github.io/PMIYC/

Likes: 24 • Replies: 2 • Reposts: 7

dilek hakkani-tur

@dilekhakkanitur

6 months ago

While persuasive models are promising for social good, they can also be misused towards harmful behavior. Recent work by Beyza Bozdag and Shuhaib Mehri aims to assess LLM persuasiveness and susceptibility to persuasion.

Likes: 13 • Replies: 0 • Reposts: 3

Gokhan Tur

@tur_gokhan

6 months ago

The era of conversational embodied agents is here! Check out the latest blog post by Vardhan Dongre (ConvAI@UIUC).

Likes: 7 • Replies: 1 • Reposts: 1

Xing Han Lu

@xhluca

6 months ago

Agents like OpenAI Operator can solve complex computer tasks, but what happens when users use them to cause harm, e.g. automate hate speech and spread misinformation? To find out, we introduce SafeArena (safearena.github.io), a benchmark to assess the capabilities of web…

Likes: 78 • Replies: 0 • Reposts: 34

Siva Reddy

@sivareddyg

6 months ago

LLM alignment doesn't transfer to Web Agents. SafeArena is a simple web environment and testbed for evaluating agent safety, built on WebArena. A huge team effort that was highly self-driven 💪 safearena.github.io

Likes: 44 • Replies: 1 • Reposts: 14

SIGdial

@sigdial

6 months ago

Our paper submission deadline is approaching fast! ✍️ Abstract deadline: 21st April. Paper deadline: 28th April. Come and join us in beautiful Avignon, France, to discuss discourse and dialogue 🎉 We invite submissions of original research (long papers, short papers, and demos).

Likes: 7 • Replies: 1 • Reposts: 4

Siva Reddy

@sivareddyg

5 months ago

Talking about "DeepSeek-R1 Thoughtology: Let’s <think> about LLM reasoning". Going live at 11am PDT (i.e., in 20 mins). Last-minute change of plans. You might be able to watch it live here: youtube.com/watch?v=aO_cTI…

Likes: 46 • Replies: 1 • Reposts: 11

Sumuk

@sumukx

5 months ago

we're launching 🤗 yourbench today, an open source tool for custom benchmarking and synthetic data generation from ANY of your documents. it's a big step towards improving how model evaluations work early access link in replies! (1/8)

Likes: 293 • Replies: 11 • Reposts: 49

Gokhan Tur

@tur_gokhan

5 months ago

Congratulations Sumuk! The core of YourBench is the sophisticated question generation approach for a given document. Check the paper for details. Great collaboration with Hugging Face and ConvAI@UIUC teams.

Likes: 6 • Replies: 0 • Reposts: 1

Leandro von Werra

@lvwerra

5 months ago

Try it here: huggingface.co/spaces/yourben… Great work by Sumuk, Clémentine Fourrier 🍊, Alina Lozovskaya, Gokhan Tur, and dilek hakkani-tur 🔥

Likes: 4 • Replies: 0 • Reposts: 1

Gokhan Tur

@tur_gokhan

5 months ago

This is an important milestone for enabling LLM-based agents. Reward is all you need for Tool Learning! GRPO achieves significant improvements over base and SFT models on the BFCL v3, API-Bank, and Bamboogle agentic benchmark tasks. Congratulations Emre Can Acikgoz and Cheng Qian…

Likes: 11 • Replies: 0 • Reposts: 2

Gokhan Tur

@tur_gokhan

4 months ago

What do we want from "Conversational Agents" on top of language agents? What is missing in current Conversational Agent systems? Here are our desiderata, along with a comprehensive survey of recent advances in the field. Bonus: a live GitHub collection of new papers organized…

Likes: 12 • Replies: 0 • Reposts: 3

Siva Reddy

@sivareddyg

4 months ago

Incredibly proud of my students Ada Tur and Gaurav Kamath for winning a SAC award at #NAACL2025 for their work on assessing how LLMs model constituent shifts. Humans have a tendency to move heavier constituents towards the end of the sentence. While LLMs unsurprisingly show…

Likes: 65 • Replies: 1 • Reposts: 10

Mila - Institut québécois d'IA

@mila_quebec

4 months ago

Congratulations to Mila members Ada Tur, Gaurav Kamath and Siva Reddy for their SAC award at #NAACL2025! Check out Ada's talk in Session I: Oral/Poster 6. Paper: arxiv.org/abs/2502.05670

Likes: 24 • Replies: 2 • Reposts: 10

Emre Can Acikgoz

@emrecanacikgoz

4 months ago

🚀 Excited to share our new evaluation paper "TD-Eval: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons"! 🔄 🤖⚔️ TOD systems have rapidly evolved thanks to LLMs, but traditional metrics remain insufficient in…

Likes: 10 • Replies: 1 • Reposts: 2

Beyza Bozdag @ NAACL’25

@nbbozdag

4 months ago

Thrilled to announce our new survey that explores the exciting possibilities and troubling risks of computational persuasion in the era of LLMs 🤖💬 📄Arxiv: arxiv.org/pdf/2505.07775 💻 GitHub: github.com/beyzabozdag/Pe…

Likes: 34 • Replies: 1 • Reposts: 10

Beyza Bozdag @ NAACL’25

@nbbozdag

3 months ago

Would models know more about Indian food in Hindi and Turkey’s history in Turkish? Does the language of a question affect an LLM’s answer? ✨Yes!✨ Ishika Agarwal and I are excited to announce our newest preprint in which we explore “Language Specific Knowledge (LSK)”.

Likes: 42 • Replies: 2 • Reposts: 4