Karolina Stanczak (@karstanczak) 's Twitter Profile
Karolina Stanczak

@karstanczak

Postdoc in NLP @Mila_Quebec & @mcgillu | Previously PhD candidate @uni_copenhagen @CopeNLU

ID: 1285579351950598144

Link: https://karstanczak.github.io/ | Joined: 21-07-2020 14:16:29

109 Tweets

662 Followers

557 Following

Karolina Stanczak (@karstanczak) 's Twitter Profile Photo

Excited to be organizing the VLMs4All workshop at #CVPR2025! 🎉 The workshop features fantastic speakers, a short-paper track, and two challenges, including one based on CulturalVQA. Don’t miss it!

P Shravan Nayak (@pshravannayak) 's Twitter Profile Photo

🚀 Super excited to announce UI-Vision: the largest and most diverse desktop GUI benchmark for evaluating agents in real-world desktop GUIs in offline settings. 📄 Paper: arxiv.org/abs/2503.15661 🌐 Website: uivision.github.io 🧵 Key takeaways 👇

Karolina Stanczak (@karstanczak) 's Twitter Profile Photo

Reviewers needed! 📢 The 6th Workshop on Gender Bias in NLP at #ACL2025 (Vienna, Aug 1st) is looking for you! Sign up to review: forms.gle/VkPU4vS4EacEWs… #NLProc

VLMs4All - CVPR 2025 Workshop (@vlms4all) 's Twitter Profile Photo

🔔 Reminder & Call for #VLMs4All @ #CVPR2025! Help shape the future of culturally aware & geo-diverse VLMs: ⚔️ Challenges: Deadline: Apr 15 🔗sites.google.com/view/vlms4all/… 📄 Papers (4pg): Submit work on benchmarks, methods, metrics! Deadline: Apr 22 🔗sites.google.com/view/vlms4all/… Join us!

Amirhossein Kazemnejad (@a_kazemnejad) 's Twitter Profile Photo

A key reason RL for web agents hasn’t fully taken off is the lack of robust reward models. No matter the algorithm (PPO, GRPO), we can’t reliably do RL without a reward signal. With AgentRewardBench, we introduce the first benchmark aiming to kickstart progress in this space.

Nicholas Meade (@ncmeade) 's Twitter Profile Photo

Check out Xing Han Lu's new benchmark for evaluating reward models for web tasks! AgentRewardBench has rich human annotations of trajectories from top LLM web agents across realistic web tasks and will greatly help steer the design of future reward models.

Karolina Stanczak (@karstanczak) 's Twitter Profile Photo

Exciting release! AgentRewardBench offers a much-needed closer look at evaluating agent capabilities: automatic vs. human eval. Important findings here, especially on the popular LLM judges. Amazing work by Xing Han Lu & team!

Axel Darmouni (@adarmouni) 's Twitter Profile Photo

Benchmarking the performance of Models as judges of Agentic Trajectories 📖 Read of the day, season 3, day 30: « AgentRewardBench: Evaluating Automatic Evaluations of Web Trajectories », by Xing Han Lu, Amirhossein Kazemnejad et al from McGill University and Mila - Institut québécois d'IA The core idea of the

VLMs4All - CVPR 2025 Workshop (@vlms4all) 's Twitter Profile Photo

🚨 Deadline Extension Alert for #VLMs4All! 🚨 We have extended the challenge submission deadline 🛠️ New challenge deadline: Apr 22 Show your stuff in the CulturalVQA and GlobalRG challenges! 👉 sites.google.com/view/vlms4all/… Spread the word and keep those submissions coming! 🌍✨

WebAgentlab (@webagentlab) 's Twitter Profile Photo

13/🧵 AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories. AgentRewardBench is a benchmark designed to evaluate the effectiveness of Large Language Model judges in assessing web agent performance, revealing that while LLMs show potential, no single model

VLMs4All - CVPR 2025 Workshop (@vlms4all) 's Twitter Profile Photo

📢 Deadline Extended! The paper submission deadline for #VLMs4All Workshop at CVPR 2025 has been extended to Monday Apr 28! 💡 We encourage submissions that explore multicultural perspectives in VLMs 🔗 openreview.net/group?id=thecv… 📍 Let's shape the future of globally inclusive AI!

Mila - Institut québécois d'IA (@mila_quebec) 's Twitter Profile Photo

Congratulations to Mila members Ada Tur, Gaurav Kamath and Siva Reddy for their SAC award at #NAACL2025! Check out Ada's talk in Session I: Oral/Poster 6. Paper: arxiv.org/abs/2502.05670

VLMs4All - CVPR 2025 Workshop (@vlms4all) 's Twitter Profile Photo

🚀 Important Update! We're reaching out to collect email IDs of the CulturalVQA and GlobalRG challenge participants for time-sensitive communications, including informing the winning teams. ALL participating teams please fill out the forms below ASAP (ideally within 24 hours). 👇

VLMs4All - CVPR 2025 Workshop (@vlms4all) 's Twitter Profile Photo

🗓️ Save the date! It's official: The VLMs4All Workshop at #CVPR2025 will be held on June 12th! Get ready for a full day of speakers, posters, and a panel discussion on making VLMs more geo-diverse and culturally aware 🌐 Check out the schedule below!

Ziling Cheng (@ziling_cheng) 's Twitter Profile Photo

Do LLMs hallucinate randomly? Not quite. Our #ACL2025 (Main) paper shows that hallucinations under irrelevant contexts follow a systematic failure mode — revealing how LLMs generalize using abstract classes + context cues, albeit unreliably. 📎 Paper: arxiv.org/abs/2505.22630 1/n

Aishwarya Agrawal (@aagrawalaa) 's Twitter Profile Photo

My lab’s contributions at #CVPR2025: -- Organizing VLMs4All - CVPR 2025 Workshop workshop (with 2 challenges) sites.google.com/corp/view/vlms… -- 2 main conference papers (1 highlight, 1 poster) cvpr.thecvf.com/virtual/2025/p… (highlight) cvpr.thecvf.com/virtual/2025/p… (poster) -- 4 workshop papers (2 spotlight talks, 2

VLMs4All - CVPR 2025 Workshop (@vlms4all) 's Twitter Profile Photo

Our VLMs4All workshop is taking place today! 📅 on Thursday, June 12 ⏲️ from 9AM CDT 🏛️ in Room 104E Join us today at #CVPR2025 for amazing speakers, posters, and a panel discussion on making VLMs more geo-diverse and culturally aware!

Xing Han Lu (@xhluca) 's Twitter Profile Photo

"Build the web for agents, not agents for the web" This position paper argues that rather than forcing web agents to adapt to UIs designed for humans, we should develop a new interface optimized for web agents, which we call Agentic Web Interface (AWI).

"Build the web for agents, not agents for the web"

This position paper argues that rather than forcing web agents to adapt to UIs designed for humans, we should develop a new interface optimized for web agents, which we call Agentic Web Interface (AWI).