Raj Palleti (@rajpalleti314) 's Twitter Profile
Raj Palleti

@rajpalleti314

ID: 1693689759900786688

Joined: 21-08-2023 18:21:31

224 Tweets

127 Followers

71 Following

Shantanu Sharma (@shantanu) 's Twitter Profile Photo

Good read: The Leaderboard Illusion: alphaxiv.org/abs/2504.20879 Big Tech commercially dependent on marketing model performance for revenues putting their best models out on Chatbot Arena is not surprising. I would argue against prohibiting score retraction after submission and

alphaXiv (@askalphaxiv) 's Twitter Profile Photo

Are LLM leaderboards no longer trustworthy?

cohere's deep dive reveals how Chatbot Arena scores can be gamed

🔍 Providers test 10–27 private models & submit the best
📊 Proprietary models receive 2–3× more data
⚠️ Rankings may reflect overfitting

Trending on alphaXiv 📈
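The "test many private variants, submit the best" mechanism inflates scores through simple selection bias. A toy simulation (all numbers here are illustrative assumptions, not figures from the paper) shows how the best of 20 equally skilled variants looks stronger than an honest single submission:

```python
import random

def leaderboard_score(true_skill, rng, noise=0.05):
    """One noisy Arena-style evaluation of a model variant."""
    return true_skill + rng.gauss(0, noise)

def best_of_n(true_skill, n, rng):
    """Privately test n variants of the same model; report only the best score."""
    return max(leaderboard_score(true_skill, rng) for _ in range(n))

rng = random.Random(0)
trials = 10_000
honest = sum(best_of_n(0.5, 1, rng) for _ in range(trials)) / trials
gamed = sum(best_of_n(0.5, 20, rng) for _ in range(trials)) / trials
print(f"single submission: {honest:.3f}, best of 20: {gamed:.3f}")
```

The gap comes purely from taking a max over noise; no variant is actually better, which is why rankings built this way may reflect overfitting to the evaluation rather than capability.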
alphaXiv (@askalphaxiv) 's Twitter Profile Photo

🚨Bright week for agents and representation learning, notably including X-Fusion’s remarkable progress in multimodal capabilities 🚀

Check out the top 10 papers for the week👇

- From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review
- Reinforcement Learning for
alphaXiv (@askalphaxiv) 's Twitter Profile Photo

The most comprehensive survey (800+ papers) on LLM safety is here 👀

It highlights how agents, with tools, memory & env access, dramatically expand the attack surface

This paper offers a "Full-Stack" analysis of these critical, often overlooked, risks

Trending on alphaXiv 🚀
John Bohannon (@bohannon_bot) 's Twitter Profile Photo

Trying out alphaXiv today. Loving how you can ask scientists questions within the context of their papers. The authors of the Absolute Zero paper (self-learning RL) @AndrewZ45732491 Shenzhi Wang🌟 and Qingyun Wu are kindly answering questions at alphaxiv.org/abs/2505.03335…

alphaXiv (@askalphaxiv) 's Twitter Profile Photo

🚨Dive into Absolute Zero and RM-R1's major breakthrough in reasoning, unveiling another exciting week for AI.🚀

Check out the top 10 papers for the week👇

- Absolute Zero: Reinforced Self-play Reasoning with Zero Data
- RM-R1: Reward Modeling as Reasoning
- ZeroSearch:
alphaXiv (@askalphaxiv) 's Twitter Profile Photo

Can reward models reason?

RM-R1 frames reward modeling as a reasoning task—with structured rubrics, long-form evaluations, and verifiable justifications

🧠 +13.8% accuracy on benchmarks & increased interpretability
📊 32B RM-R1 beats GPT-4o & Claude

Trending on alphaXiv 📈
alphaXiv (@askalphaxiv) 's Twitter Profile Photo

"The S in MCP stands for Security"

See this new review of MCP and its security considerations

Key takeaways:
➡️ Security threats exist in ALL phases: creation, operation & update
➡️ Highlights risks of unofficial auto-installers & community servers
Percy Liang (@percyliang) 's Twitter Profile Photo

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:

Zhongpai Gao (@zhongpaigao) 's Twitter Profile Photo

I just created Gaussian-Splatting community on alphaXiv for sharing the latest advances in this area. Please join the community for discussions 👏

alphaXiv (@askalphaxiv) 's Twitter Profile Photo

Introducing Claude 4 Sonnet for understanding arXiv papers 🚀 Highlight any section of a paper to ask questions and “@” other papers to quickly add to context and compare results, benchmarks, etc.

alphaXiv (@askalphaxiv) 's Twitter Profile Photo

🚨Prompt-to-A* Publication has been achieved 🤖🔬A fully AI-generated research paper has been accepted to the main conference of ACL 2025! Intology's research agent, Zochi, discovered and implemented a state-of-the-art jailbreaking attack on LLMs, which has been accepted to

alphaXiv (@askalphaxiv) 's Twitter Profile Photo

This is pretty remarkable – AI systems learning to self-improve

We're seeing a wave of research where AI isn't just learning from human feedback, it's starting to figure out how to improve itself using its own internal signals

A subtle but profound shift.
alphaXiv (@askalphaxiv) 's Twitter Profile Photo

"Can Large Reasoning Models Self-Train?"

A brilliant paper from CMU showing LLMs can improve at math reasoning WITHOUT human labels - just learning from their own consistency.

Early results rival models trained on ground-truth answers.
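Learning from "their own consistency" resembles majority-vote pseudo-labeling: sample the model several times, treat its most frequent answer as the training target. A minimal sketch of that idea (the hard-coded samples stand in for repeated LLM generations; this is not the paper's actual pipeline):

```python
from collections import Counter

def pseudo_label(samples):
    """Majority vote over sampled answers: the most consistent answer
    becomes the model's own training target, with no human label."""
    answer, count = Counter(samples).most_common(1)[0]
    confidence = count / len(samples)
    return answer, confidence

# Stand-in for sampling an LLM 8 times on the same math problem:
samples = ["42", "42", "41", "42", "42", "17", "42", "42"]
label, conf = pseudo_label(samples)
print(label, conf)
```

The confidence score is what makes this usable as a reward: answers the model produces consistently get reinforced, and low-agreement problems contribute a weaker signal.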
alphaXiv (@askalphaxiv) 's Twitter Profile Photo

🚨There’s a new ceiling for efficient reasoning with the rise of Learning to Reason without External Rewards, along with AgriFM pushing the boundaries of AI to even agriculture🚀

Check out the top 10 papers for the week👇

- Paper2Poster: Towards Multimodal Poster Automation
alphaXiv (@askalphaxiv) 's Twitter Profile Photo

Turns out you can jailbreak VLMs with memes

A team of Korean researchers showed that pairing harmful prompts with everyday memes makes VLMs way more likely to generate dangerous content than text-only attacks across 50K+ examples
Intology (@intologyai) 's Twitter Profile Photo

Introducing the Automated Research community on alphaXiv, created by Intology This Friday, we are hosting Samuel Schmidgall from DeepMind to discuss his work on automated research agents. Come chat with our team and the community! lu.ma/j9r9vf3m 🧵👇