Nishant Balepur (@nishantbalepur)'s Twitter Profile
Nishant Balepur

@nishantbalepur

CS PhD Student. Trying to find that dog in me @UofMaryland. Babysitting (aligning) + Bullying (evaluating) #LLMs

ID: 768905924475879425

Link: https://nbalepur.github.io/ | Joined: 25-08-2016 20:20:32

985 Tweets

534 Followers

403 Following

Zifan (Sail) Wang (@_zifan_wang)'s Twitter Profile Photo

🧵 1/N) Excited to share our recent work at Scale AI, "Jailbreaking to Jailbreak (J2)".😈 We present a novel LLM-as-red-teamer approach in which a human jailbreaks a refusal-trained LLM to make it willing to jailbreak itself or other LLMs. We refer to this process as

Yuzhen Huang @ ICLR 2025 (@yuzhenh17)'s Twitter Profile Photo

🔍 Are Verifiers Trustworthy in RLVR?
Our paper, Pitfalls of Rule- and Model-based Verifiers, exposes the critical flaws in reinforcement learning verification for mathematical reasoning.

🔑 Key findings:
1️⃣ Rule-based verifiers miss correct answers, especially when presented in
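
To make the first finding concrete, here is a tiny illustrative sketch (not the paper's actual verifiers or data): an exact-match rule-based verifier rejects a correct answer that is merely written in a different surface form, so the RL reward signal gets a false negative.

```python
# Toy illustration of the pitfall: an exact-match "rule-based verifier"
# rejects a correct answer written in an equivalent form, producing a
# false-negative reward for RL. (Illustrative only; not the paper's setup.)
from fractions import Fraction

def rule_based_verify(prediction: str, gold: str) -> bool:
    """Naive rule: exact string match after trimming whitespace."""
    return prediction.strip() == gold.strip()

def numeric_equivalent(prediction: str, gold: str) -> bool:
    """Slightly more forgiving check for this toy numeric case."""
    try:
        return Fraction(prediction.strip()) == Fraction(gold.strip())
    except (ValueError, ZeroDivisionError):
        return prediction.strip() == gold.strip()

gold = "1/2"
for pred in ["1/2", "0.5"]:
    print(f"{pred!r}: rule={rule_based_verify(pred, gold)} "
          f"equivalent={numeric_equivalent(pred, gold)}")
# '1/2': rule=True equivalent=True
# '0.5': rule=False equivalent=True  <- correct answer missed by the rule
```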
Lucy Li (@lucy3_li)'s Twitter Profile Photo

"Tell, Don't Show" was accepted to #ACL2025 Findings! 

Our simple approach for literary topic modeling combines the new (language models) with the old (classic LDA) to yield better topics. A possible addition to your CSS/DH research 🛠️ box

✨📚 arxiv.org/abs/2505.23166
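
As a rough, assumed reading of the pipeline named in the tweet (the paper's actual method may differ): have a language model "tell" what each passage is about in a short description, then fit a classic LDA topic model over those descriptions rather than the raw literary text. A minimal gensim sketch, with the LLM step stubbed out:

```python
# Hedged sketch: LM-written "tell" descriptions feed a classic LDA topic model.
# describe_passage is a hypothetical placeholder for an LLM call; the paper's
# actual prompts and pipeline are not reproduced here.
from gensim import corpora, models

def describe_passage(passage: str) -> str:
    """Hypothetical: ask an LLM to state what the passage is about."""
    raise NotImplementedError("stand-in for an LLM call")

# Pretend these came from the LLM step above.
descriptions = [
    "a widow grieves alone in a cold coastal town",
    "children explore an abandoned lighthouse by the sea",
    "a soldier writes letters home about grief and loss",
]

tokenized = [d.lower().split() for d in descriptions]
dictionary = corpora.Dictionary(tokenized)
bow = [dictionary.doc2bow(doc) for doc in tokenized]

lda = models.LdaModel(bow, num_topics=2, id2word=dictionary,
                      random_state=0, passes=10)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```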
Nishant Balepur (@nishantbalepur)'s Twitter Profile Photo

Very well-said as always! There are lots of tricks to make reviewers perceive a paper as good, but that doesn't mean it's actually good science. Really interested in seeing how we can measure what "good" means :)

Yong Zheng-Xin (Yong) (@yong_zhengxin)'s Twitter Profile Photo

🧵 Multilingual safety training/eval is now standard practice, but a critical question remains: Is multilingual safety actually solved?

Our new survey with Cohere Labs answers this and dives deep into:
- Language gap in safety research
- Future priority areas

Thread 👇
Ritwik Gupta 🇺🇦 (@ritwik_g)'s Twitter Profile Photo

I'm excited to share that I’ll be joining Univ. of Maryland as an Assistant Professor in Computer Science, where I’ll be launching the Resilient AI and Grounded Sensing Lab. The RAGS Lab will build AI that works in chaotic environments. If you would like to partner, please DM me!

Ai2 (@allen_ai)'s Twitter Profile Photo

RewardBench 2 is here! We took a long time to learn from our first reward model evaluation tool to make one that is substantially harder and more correlated with both downstream RLHF and inference-time scaling.
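
For context on what a benchmark like this typically measures (a sketch under assumptions; RewardBench 2's actual domains, data, and best-of-N-style scoring are not reproduced here): each item pairs a prompt with a preferred and a dispreferred completion, and the reward model is scored on how often it ranks the preferred one higher.

```python
# Sketch of chosen-vs-rejected accuracy for a reward model.
# `reward` is a hypothetical scoring function (e.g. a sequence-classification
# head over prompt+response); the toy data and toy scorer below are made up.
from typing import Callable

def pairwise_accuracy(items, reward: Callable[[str, str], float]) -> float:
    correct = 0
    for item in items:
        chosen_score = reward(item["prompt"], item["chosen"])
        rejected_score = reward(item["prompt"], item["rejected"])
        correct += chosen_score > rejected_score
    return correct / len(items)

items = [
    {"prompt": "What is 2+2?", "chosen": "2+2 equals 4.", "rejected": "5"},
    {"prompt": "Name a prime.", "chosen": "7 is prime.", "rejected": "9 is prime."},
]
# Toy "reward model" that just prefers longer answers.
toy_reward = lambda prompt, response: float(len(response))
print(pairwise_accuracy(items, toy_reward))  # 0.5 with this toy scorer
```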
Jordan Boyd-Graber (@boydgraber)'s Twitter Profile Photo

Do you like trivia? Can you spot when AI is feeding you BS? Or can you make AIs turn themselves inside out? Then on June 14 at College Park (or June 21 online), we have a competition for you.

Saumya Malik (@saumyamalik44)'s Twitter Profile Photo

I’m thrilled to share RewardBench 2 📊— We created a new multi-domain reward model evaluation that is substantially harder than RewardBench, we trained and released 70 reward models, and we gained insights about reward modeling benchmarks and downstream performance!

Chau Minh Pham (@chautmpham)'s Twitter Profile Photo

🤔 What if you gave an LLM thousands of random human-written paragraphs and told it to write something new -- while copying 90% of its output from those texts?

🧟 You get what we call a Frankentext!

💡 Frankentexts are surprisingly coherent and tough for AI detectors to flag.
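
A small sketch of how a copy constraint like the 90% figure above could be measured (an assumed word n-gram overlap check, not necessarily the paper's definition): count the fraction of the output's n-grams that appear verbatim in the pool of source paragraphs.

```python
# Sketch: fraction of an output's word 5-grams that also occur verbatim in a
# pool of human-written source paragraphs. The Frankentext paper's actual
# copy constraint/metric may be defined differently.
def ngram_list(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def copy_rate(output: str, sources: list[str], n: int = 5) -> float:
    out_ngrams = ngram_list(output.lower().split(), n)
    if not out_ngrams:
        return 0.0
    source_ngrams = set()
    for s in sources:
        source_ngrams.update(ngram_list(s.lower().split(), n))
    copied = sum(g in source_ngrams for g in out_ngrams)
    return copied / len(out_ngrams)

sources = [
    "the storm rolled in from the harbor just after midnight",
    "she kept the letters in a tin box under the floorboards",
]
output = ("the storm rolled in from the harbor just after midnight , and "
          "she kept the letters in a tin box under the floorboards")
print(f"{copy_rate(output, sources):.2f}")  # ~0.68: most 5-grams are copied
```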
Majeed Kazemi (@majeedkazemi)'s Twitter Profile Photo

Excited to share that I'll be joining the CS department at University of Alberta as an Assistant Professor in January 2026 where I will be affiliated with Amii.

I'll be recruiting 2-3 PhD/MSc students and establishing a research lab on AI in Education and Human–AI Interaction.
Samaya AI (@samaya_ai)'s Twitter Profile Photo

Evaluating long-form answers to complex technical questions is very challenging.

Existing methods fall short in this setting.

At Samaya, we built Criteria-Eval, a checklist-based evaluation that aligns with how domain experts judge answers.

🧵samaya.ai/blog/criteria-…
✍️
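
The general shape of a checklist-based evaluation like the one described above might look as follows (a hedged sketch: the criteria, the judge, and the aggregation here are hypothetical placeholders, not Samaya's Criteria-Eval): each answer is scored by the fraction of expert-written criteria a judge deems satisfied.

```python
# Sketch of checklist-style scoring: the fraction of expert criteria that a
# judge says an answer satisfies. `toy_judge` is a naive keyword stand-in for
# what would normally be an LLM judge; Criteria-Eval's real prompts, criteria,
# and aggregation are not reproduced here.
from typing import Callable

def checklist_score(answer: str, criteria: list[str],
                    judge_satisfies: Callable[[str, str], bool]) -> float:
    satisfied = [c for c in criteria if judge_satisfies(answer, c)]
    return len(satisfied) / len(criteria)

criteria = [
    "states the headline revenue figure",
    "mentions the year-over-year change",
    "cites the reporting period",
]

def toy_judge(answer: str, criterion: str) -> bool:
    # Naive keyword check in place of an LLM judge call.
    return any(word in answer.lower() for word in criterion.lower().split()[-2:])

answer = "Revenue was $4.2B in Q3 FY2024, up 12% year-over-year."
print(f"{checklist_score(answer, criteria, toy_judge):.2f}")  # 0.67
```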
Nishant Balepur (@nishantbalepur)'s Twitter Profile Photo

I'm now a Ph.D. candidate! 🎉🥳

A few weeks ago, I proposed my thesis: "Teaching AI to Answer Questions with Reasoning that Actually Helps You". Thanks to my amazing committee + friends UMD CLIP Lab! 🫶

I won't be back in Maryland for a while, some exciting things coming soon 👀
Lindia Tjuatja (@lltjuatja)'s Twitter Profile Photo

When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs: 

🧵1/9
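
As a naive starting point for the question the thread raises (not the paper's method, which is more fine-grained than this), one can line up two causal LMs that share a tokenizer, here the small public gpt2 and distilgpt2 checkpoints as placeholders, and compare their per-token log-probabilities on the same text:

```python
# Naive sketch: per-token log-probability comparison between two causal LMs
# that share a tokenizer (gpt2 / distilgpt2 are placeholder checkpoints).
# The ACL 2025 paper's actual method for finding fine-grained differences
# between LMs is not reproduced here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def per_token_logprobs(model_name: str, text: str):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)  # predict next token
    targets = ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)[0]
    return tokenizer.convert_ids_to_tokens(targets[0].tolist()), token_lp.tolist()

text = "The capital of France is Paris."
tokens, lp_a = per_token_logprobs("gpt2", text)
_, lp_b = per_token_logprobs("distilgpt2", text)
for token, a, b in zip(tokens, lp_a, lp_b):
    print(f"{token!r:>12}  gpt2={a:7.2f}  distilgpt2={b:7.2f}  diff={a - b:+.2f}")
```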
Omar Shaikh (@oshaikh13)'s Twitter Profile Photo

What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵