Ehsan Kamalloo (@ehsk0) Twitter Tweets • TwiCopy

Spandana Gella

3 months ago

Internship at ServiceNow Research to build the next generation of computer use agents that are safe and secure from malicious attacks. Focus on intervention strategies and defenses to make agents robust against unsafe behavior. Apply here: bit.ly/3V3mmTg

Alexandre L.-Piché

@alexpiche_

2 months ago

Glad to see OpenAI prioritizing abstention responses in their paper! That's a great intro to our TMLR paper, in which we developed an iterative self-reflection method for LLMs to know when to abstain, without ground truth and at no additional cost at test time. openreview.net/pdf?id=SvKPfch…

Massimo Caccia

@masscaccia

2 months ago

If you want to get the gist of the paper in 8min, here's the ICML Workshop oral :) slideslive.com/39044475/how-t…

Sai Rajeswar

@rajeswarsai

2 months ago

💡So far, I have been sharing our multimodal AI research at ServiceNow focused on reasoning over pixels. Today, we share a new chapter with an open-source release of our big initiative in the voice and speech domain.🚀 🎧 AU-Harness: Holistic Evaluation of Audio LLM Responses

Nouha Dziri

@nouhadziri

2 months ago

[New work🥁] Can RL actually teach NEW solutions, or is it just polishing what the model already learned in pre-training/mid-training/post-training? 🤔 🧵👇 Can models truly be creative with incredibly challenging problems, e.g., math, code, etc.? This has been the big question

ServiceNow Research

@servicenowrsrch

2 months ago

SLAM Labs presents Apriel-1.5-15B-Thinker 🚀 An open-weights multimodal reasoning model that hits frontier-level performance with just a fraction of the compute.

ServiceNow Research

@servicenowrsrch

2 months ago

Don’t miss the Expo Talk by Alex Piché & Dzmitry Bahdanau at #COLM2025! 📢 Fast On-Policy RL for Long Sequence Generation 📍Oct 9 | 1–2:30 PM | Room 523A-B They’ll present PipelineRL: ⚡2x faster learning on long-form reasoning (128 H100s) ⚡Fresh, on-policy data

Nouha Dziri

@nouhadziri

2 months ago

🚀 Ever wondered how to make RL work on impossibly hard tasks where pass@k = 0%? 🤔 In our new work, we share the RL Grokking Recipe: a training recipe that enables LLMs to solve previously unsolvable coding problems! I will be at #CoLM2025 next week, so happy to chat about it!

ServiceNow Research

@servicenowrsrch

2 months ago

🚀 New Research Blog Live! Our latest post is out: Unifying autoregressive & diffusion language models by Nima Fathi, Torsten Scholak, and Pierre-André Noël. 𝗔𝘂𝘁𝗼𝗿𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝘃𝗲 and 𝗱𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻 𝗺𝗼𝗱𝗲𝗹𝘀 have each driven major advances in generative AI — but

vLLM

@vllm_project

2 months ago

🚀 The RL community keeps pushing boundaries — from better on-policy data and partial rollouts to in-flight weight updates that mix KV caches across models during inference. Continuing inference while weights change and KV states stay stale sounds wild — but that’s exactly what

Alexandre L.-Piché

@alexpiche_

a month ago

Very excited to see vLLM supports Pipeline RL’s in-flight weight updates! It allowed our team to quickly and reliably train Qwen base 7B to reason from scratch! Want to hear more? Join us at our Pipeline RL expo talk at CoLM this Thursday 1PM room 524C.

Alexandre Lacoste

@alex_lacoste_

a month ago

🚨 Call for Interns – ServiceNow AI Research (Montreal) Our Computer-Use Agents team (Frontier AI Research) is recruiting interns for 2026! We work on LLMs and VLMs that can reliably use software and publishing at top venues (NeurIPS, ICML, ICLR) and developing open-source

ServiceNow Research

@servicenowrsrch

a month ago

🎉 It’s CoLM week! The Conference on Language Modeling (CoLM 2025) kicks off tomorrow in Montréal 🇨🇦🍁 Proud that ServiceNow AI Research is a main sponsor — and that our team will present 5 papers on: 📊 Multimodal reasoning 🔄 Unified AR & diffusion models 🔍 Dense retrieval

Torsten Scholak

@tscholak

a month ago

🧠 Call for Interns – ServiceNow AI Research (Montreal) Our Foundation Models Lab is recruiting interns for 2026! We train & optimize LLMs, from diffusion-based generation to state-space hybrids. If you care about efficient LLMs, diffusion or reasoning → this is for you. 🧵👇

🇺🇦 Dzmitry Bahdanau

@dbahdanau

a month ago

We did lots of good work since PipelineRL release in May: ⚙️ higher throughput, seq parallel training, multimodal, agentic RL 📜 white paper with great explanations and results: arxiv.org/pdf/2509.19128… We'll present today at CoLM EXPO, room 524C, 1pm!

Alexandre L.-Piché

@alexpiche_

a month ago

Very excited to be presenting Pipeline RL this afternoon at CoLM. Join us if you are interested in fast on policy RL training for LLMs 🚀

Issam Laradji

@ilaradji

a month ago

🚀 Releasing DRBench, an Enterprise-Grade Deep Research Benchmark Paper! 📄 Paper: lnkd.in/gpRXbb7K 💻 Code: lnkd.in/g4-x5EDc We’re excited to introduce DRBench, the first benchmark designed to evaluate deep research agents on open-ended enterprise research tasks,

Alexandre Drouin

@alexandredrouin

a month ago

Excited to speak at the AAAI-26 Workshop on Agentic AI Benchmarks & Enterprise Tasks (Jan 26, Singapore) 🇸🇬 As agents are rapidly productized, realistic enterprise benchmarks for capabilities and reliability are essential! Submit: openreview.net/group?id=AAAI.… 🗓️ Oct 29 cc Graham Neubig

Alexandre L.-Piché

@alexpiche_

17 days ago

In-flight weight updates have gone from a “weird trick” to a must for training LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits, here’s the CoLM talk 🇺🇦 Dzmitry Bahdanau and I gave: youtu.be/Z1uEuRKACRs

ServiceNow Research

@servicenowrsrch

15 days ago

ServiceNow AI Research presents PipelineRL — one of the most impactful efficiency tricks in modern RL training. An elegant solution to a noisy, expensive problem. Worth the read 👇
