Spandana Gella (@gspandana)'s Twitter Profile
Spandana Gella

@gspandana

Research Scientist & Sr. Manager @ServiceNowRSRCH, Montreal. Previously @AmazonScience, PhD:@EdinburghNLP, Intern:@MetaAI, @MSFTResearch

ID: 15960453

Joined: 23-08-2008 20:05:59

711 Tweets

808 Followers

476 Following

Xiangru (Edward) Jian @ ICLR 2025 (@edwardjian2)

🚀 Our team at ServiceNow Research will present our paper at #ICLR2025: BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks 📍 Thursday, April 24 | 10 a.m.–12:30 p.m. 📍 Hall 3 + Hall 2B, Poster #280 🔗 bigdocs.github.io

🇺🇦 Dzmitry Bahdanau (@dbahdanau)

I am excited to open-source PipelineRL - a scalable async RL implementation with in-flight weight updates. Why wait until your bored GPUs finish all sequences? Just update the weights and continue inference! Code: github.com/ServiceNow/Pip… Blog: huggingface.co/blog/ServiceNo…

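The in-flight update idea can be illustrated with a toy sketch (the names below are illustrative, not from the PipelineRL codebase): the inference loop re-checks a shared, versioned weight store between tokens, so a newly published update takes effect mid-sequence instead of only after every sequence finishes.

```python
class WeightStore:
    """Shared, versioned snapshot that the trainer publishes to."""
    def __init__(self, weights):
        self.weights = weights
        self.version = 0

    def publish(self, weights):
        self.weights = weights
        self.version += 1


def generate(store, max_tokens, on_step=None):
    """Toy inference loop: re-reads the weight store between tokens,
    so a published update applies mid-sequence (no restart needed)."""
    out = []
    for step in range(max_tokens):
        out.append(store.version)  # each "token" reflects current weights
        if on_step:
            on_step(step)          # point where a trainer update may land
    return out


store = WeightStore({"bias": 0})
# Simulate the trainer publishing new weights after token 2, mid-generation.
tokens = generate(store, 5,
                  on_step=lambda s: store.publish({"bias": 1}) if s == 2 else None)
# tokens == [0, 0, 0, 1, 1]: the last two tokens already use the new weights.
```

In a real async RL setup the publish would come from a separate training process and the "tokens" from actual model forward passes; the sketch only shows the scheduling idea of swapping weights between decoding steps.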
Xing Han Lu (@xhluca)

⚠️We’re looking for emergency reviewers for the REALM workshop @ ACL 2025 (realm-workshop.github.io) If you work on LLM agents & can review 1–3 papers, we’d love your help! Sign up here: forms.gle/Mah45swKGNrUY5…

Siva Reddy (@sivareddyg)

Incredibly proud of my students Ada Tur and Gaurav Kamath for winning a SAC award at #NAACL2025 for their work on assessing how LLMs model constituent shifts. Humans have a tendency to move heavier constituents towards the end of the sentence. While LLMs unsurprisingly show

Mila - Institut québécois d'IA (@mila_quebec)

Congratulations to Mila members Ada Tur, Gaurav Kamath and Siva Reddy for their SAC award at #NAACL2025! Check out Ada's talk in Session I: Oral/Poster 6. Paper: arxiv.org/abs/2502.05670

Torsten Scholak (@tscholak)

🚨🤯 Today Jensen Huang announced SLAM Lab's newest model on the ServiceNow Events stage: Apriel‑Nemotron‑15B‑Thinker 🚨 A lean, mean reasoning machine punching way above its weight class 👊 Built by SLAM × NVIDIA. Smaller models, bigger impact. 🧵👇

VLMs4All - CVPR 2025 Workshop (@vlms4all)

🚀 Important Update! We're reaching out to collect email IDs of the CulturalVQA and GlobalRG challenge participants for time-sensitive communications, including informing the winning teams. ALL participating teams please fill out the forms below ASAP (ideally within 24 hours). 👇

P Shravan Nayak (@pshravannayak)

🚀 Excited to share that UI-Vision has been accepted at ICML 2025! 🎉 We have also released the UI-Vision grounding datasets. Test your agents on it now! 🚀 🤗 Dataset: huggingface.co/datasets/Servi… #ICML2025 #AI #DatasetRelease #Agents

Perouz Taslakian (@perouzt)

Our team has released the UI-Vision benchmark (accepted at #ICML2025) for testing GUI agent visual grounding and action prediction! 🚀🚀🚀 🤗 Dataset: huggingface.co/datasets/Servi… Special thanks to the students who led this effort, P Shravan Nayak and Xiangru (Edward) Jian, ServiceNow Research

Sai Rajeswar (@rajeswarsai)

Congrats Tianbao Xie and team on this exciting work and release! 🎉 We’re happy to share that Jedi-7B performs on par with UI-Tars-72B agent on our challenging UI-Vision benchmark, with 10x fewer parameters! 👏 Incredible 🤗Dataset: huggingface.co/datasets/Servi… 🌐uivision.github.io

Spandana Gella (@gspandana)

Very excited to release StarFlow: a large, diverse workflow dataset and open models that can transform sketches into executable workflows using VLMs. Stay tuned for more updates in this space!

AK (@_akhaliq)

Rendering-Aware Reinforcement Learning for Vector Graphics Generation RLRF significantly outperforms supervised fine-tuning, addressing common failure modes and enabling precise, high-quality SVG generation with strong structural understanding and generalization

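The announcement names the technique but not its mechanics. As a loose illustration of reward-from-rendering-feedback (the helper names and the toy rasterizer below are assumptions, not the paper's implementation): rasterize the candidate SVG, compare it pixel-wise to the target image, and use the overlap as the RL reward, so the model is scored on what the SVG *renders to* rather than on its markup text.

```python
def iou_reward(pred_pixels, target_pixels):
    """Rendering-feedback reward: intersection-over-union between the
    rasterized candidate and the target image (pixels as coordinate sets)."""
    inter = len(pred_pixels & target_pixels)
    union = len(pred_pixels | target_pixels)
    return inter / union if union else 1.0


def render_rect(x, y, w, h):
    """Stand-in rasterizer for a single <rect> element: returns the set of
    filled pixel coordinates. A real pipeline would rasterize full SVG
    markup with a renderer such as cairosvg."""
    return {(i, j) for i in range(x, x + w) for j in range(y, y + h)}


target = render_rect(0, 0, 4, 4)
good = iou_reward(render_rect(0, 0, 4, 4), target)  # exact match -> 1.0
off = iou_reward(render_rect(2, 0, 4, 4), target)   # shifted rect -> 1/3
```

A production reward would likely use a perceptual or feature-space metric rather than raw IoU, but the design point is the same: the gradient signal flows from the rendered output, which is what lets RL fix failure modes that token-level supervised fine-tuning cannot see.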
Juan A. Rodríguez 💫 (@joanrod_ai)

Thanks AK for sharing our work! Excited to present our next generation of SVG models, now using Reinforcement Learning from Rendering Feedback (RLRF). 🧠 We think we cracked SVG generalization with this one. Go read the paper! arxiv.org/abs/2505.20793 More details on

Spandana Gella (@gspandana)

Are you new to ML, or a first-time author attending ICML? 📢 Submit to New in ML @ ICML 2025! ✅ Feedback from top researchers ✅ Oral presentations + awards ✅ Limited ICML 2025 tickets! newinml.github.io

Spandana Gella (@gspandana)

If you are at #CVPR25, come join us at our workshop in room 104E for an exciting lineup of talks, posters, and a panel! And some cool merch!!

P Shravan Nayak (@pshravannayak)

A Hindu wedding without a sacred fire? A Chinese banquet with forks? Do text-to-image models meet cultural expectations, both explicitly stated and implicitly assumed? Excited to share our latest paper on evaluating cultural alignment in T2I models 🌐 culturalframes.github.io

IVADO (@ivado_qc)

The IVADO #Bootcamp marked the launch of the Thematic Semester on Autonomous #LLM Agents last week at Université de Montréal. Over 4 days, researchers, experts, and #AI enthusiasts gathered for conferences, tutorials, and rich discussions, laying the groundwork for our next two workshops.

Massimo Caccia (@masscaccia)

🔥 We stress-tested today’s best AI code generators in *dependency hell*. Introducing GitChameleon 2.0: 328 challenges for version-controlled code generation. The verdict? Even top models only hit ~50% success.

Alexandre Lacoste (@alex_lacoste_)

🚨 Is #WorkArena on the verge of being solved? Or did GPT-5 just get trained on it? 🔥While some benchmarks show modest gains, GPT-5 is crushing WorkArena L2🔥 ➡️ 69.4% avg success vs. ~40% for next best🤯 ➡️ Complex tasks, up to 100 steps, 5–20 min for humans

Ahmed Masry (@ahmed_masry97)

UI-Vision vs GPT-5: Still holding the crown 👑 and far from saturation. GPT-5 has strengths in coding and reasoning, but when it comes to computer-use tasks, it’s still awkward to rely on it alone. And our team's UI-Vision (ICML 2025) remains a key and still unbeaten multimodal
