P Shravan Nayak (@pshravannayak)'s Twitter Profile
P Shravan Nayak

@pshravannayak

Master's student at Mila & Université de Montréal | Former SDE 2 at Microsoft | Passionate about pushing the boundaries of vision-language understanding 🚀

ID: 944312821529042944

Link: https://bajuka.github.io/ | Joined: 22-12-2017 21:04:55

62 Tweets

108 Followers

80 Following

P Shravan Nayak (@pshravannayak)

It is amazing to see UI-Vision becoming a go-to benchmark for the community. This is exactly what we hoped for when we created the benchmark. So proud of the work we have done!

Spandana Gella (@gspandana)

Internship at ServiceNow Research to build the next generation of computer use agents that are safe and secure from malicious attacks. Focus on intervention strategies and defenses to make agents robust against unsafe behavior. Apply here: bit.ly/3V3mmTg

Aniket Didolkar (@aniket_d98)

🚨Reasoning LLMs are e̵f̵f̵e̵c̵t̵i̵v̵e̵ ̵y̵e̵t̵ inefficient! Large language models (LLMs) now solve multi-step problems by emitting extended chains of thought. During the process, they often re-derive the same intermediate steps across problems, inflating token usage and

Aniket Didolkar (@aniket_d98)

Our work (arxiv.org/abs/2509.13237) can be seen as one instantiation of the paradigm proposed by Andrej Karpathy here. The behavior handbook is a repository of problem-solving strategies, which we show can be reused to get better and more efficient reasoning in the future. 🧵 for more

Siddarth Venkatraman (@siddarthv66)

NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA) — the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! 📈📈 🧵below!
