P Shravan Nayak (@pshravannayak) Twitter Tweets • TwiCopy

P Shravan Nayak

@pshravannayak

Master's student at Mila & Université de Montréal | Former SDE 2 at Microsoft | Passionate about pushing the boundaries of vision-language understanding 🚀

ID: 944312821529042944

Link: https://bajuka.github.io/

Joined: 22-12-2017 21:04:55

62 Tweets

108 Followers

80 Following

P Shravan Nayak

@pshravannayak

3 months ago

It is amazing to see UI-Vision becoming a go-to benchmark for the community. This is exactly what we hoped for when we created the benchmark. So proud of the work we have done!

7 likes · 0 replies · 2 reposts

Internship at ServiceNow Research to build the next generation of computer use agents that are safe and secure from malicious attacks. Focus on intervention strategies and defenses to make agents robust against unsafe behavior. Apply here: bit.ly/3V3mmTg

32 likes · 0 replies · 28 reposts

P Shravan Nayak

@pshravannayak

3 months ago

Highly recommend this for those wanting to work on agents. Had a great experience working with Spandana Gella and team.

1 like · 0 replies · 0 reposts

Aniket Didolkar

@aniket_d98

2 months ago

🚨Reasoning LLMs are e̵f̵f̵e̵c̵t̵i̵v̵e̵ ̵y̵e̵t̵ inefficient! Large language models (LLMs) now solve multi-step problems by emitting extended chains of thought. During the process, they often re-derive the same intermediate steps across problems, inflating token usage and

206 likes · 4 replies · 34 reposts

Massimo Caccia

@masscaccia

2 months ago

See you in San Diego 🚀 #NeurIPS2025

58 likes · 3 replies · 10 reposts

Aniket Didolkar

@aniket_d98

2 months ago

Our work (arxiv.org/abs/2509.13237) can be seen as one instantiation of the paradigm proposed by Andrej Karpathy here. The behavior handbook is a repository of problem solving strategies which we show can be reused to get better and more efficient reasoning in the future. 🧵 for more

20 likes · 2 replies · 4 reposts

Siddarth Venkatraman

@siddarthv66

2 months ago

NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA) — the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! 📈📈 🧵below!

488 likes · 12 replies · 55 reposts