Koustuv Sinha (@koustuvsinha)'s Twitter Profile
Koustuv Sinha

@koustuvsinha

Research Scientist @MetaAI; PhD from @mcgillu + @Mila_Quebec; I organize ML Reproducibility Challenge (@repro_challenge). I work on Interpretable multimodal ML

ID: 42234513

Link: https://koustuvsinha.com · Joined: 24-05-2009 16:12:07

946 Tweets

2.2K Followers

761 Following

The ML Reproducibility Challenge (@repro_challenge)'s Twitter Profile Photo

Less than a week left to #MLRC2025! Here is the agenda for the day. Location - Friend Center @ Princeton University maps.app.goo.gl/Cq39Ju3qNYCMy3…
Christopher Potts (@chrisgpotts)'s Twitter Profile Photo

The Transformer is not quite a Ship of Theseus, but, since 2017, best practices have radically changed for positional encodings, layer norms, attention, residual streams, and MLP components. Architecture work is not dead, and scaling is not all you need to respect scaling laws.

Oscar Mañas @ ICLR (@oscmansan)'s Twitter Profile Photo

I’m happy to share that our paper "Controlling Multimodal LLMs via Reward-guided Decoding" has been accepted to #ICCV2025! 🎉

w/ Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, and Aishwarya Agrawal

🔗 Read more: arxiv.org/abs/2508.11616

🧵 Here's what we did:
Bryan Catanzaro (@ctnzr)'s Twitter Profile Photo

Today we're releasing NVIDIA Nemotron Nano v2 - a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate.

Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus.

Links to the
Adina Williams (@adinamwilliams)'s Twitter Profile Photo

Awesome #MLRC2025 talks kicking us off this morning! I'm learning lots from The ML Reproducibility Challenge about science with ML and reproducibility for real-world applications (Arvind Narayanan), and software/firmware and data concerns for reproducibility (Soumith Chintala). Slides coming soon!
Koustuv Sinha (@koustuvsinha)'s Twitter Profile Photo

It's a wrap - the first in-person event for #MLRC2025 successfully concluded yesterday - we witnessed some of the best talks I have ever heard on reproducibility issues in AI, ranging from issues regarding leakage and irreproducibility in ML-based science (Arvind Narayanan),
Dan Jurafsky (@jurafsky)'s Twitter Profile Photo

Now that school is starting for lots of folks, it's time for a new release of Speech and Language Processing! Jim and I added all sorts of material for the August 2025 release! With slides to match! Check it out here: web.stanford.edu/~jurafsky/slp3/

Edward Grefenstette (@egrefen)'s Twitter Profile Photo

On the contrary, when history remembers people who stood in the face of unbridled hype constructively and in line with the scientific method, let David be thus-remembered over the perpetual goalpost-moving Garys of the world.

Koustuv Sinha (@koustuvsinha)'s Twitter Profile Photo

Quality > Quantity. I’d love to see efforts to further reduce the size of pretraining data while keeping the downstream evals constant. BabyLM was a step in this direction, but the focus was to design the right architecture. We need a GPT speedrun equivalent of this, where the

Bodhisattwa Majumder (@mbodhisattwa)'s Twitter Profile Photo

So happy to share what we have been working on for the past 2 years at Ai2. Data-driven discovery sits at the core of Asta, along with amazing literature discovery tools! You probably have read our works, now see them in action 🛼🛼

Arvind Narayanan (@random_walker)'s Twitter Profile Photo

I’m excited to announce I’ve started a YouTube channel. I plan to publish videos regularly explaining my views on AI and its present and future impacts. 

My first video asks: What happens if there’s an AI crash?
youtube.com/watch?v=VDfyuB…

This is my first foray into video (beyond
Shiwei Liu (@shiwei_liu66)'s Twitter Profile Photo

Happy to share a side project: Diffusion Language Models Know the Answer Before Decoding. 

Diffusion LMs are often dismissed as slow. But what if they already *know* the answer halfway through?   

1. Early Answer Convergence:
Our new paper shows that in many cases, they do,
Arian Hosseini (@ariantbd)'s Twitter Profile Photo

LLMs are great at single-shot problems, but in the era of experience, interactive environments are key 🔑 Introducing * Multi-Turn Puzzles (MTP) * , a new benchmark to test multi-turn reasoning and strategizing 🔗 Paper: huggingface.co/papers/2508.10… 🫙Data: huggingface.co/datasets/arian…

Jakob Foerster (@j_foerst)'s Twitter Profile Photo

Super excited about this event! I will give an updated version of my talk on the Simulation Hypothesis - i.e. Machine Learning in the upcoming era of extremely fast computers. How can we do science that stands the test of time when compute capacity is accelerating?