Chandan Singh (@csinva) 's Twitter Profile
Chandan Singh

@csinva

Seeking good explanations with machine learning. Senior researcher @MSFTResearch, PhD from @Berkeley_AI

ID: 959537146414620673

Website: https://csinva.io/ · Joined: 02-02-2018 21:20:57

342 Tweets

1.1K Followers

523 Following

Katie Kang (@katie_kang_) 's Twitter Profile Photo

LLMs excel at fitting finetuning data, but are they learning to reason or just parroting🦜?

We found a way to probe a model's learning process to reveal *how* each example is learned. This lets us predict model generalization using only training data, amongst other insights: 🧵
Colin Fraser (@colin_fraser) 's Twitter Profile Photo

I'm really fascinated by this dataset from the AI poetry survey paper. Here's another visualization I just made. Survey respondents were shown one of these 10 poems, and either told that they were authored by AI, human, or not told anything.

elvis (@omarsar0) 's Twitter Profile Photo

LLMs surpass human experts in predicting neuroscience results

Scientific discovery is the next big goal for AI. We are seeing a huge number of research studies tackling AI-powered scientific discovery from different angles and for different problems.

This new paper published in…
Chandan Singh (@csinva) 's Twitter Profile Photo

I’ll be at NeurIPS this week presenting our work on interpretable embeddings — drop me a message if you want to chat!

Zhou Xian (@zhou_xian_) 's Twitter Profile Photo

Everything you love about generative models — now powered by real physics! Announcing the Genesis project — after a 24-month large-scale research collaboration involving over 20 research labs — a generative physics engine able to generate 4D dynamical worlds powered by a physics…

Surya Ganguli (@suryaganguli) 's Twitter Profile Photo

Absolutely. In any hypothesis test between A and B about the working of a complex system, the right answer is invariably none of the above. System identification is a much better paradigm for neuroscience discovery; it allows us to efficiently explore huge hypothesis spaces.

Jonas Pfeiffer (@pfeiffjo) 's Twitter Profile Photo

🧠💡 Our LLMs just had a ‘memory augmentation’—now they can deliberate like seasoned thinkers!

arxiv.org/abs/2412.17747
Marlene Cohen (@marlenecohen) 's Twitter Profile Photo

New results for a new year! “Linking neural population formatting to function” describes our modern take on an old question: how can we understand the contribution of a brain area to behavior? biorxiv.org/content/10.110… 🧵1/

Frank Hutter (@frankrhutter) 's Twitter Profile Photo

The data science revolution is getting closer. TabPFN v2 is published in Nature: nature.com/articles/s4158…

On tabular classification with up to 10k data points & 500 features, in 2.8s TabPFN on average outperforms all other methods, even when tuning them for up to 4 hours 🧵1/19
Yi Ma (@yimatweets) 's Twitter Profile Photo

arxiv.org/abs/2502.10385 This is our latest work, SimDINO, which, again based on the coding-rate principle, significantly simplifies the popular (but unnecessarily sophisticated) visual self-supervised learning methods DINO and DINOv2. The power of understanding and principles is…

Jianwei Yang (@jw2yang4ai) 's Twitter Profile Photo

Thanks for featuring our work, Aran Komatsuzaki! 🔥Today we are thrilled to announce our MSR flagship project Magma! This is a fully open-sourced project. We will roll out all the stuff: code, model, and training data over the following days. Check out our full work here:

Berkeley AI Research (@berkeley_ai) 's Twitter Profile Photo

Humans just saw a *new* color—literally outside the known visual spectrum. BAIR faculty and visual computing expert Ren Ng and collaborators made it possible with the Oz Vision System. 🌈👁️ Newly published in Science Advances: science.org/doi/10.1126/sc… popsci.com/health/new-col…

Yufan Zhuang (@yufan_zhuang) 's Twitter Profile Photo

🤯Your LLM just threw away 99.9% of what it knows.

Standard decoding samples one token at a time and discards the rest of the probability mass. 

Mixture of Inputs (MoI) rescues that lost information, feeding it back for more nuanced expressions.

It is a brand new…
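The decoding idea in this tweet can be sketched in a few lines: instead of feeding back only the sampled token's embedding at the next step, blend it with the probability-weighted mixture of all token embeddings, so the discarded mass still influences the model. This is a toy NumPy sketch of that blending step only; the function name and the `beta` mixing weight are my own illustrative assumptions, not the paper's actual API or formulation.

```python
import numpy as np

def moi_input(probs, embedding_table, sampled_id, beta=0.5):
    """Toy sketch of a Mixture-of-Inputs-style blending step.

    probs:           output distribution over the vocabulary at this step
    embedding_table: (vocab_size, dim) token embedding matrix
    sampled_id:      the token actually sampled (what standard decoding feeds back)
    beta:            hypothetical mixing weight (an assumption, not from the paper)
    """
    expected = probs @ embedding_table     # probability-weighted mixture of all embeddings
    sampled = embedding_table[sampled_id]  # the usual one-hot input embedding
    return beta * sampled + (1.0 - beta) * expected

# Tiny example: 3-token vocabulary with identity embeddings.
probs = np.array([0.7, 0.2, 0.1])
mixed = moi_input(probs, np.eye(3), sampled_id=0)
```

With identity embeddings the result is just the blend of the one-hot sampled vector and the distribution itself, which makes the rescued probability mass easy to see.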
Sahil Verma (@sahil1v) 's Twitter Profile Photo

🚨 New Paper! 🚨
Guard models slow, language-specific, and modality-limited?

Meet OmniGuard, which detects harmful prompts across multiple languages & modalities using a single approach, with SOTA performance in all 3 modalities while being 120X faster 🚀

arxiv.org/abs/2505.23856
rohit (@rohitarorayyc) 's Twitter Profile Photo

We automated systematic reviews using gpt-4.1 and o3-mini!

Our platform (otto-SR) beat humans at all tasks and conducted 12 years of systematic review research in just two days.

We also show how otto-SR can be used in the real world to rapidly update clinical guidelines 🧵
Shirley Wu (@shirleyyxwu) 's Twitter Profile Photo

Even the smartest LLMs can fail at basic multiturn communication

Ask for grocery help → without asking where you live 🤦‍♀️
Ask to write articles → assumes your preferences 🤷🏻‍♀️

⭐️CollabLLM (top 1%; oral at ICML) transforms LLMs from passive responders into active collaborators.
hardmaru (@hardmaru) 's Twitter Profile Photo

Reinforcement Learning Teachers of Test Time Scaling

In this new paper, we introduce a new way to teach LLMs how to reason by learning to teach, not solve!

The core idea: A teacher model is trained via RL to generate explanations from question-answer pairs, optimized to improve…
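The training signal described above can be sketched as a simple reward: score the teacher's explanation by how much it raises a student model's likelihood of the correct answer, compared to the student seeing no explanation. This is a toy sketch under my own assumptions; the reward shaping, function names, and the stand-in student are illustrative, not the paper's actual setup.

```python
def teacher_reward(student_logprob, question, answer, explanation):
    """Reward an explanation by the improvement it gives the student.

    student_logprob(question, answer, context) should return the student's
    log-probability of `answer`; here it is an injected stand-in (assumption).
    """
    with_expl = student_logprob(question, answer, context=explanation)
    without = student_logprob(question, answer, context=None)
    return with_expl - without  # positive iff the explanation helped

# Toy "student": a fixed lookup standing in for a real language model,
# which assigns the answer higher log-probability when any context is given.
def toy_student(question, answer, context):
    return -0.5 if context else -2.0

reward = teacher_reward(toy_student, "2+2?", "4", "add the two operands")
```

An RL loop would then update the teacher to produce explanations that maximize this reward, so the teacher is never asked to solve the problem itself, only to explain a known answer well.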