Shivam Chandhok (@shivamchandhok2) 's Twitter Profile
Shivam Chandhok

@shivamchandhok2

Computer Vision. Robotics. Iron Man fan. Coffee aficionado.
MSc (PhD Track) @UBC, Vancouver. Research Engineer @INRIA, France. Researcher @IIT Hyderabad.

ID: 2228129630

Joined: 03-12-2013 11:13:39

3.3K Tweets

175 Followers

483 Following

Yifu Qiu (@yifuqiu98) 's Twitter Profile Photo

🔁 What if you could bootstrap a world model (state1 × action → state2) using a much easier-to-train dynamics model (state1 × state2 → action) in a generalist VLM? 💡 We show how a dynamics model can generate synthetic trajectories & serve for inference-time verification 🧵👇
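The thread's core idea can be sketched in a toy form: in a world with known, simple dynamics, an inverse dynamics model (state1, state2) → action is easy to obtain, and it can both label raw state-pair data into synthetic trajectories and verify a forward world model's predictions at inference time. This is an illustrative sketch, not the paper's actual models or training setup.

```python
# Toy 1-D world: action a ∈ {-1, +1} moves the state by a.
# The "inverse dynamics model" (s1, s2) -> a is trivial here; in the paper's
# setting it is a learned model that is much easier to train than the
# forward world model (s1, a) -> s2.

def inverse_dynamics(s1, s2):
    """Recover the action that explains the transition s1 -> s2."""
    return s2 - s1

def forward_model(s1, a):
    """Candidate world model (here: the true dynamics, for illustration)."""
    return s1 + a

def verify(s1, a, s2_pred):
    """Inference-time verification: does the inverse model agree that the
    predicted next state is reached from s1 via action a?"""
    return inverse_dynamics(s1, s2_pred) == a

# Synthesize a trajectory from raw state pairs by inferring the actions.
pairs = [(0, 1), (1, 2), (2, 1)]
trajectory = [(s1, inverse_dynamics(s1, s2), s2) for s1, s2 in pairs]
assert all(verify(s1, a, forward_model(s1, a)) for s1, a, _ in trajectory)
```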

Yuki (@y_m_asano) 's Twitter Profile Photo

Today we release Franca, a new vision Foundation Model that matches and sometimes outperforms DINOv2. The data, the training code and the model weights (with intermediate checkpoints) are open-source, allowing everyone to build on this. Methodologically, we introduce two new

ℏεsam (@hesamation) 's Twitter Profile Photo

the legendary <a href="/danielhanchen/">Daniel Han</a> just made a full 3-hour workshop on reinforcement learning and agents. 

he goes through RL fundamentals, kernels, quantization, and RL+Agents covering both theory and code. 

great video to get up to speed on these topics.
Benno Krojer (@benno_krojer) 's Twitter Profile Photo

Love to see this. I am always hoping for papers showing that text-only understanding is influenced by being physically grounded (images, videos, interaction). It was a big hope of people years ago with few positive findings; glad it is still being explored!

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

Latent Denoising Makes Good Visual Tokenizers

"we introduce the Latent Denoising Tokenizer (l-DeTok), a simple yet effective tokenizer trained to reconstruct clean images from latent embeddings corrupted by interpolative noise and random masking. Extensive experiments on
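The corruption described in the abstract can be sketched as follows. The blend coefficient, mask ratio, and zeroing of masked tokens are illustrative assumptions, not the paper's exact recipe; the decoder would then be trained to reconstruct the clean image from these corrupted latents.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_latents(z, noise_level=0.7, mask_ratio=0.3):
    """Corrupt latent embeddings z (tokens x dim) with (a) interpolative
    noise -- a convex blend of the latent and Gaussian noise -- and
    (b) random token masking (masked tokens zeroed here, an assumption)."""
    noise = rng.standard_normal(z.shape)
    z_noisy = (1 - noise_level) * z + noise_level * noise  # interpolate toward noise
    mask = rng.random(z.shape[0]) < mask_ratio             # drop a subset of tokens
    z_noisy[mask] = 0.0
    return z_noisy, mask

z = rng.standard_normal((16, 8))      # 16 latent tokens of dimension 8
z_corrupted, mask = corrupt_latents(z)
```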
Simone Scardapane (@s_scardapane) 's Twitter Profile Photo

*Emergence and Evolution of Interpretable Concepts in Diffusion Models*
by <a href="/berk_tinaz/">Berk Tınaz</a> <a href="/zalan_fabian/">Zalan Fabian</a> <a href="/mahdisoltanol/">Mahdi Soltanolkotabi</a>
 
SAEs trained on cross-attention layers of StableDiffusion are (surprisingly) good and can be used to intervene on the generation.
 
arxiv.org/abs/2504.15473
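For readers new to the technique: a sparse autoencoder (SAE) of the kind trained on model activations is an overcomplete dictionary with ReLU codes and an L1 sparsity penalty. This is a minimal generic sketch (numpy, no training loop), not the paper's architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

class SparseAutoencoder:
    """Minimal SAE sketch: overcomplete (d_hidden > d_in) linear encoder
    with ReLU, linear decoder, and an L1 penalty on the codes."""
    def __init__(self, d_in, d_hidden):
        self.W_enc = rng.standard_normal((d_in, d_hidden)) / np.sqrt(d_in)
        self.b_enc = np.zeros(d_hidden)
        self.W_dec = self.W_enc.T.copy()
        self.b_dec = np.zeros(d_in)

    def encode(self, x):
        return np.maximum(x @ self.W_enc + self.b_enc, 0.0)  # sparse ReLU codes

    def decode(self, h):
        return h @ self.W_dec + self.b_dec

    def loss(self, x, l1=1e-3):
        h = self.encode(x)
        x_hat = self.decode(h)
        return np.mean((x - x_hat) ** 2) + l1 * np.abs(h).mean()

sae = SparseAutoencoder(d_in=64, d_hidden=512)
acts = rng.standard_normal((32, 64))  # stand-in for cross-attention activations
codes = sae.encode(acts)
```

Once trained, individual code dimensions tend to align with interpretable concepts, which is what makes intervening on generation possible.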
Yacine Mahdid (@yacinelearning) 's Twitter Profile Photo

15 min tutorial on the Adam optimizer 
by the end of it you will understand what is up with the formula 100%

you'll see it's not that complicated™️
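The formula the tutorial walks through is indeed compact: exponential moving averages of the gradient and its square, bias correction, then a scaled step. A minimal numpy version of the standard Adam update (default betas/eps from the original paper):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update step (t is the 1-indexed step count)."""
    m = beta1 * m + (1 - beta1) * grad        # EMA of gradients (1st moment)
    v = beta2 * v + (1 - beta2) * grad ** 2   # EMA of squared gradients (2nd moment)
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# demo: minimize f(x) = x^2 starting from x = 5
x = np.array([5.0])
m, v = np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    g = 2.0 * x                               # gradient of x^2
    x, m, v = adam_step(x, g, m, v, t, lr=0.1)
```

Note that near the minimum the effective step is roughly `lr` in magnitude (since `m_hat / sqrt(v_hat)` is sign-like), which is why Adam hovers around the optimum rather than converging exactly without a learning-rate decay.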
Fu-En (Fred) Yang (@fuenyang1) 's Twitter Profile Photo

🤖 How can we teach embodied agents to think before they act?

🚀 Introducing ThinkAct — a hierarchical Reasoning VLA framework with an MLLM for complex, slow reasoning and an action expert for fast, grounded execution.
Slow think, fast act. 🧠⚡🤲
Simone Scardapane (@s_scardapane) 's Twitter Profile Photo

*I-Con: A Unifying Framework for Representation Learning*
by <a href="/Sa_9810/">Shaden</a> <a href="/mhamilton723/">Mark Hamilton</a> et al.

They show that many losses (contrastive, supervised, clustering, ...) can be derived from a single loss defined in terms of neighbor distributions.

arxiv.org/abs/2504.16929
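The unifying loss is a KL divergence between two neighbor distributions per anchor point: a supervisory one p and a learned one q from embedding similarities. A toy numpy sketch (the one-hot p recovers a contrastive-style special case; note that unlike real contrastive losses this simplification does not exclude self-similarity from q):

```python
import numpy as np

def softmax(s, axis=-1):
    s = s - s.max(axis=axis, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)

def neighbor_kl_loss(p, z, tau=0.1):
    """I-Con-style loss (sketch): mean over anchors i of KL(p[i] || q[i]),
    where q[i] is the learned neighbor distribution induced by embedding
    similarities and p[i] is the supervisory neighbor distribution."""
    q = softmax(z @ z.T / tau, axis=1)
    eps = 1e-12
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1))

# contrastive special case: p puts all mass on each anchor's positive pair
n = 4
p = np.roll(np.eye(n), 1, axis=1)   # toy choice: point i's neighbor is i+1
z = np.random.default_rng(0).standard_normal((n, 8))
loss = neighbor_kl_loss(p, z)
```

Different choices of p (one-hot positives, class labels, cluster assignments, Gaussian neighborhoods) recover the different losses the paper unifies.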
Micah Goldblum (@micahgoldblum) 's Twitter Profile Photo

🚨Announcing Zebra-CoT, a large-scale dataset of high-quality interleaved image-text reasoning traces 📜. Humans often draw visual aids like diagrams when solving problems, but existing VLMs reason mostly in pure text. 1/n
Mehul Damani @ ICLR (@mehuldamani2) 's Twitter Profile Photo

🚨New Paper!🚨
We trained reasoning LLMs to reason about what they don't know.

o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more.

Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --
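The shape of such a reward can be sketched as correctness plus a calibration term; a Brier-score penalty on the model's stated confidence is one natural choice. This is an illustrative sketch of the idea, not necessarily the paper's exact formula or coefficients.

```python
def rlcr_style_reward(correct: bool, confidence: float) -> float:
    """Reward correctness, and penalize miscalibration with a Brier-score
    term so stated confidence must track actual accuracy (sketch)."""
    y = 1.0 if correct else 0.0
    brier = (confidence - y) ** 2
    return y - brier

# a confident correct answer beats an overconfident wrong one
assert rlcr_style_reward(True, 0.9) > rlcr_style_reward(False, 0.9)
# when wrong, admitting low confidence is penalized less than bluffing
assert rlcr_style_reward(False, 0.1) > rlcr_style_reward(False, 0.9)
```

Under this reward, hallucinating confidently on a wrong answer is strictly worse than expressing uncertainty, which is the behavior the training method targets.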
Salesforce AI Research (@sfresearch) 's Twitter Profile Photo

💡 Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models 💡

📄 Paper: bit.ly/44IAvuO 
💻 Code: bit.ly/4lLjQgd 

😵‍💫 Have a task but experiencing prompt engineering existential dread?

Few-shot or zero-shot? Chain-of-thought or ReAct?
Ruihan Yang (@rchalyang) 's Twitter Profile Photo

How can we leverage diverse human videos to improve robot manipulation? Excited to introduce EgoVLA — a Vision-Language-Action model trained on egocentric human videos by explicitly modeling wrist & hand motion. We build a shared action space between humans and robots, enabling

Denny Zhou (@denny_zhou) 's Twitter Profile Photo

Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-…

Key points:
1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial

Duy Nguyen (@duynguyen772) 's Twitter Profile Photo

🚀 We introduce GrAInS, a gradient-based attribution method for inference-time steering (of both LLMs & VLMs).

✅ Works for both LLMs (+13.2% on TruthfulQA) & VLMs (+8.1% win rate on SPA-VL).
✅ Preserves core abilities (<1% drop on MMLU/MMMU).

LLMs & VLMs often fail because
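For context on the core primitive: gradient-based attribution scores each input coordinate by how much it drives a scalar output, the simplest variant being input × gradient. A toy sketch on a linear scorer (where the gradient is exact and attributions sum to the score); GrAInS builds on attributions like this to decide which directions to steer, and the linear model here is only a stand-in.

```python
import numpy as np

def input_x_gradient(x, w):
    """Input-x-gradient attribution for a linear score f(x) = w . x:
    grad_x f = w, so coordinate i gets attribution x_i * w_i.
    For this linear case the attributions sum exactly to f(x)."""
    grad = w              # d(w.x)/dx for a linear model
    return x * grad

x = np.array([1.0, -2.0, 0.5])
w = np.array([0.2, 0.4, -1.0])
attr = input_x_gradient(x, w)
top = int(np.argmax(np.abs(attr)))   # most influential coordinate
```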