Hritik Bansal (@hbxnov)'s Twitter Profile
Hritik Bansal

@hbxnov

CS PhD @UCLA | Prev: Bachelors @IITDelhi, Intern @GoogleDeepMind @AmazonScience | Multimodal ML, Language models | Cricket🏏

ID: 998780848613683200

Website: http://sites.google.com/view/hbansal | Joined: 22-05-2018 04:21:25

623 Tweets

1.1K Followers

1.1K Following

Pratyush Maini (@pratyushmaini)'s Twitter Profile Photo

Join me & @hbxnov at #ICLR2025 for our very purple poster on risks of LLM evals by private companies!

🕒 Today, 10am | 🪧 #219

Beyond Llama drama, LMSYS incorporation & ARC-AGI train/test fiasco, we discuss irreducible biases—even when firms act in good faith. Come say hi! 💜
Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

The Leaderboard Illusion

- Identifies systematic issues that have resulted in a distorted playing field of Chatbot Arena

- Identifies 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release
Mehran Kazemi (@kazemi_sm)'s Twitter Profile Photo

Upon some requests, we now have a BBEH Mini with 460 examples (20 per task) for faster and cheaper experimentation.

The set can be downloaded from: github.com/google-deepmin…
The results are reported in Table 3 of arxiv.org/pdf/2502.19187

Hritik Bansal (@hbxnov)'s Twitter Profile Photo

📢 Submit your cool ideas as short or long papers to the first workshop on the foundations of long video generation, understanding, and evaluation 🚀 ramoscsv.github.io/longvid_founda…

Hritik Bansal (@hbxnov)'s Twitter Profile Photo

Great to see that the latest #GeminiDiffusion release benchmarks on our challenging general-purpose reasoning Big Bench Extra Hard dataset! 

It is now available on HF 🤗: huggingface.co/datasets/BBEH/…
Eval code: github.com/google-deepmin…
Hritik Bansal (@hbxnov)'s Twitter Profile Photo

🧑‍🍳Very excited to present LaViDa, one of the first diffusion language models for multimodal understanding! 

🌟Unlike autoregressive LMs, you can control the speed-quality tradeoff, and solve constrained generation problems out of the box 📦
🌟 We also release LaViDa-Reason, a
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

LaViDa: A Large Diffusion Language Model for Multimodal Understanding

"We introduce LaViDa, a family of VLMs built on DMs. We build LaViDa by equipping DMs with a vision encoder and jointly fine-tune the combined parts for multimodal instruction following. "

"LaViDa achieves
Ryan Marten (@ryanmart3n)'s Twitter Profile Photo

Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals.

We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data
Tanmay Parekh (@tparekh97)'s Twitter Profile Photo

🚨 New work: LLMs still struggle at Event Detection due to poor long-context reasoning and an inability to follow task constraints, causing precision and recall errors.

We introduce DiCoRe — a lightweight 3-stage Divergent-Convergent reasoning framework to fix this.🧵📷 (1/N)
Hritik Bansal (@hbxnov)'s Twitter Profile Photo

🥳 Excited to share that VideoPhy-2 has been awarded 🏆 Best Paper at the World Models Workshop (physical-world-modeling.github.io) #ICML2025! Looking forward to presenting it as a contributed talk at the workshop! 😃

w/ Clark Peng, Yonatan Bitton, Roman, Aditya Grover, Kai-Wei Chang

Hritik Bansal (@hbxnov)'s Twitter Profile Photo

Excited to share that I will join Meta FAIR (Seattle 🗻) for my final summer internship w/ Ramakanth!

🧑‍🎓 Looking forward to meeting new people, learning new things, and chatting about data, algorithms, and evaluation for LLM/VLM reasoning.