Hritik Bansal (@hbxnov)'s Twitter Profile
Hritik Bansal

@hbxnov

CS PhD @UCLA | Prev: Bachelors @IITDelhi, Intern @GoogleDeepMind @AmazonScience | Multimodal ML, Language models | Cricket🏏

ID: 998780848613683200

Website: http://sites.google.com/view/hbansal | Joined: 22-05-2018 04:21:25

623 Tweets

1.1K Followers

1.1K Following

Pratyush Maini (@pratyushmaini)'s Twitter Profile Photo

Join me & @hbxnov at #ICLR2025 for our very purple poster on risks of LLM evals by private companies!

🕒 Today, 10am | 🪧 #219

Beyond Llama drama, LMSYS incorporation & ARC-AGI train/test fiasco, we discuss irreducible biases—even when firms act in good faith. Come say hi! 💜
Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

The Leaderboard Illusion

- Identifies systematic issues that have resulted in a distorted playing field of Chatbot Arena

- Identifies 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release
Mehran Kazemi (@kazemi_sm)'s Twitter Profile Photo

Upon some requests, we now have a BBEH Mini with 460 examples (20 per task) for faster and cheaper experimentation.

The set can be downloaded from: github.com/google-deepmin…
The results are reported in Table 3 of arxiv.org/pdf/2502.19187

Hritik Bansal (@hbxnov)'s Twitter Profile Photo

📢 Submit your cool ideas as short or long papers to the first workshop on the foundations of long video generation, understanding, and evaluation 🚀 ramoscsv.github.io/longvid_founda…

Hritik Bansal (@hbxnov)'s Twitter Profile Photo

Great to see that the latest #GeminiDiffusion release benchmarks on our challenging general-purpose reasoning Big Bench Extra Hard dataset! 

It is now available on HF 🤗: huggingface.co/datasets/BBEH/…
Eval code: github.com/google-deepmin…
Hritik Bansal (@hbxnov)'s Twitter Profile Photo

🧑‍🍳Very excited to present LaViDa, one of the first diffusion language models for multimodal understanding! 

🌟Unlike autoregressive LMs, you can control the speed-quality tradeoff, and solve constrained generation problems out of the box 📦
🌟 We also release LaViDa-Reason, a
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

LaViDa: A Large Diffusion Language Model for Multimodal Understanding

"We introduce LaViDa, a family of VLMs built on DMs. We build LaViDa by equipping DMs with a vision encoder and jointly fine-tune the combined parts for multimodal instruction following. "

"LaViDa achieves
Ryan Marten (@ryanmart3n)'s Twitter Profile Photo

Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals.

We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data
Tanmay Parekh (@tparekh97)'s Twitter Profile Photo

🚨 New work: LLMs still struggle at Event Detection due to poor long-context reasoning and an inability to follow task constraints, causing precision and recall errors.

We introduce DiCoRe — a lightweight 3-stage Divergent-Convergent reasoning framework to fix this.🧵📷 (1/N)
Hritik Bansal (@hbxnov)'s Twitter Profile Photo

🥳 Excited to share that VideoPhy-2 has been awarded 🏆 Best Paper at the World Models Workshop (physical-world-modeling.github.io) #ICML2025! Looking forward to presenting it as a contributed talk at the workshop! 😃

w/ Clark Peng, Yonatan Bitton, Roman, Aditya Grover, Kai-Wei Chang

Hritik Bansal (@hbxnov)'s Twitter Profile Photo

Excited to share that I will join Meta FAIR (Seattle 🗻) for my final summer internship w/ Ramakanth!

🧑‍🎓 Looking forward to meeting new people, learning new things, and chatting about data, algorithms, and evaluation for LLM/VLM reasoning.