Tanveer Hannan (@hannan_tanveer)'s Twitter Profile
Tanveer Hannan

@hannan_tanveer

PhD student at LMU Munich, working on multimodal deep learning for computer vision and NLP

ID: 2241926006

Link: https://tanveer81.github.io/ · Joined: 12-12-2013 06:43:51

122 Tweets

37 Followers

223 Following

lmarena.ai (formerly lmsys.org) (@lmarena_ai)'s Twitter Profile Photo

News: Qwen Qwen-Max jumps to #7, surpassing DeepSeek-v3! 🔥

Highlights:
- Matches top proprietary models (GPT-4o/Sonnet 3.5)
- +30 pts vs DeepSeek-v3 in coding, math, and hard prompts

ChatGLM GLM-4-Plus also breaks into the top 10; Chinese AI companies are closing the gap
Tanveer Hannan (@hannan_tanveer)'s Twitter Profile Photo

I recently reviewed Transformer², a novel approach to self-adaptive AI. The model dynamically adjusts its weight matrices for task-specific optimization using reinforcement learning, marking a significant advancement in adaptive LLMs. sakana.ai/transformer-sq… #AI #MachineLearning
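The tweet doesn't spell out the mechanism, so for context: the Transformer² paper describes "Singular Value Fine-tuning", where a frozen weight matrix is adapted by rescaling its singular values with a small task-specific vector trained via RL. A minimal PyTorch sketch of that idea; the names and shapes are illustrative, not from the paper's code:

```python
import torch

def svf_adapt(W: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Rescale the singular values of a frozen weight matrix W by a
    task-specific vector z (one entry per singular value)."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return U @ torch.diag(S * z) @ Vh

# Toy usage: the base weight stays frozen; only z is trained
# (with reinforcement learning, per the paper's description).
W = torch.randn(64, 64)
z = torch.ones(64, requires_grad=True)
W_task = svf_adapt(W, z)
```

Because z has only as many entries as singular values, the per-task parameter count stays tiny compared to full fine-tuning, which is what makes the dynamic, task-specific switching practical.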

Brett Adcock (@adcock_brett)'s Twitter Profile Photo

Google added infinite memory to Gemini, allowing it to remember and refer to past interactions while answering. It's available in Gemini Advanced and can be tweaked by editing/deleting chats. OpenAI is also working on a similar feature, but there is no release yet.

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

Okay so I didn't super expect the results of the GPT4 vs. GPT4.5 poll from earlier today 😅, of this thread: x.com/karpathy/statu…

✅ Question 1: GPT4.5 is A; 56% of people prefer it.
❌ Question 2: GPT4.5 is B; 43% of people prefer it.
❌ Question 3: GPT4.5 is A; 35% of people

AK (@_akhaliq)'s Twitter Profile Photo

VisualThinker-R1-Zero

R1-Zero's Aha Moment on just a 2B non-SFT Model

VisualThinker-R1-Zero is a replication of DeepSeek-R1-Zero in visual reasoning. It successfully observes the emergent “aha moment” and increased response length in visual reasoning on just a 2B non-SFT model.
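For readers unfamiliar with the R1-Zero recipe being replicated: training starts from a base (non-SFT) model and relies on simple rule-based rewards rather than a learned reward model. A hedged sketch of what such rewards often look like; the exact tags, weighting, and matching logic here are assumptions, not taken from the VisualThinker-R1-Zero release:

```python
import re

def format_reward(response: str) -> float:
    """Reward responses that follow a <think>...</think><answer>...</answer>
    template (illustrative pattern, not the project's exact one)."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, response.strip(), re.DOTALL) else 0.0

def accuracy_reward(response: str, gold: str) -> float:
    """Exact-match check on the extracted answer (a simplification;
    real setups often use task-specific verifiers)."""
    m = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0

def total_reward(response: str, gold: str) -> float:
    # Equal weighting is an assumption for this sketch.
    return accuracy_reward(response, gold) + format_reward(response)
```

The "aha moment" refers to the model spontaneously lengthening its reasoning traces and revisiting its own steps under this kind of sparse, rule-based reward, without any supervised fine-tuning beforehand.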
Tanveer Hannan (@hannan_tanveer)'s Twitter Profile Photo

Check out the #CVPR2025 paper on long video understanding. It achieves SOTA with a much simpler and more efficient end-to-end approach.

Tanveer Hannan (@hannan_tanveer)'s Twitter Profile Photo

Effective long-context comprehension remains a significant hurdle for LLMs. Meta's forthcoming Llama 4 aims to address this with its iRoPE architecture. I am looking forward to testing it in more real-life settings such as streaming video.
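Hedged background, since Llama 4 was unreleased at the time of this tweet: public descriptions of iRoPE involve interleaving standard RoPE attention layers with layers that use no positional encoding at all, which is meant to help length generalization. A toy PyTorch sketch of that interleaving pattern; the every-fourth-layer schedule and helper names are assumptions:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Standard rotary position embedding on x of shape (seq_len, dim);
    dim must be even."""
    seq_len, dim = x.shape
    pos = torch.arange(seq_len, dtype=torch.float32)[:, None]
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = pos * freqs                      # (seq_len, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin        # rotate each 2-D pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def maybe_rope(x: torch.Tensor, layer_idx: int, every: int = 4) -> torch.Tensor:
    """Interleaving sketch: skip positional encoding on every `every`-th
    layer (the 'NoPE' layers), apply RoPE elsewhere."""
    return x if layer_idx % every == every - 1 else rope(x)
```

The intuition is that the position-free layers carry content-based, length-agnostic attention, while the RoPE layers keep local ordering sharp.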

Laura Leal-Taixe (@lealtaixe)'s Twitter Profile Photo

The time for new architectures is over? Not quite! SeNaTra, a native segmentation backbone, is waiting; let's see how it works 🧵 arxiv.org/abs/2505.16993

Mohaiminul (Emon) Islam (@mmiemon)'s Twitter Profile Photo

Had a great time presenting at the GenAI session @CiscoMeraki, thanks Nahid Alam @ CVPR 2025 for the invite 🙏

Catch us at #CVPR2025:
📌 BIMBA: arxiv.org/abs/2503.09590 (June 15, 4–6PM, Poster #282)
📌 ReVisionLLM: arxiv.org/abs/2411.14901 (June 14, 5–7PM, Poster #307)

Gedas Bertasius Tanveer Hannan

Mohaiminul (Emon) Islam (@mmiemon)'s Twitter Profile Photo

Great to see a lot of interest among the video understanding community about ReVisionLLM! If you missed it, check out arxiv.org/abs/2411.14901

Tanveer Hannan
Tanveer Hannan (@hannan_tanveer)'s Twitter Profile Photo

🚀 Check out our latest work, ReVisionLLM, now featured on the MCML blog! 🔍 A Vision-Language Model for accurate temporal grounding in hour-long videos. 👉 mcml.ai/news/2025-06-2… #VisionLanguage #MultimodalAI #MCML #CVPR2025
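To make the recursive idea concrete, here is a schematic reading of the coarse-to-fine design, not the paper's actual code: the model scans the hour-long video at low temporal resolution, then repeatedly zooms into promising segments until they are short enough to ground precisely. `score_fn` is a hypothetical stand-in for the VLM call:

```python
from typing import Callable, List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)

def recursive_ground(query: str,
                     segment: Segment,
                     score_fn: Callable[[str, Segment], List[Segment]],
                     min_len: float = 10.0) -> List[Segment]:
    """Coarse-to-fine temporal grounding sketch: ask the model (score_fn)
    for promising sub-segments, then recurse into each until segments
    are short enough to return directly. score_fn must return strictly
    smaller sub-segments for the recursion to terminate."""
    start, end = segment
    if end - start <= min_len:
        return [segment]
    hits = score_fn(query, segment)  # candidate sub-segments from the VLM
    results: List[Segment] = []
    for sub in hits:
        results.extend(recursive_ground(query, sub, score_fn, min_len))
    return results
```

The appeal of this recursive structure for hour-long videos is that the model never has to attend over the full frame sequence at fine resolution; each call only sees one manageable window.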

Mohaiminul (Emon) Islam (@mmiemon)'s Twitter Profile Photo

🚀 On the job market! Final-year PhD @ UNC Chapel Hill working on computer vision, video understanding, multimodal LLMs & AI agents. 2x Research Scientist Intern @ Meta.

🔍 Seeking Research Scientist/Engineer roles!
🔗 md-mohaiminul.github.io
📧 mmiemon [at] cs [dot] unc [dot] edu