Wasi Ahmad (@ahmadwasi)'s Twitter Profile
Wasi Ahmad

@ahmadwasi

Sr. Research Scientist, NVIDIA

ID: 91817330

Link: http://wasiahmad.github.io

Joined: 22-11-2009 16:19:07

66 Tweets

216 Followers

248 Following

Wasi Ahmad (@ahmadwasi)

🚀 Our latest research paper on code representation learning, CodeSage, outperforms OpenAI text-embedding-3-large on Code2Code search and is on par with it on NL2Code search tasks! Dive into the techniques and insights on the blog: code-representation-learning.github.io

Wasi Ahmad (@ahmadwasi)

🚀 Dive into the cutting-edge research exploring keyphrase generation! The work delves deep into evaluating keyphrase generation on diversity, utility, faithfulness, and reference alignment.

Wasi Ahmad (@ahmadwasi)

Introducing RepoFormer! A repository-level code completion framework. The project was led by Di during his internship (summer'23) at AWS AI Labs. Read the paper to learn about the awesome work.

Wasi Ahmad (@ahmadwasi)

🔥 Introducing IllusionVQA 🔥 A challenging dataset to test VLMs' ability to locate and comprehend optical illusions. While humans achieved near-perfect accuracy, GPT4V, the best-performing VLM, achieved 63% and 49.7% accuracy on the comprehension and localization tasks, respectively.

Arif's Den 𝕏 (@arifsden)

My Twitter network, please help me spread this. #SaveBangladeshStudents #SaveBangladeshiStudents #StudentProtest #StarlinkforBangladesh

Di Wu (@diwu0162)

Introducing LongMemEval: a comprehensive, challenging, and scalable benchmark for testing the long-term memory of chat assistants.

📊 LongMemEval features:
• 📝 164 topics
• 💡 5 core memory abilities
• 🔍 500 manually created questions
• ⏳ Freely extensible chat history

Terry Yue Zhuo (@terryyuezhuo)

Today, we announce a collaboration between SWE Arena (Computer Intelligence) and Hugging Face (w/ Gradio). We believe that Hugging Face can help us shape the future of AI Software Engineering evaluations. We have now open-sourced the SWE Arena codebase to accelerate the development of

Mostofa Patwary (@mapatwary)

Nemotron-H base models (8B/47B/56B), a family of hybrid Mamba-Transformer LLMs, are now available on HuggingFace: huggingface.co/nvidia/Nemotro… huggingface.co/nvidia/Nemotro… huggingface.co/nvidia/Nemotro… Technical Report: arxiv.org/abs/2504.03624 Blog: research.nvidia.com/labs/adlr/nemo…

Aran Komatsuzaki (@arankomatsuzaki)

Nvidia presents Llama-Nemotron: Efficient Reasoning Models

- An open family of models w/ exceptional reasoning capabilities and inference efficiency

- Discusses the training procedure, incl. NAS from Llama 3 for accelerated inference, knowledge distillation, and continued

Wasi Ahmad (@ahmadwasi)

Tested K2-Think on LiveCodeBench (v6) — 2408:2505 (454 samples). Got pass@1[avg-of-10] = 60.8%, vs. Nemotron-32B at 69.8%. Disappointing to see such inflated/false scores being reported.