Ahmet Iscen (@ahmetius)'s Twitter Profile
Ahmet Iscen

@ahmetius

Research scientist at Google DeepMind

ID: 1179672664002183168

Joined: 03-10-2019 08:21:04

44 Tweets

661 Followers

168 Following

Alireza Fathi (@alirezafathi)

🚀Introducing AVIS: a groundbreaking system that couples #LLM powered planning & reasoning with external tools, resulting in #StateOfTheArt performance on VQA datasets that demand external knowledge! 🧠🔍

Ahmet Iscen (@ahmetius)

How do we find information on the web? We try to address this question in AVIS, by coupling #LLM-based reasoner and planner with external tools, e.g. search. This results in a significant performance increase in challenging fine-grained VQA datasets, where SOTA VLMs struggle.
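For readers new to the idea, here is a minimal sketch of the generic pattern the tweet describes: a planner picks the next tool call, tool outputs accumulate in a working memory, and a reasoner turns that evidence into an answer. This is not AVIS's implementation; `plan`, `reason`, `fake_search`, and the toy index are hypothetical stand-ins for LLM calls and real search tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical tool: a stand-in for a real web or image search API.
def fake_search(query: str) -> str:
    toy_index = {"capital of france": "Paris"}
    return toy_index.get(query.lower(), "no result")


@dataclass
class WorkingMemory:
    # (tool name, tool input, tool output) triples collected so far.
    steps: List[Tuple[str, str, str]] = field(default_factory=list)


def plan(question: str, memory: WorkingMemory) -> Optional[Tuple[str, str]]:
    """Stub planner (an LLM call in a real system): next tool call, or None to stop."""
    if not memory.steps:
        return ("search", question)
    return None


def reason(question: str, memory: WorkingMemory) -> str:
    """Stub reasoner (an LLM call in a real system): answer from gathered evidence."""
    for _tool, _inp, output in reversed(memory.steps):
        if output != "no result":
            return output
    return "unknown"


def answer(question: str, tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    memory = WorkingMemory()
    for _ in range(max_steps):
        action = plan(question, memory)
        if action is None:  # planner decides it has enough evidence
            break
        tool_name, tool_input = action
        memory.steps.append((tool_name, tool_input, tools[tool_name](tool_input)))
    return reason(question, memory)


print(answer("Capital of France", {"search": fake_search}))  # Paris
```

Capping the loop at `max_steps` keeps a confused planner from calling tools forever; in a real agent the planner would read the tool outputs in `memory` when choosing its next step.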

Sundar Pichai (@sundarpichai)

Introducing Gemini 1.0, our most capable and general AI model yet. Built natively to be multimodal, it’s the first step in our Gemini-era of models. Gemini is optimized in three sizes: Ultra, Pro, and Nano.

Gemini Ultra’s performance exceeds current state-of-the-art results on…

Ziniu Hu (@acbuller)

Interested in LLM + Tool-Use, via Tree-Search? This afternoon at #NeurIPS2023, #215, I'll present "AVIS: Autonomous Visual Information Seeking with Large Language Model Agent" (blog.research.google/2023/08/autono…). Feel free to drop by and chat.

Ahmet Iscen (@ahmetius)

We will be organizing the 1st Tool-Augmented VIsion (TAVI) Workshop at #CVPR2024. We are looking forward to having an exciting list of keynote speakers covering various topics about tool-use and retrieval augmented models. More details at: sites.google.com/view/tavi-cvpr…

Ahmet Iscen (@ahmetius)

VLMs are great, but can we use their generative capabilities for web-scale entity recognition? GERALD leverages VLMs to generate unambiguous, language-based and discriminative codes for 6M-scale entity recognition. Looking forward to presenting GERALD at CVPR24!

Ahmet Iscen (@ahmetius)

🔥 Calling all #CVPR2024 attendees! 🔥  

Join us for the 1st Tool-Augmented VIsion (TAVI) Workshop on Monday morning in Summit 321!   

💡 5 inspiring keynote talks 
🎨 5 invited posters from the main conference

Don't miss out! 
➡️ More info: sites.google.com/corp/view/tavi…

Ahmet Iscen (@ahmetius)

Xuhui Jia, an author of Instruct-Imagen (CVPR24 oral paper), will present his work in 20 minutes! Come to Summit 321! #CVPR24

Ahmet Iscen (@ahmetius)

Cordelia Schmid will now present "Multistage reasoning for video understanding and scene generation" in Summit 321! #CVPR2024

Arsha Nagrani (@nagraniarsha)

Ahmet Iscen, Cordelia Schmid and I are looking to hire a student researcher at Google DeepMind this fall! Start: September. Location: Cam, USA, but flexible. Unfortunately I’m not at CVPR this year (2-month-old baby!! 👶), but pls find Ahmet or Cordelia at #CVPR2025 if interested!

Yisong Yue (@yisongyue)

In case you missed our #ICML2024 oral presentation, check out SceneCraft, an LLM Agent for writing Blender-executable code that can render complex scenes with up to a hundred 3D assets.

Paper: arxiv.org/abs/2403.01248

The SceneCraft agent is able to do complex spatial planning…

Dmytro Mishkin 🇺🇦 (@ducha_aiki)

AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval

Pavel Šuma, Giorgos Kordopatis-Zilos, Ahmet Iscen, @giotolias

tl;dr: global+local similarity via transformer -> binarize descriptors + distill. Crucial: train with a random number of descriptors.
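As a rough illustration only (not the AMES transformer, and assuming plain sign binarization), the sketch below packs each dimension's sign into one bit, scores a full-precision query against a binary database code (one common form of asymmetric comparison), and samples a random-sized subset of local descriptors as a training-time augmentation. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def binarize(desc: Sequence[float]) -> int:
    """Pack the sign of each dimension into one bit of an int (1 if > 0, else 0)."""
    code = 0
    for i, x in enumerate(desc):
        if x > 0:
            code |= 1 << i
    return code


def asymmetric_score(query: Sequence[float], code: int) -> float:
    """Full-precision query vs. binary database code: dot product with the +/-1 vector the bits encode."""
    return sum(q if (code >> i) & 1 else -q for i, q in enumerate(query))


def hamming(a: int, b: int) -> int:
    """Binary-vs-binary distance, for comparison with the asymmetric score."""
    return bin(a ^ b).count("1")


def sample_descriptors(descs: List[List[float]], low: int, high: int,
                       rng: random.Random) -> List[List[float]]:
    """Keep a random number k of local descriptors, low <= k <= high (clamped to what is available)."""
    n = len(descs)
    k = rng.randint(min(low, n), min(high, n))
    return rng.sample(descs, k)


db_code = binarize([0.1, -0.3, 0.7])                 # bits 1, 0, 1 -> 0b101 == 5
print(asymmetric_score([0.5, -1.0, 2.0], db_code))  # 3.5
```

Storing one bit per dimension instead of a 32-bit float shrinks each descriptor 32×, e.g. 128 bytes instead of 4096 bytes for a 1024-d vector, while the query side keeps full precision.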

arxiv.org/abs/2408.03282

Alireza Fathi (@alirezafathi)

Our team at Google DeepMind is seeking a Research Scientist with a strong publication record (multiple first-author papers) on multi-modal LLMs in top ML venues like NeurIPS, ICLR, CVPR. Email me at [email protected] Cordelia Schmid

Ahmet Iscen (@ahmetius)

Want to work on the future of multimodal AI? Our Google DeepMind team in Grenoble, led by Cordelia Schmid, is hiring interns for multimodal AI research (long-video understanding and visual reasoning in 2D and 3D). Email ai.gnb.hiring@gmail.com or find me at #NeurIPS2024!

Alireza Fathi (@alirezafathi)

Our team at Google DeepMind Foundational Research is hiring full-time Research Scientists and Research Interns! Multimodal, Reasoning, Self-improving Agents, Video Understanding. Looking for candidates with strong papers at top ML and CV conferences. Email: [email protected]

Alireza Fathi (@alirezafathi)

Our team at Google DeepMind Foundational Research has an opening for a full-time Research Scientist! Areas of Interest are Multimodal, 3D and Spatial Reasoning, Self-improving Agents. Looking for candidates with strong publications at top ML and CV conferences. Email: