Ahmet Iscen (@ahmetius) Twitter Tweets • TwiCopy

Alireza Fathi

2 years ago

🚀Introducing AVIS: a groundbreaking system that couples #LLM powered planning & reasoning with external tools, resulting in #StateOfTheArt performance on VQA datasets that demand external knowledge! 🧠🔍

Likes: 10 · Replies: 0 · Reposts: 3

Ahmet Iscen

@ahmetius

2 years ago

How do we find information on the web? We try to address this question in AVIS, by coupling an #LLM-based reasoner and planner with external tools, e.g. search. This results in a significant performance increase on challenging fine-grained VQA datasets, where SOTA VLMs struggle.

Likes: 10 · Replies: 0 · Reposts: 2

Alireza Fathi

@alirezafathi

2 years ago

Here is our Google AI blog post on AVIS, a Large Language Model Agent that achieves state-of-the-art results on visual information seeking tasks. Ziniu Hu Ahmet Iscen Chen Sun Cordelia Schmid

Likes: 18 · Replies: 0 · Reposts: 3

Sundar Pichai

@sundarpichai

2 years ago

Introducing Gemini 1.0, our most capable and general AI model yet. Built natively to be multimodal, it’s the first step in our Gemini-era of models. Gemini is optimized in three sizes - Ultra, Pro, and Nano. Gemini Ultra’s performance exceeds current state-of-the-art results on

Likes: 23,23K · Replies: 962 · Reposts: 3,3K

Ziniu Hu

@acbuller

2 years ago

Interested in LLM + Tool-Use, via Tree-Search? This afternoon in #NeurIPS2023, #215, I'll present "AVIS: Autonomous Visual Information Seeking with Large Language Model Agent" (blog.research.google/2023/08/autono…) Feel free to drop by and chat.

Likes: 146 · Replies: 2 · Reposts: 26

Ahmet Iscen

@ahmetius

2 years ago

Looking forward to presenting RECO at #ICLR2024!

Likes: 13 · Replies: 0 · Reposts: 1

Ahmet Iscen

@ahmetius

2 years ago

We will be organizing the 1st Tool-Augmented VIsion (TAVI) Workshop at #CVPR2024. We are looking forward to having an exciting list of keynote speakers covering various topics about tool-use and retrieval augmented models. More details at: sites.google.com/view/tavi-cvpr…

Likes: 34 · Replies: 1 · Reposts: 9

Ahmet Iscen

@ahmetius

2 years ago

VLMs are great, but can we use their generative capabilities for web-scale entity recognition? GERALD leverages VLMs to generate unambiguous, language-based and discriminative codes for 6M-scale entity recognition. Looking forward to presenting GERALD at CVPR24!

Likes: 17 · Replies: 0 · Reposts: 0

Ahmet Iscen

@ahmetius

a year ago

🔥 Calling all #CVPR2024 attendees! 🔥 Join us for the 1st Tool-Augmented VIsion (TAVI) Workshop on Monday morning in Summit 321! 💡 5 inspiring keynote talks 🎨 5 invited posters from the main conference Don't miss out! ➡️ More info: sites.google.com/corp/view/tavi…

Likes: 21 · Replies: 1 · Reposts: 7

Ahmet Iscen

@ahmetius

a year ago

Xuhui Jia, author of Instruct-Imagen (CVPR24 oral paper), will present his work in 20 minutes! Come to Summit 321! #CVPR24

Likes: 4 · Replies: 0 · Reposts: 0

Ahmet Iscen

@ahmetius

a year ago

Cordelia Schmid will now present "Multistage reasoning for video understanding and scene generation" in Summit 321! #CVPR2024

Likes: 0 · Replies: 0 · Reposts: 0

Arsha Nagrani

@nagraniarsha

a year ago

Ahmet Iscen, Cordelia Schmid and I are looking to hire a student researcher at @GoogleDeepMind this fall! Start: September. Loc: Cam, USA but flexible. Unfortunately I’m not at CVPR this year (2 month old baby!!👶) but pls find Ahmet or Cordelia at #CVPR2025 if interested!

Likes: 81 · Replies: 6 · Reposts: 10

Yisong Yue

@yisongyue

a year ago

In case you missed our #ICML2024 oral presentation, check out SceneCraft, an LLM Agent for writing Blender-executable code that can render complex scenes with up to a hundred 3D assets. Paper: arxiv.org/abs/2403.01248 The SceneCraft agent is able to do complex spatial planning

Likes: 80 · Replies: 3 · Reposts: 19

Dmytro Mishkin 🇺🇦

@ducha_aiki

a year ago

AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval Pavel Šuma Giorgos Kordopatis-Zilos Ahmet Iscen @giotolias tl;dr: global+local similarity via transformer->binarize descrs+distill. Crucial:train with random number of descriptors. arxiv.org/abs/2408.03282

Likes: 25 · Replies: 1 · Reposts: 11

Alireza Fathi

@alirezafathi

a year ago

Our team at Google DeepMind is seeking a Research Scientist with a strong publication record (multiple first-author papers) on multi-modal LLMs in top ML venues like NeurIPS, ICLR, CVPR. Email me at [email protected] Cordelia Schmid

Likes: 383 · Replies: 4 · Reposts: 49

Ahmet Iscen

@ahmetius

a year ago

Want to work on the future of multimodal AI? Our Google DeepMind team in Grenoble, led by Cordelia Schmid, is hiring interns for multimodal AI research (long-video understanding and visual reasoning in 2D and 3D). Email ai.gnb.hiring@gmail.com or find me at #NeurIPS2024!

Likes: 183 · Replies: 5 · Reposts: 18

Alireza Fathi

@alirezafathi

a year ago

Our team at Google DeepMind Foundational Research is hiring full-time Research Scientists and Research Interns! Multimodal, Reasoning, self-improving agents, Video Understanding. Looking for candidates with strong papers at top ML and CV conferences. Email: [email protected]

Likes: 623 · Replies: 13 · Reposts: 66

Alireza Fathi

@alirezafathi

4 months ago

Our team at Google DeepMind Foundational Research has an opening for a full-time Research Scientist! Areas of Interest are Multimodal, 3D and Spatial Reasoning, Self-improving Agents. Looking for candidates with strong publications at top ML and CV conferences. Email:

Likes: 352 · Replies: 1 · Reposts: 28