Vittorio Ferrari (@vittoferraricv)'s Twitter Profile
Vittorio Ferrari

@vittoferraricv

Director of Science at Synthesia.io

ID: 1275184138664976384

Link: https://sites.google.com/view/vittoferrari
Joined: 22-06-2020 21:50:10

60 Tweets

5.5K Followers

12 Following

Vittorio Ferrari (@vittoferraricv):

Four papers accepted to #ICCV2023!
1/4

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

arxiv.org/abs/2306.09224

Dataset release coming soon!

With @tejmensink, Jasper Uijlings, Lluís Castrejón, Arushi Goel, Cadar, Howard Zhou, Fei Sha, A. Araujo
Vittorio Ferrari (@vittoferraricv):

Four papers accepted to #ICCV2023!
3/4

Agile Modeling: From Concept to Classifier in Minutes

We empower any user to develop a classifier for a subjective visual concept in under 30 minutes.

arxiv.org/abs/2302.12948

With O. Stretcu, E. Vendrow, and many others
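
For readers curious what a "classifier in minutes" workflow looks like mechanically, the sketch below shows the generic pattern such systems tend to rely on: a frozen image embedding plus a lightweight head trained on a handful of user labels. This is only an illustration of that pattern, not the paper's actual pipeline, and the random vectors stand in for real image embeddings.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder "image embeddings": in a real system these would come from a
# frozen vision backbone, not random numbers.
positives = rng.normal(loc=+0.5, size=(20, 512))   # images the user accepted as the concept
negatives = rng.normal(loc=-0.5, size=(20, 512))   # images the user rejected

X = np.vstack([positives, negatives])
y = np.array([1] * len(positives) + [0] * len(negatives))

# A lightweight linear head on top of frozen features is what makes a
# "minutes, not days" turnaround plausible.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Score a new (placeholder) embedding for the subjective concept.
query = rng.normal(size=(1, 512))
print("P(concept) =", clf.predict_proba(query)[0, 1])
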
Vittorio Ferrari (@vittoferraricv):

Four papers accepted to #ICCV2023!
2/4

CAD-Estate: Large-scale CAD Model Annotation in RGB Videos

>100k 3D objects annotated on RGB videos of complex scenes

Dataset release coming soon!

arxiv.org/abs/2306.09011

With Stefan Popov, Kevis-Kokitsi Maninis, Matthias Niessner

Vittorio Ferrari (@vittoferraricv):

We released our new “Encyclopedic VQA” dataset, which contains visual questions about detailed properties of fine-grained categories (1M VQA triplets total!). These pose a hard challenge for large foundation models.

arxiv.org/abs/2306.09224

github.com/google-researc…
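
To make the notion of a "VQA triplet" concrete, the snippet below sketches one as an (image, question, answer) record. The field names and values here are hypothetical, chosen only for illustration; the actual schema is defined by the released dataset in the GitHub repository linked above.

import json

# Hypothetical example of a single triplet; real field names are set by the
# released dataset, not by this sketch.
triplet = {
    "image_id": "example_0001",
    "question": "In which year was this church built?",
    "answer": "1887",
    "entity": "fine-grained landmark category the question is about",
}

# A 1M-triplet dataset would typically be streamed record by record
# (e.g. JSONL) rather than loaded into memory at once.
line = json.dumps(triplet)
record = json.loads(line)
print(record["question"], "->", record["answer"])
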
Emanuele Bugliarello (@ebugliarello):

Wouldn’t it be cool if AI could help us generate movies?🎬
We built a new benchmark to measure progress in this direction🍿

“StoryBench: A Multifaceted Benchmark for Continuous Story Visualization”

📄 arxiv.org/abs/2308.11606
👩‍💻 github.com/google/storybe…
📈 paperswithcode.com/dataset/storyb…
Vittorio Ferrari (@vittoferraricv):

Check out CAD-Estate: a large dataset with 3D object and room layout annotations on RGB videos of complex multi-object scenes (101k objects in total!).

github.com/google-researc…
arxiv.org/abs/2306.09011
arxiv.org/abs/2306.09077

With Stefan Popov, Kevis-Kokitsi Maninis, Matthias Niessner
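
To picture what a CAD annotation on video amounts to: placing a CAD model in a scene typically means storing a per-object pose (rotation, translation, scale) that maps model coordinates into the scene's coordinate frame. The snippet below is a generic illustration of that transform with made-up values; it is not CAD-Estate's actual file format.

import numpy as np

# Hypothetical annotation for one object: rotation, translation and scale
# that place a CAD model into the scene.
R = np.eye(3)                      # 3x3 rotation
t = np.array([0.5, 0.0, 2.0])      # translation in metres
s = 1.2                            # isotropic scale

# A few vertices of the CAD model in its own (object-centric) coordinates.
model_points = np.array([[0.0, 0.0, 0.0],
                         [0.1, 0.0, 0.0],
                         [0.0, 0.2, 0.0]])

# Transform model points into the scene: x_scene = s * R @ x_model + t
scene_points = (s * (R @ model_points.T)).T + t
print(scene_points)
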

Vittorio Ferrari (@vittoferraricv):

I am happy to share that I have joined Synthesia as Director of Science. Excited to start this new adventure! x.com/synthesiaIO/st…

Vittorio Ferrari (@vittoferraricv):

Three papers accepted to #NeurIPS
1/3

StoryBench: a new benchmark for text-to-video generation of stories to guide progress in assistive technology for filmmaking 🧑‍🎨

arxiv.org/abs/2308.11606
github.com/google/storybe…
x.com/ebugliarello/s…

With Emanuele Bugliarello, Hernan Moraldo, many others
Vittorio Ferrari (@vittoferraricv):

Three papers accepted to #NeurIPS
2/3

"Estimating Generic 3D Room Structures from 2D Annotations"

3D room layout annotations for 2246 videos (part of the CAD-Estate dataset).

arxiv.org/abs/2306.09077
github.com/google-researc…

With Denys Rozumnyi, Stefan Popov, Kevis-Kokitsi Maninis, Matthias Niessner

Vittorio Ferrari (@vittoferraricv):

Three papers accepted to #NeurIPS
3/3

NAVI: a dataset of image collections of objects, along with high-quality 3D object scans, near-perfect 2D-3D alignments, and accurate camera parameters.

arxiv.org/abs/2306.09109
navidataset.github.io

With Varun Jampani, Kevis-Kokitsi Maninis, others
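
The "near-perfect 2D-3D alignments" mentioned above come down to knowing, per image, camera parameters that project the 3D scan onto the pixels. The snippet below shows a standard pinhole projection as a generic illustration; the intrinsics and pose values are made up, and the dataset's own tooling should be used for real work.

import numpy as np

# Made-up pinhole intrinsics (focal lengths and principal point, in pixels).
K = np.array([[1000.0,    0.0, 320.0],
              [   0.0, 1000.0, 240.0],
              [   0.0,    0.0,   1.0]])

# Made-up world-to-camera pose: rotation R and translation t.
R = np.eye(3)
t = np.array([0.0, 0.0, 2.5])

# A few 3D points from an object scan, in world coordinates.
points_world = np.array([[ 0.0, 0.0,  0.0],
                         [ 0.1, 0.1,  0.0],
                         [-0.1, 0.05, 0.1]])

# Project: x_cam = R @ X + t, then divide by depth and apply intrinsics.
points_cam = (R @ points_world.T).T + t
uv = (K @ points_cam.T).T
uv = uv[:, :2] / uv[:, 2:3]        # pixel coordinates (u, v)
print(uv)
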
Vittorio Ferrari (@vittoferraricv):

We are running the Vision and Sports Summer School again this year! Prague, July 22-27. We offer a broad range of lectures on state-of-the-art computer vision techniques, as well as exciting sports activities such as volleyball, frisbee, and table tennis. cmp.felk.cvut.cz/summerschool20…

Vittorio Ferrari (@vittoferraricv):

Paper accepted to #CVPR2024!

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

Paper: arxiv.org/abs/2312.00878
Demo: huggingface.co/spaces/WalidBo…
Code: github.com/WalBouss/GEM

With Walid Bousselham, Felix Petersen, Hilde Kuehne
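
The phenomenon studied in this paper can be pictured as follows: if a vision-language model provides per-patch image embeddings and a text embedding in the same space, cosine similarity between them already yields a coarse localization heatmap. The sketch below illustrates only that basic idea, with random tensors as placeholder embeddings; the actual GEM method (see the code link above) goes well beyond this.

import torch
import torch.nn.functional as F

# Placeholder embeddings: in practice these come from a vision-language
# transformer (per-patch image tokens and an encoded text prompt).
num_patches, dim = 14 * 14, 512
patch_emb = torch.randn(num_patches, dim)
text_emb = torch.randn(dim)

# Cosine similarity between every patch and the text query.
sim = F.cosine_similarity(patch_emb, text_emb.unsqueeze(0), dim=-1)

# Reshape to the patch grid to obtain a coarse heatmap over the image.
heatmap = sim.reshape(14, 14)
print(heatmap.argmax())   # index of the patch that best matches the text
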
Vittorio Ferrari (@vittoferraricv):

Introducing HAMMR: hierarchical multimodal agents that handle a broad range of VQA tasks within a single system (counting, spatial reasoning, OCR, visual pointing, external knowledge, and more).

arxiv.org/abs/2404.05465

With Lluís Castrejón, @tejmensink, Howard Zhou, André Araujo, Jasper Uijlings
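
As a rough mental model of "hierarchical multimodal agents": a top-level agent inspects the question and delegates to a specialized tool (counting, OCR, knowledge lookup, and so on). The toy router below only illustrates that control flow with stub tools; it is not the HAMMR implementation.

# Stub tools standing in for specialized agents.
def count_objects(image, question):
    return "3"              # a real tool would run a detector and count boxes

def read_text(image, question):
    return "STOP"           # a real tool would run OCR on the image

def lookup_knowledge(image, question):
    return "built in 1887"  # a real tool would query an external knowledge base

TOOLS = {
    "count": count_objects,
    "ocr": read_text,
    "knowledge": lookup_knowledge,
}

def route(image, question):
    """Toy top-level agent: pick a tool from keywords in the question."""
    q = question.lower()
    if "how many" in q:
        return TOOLS["count"](image, question)
    if "say" in q or "written" in q:
        return TOOLS["ocr"](image, question)
    return TOOLS["knowledge"](image, question)

print(route(image=None, question="How many dogs are in the picture?"))
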
Synthesia 🎥 (@synthesiaio):

AI Avatars have learned to interpret text now. 😬 Our soon-to-be-public EXPRESS-1 AI model enables Synthesia avatars to understand and adjust to the script automatically. 🤯 Join the pre-launch tech chat with: Victor Riparbelli, Matthias Niessner & Jon Starck 👀 x.com/i/spaces/1YpJk…

Vittorio Ferrari (@vittoferraricv):

Our EXPRESS-1 AI model enables @Synthesiaio avatars to understand and adjust to the script automatically 💥

This is a big milestone, so tune in tomorrow for a pre-launch chat with Matthias Niessner, Jon Starck, Victor Riparbelli and @AlexVoica

X Spaces event link: x.com/i/spaces/1YpJk…
Vittorio Ferrari (@vittoferraricv):

Come to poster 354 at #CVPR2024 to see our work! 10:30am today, Arch 4A-E

"Grounding Everything: Emerging Localization Properties in Vision-Language Transformers"

Paper: arxiv.org/abs/2312.00878
Demo: huggingface.co/spaces/WalidBo…
Code: github.com/WalBouss/GEM
Vittorio Ferrari (@vittoferraricv):

Paper accepted to the “Multimodal Algorithmic Reasoning” NeurIPS workshop!

HAMMR: Hierarchical multimodal agents for handling many diverse VQA tasks in a single system

arxiv.org/abs/2404.05465

<a href="/LluisCastrejon/">Lluís Castrejón</a> @tejmensink <a href="/howardzzh/">Howard Zhou</a> <a href="/andrefaraujo/">André Araujo</a> <a href="/JRRU/">Jasper Uijlings</a>
Vittorio Ferrari (@vittoferraricv):

I am happy to announce that I have joined Meta Reality Labs as a Principal Research Scientist, working on Spatial AI to power AR/MR experiences on Meta's wearable devices. It's the start of another adventure, and I thank all my new colleagues for making me feel welcome!