Dmytro Mishkin 🇺🇦 (@ducha_aiki)'s Twitter Profile
Dmytro Mishkin 🇺🇦

@ducha_aiki

Marrying classical CV and Deep Learning. I do things, which work, rather than being novel, but not working.

ID: 887278045761077248

Link: http://dmytro.ai

Joined: 18-07-2017 11:49:05

20,20K Tweets

22,22K Followers

673 Following

Dmytro Mishkin 🇺🇦 (@ducha_aiki)

MatChA: Cross-Algorithm Matching with Feature Augmentation

Paula Carbó Cubero, Alberto Jaenal Gálvez, André Mateus, José Araújo, Patric Jensfelt
tl;dr: allows you to match superpoint against SIFT (in terms of detector) if you decide to (but why?)
arxiv.org/abs/2506.22336
Paul-Edouard Sarlin @pesarlin.bsky.social (@pesarlin)

We released COLMAP v3.12, which adds long-awaited end-to-end support for multi-camera rigs and 360° panoramas 👀 COLMAP just got better at handling your robotics, AR/VR, or 360 data - try it and let us know! github.com/colmap/colmap/… Kudos to Johannes & team for this great work 🚀

Dmytro Mishkin 🇺🇦 (@ducha_aiki)

Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras

Petr Hruby, Marc Pollefeys
arxiv.org/abs/2506.22069
tl;dr: we have now minimal solvers in title, but (curved) line detection and matching is in todo list
#ICCV2025
Paul-Edouard Sarlin @pesarlin.bsky.social (@pesarlin)

Of course we welcome feature requests and community contributions! We have a long roadmap and certainly could use some help - please do reach out if you have relevant experience and time to spare for an amazing open-source impact 😀

Zach Mueller (@thezachmueller)

I have very exciting news to share with you all. 

One of the smartest people I know on FP8 training, xr-5 🐀 from the Hugging Face nanotron team, will be giving a guest lecture on "The Practitioner's Guide to FP8 Training" as part of the course!

This is a topic that is
#ICCV2025 (@iccvconference)

The deadline for camera-ready paper and copyright submission for #ICCV2025 is Friday, August 1st, 2025 @ 11:59 Pacific Time.

Lock in!
Dmytro Mishkin 🇺🇦 (@ducha_aiki)

MGSfM: Multi-Camera Geometry Driven Global Structure-from-Motion

Peilin Tao, Hainan Cui, Diantao Tu, Shuhan Shen

tl;dr: in title - global SfM for rigid camera rigs (e.g. autonomous driving)

arxiv.org/abs/2507.03306
Dmytro Mishkin 🇺🇦 (@ducha_aiki)

LACONIC: A 3D Layout Adapter for Controllable Image Creation

Léopold Maillard, Tom Durand, Adrien Ramanana Rahary, Maks Ovsjanikov

tl;dr: encoder 3D scene condition (camera pose, bboxes, objects) -> cross attention with SD1.5 -> train on HyperSim.

arxiv.org/abs/2507.03257
Dmytro Mishkin 🇺🇦 (@ducha_aiki)

Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

Yuqi Wu, Wenzhao Zheng, Jie Zhou, Jiwen Lu

tl;dr: transform current frame tokens into "memory tokens", if they are different from existing, add, else update current corresponding
arxiv.org/abs/2507.02863
Dmytro Mishkin 🇺🇦 (@ducha_aiki)

On the rankability of visual embeddings

Ankit Sonthalia, Arnas Uselis, Seong Joon Oh

tl;dr: one can discover "property ordering axis", such as age, etc in visual descriptors, often by having a couple of extreme examples. 
arxiv.org/abs/2507.03683
Dmytro Mishkin 🇺🇦 (@ducha_aiki)

Vision-Language Models Can't See the Obvious

Yasser Dahou, Ngoc Dung Huynh, Phuc H. Le-Khac, Wamiq Reyaz Para, Ankit Singh, Sanath Narayan

tl;dr: despite the title, they are pretty good, although far from perfect (see Table 3 screenshot). Cool benchmark
arxiv.org/abs/2507.04741
Lucas Beyer (bl16) (@giffmana)

Is there any research on what happens if, instead of adding rope into every attention layer, we add learnable posembs in every layer? Either always the same, or each layer its own set?

Lucas Beyer (bl16) (@giffmana)

This paper is pretty cool; through careful tuning, they show:
- you can train LLMs with batch-size as small as 1, just need smaller lr.
- even plain SGD works at small batch.
- Fancy optims mainly help at larger batch. (This reconciles discrepancy with past ResNet research.)
- At
Dmytro Mishkin 🇺🇦 (@ducha_aiki)

Lol, that's brilliant from Ben Recht recent post: "This result inevitably broke the internet. The reaction on Twitter was “Bro, that’s clearly wrong.” The reaction on Bluesky was “See, I told you so.”" argmin.net/p/are-develope…