Roland Memisevic (@rolandmemisevic)'s Twitter Profile
Roland Memisevic

@rolandmemisevic

PhD, U Toronto 2008 (advisor Geoff Hinton)
Faculty @ MILA (until 2016)
Co-founder/CEO Twenty Billion Neurons (acquired 2021)
Qualcomm AI Research since 2021

ID: 843157972356403200

Link: https://www.iro.umontreal.ca/~memisevr/
Joined: 18-03-2017 17:51:40

154 Tweets

401 Followers

376 Following

Apratim Bhattacharyya (@apratimbh)'s Twitter Profile Photo

Qualcomm AI Research is looking for interns (PhD/Master's) in Toronto, in the area of LLMs, multi-modality and agents. Job posting: tinyurl.com/3tyavey5

Apratim Bhattacharyya (@apratimbh)'s Twitter Profile Photo

Excited to present our #ICLR2024 paper “Look, Remember and Reason: Grounded Reasoning in Videos with Language Models” (arxiv.org/pdf/2306.17778…). Our method, LRR, is currently ranked 1st on the STAR leaderboard: eval.ai/web/challenges… 1/3

Roland Memisevic (@rolandmemisevic)'s Twitter Profile Photo

A widely held belief is that difficult vision tasks require vision components like object detectors during inference. We show that we can push all of that into the training stage and get SOTA on challenging visual reasoning tasks. End-to-end visual reasoning on raw pixels.

Roland Memisevic (@rolandmemisevic)'s Twitter Profile Photo

I remember that well; I've been looking for it recently but couldn't find it. It also features Jitendra Malik, as far as I remember.

Roland Memisevic (@rolandmemisevic)'s Twitter Profile Photo

Even finicky visual reasoning tasks are best solved end-to-end, by distilling visual subroutines, like object detection, into the model during training... Check out our poster at #ICLR2024
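
To make that recipe concrete, here is a minimal PyTorch sketch of the general idea, not the actual LRR architecture: an auxiliary head is supervised with an off-the-shelf detector's outputs during training and simply dropped at inference, so the deployed model runs end-to-end on raw pixels. All names, shapes, and loss weights are illustrative assumptions.

    import torch
    import torch.nn as nn

    class PixelReasoner(nn.Module):
        # Hypothetical model: a pixel encoder feeding a small transformer,
        # with an answer head (kept at inference) and a detection head
        # (used only as a training-time distillation target).
        def __init__(self, d=256, n_objects=10, n_answers=100):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, d, kernel_size=8, stride=8),   # stand-in encoder
                nn.ReLU(),
                nn.Flatten(2))                               # (B, d, tokens)
            layer = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
            self.reasoner = nn.TransformerEncoder(layer, num_layers=2)
            self.answer_head = nn.Linear(d, n_answers)       # inference path
            self.detect_head = nn.Linear(d, n_objects * 4)   # training-only head

        def forward(self, pixels):
            tokens = self.backbone(pixels).transpose(1, 2)   # (B, tokens, d)
            h = self.reasoner(tokens).mean(dim=1)            # pooled features
            return self.answer_head(h), self.detect_head(h)

    model = PixelReasoner()
    pixels = torch.randn(2, 3, 64, 64)                       # raw-pixel input
    answer_logits, box_preds = model(pixels)

    # Training-time distillation: match the auxiliary head to boxes produced
    # by an external object detector; at test time that head is ignored.
    answers = torch.randint(0, 100, (2,))
    teacher_boxes = torch.randn(2, 40)                       # placeholder detector outputs
    loss = (nn.functional.cross_entropy(answer_logits, answers)
            + 0.5 * nn.functional.mse_loss(box_preds, teacher_boxes))
    loss.backward()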

Roland Memisevic (@rolandmemisevic)'s Twitter Profile Photo

It is tempting to view the context window of an LLM as an "array", with elements you can access like in an addressable memory. In this work, we argue that this entrenched, but wrong, view may be at the heart of problems like the inability of LLMs to length-generalize.
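
A toy numpy contrast, purely illustrative and not from the paper, makes the distinction concrete: array indexing reads an element exactly, by address, while attention retrieves a soft, similarity-weighted blend, by content.

    import numpy as np

    # The "array" view: positions are addresses you can read exactly.
    context = np.array([3, 1, 4, 1, 5])
    print(context[2])                           # position-based access -> 4

    # What attention actually does: content-based soft retrieval. Position
    # enters only indirectly, e.g. through learned positional encodings.
    rng = np.random.default_rng(0)
    keys = rng.normal(size=(5, 8))              # one key vector per token
    values = rng.normal(size=(5, 8))            # one value vector per token
    query = keys[2] + 0.1 * rng.normal(size=8)  # noisy query aimed at token 2

    weights = np.exp(keys @ query)
    weights /= weights.sum()                    # softmax over positions
    retrieved = weights @ values                # a weighted mixture, never an exact read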

JB (@iamjbdel)'s Twitter Profile Photo

#COLM24 is live! Head over to paper-central to browse the proceedings and access each paper's 🤗 paper-page, where you can check out discussions, open-source resources, GitHub links, and OpenReview peer review conversations—all in one spot. By the way, this new conference

Apratim Bhattacharyya (@apratimbh)'s Twitter Profile Photo

🚨 Don't miss our #NeurIPS2024 (D&B track) poster "Live Fitness Coaching as a Testbed for Situated Interaction". arXiv: arxiv.org/abs/2407.08101 Dataset: qualcomm.com/developer/soft… Code: github.com/Qualcomm-AI-re… (coming shortly) 📅 Fri 13 Dec, 4:30 p.m.-7:30 p.m. PST

Apratim Bhattacharyya (@apratimbh)'s Twitter Profile Photo

Join us at the CVPR 2025 Workshop on Vision-based Assistants in the Real-world (VAR) and tackle one of AI's biggest challenges: building systems that can comprehend and reason about dynamic, real-world scenes. Workshop Page: varworkshop.github.io #CVPR2025 1/2

Roland Memisevic (@rolandmemisevic)'s Twitter Profile Photo

There is a very simple possible explanation: the transformer architecture lacks a basic inductive bias, that of inductive ("step-by-step") inference. The lack of that bias leads to insane data inefficiency. If true, humans + RNNs will learn this task easily with much less data.

Roland Memisevic (@rolandmemisevic)'s Twitter Profile Photo

A few years ago I lost a bet that by 2021 you could have a video call with someone without being able to tell if it's AI or human. It's 2025 and we're halfway there (the missing half is making the AI model see properly through the camera, which is still an unsolved problem).

Roland Memisevic (@rolandmemisevic)'s Twitter Profile Photo

No existing AI model can talk to a user in the real world and understand what's happening _right now_. This is called situated AI and it's still a wide open problem.

Roland Memisevic (@rolandmemisevic)'s Twitter Profile Photo

Binary parity ("is the number of 1s in a bit-string even or odd?") is a common task used to show that transformers cannot generalize. Turns out, a random(!) RNN (with only the readout trained) can learn the task easily, and it can do so with as few as 2 (two) training examples...: arxiv.org/pdf/2505.21749
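
A minimal numpy sketch of that setup, with state size, weight scaling, and a small handful of training strings chosen by me (the paper's actual construction, and the two-example result, are in the arXiv link): the recurrent and input weights are random and frozen, and only a linear readout on the final hidden state is fit.

    import numpy as np

    rng = np.random.default_rng(0)
    n_hidden = 256
    W_in = rng.normal(0.0, 1.0, size=n_hidden)                 # random, frozen
    W_rec = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_hidden))

    def final_state(bits):
        # Run the frozen random RNN over the bit-string; the readout
        # only ever sees this final hidden state.
        h = np.zeros(n_hidden)
        for b in bits:
            h = np.tanh(W_rec @ h + W_in * (2 * b - 1))
        return h

    # A handful of labeled strings (parity: is the number of 1s odd?).
    train = [rng.integers(0, 2, size=10) for _ in range(8)]
    X = np.stack([final_state(s) for s in train])
    y = 2.0 * np.array([s.sum() % 2 for s in train]) - 1.0     # labels in {-1, +1}

    # Ridge-regularized least-squares readout: the only trained parameters.
    w = np.linalg.solve(X.T @ X + 1e-3 * np.eye(n_hidden), X.T @ y)

    # Check on fresh strings, including ones longer than the training length.
    for length in (10, 20, 40):
        s = rng.integers(0, 2, size=length)
        print(length, int(s.sum() % 2), int(final_state(s) @ w > 0))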