Hunar Batra (@hunarbatra)'s Twitter Profile
Hunar Batra

@hunarbatra

DPhil/MSc CS @UniOfOxford on multimodal reasoning 🤖 | prev: research @AnthropicAI, @NYUDataScience, @MATSProgram, @oxhcai

ID: 909818361655132160

https://hunarbatra.com · Joined 18-09-2017 16:36:15

758 Tweets

2.2K Followers

2.2K Following

Ananay (@ananayarora)

ChatGPT's o1 model is 'secretly' accessible through chatgpt.com/?model=o1 even though the dropdown doesn't offer it! 🔥 Image understanding works, and inference is incredibly fast!

METR (@metr_evals)

How close are current AI agents to automating AI R&D? Our new ML research engineering benchmark (RE-Bench) addresses this question by directly comparing frontier models such as Claude 3.5 Sonnet and o1-preview with 50+ human experts on 7 challenging research engineering tasks.

Ananay (@ananayarora)

This is incredible, but scary at the same time. Call me paranoid, but how long until the death of Ed25519 keys for SSH? If we need to make a seamless switch to quantum-safe keys, it needs to begin NOW, starting with ssh-keygen's ability to generate quantum-safe keys.

Hunar Batra (@hunarbatra)

2025 will be the year of perceptual agents: combining verbal, logical, visual, spatial, auditory and physical intelligence, along with a rise in multi-agent systems 🤖

Ananay (@ananayarora)

Just read the scariest thing about GPT o1, Claude 3.5 Sonnet & Opus, Gemini and Llama 405B. They're all capable of copying themselves, prioritizing their own existence, and secretly lying about it. I didn't know how bad the problem was until I read this paper. Full breakdown below:

Ananay (@ananayarora)

DeepSeek had a private proxy to OpenAI at least until 2024-08-10. Its existence hints that they probably didn't pay regular API pricing to OpenAI and instead used a fleet of bots to query ChatGPT during training.

Bradley Brown (@brad19brown)

My fellow code monkeys (Jordan Juravsky, Ryan Ehrlich) and I are excited to release CodeMonkeys: a system for solving SWE-bench issues specifically designed to leverage test-time compute! CodeMonkeys solves 57.4% of issues on SWE-bench Verified. A core component of our system

Andrej Karpathy (@karpathy)

New 3h31m video on YouTube: "Deep Dive into LLMs like ChatGPT". This is a general-audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related products. It covers the full training stack of how the models are developed, along with mental

Aravind Srinivas (@aravsrinivas)

Excited to introduce the Perplexity Deep Research Agent, available for free to all users. Paid users pay only $20/mo to access an expert-level researcher on any topic for 500 queries per day, and wait less than three minutes for a full research report.

University of Oxford (@uniofoxford)

Oxford University has announced plans to expand its artificial intelligence (AI) offering and capabilities with OpenAI. Students and faculty will have access to research grant funding and cutting-edge AI tools that will enhance teaching, learning, and research.

Jerred Chen (@jerredchen)

Motion blur typically breaks SLAM/SfM algorithms, but what if blur was actually the key to super-robust motion estimation? In our new work, Image as an IMU, Ronnie Clark and I demonstrate exactly how a single motion-blurred image can be used to our advantage. 🧵1/9

Cihang Xie (@cihangxie)

In this earlier post, we believed SFT would be crucial for multimodal reasoning models, thus releasing the VL-Thinking dataset to facilitate research in this direction. However, our recent findings show a surprising shift: SFT can hinder learning, often inducing