Ohad Rubin (@ohadrubin)'s Twitter Profile
Ohad Rubin

@ohadrubin

Ph.D. student. Researching Natural Language Processing at Tel Aviv University. Let's have more paperclips? 📎⏩

ID: 28635924

Link: https://ohadrubin.github.io/

Joined: 03-04-2009 19:42:47

3.3K Tweets

841 Followers

3.3K Following

Tomer Wolfson (@tomerwolfson)

Deep research systems can't handle questions involving dozens of documents. Let me show you why this is (still) true 🧵 and what does it all have to do with Grace Kelly? (1/)

Bepis™ 🔀 (@underwaterbepis)

Neurosama (LLM Vtuber) was just called a cl*ank*r and in response brute forced her filter to send death threats in response and then got stuck in a loop about wanting to stop existing

Daniel Nakov (@dnak0v)

just remove the following string from claude-code cli.js and it will always just read full files: offset:h.number().optional().describe("The line number to start reading from. Only provide if the file is too large to read at once"),limit:h.number().optional().describe("The

Ross Taylor (@rosstaylor90)

Most takes on RL environments are bad. 1. There are hardly any high-quality RL environments and evals available. Most agentic environments and evals are flawed when you look at the details. It’s a crisis: and no one is talking about it because they’re being hoodwinked by labs

jason liu - vacation mode (@jxnlco)

why dspy usually wastes your time (and when it doesn't) the question: "should i use dspy for prompt optimization? it seems like the perfect tool for improving my rag system." the answer: dspy is great for very specific, well-defined tasks. but for most rag systems, it's a

Ohad Rubin (@ohadrubin)

Anyone else feeling the same? It's a bit annoying that existing benchmarks like SWEBench don't capture this reward-hacking.

Ohad Rubin (@ohadrubin)

I don't understand why people keep saying that men don't see therapists, all the girls I know who study psychology are seeing someone

Graham Neubig (@gneubig)

Which LM is better at agentic coding? We have a bunch of useful academic benchmarks like SWE-Bench, but we don't have a good comparison of agentic coding LMs *in the wild*. To solve this, we released PR Arena: github.com/neulab/pr-arena

Aran Komatsuzaki (@arankomatsuzaki)

Unfortunate reality: most open-source LLM servers (e.g. Together) don’t offer cache-hit discounts, while closed providers like OpenAI do. DeepSeek does discount, but most third-party servers don't. Closed models can end up much cheaper than open ones :(