Rajan Vivek (@rajan__vivek)'s Twitter Profile
Rajan Vivek

@rajan__vivek

Member of Technical Staff @ContextualAI. MS CS + AI researcher @stanford. Prev @scale_AI @georgiatech

ID: 1676006311295848448

Joined: 03-07-2023 23:13:55

39 Tweets

186 Followers

250 Following

Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

Instruction following in LLMs changed the game, allowing humans to communicate their needs to AI in the most natural way. Shouldn’t retrieval systems follow instructions too? Super excited for the release of our SoTA instruction-following reranker!
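
For illustration only (not the Contextual AI reranker API): one way to sketch an instruction-following reranker is to condition an off-the-shelf cross-encoder on the instruction, e.g. by prepending it to the query. The model name below is a placeholder assumption.

```python
# Hedged sketch: instruction-conditioned reranking with a generic cross-encoder.
# This is NOT the Contextual AI reranker; prepending the instruction to the
# query is an assumption about how instruction-following could be approximated.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # placeholder model

def rerank(query, documents, instruction=""):
    """Score documents against the query, conditioned on a natural-language instruction."""
    conditioned_query = f"{instruction} {query}".strip()
    scores = reranker.predict([(conditioned_query, doc) for doc in documents])
    # Highest score first
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)

docs = [
    "Q3 2024 earnings report (official filing)",
    "2019 blog post on revenue trends",
    "Internal wiki page on expense policy",
]
for doc, score in rerank("revenue growth", docs, instruction="Prefer the most recent official filings"):
    print(f"{score:.3f}  {doc}")
```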

Contextual AI (@contextualai)'s Twitter Profile Photo

🔥 Introducing the most reliable way to evaluate LLMs and agents in production! It's time to stop “vibe testing” your AI systems.

Our latest developer's guide shows you how to rigorously test AI systems so that they hold up in production, using Contextual AI's LMUnit evaluation
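
A rough sketch of the idea behind unit-testing LLM outputs: each test is a natural-language criterion scored by a judge model. This is not the LMUnit API; the judge model, prompt format, and example tests below are assumptions for illustration.

```python
# Illustrative only: natural-language "unit tests" scored by a judge LLM.
# Not the Contextual AI LMUnit API; model name and prompt are assumptions.
from openai import OpenAI

client = OpenAI()

UNIT_TESTS = [
    "Is the response grounded only in the provided context?",
    "Does the response directly answer the user's question?",
    "Is the response free of unsupported numerical claims?",
]

def run_unit_tests(question, context, response):
    """Score a response from 1 (fails) to 5 (passes) on each natural-language unit test."""
    results = {}
    for test in UNIT_TESTS:
        judgment = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder judge model
            messages=[{
                "role": "user",
                "content": (
                    f"Question: {question}\nContext: {context}\nResponse: {response}\n"
                    f"Unit test: {test}\n"
                    "Reply with a single score from 1 (fails) to 5 (passes)."
                ),
            }],
        )
        results[test] = judgment.choices[0].message.content.strip()
    return results
```
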
Douwe Kiela (@douwekiela)'s Twitter Profile Photo

Every time a new model drops with an expanded context window (like Meta's impressive Llama 4 Scout with its 10M token capacity), I see the inevitable "RAG is dead" posts flooding my feed.

But this fundamentally misunderstands what RAG is about.

🧵 1/
Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

Your agent is only as good as the context it understands. For enterprise, documents are an essential part of that context. We dropped a crazy good document parser built for RAG. Check it out!
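
A minimal sketch of why RAG-oriented parsing matters, assuming the parser already yields (heading, text) sections: keeping document structure attached to each chunk gives the retriever and the model more context. The section format here is an assumption, not Contextual AI's parser output.

```python
# Hypothetical sketch: structure-aware chunking for RAG.
# Assumes an upstream parser produced (heading, text) sections; not a real parser API.
def chunk_for_rag(sections, max_chars=1200):
    """Turn parsed (heading, text) sections into retrieval-ready chunks."""
    chunks = []
    for heading, text in sections:
        for start in range(0, len(text), max_chars):
            piece = text[start:start + max_chars]
            # Prefix each chunk with its heading so retrieval and generation keep the context.
            chunks.append(f"[{heading}]\n{piece}")
    return chunks

parsed_doc = [
    ("1. Executive Summary", "Revenue grew 14% year over year..."),
    ("2. Risk Factors", "Supply chain exposure remains the largest risk..."),
]
print(chunk_for_rag(parsed_doc))
```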

Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

Verifiable rewards only get you so far. We ran a lot of experiments figuring out how to build the best reward models that measure nearly any quality describable in natural language. This has been crucial for aligning our models to any criteria we want. We're open sourcing!
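
A minimal LLM-as-judge sketch of the idea (not the released reward models): score a response against an arbitrary natural-language criterion and use that score as a reward, here for best-of-n selection. The judge model and prompt are assumptions.

```python
# Illustrative only: a natural-language-criterion reward via an LLM judge,
# used for best-of-n selection. Not the open-sourced reward models.
from openai import OpenAI

client = OpenAI()

def reward(prompt, response, criterion):
    """Return a scalar reward for how well `response` satisfies `criterion`."""
    out = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{
            "role": "user",
            "content": (
                f"Prompt: {prompt}\nResponse: {response}\nCriterion: {criterion}\n"
                "Rate how well the response satisfies the criterion from 0 to 10. Reply with the number only."
            ),
        }],
    )
    return float(out.choices[0].message.content.strip())

def best_of_n(prompt, candidates, criterion):
    """Pick the candidate with the highest reward; the same scores could drive preference training."""
    return max(candidates, key=lambda r: reward(prompt, r, criterion))
```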

Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

This could be the next dominant paradigm once we figure out how to make EBTs tractable at scale. Starting with a nebulous idea and iteratively verifying/refining until it feels correct is much closer to how humans think. Extended COT resembles this but it’s like needing to keep
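
A toy sketch of the energy-based idea (not an actual energy-based transformer): start from a rough candidate and refine it by gradient descent on a learned energy that measures how well the candidate fits the context. The tiny MLP energy function and dimensions are purely illustrative.

```python
# Toy illustration of iterative verify/refine via energy minimization.
# The energy network and dimensions are placeholders, not an EBT implementation.
import torch
import torch.nn as nn

class ToyEnergy(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, context, candidate):
        # Lower energy = the candidate "feels" more correct for this context.
        return self.net(torch.cat([context, candidate], dim=-1))

def refine(energy, context, steps=20, lr=0.1):
    """Start from a nebulous guess and nudge it downhill on the energy surface."""
    candidate = torch.randn_like(context, requires_grad=True)
    for _ in range(steps):
        e = energy(context, candidate).sum()
        grad, = torch.autograd.grad(e, candidate)
        candidate = (candidate - lr * grad).detach().requires_grad_(True)
    return candidate.detach()

energy = ToyEnergy()
context = torch.randn(1, 16)
refined_answer = refine(energy, context)
```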

Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

Feels like we’ve hit the point where the primary bottleneck for AI being useful isn’t intelligence, it’s context+memory management. Today’s models are already smart enough to do most of the production use cases they’re failing at. For these real world tasks, the performance delta

Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

This is a crucial point people seem to talk about less. OpenAI found the same: o4-mini < o3 < o1 at avoiding hallucinations!

we're sharing our solution to this tomorrow 👀
Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

The hardest part of training models to never hallucinate is they tend to become less helpful. We ran a lot of experiments navigating this trade-off and found some tricks that work well. Check out William Berrios's thread to see our post-training strategies!