Rajan Vivek (@rajan__vivek)'s Twitter Profile
Rajan Vivek

@rajan__vivek

Member of Technical Staff @ContextualAI. MS CS + AI researcher @stanford. Prev @scale_AI @georgiatech

ID: 1676006311295848448

Joined: 03-07-2023 23:13:55

39 Tweets

186 Followers

250 Following

Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

Instruction following in LLMs changed the game, allowing humans to communicate their needs to AI in the most natural way. Shouldn’t retrieval systems follow instructions too? Super excited for the release of our SoTA instruction-following reranker!
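
For illustration only (not the Contextual AI reranker API): one way to sketch an instruction-following reranker is to condition an off-the-shelf cross-encoder on the instruction, e.g. by prepending it to the query. The model name below is a placeholder assumption.

```python
# Hedged sketch: instruction-conditioned reranking with a generic cross-encoder.
# This is NOT the Contextual AI reranker; prepending the instruction to the
# query is an assumption about how instruction-following could be approximated.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # placeholder model

def rerank(query, documents, instruction=""):
    """Score documents against the query, conditioned on a natural-language instruction."""
    conditioned_query = f"{instruction} {query}".strip()
    scores = reranker.predict([(conditioned_query, doc) for doc in documents])
    # Highest score first
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)

docs = [
    "Q3 2024 earnings report (official filing)",
    "2019 blog post on revenue trends",
    "Internal wiki page on expense policy",
]
for doc, score in rerank("revenue growth", docs, instruction="Prefer the most recent official filings"):
    print(f"{score:.3f}  {doc}")
```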

Contextual AI (@contextualai)'s Twitter Profile Photo

🔥 Introducing the most reliable way to evaluate LLMs and agents in production! It's time to stop “vibe testing” your AI systems.

Our latest developer's guide shows you how to rigorously test AI systems so that they hold up in production, using Contextual AI's LMUnit evaluation
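
A rough sketch of the idea behind unit-testing LLM outputs: each test is a natural-language criterion scored by a judge model. This is not the LMUnit API; the judge model, prompt format, and example tests below are assumptions for illustration.

```python
# Illustrative only: natural-language "unit tests" scored by a judge LLM.
# Not the Contextual AI LMUnit API; model name and prompt are assumptions.
from openai import OpenAI

client = OpenAI()

UNIT_TESTS = [
    "Is the response grounded only in the provided context?",
    "Does the response directly answer the user's question?",
    "Is the response free of unsupported numerical claims?",
]

def run_unit_tests(question, context, response):
    """Score a response from 1 (fails) to 5 (passes) on each natural-language unit test."""
    results = {}
    for test in UNIT_TESTS:
        judgment = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder judge model
            messages=[{
                "role": "user",
                "content": (
                    f"Question: {question}\nContext: {context}\nResponse: {response}\n"
                    f"Unit test: {test}\n"
                    "Reply with a single score from 1 (fails) to 5 (passes)."
                ),
            }],
        )
        results[test] = judgment.choices[0].message.content.strip()
    return results
```
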
Douwe Kiela (@douwekiela)'s Twitter Profile Photo

Every time a new model drops with an expanded context window (like Meta's impressive Llama 4 Scout with its 10M token capacity), I see the inevitable "RAG is dead" posts flooding my feed.

But this fundamentally misunderstands what RAG is about.

🧵 1/
Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

Your agent is only as good as the context it understands. For enterprise, documents are an essential part of that context. We dropped a crazy good document parser built for RAG. Check it out!
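
A minimal sketch of why RAG-oriented parsing matters, assuming the parser already yields (heading, text) sections: keeping document structure attached to each chunk gives the retriever and the model more context. The section format here is an assumption, not Contextual AI's parser output.

```python
# Hypothetical sketch: structure-aware chunking for RAG.
# Assumes an upstream parser produced (heading, text) sections; not a real parser API.
def chunk_for_rag(sections, max_chars=1200):
    """Turn parsed (heading, text) sections into retrieval-ready chunks."""
    chunks = []
    for heading, text in sections:
        for start in range(0, len(text), max_chars):
            piece = text[start:start + max_chars]
            # Prefix each chunk with its heading so retrieval and generation keep the context.
            chunks.append(f"[{heading}]\n{piece}")
    return chunks

parsed_doc = [
    ("1. Executive Summary", "Revenue grew 14% year over year..."),
    ("2. Risk Factors", "Supply chain exposure remains the largest risk..."),
]
print(chunk_for_rag(parsed_doc))
```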

Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

Verifiable rewards only get you so far. We ran a lot of experiments figuring out how to build the best reward models that measure nearly any quality describable in natural language. This has been crucial for aligning our models to any criteria we want. We're open sourcing!
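
A minimal LLM-as-judge sketch of the idea (not the released reward models): score a response against an arbitrary natural-language criterion and use that score as a reward, here for best-of-n selection. The judge model and prompt are assumptions.

```python
# Illustrative only: a natural-language-criterion reward via an LLM judge,
# used for best-of-n selection. Not the open-sourced reward models.
from openai import OpenAI

client = OpenAI()

def reward(prompt, response, criterion):
    """Return a scalar reward for how well `response` satisfies `criterion`."""
    out = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{
            "role": "user",
            "content": (
                f"Prompt: {prompt}\nResponse: {response}\nCriterion: {criterion}\n"
                "Rate how well the response satisfies the criterion from 0 to 10. Reply with the number only."
            ),
        }],
    )
    return float(out.choices[0].message.content.strip())

def best_of_n(prompt, candidates, criterion):
    """Pick the candidate with the highest reward; the same scores could drive preference training."""
    return max(candidates, key=lambda r: reward(prompt, r, criterion))
```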

Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

This could be the next dominant paradigm once we figure out how to make EBTs tractable at scale. Starting with a nebulous idea and iteratively verifying/refining until it feels correct is much closer to how humans think. Extended COT resembles this but it’s like needing to keep
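
A toy sketch of the energy-based idea (not an actual energy-based transformer): start from a rough candidate and refine it by gradient descent on a learned energy that measures how well the candidate fits the context. The tiny MLP energy function and dimensions are purely illustrative.

```python
# Toy illustration of iterative verify/refine via energy minimization.
# The energy network and dimensions are placeholders, not an EBT implementation.
import torch
import torch.nn as nn

class ToyEnergy(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, context, candidate):
        # Lower energy = the candidate "feels" more correct for this context.
        return self.net(torch.cat([context, candidate], dim=-1))

def refine(energy, context, steps=20, lr=0.1):
    """Start from a nebulous guess and nudge it downhill on the energy surface."""
    candidate = torch.randn_like(context, requires_grad=True)
    for _ in range(steps):
        e = energy(context, candidate).sum()
        grad, = torch.autograd.grad(e, candidate)
        candidate = (candidate - lr * grad).detach().requires_grad_(True)
    return candidate.detach()

energy = ToyEnergy()
context = torch.randn(1, 16)
refined_answer = refine(energy, context)
```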

Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

Feels like we’ve hit the point where the primary bottleneck for AI being useful isn’t intelligence, it’s context+memory management. Today’s models are already smart enough to do most of the production use cases they’re failing at. For these real world tasks, the performance delta

Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

This is a crucial point people seem to talk about less. OpenAI found the same: o4-mini < o3 < o1 at avoiding hallucinations!

we're sharing our solution to this tomorrow 👀
Rajan Vivek (@rajan__vivek)'s Twitter Profile Photo

The hardest part of training models to never hallucinate is they tend to become less helpful. We ran a lot of experiments navigating this trade-off and found some tricks that work well. Check out William Berrios's thread to see our post-training strategies!