Rajiv Shah (@rajistics) Twitter Tweets • TwiCopy

Rajiv Shah

@rajistics

occasionally funny videos along with practical AI posts, now at ML/AI @ContextualAI - was @huggingface @datarobot @snorkelai

ID: 2835252451

Link: http://www.rajivshah.com | Joined: 28-09-2014 15:58:37

1.1K Tweets

2.2K Followers

354 Following

dk

@dkposts

a year ago

Built a natural language search engine for NBA plays. Uses spacy, fuzzy searching, and the nba_api library. Supports any query that includes a player's name and a type of play! repo: github.com/dkgitcode/ball… demo:

Likes: 2 | Replies: 0 | Retweets: 1

Shishir Kumar Prasad

@invinc4u

5 months ago

Excited to share our latest work at Instacart! We built LACE—an LLM-powered framework to evaluate multi-turn customer chats. What took hours now takes minutes, helping us scale GenAI with speed and confidence. 🔗 tech.instacart.com/turbocharging-…

Likes: 130 | Replies: 4 | Retweets: 15

Nina Lopatina

@ninalopatina

5 months ago

Looking forward to a lively panel discussion on agentic futures this Thursday at World Summit AI: We do AI different.! There's immense potential in what Agentic AI can help us accomplish, but we're largely still navigating the critical security and safety considerations that will determine how we

Likes: 8 | Replies: 0 | Retweets: 2

William Berrios

@w33lliam

5 months ago

Excited to share 🤯 that our LMUnit models with Contextual AI just claimed the top spots on RewardBench2 🥇 How did we manage to rank +5% higher than models like Gemini, Claude 4, and GPT4.1? More in the details below: 🧵 1/11

Likes: 115 | Replies: 6 | Retweets: 22

Barr Yaron

@barrnanas

5 months ago

How are people actually interfacing with all of these models? Besides the typical few shot learning / prompting, the overwhelming answer is RAG (or retrieval augmented generation). 70% of respondents say they’re using RAG in one way or another.

Likes: 26 | Replies: 1 | Retweets: 1

Tibor Blaho

@btibor91

4 months ago

According to The Information, OpenAI is requiring customers to spend at least $10 million for a new consulting-like service using forward deployed engineers (FDEs), around a dozen people hired in recent months, several of whom previously worked at Palantir, where OpenAI engineers

Likes: 1.1K | Replies: 29 | Retweets: 83

Rajiv Shah

@rajistics

4 months ago

Why Video Failed based on Sergey Levine “Language Models in Plato’s Cave”: sergeylevine.substack.com/p/language-mod…

Likes: 0 | Replies: 0 | Retweets: 0

Peng Qi

@qi2peng2

4 months ago

Seven years ago, I co-led a paper called 𝗛𝗼𝘁𝗽𝗼𝘁𝗤𝗔 that has motivated and facilitated many #AI #Agents research works since. Today, I'm asking that you stop using HotpotQA blindly for agents research in 2025 and beyond. In my new blog post, I revisit the brief history of

Likes: 222 | Replies: 5 | Retweets: 44

Sheshansh Agrawal

@sheshanshag

4 months ago

Excited to release Contextual-SQL! 🏆#1 local Text-to-SQL system that is currently top 4 (behind API models) on BIRD benchmark! 🌐Fully open-source, runs locally 🔥MIT license 🧵

Likes: 41 | Replies: 1 | Retweets: 12

Akash Mahajan

@akashmjn

4 months ago

Context engineering → better tools for agents (not just better retrieval/RAG). Traditional retrieval works well on pointed questions over chunks/snippets. But struggles with holistic cross-document questions, forcing you to stuff entire docs into context. We need `llms.txt`

Likes: 15 | Replies: 1 | Retweets: 5

finbarr

@finbarrtimbers

4 months ago

horrifying bug of the day is finding out the vllm and huggingface produce significantly different logprobs discuss.vllm.ai/t/numerical-di…

Likes: 783 | Replies: 29 | Retweets: 59

Rajiv Shah

@rajistics

4 months ago

Claude Opus 4 is the first to beat the human baseline 🤯 It simply has better analytic skills on pricing. Not sure why more people aren't talking about this (but look for my video on this)

Likes: 1 | Replies: 0 | Retweets: 0

Rajiv Shah

@rajistics

4 months ago

Is your RAG pipeline actually retrieving the right chunks? I just dropped a video walking through a full retrieval analysis for Retrieval-Augmented Generation (RAG) systems — and not just in theory. 🧪 What I cover: • Why retrieval quality matters way more than people think •

Likes: 0 | Replies: 0 | Retweets: 0

Ethan Mollick

@emollick

4 months ago

I suspect the next few weeks after Grok 4 follows the same pattern as Grok 3: xAI beats everyone to market with the first RonnaFLOP model. The benchmarks show the 10-20% improvement the scaling law suggests. In the coming months, the other labs release their RonnaFLOPs, catch up.

Likes: 915 | Replies: 26 | Retweets: 43

Andon Labs

@andonlabs

4 months ago

Thanks to Elon Musk and the xAI team for inviting us to share the latest updates to vending bench. Grok 4 jumps to the top of the leaderboard.

Likes: 4.4K | Replies: 267 | Retweets: 1.1K