Rajiv Shah (@rajistics)'s Twitter Profile
Rajiv Shah

@rajistics

occasionally funny videos along with practical AI posts, now at ML/AI @ContextualAI - was @huggingface @datarobot @snorkelai

ID: 2835252451

Link: http://www.rajivshah.com

Joined: 28-09-2014 15:58:37

1.1K Tweets

2.2K Followers

354 Following

Nina Lopatina (@ninalopatina)

Looking forward to a lively panel discussion on agentic futures this Thursday at World Summit AI: We do AI different.! There's immense potential in what Agentic AI can help us accomplish, but we're largely still navigating the critical security and safety considerations that will determine how we

William Berrios (@w33lliam)

Excited to share 🤯 that our LMUnit models with Contextual AI just claimed the top spots on RewardBench2 🥇 How did we manage to rank +5% higher than models like Gemini, Claude 4, and GPT4.1? More in the details below: 🧵 1/11

Barr Yaron (@barrnanas)

How are people actually interfacing with all of these models? Besides the typical few shot learning / prompting, the overwhelming answer is RAG (or retrieval augmented generation). 70% of respondents say they’re using RAG in one way or another.

Tibor Blaho (@btibor91)

According to The Information, OpenAI is requiring customers to spend at least $10 million for a new consulting-like service using forward deployed engineers (FDEs), around a dozen people hired in recent months, several of whom previously worked at Palantir, where OpenAI engineers

Peng Qi (@qi2peng2)

Seven years ago, I co-led a paper called 𝗛𝗼𝘁𝗽𝗼𝘁𝗤𝗔 that has motivated and facilitated many #AI #Agents research works since. Today, I'm asking that you stop using HotpotQA blindly for agents research in 2025 and beyond. In my new blog post, I revisit the brief history of

Sheshansh Agrawal (@sheshanshag)

Excited to release Contextual-SQL! 🏆#1 local Text-to-SQL system that is currently top 4 (behind API models) on BIRD benchmark! 🌐Fully open-source, runs locally 🔥MIT license 🧵

finbarr (@finbarrtimbers)

horrifying bug of the day is finding out the vllm and huggingface produce significantly different logprobs discuss.vllm.ai/t/numerical-di…

Rajiv Shah (@rajistics)

Claude Opus 4 is the first to beat the human baseline 🤯 It simply has better analytic skills on pricing. Not sure why more people aren't talking about this (but look for my video on this)

Rajiv Shah (@rajistics)

Is your RAG pipeline actually retrieving the right chunks? I just dropped a video walking through a full retrieval analysis for Retrieval-Augmented Generation (RAG) systems — and not just in theory. 🧪 What I cover: • Why retrieval quality matters way more than people think •

Ethan Mollick (@emollick)

I suspect the next few weeks after Grok 4 follows the same pattern as Grok 3: xAI beats everyone to market with the first RonnaFLOP model. The benchmarks show the 10-20% improvement the scaling law suggests. In the coming months, the other labs release their RonnaFLOPs, catch up.