Darshan Deshpande (@getdarshan)'s Twitter Profile
Darshan Deshpande

@getdarshan

ML Practitioner | Research Engineer @PatronusAI | ex-Research @USC_ISI

ID: 1286004215966347264

https://darshandeshpande.github.io/

Joined 22-07-2020 18:24:47

140 Tweets

179 Followers

66 Following

Darshan Deshpande (@getdarshan)

Diffusion models are amazing! Want to know what makes them special? Join me at the TFUG event on the 4th of June to learn more about them!

Darshan Deshpande (@getdarshan)

As an extension of my presentation, I've written an article on diffusion models that includes #JAX code and explains the math in detail. This will also be my Weights & Biases Blogathon submission! Check it out here 🤗: bit.ly/diffusing-away…

Darshan Deshpande (@getdarshan)

I am excited to announce that my notebook comparing the code and mathematics behind DDPMs and DDIMs won Kaggle's Google OSS Expert Prize! 🥳 If you are interested in diffusion models then you can find my notebook here: kaggle.com/code/darshan15… #MachineLearning #OpenSource

Darshan Deshpande (@getdarshan)

I know I've been inactive for a while, but I wanted to put this out there. I've joined USC Viterbi School for my MSCS this Fall and am working with an amazing team at the Information Sciences Institute on some exciting NLP research. Couldn't ask for more 🤗

Darshan Deshpande (@getdarshan)

🚀🔥 Excited to announce our NAACL-2024 paper introducing ✨SPARK✨, a novel framework leveraging large language models for generalizable and effective argument quality evaluation.

Paper: arxiv.org/abs/2305.12280

#NLP #LLM #NAACL2024 #AI #MachineLearning

Darshan Deshpande (@getdarshan)

🆕 Curious about the origins of LLM alignment? Check out my recent report, which explores the topic in depth (with accompanying code for applying RLHF to Google AI's Gemma using PPO!) 🎉⚖️

PatronusAI (@patronusai)

1/ Introducing Lynx v1.1: an 8B State-of-the-Art RAG hallucination detection model 🚀

- Beats Claude-3.5-Sonnet on HaluBench by 3.0%
- Outperforms GPT-4o on medical questions and answers by 6.8%
- 1.4% higher accuracy than Lynx v1.0 on HaluBench

Try it out on HuggingFace

PatronusAI (@patronusai)

Llama Guard is Off Duty 😲 It’s weak at toxicity detection! We benchmarked popular toxicity datasets spanning languages like Portuguese, Ukrainian, and Turkish, and found that Llama Guard has a very high false negative rate for toxic content! We found that base models like

NEC Laboratories Europe (@neclabseu)

Prototype-based networks can greatly enhance the robustness of #languagemodels in text classification, addressing real-world needs by combining robustness & interpretability for #trustworthyAI. Learn how in our paper accepted to Findings of #EMNLP24. neclab.eu/research-group… #NECLabs

Darshan Deshpande (@getdarshan)

Hey everyone, I am at #EMNLP2024 this week, co-presenting our work on prototype-based networks with Zhivar Sourati. Please reach out if you are interested in AI evaluations, interpretability, or model alignment!

PatronusAI (@patronusai)

1/ Introducing Lynx v2.0: an 8B State-of-the-Art RAG hallucination detection model 🚀 

- Beats Claude-3.5-Sonnet on HaluBench by 2.2%
- 3.4% higher accuracy than Lynx v1.1 on HaluBench
- Optimized for long context use cases
- Detects 8 types of common hallucinations, including

Darshan Deshpande (@getdarshan)

I am excited to announce the release of our Glider model: small size, multi-metric evals, explainable highlight spans, multilingual generalization, and amazing performance on subjective metrics. Check it out!! Paper: arxiv.org/abs/2412.14140…

Darshan Deshpande (@getdarshan)

While experimenting with alignment methods, we observed that APO was more robust to noise in synthetic training data than DPO or KTO. Thanks for the excellent contribution to the community, Karel D’Oosterlinck and team 🚀

PatronusAI (@patronusai)

1/ Ever tried to remember the name of a movie you’ve seen – you can picture the scenes clearly, but the movie name won’t come to you?

Introducing BLUR: the first agent benchmark for tip-of-the-tongue search and reasoning 🔥

We benchmarked SOTA agents and found that the

PatronusAI (@patronusai)

We're excited to introduce the BLUR Leaderboard on Hugging Face 🔥 Earlier today, we open sourced BLUR: the first agent benchmark for tip-of-the-tongue search and reasoning. It measures how effectively agents can help you identify something you vaguely remember, but can’t

Annie Franco (@anniefranco)

Building good benchmarks is hard, and PatronusAI has released what may be the coolest agent eval yet:
✅ Realistic and objectively useful task
✅ Multilingual, multimodal, and multi-domain
✅ Easy for humans, still challenging for agents

Annie Franco (@anniefranco)

My colleague Chris McConnell and I greatly enjoyed seeing Sky CH. Wang, Darshan Deshpande, Rebecca Qian, and Anand Kannappan bring this project to life. We’re excited to finally see it out in the world, and look forward to collaborating on the next one!

Darshan Deshpande (@getdarshan)

Non-deterministic trajectories need autonomous supervision. Introducing Percival, a SoTA system that detects issues in long-context agentic problems and suggests fixes to systems.

The time to make a move towards autonomous evaluations is now! 🔥