Sebastian Riedel (@riedelcastro@sigmoid.social) (@riedelcastro)'s Twitter Profile
Sebastian Riedel (@riedelcastro@sigmoid.social)

@riedelcastro

Researcher in NLP/ML @deepmind, @ucl_nlp, @riedelcastro@sigmoid.social on Mastodon

ID: 76080258

Link: http://www.riedelcastro.org/
Joined: 21-09-2009 17:09:54

1.1K Tweets

16.16K Followers

460 Following

Aran Komatsuzaki (@arankomatsuzaki)

Google presents Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

Long-context LM:
- Often rivals SotA retrieval and RAG systems
- But still struggles with areas like compositional reasoning

repo: github.com/google-deepmin…
abs: arxiv.org/abs/2406.13121
Sohee Yang (@soheeyang_)

I'll be presenting our TACL paper "Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis" at #ACL2024! 📍 Oral: Mon 12:00 (Machine Learning for NLP I, World Ballroom B) 📍 Poster: Tue 16:00 (Conv. Center A1) Please drop by if you are interested!

Sohee Yang (@soheeyang_)

Our paper "Do Large Language Models Latently Perform Multi-Hop Reasoning?" will be presented at #ACL2024 today. 📍 Mon 14:00-15:30 Poster Session 2 (Conv. Center A1) Please visit our poster if you are interested, and catch me to chat about the latent reasoning ability of LLMs!

Tim Rocktäschel (@_rockt)

The popular science series 10 Things You Should Know by SevenDials Orion Publishing has a new member: Artificial Intelligence! Out September 26th. You can pre-order it at geni.us/ArtificialInte…

Eduardo Sánchez (@eduardosg_ai)

🚨NEW BENCHMARK🚨 Are LLMs good at linguistic reasoning if we minimize the chance of prior language memorization? We introduce Linguini🍝, a benchmark for linguistic reasoning in which SOTA models perform below 25%. w/ Belen Alastruey, Mikel Artetxe, Marta R. Costa-jussa et al. 🧵(1/n)

Ledell Wu (@ledellwu)

We are launching Design Your Own Avatar (DYOA)! With our latest innovations in multimodal generation at Creatify AI , you can now create ultra realistic AI avatars from text description and bring it to life! This unblocks a whole new level of possibilities. Check it out:

Nicola Cancedda (@nicola_cancedda)

I am looking for a Research Scientist intern for 2025. If you have already published work that involves understanding behaviours of AI models looking at their parameters and activations, I would like to hear from you. metacareers.com/jobs/556063310…

Dipanjan Das (@dipanjand)

I am hiring for a research engineering role in NYC, focused on Gemini post training. If you are interested, please apply here. Deadline is just in two weeks. boards.greenhouse.io/deepmind/jobs/…

Varun Godbole (@varungodbole)

Excited to share our prompt tuning playbook! (Not an official product. Just the authors' tips & tricks for better prompting). I'm most excited about the first half on mental models for post-training & prompting. Feedback/forks welcome! #LLM #PromptEngineering github.com/varungodbole/p…

Theo Weber (@theophaneweber)

The team @jhamrick and I co-lead is hiring a research engineer. If you are interested in improving the capabilities of LLMs in the planning and reasoning space, and building generally capable agents, please apply! boards.greenhouse.io/deepmind/jobs/…

Sohee Yang (@soheeyang_)

🚨 New Paper 🚨 Can LLMs perform latent multi-hop reasoning without exploiting shortcuts? We find the answer is yes – they can recall and compose facts not seen together in training or guessing the answer, but success greatly depends on the type of the bridge entity (80%+ for

Sebastian Riedel (@riedelcastro@sigmoid.social) (@riedelcastro)

Frontier models can do this stuff, but also not! Opinions differ on how much we even want this (CC Geoffrey Irving), but understanding the patterns will be critical regardless. Been a pleasure to work with Latent Reasoning Dream Team Sohee Yang Mor Geva Nora Kassner!

Lisan al Gaib (@scaling01)

It's paper review day (every day) - since I discovered that DeepMind already knows everything let's look at their latest Paper arxiv.org/pdf/2411.16679

Aida Nematzadeh 🦋 (@aidanematzadeh)

I am hiring for RS/RE positions! If you are interested in language-flavored multimodal learning, evaluation, or post-training, apply here 🦎 boards.greenhouse.io/deepmind/jobs/… I will also be at #NeurIPS2024, so come say hi! (Please email me to find time to chat)

Shrestha Basu Mallick (@shresbm)

The Gemini 2.0 era begins with 2.0 Flash Experimental release ⚡️ 📈2.0 Flash beats 1.5 Pro across factuality, reasoning, coding, math. 📳 More modalities - image and audio out (in EAP) 🔧 Native tool use for Google Search, code execution and 3P functions 🆕 a new multimodal,

Alexander Chen (@alexanderchen)

Want to build on the new Google Multimodal Live API with Gemini 2.0? My teammates Kyle Phillips + Tina Tarighian + Trudy Painter made open-source starter demos! 🧵 Here's a React.js boilerplate console you can start with. Code here: github.com/google-gemini/…

Sohee Yang (@soheeyang_)

🚨 New Paper 🧵
How effectively do reasoning models reevaluate their thought? We find that:
- Models excel at identifying unhelpful thoughts but struggle to recover from them
- Smaller models can be more robust
- Self-reevaluation ability is far from true meta-cognitive awareness