Sebastian Riedel (@riedelcastro@sigmoid.social) (@riedelcastro) Twitter Tweets • TwiCopy

Aran Komatsuzaki

a year ago

Google presents "Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?"

Long-context LM:
- Often rivals SotA retrieval and RAG systems
- But still struggles with areas like compositional reasoning

repo: github.com/google-deepmin…
abs: arxiv.org/abs/2406.13121

329 likes · 3 replies · 86 retweets

Sohee Yang

@soheeyang_

a year ago

I'll be presenting our TACL paper "Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis" at #ACL2024!
📍 Oral: Mon 12:00 (Machine Learning for NLP I, World Ballroom B)
📍 Poster: Tue 16:00 (Conv. Center A1)
Please drop by if you are interested!

28 likes · 0 replies · 6 retweets

Sohee Yang

@soheeyang_

a year ago

Our paper "Do Large Language Models Latently Perform Multi-Hop Reasoning?" will be presented at #ACL2024 today.
📍 Mon 14:00-15:30 Poster Session 2 (Conv. Center A1)
Please visit our poster if you are interested, and catch me to chat about the latent reasoning ability of LLMs!

103 likes · 3 replies · 14 retweets

Tim Rocktäschel

@_rockt

a year ago

The popular science series 10 Things You Should Know by SevenDials Orion Publishing has a new member: Artificial Intelligence! Out September 26th. You can pre-order it at geni.us/ArtificialInte…

93 likes · 7 replies · 12 retweets

Sebastian Riedel (@riedelcastro@sigmoid.social)

@riedelcastro

a year ago

Super proud to have been able to work with you, Patrick Lewis! Does this improve my Bacon number?

16 likes · 0 replies · 1 retweet

Eduardo Sánchez

@eduardosg_ai

a year ago

🚨NEW BENCHMARK🚨 Are LLMs good at linguistic reasoning if we minimize the chance of prior language memorization? We introduce Linguini🍝, a benchmark for linguistic reasoning in which SOTA models perform below 25%. w/ Belen Alastruey, Mikel Artetxe, Marta R. Costa-jussa et al. 🧵(1/n)

117 likes · 3 replies · 23 retweets

Ledell Wu

@ledellwu

a year ago

We are launching Design Your Own Avatar (DYOA)! With our latest innovations in multimodal generation at Creatify AI, you can now create ultra-realistic AI avatars from text descriptions and bring them to life! This unblocks a whole new level of possibilities. Check it out:

30 likes · 8 replies · 5 retweets

Sebastian Riedel (@riedelcastro@sigmoid.social)

@riedelcastro

10 months ago

Amazing progress, Yuxiang (Jimmy) Wu and Zhengyao Jiang, and great to see the impact of "agent scaffolding" given a base model.

15 likes · 0 replies · 4 retweets

Nicola Cancedda

@nicola_cancedda

10 months ago

I am looking for a Research Scientist intern for 2025. If you have already published work that involves understanding the behaviours of AI models by looking at their parameters and activations, I would like to hear from you. metacareers.com/jobs/556063310…

329 likes · 5 replies · 50 retweets

Dipanjan Das

@dipanjand

10 months ago

I am hiring for a research engineering role in NYC, focused on Gemini post-training. If you are interested, please apply here. The deadline is in just two weeks. boards.greenhouse.io/deepmind/jobs/…

380 likes · 4 replies · 63 retweets

Varun Godbole

@varungodbole

9 months ago

Excited to share our prompt tuning playbook! (Not an official product. Just the authors' tips & tricks for better prompting.) I'm most excited about the first half, on mental models for post-training & prompting. Feedback/forks welcome! #LLM #PromptEngineering github.com/varungodbole/p…

614 likes · 13 replies · 132 retweets

Theo Weber

@theophaneweber

9 months ago

The team @jhamrick and I co-lead is hiring a research engineer. If you are interested in improving the capabilities of LLMs in the planning and reasoning space, and building generally capable agents, please apply! boards.greenhouse.io/deepmind/jobs/…

56 likes · 1 reply · 13 retweets

Sohee Yang

@soheeyang_

9 months ago

🚨 New Paper 🚨 Can LLMs perform latent multi-hop reasoning without exploiting shortcuts? We find the answer is yes – they can recall and compose facts not seen together in training or guessing the answer, but success greatly depends on the type of the bridge entity (80%+ for

192 likes · 7 replies · 46 retweets

Sebastian Riedel (@riedelcastro@sigmoid.social)

@riedelcastro

9 months ago

Frontier models can do this stuff, but also not! Opinions differ on how much we even want this (CC Geoffrey Irving), but understanding the patterns will be critical regardless. It's been a pleasure to work with the Latent Reasoning Dream Team: Sohee Yang, Mor Geva, Nora Kassner!

27 likes · 0 replies · 6 retweets

Lisan al Gaib

@scaling01

9 months ago

It's paper review day (every day). Since I discovered that DeepMind already knows everything, let's look at their latest paper: arxiv.org/pdf/2411.16679

634 likes · 5 replies · 43 retweets

Pasquale Minervini is hiring postdocs! 🚀

@pminervini

8 months ago

Sohee (Sohee Yang) in the house! 🚀🚀🚀🚀

23 likes · 0 replies · 6 retweets

Aida Nematzadeh 🦋

@aidanematzadeh

8 months ago

I am hiring for RS/RE positions! If you are interested in language-flavored multimodal learning, evaluation, or post-training, apply here 🦎 boards.greenhouse.io/deepmind/jobs/… I will also be at #NeurIPS2024, so come say hi! (Please email me to find time to chat.)

211 likes · 4 replies · 44 retweets

Shrestha Basu Mallick

@shresbm

8 months ago

The Gemini 2.0 era begins with 2.0 Flash Experimental release ⚡️
📈 2.0 Flash beats 1.5 Pro across factuality, reasoning, coding, math.
📳 More modalities: image and audio out (in EAP)
🔧 Native tool use for Google Search, code execution and 3P functions
🆕 a new multimodal,

187 likes · 6 replies · 13 retweets

Alexander Chen

@alexanderchen

8 months ago

Want to build on the new Google Multimodal Live API with Gemini 2.0? My teammates Kyle Phillips + Tina Tarighian + Trudy Painter made open-source starter demos! 🧵 Here's a React.js boilerplate console you can start with. Code here: github.com/google-gemini/…

124 likes · 2 replies · 19 retweets

Sohee Yang

@soheeyang_

2 months ago

🚨 New Paper 🧵 How effectively do reasoning models reevaluate their thought? We find that:
- Models excel at identifying unhelpful thoughts but struggle to recover from them
- Smaller models can be more robust
- Self-reevaluation ability is far from true meta-cognitive awareness

103 likes · 3 replies · 24 retweets