Dennis Aumiller (@d_aumiller)'s Twitter Profile
Dennis Aumiller

@d_aumiller

Getting paid to complain about LLM evaluation @cohere. PhD on summarization from @UniHeidelberg. Previously: @AmazonScience, @sap. Find me on Stackoverflow!

ID: 964558325001211904

Website: https://dennis-aumiller.de
Joined: 16-02-2018 17:53:20

804 Tweets

684 Followers

722 Following

Jan Trienes (@jantrienes)

Do you want to know what information LLMs prioritize in text synthesis tasks? Here's a short 🧵 about our new paper: an interpretable framework for salience analysis in LLMs. First of all, information salience is a fuzzy concept. So how can we even measure it?

Cohere Labs (@cohere_labs)

Introducing ✨ Aya Vision ✨ - an open-weights model to connect our world through language and vision. Aya Vision adds breakthrough multimodal capabilities to our state-of-the-art multilingual 8B and 32B models. 🌿

Maya Moritz (@mayarmoritz)

Are you studying, working in, or utilizing #forensics? We're looking for expert opinions in a short, anonymous survey! Message or email me with any questions or for the link! #Science #DNA #pathology

Kyle Duffy (@kyduffy)

My team recently launched a best-in-class LLM specializing in English and Arabic. We just published a tech report explaining our methods. Check it out on arxiv: arxiv.org/abs/2503.14603

Matthias Gallé (@mgalle)

A year ago we released LBBP - a drop-in replacement of HumanEval that was more challenging and less leaked. Internally we have been using the multilingual version of this for benchmarking, and as code is not only Python, we decided to release that as well: huggingface.co/datasets/Coher…

Nils Reimers (@nils_reimers)

π‚π¨π‘πžπ«πž π„π¦π›πžπ π―πŸ’ - π’π­πšπ­πž-𝐨𝐟-𝐭𝐑𝐞-𝐚𝐫𝐭 𝐭𝐞𝐱𝐭 & 𝐒𝐦𝐚𝐠𝐞 𝐫𝐞𝐭𝐫𝐒𝐞𝐯𝐚π₯ Today we are releasing Embed v4, unlocking so many cool new features for retrieval. πŸ‡ΊπŸ‡³ 100+ languages πŸ–ΌοΈ Text & Image capabilities πŸ“œ 128k context length

Arnold Ventures (@arnold_ventures)

AV's latest #BRIDGEseries convening brought together researchers, public officials, and industry experts to better understand the impact and prevalence of retail theft, and what can be done to effectively prevent it.

cohere (@cohere)

Command A, our state-of-the-art generative model, is now the highest-scoring generalist LLM on the Bird Bench leaderboard for SQL! It outperforms other systems that rely on extensive scaffolding to tackle these SQL benchmarks, and instead delivers these results out-of-the-box,

Dennis Aumiller (@d_aumiller)

It's my first time area chairing for the ACLRollingReview May cycle! And it will also be the first time asking for availability of emergency reviewers 😅 If you (or somebody you know) has availability for reviews in the Resources and Languages track, I have two papers missing reviews.

Dennis Aumiller (@d_aumiller)

Probably a good time to mention that I will be in Vienna attending ACL in two weeks. If you're tired of attending session after session, come and talk to me about LLM evaluation instead (I won't tell on you for skipping sessions 🤫)! DMs are open if you want to set something up :)

Dennis Aumiller (@d_aumiller)

Genuine question: how do people in tech with imposter syndrome survive the Bay Area?? It's bad enough elsewhere, but with the talent density there (plus people supposedly bragging more openly), it seems like death

Dennis Aumiller (@d_aumiller)

No secret to anyone who works with Pierre (and his team), but they are super cracked. Seeing (pun intended) this model come to life was amazing! Please try it out and let us know what you think 😎