Orion Weller @ ICLR 2025 (@orionweller) 's Twitter Profile
Orion Weller @ ICLR 2025

@orionweller

PhD student @jhuclsp. Previously: @samaya_ai, @allen_ai. #NLProc and #IR research

ID: 3057771000

Link: http://orionweller.github.io · Joined: 02-03-2015 19:11:31

412 Tweets

1.1K Followers

886 Following

Raphaël Sourty (@raphaelsrty) 's Twitter Profile Photo

To anyone wondering what's the difference between encoders and decoders on downstream tasks when both models are trained the same way, this blog post is made for you. Very interesting resource and new models available, impressive work 🙌

tomaarsen (@tomaarsen) 's Twitter Profile Photo

I'm very excited to see more Ettin-based embedding models being trained. It would be really solid to see training recipes applied on all 6 sizes. The 17M encoder should allow for a model that outperforms all-MiniLM-L6-v2 with roughly the same size, I think
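
For anyone who wants to try this, here is a minimal sketch (not an official recipe) of fine-tuning one of the small Ettin encoders into an embedding model with Sentence Transformers. The repo ID jhu-clsp/ettin-encoder-17m and the example pair dataset are assumptions; check the Ettin collection on the Hugging Face Hub for exact names.

```python
# Hedged sketch: turning the 17M Ettin encoder into a sentence-embedding
# model with Sentence Transformers. The model repo ID and the training
# dataset below are assumptions, not part of the original announcement.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Wrapping a plain encoder checkpoint adds a mean-pooling head by default.
model = SentenceTransformer("jhu-clsp/ettin-encoder-17m")  # assumed repo ID

# Any (anchor, positive) pair dataset works with this loss; AllNLI pairs
# are used here purely as an example.
train_dataset = load_dataset("sentence-transformers/all-nli", "pair", split="train[:10000]")

# In-batch negatives: other positives in the batch serve as negatives.
loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
model.save_pretrained("ettin-17m-embed")
```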

search founder (@n0riskn0r3ward) 's Twitter Profile Photo

Finally got the chance to read about Ettin (huggingface.co/blog/ettin). Good stuff, encoders are better. Makes sense. But practically, there are all kinds of Apache 2 decoders to work with trained on 15T+ tokens and I'm pretty focused on retrieval...

Antoine Chaffin (@antoine_chaffin) 's Twitter Profile Photo

If you missed it because you were at a conference: last week we released SOTA encoders and decoders across various sizes, alongside public data to reproduce them. I already had nice feedback from people on the small models; can’t wait to see what the community will build!

Knowledgator (@knowledgator) 's Twitter Profile Photo

🧠 The models are based on DeBERTa, ModernBERT, and the Ettin small model for edge-device use cases. Variants include:
– gliclass-edge-v3.0: ultra-efficient
– gliclass-large-v3.0: high accuracy
– gliclass-x-base: robust multilingual zero-shot
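
A hedged sketch of zero-shot classification with one of these variants follows, based on the usage pattern documented in the knowledgator/GLiClass repository; the model ID, pipeline arguments, and threshold are assumptions, so consult the model cards for the exact API.

```python
# Hedged sketch of zero-shot classification with a gliclass-v3.0 variant.
# The model ID and pipeline arguments are assumptions drawn from the
# GLiClass repo's documented pattern, not from this announcement.
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model_id = "knowledgator/gliclass-edge-v3.0"  # assumed ID: ultra-efficient variant
model = GLiClassModel.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# multi-label: each candidate label is scored independently.
pipeline = ZeroShotClassificationPipeline(
    model, tokenizer, classification_type="multi-label", device="cpu"
)

text = "The new encoder beats much larger models on retrieval benchmarks."
labels = ["machine learning", "sports", "politics"]
results = pipeline(text, labels, threshold=0.5)[0]  # list of {label, score}
for result in results:
    print(result["label"], "=>", result["score"])
```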

Orion Weller @ ICLR 2025 (@orionweller) 's Twitter Profile Photo

Does anyone have these stats for ICLR/NeurIPS, etc.? Wondering if there’s a US trend to avoid *CL confs (my personal experience, sadly) or if this is the case at all conferences.

Antoine Chaffin (@antoine_chaffin) 's Twitter Profile Photo

Obviously it was caught by Sumit before the official announcement! 😁 I am very happy to announce that PyLate now has an associated paper, and it has been accepted to CIKM! Very happy to share this milestone with my dear co-creator Raphaël Sourty 🫶
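
For context, PyLate is a library for training and serving late-interaction (ColBERT-style) retrieval models. Below is a minimal indexing-and-retrieval sketch following its documented pattern; the example checkpoint, toy documents, and index settings are assumptions, so see the PyLate docs for the exact API.

```python
# Hedged sketch of multi-vector (late-interaction) retrieval with PyLate.
# The model ID, documents, and index settings are illustrative assumptions.
from pylate import indexes, models, retrieve

model = models.ColBERT(model_name_or_path="lightonai/colbertv2.0")  # example checkpoint

# Build a Voyager (HNSW) index over per-token document embeddings.
index = indexes.Voyager(index_folder="pylate-index", index_name="demo", override=True)
retriever = retrieve.ColBERT(index=index)

documents_ids = ["1", "2"]
documents = ["PyLate trains and serves ColBERT models.", "Encoders excel at retrieval."]

documents_embeddings = model.encode(documents, is_query=False)
index.add_documents(documents_ids=documents_ids, documents_embeddings=documents_embeddings)

# Queries are encoded separately, then scored token-by-token (MaxSim).
queries_embeddings = model.encode(["what is PyLate?"], is_query=True)
scores = retriever.retrieve(queries_embeddings=queries_embeddings, k=2)
print(scores)
```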

Hamel Husain (@hamelhusain) 's Twitter Profile Photo

TOC for the open book "Beyond Naive RAG: Practical Advanced Methods" from our RAG series. This condenses 5 hours of instruction into something you can read in ~30 minutes. Link: maven.com/p/945082/beyon… Ben Clavié Nandan Thakur Orion Weller Antoine Chaffin Bryan Bischof fka Dr. Donut

Fred Jonsson (@enginoid) 's Twitter Profile Photo

so many great artifacts of this work:
- an open-data recipe for ModernBERT that exceeds ModernBERT in performance
- tons of checkpoints (17m, 32m, 68m, 150m, 400m, 1b)
- direct comparison of same training recipe, data & model shape with masked vs. causal LM
- open data +
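
To make the masked-vs-causal comparison concrete: because each pair shares a recipe, data, and shape, loading the two objectives side by side is a one-liner each. A minimal sketch, assuming the repo IDs follow the jhu-clsp/ettin-encoder-*/ettin-decoder-* naming implied in this thread; check the Ettin collection on the Hugging Face Hub for the exact names.

```python
# Minimal sketch of loading one matched encoder/decoder pair. The repo IDs
# are assumptions inferred from the naming in this thread.
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoTokenizer

encoder_id = "jhu-clsp/ettin-encoder-150m"  # assumed repo ID
decoder_id = "jhu-clsp/ettin-decoder-150m"  # assumed repo ID

# Same recipe/data/shape, different objectives:
# masked LM = bidirectional attention, predicts [MASK] tokens;
encoder = AutoModelForMaskedLM.from_pretrained(encoder_id)
# causal LM = left-to-right attention, predicts the next token.
decoder = AutoModelForCausalLM.from_pretrained(decoder_id)

tokenizer = AutoTokenizer.from_pretrained(encoder_id)
```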