Oskar Hallström (@oskar_hallstrom)'s Twitter Profile
Oskar Hallström

@oskar_hallstrom

AI R&D @lightonio. Former Indie One Hit Wonder @ Billie Garlic.

ID: 1777997990562566146

Link: https://open.spotify.com/artist/2KZoVTprHSLoYX7G38MBh9?si=IcUsQjxiQzmTH21AqGPC6w
Joined: 10-04-2024 09:52:30

44 Tweets

297 Followers

79 Following

Benjamin Clavié (@bclavie)'s Twitter Profile Photo

Multimodal RAG: Just use ColPali/DSE then pass your screenshots to the LLM

This is the dream, but how well do LLMs read text contained in images?
We wanted to know, so we tried a simple thing: do results change on evals when using screenshots rather than text as input? Yes.
Antoine Chaffin (@antoine_chaffin)'s Twitter Profile Photo

You can just continue pre-training things ✨
Happy to announce the release of BioClinical ModernBERT, a ModernBERT model whose pre-training has been continued on medical data
The result: SOTA performance on various medical tasks with long context support and ModernBERT efficiency
Antoine Chaffin (@antoine_chaffin)'s Twitter Profile Photo

I'll be covering Reason-ModernColBERT in tonight's presentation, so please come if you are interested! maven.com/p/1973fe/going… (And please be gentle, this is the first time I will be speaking live in front of this many people 😭)

Raphaël Sourty (@raphaelsrty)'s Twitter Profile Photo

With LightOn we are thrilled to release pylate-rs 🚀⭐️
An efficient inference engine for late-interaction models, written in Rust and based on Candle ⚡️
pylate-rs is the best Python library / Rust crate / NPM package to spawn late-interaction models in milliseconds.

Antoine Chaffin (@antoine_chaffin)'s Twitter Profile Photo

Magical Raphaël back at it again
You can now compute ColBERT embeddings in Rust at light speed for any PyLate models (thus any ColBERT models)
The best part? You can use it with WebAssembly to create awesome demos/visualizations in the browser!
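For context on what these tweets mean by "late interaction": ColBERT-style models score a query against a document by matching each query-token embedding to its best document token (MaxSim) and summing the per-token maxima. A minimal pure-Python sketch with made-up toy embeddings follows; the function names and vectors here are illustrative assumptions, not the PyLate or pylate-rs API (real embeddings come from an encoder).

```python
# MaxSim (late-interaction) scoring, as used by ColBERT-style retrievers:
# each query token picks its best-matching document token, and the
# per-token maxima are summed into a single relevance score.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_embs, doc_embs):
    """Sum over query tokens of the max similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)

# Toy 2-D token embeddings (hypothetical values, not model output).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]  # strong match for the first query token
doc_b = [[0.0, 1.0], [0.0, 0.9]]  # only matches the second query token

print(maxsim_score(query, doc_a))  # 1.0 + 0.5 = 1.5
print(maxsim_score(query, doc_b))  # 0.0 + 1.0 = 1.0
```

Because the document-token embeddings are query-independent, they can be computed and indexed offline, which is what makes fast Rust/WASM inference engines for these models attractive.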
Amélie Chatelain (@amelietabatta)'s Twitter Profile Photo

🚀 Insane day yesterday for the Knowledge squad at LightOn! Raphaël Sourty shipped PyLate-rs and Antoine Chaffin delivered a beautiful lecture on late interaction models supremacy, LFG ❤️

Antoine Chaffin (@antoine_chaffin)'s Twitter Profile Photo

Should we just focus our pre-training efforts on decoders?
To answer this, we trained Ettin, various identically trained encoders and decoders, ranging from 17M to 1B parameters on 2T tokens of open data (beating Llama 3.2 and ModernBERT in the process)!
Antoine Chaffin (@antoine_chaffin)'s Twitter Profile Photo

We are at #ACL2025 with Oskar Hallström
Do not hesitate to come discuss with us if you are interested in IR, encoders, late interaction or VLMs!
I am attaching a picture of us because I figured people do not know our faces due to our profile pictures 🥲
LightOn (@lightonio)'s Twitter Profile Photo

📍 ACL 2025: Encoders-only coffee chat anyone?

Antoine Chaffin & Oskar Hallström are in Vienna to present the ModernBERT paper at ACL 2025

📅 Don’t miss the Poster Session today 11am.
➡️ Poster 115

☕ Or feel free to catch them in the #ACL2025NLP aisles!

👉 To know more
Raphaël Sourty (@raphaelsrty)'s Twitter Profile Photo

Happy to release the 1.3.0 version of PyLate at LightOn with my handsome co-maintainer Antoine Chaffin 😗

Fast-Plaid is now the default backend for PyLate retrieval. It's faster than, and as accurate as, the original Stanford PLAID on both CPU and GPU
staghado (@staghado)'s Twitter Profile Photo

4/10
Efficiency
Single H100 GPU (80 GB):
 • 5.71 pages/s ≈ 493,000 pages/day
 • 6.49× faster than dots.ocr
 • 2.67× faster than PaddleOCR-VL-0.9B
 • 1.73× faster than DeepSeekOCR
 • < $0.01 per 1,000 pages
A compact model that’s both high-quality and cost-efficient.
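As a quick sanity check (not from the thread itself), the daily-throughput figure follows directly from the per-second rate, and the quoted speedup ratios imply the competitors' throughput:

```python
# Verify the thread's throughput arithmetic for a single H100.
pages_per_sec = 5.71
pages_per_day = pages_per_sec * 86_400  # seconds in a day
print(round(pages_per_day))             # 493344, i.e. the quoted ≈493,000/day

# Implied competitor throughput from the quoted speedup ratios.
for name, ratio in [("dots.ocr", 6.49),
                    ("PaddleOCR-VL-0.9B", 2.67),
                    ("DeepSeekOCR", 1.73)]:
    print(f"{name}: ~{pages_per_sec / ratio:.2f} pages/s")
```

The per-page cost claim depends on the assumed GPU hourly price, so it is not checked here.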
Antoine Chaffin (@antoine_chaffin)'s Twitter Profile Photo

LightOn joins the OCR mania
We release a 1B model achieving SOTA results while being much faster than all the recent releases
It is also an end-to-end trainable solution for easy adaptation to your specific domains
We also share interesting insights (and soon the dataset!)

Raphaël Sourty (@raphaelsrty)'s Twitter Profile Photo

A 1B VLM dedicated to OCR. State of the art, cooked at LightOn
Compatible with HF and vLLM.
I have been amazed by the quality of the output on scientific papers. Huge congrats staghado, Baptiste Aubertin, and the whole R&D team 🐐

Iacopo Poli (@iacopo_poli)'s Twitter Profile Photo

The recipe for a fast, performant OCR model:
1. tell Said that OCR is solved
2. let him rage about the state of OCR
3. get a few smart people in a GMeet with him
4. tell them there are GPUs available
5. wait a bit
6. enjoy 🦉
Soon deployed in your favorite Enterprise environments

Oskar Hallström (@oskar_hallstrom)'s Twitter Profile Photo

The last few days have been insane in OCR land, with releases from DeepSeek, PaddlePaddle and others. Now we at LightOn are entering the game with our latest release, pushing the state of the art even further. Kudos staghado, Baptiste Aubertin, Adrien Cavaillès 🥳

Oskar Hallström (@oskar_hallstrom)'s Twitter Profile Photo

Shoutout to our Grand Retrieval Master and Model Whisperer Amélie Chatelain. I had so much FOMO for this talk that I decided to go to London myself to see it. See you there!!

Amélie Chatelain (@amelietabatta)'s Twitter Profile Photo

Had an amazing time giving a talk on Retrieval in the Age of Agents at Weights & Biases's #FullyConnected2025! Feeling very grateful to have had this opportunity as well as fascinating discussions with the other attendees ❤️.

Antoine Chaffin (@antoine_chaffin)'s Twitter Profile Photo

This is now, booth 190, come get your encoder and late interaction party subscription!

Edit: it’s at CIKM in Seoul in case you missed the context from my previous tweets 🥹