Manuel Faysse (@manuelfaysse)'s Twitter Profile
Manuel Faysse

@manuelfaysse

NLP (LLMs) & ML Privacy - 🥐CroissantLLM & ColPali 👀 - PhD Candidate @CentraleSupelec Prev: @imperialcollege, @epfl

ID: 2220306764

Link: https://manuelfay.github.io/

Joined: 28-11-2013 20:22:28

431 Tweets

1.1K Followers

345 Following

Antoine Chaffin (@antoine_chaffin)

Context matters, and this is why late chunking is important. As per usual, when something works great out-of-the-box, it indicates that you can squeeze out some more gain by training for it!

Manuel Faysse (@manuelfaysse)

On some tasks (where I essentially need to reformulate and restructure long documents I attach) GPT-4.5 is awesome, O3 hallucinates like crazy... It's gonna be sad to see it go, clearly no other model fills the gap

Manuel Faysse (@manuelfaysse)

Reducing ColPali / ColQwen index size is super valuable in many use cases, and I know many people who tried and couldn't beat the clustering technique from Ben Clavié. Clustering-aware training helps! Cool work Yubo Ma! arxiv.org/abs/2506.04997

Manuel Faysse (@manuelfaysse)

Amazing work on evals! Now, time for big tech companies to work on improving their VLMs for multi-page document understanding (instead of focusing on gaming Needle In A Haystack evals)