
Haoli Yin
@haoliyin
multimodal data curation @datologyai, 24/7 poaster
ID: 1550122414498988034
https://haoliyin.me 21-07-2022 14:16:17
804 Tweets
465 Followers
1.1K Following

We've improved our image-text curation significantly from our last blog post, now beating SigLIP2 through *data interventions alone* using vanilla CLIP. So proud of Ricardo Monti, Haoli Yin, Matthew Leavitt and the rest of the team! Check out the thread for all the details 👇


If you want to remain competitive and ensure that your model improvements continue in the near and long term, you MUST be investing in data curation. Very exciting to see these latest results from DatologyAI, which makes building better datasets suck far less.




Join DatologyAI if you have conviction on #4


Andrej Karpathy This is our exclusive focus at DatologyAI. Data quality is the single most underinvested area of ML research relative to its impact. We've already been able to achieve 10x efficiency gains over open-source datasets, and I'm confident there's still another 100x because there's

