
Marc Najork
@marc_najork
Research Engineering Director at Google
ID: 126132953
http://marc.najork.org 24-03-2010 23:05:44
110 Tweets
537 Followers
91 Following


Join us at #KDD2021 for the DI workshop (document-intelligence.github.io/DI-2021/) if you are interested in #DocumentAI, an area that spans NLP, CV, knowledge representation, and more. I will present Glean's efforts on data-efficient generalization to different languages and doc types.


Joint work with Navneet Potti, Sandeep Tata, James B. Wendt, Marc Najork, and Jing Xie from Google AI. The workshop also has a great lineup of speakers and will be held on Sunday, August 15th. :)







Our paper "Out-of-Domain Semantics to the Rescue! Zero-Shot Hybrid Retrieval Models" (by Tao Chen, Mingyang Zhang, Jing Lu, Michael Bendersky, and Marc Najork; to appear in ECIR 2022) is now on arXiv: arxiv.org/abs/2201.10582

Congrats to the recipients of the Wikimedia Foundation Research Award of the Year!!
🎉 "WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning", Srinivasan et al.
🎉 "Assessing the quality of sources in @Wikidata across languages: a hybrid approach", Amaral et al.




I just discovered a "creative re-use" of our 1999 Mercator paper on scalable web crawling. It employs "tortured phrases" -- using semantic mapping services (e.g. EN->FR->EN machine translation) to evade plagiarism detection software. Read all about it at marcnajork.blogspot.com/2022/10/tortur…
