This account is inactive. Please follow me on Mastodon at @[email protected] or Bluesky at @sbaack.com
ID: 45529639
http://sbaack.com/ 08-06-2009 09:02:23
19 Tweets
787 Followers
1.1K Following



Excellent report from Dr. Stefan Baack (Mozilla) on Common Crawl, which is used to train many LLMs. Throwaway line for news publishers to ponder: "We will focus on the main crawl because the news crawl is rarely used by AI builders to train their LLMs (only once in our sample of 47 [models])."