
William Hockey
@williamhockey
@column ceo, @plaid co-founder, board @scale_ai
defender of the dollar and dreaming of the eastern sierras
ID: 15385273
http://wi.lli.am 11-07-2008 00:34:47
589 Tweet
7,7K Followers
85 Following



Academic benchmarks are losing their potency. Moving forward, there’re 3 types of LLM evaluations that matter: 1. Privately held test set but publicly reported scores, by a trusted 3rd party who doesn’t have their own LLM to promote. Scale AI’s latest GSM1k is a great example.













Plaid has had a truly insane 5 years: - Acquired by Visa - Just kidding - Raise big $$ at big valuation - Expand into new businesses - Then fintech bubble bursts - Raises at lower valuation... but now with a more robust + diversified core business New ACQ2 with Zachary Perret:





