vinh q. tran (@vqctran)'s Twitter Profile
vinh q. tran

@vqctran

research scientist @GoogleDeepMind, all thoughts my own, he/him

ID: 974097637564665856

Link: http://vqtran.github.io · Joined: 15-03-2018 01:39:09

129 Tweets

1.1K Followers

322 Following

vinh q. tran (@vqctran):

idk who needs to hear this but span corruption != bidirectional attention, you could have one, both, or neither. addendum: ul2 could be implemented completely with a causal decoder!!!!
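The distinction above can be sketched in a few lines: span corruption is purely an input/target formatting choice, while the attention pattern is a separate mask. The helper names below (`span_corrupt`, `causal_mask`) are hypothetical, not from any released codebase — a minimal sketch assuming T5-style sentinel tokens.

```python
import numpy as np

def span_corrupt(tokens, start, length, sentinel="<X>"):
    """Mask one contiguous span, returning (input, target) pairs.
    The objective only defines what inputs and targets look like;
    it says nothing about how attention is masked."""
    inp = tokens[:start] + [sentinel] + tokens[start + length:]
    tgt = [sentinel] + tokens[start:start + length]
    return inp, tgt

def causal_mask(n):
    """Lower-triangular mask: position i attends only to positions <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

# Span corruption with a causal decoder: concatenate input and target
# into one sequence and apply a plain causal mask over the whole thing.
tokens = ["the", "quick", "brown", "fox", "jumps"]
inp, tgt = span_corrupt(tokens, start=2, length=2)
sequence = inp + tgt              # decoder-only formatting
mask = causal_mask(len(sequence))
```

Swapping `causal_mask` for a mask that is bidirectional over the input prefix (and causal over the target) gives a prefix-LM; the span-corruption objective is unchanged either way, which is the tweet's point.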

lmarena.ai (formerly lmsys.org) (@lmarena_ai):

Exciting News from Chatbot Arena!

<a href="/GoogleDeepMind/">Google DeepMind</a>'s new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes.

For the first time, Google Gemini has claimed the #1 spot, surpassing GPT-4o/Claude-3.5 with an impressive

Hritik Bansal (@hbxnov):

New paper📢 LLM folks have been supervised finetuning their models with data from large and expensive models (e.g., Gemini Pro).
However, we achieve better perf. by finetuning on the samples from the smaller and weaker LLMs (e.g., Flash)!
w/<a href="/kazemi_sm/">Mehran Kazemi</a> <a href="/arianTBD/">Arian Hosseini</a> <a href="/agarwl_/">Rishabh Agarwal</a> <a href="/vqctran/">vinh q. tran</a>

vinh q. tran (@vqctran):

just like many lessons in LLMs past: take compute-matched comparisons seriously and you will prosper. check out Hritik's excellent internship work!

Ibrahim Alabdulmohsin | إبراهيم العبدالمحسن (@ibomohsin):

Have you wondered why next-token prediction can be such a powerful training objective? Come visit our poster to talk about language and fractals and how to predict downstream performance in LLMs better. Poster #3105, Fri 13 Dec 4:30-7:30pm x.com/ibomohsin/stat… See you there!

foam shazeer (@foamshazeer):

they don't realize it yet but legacy brain and legacy deepmind are secretly the yin and yang of AI research -- forced together in GDM they are differing, chasing, yet necessary to push Gemini beyond the frontier

rohan anil (@_arohan_):

Prediction: People say pretraining will end, and I think everyone will be surprised how many multipliers we can squeeze from existing data through all kinds of algorithms.

vinh q. tran (@vqctran):

Take BIG-Bench Hard but make it EVEN HARDER!! Check out this cool new benchmark that really shows how much further our models still have to go on general reasoning!

vinh q. tran (@vqctran):

even deeper BTS: semantic ids in DSI were thought of and implemented almost completely from the hospital since I was in poor health back then (kidney failure, all better now!) -- sometimes doing interesting research is a great distraction from other more difficult things going on

vinh q. tran (@vqctran):

Congrats to everyone on this remarkable milestone! Incredible to see the outrageous power of general models and training methods.