Xiaozhe Yao (@xiaozheyao)'s Twitter Profile
Xiaozhe Yao

@xiaozheyao

Doctoral student in Computer Science @ETH_en.
love, passion and devotion

ID: 899883061596127232

Link: https://about.yao.sh · Joined: 22-08-2017 06:36:55

135 Tweets

224 Followers

871 Following

Xiaozhe Yao (@xiaozheyao)'s Twitter Profile Photo

When fine-tuning LLMs, do you usually use LoRA (or one of its variants) or full fine-tuning? Why? (I have read some reports, just curious.)
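For context on the question, here is a minimal sketch of LoRA-style fine-tuning next to full fine-tuning, assuming the Hugging Face PEFT library and a small stand-in model (GPT-2); the specific settings are illustrative, not anything from the tweet.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Small stand-in model; the same pattern applies to larger LLMs.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Full fine-tuning: every parameter is trainable (and needs optimizer state).
full_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"full fine-tuning trains {full_trainable:,} parameters")

# LoRA: freeze the base weights and train low-rank adapters on chosen modules.
lora_cfg = LoraConfig(
    r=8,                        # adapter rank (illustrative value)
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
lora_model = get_peft_model(model, lora_cfg)
lora_model.print_trainable_parameters()  # typically well under 1% of the total
```

The trade-off usually reported: LoRA trains far fewer parameters and saves optimizer memory, while full fine-tuning can still be preferable for larger domain shifts.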

Tim Dettmers (@tim_dettmers)'s Twitter Profile Photo

Just to clarify this benchmark: this is an apples-to-oranges comparison.
- Cerebras is fast for batch size 1 but slow for batch size n.
- GPUs are slow for batch size 1 but fast for batch size n.
I get >800 tok/s on 8x H100 for a 405B model at batch size = n. Cerebras' system
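A toy back-of-the-envelope model of the batch-size point (my own illustration, not Tim's numbers): if each decode step is roughly memory-bandwidth bound, step time barely grows with batch size, so aggregate tokens/s grows almost linearly with the batch.

```python
# Toy model of batched decoding throughput (illustrative numbers only).
def decode_throughput(batch_size: int, step_latency_s: float = 0.05) -> float:
    # Assume one forward pass takes ~step_latency_s regardless of batch size
    # (a simplification of the memory-bandwidth-bound regime).
    tokens_per_step = batch_size  # one new token per sequence per step
    return tokens_per_step / step_latency_s

for bs in (1, 8, 64, 256):
    print(f"batch={bs:4d}  ~{decode_throughput(bs):8.0f} tok/s aggregate")
# Batch 1 optimizes per-request latency; large batches optimize aggregate
# throughput, which is the apples-to-oranges distinction in the tweet.
```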

Ahmad Beirami @ ICLR 2025 (@abeirami)'s Twitter Profile Photo

The question that a reviewer should ask themselves is: Does this paper take a gradient step in a promising direction? Is the community better off with this paper published? If the answer is yes, then the recommendation should be to accept.

Berivan Isik (@berivanisik)'s Twitter Profile Photo

I’ll be hosting an intern at Google AI in 2025 to work on the value of data for LLMs. If you’re interested, please email me your CV and a brief summary of your background. I won’t be checking DMs.

Anne Ouyang (@anneouyang)'s Twitter Profile Photo

Kernels are the kernel of deep learning. 🙃...but writing kernels sucks. Can LLMs help? 🤔 Introducing 🌽 KernelBench (Preview), a new coding benchmark designed to evaluate the ability of LLMs to generate ⚡️efficient💨 GPU kernels for optimizing neural network performance.
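The evaluation idea behind a benchmark like this, as a minimal sketch under my own assumptions (not the actual KernelBench harness): check that a generated kernel matches a PyTorch reference and measure its speedup.

```python
import time
import torch

def reference(x):
    return torch.nn.functional.gelu(x)  # baseline PyTorch operator

def candidate(x):
    # Stand-in for an LLM-generated implementation (here the tanh approximation of GELU).
    return 0.5 * x * (1.0 + torch.tanh(0.79788456 * (x + 0.044715 * x ** 3)))

def evaluate(fn, ref, x, iters=100):
    ok = torch.allclose(fn(x), ref(x), atol=1e-2)    # correctness check
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    elapsed = (time.perf_counter() - start) / iters  # rough mean latency
    return ok, elapsed

x = torch.randn(1 << 20)
ok, t_candidate = evaluate(candidate, reference, x)
_, t_reference = evaluate(reference, reference, x)
print(f"correct={ok}  speedup={t_reference / t_candidate:.2f}x")
```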

Pika (@pika_labs)'s Twitter Profile Photo

A giant pre-holiday gift from the Pika Team: We’re giving EVERYONE free, unlimited access to Pika 2.0. From today until December 22nd, anyone on any plan can generate as many videos as they want, using all the Scene Ingredients they want. It’s a 4-day Free-For-All, so get it

Maximilian Böther (@maxiboether)'s Twitter Profile Photo

📊 Are you training LLMs and managing your training data via a DFS? Do you spend a lot of time writing data wrangling/mixing scripts? ⌛
We just posted a preprint on Mixtera, our data plane for LLM/VLM training 🎉
🔗 github.com/eth-easl/mixte…
🔗 arxiv.org/abs/2502.19790
Read more 👇
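For a sense of the problem it targets, a minimal sketch of weighted data mixing (my own illustration, not Mixtera's actual API): interleave examples from several sources according to mixture weights instead of hand-writing wrangling scripts.

```python
import random

def mixed_stream(sources, weights, seed=0):
    """Yield (source_name, example) pairs sampled according to mixture weights."""
    rng = random.Random(seed)
    names = list(sources)
    while True:
        name = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        try:
            yield name, next(sources[name])
        except StopIteration:
            return  # stop when any source is exhausted (a simplification)

sources = {
    "web":  iter(f"web_doc_{i}" for i in range(1000)),
    "code": iter(f"code_doc_{i}" for i in range(1000)),
}
stream = mixed_stream(sources, {"web": 0.7, "code": 0.3})
batch = [next(stream) for _ in range(8)]
print(batch)
```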

Ana Klimovic (@anaklimovic)'s Twitter Profile Photo

Very excited to host Jeff Dean at ETH Zürich for a Distinguished Colloquium and research discussions next Monday! Info about his talk, "Important Trends in AI: How Did We Get Here, What Can We Do Now and How Can We Shape AI’s Future?", from the ETH CS Department: inf.ethz.ch/news-and-event…

Gautam Kamath (@thegautamkamath)'s Twitter Profile Photo

System is so broken:
- researchers write papers no one reads
- reviewers don't have time to review, shunt it to coauthors, use LLMs instead of reading
- authors try to fool said LLMs with prompt injection
- researchers are evaluated on # of papers (no time to read)
Dystopic.