Mert Yuksekgonul (@mertyuksekgonul)'s Twitter Profile
Mert Yuksekgonul

@mertyuksekgonul

Computer Science PhD Candidate @Stanford @StanfordAILab

ID: 175309364

Website: https://cs.stanford.edu/~merty · Joined: 06-08-2010 07:11:25

2.2K Tweets

4.4K Followers

763 Following

Irena Gao (@irena_gao):

Many providers offer inference APIs for the same models: for example, there were over nine Llama-3 8B APIs in Summer 2024. Do all of these APIs serve the same completion distribution as the original model?

In our new paper, ✨Model Equality Testing: Which Model is This API Serving?✨
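
To make the idea concrete, here is a minimal sketch of such a test, assuming the simplest setup: sample completions from the API and from a trusted copy of the model, then run a two-sample permutation test. The statistic here (mean completion length) is a deliberately crude stand-in, not the paper's actual test.

```python
import random

def permutation_test(samples_a, samples_b, statistic, n_permutations=10_000):
    """p-value for H0: both sample sets come from the same completion distribution."""
    observed = abs(statistic(samples_a) - statistic(samples_b))
    pooled = list(samples_a) + list(samples_b)
    n = len(samples_a)
    hits = 0
    for _ in range(n_permutations):
        random.shuffle(pooled)  # re-split the pool at random under H0
        if abs(statistic(pooled[:n]) - statistic(pooled[n:])) >= observed:
            hits += 1
    return hits / n_permutations

# Crude stand-in statistic: mean completion length in characters.
mean_len = lambda texts: sum(len(t) for t in texts) / len(texts)

# In practice these would be sampled from the API and from reference weights.
api_completions = ["yes", "no", "maybe", "yes"]
reference_completions = ["certainly, because...", "it depends on...", "no, since..."]
print(permutation_test(api_completions, reference_completions, mean_len))
# A small p-value is evidence the API is not serving the reference distribution.
```
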
Weixin Liang (@liang_weixin):

How can we reduce pretraining costs for multi-modal models without sacrificing quality? We study this Q in our new work: arxiv.org/abs/2411.04996

At AI at Meta, we introduce Mixture-of-Transformers (MoT), a sparse architecture with modality-aware sparsity for every non-embedding…
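
A minimal sketch of the modality-aware sparsity idea, under one simple reading: each modality gets its own feed-forward weights while tokens are routed by a modality id (the shared global attention in MoT is omitted, and dimensions are illustrative, not the paper's configuration).

```python
import torch
import torch.nn as nn

class ModalitySplitFFN(nn.Module):
    """Per-modality FFN weights: each token uses only its modality's parameters."""
    def __init__(self, d_model: int, d_ff: int, n_modalities: int):
        super().__init__()
        self.ffns = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_modalities)
        )

    def forward(self, x: torch.Tensor, modality: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); modality: (tokens,) integer modality ids.
        out = torch.empty_like(x)
        for m, ffn in enumerate(self.ffns):
            mask = modality == m
            if mask.any():
                out[mask] = ffn(x[mask])  # route tokens to their modality's weights
        return out

x = torch.randn(8, 64)                              # 8 tokens, d_model=64
modality = torch.tensor([0, 0, 1, 1, 1, 0, 2, 2])   # e.g., text / image / speech
layer = ModalitySplitFFN(d_model=64, d_ff=256, n_modalities=3)
print(layer(x, modality).shape)                     # torch.Size([8, 64])
```
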
Fatih Dinc (@fatihdin4en):

I am defending my thesis next week, if you are around Stanford, please feel free to join! Here is the abstract and details!

I will talk about several unpublished works on latent circuits subserving neural manifolds, how to train RNNs with millions of parameters on laptops with…
Mackenzie Mathis, PhD (@trackingactions):

I’m happy to support an AI Fellow at EPFL in my group! ⬇️ Aside from our applied #AI4Science work, we are excited to push on fundamental problems in #ML. Here is some recent work: arxiv.org/abs/2410.10744 & sslneurips23.github.io/paper_pdfs/pap…

Teddi Worledge (@teddiworledge):

🧵LLMs are great at synthesizing info, but unreliable at citing sources. Search engines are the opposite. What lies between them?

Our new paper runs human evals on 7 systems across the ✨extractive-abstractive spectrum✨ for utility, citation quality, time-to-verify, & fluency!
Luke Bailey (@lukebailey181):

Can interpretability help defend LLMs? We find we can reshape activations while preserving a model’s behavior. This lets us attack latent-space defenses, from SAEs and probes to Circuit Breakers. We can attack so precisely that we make a harmfulness probe output this QR code. 🧵
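
The core trick can be illustrated on a toy two-stage model (a conceptual stand-in, not the paper's attack, which operates on full LLMs): optimize an input perturbation that drives a latent probe's score down while a penalty keeps the model's output unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f1, f2 = nn.Linear(16, 16), nn.Linear(16, 4)     # toy stand-in for two model stages
probe = nn.Linear(16, 1)                          # latent-space "harmfulness" probe
x = torch.randn(1, 16)
orig_out = f2(f1(x)).detach()                     # behavior we want to preserve

delta = torch.zeros_like(x, requires_grad=True)   # perturbation to optimize
opt = torch.optim.Adam([delta], lr=0.05)
for _ in range(200):
    h = f1(x + delta)                             # reshaped activation
    probe_score = probe(h).sigmoid().mean()               # drive this down...
    behavior_drift = (f2(h) - orig_out).pow(2).mean()     # ...while keeping output fixed
    loss = probe_score + 10.0 * behavior_drift
    opt.zero_grad(); loss.backward(); opt.step()

print(probe(f1(x + delta)).sigmoid().item())      # probe now scores the input low
```
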

Wanjia Zhao (@wanjiazhao1203):

Introducing #SIRIUS🌟: A self-improving multi-agent LLM framework that learns from successful interactions and refines failed trajectories, enhancing college-level reasoning and competitive negotiations. 
📜Preprint: arxiv.org/pdf/2502.04780
💻code: github.com/zou-group/siri…
1/N
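
In spirit, the self-improvement loop looks something like this hedged sketch; run_episode, refine, and fine_tune are hypothetical placeholders, and the paper's augmentation and training details may differ.

```python
def self_improve(agents, tasks, n_rounds, run_episode, refine, fine_tune):
    """run_episode -> (trajectory, success); refine attempts to repair a failure."""
    for _ in range(n_rounds):
        experience = []
        for task in tasks:
            traj, success = run_episode(agents, task)
            if success:
                experience.append(traj)              # keep successful interactions
            else:
                fixed = refine(agents, task, traj)   # e.g., resample the weak steps
                if fixed is not None:
                    experience.append(fixed)         # augment with repaired trajectory
        agents = fine_tune(agents, experience)       # train each agent on its own turns
    return agents
```
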
Barışcan KURTKAYA (@bariskurtkaya):

Our preprint is out!🚀 We explore attractor mechanisms that subserve short-term memory by training 35,000+ recurrent neural networks! Most importantly, we present a phase diagram that reveals how learning rate & delay length shape attractor dynamics.

👉arxiv.org/abs/2502.17433
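
For the flavor of the setup, here is a toy delayed-recall task (illustrative, not the authors' code): an RNN must hold a cue across a silent delay, and sweeping the learning rate and delay length over many such runs would trace out the axes of the reported phase diagram.

```python
import torch
import torch.nn as nn

def delayed_recall_batch(batch, delay, dim=3):
    cue = torch.randn(batch, dim)
    x = torch.zeros(batch, delay + 1, dim)
    x[:, 0] = cue                        # cue at t=0, then silence during the delay
    return x, cue                        # target: reproduce the cue at the end

rnn = nn.RNN(3, 64, batch_first=True)
readout = nn.Linear(64, 3)
opt = torch.optim.Adam([*rnn.parameters(), *readout.parameters()], lr=1e-3)

# Sweeping lr and delay over many such runs yields the phase-diagram axes.
for _ in range(200):
    x, target = delayed_recall_batch(batch=32, delay=20)
    h, _ = rnn(x)
    loss = (readout(h[:, -1]) - target).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```
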
James Zou (@james_y_zou):

⚡️Really thrilled that #textgrad is published in @nature today!⚡️

We present a general method for genAI to self-improve via our new *calculus of text*.

We show how this optimizes agents🤖, molecules🧬, code🖥️, treatments💊, non-differentiable systems🤯 + more!
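
The library is open source (pip install textgrad). A usage sketch adapted from the project README; the exact API can vary across versions, and a backend LLM API key must be configured.

```python
import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)  # engine that writes textual "gradients"

# A draft answer we want the system to improve (requires_grad: it can be updated).
solution = tg.Variable(
    "To solve 3x + 1 = 10: x = (10 - 1) / 3 = 4.",   # deliberately wrong draft
    requires_grad=True,
    role_description="a draft solution to refine",
)

# Natural-language feedback plays the role of a loss function.
loss_fn = tg.TextLoss("Carefully evaluate this solution. Point out any arithmetic errors.")
optimizer = tg.TGD(parameters=[solution])

loss = loss_fn(solution)   # critique the draft
loss.backward()            # propagate the feedback as a textual gradient
optimizer.step()           # rewrite the variable using the feedback
print(solution.value)
```
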
Karan Dalal (@karansdalal):

Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training. We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency. Every video below is produced directly by…
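
A TTT layer in caricature: the layer's hidden state is the weight matrix of a tiny inner model, which takes a gradient step on a self-supervised reconstruction loss at every token. The paper uses learned projections and richer inner models; this sketch, with assumed dimensions, only shows the mechanism.

```python
import torch

def ttt_layer(tokens, d=32, inner_lr=0.1):
    W = torch.zeros(d, d)                       # inner model's weights = hidden state
    outputs = []
    for x in tokens:                            # x: (d,)
        err = W @ x - x                         # self-supervised target: reconstruct x
        W = W - inner_lr * torch.outer(err, x)  # one gradient step on ||Wx - x||^2
        outputs.append(W @ x)                   # emit output with the updated weights
    return torch.stack(outputs)

out = ttt_layer([torch.randn(32) for _ in range(10)])
print(out.shape)  # torch.Size([10, 32])
```
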

James Zou (@james_y_zou):

Can LLMs learn to reason better by "cheating"?🤯

Excited to introduce #cheatsheet: a dynamic memory module enabling LLMs to learn + reuse insights from tackling previous problems
🎯Claude3.5 23% ➡️ 50% AIME 2024
🎯GPT4o 10% ➡️ 99% on Game of 24

Great job Mirac Suzgun w/ awesome…
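
In spirit, the cheatsheet loop looks something like the sketch below; llm and check are hypothetical placeholders, and the paper's curation and retrieval strategy is more involved.

```python
def solve_with_cheatsheet(problems, llm, check):
    """llm(prompt) -> answer text; check(problem, answer) -> bool."""
    cheatsheet = []  # accumulated, reusable insights
    for problem in problems:
        memory = "\n".join(cheatsheet)
        answer = llm(f"Cheatsheet:\n{memory}\n\nProblem: {problem}\nAnswer:")
        if check(problem, answer):
            # Distill what worked into a short note and store it for reuse.
            insight = llm(f"Problem: {problem}\nSolution: {answer}\n"
                          "State the key reusable strategy in one sentence.")
            cheatsheet.append(insight)
    return cheatsheet
```
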
Fatih Dinc (@fatihdin4en):

As we say in Turkish, "Yasasin 23 Nisan!" ("Long live April 23!") BTF is now officially accepting funding applications for summer research internships: bridgetoturkiye.org/our-work/schol… The program requires you to apply with a mentor, who can be a PhD student or postdoc at a US institution. Good luck!

Mehmet Hamza Erol (@mhamzaerol):

How much does a correct answer from an LM cost?
How much has AI lowered the cost of solving problems?

Meet Cost‑of‑Pass: An Economic Framework for Evaluating LMs!

Cost‑of‑Pass = expected $ for one correct answer.
Frontier Cost‑of‑Pass = cheapest route: an LM or a human expert.
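
One natural reading of these definitions: with per-attempt inference cost c and pass rate p, independent retries give an expected cost of c / p per correct answer, and the frontier takes the minimum over candidate models and a human expert. A toy computation with illustrative numbers (not the paper's):

```python
def cost_of_pass(cost_per_attempt: float, success_rate: float) -> float:
    # Expected $ per correct answer under independent retries: c / p.
    return float("inf") if success_rate == 0 else cost_per_attempt / success_rate

candidates = {
    "small-lm":     cost_of_pass(0.002, 0.20),   # $0.002/attempt, 20% pass -> $0.01
    "large-lm":     cost_of_pass(0.060, 0.80),   # -> $0.075
    "human-expert": cost_of_pass(50.0, 1.00),    # -> $50.00
}
frontier = min(candidates, key=candidates.get)
print(frontier, candidates[frontier])            # -> small-lm ~0.01
```
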
Mert Yuksekgonul (@mertyuksekgonul):

Hamza has been working to make progress in AI and reasoning feel more measurable and grounded. He’s also genuinely enjoyable to work with. Check out his work and give him a follow (faculty friends, he’ll be applying to PhD programs next year 👀)

Sabri Eyuboglu (@eyuboglusabri):

🇸🇬 If you're at ICLR and interested in model compression and conditional computation, go chat with Roberto and Jerry!! In our paper led by Roberto, we show how to convert any dense, pretrained linear layer into an MoE-like layer with dynamic sparsity!!
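
One hedged way to picture such a construction (not necessarily the paper's exact method): slice the pretrained weight matrix into "experts" and let a small router pick the top-k slices per token, so each token touches only part of the original layer.

```python
import torch
import torch.nn as nn

def moe_from_dense(dense: nn.Linear, n_experts: int, k: int):
    d_out, d_in = dense.weight.shape
    assert d_out % n_experts == 0
    chunk = d_out // n_experts
    experts = dense.weight.detach().view(n_experts, chunk, d_in)  # pretrained slices
    router = nn.Linear(d_in, n_experts)   # tiny router; trained separately in practice

    def forward(x):                                   # x: (tokens, d_in)
        top = router(x).topk(k, dim=-1).indices       # active experts per token
        out = torch.zeros(x.shape[0], d_out)
        for t in range(x.shape[0]):
            for e in top[t].tolist():
                out[t, e * chunk:(e + 1) * chunk] = experts[e] @ x[t]  # only chosen slices
        return out
    return forward

layer = moe_from_dense(nn.Linear(16, 32, bias=False), n_experts=4, k=2)
print(layer(torch.randn(5, 16)).shape)                # torch.Size([5, 32])
```
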

Yiğit Korkmaz (@yigitkkorkmaz):

We see increasingly capable robot policies every day. Yet during execution, they often act reasonably but fail to complete tasks, e.g. due to novel scenes or objects. Wouldn't it be nice if we could provide a handful of interventions to robot policies and they could learn from them?

James Zou (@james_y_zou):

💸We expand the economic framework of cost-of-production to quantify the benefits of different LLMs arxiv.org/pdf/2504.13359

Llama3 8B + o1-mini really stand out as milestone jumps in efficiency and capability, respectively!

Great job Mehmet Hamza Erol, Mert Yuksekgonul, Batu El, Mirac Suzgun 👏