Yu Zhao (@yuzhaouoe)'s Twitter Profile
Yu Zhao

@yuzhaouoe

@EdinburghNLP NLP/ML | Opening the Black Box for Efficient Training/Inference

ID: 1511963648826097670

Link: https://yuzhaouoe.github.io/ | Joined: 07-04-2022 07:06:54

188 Tweets

351 Followers

593 Following

Rohit Saxena (@rohit_saxena)'s Twitter Profile Photo

Can multimodal LLMs truly understand research poster images?📊

🚀 We introduce PosterSum—a new multimodal benchmark for scientific poster summarization! 🪧

📂 Dataset: huggingface.co/datasets/rohit…
📜 Paper: arxiv.org/abs/2502.17540
Rohit Saxena (@rohit_saxena)'s Twitter Profile Photo

📣This work will appear at the ICLR 2025 Workshop on Reasoning and Planning for LLMs.🇸🇬 I'm currently on the job market, looking for research scientist roles. Feel free to reach out if you're hiring or know of any opportunities!

Tongyao Zhu @ ICLR 25 🇸🇬 (@tongyao_zhu)'s Twitter Profile Photo

🚀 Excited to share our new paper: SkyLadder: Better and Faster Pretraining via Context Window Scheduling!

Have you ever noticed the ever-increasing⬆context window of pretrained language models? The first generation of GPT had a context length of 512, followed by 1024 for GPT2,
Piotr Miłoś (@piotrrmilos)'s Twitter Profile Photo

My good friend has an ongoing fight with cancer.

A great father and husband for his family. An excellent co-author for me and many other ML folks.

Please support and share! (link in the comment!)
Pasquale Minervini is hiring postdocs! 🚀 (@pminervini)'s Twitter Profile Photo

My amazing collaborators will present several works at ICLR and NAACL later this month -- please catch up with them if you're attending! I tried to summarise our recent work in a blog post: neuralnoise.com/2025/march-res…
Hongru Wang (@wangcarrey)'s Twitter Profile Photo

💥 We are so excited to introduce OTC-PO, the first RL framework for optimizing LLMs’ tool-use behavior in Tool-Integrated Reasoning.

Arxiv: arxiv.org/pdf/2504.14870
Huggingface: huggingface.co/papers/2504.14…

⚙️ Simple, generalizable, plug-and-play (just a few lines of code) 🧠

Hongru Wang (@wangcarrey)'s Twitter Profile Photo

🎉 Thrilled to share our TWO #NAACL2025 oral papers! 👇 Welcome to catch me and talk about anything!

1️⃣ Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
📅 30 Apr • 11:30–11:45 AM • Ballroom C
TLDR: A general representation learning
Ne Luo (seeking PhD opportunities) (@neluo19)'s Twitter Profile Photo

Hi! I will be attending #NAACL2025 and presenting our paper on self-training for tool-use today, an extended work of my MSc dissertation at EdinburghNLP, supervised by Pasquale Minervini.

Time: 14:00-15:30
Location: Hall 3

Let’s chat and connect!😊
Aryo Pradipta Gema (@aryopg)'s Twitter Profile Photo

MMLU-Redux just touched down at #NAACL2025! 🎉 
Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope 😅
If anyone's swinging by, give our research some love! Hit me up if you check it out! 👋
Aidan Clark (@_aidan_clark_)'s Twitter Profile Photo

No LLM researcher should spend their whole life on one side of the pre/post training divide. The former teaches you what is actually happening, the latter reminds you what actually matters.

Wenhao Yu (@wyu_nd)'s Twitter Profile Photo

🚀 We release MMLongBench: Benchmark for evaluating long-context VLMs.
📊 13,331 examples across 5 tasks:
– Visual RAG
– Many-shot ICL
– Needle-in-a-haystack
– VL Summarization
– Long-document VQA
📏 Lengths: 8 / 16 / 32 / 64 / 128K
🔍 Benchmarking both thoroughly & effectively!
Zhaowei Wang (@zhaoweiwang4)'s Twitter Profile Photo

🚨 New paper! 🚨
Many recent LVLMs claim massive context windows, but can they handle long contexts on diverse downstream tasks? 🤔
💡In our new paper, we find that most models still fall short!

We introduce MMLongBench, the first comprehensive benchmark for long-context VLMs:
Daniel Scalena (@daniel_sc4)'s Twitter Profile Photo

💡 We compare prompting (zero and multi-shot + explanations) and inference-time interventions (ActAdd, REFT and SAEs).

Following SpARE (Yu Zhao, Alessio Devoto), we propose ✨ contrastive SAE steering ✨ with mutual info to personalize literary MT by tuning latent features 4/
Anthropic (@anthropicai)'s Twitter Profile Photo

Our interpretability team recently released research that traced the thoughts of a large language model. Now we’re open-sourcing the method. Researchers can generate “attribution graphs” like those in our study, and explore them interactively.