Yu Zhao (@yuzhaouoe)'s Twitter Profile
Yu Zhao

@yuzhaouoe

@EdinburghNLP NLP/ML | Opening the Black Box for Efficient Training/Inference

ID: 1511963648826097670

Link: https://yuzhaouoe.github.io/ | Joined: 07-04-2022 07:06:54

188 Tweets

351 Followers

593 Following

Rohit Saxena (@rohit_saxena)

Can multimodal LLMs truly understand research poster images? 📊

🚀 We introduce PosterSum, a new multimodal benchmark for scientific poster summarization! 🪧

📂 Dataset: huggingface.co/datasets/rohit…
📜 Paper: arxiv.org/abs/2502.17540
Rohit Saxena (@rohit_saxena)

📣 This work will appear at the ICLR 2025 Workshop on Reasoning and Planning for LLMs. 🇸🇬 I'm currently on the job market, looking for research scientist roles. Feel free to reach out if you're hiring or know of any opportunities!

Tongyao Zhu @ ICLR 25 🇸🇬 (@tongyao_zhu)

🚀 Excited to share our new paper: SkyLadder: Better and Faster Pretraining via Context Window Scheduling!

Have you ever noticed the ever-increasing ⬆ context window of pretrained language models? The first generation of GPT had a context length of 512, followed by 1024 for GPT2,
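The core idea can be pictured as a schedule that starts pretraining with a short context window and grows it toward the target length over training. The toy ramp below only illustrates that idea; the function name, constants, and linear shape are assumptions, not SkyLadder's actual schedule.

```python
def context_window_at(step: int,
                      total_steps: int,
                      start_len: int = 512,
                      target_len: int = 8192,
                      ramp_fraction: float = 0.8) -> int:
    """Toy linear context-window schedule: grow from start_len to target_len
    over the first ramp_fraction of training, then hold at target_len.
    Illustrative only; SkyLadder's real schedule may differ."""
    ramp_steps = int(total_steps * ramp_fraction)
    if step >= ramp_steps:
        return target_len
    progress = step / max(ramp_steps, 1)
    return int(start_len + progress * (target_len - start_len))

# Example: the length each training batch would be packed to at a given step.
for step in (0, 25_000, 50_000, 100_000):
    print(step, context_window_at(step, total_steps=100_000))
```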
Mohd Sanad Zaki Rizvi (@sanad_maker)

🚀 New arXiv paper alert! By combining agentic frameworks (ReAct) with smart decoders (DeCoRe, DoLa, CAD), we boost factual accuracy in complex reasoning tasks, reducing those annoying hallucinations! 🔥 🔗 Paper: arxiv.org/abs/2503.23415
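For the decoding side, one of the named methods, context-aware decoding (CAD), can be sketched as contrasting next-token logits computed with and without the supporting context. The snippet below is a schematic illustration under that reading; the function name and alpha value are assumptions, not the paper's code.

```python
import torch

def cad_adjusted_logits(logits_with_context: torch.Tensor,
                        logits_without_context: torch.Tensor,
                        alpha: float = 0.5) -> torch.Tensor:
    """Context-aware decoding, schematically: amplify tokens whose likelihood
    rises when the context is present, downweight tokens the model would
    produce from its parametric memory alone."""
    return (1.0 + alpha) * logits_with_context - alpha * logits_without_context

# Toy usage with random logits standing in for two forward passes
# (one with the retrieved context in the prompt, one without).
vocab = 10
l_ctx, l_noctx = torch.randn(vocab), torch.randn(vocab)
next_token = torch.argmax(cad_adjusted_logits(l_ctx, l_noctx))
```

Inside a ReAct-style loop, each generation step would then sample from the adjusted distribution instead of the raw one.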

Piotr Miłoś (@piotrrmilos)

My good friend has an ongoing fight with cancer.

A great father and husband to his family. An excellent co-author for me and many other ML folks.

Please support and share! (link in the comments!)
Pasquale Minervini is hiring postdocs! 🚀 (@pminervini)

My amazing collaborators will present several works at ICLR and NAACL later this month -- please catch up with them if you're attending! I tried to summarise our recent work in a blog post: neuralnoise.com/2025/march-res…
Hongru Wang (@wangcarrey)

💥 We are so excited to introduce OTC-PO, the first RL framework for optimizing LLMs' tool-use behavior in Tool-Integrated Reasoning. arXiv: arxiv.org/pdf/2504.14870 Hugging Face: huggingface.co/papers/2504.14… ⚙️ Simple, generalizable, plug-and-play (just a few lines of code) 🧠
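One way to picture "optimizing tool-use behavior" is as reward shaping that keeps credit for correct answers while discouraging unnecessary tool calls. The snippet below is a rough, hypothetical sketch of that general idea; the function, penalty form, and constants are assumptions and not OTC-PO's actual objective.

```python
def shaped_reward(is_correct: bool, num_tool_calls: int,
                  penalty_per_call: float = 0.1) -> float:
    """Toy reward for tool-integrated reasoning: full credit for a correct
    answer, scaled down as the trajectory spends more tool calls.
    Illustrative only; OTC-PO's actual reward is defined in the paper."""
    base = 1.0 if is_correct else 0.0
    return base * max(0.0, 1.0 - penalty_per_call * num_tool_calls)

# A correct answer with 0 calls scores 1.0; with 5 calls, 0.5; wrong answers score 0.
print(shaped_reward(True, 0), shaped_reward(True, 5), shaped_reward(False, 2))
```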

Hongru Wang (@wangcarrey)

🎉 Thrilled to share our TWO #NAACL2025 oral papers! 👇 Welcome to catch me and talk about anything!

1️⃣ Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
📅 30 Apr • 11:30–11:45 AM • Ballroom C
TLDR: A general representation learning
Ne Luo (seeking PhD opportunities) (@neluo19)

Hi! I will be attending #NAACL2025 and presenting our paper on self-training for tool-use today, an extended work of my MSc dissertation at EdinburghNLP, supervised by Pasquale Minervini (@PMinervini).

Time: 14:00-15:30
Location: Hall 3

Let's chat and connect! 😊
Aryo Pradipta Gema (@aryopg)

MMLU-Redux just touched down at #NAACL2025! 🎉
Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope 😅
If anyone's swinging by, give our research some love! Hit me up if you check it out! 👋
Aidan Clark (@_aidan_clark_)

No LLM researcher should spend their whole life on one side of the pre-/post-training divide. The former teaches you what is actually happening, the latter reminds you what actually matters.

Wenhao Yu (@wyu_nd)

🚀 We release MMLongBench: a benchmark for evaluating long-context VLMs.
📊 13,331 examples across 5 tasks:
– Visual RAG
– Many-shot ICL
– Needle-in-a-haystack
– VL Summarization
– Long-document VQA
Lengths: 8 / 16 / 32 / 64 / 128K
🔍 Benchmarking both thoroughly & effectively!
Zhaowei Wang (@zhaoweiwang4)

🚨 New paper! 🚨
Many recent LVLMs claim massive context windows, but can they handle long contexts on diverse downstream tasks? 🤔
💡 In our new paper, we find that most models still fall short!

We introduce MMLongBench, the first comprehensive benchmark for long-context VLMs:
Daniel Scalena (@daniel_sc4)

💡 We compare prompting (zero- and multi-shot + explanations) and inference-time interventions (ActAdd, REFT and SAEs).

Following SpARE (Yu Zhao, Alessio Devoto), we propose ✨ contrastive SAE steering ✨ with mutual info to personalize literary MT by tuning latent features 4/
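Schematically, SAE-based steering of this kind adds selected sparse-autoencoder decoder directions into a hidden state at inference time, with features (and their signs) chosen by contrasting two behaviours, e.g. via mutual information. The sketch below is a minimal toy version; the hook point, feature-selection rule, and strengths are assumptions rather than the exact SpARE recipe.

```python
import torch

def steer_with_sae_features(hidden: torch.Tensor,
                            sae_decoder: torch.Tensor,   # (num_features, d_model)
                            feature_ids: list[int],
                            strengths: list[float]) -> torch.Tensor:
    """Add chosen SAE decoder directions to a residual-stream activation.
    Contrastive steering would pick feature_ids and signs by comparing
    feature activations between two behaviours. Toy sketch only."""
    steered = hidden.clone()
    for fid, s in zip(feature_ids, strengths):
        steered = steered + s * sae_decoder[fid]
    return steered

# Example with random tensors standing in for a real model and SAE.
d_model, num_features = 16, 64
hidden = torch.randn(d_model)
decoder = torch.randn(num_features, d_model)
print(steer_with_sae_features(hidden, decoder, [3, 17], [2.0, -1.5]).shape)
```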
Anthropic (@anthropicai)

Our interpretability team recently released research that traced the thoughts of a large language model. Now we're open-sourcing the method. Researchers can generate "attribution graphs" like those in our study, and explore them interactively.