Ivan Vulić (@licwu)'s Twitter Profile
Ivan Vulić

@licwu

PRA@Cambridge; Interested in (way) too many things for his well-being, but mostly (and rarely) (re)tweets about NLP, ML, IR, language(s); (likes parentheses)

ID: 930077097028145154

Link: https://sites.google.com/site/ivanvulic/
Joined: 13-11-2017 14:17:14

202 Tweets

2.2K Followers

330 Following

Edoardo Ponti (@pontiedoardo)'s Twitter Profile Photo

We scaled sparse fine-tuning (SFT) to LLMs (such as Llama 2) by making it both parameter- and memory-efficient! (q)SFT instruction tuning performance is often better than (q)LoRA with comparable speed and memory load. Paper: arxiv.org/abs/2401.16405 Code:
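The tweet doesn't spell out the mechanics, so here is a minimal PyTorch sketch of the general sparse fine-tuning idea it builds on: pick a small subset of weight entries (here, naively, by gradient magnitude on a calibration batch) and update only those. This is an illustration under my own assumptions, not the paper's (q)SFT algorithm; loss_fn and batch are placeholders.

import torch

# Sketch of sparse fine-tuning: freeze everything except a small, fixed set of
# weight entries. Selection here is a naive top-k-by-gradient heuristic, NOT the
# paper's method; `loss_fn(model, batch)` is a placeholder returning a scalar loss.
def select_sparse_masks(model, loss_fn, batch, density=0.01):
    model.zero_grad()
    loss_fn(model, batch).backward()
    masks = {}
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        k = max(1, int(density * p.numel()))
        threshold = p.grad.abs().flatten().topk(k).values.min()
        masks[name] = p.grad.abs() >= threshold  # boolean mask of trainable entries
    return masks

def sparse_sgd_step(model, loss_fn, batch, masks, lr=1e-4):
    model.zero_grad()
    loss_fn(model, batch).backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks and p.grad is not None:
                p -= lr * p.grad * masks[name]  # update only the selected entries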

Ivan Vulić (@licwu)'s Twitter Profile Photo

Think globally, act locally? Well, we were thought-experimenting whether LLMs would understand people from different places around our hometowns better than we ever might... And then we eventually decided to make an actual (non-thought) experiment out of these thoughts! 👇👇

Edoardo Ponti (@pontiedoardo)'s Twitter Profile Photo

I am still looking for PhD students starting in September 2024! The deadline to apply for the CDT in NLP is the 11th of March. If you wish to do research in modular and efficient LLMs, here are some highlights of my lab's research from the past year ⬇️🧵

Sebastian Ruder (@seb_ruder)'s Twitter Profile Photo

🚨 A belated update: Our survey on "Modular Deep Learning" has been published in TMLR. Check out the updated version: openreview.net/forum?id=z9EkX…

Ivan Vulić (@licwu)'s Twitter Profile Photo

If we align LLMs through preferences, perhaps we should also evaluate them the same way (and respect transitivity)? The answer is: yes, we should. The trick, however, is how to make evaluation tractable. If you are into the whole "LLM-as-Judges" line of work, check this paper!
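The "tractable" part is the interesting bit: if the judge's pairwise preferences are (approximately) transitive, you can rank N candidate outputs with an ordinary comparison sort, i.e. roughly N·log N judge calls rather than all N(N-1)/2 pairs. A hedged sketch follows; llm_judge is a hypothetical placeholder, not the paper's actual setup.

from functools import cmp_to_key

def llm_judge(output_a: str, output_b: str) -> str:
    """Hypothetical placeholder: return "A" if output_a is preferred, else "B"."""
    raise NotImplementedError("call your LLM judge here")

def rank_by_pairwise_preference(outputs: list[str]) -> list[str]:
    # Each comparison is one judge call; sorted() needs only O(N log N) of them,
    # which is what makes preference-based ranking tractable *if* transitivity holds.
    def cmp(a: str, b: str) -> int:
        return -1 if llm_judge(a, b) == "A" else 1
    return sorted(outputs, key=cmp_to_key(cmp))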

Neil Houlsby (@neilhoulsby)'s Twitter Profile Photo

Adapters are just a great way to share/benefit from new capabilities without handing around the kitchen sink. Congrats to the AdapterHub folks for adding support for quantized training (Q-LoRA and friends).
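For readers who haven't touched quantized adapter training: the recipe being celebrated is roughly "load the frozen base model in 4-bit, then train a tiny LoRA adapter on top". A minimal sketch below, shown with Hugging Face transformers/bitsandbytes/peft rather than the AdapterHub adapters library the tweet refers to; the checkpoint name is illustrative.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Q-LoRA-style setup: 4-bit frozen base model + small trainable LoRA adapter.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # illustrative checkpoint, swap in your own
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(base, lora)   # only the adapter weights require grad
model.print_trainable_parameters()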

Benjamin Minixhofer (@bminixhofer)'s Twitter Profile Photo

Introducing Zero-Shot Tokenizer Transfer (ZeTT) ⚡

ZeTT frees language models from their tokenizer, allowing you to use any model with any tokenizer, with little or no extra training.

Super excited to (finally!) share the first project of my PhD🧵
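ZeTT's actual contribution is a hypernetwork that predicts embeddings for an arbitrary tokenizer; as a much cruder illustration of what "freeing a model from its tokenizer" involves, here is the naive baseline of re-initialising the new tokenizer's embedding table from the old one by averaging (model and tokenizer names are placeholders, and this is not the paper's method).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Naive tokenizer-transfer baseline (NOT ZeTT's hypernetwork): embed each new
# token as the mean of the old-tokenizer embeddings of its surface string.
# Untied output embeddings would need the same treatment.
def naive_tokenizer_transfer(model_name: str, new_tokenizer_name: str):
    old_tok = AutoTokenizer.from_pretrained(model_name)
    new_tok = AutoTokenizer.from_pretrained(new_tokenizer_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    old_emb = model.get_input_embeddings().weight.data

    new_emb = torch.zeros(len(new_tok), old_emb.shape[1])
    for tok_id in range(len(new_tok)):
        text = new_tok.decode([tok_id])
        old_ids = old_tok(text, add_special_tokens=False)["input_ids"]
        new_emb[tok_id] = old_emb[old_ids].mean(0) if old_ids else old_emb.mean(0)

    model.resize_token_embeddings(len(new_tok))
    model.get_input_embeddings().weight.data.copy_(new_emb)
    return model, new_tok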
Chengzu Li (@li_chengzu)'s Twitter Profile Photo

Excited to introduce TopViewRS: VLMs as Top-View Spatial Reasoners🤖

TopViewRS assesses VLMs’ spatial reasoning in top-view scenarios🏠, just like how you read maps🗺️

Spoiler🫢 GPT4V and Gemini are neck-and-neck, each excelling in different setups, but neither comes close to humans
Han Zhou (@hanzhou032)'s Twitter Profile Photo

Which output is better?
[A] or [B]? LLM🤖: B❌
[B] or [A]? LLM🤖: A✅

Thrilled to share our preprint on addressing preference biases in LLM judgments! 🧑‍⚖️ We introduce ZEPO, a 0-shot prompt optimizer that enhances your LLM evaluators via fairness ⚖️

📰Paper: arxiv.org/abs/2406.11370
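The [A]/[B] example above is exactly the kind of positional bias ZEPO targets. A small sanity-check sketch under my own assumptions: query the judge with both orderings and measure how often its verdict flips (llm_judge is a hypothetical placeholder; ZEPO itself goes further and optimizes the judge prompt for fairness).

def llm_judge(first: str, second: str) -> str:
    """Hypothetical placeholder: return "first" or "second" for the preferred output."""
    raise NotImplementedError

def position_flip_rate(pairs: list[tuple[str, str]]) -> float:
    # Ask the judge twice per pair, swapping presentation order; a fair judge's
    # verdict should not depend on which answer is shown first.
    flips = 0
    for a, b in pairs:
        a_wins_when_first = llm_judge(a, b) == "first"
        a_wins_when_second = llm_judge(b, a) == "second"
        flips += a_wins_when_first != a_wins_when_second
    return flips / len(pairs)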
Ivan Vulić (@licwu)'s Twitter Profile Photo

As someone who spent years working in multilingual NLP, I am so happy that we're finally seeing (L)LMs and (N)MT systems working in tandem towards the shared cause. The idea in this work is so simple & sweet, and yet it moves! 🌍🌏🌎

Markus Frohmann (@frohmannm)'s Twitter Profile Photo

Introducing 🪓Segment any Text! 🪓

A new state-of-the-art sentence segmentation tool!
Compared to existing tools (and strong LLMs!), our models are far more:
1. efficient ⚡
2. performant 🔝
3. robust 🚀
4. adaptable 🎯
5. multilingual 🗺
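For context, the tool ships as the wtpsplit package; a minimal usage sketch, assuming the SaT interface roughly as documented in its README ("sat-3l-sm" is one of the released model sizes):

from wtpsplit import SaT

# Load a small Segment-any-Text model and split raw, unpunctuated text into sentences.
sat = SaT("sat-3l-sm")
print(sat.split("this is a test this is another test"))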
Hannah (@h_sterz)'s Twitter Profile Photo

Do you DARE? Introducing a multiple-choice VQA benchmark ✨DARE✨ with:
- 4 main robustness evaluations ⛓️
- 5 diverse categories 🧩
- extensive analysis of 4 widely used VLMs 🤖

River Yijiang Dong (@river_dong121)'s Twitter Profile Photo

Thrilled to share our updated paper: "UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models"
We propose a new robust LLM unlearning method via Self-Distillation on Adjusted Logits (UNDIAL).
📄 Paper: arxiv.org/pdf/2402.10052
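A hedged reading of the title in code: take the frozen model's own logits as teacher, push down the logits of the tokens to be forgotten, and distill the updated model toward that adjusted distribution. Hyperparameters and shapes below are illustrative, not the paper's exact recipe.

import torch
import torch.nn.functional as F

def adjusted_logit_distillation_loss(student_logits, teacher_logits, target_ids, penalty=5.0):
    """Illustrative sketch of self-distillation with adjusted logits (not the paper's exact loss).
    student_logits, teacher_logits: [batch, seq, vocab]; target_ids: [batch, seq]
    tokens of the forget-set continuation whose probability should be reduced."""
    idx = target_ids.unsqueeze(-1)
    adjusted = teacher_logits.clone()
    # lower the logit of each token we want the model to unlearn
    adjusted.scatter_(-1, idx, adjusted.gather(-1, idx) - penalty)
    teacher_probs = F.softmax(adjusted, dim=-1)
    return F.kl_div(F.log_softmax(student_logits, dim=-1), teacher_probs,
                    reduction="batchmean")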
Fabian David Schmidt (@fdschmidt)'s Twitter Profile Photo

📣Happy to (pre-)release my Fleurs-SLU benchmark to evaluate massively multilingual spoken language understanding on SIB & Belebele. Work done at Mila - Institut québécois d'IA with David Ifeoluwa Adelani 🇳🇬, Goran Glavaš, and Ivan Vulić. Datasets: huggingface.co/datasets/WueNL… huggingface.co/datasets/WueNL… Details to follow👇

Benjamin Minixhofer (@bminixhofer)'s Twitter Profile Photo

We created Approximate Likelihood Matching, a principled (and very effective) method for *cross-tokenizer distillation*!

With ALM, you can create ensembles of models from different families, convert existing subword-level models to byte-level and a bunch more🧵
Yi Xu (@_yixu)'s Twitter Profile Photo

🚀Let’s Think Only with Images.

No language and No verbal thought.🤔 

Let’s think through a sequence of images💭, like how humans picture steps in their minds🎨. 

We propose Visual Planning, a novel reasoning paradigm that enables models to reason purely through images.
Benjamin Minixhofer (@bminixhofer)'s Twitter Profile Photo

We achieved the first instance of successful subword-to-byte distillation in our (just updated) paper.

This enables creating byte-level models at a fraction of the cost of what was needed previously.

As a proof-of-concept, we created byte-level Gemma2 and Llama3 models.

🧵
Han Zhou (@hanzhou032)'s Twitter Profile Photo

Automating Multi-Agent Design:

🧩Multi-agent systems aren’t just about throwing more LLM agents together.

🛠️They require mastering the subtle art of prompting and agent orchestration.

Introducing MASS🚀- Our new agent optimization framework for better prompts and topologies!
Lucas Caccia (@lucaspcaccia)'s Twitter Profile Photo

RAG and in-context learning are the go-to approaches for integrating new knowledge into LLMs, but they make inference very inefficient. We propose 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗠𝗼𝗱𝘂𝗹𝗲𝘀 instead: lightweight LoRA modules, trained offline, that can match RAG performance without the drawbacks.
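A hedged sketch of the inference-time picture: instead of retrieving documents into the prompt, you attach a small LoRA "knowledge module" that was trained offline on those documents. Shown with Hugging Face peft; the adapter path and checkpoint name are hypothetical placeholders, not artifacts released with the paper.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative base model
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach a LoRA module fine-tuned offline on one document collection (hypothetical path).
model = PeftModel.from_pretrained(base, "path/to/knowledge_module_lora")

prompt = "According to the ingested documents, "
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))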