Jesujoba Alabi (@alabi_jesujoba)'s Twitter Profile
Jesujoba Alabi

@alabi_jesujoba

PhD Student @LstSaar & @SIC_Saar, doing natural language processing #NLProc | prev @InriaParisNLP | @UniIbadan @bowenuniversity alumnus | Child of Jesus | Child of Ogbomọṣọ

ID: 3300296825

Link: https://ajesujoba.github.io/ | Joined: 27-05-2015 10:15:02

226 Tweets

290 Followers

779 Following

Hosein Mohebbi (@hmohebbi75)'s Twitter Profile Photo

Is it possible to *disentangle* the internal representations of deep learning models? To separate the hidden states of recent spoken language models into one vector specialized for transcription to text, and another that preserves task-relevant acoustic features? #NLProc 🧵(1/9)

David Ifeoluwa Adelani 🇳🇬 (@davlanade)'s Twitter Profile Photo

Join my lab! I’m currently recruiting new students (MSc & PhD) for admission in the fall of 2025 at Mila - Institut québécois d'IA mila.quebec/en/prospective… Are you interested in multilingual NLP? I would encourage you to apply. Deadline: December 1

Lanfrica (@lanfrica)'s Twitter Profile Photo

Half a decade ago, finding a public Igbo speech-text dataset was impossible, and resources for Yoruba and Hausa were relatively few. Today, we’re changing that narrative! We proudly present the largest African speech dataset for Nigerian languages, encompassing 1,800 hours and

Isabelle Augenstein (@iaugenstein)'s Twitter Profile Photo

📜Excited to share our comprehensive survey on cultural awareness in #LLMs! 🗺️ We reviewed 300+ papers across diverse modalities (language, vision-language, etc.) Siddhesh Pawar Junyeong Park @ NAACL 2025✈️ Jiho Jin Arnav Arora Junho Myung Inhwa Alice Oh #NLProc openreview.net/forum?id=3gg6G…

Mor Geva (@megamor2)'s Twitter Profile Photo

Excited to attend EMNLP 2025 in Miami next week 🤩 DM me if you'd like to grab a coffee and chat about interpretability, knowledge, or reasoning in LLMs! Our group/collabs will be presenting a bunch of cool works, come check them out! 🧵

Michael A. Hedderich (@michedderich)'s Twitter Profile Photo

Still 14 days to apply for the fully-funded PhD positions at the Munich Center for Machine Learning The call covers many areas of ML and the positions will be with PIs and groups at Universität München and TU München in Germany, incl. my group on human-centric NLP and AI. mcml.ai/opportunities/…

Zhaofeng Wu @ ICLR (@zhaofeng_wu)'s Twitter Profile Photo

💡We find that models “think” 💭 in English (or in general, their dominant language) when processing distinct non-English or even non-language data types 🤯 like texts in other languages, arithmetic expressions, code, visual inputs, & audio inputs ‼️ 🧵⬇️arxiv.org/abs/2411.04986

John Hewitt (@johnhewtt)'s Twitter Profile Photo

I’m hiring PhD students in computer science at Columbia! Our lab will tackle core challenges in understanding and controlling neural models that interact with language. For example:
- methods for LLM control
- discoveries of LLM properties
- pretraining for understanding

Lacuna Fund (@lacunafund)'s Twitter Profile Photo

We have very exciting news! There are 18 new #machinelearning datasets out now, built by our amazing grantees working across multiple domains. Take a look! #OurVoiceinData lacunafund.org/lacuna-fund-re… French/Spanish is available on the site. #Ag #Health #Climate #NLP #ML #AI

omer goldman (@omernlp)'s Twitter Profile Photo

Wanna check how well a model can share knowledge between languages? Of course you do! 🤩 But can you do it without access to the model’s weights? Now you can with ECLeKTic 🤯

Benjamin Minixhofer (@bminixhofer)'s Twitter Profile Photo

We created Approximate Likelihood Matching, a principled (and very effective) method for *cross-tokenizer distillation*! With ALM, you can create ensembles of models from different families, convert existing subword-level models to byte-level and a bunch more🧵

May Fung (@may_f1_)'s Twitter Profile Photo

🚀 Data to pre-train LLMs on are reaching a critical bottleneck. 𝘿𝙤𝙚𝙨 𝙢𝙤𝙙𝙚𝙡-𝙜𝙚𝙣𝙚𝙧𝙖𝙩𝙚𝙙 𝙨𝙮𝙣𝙩𝙝𝙚𝙩𝙞𝙘 𝙙𝙖𝙩𝙖 𝙬𝙤𝙧𝙠 𝙨𝙞𝙢𝙞𝙡𝙖𝙧𝙡𝙮 𝙬𝙚𝙡𝙡 𝙛𝙤𝙧 𝙨𝙘𝙖𝙡𝙞𝙣𝙜 𝙥𝙧𝙚-𝙩𝙧𝙖𝙞𝙣𝙞𝙣𝙜 𝙛𝙪𝙧𝙩𝙝𝙚𝙧? Let's dive into the "𝗦𝗰𝗮𝗹𝗶𝗻𝗴 𝗟𝗮𝘄𝘀 𝗼𝗳

Yanai Elazar (@yanaiela)'s Twitter Profile Photo

💡 New ICLR paper! 💡 "On Linear Representations and Pretraining Data Frequency in Language Models": We provide an explanation for when & why linear representations form in large (or small) language models. Led by Jack Merullo, w/ Noah A. Smith & Sarah Wiegreffe

Peter West (@peterwesttm)'s Twitter Profile Photo

I’ve been fascinated lately by the question: what kinds of capabilities might base LLMs lose when they are aligned? i.e. where can alignment make models WORSE? I’ve been looking into this with Christopher Potts and here's one piece of the answer: randomness and creativity

Yong Zheng-Xin (Yong) (@yong_zhengxin)'s Twitter Profile Photo

📣 New paper! We observe that reasoning language models finetuned only on English data are capable of zero-shot cross-lingual reasoning through a "quote-and-think" pattern. However, this does not mean they reason the same way across all languages or in new domains. [1/N]

David Ifeoluwa Adelani 🇳🇬 (@davlanade)'s Twitter Profile Photo

Thank you Saarland Informatics Campus for the Eduard Martin Prize 2024! Every year, the Eduard Martin Prize is awarded by Saarland University and the Saarland University Society to doctoral students for outstanding achievements. linkedin.com/feed/update/ur… unigesellschaft-saarland.de/eduard-martin-…

EleutherAI (@aieleuther)'s Twitter Profile Photo

Can you train a performant language model without using unlicensed text? We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public-domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1&2

Constantin Venhoff (@cvenhoff00)'s Twitter Profile Photo

🔍 New paper: How do vision-language models actually align visual and language representations? We used sparse autoencoders to peek inside VLMs and found something surprising about when and where cross-modal alignment happens! Presented at XAI4CV Workshop @ CVPR 🧵 (1/6)

Naomi Saphra hiring a lab 🧈🪰 (@nsaphra)'s Twitter Profile Photo

🚨 New preprint! 🚨 Phase transitions! We love to see them during LM training. Syntactic attention structure, induction heads, grokking; they seem to suggest the model has learned a discrete, interpretable concept. Unfortunately, they’re pretty rare—or are they?
