Lucas Torroba-Hennigen (@ltorroba1)'s Twitter Profile
Lucas Torroba-Hennigen

@ltorroba1

PhD student at MIT working in NLP.

ID: 1229354866298040320

Link: http://ltorroba.github.io · Joined: 17-02-2020 10:40:22

141 Tweets

450 Followers

571 Following

Songlin Yang (@songlinyang4)'s Twitter Profile Photo

Data-dependent decay and state-dimension expansion are key for Mamba/GLA to match Transformers! 🚀 Also excited to present my NeurIPS spotlight paper [arxiv.org/abs/2311.04823] this Wednesday, which also shows the crucial role of data-dependent decay. Come and chat about RNNs!

Pratyusha Sharma (@pratyusha_ps)'s Twitter Profile Photo

What if I told you that you can simultaneously enhance an LLM's task performance and reduce its size with no additional training? We find selective low-rank reduction of matrices in a transformer can improve its performance on language understanding tasks, at times by 30% pts!🧵

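The tweet above describes selective low-rank reduction of a transformer's weight matrices. As a rough illustration only (not the authors' exact method or their rule for selecting which matrices to reduce), a truncated SVD gives the best rank-k approximation of a single weight matrix:

```python
import numpy as np

def low_rank_approx(W: np.ndarray, rank: int) -> np.ndarray:
    """Replace W with its best rank-`rank` approximation (truncated SVD)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the top `rank` singular directions.
    return (U[:, :rank] * S[:rank]) @ Vt[:rank, :]

# Toy example: a 64x64 stand-in "weight matrix" reduced to rank 8.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_lr = low_rank_approx(W, rank=8)
```

The compression comes from storing the two thin factors instead of the full matrix; the surprising claim in the thread is that, for selected matrices, this can also improve task performance.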
Edoardo Ponti (@pontiedoardo)'s Twitter Profile Photo

I am still looking for PhD students starting in September 2024! The deadline to apply for the CDT in NLP is the 11th of March. If you wish to do research in modular and efficient LLMs, here are some highlights of my lab's research from the past year ⬇️🧵

Vaibhav Adlakha (@vaibhav_adlakha)'s Twitter Profile Photo

We introduce LLM2Vec, a simple approach to transform any decoder-only LLM into a text encoder. We achieve SOTA performance on MTEB in both the unsupervised and supervised categories (among models trained only on publicly available data). 🧵1/N Paper: arxiv.org/abs/2404.05961

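Turning a decoder-only LLM into a text encoder means collapsing per-token hidden states into a single text embedding. One common ingredient of such recipes is mask-aware mean pooling; this is a generic sketch, not the paper's full method (which involves more than pooling):

```python
import numpy as np

def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Mask-aware mean pooling of token states into one text embedding.

    hidden: (seq_len, d) last-layer hidden states.
    mask:   (seq_len,) with 1 for real tokens, 0 for padding.
    """
    m = mask[:, None].astype(hidden.dtype)
    return (hidden * m).sum(axis=0) / m.sum()

# Toy example: 5 token states of dimension 4, last 2 positions are padding.
states = np.ones((5, 4))
mask = np.array([1, 1, 1, 0, 0])
emb = mean_pool(states, mask)
```

Padding positions are excluded from both the sum and the divisor, so the embedding reflects only real tokens.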
Shannon Shen (@shannonzshen)'s Twitter Profile Photo

Lucas Torroba-Hennigen and I will present SymGen at the Conference on Language Modeling on Wednesday afternoon! Looking forward to seeing y’all there! I'd love to chat about LLM generation attribution & verification and human-agent collaboration for scientific discovery! plz dm/email me :)

Shannon Shen (@shannonzshen)'s Twitter Profile Photo

SymGen is featured on MIT News news.mit.edu/2024/making-it… -- please take a look at the great piece by Adam and the editor team!

Or Honovich (@ohonovich)'s Twitter Profile Photo

Scaling inference compute by repeated sampling boosts coverage (% problems solved), but could this be due to lucky guesses, rather than correct reasoning? We show that sometimes, guessing beats repeated sampling 🎲 Gal Yona Omer Levy roeeaharoni arxiv.org/abs/2410.15466

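The coverage metric in the tweet above (% of problems with at least one correct sample) grows predictably under an i.i.d. assumption: with per-sample success probability p, k samples solve a problem with probability 1 - (1 - p)^k. A tiny sketch of this baseline calculation (my simplification for intuition, not the paper's estimator):

```python
def expected_coverage(p: float, k: int) -> float:
    """Chance that at least one of k i.i.d. samples is correct,
    given per-sample success probability p."""
    return 1.0 - (1.0 - p) ** k

# Even a 5% per-sample success rate covers ~99.4% after 100 samples,
# which is why "lucky guesses" can masquerade as reasoning gains.
print(round(expected_coverage(0.05, 100), 3))
```

This is exactly why high coverage at large k, on its own, does not certify correct reasoning.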
Jyo Pari (@jyo_pari)'s Twitter Profile Photo

Over the past year I have been working on using multiple specialized models collectively to solve novel tasks. We investigated Mixture of Experts (MoE)-style routing for merging. However, we find that feature-based merging is likely not a scalable paradigm. Read on!

Alex Warstadt (@a_stadt)'s Twitter Profile Photo

I'm excited to announce my new lab: UCSD's Learning Meaning and Natural Language Lab. a.k.a. LeM🍋N Lab! And 📢WE ARE RECRUITING📢 PhD students to join us in sunny San Diego in either Linguistics OR Data Science. Apply by Dec 4: connect.grad.ucsd.edu/apply/ More about the lab👇

Siva Reddy (@sivareddyg)'s Twitter Profile Photo

I have multiple vacancies for PhD and Masters students at Mila (Institut québécois d'IA) / McGill NLP, in NLP/ML focusing on representation learning, reasoning, multimodality, and alignment. Deadline for applications is Dec 1st. More details: mila.quebec/en/prospective…

Jyo Pari (@jyo_pari)'s Twitter Profile Photo

Turn a single pre-trained model’s layers into MoE “experts” and reuse them? Finetuning a “router” slightly cuts loss: a cool proof of concept. Can we combine dynamic compute paths/reuse with coconut-like latent reasoning? jyopari.github.io/posts/reuse
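A toy sketch of the idea in the post above: treat a frozen model's layers as "experts" and train only a small router that mixes their outputs. This is a hypothetical NumPy illustration (the names, shapes, and mixing scheme are mine, not the post's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers = 16, 4

# Stand-ins for a frozen pre-trained model's layer weights, reused as "experts".
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]
# The router is the only "trainable" part in this setup.
router = rng.standard_normal((d, n_layers)) * 0.01

def route(x: np.ndarray) -> np.ndarray:
    """Softmax-weight the reused layers' outputs by the router's scores."""
    logits = x @ router
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return sum(wi * (Wi @ x) for wi, Wi in zip(w, experts))

y = route(rng.standard_normal(d))
```

In a real system the experts would be full transformer blocks and the router would be finetuned with the rest of the model frozen; the point is only how small the trainable surface is.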

MIT NLP (@nlp_mit)'s Twitter Profile Photo

Hello everyone! We are quite a bit late to the Twitter party, but welcome to the MIT NLP Group account! Follow along for the latest research from our labs as we dive deep into language, learning, and logic 🤖📚🧠

Mehul Damani @ ICLR (@mehuldamani2)'s Twitter Profile Photo

🚨New Paper!🚨 We trained reasoning LLMs to reason about what they don't know. o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more. Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --

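The tweet does not spell out the RLCR reward, but one plausible shape for a reward that values both correctness and calibrated confidence is correctness minus a Brier-style penalty on the model's stated confidence. Purely illustrative, not necessarily the paper's exact formulation:

```python
def calibration_aware_reward(correct: bool, confidence: float) -> float:
    """Hypothetical reward: task correctness minus a Brier-style penalty
    on the model's self-reported confidence in [0, 1]."""
    y = 1.0 if correct else 0.0
    brier = (confidence - y) ** 2
    return y - brier

# Confidently right beats confidently wrong; a wrong answer with honest
# low confidence is penalized far less than an overconfident one.
```

Under a reward like this, the overconfident-but-wrong behavior the tweet describes is directly penalized, rather than only the final-answer accuracy being rewarded.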