Lucas Torroba-Hennigen (@ltorroba1)'s Twitter Profile
Lucas Torroba-Hennigen

@ltorroba1

PhD student at MIT working in NLP.

ID: 1229354866298040320

Link: http://ltorroba.github.io · Joined: 17-02-2020 10:40:22

141 Tweets

450 Followers

571 Following

Songlin Yang (@songlinyang4)'s Twitter Profile Photo

Data-dependent decay and state-dimension expansion are key for Mamba/GLA to match Transformers! 🚀 Also excited to present my NeurIPS spotlight paper [arxiv.org/abs/2311.04823] this Wednesday, which also shows the crucial role of data-dependent decay. Come and chat about RNNs!

Pratyusha Sharma (@pratyusha_ps)'s Twitter Profile Photo

What if I told you that you can simultaneously enhance an LLM's task performance and reduce its size with no additional training? We find selective low-rank reduction of matrices in a transformer can improve its performance on language understanding tasks, at times by 30% pts!🧵

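The tweet above describes selective low-rank reduction of a transformer's weight matrices. As a rough illustration only (not the authors' exact method or their rule for selecting which matrices to reduce), a truncated SVD gives the best rank-k approximation of a single weight matrix:

```python
import numpy as np

def low_rank_approx(W: np.ndarray, rank: int) -> np.ndarray:
    """Replace W with its best rank-`rank` approximation (truncated SVD)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the top `rank` singular directions.
    return (U[:, :rank] * S[:rank]) @ Vt[:rank, :]

# Toy example: a 64x64 stand-in "weight matrix" reduced to rank 8.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_lr = low_rank_approx(W, rank=8)
```

The compression comes from storing the two thin factors instead of the full matrix; the surprising claim in the thread is that, for selected matrices, this can also improve task performance.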
Edoardo Ponti (@pontiedoardo)'s Twitter Profile Photo

I am still looking for PhD students starting in September 2024! The deadline to apply for the CDT in NLP is the 11th of March. If you wish to do research in modular and efficient LLMs, here are some highlights of my lab's research from the past year ⬇️🧵

Vaibhav Adlakha (@vaibhav_adlakha)'s Twitter Profile Photo

We introduce LLM2Vec, a simple approach to transform any decoder-only LLM into a text encoder. We achieve SOTA performance on MTEB in both the unsupervised and supervised categories (among models trained only on publicly available data). 🧵1/N Paper: arxiv.org/abs/2404.05961

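Turning a decoder-only LLM into a text encoder means collapsing per-token hidden states into a single text embedding. One common ingredient of such recipes is mask-aware mean pooling; this is a generic sketch, not the paper's full method (which involves more than pooling):

```python
import numpy as np

def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Mask-aware mean pooling of token states into one text embedding.

    hidden: (seq_len, d) last-layer hidden states.
    mask:   (seq_len,) with 1 for real tokens, 0 for padding.
    """
    m = mask[:, None].astype(hidden.dtype)
    return (hidden * m).sum(axis=0) / m.sum()

# Toy example: 5 token states of dimension 4, last 2 positions are padding.
states = np.ones((5, 4))
mask = np.array([1, 1, 1, 0, 0])
emb = mean_pool(states, mask)
```

Padding positions are excluded from both the sum and the divisor, so the embedding reflects only real tokens.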
Shannon Shen (@shannonzshen)'s Twitter Profile Photo

Lucas Torroba-Hennigen and I will present SymGen at the Conference on Language Modeling on Wednesday afternoon! Looking forward to seeing y’all there! I'd love to chat about LLM generation attribution & verification and human-agent collaboration for scientific discovery! plz dm/email me :)

Shannon Shen (@shannonzshen)'s Twitter Profile Photo

SymGen is featured on MIT News news.mit.edu/2024/making-it… -- please take a look at the great piece by Adam and the editor team!

Or Honovich (@ohonovich)'s Twitter Profile Photo

Scaling inference compute by repeated sampling boosts coverage (% problems solved), but could this be due to lucky guesses, rather than correct reasoning? We show that sometimes, guessing beats repeated sampling 🎲 Gal Yona Omer Levy roeeaharoni arxiv.org/abs/2410.15466

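The coverage metric in the tweet above (% of problems with at least one correct sample) grows predictably under an i.i.d. assumption: with per-sample success probability p, k samples solve a problem with probability 1 - (1 - p)^k. A tiny sketch of this baseline calculation (my simplification for intuition, not the paper's estimator):

```python
def expected_coverage(p: float, k: int) -> float:
    """Chance that at least one of k i.i.d. samples is correct,
    given per-sample success probability p."""
    return 1.0 - (1.0 - p) ** k

# Even a 5% per-sample success rate covers ~99.4% after 100 samples,
# which is why "lucky guesses" can masquerade as reasoning gains.
print(round(expected_coverage(0.05, 100), 3))
```

This is exactly why high coverage at large k, on its own, does not certify correct reasoning.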
Jyo Pari (@jyo_pari)'s Twitter Profile Photo

Over the past year I have been working on using multiple specialized models collectively to solve novel tasks. We investigated Mixture of Experts (MoE)-style routing for merging. However, we find that feature-based merging is likely not a scalable paradigm. Read on!

Alex Warstadt (@a_stadt)'s Twitter Profile Photo

I'm excited to announce my new lab: UCSD's Learning Meaning and Natural Language Lab. a.k.a. LeM🍋N Lab! And 📢WE ARE RECRUITING📢 PhD students to join us in sunny San Diego in either Linguistics OR Data Science. Apply by Dec 4: connect.grad.ucsd.edu/apply/ More about the lab👇

Siva Reddy (@sivareddyg)'s Twitter Profile Photo

I have multiple vacancies for PhD and Masters students at Mila (Institut québécois d'IA) / McGill NLP, in NLP/ML focusing on representation learning, reasoning, multimodality, and alignment. Deadline for applications is Dec 1st. More details: mila.quebec/en/prospective…

Jyo Pari (@jyo_pari)'s Twitter Profile Photo

Turn a single pre-trained model’s layers into MoE “experts” and reuse them? Finetuning a “router” slightly cuts loss: a cool proof of concept. Can we combine dynamic compute paths/reuse with coconut-like latent reasoning? jyopari.github.io/posts/reuse
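A toy sketch of the idea in the post above: treat a frozen model's layers as "experts" and train only a small router that mixes their outputs. This is a hypothetical NumPy illustration (the names, shapes, and mixing scheme are mine, not the post's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers = 16, 4

# Stand-ins for a frozen pre-trained model's layer weights, reused as "experts".
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]
# The router is the only "trainable" part in this setup.
router = rng.standard_normal((d, n_layers)) * 0.01

def route(x: np.ndarray) -> np.ndarray:
    """Softmax-weight the reused layers' outputs by the router's scores."""
    logits = x @ router
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return sum(wi * (Wi @ x) for wi, Wi in zip(w, experts))

y = route(rng.standard_normal(d))
```

In a real system the experts would be full transformer blocks and the router would be finetuned with the rest of the model frozen; the point is only how small the trainable surface is.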

MIT NLP (@nlp_mit)'s Twitter Profile Photo

Hello everyone! We are quite a bit late to the Twitter party, but welcome to the MIT NLP Group account! Follow along for the latest research from our labs as we dive deep into language, learning, and logic 🤖📚🧠

Mehul Damani @ ICLR (@mehuldamani2)'s Twitter Profile Photo

🚨New Paper!🚨 We trained reasoning LLMs to reason about what they don't know. o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more. Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --

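The tweet does not spell out the RLCR reward, but one plausible shape for a reward that values both correctness and calibrated confidence is correctness minus a Brier-style penalty on the model's stated confidence. Purely illustrative, not necessarily the paper's exact formulation:

```python
def calibration_aware_reward(correct: bool, confidence: float) -> float:
    """Hypothetical reward: task correctness minus a Brier-style penalty
    on the model's self-reported confidence in [0, 1]."""
    y = 1.0 if correct else 0.0
    brier = (confidence - y) ** 2
    return y - brier

# Confidently right beats confidently wrong; a wrong answer with honest
# low confidence is penalized far less than an overconfident one.
```

Under a reward like this, the overconfident-but-wrong behavior the tweet describes is directly penalized, rather than only the final-answer accuracy being rewarded.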