Tanya Goyal (@tanyaagoyal) 's Twitter Profile
Tanya Goyal

@tanyaagoyal

NLP-ing @Cornell_CS (since Fall 2024). she/her

ID: 1171859782396981253

Joined: 11-09-2019 18:55:27

169 Tweets

1.1K Followers

370 Following

Alex Wettig (@_awettig) 's Twitter Profile Photo

How to train long-context LMs? (and beat Llama-3.1 🏆)

Many takeaways from our new paper!
- Focus on diverse & reliable evaluations (not just perplexity)
- Find good sources of long data and high-quality short data
- ...

A 🧵 on how we produced ProLong, a SoTA 8B 512K model
Lucy Zhao (@lucy_xyzhao) 's Twitter Profile Photo

1/ When does synthetic data help with long-context extension and why?

🤖 while more realistic data usually helps, symbolic data can be surprisingly effective

🔍effective synthetic data induces similar retrieval heads–but often only subsets of those learned on real data!
Jessy Li (@jessyjli) 's Twitter Profile Photo

Our department UT Linguistics Dept is hiring 2 new faculty in computational linguistics!
NLP at UT is an absolutely lovely family so join us 🥰

apply.interfolio.com/158280
Marzena Karpinska (@mar_kar_) 's Twitter Profile Photo

Will be presenting #nocha at #EMNLP2024 (Tue 16:00-17:30, Riverfront Hall). Also happy to share that we have updated the dataset and our analysis! 🧚‍♀️🔮
Wenting Zhao (@wzhao_nlp) 's Twitter Profile Photo

I’m at #EMNLP2024 this week presenting our work on reformulating unanswerable questions (Nov 12, 16-17:30). These days, I think about how to use formal tools and harder evals to get LMs closer to intelligence. I’m also on the faculty job market for 2024-2025! Please come say hi!
John Thickstun (@jwthickstun) 's Twitter Profile Photo

I am recruiting PhD students for Fall '25 at Cornell! I plan to admit multiple students interested in building more controllable generative models, music technologies, or both! 🎶 Please apply to Cornell Computer Science.

Niloofar (on faculty job market!) (@niloofar_mire) 's Twitter Profile Photo

I'm on the faculty market and at #NeurIPS!👩‍🏫
homes.cs.washington.edu/~niloofar/

I work on privacy, memorization, and emerging challenges in data use for AI.

Privacy isn't about PII removal but about controlling the flow of information contextually, & LLMs are still really bad at this!
Wenting Zhao (@wzhao_nlp) 's Twitter Profile Photo

Eval platforms like Chatbot Arena attract users to provide preference votes. But what are the incentives of these users? Are they apathetic, or are they adversarial and just aiming to inflate their model rankings? We show 10% adversarial votes change the model rankings by a lot!
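To see why a small adversarial bloc matters, here is a toy illustration (not the paper's experimental setup): two models ranked by raw pairwise win rate, where honest voters slightly prefer model "A" and 10% extra adversarial traffic always votes for model "B". Real leaderboards like Chatbot Arena use Bradley-Terry/Elo-style scoring rather than raw win rates, and the 52%/10% numbers below are assumptions chosen for illustration.

```python
def rank_by_winrate(votes):
    """Rank models by their share of wins over pairwise (winner, loser) votes."""
    wins, totals = {}, {}
    for winner, loser in votes:
        wins[winner] = wins.get(winner, 0) + 1
        totals[winner] = totals.get(winner, 0) + 1
        totals[loser] = totals.get(loser, 0) + 1
    return sorted(totals, key=lambda m: wins.get(m, 0) / totals[m], reverse=True)

# Honest voters: model "A" genuinely beats "B" in 52% of 1,000 head-to-head votes.
honest_votes = [("A", "B")] * 520 + [("B", "A")] * 480
# An adversarial group adds 10% extra traffic that always votes for "B".
adversarial_votes = [("B", "A")] * 100

print(rank_by_winrate(honest_votes))                      # ['A', 'B']
print(rank_by_winrate(honest_votes + adversarial_votes))  # ['B', 'A']
```

With only 100 adversarial votes on top of 1,000 honest ones, "B" overtakes "A" (580 vs. 520 wins), so the reported ranking no longer reflects the true preference.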
Tanya Goyal (@tanyaagoyal) 's Twitter Profile Photo

Getting high-quality human annotations is always tricky, even for targeted domains/tasks. Check out Wenting Zhao's work where we analyze how this manifests in open community data collection efforts with minimal quality checks by design.

Sasha Rush (@srush_nlp) 's Twitter Profile Photo

This year, I have an exceptional student on the academic market.

Wenting Zhao (@wzhao_nlp) builds systems that reason in natural settings. She combines AI & NLP to study newly emerging problems.

She recently released WildChat (wildchat.allen.ai) and Commit-0
Marzena Karpinska (@mar_kar_) 's Twitter Profile Photo

We've added #o1 and #Llama 3.3 70B to the #Nocha leaderboard for long-context narrative reasoning! Surprisingly, o1 performs worse than o1-preview, and Llama 3.3 70B matches proprietary models like gpt4o-mini & gemini-Flash. Check out our website for more results! More in 🧵
Alex Wettig (@_awettig) 's Twitter Profile Photo

🤔 Ever wondered how prevalent some type of web content is during LM pre-training?

In our new paper, we propose WebOrganizer which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐

Key takeaway: domains help us curate better pre-training data! 🧵/N
Fangyuan Xu (@brunchavecmoi) 's Twitter Profile Photo

Can we generate long text from compressed KV cache? We find existing KV cache compression methods (e.g., SnapKV) degrade rapidly in this setting. We present 𝐑𝐞𝐟𝐫𝐞𝐬𝐡𝐊𝐕, an inference method that ♻️ refreshes the smaller KV cache and better preserves performance.
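The tweet only names the method, so the following is a minimal, hypothetical sketch of what "periodically refreshing a small working cache from the retained full KV cache" can look like in a decoding loop. The dot-product selection rule, the `generate_with_refresh` interface, and the dummy decode step are all stand-ins for illustration, not RefreshKV's actual algorithm.

```python
import numpy as np

def select_top_k(keys, values, query, k):
    # Keep the k cache entries with the highest dot-product relevance to the query
    # (one possible scoring heuristic, used here purely for illustration).
    idx = np.argsort(keys @ query)[-k:]
    return keys[idx], values[idx]

def generate_with_refresh(decode_step, query, full_k, full_v,
                          n_steps=128, k=64, refresh_every=16):
    """Decode against a small working cache that is periodically rebuilt from the
    retained full cache, instead of compressing once and never looking back."""
    small_k, small_v = select_top_k(full_k, full_v, query, k)
    tokens = []
    for step in range(1, n_steps + 1):
        token, query, new_k, new_v = decode_step(query, small_k, small_v)
        # Newly generated entries are appended to both the full and working caches.
        full_k, full_v = np.vstack([full_k, new_k]), np.vstack([full_v, new_v])
        small_k, small_v = np.vstack([small_k, new_k]), np.vstack([small_v, new_v])
        tokens.append(token)
        if step % refresh_every == 0:
            # Refresh: re-select entries from the *full* cache for the current query,
            # so context dropped by earlier compression can be recovered later on.
            small_k, small_v = select_top_k(full_k, full_v, query, k)
    return tokens

# Dummy single-layer "decode step" so the sketch runs end to end.
rng = np.random.default_rng(0)
def dummy_decode_step(query, keys, values):
    attn = np.exp(keys @ query); attn /= attn.sum()
    out = attn @ values
    return int(out[0] > 0), out, out[None, :], rng.normal(size=(1, len(query)))

ctx_k, ctx_v = rng.normal(size=(1024, 8)), rng.normal(size=(1024, 8))
print(generate_with_refresh(dummy_decode_step, rng.normal(size=8), ctx_k, ctx_v)[:10])
```

The contrast with one-shot compression (e.g., SnapKV-style selection done once at prefill) is the `refresh_every` branch: entries dropped early can re-enter the working cache when the generation moves on to need them.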
Wenting Zhao (@wzhao_nlp) 's Twitter Profile Photo

Time to revisit our paper: Open community-driven evaluation platforms could be corrupted by a few sources of bad annotations, making their results not as trustworthy as we'd like.

arxiv.org/pdf/2412.04363
Kabir (@kabirahuja004) 's Twitter Profile Photo

📢 New Paper!

Tired 😴 of reasoning benchmarks full of math & code? In our work we consider the problem of reasoning for plot holes in stories -- inconsistencies in a storyline that break the internal logic or rules of a story’s world 🌎

W/ Melanie Sclar and tsvetshop

1/n
Philippe Laban (@philippelaban) 's Twitter Profile Photo

🆕paper: LLMs Get Lost in Multi-Turn Conversation

In real life, people don’t speak in perfect prompts.
So we simulate multi-turn conversations — less lab-like, more like real use.

We find that LLMs get lost in conversation.
👀What does that mean? 🧵1/N
📄arxiv.org/abs/2505.06120
Oliver Li (@oliver54244160) 's Twitter Profile Photo

🤯 GPT-4o knows H&M left Russia in 2022 but still recommends shopping at H&M in Moscow.

🤔 LLMs store conflicting facts from different times, leading to inconsistent responses. We dig into how to better update LLMs with fresh facts that contradict their prior knowledge.

🧵 1/6
Tanya Goyal (@tanyaagoyal) 's Twitter Profile Photo

Check out Oliver's paper on learning new knowledge and resolving knowledge conflicts in LLMs! Surprising finding: conditioning on self-generated contexts during training gives massive performance gains! We are excited to extend these ideas to other domains!

Anmol Mekala (@anmol_mekala) 's Twitter Profile Photo

📢 New Paper 📢
Struggling to fit in very long contexts on your LLM? Considering 4-bit quantization to 2x your context window?

Prior work says 4-bit is “good enough,” but on long-context tasks it can drop 16%, with up to 59% drops on specific models❗❗
Details in 🧵👇
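For context on where the "2x your context window" framing comes from, here is back-of-the-envelope memory arithmetic. The architecture numbers (8B parameters, 32 layers, 8 KV heads via grouped-query attention, head dim 128, fp16 KV cache) are illustrative assumptions for a Llama-3-8B-like model, not figures from the paper.

```python
# Back-of-the-envelope arithmetic behind "quantize weights to 4-bit to fit a longer
# context". All architecture numbers are illustrative assumptions, not paper figures.

GB = 1024**3
params     = 8e9
n_layers   = 32
n_kv_heads = 8      # grouped-query attention
head_dim   = 128
kv_bytes   = 2      # KV cache kept in fp16 in both scenarios

# KV cache cost per token: K and V, across all layers and KV heads.
kv_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes   # bytes

weights_fp16 = params * 2.0   # 16-bit weights
weights_int4 = params * 0.5   # 4-bit weights

freed = weights_fp16 - weights_int4
extra_tokens = freed / kv_per_token

print(f"KV cache per token : {kv_per_token / 1024:.0f} KiB")
print(f"Weights fp16 / int4: {weights_fp16 / GB:.1f} GB / {weights_int4 / GB:.1f} GB")
print(f"Memory freed       : {freed / GB:.1f} GB -> ~{extra_tokens:,.0f} extra context tokens")
```

The memory saving from 4-bit weights is real; the tweet's point is that on long-context tasks the accompanying accuracy drop can be far larger than short-context benchmarks suggest.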