Michelle Yuan (@michyuan)'s Twitter Profile
Michelle Yuan

@michyuan

Principal Applied Scientist @oracle doing NLP/ML/AI research.

ID: 998069783789817856

Link: https://forest-snow.github.io/ Ā· Joined: 20-05-2018 05:15:54

99 Tweets

489 Followers

209 Following

Jungo Kasai 笠井淳吾 (@jungokasai)'s Twitter Profile Photo

#NLProc: Stop annotating randomly-sampled data! Select-then-annotate to make language models better few-shot in-context learners (ICL)! 12% gain with GPT-3 etc over 10 tasks. 10x annotation efficiency. Different from the finetuning (FT) paradigm. 
arxiv.org/abs/2209.01975
1/5
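
To make the idea concrete, here is a minimal sketch of diversity-based example selection, assuming a greedy farthest-point heuristic over sentence embeddings; the paper's actual vote-k method is graph-based, so treat this as an illustrative stand-in, not the authors' algorithm.

```python
# Greedy farthest-point selection: pick pool items that are mutually
# dissimilar, so the annotation budget covers diverse inputs.
import numpy as np

def select_to_annotate(embeddings: np.ndarray, budget: int) -> list[int]:
    """Pick `budget` mutually dissimilar pool items to send to annotators."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    selected = [0]  # seed with an arbitrary first example
    for _ in range(budget - 1):
        sims = emb @ emb[selected].T             # cosine sim to chosen set
        max_sim = sims.max(axis=1)               # closeness to nearest pick
        max_sim[selected] = np.inf               # never re-pick
        selected.append(int(max_sim.argmin()))   # farthest from chosen set
    return selected

# Toy usage: 1,000 unlabeled items, an annotation budget of 18 (the paper's
# smallest budget); real embeddings would come from e.g. Sentence-BERT.
pool = np.random.randn(1000, 384)
print(select_to_annotate(pool, budget=18))
```
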
Michelle Yuan (@michyuan)'s Twitter Profile Photo

My PhD thesis, "Transfer Learning in NLP through Interactive Feedback," is now online: drum.lib.umd.edu/handle/1903/29… Grateful for the support from Jordan, my committee, and @ClipUMD. My PhD has truly been an unforgettable journey! Now I am in NY at AWS as an Applied Scientist šŸ—½

Shayne Longpre (@shayneredford)'s Twitter Profile Photo

šŸ“¢ A 🧵 on the Trends in NLP Datasets.

What’s changed since SQuAD was all the rage in 2016? A: A LOT. šŸ”­

1. Generic āž”ļø Niche Tasks
2. Task-specific Training+Eval āž”ļø Eval Only
3. Dataset āž”ļø Benchmark āž”ļø Massive Collections
4. Datasets āž”ļø Diagnostics

1/
Michelle Yuan (@michyuan)'s Twitter Profile Photo

Hokkien is the second most popular dialect in Taiwan, one of the most linguistically diverse places in the world. Some say that Taiwan is the ā€œUrheimatā€ of the Austronesian language family. Would love to see AI developed to help preserve these beautiful spoken languages.

UMD CLIP Lab (@clipumd)'s Twitter Profile Photo

Our researchers won a Best Paper Award at AACL for their work to make visual question answering (VQA) systems more effective for blind users.

The paper was coauthored by Yang (Trista) Cao, Kyle Seelman, Kyungjun Lee and Hal DaumƩ III.

Learn more: go.umd.edu/cDe
Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

Toolformer: Language Models Can Teach Themselves to Use Tools 

Presents Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. 

abs: arxiv.org/abs/2302.04761
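
A rough sketch of what Toolformer-style inference implies at decoding time, assuming the paper's bracketed call format (e.g. "[Calculator(400 / 1400)]"); the tool registry and regex here are illustrative stand-ins, not the paper's implementation.

```python
import re

# Calculator is one of the paper's tools; eval() is for illustration only.
TOOLS = {"Calculator": lambda expr: str(round(eval(expr), 2))}

CALL_RE = re.compile(r"\[(\w+)\(([^)]*)\)\]")

def execute_tool_calls(text: str) -> str:
    """Replace each [Tool(args)] span with [Tool(args) -> result]."""
    def run(match: re.Match) -> str:
        name, args = match.group(1), match.group(2)
        return f"[{name}({args}) -> {TOOLS[name](args)}]"
    return CALL_RE.sub(run, text)

print(execute_tool_calls("Out of 1400 participants, 400 [Calculator(400 / 1400)] passed."))
# -> Out of 1400 participants, 400 [Calculator(400 / 1400) -> 0.29] passed.
```
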
Samson Tan (@samsontmr)'s Twitter Profile Photo

🚨BREAKING NEWS🚨 Looking to work on LLM research this summer? Our team in Amazon Web Services AI Research and Education has reopened intern hiring and we're looking for talented PhD interns! DM me your resume if you work on robustness or any of the topics below šŸ‘‡šŸ»

Adam Selipsky (@aselipsky)'s Twitter Profile Photo

We’re announcing Amazon Bedrock: giving customers the easiest way to build and scale generative AI applications, with access to leading foundation models from Anthropic, AI21 Labs, and Stability AI, plus Amazon's own Titan FMs. Choice rules! aws.amazon.com/blogs/machine-…
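
For flavor, a minimal sketch of what invoking a Bedrock-hosted model looks like with today's boto3 bedrock-runtime client; the model id and request/response schema are provider-specific, so treat these values as placeholders.

```python
import json
import boto3

# Placeholder model id: any foundation model enabled in your account works.
client = boto3.client("bedrock-runtime", region_name="us-east-1")
response = client.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({
        "prompt": "\n\nHuman: What is Amazon Bedrock?\n\nAssistant:",
        "max_tokens_to_sample": 200,  # Anthropic text-completion schema
    }),
)
print(json.loads(response["body"].read())["completion"])
```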

Patrick Fernandes (@psanfernandes)'s Twitter Profile Photo

*Human feedback* was the necessary secret sauce in making #chatgpt so human-like.
But what exactly is feedback? And how can we leverage it to improve our models?

Check out our new survey on the use of (human) feedback in Natural Language Generation!

arxiv.org/abs/2305.00955

1/16
Michelle Yuan (@michyuan)'s Twitter Profile Photo

Check out this large, multilingual dataset with passages from millions of Wikipedia articles and material from their cited sources!

Julen Etxaniz (@juletxara)'s Twitter Profile Photo

Do multilingual language models think better in English? šŸ¤”

Yes, they do! We show that using an LLM to translate its input into English and performing the task over the translated input works better than using the original non-English input! 😯

arxiv.org/abs/2308.01223
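
The recipe is simple enough to sketch. Below, `generate` is a hypothetical stand-in for any LLM call; the paper's actual prompts are few-shot and model-specific.

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for a single LLM call."""
    raise NotImplementedError("plug in your model here")

def self_translate_answer(non_english_input: str, task_instruction: str) -> str:
    # Step 1: the model translates its own input into English.
    english = generate(
        f"Translate the following text to English:\n{non_english_input}\nTranslation:"
    )
    # Step 2: the same model performs the task over the translation.
    return generate(f"{task_instruction}\n{english}\nAnswer:")
```
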
Wes Gurnee (@wesg52)'s Twitter Profile Photo

Do language models have an internal world model? A sense of time? At multiple spatiotemporal scales? In a new paper with Max Tegmark we provide evidence that they do by finding a literal map of the world inside the activations of Llama-2!
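
Their evidence comes from linear probes. A minimal sketch of that setup, with random activations standing in for real Llama-2 hidden states:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
acts = rng.normal(size=(5000, 4096))  # stand-in hidden states, one per place
coords = rng.uniform([-90, -180], [90, 180], size=(5000, 2))  # (lat, lon)

X_tr, X_te, y_tr, y_te = train_test_split(acts, coords, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
# High held-out R^2 on real activations is the paper's "literal map" evidence;
# on random features like these it should be ~0.
print("held-out R^2:", probe.score(X_te, y_te))
```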

Sasha Rush (@srush_nlp)'s Twitter Profile Photo

Vector databases have raised billions offering hosted management of text embeddings. What are you revealing in these vectors? šŸ”’ We demonstrate a practical method for fully recovering the text of 90% of sentence-length embeddings (arxiv.org/abs/2310.06816).
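
The method is iterative correction (vec2text-style). A minimal sketch, where `embed` and `correct` are hypothetical stand-ins for the paper's encoder and trained corrector model:

```python
import numpy as np

def embed(text: str) -> np.ndarray: ...  # the encoder under attack
def correct(hyp: str, target: np.ndarray, hyp_emb: np.ndarray) -> str: ...  # trained corrector

def invert_embedding(target: np.ndarray, steps: int = 50) -> str:
    hypothesis = ""  # start from an empty guess
    for _ in range(steps):
        hyp_emb = embed(hypothesis)
        if np.allclose(hyp_emb, target, atol=1e-6):
            break  # round-trip matches: text recovered
        # Refine the guess using the gap between target and hypothesis embeddings.
        hypothesis = correct(hypothesis, target, hyp_emb)
    return hypothesis
```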

Adam Tauman Kalai (@adamfungi)'s Twitter Profile Photo

This new paper with Santosh Vempala gives a simple statistical justification for why and when Language Models *should* hallucinate using standard pretraining, even under ideal in-distribution training conditions. [1/7] arxiv.org/abs/2311.14648
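
The core quantity is easy to compute: roughly, a calibrated model's hallucination rate on arbitrary facts is lower-bounded by the Good-Turing estimate of unseen mass, i.e. the fraction of training facts observed exactly once. A toy sketch (the corpus is a stand-in):

```python
from collections import Counter

def monofact_rate(observed_facts: list[str]) -> float:
    """Fraction of fact occurrences whose fact appears exactly once."""
    counts = Counter(observed_facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observed_facts)

corpus = ["a born 1990", "b born 1985", "a born 1990", "c born 1972"]
print(monofact_rate(corpus))  # 0.5: half the mass sits on once-seen facts
```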

HyoJung Han (@h__j___han)'s Twitter Profile Photo

Do the best translations go beyond literal meaning?
Excited to share our work at #EMNLP2023 on Automatic Explicitation in translation with Marine Carpuat and Jordan Boyd-Graber!
Check out our poster session on Dec 8th Fri (today!) at East Foyer from 2pm to 3:30pm.

In our paper, ...
Deedy (@deedydas)'s Twitter Profile Photo

Stories have 6 primary arcs:

• ā€œRags to richesā€ (rise)
• ā€œTragedyā€ (fall)
• ā€œMan in a holeā€ (fall-rise)
• ā€œIcarusā€ (rise-fall)
• ā€œCinderellaā€ (rise-fall-rise)
• ā€œOedipusā€ (fall-rise-fall)

Programmatic analysis validates Kurt Vonnegut's legendary rejected master's thesis.
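
The programmatic analysis amounts to tracing sentiment over narrative time. A minimal sketch in that spirit, with a toy lexicon standing in for a real sentiment dictionary (e.g. labMT):

```python
import numpy as np

# Toy lexicon: an illustrative stand-in for a real sentiment dictionary.
LEXICON = {"happy": 1.0, "rich": 0.8, "love": 0.9,
           "poor": -0.8, "death": -1.0, "lost": -0.6}

def story_arc(words: list[str], window: int = 50) -> np.ndarray:
    """Mean lexicon sentiment over a sliding window across the story."""
    scores = np.array([LEXICON.get(w.lower(), 0.0) for w in words])
    kernel = np.ones(window) / window
    return np.convolve(scores, kernel, mode="valid")

# A curve that rises then falls suggests "Icarus"; fall-then-rise suggests
# "Man in a hole" -- the shapes the six arcs are clustered from.
```
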
CLS (@chengleisi)'s Twitter Profile Photo

Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas?

After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers.