David Wadden (@davidjwadden) 's Twitter Profile
David Wadden

@davidjwadden

Graduate student at @uwcse studying NLP.

ID: 1256342994237583360

Link: https://dwadden.github.io
Joined: 01-05-2020 22:01:45

53 Tweets

389 Followers

101 Following

Ai2 (@allen_ai) 's Twitter Profile Photo

Today we're thrilled to announce our new undertaking to collaboratively build the best open language model in the world: AI2 OLMo. Uniquely open, 70B parameters, coming early 2024 – join us! blog.allenai.org/announcing-ai2…

Yizhong Wang (@yizhongwyz) 's Twitter Profile Photo

🦙🐪🐫 So many instruction tuning datasets came out recently! How valuable are they, and how far are open models really from proprietary ones like ChatGPT?

🧐We did a systematic exploration, and built Tülu---a suite of LLaMa-tuned models up to 65B!

📜arxiv.org/abs/2306.04751
Yanai Elazar (@yanaiela) 's Twitter Profile Photo

Does arXiving have a causal effect on acceptance?

The answer is nuanced, and depends on what assumptions you are willing to make, but arguably more importantly, we observe no difference in acceptance for different groups.

arxiv.org/abs/2306.13891
Ashish Sharma (@sharma_ashish_2) 's Twitter Profile Photo

Absolutely thrilled🎉 to receive the ACL 2023 #ACL2023NLP 🏆Outstanding Paper Award🏆 for our work on cognitive reframing of negative thoughts! A huge shoutout to the diverse team behind this work: Allen School, UW NLP, Mental Health America, and Stanford Health Care 💖

Daniel Weld (@dsweld) 's Twitter Profile Photo

Interested in a better way to explore #VLDB2023 papers? Try exp-sum.apps.allenai.org for an LLM-powered way to probe those papers…
* Ask questions w/ a single click
* Explore answer provenance
* Dive deep w/ recursive questions
Powered by Semantic Scholar

Orion Weller @ ICLR 2025 (@orionweller) 's Twitter Profile Photo

Using LLMs for query or document expansion in retrieval (e.g. HyDE and Doc2Query) has scores going 📈

But do these approaches work for all IR models and for different types of distribution shifts? Turns out it's actually more 📉 🚨

📝 (arxiv soon): orionweller.github.io/assets/pdf/LLM…
Hamish Ivison (@hamishivi) 's Twitter Profile Photo

Check out the Tulu 2 suite 🐪, a set of Llama-2 models finetuned+DPO-trained on a mixture of publicly available datasets! Our best-performing models are competitive with SoTA open models on a range of benchmarks incl. AlpacaEval and MT-Bench.
📜Paper: arxiv.org/abs/2311.10702
Semantic Scholar (@semanticscholar) 's Twitter Profile Photo

New feature alert 🚨 On each paper page, scroll down to find AI-generated Topic pages related to the paper, which include topic definitions, papers most cited for the topic, and more! Now available for Computer Science fields. Here’s an example: semanticscholar.org/paper/SPECTER%…

Yanai Elazar (@yanaiela) 's Twitter Profile Photo

This is fantastic news!! Somewhat of a coincidence, but our paper studying the effect of early arXiving on acceptance, which suggested the effect is small and does not serve its purpose, was accepted to CLeaR (Causal Learning and Reasoning) 2024 x.com/yanaiela/statu…

Ai2 (@allen_ai) 's Twitter Profile Photo

OLMo is here! And it’s 100% open. It’s a state-of-the-art LLM and we are releasing it with all pre-training data and code. Let’s get to work on understanding the science behind LLMs. Learn more about the framework and how to access it here: blog.allenai.org/olmo-open-lang…

Semantic Scholar Research @ AI2 (@ai2_s2research) 's Twitter Profile Photo

📣 Job opportunities at Semantic Scholar Research @ the Allen Institute for AI (AI2) for post-doctoral & pre-doctoral researchers starting in 2024! 📣

Our team works on NLP and HCI research with a focus on open LLMs and LLM-powered research support tools and assistants.
Fangyuan Xu (@brunchavecmoi) 's Twitter Profile Photo

Instruction-following capabilities of LLMs are a prerequisite to AI ✒️ writing assistance. How good are current LLMs at this task?

We present 🥝 𝗞𝗜𝗪𝗜, a dataset with instructions for knowledge-intensive, document-grounded writing for long-form answers to research questions.
Nathan Lambert (@natolambert) 's Twitter Profile Photo

Excited to share something that we've needed since the early open RLHF days: RewardBench, the first benchmark for reward models.
1. We evaluated 30+ of the currently available RMs (w/ DPO too).
2. We created new datasets covering chat, safety, code, math, etc. We learned a lot.
Hanna Hajishirzi (@hannahajishirzi) 's Twitter Profile Photo

Introducing our best OLMo yet. OLMo 1.7-7B outperforms LLaMa2-7B, approaching LLaMa2-13B at MMLU and GSM8k. High-quality data and staged training are key. 

I am so proud of our team for making such a significant improvement in a short period after our first release.
Ai2 (@allen_ai) 's Twitter Profile Photo

Looking for a dataset to enhance language model instruction-following over scientific literature? Introducing SciRIFF, a dataset of 137K expert-written demonstrations spanning 5 essential task categories for literature understanding: information extraction, summarization,

Kejian Shi (@shi_kejian) 's Twitter Profile Photo

Introducing SciRIFF, a toolkit to enhance LLM instruction-following over scientific literature. 137k expert demonstrations in 5 categories: IE, summarization, QA, entailment, and classification; models up to 70b and code to science-tune your checkpoints included! Read more in 🧵:

Yuling Gu (@gu_yuling) 's Twitter Profile Photo

LLMs are evaluated on the same tasks in so many different ways! 🤯

✨ We introduce OLMES –  a standard for reproducible LLM evaluations that is open, practical, completely documented, and can be applied to current leaderboards & eval code bases! ✨

📜 arxiv.org/abs/2406.08446
1/
Kyle Lo (@kylelostat) 's Twitter Profile Photo

Luca Soldaini 🎀 and I are arriving at #ACL2024 🇹🇭 today!

come find us at our talks & poster sessions for our OLMo & Dolma projects with Ai2 & frens 🤩

also don't miss our poster on KIWI 🥝 for interactive science QA w/ our intern Fangyuan Xu & mentors Eunsol Choi David Wadden