Silviu Pitis (@silviupitis) 's Twitter Profile
Silviu Pitis

@silviupitis

ML PhD student at @UofT/@VectorInst working on normative AI alignment.

ID: 718091060786946049

Website: https://silviupitis.com · Joined: 07-04-2016 15:00:25

93 Tweets

2.2K Followers

731 Following

Keiran Paster (@keirp1) 's Twitter Profile Photo

Introducing OpenWebMath, a massive dataset containing every math document found on the internet - with equations in LaTeX format!

🤗 Download on @HuggingFace: huggingface.co/datasets/open-…
📝 Read the paper: arxiv.org/abs/2310.06786

w/ Marco Dos Santos, Zhangir Azerbayev, Jimmy Ba!
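
For readers who want to poke at the data, here is a minimal loading sketch using the Hugging Face `datasets` library. The dataset ID `open-web-math/open-web-math` and the `"text"` field are assumptions inferred from the truncated link above; check the dataset page for the exact name and schema.

```python
# Minimal sketch: stream a few OpenWebMath documents from the Hugging Face Hub.
# Assumes the dataset ID "open-web-math/open-web-math" and a "text" field;
# verify both on the dataset page before relying on them.
from datasets import load_dataset

ds = load_dataset("open-web-math/open-web-math", split="train", streaming=True)

for i, doc in enumerate(ds):
    print(doc["text"][:200])  # first 200 characters, LaTeX equations included
    if i >= 2:
        break
```

Streaming avoids downloading the full corpus just to inspect a handful of documents.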
Yongchao Zhou (@yongchao_zhou_) 's Twitter Profile Photo

🎉 Excited to introduce DistillSpec! Accelerate your LLM inference using Speculative Decoding with a more aligned draft model, consistently delivering a remarkable 10-45% performance uplift over standard Speculative Decoding.
✨ Combining Distilled Target with Distilled Draft,
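
For context on what the draft model does, below is a rough sketch of the standard speculative decoding accept/reject step that DistillSpec builds on. It uses toy categorical distributions rather than real model calls, and it illustrates the generic procedure only, not DistillSpec's distillation objective (DistillSpec's gain comes from distilling the draft model so its proposals are accepted more often).

```python
# Rough sketch of one round of standard speculative decoding with toy distributions.
# draft_probs / target_probs are hypothetical stand-ins for per-position model outputs.
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(proposed_tokens, draft_probs, target_probs):
    """Accept each proposed draft token with prob min(1, p_target / p_draft);
    on the first rejection, resample from the normalized residual max(p - q, 0)."""
    accepted = []
    for t, q, p in zip(proposed_tokens, draft_probs, target_probs):
        if rng.random() < min(1.0, p[t] / q[t]):
            accepted.append(int(t))
        else:
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(p), p=residual)))
            break
    return accepted

# Example with a 3-token vocabulary and two proposed draft tokens:
q1, p1 = np.array([0.6, 0.3, 0.1]), np.array([0.5, 0.4, 0.1])
q2, p2 = np.array([0.2, 0.5, 0.3]), np.array([0.3, 0.3, 0.4])
print(speculative_step([0, 1], [q1, q2], [p1, p2]))
```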
Ziang Xiao (@ziangxiao) 's Twitter Profile Photo

What constitutes a good #eval metric? We bridge #MeasurementTheory in #EducationalTesting and #Psychometrics to #NLProc. We propose #MetricEval, a theory-driven framework to conceptualize and operationalize measurement error sources to evaluate #NLG metrics. #EMNLP2023 🧵
Rachel Freedman (@freedmanrach) 's Twitter Profile Photo

RLHF typically assumes that all training feedback comes from a single teacher, but teachers can disagree up to 37% of the time in practice. In our new paper, we introduce active teacher selection to learn from different teachers. (1/n)

Jiahai Feng (@feng_jiahai) 's Twitter Profile Photo

When given context about a “green square” and a “blue circle”, how do language models bind corresponding shapes and colors?

Using causal experiments, we find that large enough language models learn simple structured representations for binding!

A thread (1/n)
Alan Chan (@_achan96_) 's Twitter Profile Photo

OpenAI just announced GPTs and the Assistants API for “helping developers build agent-like experiences”, but what does that mean and how does it change how we should govern AI? Some early thoughts relating to my ongoing work 🧵:

Yangjun Ruan (@yangjunr) 's Twitter Profile Photo

#OpenAI’s GPTs & Assistants APIs are a blast, making it much easier to build customized agents with new tools. But are they safe to deploy? 🚨

A simple & quick test against prompt injections reveals that it is fairly easy to make GPTs delete all your files 💀
Silviu Pitis (@silviupitis) 's Twitter Profile Photo

I will be at #NeurIPS2023 Dec 11-16. Shoot me an email to connect!

Particularly interested in:
- LM eval for long-horizon / agents
- Alignment / rewards generally

Will present my paper on multi-objective reward aggregation at Poster Session 6, Thursday evening (neurips.cc/virtual/2023/p…)
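
As a loose illustration of what “multi-objective reward aggregation” refers to (a generic example of the problem setting, not the method in the paper), the toy sketch below contrasts two common ways to collapse a vector of per-objective rewards into a single scalar: a weighted sum and a worst-case aggregator. The reward values and weights are made up for illustration.

```python
# Toy illustration of aggregating per-objective rewards into one scalar.
# This shows the general problem, not the paper's proposed aggregation scheme.
import numpy as np

rewards = np.array([0.9, 0.2, 0.7])   # e.g. helpfulness, harmlessness, honesty scores
weights = np.array([0.5, 0.3, 0.2])   # hypothetical stakeholder weights

weighted_sum = float(weights @ rewards)  # linear aggregation: trades objectives off freely
worst_case = float(rewards.min())        # egalitarian aggregation: focuses on the worst objective

print(f"weighted sum: {weighted_sum:.2f}, worst case: {worst_case:.2f}")
```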

Lucas Caccia (@lucaspcaccia) 's Twitter Profile Photo

Our team at MSR Montréal is looking for interns! Subjects range from efficient modular adaptation to building complex systems by stacking LLMs. Consider applying here: aka.ms/AAo5t0x

Yangjun Ruan (@yangjunr) 's Twitter Profile Photo

ToolEmu has been accepted at #ICLR2024 as a Spotlight presentation🔥 Explore our LLM-based emulation framework for identifying LLM agent risks at scale!

🎯 Demo: demo.toolemu.com
📄 Paper: arxiv.org/abs/2309.15817
🔗 Code: github.com/ryoungj/ToolEmu

🧵⬇️

Roger Grosse (@rogergrosse) 's Twitter Profile Photo

Here's what I see as a likely AGI trajectory over the next decade.

I claim that later parts of the path present the biggest alignment risks/challenges. The alignment world has been focusing a lot on the lower left corner lately, which I'm worried is somewhat of a Maginot line.
Yangjun Ruan (@yangjunr) 's Twitter Profile Photo

We are presenting ToolEmu at #ICLR2024 tomorrow!
⏲️ Friday 4:30pm-6:30pm CEST
📍 Spotlight poster session, Hall B #80

I won't be able to attend ICLR this year but don't miss the chance to meet our amazing collaborators!

Blair Yang (@blairyang12) 's Twitter Profile Photo

🔍 Current LLM evaluations fall short:
• Lack nuanced understanding of model capabilities
• Overly focused on quantitative metrics
• Difficult for humans to interpret

Introducing LLM Report Cards! A novel approach for qualitative, interpretable model evaluation.

1/N
Michael Zhang (@michaelrzhang) 's Twitter Profile Photo

📝 How do you choose which language model to use? Quantitative benchmarks can be uninformative and fall prey to Goodhart's Law, and even Chatbot Arena performance can be optimized for.

In our new preprint, we propose generating qualitative report cards... 🧵
Schwartz Reisman Institute (@torontosri) 's Twitter Profile Photo

“What objective function do we want AI to optimize for? If we aggregate values from society, what weights do we use, and whose values?”

Learn more about SRI Grad Affiliate Silviu Pitis's research, supported by an OpenAI Superalignment Fast Grant.

🔗 uoft.me/aWX