Marzena Karpinska (@mar_kar_)'s Twitter Profile
Marzena Karpinska

@mar_kar_

nlp evaluation of long-form input/output, mt/multilingual nlp, creative text generation
🇵🇱 ➯ 🇯🇵 ➯ 🇺🇸
Former: Postdoc @UMASS_NLP

ID: 393138789

http://marzenakrp.github.io · 18-10-2011 02:36:27

395 Tweets

733 Followers

1.1K Following

Abhilasha Ravichander (@lasha_nlp)

✈️ I'm in Vienna for #ACL2025NLP! Would love to meet and chat about training data, factuality, transparency, doing a PhD in AI🤖, or anything else. Please say hi if you see me!☕️🍰 I am hiring PhD students + interns (shorturl.at/fZnOq), let's chat if you are looking!

Satya Nadella (@satyanadella)

Today we’re introducing Copilot Mode in Edge, our first step in reinventing the browser for the AI age. My favorite feature is multi-tab RAG. You can use Copilot to analyze your open tabs, like I do here with papers our team has published in nature journals over the last year.

Hita K (@_hitakam)

Are you a researcher in CS or a CS-adjacent field who could use help in refining your research ideas? Want to try our new AI-powered tool that helps with just that in a paid user study? Details and sign up here! forms.gle/UPFjyJ59uuZ5Zb…

Niloofar (on faculty job market!) (@niloofar_mire)

This is why there is years of research on usable privacy. This is not clear to lay users. We are building technology for “people” not ourselves.

Mohit Iyyer (@mohitiyyer)

GPT-5 lands first place on NoCha, our long-context book understanding benchmark. That said, this is a tiny improvement (~1%) over o1-preview, which was released almost one year ago. Have long-context models hit a wall? Accuracy of human readers is >97%... Long way to go!

Marzena Karpinska (@mar_kar_)

We've added #gpt5 to #NoCha (benchmark testing how well models can process long input) While it is doing well, it is ONLY 1% of improvement over a period of ONE YEAR. On the bright side, open-weights models, which started much lower (below random) are now getting closer.

鴨井 遼 (@ryokamoi_ja)

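If you are interested in graduate school in the US for NLP, please reach out! Those applying this year are of course welcome, and so are those planning to apply a few years from now. I have done it all (language study abroad, an exchange program, a master's, and a PhD), so feel free to ask about any of them. You can email me through my website → ryokamoi.github.io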

Mosh Levy (@mosh_levy)

Producing reasoning texts boosts the capabilities of AI models, but do we humans correctly understand these texts? Our latest research suggests that we do not. This highlights a new angle on the "Are they transparent?" debate: they might be, but we misinterpret them. 🧵

Najoung Kim 🫠 (@najoungkim)

Pulling this opportunity on research agent evaluation up one more time! The official title of the position will be "Senior research technician". Feel free to email either Sebastian Schuster or me directly if you have any questions. Link for more detailed info and where to apply in 🧵

Sebastian Schuster (@sebschu)

Work with me and Najoung Kim 🫠 on a cool project evaluating the latest agents. Joining us at BU would be preferable but if you make a strong case, we **may** also be able to hire you in Vienna. #nlproc

EMNLP 2025 (@emnlpmeeting)

#EMNLP2025 is offering Virtual Registration Subsidies for those who would otherwise be unable to attend. Note that these are only available for participants who are NOT registering any paper. To apply, read the details here, and fill out the linked form: 2025.emnlp.org/calls/virtual_…

Clémentine Fourrier 🍊 (@clefourrier)

Updated the evaluation guidebook with a new deep dive! 2025 panorama of all the important and next level evaluations that you need to know to build *actually impactful and useful* models! (Assistant tasks, games, forecasting, and more) Tell me wyt! :) github.com/huggingface/ev…

Dayeon (Zoey) Ki (@zoeykii)

1/ 🌍 Do #LLMs really treat all languages equally when citing evidence? 📑 In our new work, we uncover linguistic nepotism: models often trade off citation quality for language preference 👇

Chantal (@chantalshaib)

"AI slop" seems to be everywhere, but what exactly makes text feel like slop? In our new work (w/ Tuhin Chakrabarty, Diego Garcia-Olano, byron wallace) we provide a systematic attempt at measuring AI slop in text! arxiv.org/abs/2509.19163 🧵 (1/7)
Tuhin Chakrabarty (@tuhinchakr)

Low-quality AI-generated text is often referred to as #AISlop !! But how do humans quantify Slop? Are they consistent in their judgments? Is GPT5-thinking absolutely wrong in "thinking" its text is not Slop ⛳️ ? Look at Chantal's new work addressing these questions 👇

Jessy Li (@jessyjli)

All of us (Kyle Mahowald, Kanishka Misra 🌊 and me) are looking for PhD students this cycle! If computational linguistics/NLP is your passion, join us at UT Austin! For my areas see jessyli.com

Abhilasha Ravichander (@lasha_nlp)

It is PhD application season again 🍂 For those looking to do a PhD in AI, these are some useful resources 🤖: 1. Examples of statements of purpose (SOPs) for computer science PhD programs: cs-sop.org [1/4]