Simone Balloccu (@simoneballoccu)'s Twitter Profile

(he/him)
Leading the ExpNLP lab @TUDarmstadt. Researching AI w.r.t human evaluation, behaviour change, safety and controllability, expert domains.

ID: 1285850193615884289

Link: https://uccollab.github.io/ | Joined: 22-07-2020 08:13:15

707 Tweets

303 Followers

223 Following

INLG 2025 (@inlgmeeting)

First #CallForPapers for #INLG2025! Submit work on any aspect of #NaturalLanguageGeneration, incl. but not limited to: rule-based data-to-text systems, summarisation and simplification with the latest LLMs, or new evaluation methods :) Deadline: 14 July 2025. inlgmeeting.org

Edu (@educa_nlp)

Excited to fly to Albuquerque to present my latest piece at NAACL! If you want to learn how to design human-centered #NLProc evaluation UIs, visit my poster on May 1, Hall 3, 14:00-15:30. We can also chat about NLG, hallucinations, lexical change, and anything in between! :)

(((ل()(ل() 'yoav))))👾 (@yoavgo)

"LLM on way to replace doctors" gets published in Nature. meanwhile "LLM judgement not as good as human MDs" gets a spot in "Physical Therapy and Rehabilitation Journal".
Ehud Reiter (@ehudreiter)

New blog: Key messages from my NLG book. It's been 6 months since my NLG book was released. I summarise what I think are its key messages, for rule-based NLG, ML and neural NLG, requirements, evaluation, safety/testing/maintainability, and applications. ehudreiter.com/2025/05/14/key…

vas (@vasumanmoza)

Claude 4 just refactored my entire codebase in one call. 25 tool invocations. 3,000+ new lines. 12 brand new files. It modularized everything. Broke up monoliths. Cleaned up spaghetti. None of it worked. But boy was it beautiful.

Simone Balloccu (@simoneballoccu)

Opinion: Authors should be allowed to add hidden prompts for LLMs in papers. If you lazily paste the paper you're supposed to judge into ChatGPT etc., you don't belong in peer reviewing.

INLG 2025 (@inlgmeeting)

The Second #CallForPapers just went out and announces two of our keynote speakers: Verena Rieser (Google DeepMind) & Minlie Huang (Tsinghua University)! Submit your work on NLG, whether LLM or rule-based :D Deadline: 14 July 2025. inlgmeeting.org (first posted elsewhere)

Dr. Dominic Ng (@drdominicng)

Microsoft claims their new AI framework diagnoses 4x better than doctors. I'm a medical doctor and I actually read the paper. Here's my perspective on why this is both impressive AND misleading ... 🧵

Mickey Friedman (@mickeyxfriedman)

as a parent, i will never push a career path onto my kids. i would give them full freedom to decide which AI lab they want to join for $100 mil

Marco Guerini (@m_guerini)

I love this analysis of the limitations of the experimental setting/design. This is the kind of expert insight and methodological rigor we need when evaluating LLMs!

Vilém Zouhar (@zouharvi)

You have a budget to human-evaluate 100 inputs to your models, but your dataset is 10,000 inputs. Do not just pick 100 randomly!🙅 We can do better. "How to Select Datapoints for Efficient Human Evaluation of NLG Models?" shows how.🕵️ (random is still a devilishly good baseline)

Ehud Reiter (@ehudreiter)

Motivated by recent discussion with my group: Ignore subjective statements such as "I find LLMs to be incredibly useful for XX", especially when made by people (such as AI companies or gurus) who have strong biases/incentives/COI.