Niklas Risse (@niklas2484) Twitter Tweets • TwiCopy

Konrad Rieck 🌈

a year ago

Got some hot research cooking? 🔥 Two weeks until the SaTML Conference paper deadline! We’re eager to see your work on secure, private, and fair machine learning, as well as any other aspects of machine learning system security. 👉 satml.org/participate-cf… ⏰ Deadline: Sep 18

Likes: 16 · Replies: 1 · Reposts: 8

hardmaru

@hardmaru

a year ago

It’s good to see the community raise the issue of extreme overfitting to evaluation benchmarks for LLMs. This is the real “Reflection” that the open-source community needs to have.

Likes: 293 · Replies: 5 · Reposts: 32

Marcel Böhme👨‍🔬

@mboehme_

a year ago

🧑‍🏫 Thrilled to give a keynote at the 27th RAID Symposium! raid2024.github.io/program.html

Likes: 73 · Replies: 3 · Reposts: 11

Marcel Böhme👨‍🔬

@mboehme_

a year ago

What does it feel like to do world-class research? If you are a CS undergrad who is interested in our topics, the Software Security group at #MPI_SP is hiring interns for summer & winter 2025! Details: 📅 01 November 2024 ✍️ cis.mpg.de/internships/ 🛡️ mpi-softsec.github.io

Likes: 28 · Replies: 1 · Reposts: 23

Abraham Mhaidli

@abrahammhaidli

a year ago

Hi all! I am looking to recruit 1-2 summer 2025 research interns to work with me at MPI-SP in Bochum, Germany, on topics relating to the ethics and harms of emerging technologies, including (but not limited to!) virtual reality, brain computer interfaces, and more! (1/2)

Likes: 211 · Replies: 6 · Reposts: 73

Marcel Böhme👨‍🔬

@mboehme_

a year ago

Very lucky to receive the ERC Consolidator this year! This is 5-year funding for groundbreaking research. If you are interested in our perspective on software security analysis at scale, stick around and read on. European Research Council (ERC) #ERCCoG #MPI_SP CASA - Cluster of Excellence for Cyber Security mpi-sp.org/71953/news_pub…

Likes: 129 · Replies: 22 · Reposts: 19

Jürgen Schmidhuber

@schmidhuberai

9 months ago

1995-2025: The Decline of Germany & Japan vs US & China. Can All-Purpose Robots Fuel a Comeback? In 1995, in terms of nominal GDP, a combined Germany and Japan were almost 1:1 economically with a combined USA and China. Only 3 decades later, this ratio is now down to 1:5!

Likes: 402 · Replies: 40 · Reposts: 162

Nacho Mellado

@uavster

9 months ago

Apple gets it. Robots are going to be everywhere, but they won’t look like robots. Check out their new paper ELEGNT. I believe this is the future of everyday objects: helpful and human.

Likes: 7.7K · Replies: 234 · Reposts: 861

Niklas Risse

@niklas2484

6 months ago

Function-level vulnerability detection is dead. Excited to share that our paper "Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection" got into ISSTA 2025. We show that function-level vulnerability detection is fundamentally flawed — and

Likes: 30 · Replies: 0 · Reposts: 7

Marcel Böhme👨‍🔬

@mboehme_

6 months ago

Our paper "Top Score on the Wrong Exam" will be presented at #ISSTA25 🐣 in Trondheim! 📝 mpi-softsec.github.io/papers/ISSTA25… 🧑‍💻 github.com/niklasrisse/To… // Niklas Risse Jing Liu (fuzzing.bsky.social).

Likes: 59 · Replies: 4 · Reposts: 13

Marcel Böhme👨‍🔬

@mboehme_

4 months ago

Thrilled to share a recent opinion piece in IEEE Security and Privacy (Vol. 23, Issue 3). Basically a long-term perspective on the field, meant for both researchers and practitioners. 📝 ieeexplore.ieee.org/stamp/stamp.js…

Likes: 41 · Replies: 4 · Reposts: 8

Niklas Risse

@niklas2484

4 months ago

Proud to share that our paper “Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection” received an ACM Distinguished Paper Award at ISSTA 2025 in Trondheim, Norway. If you’re interested, the paper is available here: dl.acm.org/doi/10.1145/37…

Likes: 33 · Replies: 1 · Reposts: 2

Marcel Böhme👨‍🔬

@mboehme_

4 months ago

Can we statistically estimate how likely an LLM-generated program is to be correct without knowing what a correct program for that task is? Sounds impossible, but it's actually really simple. In fact, our oracle-less eval can reliably substitute for a pass@1-based eval. arxiv.org/abs/2507.00057

Likes: 64 · Replies: 5 · Reposts: 14

Lukas Seidel

@pr0me

4 months ago

Alex Plaskett the chunk-based approach seems neat, but it's also another (func-level) ML4VD paper with the common flaws imho. I think people publishing in this domain should at least cite and address Niklas Risse's "Top Score on the Wrong Exam: On Benchmarking in Machine Learning for

Likes: 5 · Replies: 0 · Reposts: 1

François Chollet

@fchollet

2 months ago

The proprietary frontier models of today are ephemeral artifacts. Essentially very expensive sandcastles. Destined to be washed away by the rising tide of open source replication (first) and algorithmic disruption (later).

Likes: 1.1K · Replies: 96 · Reposts: 327

Marcel Böhme👨‍🔬

@mboehme_

a month ago

Looking for a PostDoc, a PhD, and interns (3-6 months) as part of my ERC project. Homepage: mboehme.github.io Böhme Lab: mpi-softsec.github.io Reach out if you find this interesting. 👇

Likes: 30 · Replies: 0 · Reposts: 15

Nando de Freitas

@nandodf

a month ago

The only bitter lesson is that LLMs have succeeded beyond any expert expectations. Underpinning LLMs is the idea of scaling, which is too often misunderstood as more parameters. Scaling is about using massive compute effectively to maximise the throughput of data ingestion into

Likes: 698 · Replies: 41 · Reposts: 71