shane (@shncldwll) Twitter Tweets • TwiCopy

shane

@shncldwll

pentester + ml eng. building hackbots

ID: 970139534653624322

Link: https://hackbot.dad · Joined: 04-03-2018 03:31:04

1.1K Tweets

457 Followers

405 Following

moo

@moo_hax

4 months ago

Will be hanging out at the Agentic Summit this Saturday. Happy to meet up and talk agent observability, evals, and deployment for cyber security. rdi.berkeley.edu/events/agentic…

6 likes · 0 replies · 2 reposts

there is truly no social media-ism that makes me stop reading faster than 'let that sink in'. i would rather see "it's not x, it's y" it's beyond self parody! give it up! the sentence can draw attention to itself if you write it correctly!

1 like · 1 reply · 0 reposts

shane

@shncldwll

4 months ago

Wrote about evals at Dreadnode. This one is for hackers getting up to speed on agents for their use cases. How do you go from PoC to prod? Don't wait for a lab to build benchmarks that measure what you care about. Do it yourself. Here's how:

28 likes · 2 replies · 8 reposts

shane

@shncldwll

4 months ago

time to copy edit the paper (you no longer remember a time before editing latex tables)

4 likes · 1 reply · 0 reposts

Alexander Doria

@dorialexander

4 months ago

So far even in highly verifiable settings, RL with judge seems to work better than pure verifiable.

63 likes · 5 replies · 1 repost

shane

@shncldwll

4 months ago

before posting writing online, it's important to read it out loud to yourself. that way every time you hit a difficult to read sentence you can get really mad and delete the whole piece, preventing anyone online from suffering

5 likes · 0 replies · 0 reposts

shane

@shncldwll

4 months ago

YC Fall 2025 Startup Requests: We Are Demanding The Torment Nexus Immediately

4 likes · 0 replies · 0 reposts

shane

@shncldwll

4 months ago

damn nobody ever talks about how embarrassing it is to ask around for an arxiv endorsement.

5 likes · 1 reply · 0 reposts

Alexander Doria

@dorialexander

4 months ago

Recipe gets confirmed: drop pure verifiable, take the most performant judge that can fit on GPU (latency not a big issue, so long as batches per step are small) and ask for soft critique.

96 likes · 8 replies · 4 reposts