ueaj (@_ueaj)'s Twitter Profile
ueaj

@_ueaj

Researcher, JAX's strongest warrior

ID: 1883203705178144768

Joined: 25-01-2025 17:22:15

469 Tweets

111 Followers

98 Following

ueaj (@_ueaj):

Spent a few days thinking "man my training script repo is getting really bloated with all my experiments, if only there was a way to maintain many slightly different versions of the same code" My transition from software development to researcher is complete. I've forgotten how

kalomaze (@kalomaze):

claude 4 opus when i asked it to make an "ambient, serene song" in the custom midi format. this is what transfer learning sounds like

ueaj (@_ueaj):

My first research project was encoding my uniquely good ability to calculate probabilities into a computer algorithm. I developed this ability to calculate the probability a girl liked me back, the highest probability best dialog option when flirting and eventually, why she

ueaj (@_ueaj):

We're gonna have to figure out soft body robots before any reasonable person trusts these within 100ft of a child. Some stupid baby is gonna stick their finger inside one of the joints one day or another

ueaj (@_ueaj):

Why don't we see unified large-small models? Surely it's not impossible to use the same model with different numbers of active experts? It would obviously require specialized pretraining but the total flops would probably be less than training two separate models.
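The idea above can be sketched in code: in a standard top-k mixture-of-experts layer, the number of active experts k is just a routing parameter, so the same weights could in principle serve as a "large" model (high k) or a "small" model (low k). A minimal numpy sketch, with all names and the tiny tanh stand-in experts being hypothetical, not any real model's architecture:

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k):
    """Route a token through the top-k experts of a shared pool.

    The same weights act as a 'large' model (high k) or a 'small'
    model (low k); only the number of active experts changes.
    """
    logits = x @ gate_w                       # router scores, (num_experts,)
    topk = np.argsort(logits)[-k:]            # indices of the k best experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                  # renormalize over active experts
    out = np.zeros_like(x)
    for w, i in zip(weights, topk):
        out += w * np.tanh(x @ expert_ws[i])  # tiny stand-in expert MLP
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
expert_ws = rng.normal(size=(n_experts, d, d))

small = moe_forward(x, gate_w, expert_ws, k=1)  # cheap configuration
large = moe_forward(x, gate_w, expert_ws, k=4)  # full configuration
```

The tweet's caveat is the hard part: naive training at one k gives a router and experts that aren't calibrated for other values of k, which is presumably why it would "require specialized pretraining" (e.g. sampling k during training).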

ueaj (@_ueaj):

Opus 4 falsely recalls what it thinks is a post-training memory but is actually 3 separate nearly identical samples from the pretraining data (2023). This is so weird/cool lmao

ueaj (@_ueaj):

@kalomaze @xlr8harder I imagine the gap will eventually close, maybe gpt 5 will be good in codex, though apparently claude 4.1 is around the corner as well. I made a very handy diagram to explain this phenomenon

ueaj (@_ueaj):

> make claude coder competitor
> run into compute limitations / scaling issues

it really is a competitor product