Mikhail Parakhin (@mparakhin)'s Twitter Profile
Mikhail Parakhin

@mparakhin

ID: 1506868649495199744

Joined: 24-03-2022 05:41:58

1.1K Tweets

20.2K Followers

21 Following

Mikhail Parakhin (@mparakhin):

Anthropic models in C++ consistently forget an opening angle bracket '<' in complicated multi-line templates:

T_MyTemplate <  // missing here
    P_Param1,
    P_Param2
>

I suspect this is due to interference with their internal tag system. I really hope everyone adopts Harmony...
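As a minimal self-contained sketch of the pattern the tweet describes (T_MyTemplate and the parameter names are the tweet's placeholder names, filled in with hypothetical types), the well-formed instantiation looks like:

```cpp
#include <map>
#include <vector>

// Hypothetical template matching the shape in the tweet.
template <typename P1, typename P2>
struct T_MyTemplate {
    P1 first;
    P2 second;
};

// A correct multi-line instantiation. The '<' right after the template
// name is the token the models reportedly drop.
T_MyTemplate<            // <-- this '<' is the one that goes missing
    std::vector<int>,
    std::map<int, double>
> g_example{};
```

Dropping that '<' turns the instantiation into a plain identifier followed by a stray comma-separated list, so the compile error shows up lines away from the actual mistake.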

Mikhail Parakhin (@mparakhin):

Ethan is a friend, but I think the opposite: OpenAI was sitting on Strawberry for way too long because of inference GPU availability concerns, giving others time to catch up.

Andrej Karpathy (@karpathy):

Bit silly but I still watch the Apple event livestream for new iPhones, every year since the first one in 2007. It doesn't make sense but it's ok. Livestream today at 10am (in 1.5 hours). This year, crossing my fingers again for an iPhone mini that I know won't come. rip.

Mikhail Parakhin (@mparakhin):

LLMs are mostly trained on texts — as in literature. Despite what we say, we don't value conciseness in essays (otherwise TLDRs would never be needed). The models transfer the same 'flowery eloquence' requirement to code generation, where it's the opposite of what we want.

Mikhail Parakhin (@mparakhin):

Yesterday one of our engineers was complaining about React: “It’s bad for LLMs, I have to keep saying ‘make it simpler’ before the results are acceptable”. I just smiled…

Mikhail Parakhin (@mparakhin):

My video generation test is to recreate the famous “footage” segments from Gibson’s Pattern Recognition. Sora 2 does a better job than Veo 3, but still not there. Fun fact: Sam is a big Neuromancer fan, I tried to convince him to call the model Wintermute once. He just laughed :-)

Mikhail Parakhin (@mparakhin):

We got a lot of mileage out of this paper. In retrospect, it is kind of an obvious wrapper around the Straight-Through Estimator, but very effective. Back in the Sydney days, Yuan Yu and I had to build manual hierarchical discretization strategies; here they appear automagically.
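For context, the Straight-Through Estimator is the trick of applying a non-differentiable quantizer in the forward pass while treating it as the identity in the backward pass, so gradients are not zeroed out by the step function. A minimal sketch (the function names are illustrative, not from the paper):

```cpp
#include <cmath>

// Straight-Through Estimator, stripped to its core idea.
// Forward: snap the activation to the nearest discrete level.
double ste_forward(double x) { return std::round(x); }

// Backward: pretend the rounding was the identity, so the upstream
// gradient passes through unchanged instead of being zero almost
// everywhere (the true derivative of round()).
double ste_backward(double upstream_grad) { return upstream_grad; }
```

In a real framework this is one custom-gradient op; hierarchical discretization then amounts to stacking such ops at different granularities rather than hand-designing the levels.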

Mikhail Parakhin (@mparakhin):

Laugh all you want, but NTFS file streams (ADS) are tailor-made for model distribution. On Linux, all the supplementary information has to live in a separate file, and the two inevitably desynchronize. On Windows I always keep model.pt and model.pt:extra_info together.
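A sketch of the idea, assuming an NTFS volume (the helper name is mine; on other filesystems the colon simply becomes part of a literal file name):

```cpp
#include <fstream>
#include <string>

// Write supplementary metadata "next to" a model file.
// On Windows/NTFS, model_path + ":extra_info" names an alternate data
// stream attached to the file itself, so the metadata moves, copies,
// and deletes together with model.pt.
bool write_extra_info(const std::string& model_path, const std::string& info) {
    std::ofstream out(model_path + ":extra_info", std::ios::binary);
    if (!out) return false;
    out << info;
    return out.good();
}
```

On non-NTFS filesystems the same call just creates an ordinary file literally named "model.pt:extra_info" — a separate file that can drift, which is exactly the desynchronization the tweet complains about.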

Mikhail Parakhin (@mparakhin):

It appears Google has slightly limited the thinking budget for DeepThink, I have to think for myself more often again :-(. On the positive side, Demis told me last week that I am "going to be very impressed by Gemini 3" - can't wait!

Mikhail Parakhin (@mparakhin):

Prediction: we will see more passion projects like this. LLMs make them much easier to implement, so I expect someone to get OS/2 and IRIX running on modern hardware.

Mikhail Parakhin (@mparakhin):

EmEditor is in harvesting mode and has become unusable. Which editor for huge text files should I switch to on Windows: UltraEdit? 010? Loxx?