Smerity (@smerity)'s Twitter Profile
Smerity

@smerity

Focused on machine learning & society. Previously @Salesforce Research via @MetaMindIO. @Harvard '14, @Sydney_Uni '11. 🇦🇺 in SF.
Also @ smerity.bsky.social

ID: 15363432

Link: https://state.smerity.com/ · Joined: 09-07-2008 08:00:18

13K Tweets

31K Followers

2K Following

Smerity (@smerity):

François Chollet Excited to see the return of RNNs but wish their citations were better. Our QRNN paper (2016) has variants similar/identical to minGRU & minLSTM.
RWKV, S4, Mamba et al. include citations to QRNN (2016) and SRU (2017) for a richer history + better context.
arxiv.org/abs/1611.01576
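The structural overlap claimed above is easy to see in code. Below is a minimal NumPy sketch (my illustration, not either paper's implementation) of the QRNN's "f-pooling" recurrence; the key shared property is that the gates depend only on the input, never on the previous hidden state, which is also what makes minGRU/minLSTM parallelizable.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def qrnn_f_pooling(Z, F, h0):
    # QRNN "f-pooling": h_t = f_t * h_{t-1} + (1 - f_t) * z_t.
    # Z (candidates) and F (forget gates) are precomputed from the input
    # alone, with no dependence on h_{t-1}.
    h = h0
    out = []
    for z_t, f_t in zip(Z, F):
        h = f_t * h + (1.0 - f_t) * z_t
        out.append(h)
    return np.stack(out)

# minGRU's update h_t = (1 - z_t) * h_{t-1} + z_t * htilde_t has the same
# form: substitute f_t = 1 - z_t and the two recurrences coincide.
rng = np.random.default_rng(0)
T, d = 5, 3
Z = rng.normal(size=(T, d))           # candidate values (from input only)
F = sigmoid(rng.normal(size=(T, d)))  # forget gates in (0, 1)
h0 = np.zeros(d)
H = qrnn_f_pooling(Z, F, h0)
assert H.shape == (T, d)
```

Because nothing in the loop body except `h` depends on earlier steps, the whole recurrence can be evaluated with a parallel scan, which is the trick the S4/Mamba line of work also exploits.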
Hugh Riminton (@hughriminton):

Feel like a little outrage - how about this: pokies and gambling companies are claiming more tax breaks for R&D than some of the country’s biggest tech companies. 🤷‍♂️ Financial Review afr.com/rear-window/at…

Smerity (@smerity):

The more you use it, the more hot paths you see everywhere in Python's ecosystem - well-worn trails connecting optimized nodes, paved over time by countless developers. I argue that's Python's implicit JIT ecosystem at work. state.smerity.com/smerity/state/…

Rudy Gilman (@rgilman33):

Introducing darkspark, a gui for your neural network. It traces your pytorch code and brings up a visual representation for you to interact with. We have a hosted gallery of popular model architectures pre-traced and ready to explore. Here’s stable-diffusion-v1.5

Smerity (@smerity):

My contribution to a discussion on explorables/user interfaces for controlling ML tools > We're trying to rig soundboards to control LLMs thinking there's a well defined interface underneath when it's actually a button that drops fertilizer into the river of a complex ecosystem.

Rahel Jhirad (@raheljhirad):

Thank you Ian Johnson 🔬🤖 for organizing this … amazing unconference

So friendly and so diverse a group of talented folks

Leland McInnes Linus awesome keynotes and huge shout out to Leland McInnes on cool resources

Great to meet .txt Chroma
And see Smerity
Rudy Gilman (@rgilman33):

I came across a strange creature yesterday while tracing a circuit in DINOv2: the upside-down GELU.

I’d always thought of GELU as just a smoother ReLU that died less and was easier to optimize. I thought I could ignore the tiny dip into negative territory in the same way I
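The "tiny dip into negative territory" comes straight from GELU's definition, gelu(x) = x·Φ(x) with Φ the standard normal CDF. A quick check in plain Python:

```python
import math

def gelu(x):
    # Exact GELU: gelu(x) = x * Phi(x), with Phi the standard normal CDF,
    # written via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Unlike ReLU, GELU goes slightly negative for negative inputs, reaching
# its minimum of about -0.17 near x = -0.75, then decaying back toward 0.
print(gelu(-1.0))   # ≈ -0.159
print(gelu(-0.75))  # ≈ -0.170
print(gelu(2.0))    # ≈ 1.954
```

That small negative lobe is exactly the region an "upside-down GELU" circuit could exploit, rather than being safely ignorable.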
Pieter Abbeel (@pabbeel):

Founders who were PhD or post-doc in my lab at Berkeley, **largely funded by NSF / DoD grants**, start-up, market cap (collected by OpenAI Deep Research)

François Chollet (@fchollet):

Today, we're releasing ARC-AGI-2. It's an AI benchmark designed to measure general fluid intelligence, not memorized skills – a set of never-seen-before tasks that humans find easy, but current AI struggles with.

It keeps the same format as ARC-AGI-1, while significantly
Benjamin Spiegel (@superspeeg):

Why did only humans invent graphical systems like writing? 🧠✍️ In our new paper at CogSci Society, we explore how agents learn to communicate using a model of pictographic signification similar to human proto-writing. 🧵👇

Caglar Gulcehre (@caglarml):

📢I am thrilled to announce this paper. We showed that it is possible to significantly improve the FunSearch method with RL and achieve impressive algorithmic discoveries on challenging NP-complete combinatorial optimization tasks like TSP and bin-packing.

Rudy Gilman (@rgilman33):

The VAE used in SDXL has extremely high-magnitude "splotches" in its latents. The individual neurons in these blobs fire with magnitudes of close to a million. These aren't some accident of training or initialization—the model creates these high-magnitude splotches for a

Lucky Iyinbor (@luckyballa):

So Flow Matching is *just*

xt = mix(x0, x1, t)
loss = mse((x1 - x0) - nn(xt, t))

Nice, here it is in a fragment shader :) shadertoy.com/view/tfdXRM
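The two-line pseudocode above expands into runnable NumPy as follows; `mix` and `nn` are the tweet's placeholder names, with `nn` stubbed as an oracle here since no trained network is available.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix(x0, x1, t):
    # Straight-line interpolation between noise x0 and data x1 at time t.
    return (1.0 - t) * x0 + t * x1

def flow_matching_loss(nn, x0, x1, t):
    # The network is regressed onto the constant velocity (x1 - x0) of the
    # straight path, evaluated at the interpolated point xt and time t.
    xt = mix(x0, x1, t)
    return np.mean(((x1 - x0) - nn(xt, t)) ** 2)

x0 = rng.normal(size=(8, 2))   # noise batch
x1 = rng.normal(size=(8, 2))   # data batch
t = rng.uniform(size=(8, 1))   # per-sample times in [0, 1]

# An oracle that returns the true velocity drives the loss to exactly 0;
# a trained network approximates this from (xt, t) alone.
oracle = lambda xt, t: x1 - x0
assert flow_matching_loss(oracle, x0, x1, t) == 0.0
```

At sampling time, one integrates dx/dt = nn(x, t) from t = 0 to t = 1 starting from noise, which is the whole generative procedure.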

Aaron Defazio (@aaron_defazio):

Why do gradients increase near the end of training? 
Read the paper to find out!
We also propose a simple fix to AdamW that keeps gradient norms better behaved throughout training.
arxiv.org/abs/2506.02285
Han Guo (@hanguo97):

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
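As background for the "linear-time variants" the thread places at one end of the spectrum, here is a minimal NumPy sketch (my illustration) of unnormalized causal linear attention: each step updates a fixed-size outer-product state, so cost per token is constant in sequence length, like an RNN/SSM. Log-linear attention itself is defined in the paper and not reproduced here.

```python
import numpy as np

def linear_attention(Q, K, V):
    # Maintains a running state S_t = sum_{i<=t} k_i v_i^T, so each step
    # costs O(d^2) regardless of how long the sequence is.
    T, d = Q.shape
    S = np.zeros((d, V.shape[1]))
    out = np.zeros_like(V)
    for t in range(T):
        S = S + np.outer(K[t], V[t])   # recurrent state update
        out[t] = Q[t] @ S              # read-out with the query
    return out

rng = np.random.default_rng(0)
T, d = 6, 4
Q, K, V = rng.normal(size=(3, T, d))
O = linear_attention(Q, K, V)

# Equivalent parallel form: causally masked, unnormalized attention.
# This is the O(T^2) view; the recurrence above is the O(T) view.
mask = np.tril(np.ones((T, T)))
O_parallel = ((Q @ K.T) * mask) @ V
assert np.allclose(O, O_parallel)
```

The "in between" question is then what state sizes and costs interpolate between this constant-size state and full attention's unbounded one.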
Albert Gu (@_albertgu):

Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence.

Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
Awni Hannun (@awnihannun):

The new Kimi K2 1T model (4-bit quant) runs on 2 512GB M3 Ultras with mlx-lm and mx.distributed. 1 trillion params, at a speed that's actually quite usable: