niki parmar (@nikiparmar09)'s Twitter Profile
niki parmar

@nikiparmar09

Working @Anthropic. Views expressed here are my own.

ID: 1698006024

Joined: 25-08-2013 03:28:00

193 Tweets

14.14K Followers

875 Following

Andrej Karpathy (@karpathy):

TLDR: You can get far with a vanilla Transformer (2017). Scrape a massive (though weakly-labeled) dataset and use simple supervised learning. Multi-task. Eval in the zero-shot regime. More perf expected from further model+data scaling. Eval is hard. Some parts (decoding) feel hacky.
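
As a rough illustration of that recipe (not Karpathy's code), here is a minimal PyTorch sketch: a vanilla encoder-decoder Transformer trained with plain supervised cross-entropy on weakly-labeled (source, target) token pairs. The class name, the toy loader, and all hyperparameters are hypothetical stand-ins, and sinusoidal positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

# Tiny sizes for the sketch; the 2017 base model uses d_model=512, 6 layers.
VOCAB, D_MODEL, PAD = 1000, 128, 0

class VanillaSeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8,
            num_encoder_layers=2, num_decoder_layers=2, batch_first=True)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, src, tgt):
        # Causal mask so the decoder only attends to earlier target tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(self.embed(src), self.embed(tgt), tgt_mask=mask)
        return self.lm_head(h)

model = VanillaSeq2Seq()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)

# Toy stand-in for a massive weakly-labeled corpus: batches of
# (source tokens, target tokens). Multi-tasking would be expressed by
# prepending task tokens to the target sequence.
loader = [(torch.randint(1, VOCAB, (8, 64)),
           torch.randint(1, VOCAB, (8, 33)))]

for src, tgt in loader:
    logits = model(src, tgt[:, :-1])                      # teacher forcing
    loss = loss_fn(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```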

Neeva (@neeva):

10/ We are releasing our model (huggingface.co/neeva/query2qu…) and golden set used for eval (huggingface.co/datasets/neeva…) on Hugging Face. Take a look at our latest blog post for more information 👁️ ⤵️ neeva.com/blog/state-of-…

niki parmar (@nikiparmar09):

This particular example, which generates a 2-min-long video based on a changing story, is really cool. Congrats to all the authors!

Nathan Benaich (@nathanbenaich):

🪩The State of AI 2022 is live!🪩 In its 5th year, the #stateofai report condenses what you *need* to know in AI research, industry, safety, and politics. This open-access report is our contribution to the AI ecosystem. Here's my director's cut 🧵: stateof.ai

niki parmar (@nikiparmar09):

Today is as good a day as any to share that I joined Anthropic last Dec :) Claude 3.7 is a remarkable model at complex tasks, especially coding, and I'm thrilled to have contributed to its development. From winning Pokémon badges to vibe coding, Claude's got you covered!

Alexander Ku (@alex_y_ku):

(1/11) Evolutionary biology offers a powerful lens into Transformers' learning dynamics! Two learning modes in Transformers (in-weights & in-context) mirror adaptive strategies in evolution. Crucially, environmental predictability shapes both systems similarly.
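
For intuition only, here is a toy PyTorch contrast (not from the thread) between the two modes: in-weights learning stores new information by updating parameters, while in-context learning keeps the weights frozen and adapts purely through the examples placed in the input. The `nn.Linear` stand-in and all data are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(16, 16)            # stand-in for a Transformer LM
demos_x, demos_y = torch.randn(4, 16), torch.randn(4, 16)

# "In-weights" learning: information is written into the parameters by
# gradient descent (slow and persistent, like genetic adaptation).
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss = F.mse_loss(model(demos_x), demos_y)
opt.zero_grad(); loss.backward(); opt.step()

# "In-context" learning: the weights stay frozen; adaptation comes
# entirely from the demonstrations placed in the input (fast and
# transient, like within-lifetime plasticity).
with torch.no_grad():
    query = torch.randn(1, 16)
    prompt = torch.cat([demos_x, query])  # demonstrations + query
    prediction = model(prompt)[-1]        # no parameter update occurred
```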

niki parmar (@nikiparmar09):

Claude Opus 4 and Sonnet 4 are the best coding models, setting new records across the board. 🚀 We are pushing the limits (80.2% on SWE-Bench!!), advancing the frontier while keeping up the momentum. The benchmarks may soon become saturated, but the capabilities will not!

Aurko Roy (@happylemon56775):

Excited to share what I worked on during my time at Meta.

- We introduce a Triton-accelerated Transformer with *2-simplicial attention*, a tri-linear generalization of dot-product attention (rough sketch below)

- We show how to adapt RoPE to tri-linear forms

- We show 2-simplicial attention scales
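
A hedged sketch of what a tri-linear generalization of dot-product attention can look like, in plain PyTorch rather than the Triton kernels the tweet mentions. The joint softmax over key pairs and the elementwise combination of the two value streams are assumptions based only on the tweet's description, and RoPE is omitted.

```python
import torch

def two_simplicial_attention(q, k1, k2, v1, v2):
    # q, k1, k2, v1, v2: (batch, seq, dim)
    d = q.size(-1)
    # Tri-linear logits: one score per (query i, key j, key k) triple,
    # generalizing the bilinear q·k score of standard attention.
    logits = torch.einsum("bid,bjd,bkd->bijk", q, k1, k2) / d ** 0.5
    b, i, j, k = logits.shape
    # Softmax jointly over all (j, k) key pairs.
    probs = logits.reshape(b, i, j * k).softmax(-1).reshape(b, i, j, k)
    # Weighted sum of elementwise-combined value pairs.
    return torch.einsum("bijk,bjd,bkd->bid", probs, v1, v2)

q = k1 = k2 = v1 = v2 = torch.randn(2, 8, 16)
out = two_simplicial_attention(q, k1, k2, v1, v2)  # (2, 8, 16)
```

Note that the logits tensor is cubic in sequence length, which presumably is what motivates the fused Triton kernels in the actual work.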