Alexandros (@alexk_z)'s Twitter Profile
Alexandros

@alexk_z

ML AI RL & Snowboarding

ID: 827108112

Link: http://alexiskz.wordpress.com · Joined: 16-09-2012 13:33:40

2.2K Tweets

1.1K Followers

935 Following

Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

Self-attention is actually doing kernel PCA under the hood - now we can make it robust

So how does self-attention work? It's kernel PCA in disguise, as this paper proposes.

🤔 Original Problem:

Self-attention in transformers has been developed through heuristics and
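
For context, here is a minimal NumPy sketch of plain scaled dot-product self-attention (the standard form only, not the paper's kernel-PCA reformulation). The paper's reading: each softmax row is a normalized kernel over the keys, so the attention output projects queries onto principal component axes of the keys in feature space.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])      # pairwise query-key similarities
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)           # row softmax: a normalized kernel over keys
    return A @ V                                 # kernel-weighted combination per query

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                     # 5 tokens, model dim 16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)       # (5, 8)
```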
Will Bryk (@williambryk)'s Twitter Profile Photo

Spent the weekend hacking together Exa embeddings over 4500 NeurIPS 2024 papers - neurips.exa.ai

Lets you (see the sketch after this list):
- do otherwise impossible searches ("transformer architectures inspired by neuroscience")
- explore a 2D t-SNE plot
- chat with Claude about multiple papers
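
Not Exa's actual stack, but a rough sketch of the same pipeline, with an open embedding model standing in for Exa embeddings and placeholder abstracts:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.manifold import TSNE

abstracts = [
    "Transformer architectures inspired by neuroscience ...",
    "A new optimizer for large-scale pretraining ...",
    "Kernel methods for efficient attention ...",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(abstracts)                    # (n_papers, 384) vectors

# 2D map for the scatter-plot view (perplexity must stay below n_samples).
xy = TSNE(n_components=2, perplexity=2).fit_transform(emb)

# Semantic search: cosine similarity between a query and every paper.
q = model.encode(["transformer architectures inspired by neuroscience"])[0]
sims = emb @ q / (np.linalg.norm(emb, axis=1) * np.linalg.norm(q))
print(abstracts[int(np.argmax(sims))])
```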
Sakana AI (@sakanaailabs)'s Twitter Profile Photo

Introducing An Evolved Universal Transformer Memory

sakana.ai/namm

Neural Attention Memory Models (NAMMs) are a new kind of neural memory system for Transformers that not only boost their performance and efficiency but are also transferable to other foundation models,
Xiaojian Ma (@jeasinema)'s Twitter Profile Photo

📢 Attending #NeurIPS2024? Come by our workshop on open-world agents!

everything 👉 owa-workshop.github.io

Put your questions for the panel here: forms.gle/XLumiMHAWjwydy…

Our speaker & panelist line-up: Sherry Yang, Tao Yu, Ted Xiao, Natasha Jaques, Jiajun Wu
chansung (@algo_diver)'s Twitter Profile Photo

NeurIPS Conference 2024 reimagined with AI!!
- summaries for instant insights 🧠
- easy-to-understand audio podcasts 🎙️
- quick links to NeurIPS Proc., Hugging Face & more 🌐
- full papers, topic & affiliation filters 📂

All your research needs, in one hub. Dive in now! 👇

elvis (@omarsar0)'s Twitter Profile Photo

Training LLMs to Reason in a Continuous Latent Space

Meta presents Coconut (Chain of Continuous Thought), a novel paradigm that enables LLMs to reason in continuous latent space rather than natural language.

Coconut takes the last hidden state of the LLM as the reasoning state
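
A rough sketch of that loop (my paraphrase of the mechanism, not Meta's code; assumes a HuggingFace-style causal LM and ignores KV caching and the paper's staged training curriculum):

```python
# Sketch of Coconut's continuous-thought loop: instead of decoding a token and
# re-embedding it, feed the last hidden state straight back in as the next
# input embedding for a fixed number of latent "thought" steps.
import torch

def coconut_latent_steps(model, inputs_embeds, n_thoughts=4):
    for _ in range(n_thoughts):
        out = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
        last_h = out.hidden_states[-1][:, -1:, :]  # final layer, last position
        inputs_embeds = torch.cat([inputs_embeds, last_h], dim=1)
    return inputs_embeds  # resume ordinary token decoding from here
```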
Mahesh Sathiamoorthy (@madiator)'s Twitter Profile Photo

I think the community is excited about DeepSeek v3 not because it's yet another powerful model but because it's a story of human ingenuity in the face of constraints. Despite all the restrictions due to export control and limited budget, the humans of DeepSeek have created a

pokaaaahh (@pokaaaahh)'s Twitter Profile Photo

While our street rendezvous on #28_Φλεβαρη over #Τεμπη_συγκαλυψη is just hours away, and the #ΕΟΔΑΣΑΑΜ report on #τεμπη_εγκλημα spoke of 2.5 tonnes of a flammable "unknown" substance and of inconceivable, criminal omissions (#Justice_for_Tempi), with the… 1/36

mgostIH (@mgostih)'s Twitter Profile Photo

This paper is pretty cool: The Belief State Transformer
A very simple technique, fast to train, that makes transformers (or other seq models) better at modelling state and can additionally condition on the end!
I wonder what this is like for RL; we might condition on a high final reward!
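
My reading of the recipe as a rough sketch (not the authors' code: GRUs stand in for the transformer encoders, and the real objective sums next-/previous-token losses over many prefix-suffix splits):

```python
import torch
import torch.nn as nn

class BeliefStateSketch(nn.Module):
    def __init__(self, vocab_size, d=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        self.fwd = nn.GRU(d, d, batch_first=True)      # encodes the prefix
        self.bwd = nn.GRU(d, d, batch_first=True)      # encodes the reversed suffix
        self.next_head = nn.Linear(2 * d, vocab_size)  # token right after the prefix
        self.prev_head = nn.Linear(2 * d, vocab_size)  # token right before the suffix

    def forward(self, prefix, suffix):
        _, hf = self.fwd(self.emb(prefix))                 # belief about the past
        _, hb = self.bwd(self.emb(suffix.flip(dims=[1])))  # belief about the future
        h = torch.cat([hf[-1], hb[-1]], dim=-1)
        return self.next_head(h), self.prev_head(h)
```

Conditioning on the end is then just a matter of fixing the suffix (e.g. a goal, or a high-reward outcome for RL) and filling in between.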
Maria Karystianou (@mkaristianou)'s Twitter Profile Photo

The applause of New Democracy's MPs, comfortably settled in their seats, at the rejection of the no-confidence motion sealed their complicity in all the scandalous acts that the Prime Minister and their Ministers have, from the very first moment, been organizing, orchestrating and committing.

Not one of
Quentin Gallouédec (@qgallouedec)'s Twitter Profile Photo

☄️ GRPO now scales to 70B+ models with multi-node training and super-fast performance. Install the latest v0.16 version of TRL:

pip install trl

With all the fresh features and optimizations we've added, you can train up to 60 times faster!

More details in the
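
For a concrete starting point, here is a minimal single-node sketch along the lines of TRL's own quickstart (the dataset and the toy length-based reward are illustrative; scaling to 70B+ across nodes additionally needs an accelerate/DeepSpeed launch setup):

```python
# Minimal GRPO fine-tuning sketch with TRL.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # provides a "prompt" column

def reward_len(completions, **kwargs):
    # One scalar reward per sampled completion: prefer ~20-character answers.
    return [-abs(20 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="Qwen2-0.5B-GRPO", logging_steps=10)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```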
Yifei Zhou (@yifeizhou02)'s Twitter Profile Photo

📢 LLM and RL folks! 📢 No good RL algorithm for credit assignment in multi-turn LLM agents on reasoning-heavy tasks? Don't even have a good benchmark for studying it?

In SWEET-RL, we give you both (a vibe coding benchmark and SWEET algorithm). A thread 🧵(1/n)
Praxis Review (@praxis_review)'s Twitter Profile Photo

"Ας αφήσουμε τα παιδιά του Μωάμεθ να αποτελειώσουν τα παιδιά του Ροβεσπιέρου" (Παλαιών Πατρών Γερμανός, 1820): Πως αντιμετωπίστηκαν πολλοί ήρωες του 1821 απο ντόπιους αστούς, κοτζαμπάσηδες, αρχιρασοφόρους κ.α που πήραν τελικά την θέση των Οθωμανών στην εκμετάλλευση του λαού; Ο
Dimitri Bertsekas (@dbertsekas)'s Twitter Profile Photo

I am pleased to share the full set of videolectures, slides, textbook, and other supporting material of the 7th offering of my Reinforcement Learning class at ASU, which was completed two days ago; check web.mit.edu/dimitrib/www/R…

hardmaru (@hardmaru)'s Twitter Profile Photo

Tim Rocktäschel’s keynote talk at #ICLR2025 about Open-Endedness and AI.

“Almost no prerequisite to any major invention was invented with that invention in mind.”

“Basically almost everybody in my lab at UCL and at DeepMind have read this book: Why Greatness Cannot Be Planned.”
Kenneth Stanley (@kenneth0stanley)'s Twitter Profile Photo

Could a major opportunity to improve representation in deep learning be hiding in plain sight? Check out our new position paper: Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis. The idea stems from a little-known
Sakana AI (@sakanaailabs)'s Twitter Profile Photo

Introducing Reinforcement-Learned Teachers (RLTs): Transforming how we teach LLMs to reason with reinforcement learning (RL). Blog: sakana.ai/rlt Paper: arxiv.org/abs/2506.08388 Traditional RL focuses on “learning to solve” challenging problems with expensive LLMs and

cascadian realism fan 🌲 (@realism_fan)'s Twitter Profile Photo

Ahahahaha, the James Webb Space Telescope continues to deliver massive L’s for astrophysics.

A new paper shows that the “Cosmic Microwave Background Radiation” can be explained entirely by the energy of recently discovered Early Mature Galaxies — massive galaxies that the JWST