Gullal S. Cheema (@gullal7)'s Twitter Profile
Gullal S. Cheema

@gullal7

Research Assistant @l3s_luh

Previously Marie Skłodowska-Curie ESR (PhD) @TIBHannover, Germany

MUWS Workshop: muws-workshop.github.io

Views are personal.

ID: 1042742865992937474

Joined: 20-09-2018 11:50:47

685 Tweets

74 Followers

170 Following

Alex Dimakis (@alexgdimakis)

"RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. In the "One Training example" paper the authors find one question and ask the model to solve it again and again. Every time, the model tries 8 times (the Group in GRPO), and

"RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. 

In the "One Training example" paper 
the authors find one question and ask the model to solve it again and again. Every time, the model tries 8 times (the Group in GRPO), and
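For readers unfamiliar with GRPO's "group" trick, here is a minimal sketch (not code from either paper; names and numbers are illustrative): sample a group of answers to the same question, score them, and use the group-normalized score as each answer's advantage.

```python
import torch

def grpo_group_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: shape (G,), one scalar reward per sampled completion for one prompt."""
    # Group-relative advantage: how much each try beats the group average,
    # normalized by the spread within the group.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# The model "tries 8 times" on the same question; reward = 1 if the answer is correct.
rewards = torch.tensor([0., 0., 1., 0., 1., 1., 0., 0.])
advantages = grpo_group_advantages(rewards)
# Correct tries get a positive advantage and are reinforced by the policy
# gradient step; incorrect tries get a negative one and are pushed down.
print(advantages)
```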
TuringPost (@theturingpost)

11 alignment and optimization algorithms for LLMs

▪️ PPO (Proximal Policy Optimization)
▪️ DPO (Direct Preference Optimization)
▪️ GRPO (Group Relative Policy Optimization)
▪️ DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization)
▪️ AMPO (Active Multi-Preference
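To make one entry on the list concrete, here is a rough sketch of the standard DPO loss (a generic formulation, not material from the thread): the policy is pushed to prefer the chosen response over the rejected one relative to a frozen reference model, with log-probabilities assumed to be summed over response tokens.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """Inputs: per-example sums of response-token log-probs, each of shape (batch,)."""
    # Implicit reward margins of the policy relative to the frozen reference model.
    chosen = policy_chosen_logp - ref_chosen_logp
    rejected = policy_rejected_logp - ref_rejected_logp
    # DPO objective: -log sigmoid(beta * (chosen margin - rejected margin)).
    return -F.logsigmoid(beta * (chosen - rejected)).mean()

# Toy usage with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-12.0]),
                torch.tensor([-11.0]), torch.tensor([-11.5]))
```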
eric zakariasson (@ericzakariasson)

we get a lot of questions about which model to use in Cursor, so we put together a guide on how you can think about selecting models based on what we've seen work well
Rohan Paul (@rohanpaul_ai)

Adapting pretrained LLMs for vision tasks often degrades their language abilities or requires full retraining.

X-Fusion introduces a dual-tower design.

It freezes LLM weights, adding a trainable vision tower.

This enables multimodal tasks while preserving original language
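A hedged PyTorch sketch of the dual-tower idea as the thread describes it (module names, sizes, and the fusion step are assumptions, not X-Fusion's actual architecture): the language tower is frozen and only the vision tower and its projection are trained.

```python
import torch
import torch.nn as nn

class DualTowerSketch(nn.Module):
    """Illustrative dual-tower wrapper: frozen language tower, trainable vision tower."""
    def __init__(self, llm: nn.Module, d_model: int = 768):
        super().__init__()
        self.llm = llm
        for p in self.llm.parameters():               # freeze every LLM weight
            p.requires_grad = False
        # Stand-in vision tower; a real system would use a vision transformer.
        self.vision_tower = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.vision_proj = nn.Linear(d_model, d_model)  # map image tokens into the LLM space

    def forward(self, text_embeds, image_patches):
        # Only the vision tower and projection receive gradients.
        vision_embeds = self.vision_proj(self.vision_tower(image_patches))
        # Prepend image tokens and run the frozen language model over the joint sequence.
        return self.llm(torch.cat([vision_embeds, text_embeds], dim=1))
```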
Yangyi Chen (on job market) (@yangyichen6666)

🐂🍺Introducing our recent preprint: Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training! 

We present PRIOR, a simple vision-language pre-training algorithm that addresses the challenge of irrelevant textual content in image-caption pairs. PRIOR enhances
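The thread is truncated before it explains the mechanism, so the following is only a loose illustration of "prioritizing image-related tokens": a per-token weighted captioning loss where tokens judged image-relevant get larger weight (the relevance weights are a hypothetical input, not PRIOR's actual scoring).

```python
import torch
import torch.nn.functional as F

def weighted_caption_loss(logits, targets, relevance_weights):
    """
    logits: (batch, seq, vocab); targets: (batch, seq) token ids;
    relevance_weights: (batch, seq), larger for tokens grounded in the image.
    """
    # Per-token cross-entropy over the caption.
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    # Upweight image-related tokens, downweight irrelevant caption text.
    return (relevance_weights * per_token).sum() / relevance_weights.sum()
```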
DailyPapers (@huggingpapers)

J1 just launched on Hugging Face

A Reinforcement Learning recipe for training Thinking-LLM-as-a-Judge models. It trains J1-Llama-8B and J1-Llama-70B, which outperform existing models.
Kenneth Stanley (@kenneth0stanley)

Could a major opportunity to improve representation in deep learning be hiding in plain sight? Check out our new position paper: Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis. The idea stems from a little-known
Kevin Patrick Murphy (@sirbayes)

I am pleased to announce a new version of my RL tutorial. Major update to the LLM chapter (eg DPO, GRPO, thinking), minor updates to the MARL and MBRL chapters and various sections (eg offline RL, DPG, etc). Enjoy!
arxiv.org/abs/2412.05265
Kyunghyun Cho (@kchonyc)

it's been more than a decade since KD was proposed, and i've been using it all along .. but why does it work? too many speculations but no simple explanation. Sungmin Cha and i decided to see if we can come up with the simplest working description of KD in this work.

we ended
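For context, here is the classic (Hinton-style) KD objective the question is about, written out as a generic sketch rather than this paper's formulation: the student matches the teacher's temperature-softened distribution, mixed with the usual hard-label loss.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 2.0, alpha: float = 0.5):
    """Generic soft-target distillation plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale so gradients don't shrink with temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```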
Sagnik Mukherjee (@saagnikkk)

🚨 Paper Alert: “RL Finetunes Small Subnetworks in Large Language Models”

From DeepSeek V3 Base to DeepSeek R1 Zero, a whopping 86% of parameters were NOT updated during RL training 😮😮
And this isn’t a one-off. The pattern holds across RL algorithms and models.
🧵A Deep Dive
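A quick, hedged way to check this kind of number on any pair of checkpoints you have locally (file names are placeholders, not the DeepSeek releases): compare parameters elementwise and count the fraction left exactly unchanged.

```python
import torch

def fraction_unchanged(base_state: dict, tuned_state: dict) -> float:
    """Fraction of parameter entries identical between two checkpoints."""
    unchanged, total = 0, 0
    for name, base_param in base_state.items():
        tuned_param = tuned_state[name]
        unchanged += (base_param == tuned_param).sum().item()
        total += base_param.numel()
    return unchanged / total

# Usage sketch (placeholder file names):
# base = torch.load("base_model.pt"); tuned = torch.load("rl_finetuned_model.pt")
# print(fraction_unchanged(base, tuned))
```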
Aran Komatsuzaki (@arankomatsuzaki)

From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

Presents an autoregressive U-Net that processes raw bytes and learns hierarchical token representations

Matches strong BPE baselines, with deeper hierarchies demonstrating promising scaling trends
Anshul Kundaje (anshulkundaje@bluesky) (@anshulkundaje)

Ok a few quick things. Most CS students who get into elite PhD programs, especially in AI, have usually already published multiple first-author papers in "top" conferences. 1/

Unsloth AI (@unslothai)

We made a Guide on mastering LoRA Hyperparameters, so you can learn to fine-tune LLMs correctly!

Learn to:
• Train smarter models with fewer hallucinations
• Choose optimal: learning rates, epochs, LoRA rank, alpha
• Avoid overfitting &amp; underfitting

🔗docs.unsloth.ai/get-started/fi…
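The guide covers these knobs in detail; as a plain-PyTorch reminder of what rank and alpha actually control (a generic LoRA layer, not Unsloth's implementation), the adapter adds a low-rank update scaled by alpha / r on top of a frozen pretrained weight.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA adapter around a frozen linear layer (illustrative only)."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32, dropout: float = 0.05):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad = False        # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scaling = alpha / r                      # larger alpha = stronger adapter update
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Frozen path plus low-rank trainable update.
        return self.base(x) + (self.dropout(x) @ self.A.T @ self.B.T) * self.scaling
```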
Daniel Khashabi 🕊️ (@danielkhashabi)

What’s really going on inside LLMs when they handle non-English queries?

<a href="/BafnaNiyati/">Niyati Bafna</a>'s recent work introduces the **translation barrier hypothesis**, a framework for understanding multilingual model behavior.

This hypothesis says that: (1) Multilingual generation, internally,
Jean de Nyandwi (@jeande_d)

Reinforcement Learning of Large Language Models, Spring 2025 (UCLA)

Great set of new lectures on reinforcement learning of LLMs. Covers a wide range of topics related to RLxLLMs such as basics/foundations, test-time compute, RLHF, and RL with verifiable rewards (RLVR).
Femke Plantinga (@femke_plantinga)

Stop optimizing your retrieval. Fix your chunking first.

It's not your embedding model, prompt engineering, or vector database. It's your chunking strategy creating invisible walls between your users and the information they need.

𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 is the pre-processing step of
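To make the point concrete, here is a deliberately naive fixed-size chunker with overlap (an illustration of the failure mode, not a recommended strategy): any fact that straddles a chunk boundary ends up split across two embeddings, which is exactly the "invisible wall" described above.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Naive fixed-size chunking with character overlap."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

# A fact split across the boundary of two chunks may not be fully contained
# in either chunk's embedding, so the retriever can miss it no matter how
# good the embedding model or vector database is.
chunks = chunk_text("some long document text ... " * 50)
```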