Gullal S. Cheema (@gullal7)'s Twitter Profile
Gullal S. Cheema

@gullal7

Research Assistant @l3s_luh

Previously Marie Skłodowska-Curie ESR (PhD) @TIBHannover, Germany

MUWS Workshop: muws-workshop.github.io

Views are personal.

ID: 1042742865992937474

Joined: 20-09-2018 11:50:47

685 Tweets

74 Followers

170 Following

Alex Dimakis (@alexgdimakis)

"RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. In the "One Training example" paper the authors find one question and ask the model to solve it again and again. Every time, the model tries 8 times (the Group in GRPO), and

"RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. 

In the "One Training example" paper 
the authors find one question and ask the model to solve it again and again. Every time, the model tries 8 times (the Group in GRPO), and
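For readers unfamiliar with GRPO's "group" trick, here is a minimal sketch (not code from either paper; names and numbers are illustrative): sample a group of answers to the same question, score them, and use the group-normalized score as each answer's advantage.

```python
import torch

def grpo_group_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: shape (G,), one scalar reward per sampled completion for one prompt."""
    # Group-relative advantage: how much each try beats the group average,
    # normalized by the spread within the group.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# The model "tries 8 times" on the same question; reward = 1 if the answer is correct.
rewards = torch.tensor([0., 0., 1., 0., 1., 1., 0., 0.])
advantages = grpo_group_advantages(rewards)
# Correct tries get a positive advantage and are reinforced by the policy
# gradient step; incorrect tries get a negative one and are pushed down.
print(advantages)
```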
TuringPost (@theturingpost)

11 alignment and optimization algorithms for LLMs

▪️ PPO (Proximal Policy Optimization)
▪️ DPO (Direct Preference Optimization)
▪️ GRPO (Group Relative Policy Optimization)
▪️ DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization)
▪️ AMPO (Active Multi-Preference
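To make one entry on the list concrete, here is a rough sketch of the standard DPO loss (a generic formulation, not material from the thread): the policy is pushed to prefer the chosen response over the rejected one relative to a frozen reference model, with log-probabilities assumed to be summed over response tokens.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """Inputs: per-example sums of response-token log-probs, each of shape (batch,)."""
    # Implicit reward margins of the policy relative to the frozen reference model.
    chosen = policy_chosen_logp - ref_chosen_logp
    rejected = policy_rejected_logp - ref_rejected_logp
    # DPO objective: -log sigmoid(beta * (chosen margin - rejected margin)).
    return -F.logsigmoid(beta * (chosen - rejected)).mean()

# Toy usage with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-12.0]),
                torch.tensor([-11.0]), torch.tensor([-11.5]))
```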
eric zakariasson (@ericzakariasson)

we get a lot of questions about which model to use in Cursor, so we put together a guide on how you can think about selecting models based on what we've seen work well
Rohan Paul (@rohanpaul_ai)

Adapting pretrained LLMs for vision tasks often degrades their language abilities or requires full retraining.

X-Fusion introduces a dual-tower design.

It freezes LLM weights, adding a trainable vision tower.

This enables multimodal tasks while preserving original language
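A hedged PyTorch sketch of the dual-tower idea as the thread describes it (module names, sizes, and the fusion step are assumptions, not X-Fusion's actual architecture): the language tower is frozen and only the vision tower and its projection are trained.

```python
import torch
import torch.nn as nn

class DualTowerSketch(nn.Module):
    """Illustrative dual-tower wrapper: frozen language tower, trainable vision tower."""
    def __init__(self, llm: nn.Module, d_model: int = 768):
        super().__init__()
        self.llm = llm
        for p in self.llm.parameters():               # freeze every LLM weight
            p.requires_grad = False
        # Stand-in vision tower; a real system would use a vision transformer.
        self.vision_tower = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.vision_proj = nn.Linear(d_model, d_model)  # map image tokens into the LLM space

    def forward(self, text_embeds, image_patches):
        # Only the vision tower and projection receive gradients.
        vision_embeds = self.vision_proj(self.vision_tower(image_patches))
        # Prepend image tokens and run the frozen language model over the joint sequence.
        return self.llm(torch.cat([vision_embeds, text_embeds], dim=1))
```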
Yangyi Chen (on job market) (@yangyichen6666)

🐂🍺Introducing our recent preprint: Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training! 

We present PRIOR, a simple vision-language pre-training algorithm that addresses the challenge of irrelevant textual content in image-caption pairs. PRIOR enhances
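The thread is truncated before it explains the mechanism, so the following is only a loose illustration of "prioritizing image-related tokens": a per-token weighted captioning loss where tokens judged image-relevant get larger weight (the relevance weights are a hypothetical input, not PRIOR's actual scoring).

```python
import torch
import torch.nn.functional as F

def weighted_caption_loss(logits, targets, relevance_weights):
    """
    logits: (batch, seq, vocab); targets: (batch, seq) token ids;
    relevance_weights: (batch, seq), larger for tokens grounded in the image.
    """
    # Per-token cross-entropy over the caption.
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    # Upweight image-related tokens, downweight irrelevant caption text.
    return (relevance_weights * per_token).sum() / relevance_weights.sum()
```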
DailyPapers (@huggingpapers)

J1 just launched on Hugging Face

A Reinforcement Learning recipe for training Thinking-LLM-as-a-Judge models. It trains J1-Llama-8B and J1-Llama-70B, which outperform existing models.
Kenneth Stanley (@kenneth0stanley)

Could a major opportunity to improve representation in deep learning be hiding in plain sight? Check out our new position paper: Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis. The idea stems from a little-known
Kevin Patrick Murphy (@sirbayes)

I am pleased to announce a new version of my RL tutorial. Major update to the LLM chapter (eg DPO, GRPO, thinking), minor updates to the MARL and MBRL chapters and various sections (eg offline RL, DPG, etc). Enjoy!
arxiv.org/abs/2412.05265
Kyunghyun Cho (@kchonyc)

it's been more than a decade since KD was proposed, and i've been using it all along .. but why does it work? too many speculations but no simple explanation. Sungmin Cha and i decided to see if we can come up with the simplest working description of KD in this work.

we ended
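For context, here is the classic (Hinton-style) KD objective the question is about, written out as a generic sketch rather than this paper's formulation: the student matches the teacher's temperature-softened distribution, mixed with the usual hard-label loss.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 2.0, alpha: float = 0.5):
    """Generic soft-target distillation plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale so gradients don't shrink with temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```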
Sagnik Mukherjee (@saagnikkk)

🚨 Paper Alert: “RL Finetunes Small Subnetworks in Large Language Models”

From DeepSeek V3 Base to DeepSeek R1 Zero, a whopping 86% of parameters were NOT updated during RL training 😮😮
And this isn’t a one-off. The pattern holds across RL algorithms and models.
🧵A Deep Dive
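A quick, hedged way to check this kind of number on any pair of checkpoints you have locally (file names are placeholders, not the DeepSeek releases): compare parameters elementwise and count the fraction left exactly unchanged.

```python
import torch

def fraction_unchanged(base_state: dict, tuned_state: dict) -> float:
    """Fraction of parameter entries identical between two checkpoints."""
    unchanged, total = 0, 0
    for name, base_param in base_state.items():
        tuned_param = tuned_state[name]
        unchanged += (base_param == tuned_param).sum().item()
        total += base_param.numel()
    return unchanged / total

# Usage sketch (placeholder file names):
# base = torch.load("base_model.pt"); tuned = torch.load("rl_finetuned_model.pt")
# print(fraction_unchanged(base, tuned))
```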
Aran Komatsuzaki (@arankomatsuzaki)

From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

Presents an autoregressive U-Net that processes raw bytes and learns hierarchical token representations

Matches strong BPE baselines, with deeper hierarchies demonstrating promising scaling trends
Anshul Kundaje (anshulkundaje@bluesky) (@anshulkundaje)

Ok a few quick things. Most CS students who get into elite PhD programs, especially in AI, have usually already published multiple first-author papers in "top" conferences. 1/

Unsloth AI (@unslothai)

We made a Guide on mastering LoRA Hyperparameters, so you can learn to fine-tune LLMs correctly!

Learn to:
• Train smarter models with fewer hallucinations
• Choose optimal: learning rates, epochs, LoRA rank, alpha
• Avoid overfitting &amp; underfitting

🔗docs.unsloth.ai/get-started/fi…
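The guide covers these knobs in detail; as a plain-PyTorch reminder of what rank and alpha actually control (a generic LoRA layer, not Unsloth's implementation), the adapter adds a low-rank update scaled by alpha / r on top of a frozen pretrained weight.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA adapter around a frozen linear layer (illustrative only)."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32, dropout: float = 0.05):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad = False        # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scaling = alpha / r                      # larger alpha = stronger adapter update
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Frozen path plus low-rank trainable update.
        return self.base(x) + (self.dropout(x) @ self.A.T @ self.B.T) * self.scaling
```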
Daniel Khashabi 🕊️ (@danielkhashabi)

What’s really going on inside LLMs when they handle non-English queries?

<a href="/BafnaNiyati/">Niyati Bafna</a>'s recent work introduces the **translation barrier hypothesis**, a framework for understanding multilingual model behavior.

This hypothesis says that: (1) Multilingual generation, internally,
Jean de Nyandwi (@jeande_d)

Reinforcement Learning of Large Language Models, Spring 2025 (UCLA)

Great set of new lectures on reinforcement learning of LLMs. Covers a wide range of topics related to RLxLLMs such as basics/foundations, test-time compute, RLHF, and RL with verifiable rewards (RLVR).
Femke Plantinga (@femke_plantinga)

Stop optimizing your retrieval. Fix your chunking first.

It's not your embedding model, prompt engineering, or vector database. It's your chunking strategy creating invisible walls between your users and the information they need.

𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 is the pre-processing step of
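To make the point concrete, here is a deliberately naive fixed-size chunker with overlap (an illustration of the failure mode, not a recommended strategy): any fact that straddles a chunk boundary ends up split across two embeddings, which is exactly the "invisible wall" described above.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Naive fixed-size chunking with character overlap."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

# A fact split across the boundary of two chunks may not be fully contained
# in either chunk's embedding, so the retriever can miss it no matter how
# good the embedding model or vector database is.
chunks = chunk_text("some long document text ... " * 50)
```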