Babak Hodjat (@babakatwork) 's Twitter Profile
Babak Hodjat

@babakatwork

Working on Evolutionary AI

ID: 23257167

Link: http://www.cognizant.com · Joined: 08-03-2009 00:50:08

461 Tweets

647 Followers

72 Following

Cognizant News (@cognizantnews) 's Twitter Profile Photo

Babak Hodjat explains why #DeepSeek isn’t the Sputnik moment it’s hyped up to be, but instead an important accelerator for enterprise #AI adoption. Read the full article here on @Techzine ➡️ bit.ly/44Faih2

Environmental Data Science (@envdatascience) 's Twitter Profile Photo

New article! Discovering effective policies for land-use planning with neuroevolution 👉 bit.ly/3ZoUBHf By Daniel Young, Olivier Francon, Elliot Meyerson, Clemens Schwingshackl, Jacob Bieker, Hugo Cunha, Babak Hodjat & Risto Miikkulainen Open Climate Fix Cognizant

Yulu Gan (@yule_gan) 's Twitter Profile Photo

Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
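
For readers skimming, here is a minimal sketch of what "exploring directly in parameter space" means: perturb the weights themselves and update from reward alone, with no backpropagation. This is only an illustration of the generic ES recipe, not the framework proposed in the thread; the population size, noise scale, and toy reward are placeholders.

```python
# Minimal evolution-strategies step: perturb the parameters with Gaussian
# noise, score each perturbed copy with a black-box reward, and move the
# parameters toward the better-scoring perturbations. No backprop, no critic.
import numpy as np

rng = np.random.default_rng(0)

def es_step(theta, reward_fn, pop_size=32, sigma=0.02, lr=0.05):
    noise = rng.standard_normal((pop_size, theta.size))      # one perturbation per population member
    rewards = np.array([reward_fn(theta + sigma * eps) for eps in noise])
    centered = rewards - rewards.mean()                      # subtract the mean reward as a baseline
    grad_est = (centered[:, None] * noise).mean(axis=0) / sigma
    return theta + lr * grad_est

# Toy usage: a quadratic stand-in for "generate with the LLM, score the output".
theta = np.zeros(10)
reward = lambda p: -float(np.sum((p - 1.0) ** 2))
for _ in range(300):
    theta = es_step(theta, reward)
print(theta.round(2))  # converges to the optimum at all ones
```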

Yulu Gan (@yule_gan) 's Twitter Profile Photo

As noted in DeepSeek-R1 and other studies, RL fine-tuning has several limitations, including challenges with long-horizon and outcome-only rewards, low sample efficiency, high-variance credit assignment, instability, and reward hacking. ES sidesteps these issues: it perturbs

Yulu Gan (@yule_gan) 's Twitter Profile Photo

On the symbolic-reasoning Countdown task, ES beats PPO/GRPO across Qwen-2.5 (0.5B–7B) & Llama-3 (1B–8B) with huge gains. Moreover, as shown in TinyZero by Jiayi Pan and DeepSeek-R1, RL fails on small models like Qwen-0.5B — yet ES succeeds! 🚀
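
For context, Countdown gives the model a set of numbers and a target, and asks it to reach the target with basic arithmetic. The brute-force toy below only illustrates the task format (simplified to left-to-right evaluation using every number once); it is not the benchmark's code.

```python
# Toy illustration of the Countdown task format: combine the given numbers
# with +, -, *, / to hit the target. Simplified left-to-right search, not
# the actual benchmark implementation.
from itertools import permutations, product
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def solve_countdown(numbers, target):
    for perm in permutations(numbers):
        for ops in product(OPS, repeat=len(numbers) - 1):
            value, expr = perm[0], str(perm[0])
            try:
                for op, n in zip(ops, perm[1:]):
                    value = OPS[op](value, n)
                    expr = f"({expr} {op} {n})"
            except ZeroDivisionError:
                continue
            if abs(value - target) < 1e-9:
                return expr
    return None

print(solve_countdown([3, 7, 25], 100))  # prints "((7 - 3) * 25)"
```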

Yulu Gan (@yule_gan) 's Twitter Profile Photo

Another key advantage of ES fine-tuning is its reliability. It runs stably across seeds, barely depends on hyperparameters, and avoids reward hacking — all while skipping gradients and actor-critic setups. In the figure, you can see ES finds a much better reward–KL balance than
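
On what the "reward–KL balance" refers to: the KL axis is typically the divergence between the fine-tuned model's next-token distribution and the base model's, i.e. how far fine-tuning has drifted. The snippet below is only one illustrative way to compute that drift from logits; the shapes and random tensors are placeholders, not the paper's setup.

```python
# Illustrative drift measure: mean per-token KL(tuned || base) from raw logits.
import torch
import torch.nn.functional as F

def mean_token_kl(tuned_logits, base_logits):
    tuned_logp = F.log_softmax(tuned_logits, dim=-1)
    base_logp = F.log_softmax(base_logits, dim=-1)
    return (tuned_logp.exp() * (tuned_logp - base_logp)).sum(dim=-1).mean()

# Toy usage with random logits of shape (batch, sequence, vocab).
tuned = torch.randn(2, 6, 128)
base = torch.randn(2, 6, 128)
print(float(mean_token_kl(tuned, base)))  # larger = further drift from the base model
```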

Yulu Gan (@yule_gan) 's Twitter Profile Photo

To recap — ES can outperform RL for LLM fine-tuning. No gradients. No reward hacking. Just stability, efficiency, and scalability. ES shows low variance across seeds, minimal hyperparameter sensitivity, and strong reward–KL tradeoffs — all without actor-critic complexity.

Kenneth Stanley (@kenneth0stanley) 's Twitter Profile Photo

Nice to see an exploration of the potential for ES (evolution strategies) in LLM fine-tuning! Many potential advantages are discussed in this thread from Yulu Gan ✈️ NeurIPS'25.

hardmaru (@hardmaru) 's Twitter Profile Photo

Evolution Strategies can be applied at scale to fine-tune LLMs, and outperforms PPO and GRPO in many model settings! Fantastic paper “Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning” by Yulu Gan ✈️ NeurIPS'25, Risto Miikkulainen and team. arxiv.org/abs/2509.24372

Paul Jarratt (@jarrattp) 's Twitter Profile Photo

🧠 AGI: A Reality Check As AGI hype grows (and billionaires build bunkers), we need grounded voices. Cognizant’s Babak Hodjat reminds us: “LLMs don’t have meta-cognition… they don’t know what they know.” bbc.com/news/articles/…

Paul Jarratt (@jarrattp) 's Twitter Profile Photo

LLMs hit a ceiling on long, complex reasoning. Tiny errors compound fast. Cognizant AI Lab just showed a new path: MAKER, a multi-agent system that solved a 1,000,000-step reasoning task with zero errors. tinyurl.com/2v47u665 #multiagentsystems #LLMs
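
A back-of-the-envelope illustration of that compounding (my numbers, not the paper's): even a 99.9%-accurate step almost surely fails somewhere over a million steps, which is why per-step reliability has to be pushed extremely close to 1.

```python
# Success probability of an N-step chain is (per-step accuracy) ** N.
print(0.999 ** 1_000_000)     # ~0.0: a 0.1% per-step error rate guarantees failure
print(0.999999 ** 1_000_000)  # ~0.37: even one error per million steps fails often
```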

Roberto Dailey (@robertodailey1) 's Twitter Profile Photo

New work from Cognizant AI Lab: Solving a Million-Step LLM Task with Zero Errors. Existing LLMs struggle on long task horizons as persistent error rates compound, even when the LLMs know how to solve the task. Apple’s “Illusion of Thinking” demonstrated that state of the art

Roberto Dailey (@robertodailey1) 's Twitter Profile Photo

Our subtask breakdown was to provide an LLM agent with the current Towers of Hanoi state and the last move made. The agent would then use first-to-ahead-by-k voting along with abnormal response flagging to decide which move to make and provide the board state for the next
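
Read that way, the voting rule can be sketched roughly as below. This is my reading of the thread, not the MAKER code: sample_move stands in for one LLM call proposing the next move, and is_abnormal for the abnormal-response flagging.

```python
# Rough sketch of first-to-ahead-by-k voting: keep sampling moves until one
# candidate leads every other candidate by k votes; flagged responses get no vote.
import random
from collections import Counter

def first_to_ahead_by_k(sample_move, is_abnormal, k=3, max_samples=50):
    votes = Counter()
    for _ in range(max_samples):
        move = sample_move()
        if is_abnormal(move):
            continue                              # discard flagged responses
        votes[move] += 1
        ranked = votes.most_common(2)
        leader, lead_count = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if lead_count - runner_up >= k:           # stop once one move is k votes ahead
            return leader
    return votes.most_common(1)[0][0] if votes else None  # fall back to plurality

# Toy usage: a noisy sampler that proposes the right move most of the time.
print(first_to_ahead_by_k(lambda: random.choice(["A->C", "A->C", "A->C", "B->C"]),
                          lambda move: False))
```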

Roberto Dailey (@robertodailey1) 's Twitter Profile Photo

Second, as I mentioned earlier, right now this framework is limited to tasks where a decomposition is provided. We are preliminarily testing generalized methods that perform both the subtasks and the task decomposition, and we are seeing promising results on boosting arithmetic abilities
