Hadi Khalaf (@hskhalaf) Twitter Tweets • TwiCopy

Hadi Khalaf

@hskhalaf

PhD student @ Harvard SEAS, thinking about alignment, information theory, and the like

ID: 1888695141298257921

Joined: 09-02-2025 21:03:47

12 Tweets

12 Followers

23 Following

Hadi Khalaf

@hskhalaf

9 months ago

I used to see llama as a base model in most experiments, now qwen has taken over. Diversity in base models in experiments is much much more valuable than any hyperparam tuning or extra runs!

Likes: 0 · Replies: 0 · Reposts: 0

Hadi Khalaf

@hskhalaf

9 months ago

Is there an LLM out there that asks follow-up questions? 😅 Would be my go-to if it exists

Likes: 0 · Replies: 0 · Reposts: 0

Hadi Khalaf

@hskhalaf

9 months ago

Is it still cool to do PPO/DPO or must I do reasoning?

Likes: 0 · Replies: 0 · Reposts: 0

On my reading list this week: "the first theoretical result on how to identify the ideal depth for safety alignment... indicating that broader ensembles can compensate for shallower alignments"!!!! arxiv.org/abs/2502.00669

Likes: 0 · Replies: 0 · Reposts: 0

Hadi Khalaf

@hskhalaf

8 months ago

Yes 👍🏼

Likes: 0 · Replies: 0 · Reposts: 0

Hadi Khalaf

@hskhalaf

7 months ago

Does anyone like arxiv html? I immediately switch to the pdf view

Likes: 1 · Replies: 0 · Reposts: 0

Hadi Khalaf

@hskhalaf

7 months ago

gemini is crazy at coding, insanely crazy good

Likes: 0 · Replies: 0 · Reposts: 0

Ai2

@allen_ai

7 months ago

Ever wonder how LLM developers choose their pretraining data? It’s not guesswork— all AI labs create small-scale models as experiments, but the models and their data are rarely shared. DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵

Likes: 659 · Replies: 11 · Reposts: 121

Sara Hooker

@sarahookr

6 months ago

It is critical for scientific integrity that we trust our measure of progress. The lmarena.ai has become the go-to evaluation for AI progress. Our release today demonstrates the difficulty in maintaining fair evaluations on lmarena.ai, despite best intentions.

Likes: 712 · Replies: 21 · Reposts: 132

Hadi Khalaf

@hskhalaf

4 months ago

Likes: 0 · Replies: 0 · Reposts: 0

Hadi Khalaf

@hskhalaf

4 months ago

@ whoever is on the google ai studio team: please fix the chat history never being saved! i cannot access most of my gemini conversations... and this has been an issue since january 🫤

Likes: 0 · Replies: 0 · Reposts: 0

Hadi Khalaf

@hskhalaf

3 months ago

I judge llms by how bayesian they are
#1 gemini 2.5 pro (channeling bayes himself)
#2 gpt 5
#3 o3
#4 gpt 4o
Please stop the bayesian propaganda

Likes: 2 · Replies: 0 · Reposts: 0