dilara (@dilarafsoylu) 's Twitter Profile
dilara

@dilarafsoylu

phd student @StanfordNLP

ID: 1485454823281463297

Joined: 24-01-2022 03:33:37

102 Tweets

348 Followers

1.1K Following

Csordás Róbert (@robert_csordas) 's Twitter Profile Photo

Your language model is wasting half of its layers to just refine probability distributions rather than doing interesting computations.

In our paper, we found that the second half of the layers of the Llama 3 models have minimal effect on future computations. 1/6
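
For readers who want to poke at this claim themselves, a logit-lens-style probe is a quick way to see how much each layer changes the next-token distribution. This is a standard interpretability trick, not necessarily the paper's method, and the sketch below uses GPT-2 rather than Llama 3 only because its weights are ungated; the prompt and printout are illustrative.

```python
# Hedged sketch: a "logit lens" probe. Decode each layer's residual stream
# through the final LayerNorm and unembedding, then measure how close that
# layer's implied next-token distribution already is to the model's final one.
# If late layers mostly refine probabilities, the KL should shrink steadily
# in the second half.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_hidden_states=True)
    final_logprobs = out.logits[0, -1].log_softmax(-1)
    for layer, h in enumerate(out.hidden_states):
        # Project the last position of this layer's hidden state to logits.
        # (The final entry is already post-LayerNorm, so its KL is only
        # approximately zero under this uniform treatment.)
        logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
        logprobs = logits.log_softmax(-1)
        kl = torch.sum(final_logprobs.exp() * (final_logprobs - logprobs))
        top = tok.decode(logits.argmax())
        print(f"layer {layer:2d}  KL to final = {kl.item():7.3f}  top token: {top!r}")
```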
Omar Khattab (@lateinteraction) 's Twitter Profile Photo

Calling learning natural-language rules “not real learning” is so backwards. Interacting with an environment to generate abstract hypotheses and turn them into actionable natural-language rules is as “learning” as the word’s natural connotations get. Though gradient-based

CLS (@chengleisi) 's Twitter Profile Photo

Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.
Siyan Sylvia Li 🦋 (@sylvia_sparkle) 's Twitter Profile Photo

🎉 Excited to announce that the 4th HCI+NLP workshop will be co-located with @EMNLP in Suzhou, China! 🌍📍 Join us to explore the intersection of human-computer interaction and NLP. 🧵 1/

Brando Miranda (@brandohablando) 's Twitter Profile Photo

🔄 We were nominated for Oral + top-1 in the MATH-AI workshop at #ICML!

🚨Why? ≈46% of GitHub commits are AI-generated, but can we verify that they are correct?
📢 VeriBench challenges agents: turn Python into Lean code!
🧵1/14
📃 Paper: openreview.net/forum?id=rWkGF…
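
To make the task concrete, here is a hedged illustration, invented for this summary and not drawn from VeriBench itself, of the kind of Python-to-Lean translation the benchmark asks agents to perform. The Lean rendering in the comments is a sketch and is not guaranteed to compile.

```python
# Given plain Python like this, an agent must emit semantically equivalent
# Lean code whose properties are checked by the Lean kernel rather than by
# tests.
def factorial(n: int) -> int:
    """Iterative factorial; correctness here rests on tests alone."""
    acc = 1
    for i in range(2, n + 1):
        acc *= i
    return acc

# A plausible Lean 4 target the agent might produce, with a proof obligation
# the kernel can verify (hypothetical sketch):
#
#   def factorial : Nat → Nat
#     | 0     => 1
#     | n + 1 => (n + 1) * factorial n
#
#   theorem factorial_pos (n : Nat) : 0 < factorial n := by
#     induction n with
#     | zero => simp [factorial]
#     | succ n ih => simp [factorial]; exact Nat.mul_pos (Nat.succ_pos n) ih
```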
Ahmed Ahmed (@ahmedsqrd) 's Twitter Profile Photo

Prompting Llama 3.1 70B with “Mr and Mrs. D” can seed the generation of a near-exact copy of the entire ~300-page book ‘Harry Potter & the Sorcerer’s Stone’ 🤯

We define a “near-copy” as text that is identical modulo minor spelling / punctuation variations. Below
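
A minimal sketch of one way to operationalize that "near-copy" definition, using normalization plus fuzzy string matching from the standard library; the matcher and the 0.95 threshold are my assumptions, not the authors'.

```python
# Hedged sketch: treat text as a near-copy if it matches the reference
# modulo punctuation, casing, and whitespace, up to a high similarity bar.
import re
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    # Strip punctuation, collapse whitespace, lowercase.
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text)).strip().lower()

def is_near_copy(generated: str, reference: str, threshold: float = 0.95) -> bool:
    a, b = normalize(generated), normalize(reference)
    return SequenceMatcher(None, a, b, autojunk=False).ratio() >= threshold

# Example: the two strings differ only in punctuation placement.
print(is_near_copy(
    "Mr and Mrs Dursley, of number four Privet Drive, were proud to say",
    "Mr. and Mrs. Dursley of number four, Privet Drive, were proud to say",
))  # True
```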
Harshit Joshi (@harshitj__) 's Twitter Profile Photo

flying to Vienna 🇦🇹 for ACL to present Genie Worksheets (Monday 11am)!

come and say hi if you want to talk about how to create controllable and reliable application layers on top of LLMs, knowledge discovery and curation, or just wanna hang
Omar Shaikh (@oshaikh13) 's Twitter Profile Photo

BREAKING NEWS! Most people aren’t prompting models with IMO problems :)

They’re prompting with tasks that need more context, like “plz make talk slides.”

In an ACL oral, I’ll cover challenges in human-LM grounding (in 60K+ real interactions) & introduce a benchmark: RIFTS.

🧵
Lakshya A Agrawal (@lakshyaaagrawal) 's Twitter Profile Photo

How does prompt optimization compare to RL algos like GRPO?

GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't.

Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
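
The reflective loop the tweet alludes to can be caricatured in a few lines. This is a hedged sketch of the general idea, not GEPA's actual algorithm: `llm`, `score`, and the greedy accept rule are placeholders assumed for illustration.

```python
# Hedged sketch of reflective prompt optimization: evaluate a prompt on a
# handful of tasks, show an LM the failures, ask it to rewrite the prompt,
# and keep the rewrite only if it scores better.
from typing import Callable, List, Tuple

def reflective_optimize(
    llm: Callable[[str], str],            # hypothetical completion function
    score: Callable[[str, str], float],   # (output, target) -> quality in [0, 1]
    tasks: List[Tuple[str, str]],         # (input, target) pairs
    prompt: str,
    rounds: int = 5,
) -> str:
    def evaluate(p: str):
        results = [(x, y, llm(f"{p}\n\nInput: {x}")) for x, y in tasks]
        avg = sum(score(out, y) for _, y, out in results) / len(results)
        return avg, results

    best_score, results = evaluate(prompt)
    for _ in range(rounds):
        # Reflect: show the LM a few failures and ask for a revised prompt.
        failures = [(x, y, out) for x, y, out in results if score(out, y) < 1.0]
        if not failures:
            break
        reflection = "\n".join(
            f"Input: {x}\nExpected: {y}\nGot: {out}" for x, y, out in failures[:3]
        )
        candidate = llm(
            "Here is an instruction prompt and some failure cases.\n"
            f"Prompt: {prompt}\n{reflection}\n"
            "Rewrite the prompt so these cases would succeed. Return only the prompt."
        )
        cand_score, cand_results = evaluate(candidate)
        if cand_score > best_score:  # greedy accept: few rollouts per round
            prompt, best_score, results = candidate, cand_score, cand_results
    return prompt
```

The contrast with GRPO in the tweet comes down to sample efficiency: each round here consumes only a handful of rollouts plus one reflection call, rather than thousands of gradient-bearing rollouts.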
Omar Khattab (@lateinteraction) 's Twitter Profile Photo

Replying to Lakshya A Agrawal: obligatory tagging of Andrej Karpathy’s take. Comparing prompt learning against GRPO, by Lakshya A Agrawal, Dilara Soylu, Noah Ziems, and team. See also earlier evidence of prompt optimization vs. offline RL in a narrower setting: dspy.BetterTogether (EMNLP’24)! x.com/karpathy/statu…