Sida Wang (@sidawxyz)'s Twitter Profile
Sida Wang

@sidawxyz

ID: 20573627

Joined: 11-02-2009 05:43:03

25 Tweets

475 Followers

305 Following

Daniel Fried (@dan_fried)

I’m excited to release a paper (and model weights!) for InCoder: a generative code model that can infill as well as do left-to-right generation. Project page: sites.google.com/view/incoder-c… Demo: huggingface.co/spaces/faceboo… Paper: github.com/dpfried/incode… Thread (1/n):
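For readers who want to try it, here is a minimal sketch of InCoder-style infilling. The Hugging Face model id facebook/incoder-1B and the paper's <|mask:k|> sentinel convention are assumptions on my part; check the project page for the exact format.

```python
# Sketch only: assumes the HF model id "facebook/incoder-1B" and the
# paper's <|mask:k|> sentinel convention for marking the span to infill.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-1B")
model = AutoModelForCausalLM.from_pretrained("facebook/incoder-1B")

# Replace the missing span with a sentinel, then repeat the sentinel at
# the end to ask the model to generate the infill for that span.
prefix = "def count_words(path):\n    "
suffix = "\n    return counts"
prompt = prefix + "<|mask:0|>" + suffix + "<|mask:0|>"

inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(out[0]))
```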

Freda Shi (@fredahshi)

Late post but let’s do this! Happy to share our #EMNLP2022 work on translating natural language to executable code with execution-aware minimum Bayes risk decoding
📝Paper: arxiv.org/pdf/2204.11454…
📇Code: github.com/facebookresear…
📦Data (codex output): dl.fbaipublicfiles.com/mbr-exec/mbr-e…
(1/n)
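A hedged sketch of the core idea, execution-based MBR: sample many candidate programs, execute them on shared inputs, and keep the candidate whose execution results agree with the most other samples. sample_programs and run below are hypothetical helpers standing in for the model API and a sandboxed executor.

```python
from collections import Counter

def mbr_exec(instruction, test_inputs, sample_programs, run, k=25):
    """Pick the sampled program whose execution results agree with the
    most other samples (execution-result match as the MBR utility)."""
    candidates = sample_programs(instruction, n=k)
    # Execution signature: tuple of outputs on the shared inputs
    # (outputs must be hashable for this simple version).
    sigs = [tuple(run(prog, x) for x in test_inputs) for prog in candidates]
    counts = Counter(sigs)
    best = max(range(len(candidates)), key=lambda i: counts[sigs[i]])
    return candidates[best]
```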
Tianyi Zhang (@tianyi_zh)

🧑‍💻Code review is an important practice in software development. We take this idea into code generation and propose Coder-Reviewer reranking. Our method reranks via the product of a coder model p(code|instruction) and a reviewer model p(instruction|code). arxiv.org/abs/2211.16490

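The reranking rule itself fits in a few lines; here is a minimal sketch, where coder_logprob and reviewer_logprob are hypothetical wrappers returning sequence log-probabilities from the two scoring directions.

```python
def coder_reviewer_rerank(instruction, candidates, coder_logprob, reviewer_logprob):
    """Rerank by log p(code | instruction) + log p(instruction | code),
    i.e. the log of the coder-reviewer product."""
    def score(code):
        return coder_logprob(code, instruction) + reviewer_logprob(instruction, code)
    return max(candidates, key=score)
```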
Ansong Ni (@ansongni)

Execution results are strong indicators of program correctness. But how can we improve LLMs for code generation with execution?

In our new paper, we propose LEVER, a simple method that learns to verify and rerank LLM-generated programs with their execution results.  🧵👇 (1/n)
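A rough sketch of LEVER-style reranking, simplified (the paper also aggregates probability across programs with identical execution results); lm_prob, verifier_prob, and execute are hypothetical stand-ins for the generator, the learned verifier, and an executor.

```python
def lever_rerank(instruction, candidates, lm_prob, verifier_prob, execute):
    """Rerank programs by p_LM(program | instruction) weighted by a
    learned verifier that also sees the program's execution result."""
    def score(program):
        result = execute(program)  # an answer, a table, or an error message
        return lm_prob(program, instruction) * verifier_prob(instruction, program, result)
    return max(candidates, key=score)
```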
Alex Gu @ iclr (@minimario1729)

📢Introducing CRUXEval, a benchmark to measure Python code execution!

🏠Homepage: crux-eval.github.io
📜Paper: crux-eval.github.io/paper/cruxeval…
🏆Leaderboard: crux-eval.github.io/leaderboard.ht…
🔎Sample Explorer: crux-eval.github.io/demo.html
📊HF Dataset: huggingface.co/datasets/cruxe…
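To make the task concrete, a hedged illustration of what a CRUXEval-style item asks (the function below is invented, not drawn from the dataset): given a short Python function and an input, predict the output; the inverted variant asks for an input that produces a given output.

```python
# Invented example, not from the dataset.
def f(s):
    return s.replace("a", "b").split()

# Output prediction: what does f("banana split") return?
assert f("banana split") == ["bbnbnb", "split"]
# The inverted direction asks for an input x with f(x) equal to a given output.
```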
Naman Jain @ ICLR (@stringchaos)

📢📢Excited to introduce our new work LiveCodeBench!

📈 Live evaluations to ensure fairness and reliability
🔍 Holistic evaluations using 4 code-related scenarios
💡Insights from comparing 20+ code models

🚨🚨We use problem release dates to detect and prevent contamination
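A minimal sketch of the release-date contamination check described above; the field names are illustrative, not the benchmark's actual schema.

```python
from datetime import date

def uncontaminated(problems, model_cutoff):
    """Keep only problems released after the model's training cutoff."""
    return [p for p in problems if p["release_date"] > model_cutoff]

# e.g. a model trained through 2023-08-31 is scored only on newer problems
fresh = uncontaminated([{"id": 1, "release_date": date(2024, 1, 5)}],
                       model_cutoff=date(2023, 8, 31))
```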
Jonas Gehring (@jnsgehring)

LLMs for code should do much better if they can iterate on tests -- but they don't. Our new work (RLEF) addresses this: execution feedback at RL *training time* teaches the model to use execution feedback at *inference time*. arxiv.org/abs/2410.02089 is just out! 1/6

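A hedged sketch of the setup this implies: let the model iterate on public-test feedback at inference time, and reward final success on held-out tests at RL training time. model, problem, and run_tests are hypothetical stand-ins, not the paper's API.

```python
def rlef_episode(model, problem, run_tests, max_turns=3):
    """Let the model iterate on public-test feedback, then compute a
    sparse training-time reward from held-out tests."""
    feedback = ""
    code = ""
    for _ in range(max_turns):
        code = model.generate(problem.prompt + feedback)
        failures = run_tests(code, problem.public_tests)
        if not failures:
            break
        feedback = f"\n# Failing tests: {failures}\n"  # inference-time feedback
    # Reward 1.0 only if the final program passes the held-out tests.
    return 1.0 if not run_tests(code, problem.private_tests) else 0.0
```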
John Yang (@jyangballin)

We're launching SWE-bench Multimodal to eval agents' ability to solve visual GitHub issues.
- 617 *brand new* tasks from 17 JavaScript repos
- Each task has an image!

Existing agents struggle here! We present SWE-agent Multimodal to remedy some issues.
Led w/ carlos (@_carlosejimenez)
🧵
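If the benchmark is published on the Hugging Face Hub, loading it might look like the sketch below; the dataset id, split name, and field names are assumptions, not confirmed by the thread.

```python
from datasets import load_dataset

# Dataset id and field names are assumptions, not confirmed by the thread.
swebench_mm = load_dataset("princeton-nlp/SWE-bench_Multimodal", split="test")
for task in swebench_mm.select(range(3)):
    # Each task pairs a JavaScript repo issue with at least one image.
    print(task["instance_id"], task.get("image_assets"))
```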
Gabriel Synnaeve (@syhw)

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution (arxiv.org/abs/2502.18449), by Yuxiang Wei, Sida Wang, and the whole team! Get started with your favorite model here: github.com/facebookresear…
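The paper describes a rule-based reward derived from the similarity between the model's predicted patch and the ground-truth patch; a minimal sketch of that idea using Python's difflib (the penalty value and helper signature are illustrative, not the paper's exact recipe).

```python
import difflib

def swe_rl_reward(predicted_patch, oracle_patch, parseable=True):
    """Similarity between predicted and ground-truth patch; penalize
    output that cannot be parsed into a patch at all."""
    if not parseable:
        return -1.0  # penalty value is illustrative
    return difflib.SequenceMatcher(None, predicted_patch, oracle_patch).ratio()
```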