BigCode (@bigcodeproject) Twitter Tweets • TwiCopy

ServiceNow Research

2 years ago

It’s been a year since the release of BigCode’s 💫 StarCoder models and paper: May the source be with you! Join us as we celebrate the anniversary, and share what you’ve done using #StarCoder. Read how StarCoder has helped ServiceNow developers: servicenow.com/blogs/2024/big…

Likes: 23 · Replies: 0 · Reposts: 7

Philipp Schmid

@_philschmid

2 years ago

Self-Instruct for CodeLLMs! 👀 BigCode released a new StarCoder2-Instruct, the first entirely self-aligned code LLM trained with a transparent and permissive pipeline. 🧑🏻‍💻 It used itself to generate thousands of instruction-response pairs, which were then used to

Likes: 160 · Replies: 2 · Reposts: 34

Daniel van Strien

@vanstriendaniel

a year ago

Check out this collection I made to show what you can create using this pipeline. It focuses on a sentence transformer model for detecting coding prompt similarities in a BigCode dataset. huggingface.co/collections/da…

Likes: 8 · Replies: 1 · Reposts: 1

Ksenia Se

@kseniase_

a year ago

Foundation models are central to AI's impact on the economy and society, making transparency crucial for accountability and understanding. The Foundation Model Transparency Index v1.1 (FMTI) analyzes transparency of 14 leading foundation model developers with 100 indicators. 🧵

Likes: 25 · Replies: 3 · Reposts: 9

Imperial Open Access

@oaimperial

a year ago

Going "Beyond #OpenResearch" in session 3, Jennifer Ding (The Alan Turing Institute) discusses how we can scrutinise the reliability of data-based applications like #models & #AI and the key role of public involvement. Who has a stake in the technology and at which points can they be involved?

Likes: 6 · Replies: 0 · Reposts: 1

Terry Yue Zhuo

@terryyuezhuo

a year ago

In the past few months, we’ve seen SOTA LLMs saturating basic coding benchmarks with short and simplified coding tasks. It's time to enter the next stage of coding challenge under comprehensive and realistic scenarios! -- Here comes BigCodeBench, benchmarking LLMs on solving

Likes: 124 · Replies: 1 · Reposts: 38

Philipp Schmid

@_philschmid

a year ago

It is time to deprecate HumanEval! 🧑🏻‍💻 BigCode just released BigCodeBench, a new benchmark to evaluate LLMs on challenging and complex coding tasks focused on realistic, function-level tasks that require the use of diverse libraries and complex reasoning! 👀 🧩 Contains

Likes: 243 · Replies: 4 · Reposts: 53

Terry Yue Zhuo

@terryyuezhuo

a year ago

Ppl are curious about the performance of DeepSeek-Coder-V2-Lite on BigCodeBench. We've added its results, along with a few other models, to the leaderboard! huggingface.co/spaces/bigcode… DeepSeek-Coder-V2-Lite-Instruct is a beast indeed, similar to Magicoder-S-DS-6.7B, but with only

Likes: 25 · Replies: 0 · Reposts: 8

Rajiv Shah

@rajistics

a year ago

BigCodeBench dataset🌸 Use it as inspiration when building your Generative AI evaluations. BigCodeBench h/t: BigCode Terry Yue Zhuo @ SF 🏖️ Leandro von Werra Clémentine Fourrier 🍊 Hugging Face (to name just a few of the people involved)

Likes: 12 · Replies: 0 · Reposts: 3

BigCode

@bigcodeproject

a year ago

Releasing BigCodeBench-Hard: a subset of more challenging and user-facing tasks. BigCodeBench-Hard provides more accurate model performance evaluations and we also investigate some recent model updates. Read more: huggingface.co/blog/terryyz/b… Leaderboard: huggingface.co/spaces/bigcode…

Likes: 99 · Replies: 0 · Reposts: 23

Terry Yue Zhuo

@terryyuezhuo

a year ago

Today, we are happy to announce the beta mode of real-time Code Execution for BigCodeBench BigCode, which has been integrated into our Hugging Face leaderboard. We understand that setting up a dependency-based execution environment can be cumbersome, even with the

Likes: 50 · Replies: 1 · Reposts: 14

Arjun Guha

@arjunguha

a year ago

This work will appear at OOPSLA 2024. New since last year: the StarCoder2 LLM from BigCode uses MultiPL-T as part of its pretraining corpus.

Likes: 8 · Replies: 0 · Reposts: 1

Qian Liu

@sivil_taram

a year ago

By popular demand, I have released the StarCoder2 code documentation dataset, please check it out ⬇️ hf.co/datasets/Sivil…

Likes: 50 · Replies: 0 · Reposts: 11

Terry Yue Zhuo

@terryyuezhuo

a year ago

People may think BigCodeBench BigCode is nothing more than a straightforward coding benchmark, but it is not. BigCodeBench is a rigorous testbed for LLM agents using code to solve complex and practical challenges. Each task demands significant reasoning capabilities for

Likes: 41 · Replies: 5 · Reposts: 9

Josh

@joshpurtell

a year ago

Evaluating LM agents has come a long way since gpt-4 released in March of 2023. We now have SWE-Bench, (Visual) Web Arena, and other evaluations that tell us a lot about how the best models + architectures do on hard and important tasks. There's still lots to do, though 🧵

Likes: 43 · Replies: 2 · Reposts: 11

Terry Yue Zhuo

@terryyuezhuo

a year ago

BigCodeBench BigCode evaluation framework has been fully upgraded! Just pip install -U bigcodebench

With v0.2.0, it's now much easier to use compared to the previous v0.1.* versions. The new version adopts the Gradio Client API interface from Hugging Face Spaces by

Likes: 32 · Replies: 1 · Reposts: 6

Terry Yue Zhuo

@terryyuezhuo

10 months ago

Happy to release SWE Arena, your vibe coding platform! SWE Arena supports real-time code execution and rendering, covering various frontier LLMs & VLMs! We actually had this idea two years ago inside BigCode with Arjun Guha and Daniel Fried. However, there wasn't much tech

Likes: 45 · Replies: 3 · Reposts: 12