BigCode (@bigcodeproject) 's Twitter Profile
BigCode

@bigcodeproject

Open and responsible research and development of large language models for code. #BigCodeProject run by @huggingface + @ServiceNowRSRCH

ID: 1554445522664148993

Link: http://www.bigcode-project.org
Date: 02-08-2022 12:34:57

266 Tweets

9.9K Followers

3 Following

ServiceNow Research (@servicenowrsrch) 's Twitter Profile Photo

It’s been a year since the release of BigCode’s 💫 StarCoder models and paper: May the source be with you! Join us as we celebrate the anniversary, and share what you’ve done using #StarCoder. Read how StarCoder has helped ServiceNow developers: servicenow.com/blogs/2024/big…

Philipp Schmid (@_philschmid) 's Twitter Profile Photo

Self-Instruct for CodeLLMs! 👀 BigCode released a new StarCoder2-Instruct, the first entirely self-aligned code LLM trained with a transparent and permissive pipeline. 🧑🏻‍💻 It used itself to generate thousands of instruction-response pairs, which were then used to

Daniel van Strien (@vanstriendaniel) 's Twitter Profile Photo

Check out this collection I made to show what you can create using this pipeline. It focuses on a sentence transformer model for detecting coding prompt similarities in a BigCode dataset. huggingface.co/collections/da…

Ksenia Se (@kseniase_) 's Twitter Profile Photo

Foundation models are central to AI's impact on the economy and society, making transparency crucial for accountability and understanding. The Foundation Model Transparency Index v1.1 (FMTI) analyzes transparency of 14 leading foundation model developers with 100 indicators. 🧵

Imperial Open Access (@oaimperial) 's Twitter Profile Photo

Going "Beyond #OpenResearch" in session 3, Jennifer Ding (The Alan Turing Institute) discusses how we can scrutinise the reliability of data-based applications like #models & #AI and the key role of public involvement. Who has a stake in the technology and at which points can they be involved?

Terry Yue Zhuo (@terryyuezhuo) 's Twitter Profile Photo

In the past few months, we’ve seen SOTA LLMs saturating basic coding benchmarks with short and simplified coding tasks. It's time to enter the next stage of coding challenge under comprehensive and realistic scenarios! -- Here comes BigCodeBench, benchmarking LLMs on solving

Philipp Schmid (@_philschmid) 's Twitter Profile Photo

It is time to deprecate HumanEval! 🧑🏻‍💻 BigCode just released BigCodeBench, a new benchmark to evaluate LLMs on challenging and complex coding tasks focused on realistic, function-level tasks that require the use of diverse libraries and complex reasoning! 👀 🧩 Contains

Terry Yue Zhuo (@terryyuezhuo) 's Twitter Profile Photo

Ppl are curious about the performance of DeepSeek-Coder-V2-Lite on BigCodeBench. We've added its results, along with a few other models, to the leaderboard! huggingface.co/spaces/bigcode… DeepSeek-Coder-V2-Lite-Instruct is a beast indeed, similar to Magicoder-S-DS-6.7B, but with only

BigCode (@bigcodeproject) 's Twitter Profile Photo

Releasing BigCodeBench-Hard: a subset of more challenging and user-facing tasks. BigCodeBench-Hard provides more accurate model performance evaluations and we also investigate some recent model updates. Read more: huggingface.co/blog/terryyz/b… Leaderboard: huggingface.co/spaces/bigcode…

Terry Yue Zhuo (@terryyuezhuo) 's Twitter Profile Photo

Today, we are happy to announce the beta mode of real-time Code Execution for BigCodeBench (@BigCodeProject), which has been integrated into our Hugging Face leaderboard. We understand that setting up a dependency-based execution environment can be cumbersome, even with the

Arjun Guha (@arjunguha) 's Twitter Profile Photo

This work will appear at OOPSLA 2024. New since last year: the StarCoder2 LLM from BigCode uses MultiPL-T as part of its pretraining corpus.

Qian Liu (@sivil_taram) 's Twitter Profile Photo

By popular demand, I have released the StarCoder2 code documentation dataset, please check it out ⬇️ hf.co/datasets/Sivil…

Terry Yue Zhuo (@terryyuezhuo) 's Twitter Profile Photo

People may think BigCodeBench (@BigCodeProject) is nothing more than a straightforward coding benchmark, but it is not. BigCodeBench is a rigorous testbed for LLM agents using code to solve complex and practical challenges. Each task demands significant reasoning capabilities for

Josh (@joshpurtell) 's Twitter Profile Photo

Evaluating LM agents has come a long way since GPT-4 was released in March 2023. We now have SWE-Bench, (Visual) Web Arena, and other evaluations that tell us a lot about how the best models + architectures do on hard and important tasks. There's still lots to do, though 🧵

Terry Yue Zhuo (@terryyuezhuo) 's Twitter Profile Photo

The BigCodeBench (@BigCodeProject) evaluation framework has been fully upgraded! Just pip install -U bigcodebench

With v0.2.0, it's now much easier to use compared to the previous v0.1.* versions. The new version adopts the Gradio Client API interface from Hugging Face Spaces by

Terry Yue Zhuo (@terryyuezhuo) 's Twitter Profile Photo

Happy to release SWE Arena, your vibe coding platform! SWE Arena supports real-time code execution and rendering, covering various frontier LLMs & VLMs! We actually had this idea two years ago inside BigCode with Arjun Guha and Daniel Fried. However, there wasn't much tech