Minyang Tian (@minyangtian1) Twitter Tweets • TwiCopy

Minyang Tian

@minyangtian1

PhD candidate at UIUC, co-advised by @haopeng_nlp and Eliu Huerta @argonne and @UChicago

ID: 1813179338654949379

16-07-2024 11:50:26

19 Tweets

129 Followers

115 Following

Join us on August 14th at 3PM Eastern / 12PM Pacific to learn about the three new benchmarks we've recently released: SciCode, AssistantBench and CiteMe. We will also have some SWE-bench updates. The event will be on Zoom. lu.ma/4240w5us

28 likes · 2 replies · 11 reposts

Ofir Press

@ofirpress

a year ago

Announcing Ofir's Gelato Challenge: At NeurIPS 2024, I will buy gelato for the team that has the highest combined score on SWE-bench Lite, AssistantBench, CiteME, and SciCode. Final submission is by December 3, 2024.

49 likes · 4 replies · 8 reposts

Ofir Press

@ofirpress

10 months ago

SciCode is our new benchmark, with very tough programming challenges written by real scientists. scicode-bench.github.io for more details.

34 likes · 2 replies · 3 reposts

Akari Asai

@akariasai

8 months ago

1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚 UW NLP Ai2 With open models & 45M-paper datastores, it outperforms proprietary systems & match human experts. Try out our demo! We also introduce ꜱᴄʜᴏʟᴀʀQᴀʙᴇɴᴄʜ,

1.1K likes · 33 replies · 281 reposts

Minyang Tian

@minyangtian1

7 months ago

We're presenting SciCode tomorrow (Thu) at the 11 AM poster session, West Ballroom A-D #5204

6 likes · 0 replies · 0 reposts

Ofir Press

@ofirpress

7 months ago

Thanks everyone for coming to our poster yesterday! Lots of SWE-agent news coming soon. In 30 mins, with Minyang Tian et al we'll present SciCode, a super tough scientific coding benchmark that o1 gets 7% on. West Ballroom A-D #5204. Come through :)

46 likes · 1 reply · 3 reposts

Ofir Press

@ofirpress

6 months ago

SciCode is our super tough coding benchmark testing the abilities of LMs to program code based on research in physics/biology/material science/... o1 is the SoTA with 7%. To make it easier to use we're putting it into the Inspect AI format, as a few groups were asking for this.

50 likes · 4 replies · 9 reposts

Ofir Press

@ofirpress

6 months ago

Congrats to o3-mini on setting a new high score on SciCode!! R1 clocks in at an impressive 4.6%, matching Claude 3.5. SciCode is our super-tough programming benchmark written by PhDs in various scientific domains.

43 likes · 10 replies · 3 reposts

Ofir Press

@ofirpress

4 months ago

Proud to see companies starting to use our SciCode to eval LMs. SciCode has some questions taken from Nobel-winning research in physics so it's super exciting to get more people to work on improving these abilities. scicode-bench.github.io

27 likes · 2 replies · 2 reposts

Shivam Agarwal

@shivamag12

2 months ago

Can entropy minimization alone improve LLM performance? And how far can they go without any labeled data? This work answers both: yes, and surprisingly far 🐮 At inference EM can beat GPT4o Claude 3 opus & Gemini 1.5 pro on challenging scientific coding w/o any data/model update

408 likes · 12 replies · 64 reposts