Ting-En Lin (@tnlin_tw) 's Twitter Profile
Ting-En Lin

@tnlin_tw

Research scientist at Alibaba Tongyi Lab, focusing on self-evolving LLMs aimed at AGI. @Tsinghua_Uni alum.

ID: 1331994524

Link: https://tnlin.github.io/ · Joined: 06-04-2013 16:35:26

186 Tweets

216 Followers

450 Following

Yujia Qin@ICLR2025 (@tsingyoga) 's Twitter Profile Photo

How far are we from the ideal AutoGPT? [1/4]

AutoGPT [1] already has 163k stars, and its developers have spent more than a year polishing it, yet it is still stuck at the demo stage and can hardly be called a product (even one aimed at developers). That is a far cry from the trajectory of traditional open-source software, and the core reason is that an agent's ceiling is determined by its base model.
Shunyu Yao (@shunyuyao12) 's Twitter Profile Photo

Excited to share what I did at Sierra with Noah Shinn, Pedram, and Karthik Narasimhan! 𝜏-bench evaluates critical agent capabilities omitted by current benchmarks: robustness, complex rule following, and human interaction skills. Try it out!
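
As described here, τ-bench pairs an agent with a simulated user and a set of domain tools, and judges success by the final state and by rule compliance. Below is a minimal sketch of that kind of evaluation loop; the `agent_step`, `simulated_user`, `tools`, and `rules_satisfied` callables are hypothetical stand-ins, not the actual τ-bench API.

```python
# Hypothetical sketch of a tau-bench-style evaluation loop (not the real API).
from dataclasses import dataclass, field

@dataclass
class EpisodeState:
    db: dict                                  # mutable domain state (e.g. bookings)
    transcript: list = field(default_factory=list)

def run_episode(agent_step, simulated_user, tools, state, max_turns=30):
    """Alternate simulated-user and agent turns; the agent may call tools in between."""
    state.transcript.append(("user", simulated_user(state.transcript)))
    for _ in range(max_turns):
        action = agent_step(state.transcript)  # hypothetical dict: {"type": ..., ...}
        state.transcript.append(("agent", action))
        if action["type"] == "tool_call":      # agent invoked a domain tool
            result = tools[action["name"]](state.db, **action["args"])
            state.transcript.append(("tool", result))
        elif action["type"] == "message":      # agent spoke; the simulated user replies
            state.transcript.append(("user", simulated_user(state.transcript)))
        else:                                  # "end": agent closed the conversation
            break
    return state

def score(state, goal_db, rules_satisfied):
    """Success = final database matches the goal AND no domain rule was violated."""
    return state.db == goal_db and rules_satisfied(state.transcript, state.db)
```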

Keming (Luke) Lu @ ICLR2025 (@keminglu612) 's Twitter Profile Photo

How can we improve instruction-following abilities without manual efforts? 🤔️

We present Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models from kabi!

Paper: arxiv.org/pdf/2406.13542
More⬇️
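
The core idea described here (execution feedback) is that the model writes an executable verifier for each instruction, and only responses that pass the verifier are kept as training data. The sketch below shows that filtering step, assuming the verifier source has already been generated by the model; it paraphrases the idea rather than reproducing the paper's pipeline.

```python
# Minimal sketch of execution-feedback filtering (a paraphrase of the idea; the
# generate-verifier prompt and data layout are assumptions, not the paper's code).

def build_verifier(verifier_source: str):
    """Compile a model-written `def check(response) -> bool` verifier from source code."""
    namespace = {}
    exec(verifier_source, namespace)   # in practice, run untrusted code in a sandbox
    return namespace["check"]

def filter_by_execution(instruction, candidate_responses, verifier_source):
    """Keep responses the executable verifier accepts; rejected ones can serve as negatives."""
    check = build_verifier(verifier_source)
    accepted, rejected = [], []
    for resp in candidate_responses:
        try:
            (accepted if check(resp) else rejected).append(resp)
        except Exception:
            rejected.append(resp)       # a crashing verifier counts as a failure
    return accepted, rejected

# Example: instruction "answer in exactly three words"
verifier_src = "def check(response):\n    return len(response.split()) == 3"
kept, dropped = filter_by_execution("Answer in exactly three words.",
                                    ["I am fine", "I am doing fine"], verifier_src)
```
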
Amrith Setlur (@setlur_amrith) 's Twitter Profile Photo

🚨 Interested in synthetic data and LLM reasoning? Our new work studies scaling laws for synthetic data and RL for math reasoning.
TLDR: Step-level RL (per-step DPO in fig) on self-generated answers improves sample efficiency of synthetic data by 8x! arxiv.org/abs/2406.14532

1/🧵
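
One way to read "step-level RL on self-generated answers" is that preference pairs are built at the level of individual reasoning steps rather than whole solutions. The sketch below shows a simplified shared-prefix construction of such per-step DPO pairs; it is an illustration under that assumption, not the paper's exact recipe.

```python
# Hedged sketch: building step-level preference pairs from self-generated solutions.
# Simplified shared-prefix pairing for illustration only.

def step_preference_pairs(problem, correct_steps, incorrect_steps):
    """Align a correct and an incorrect solution to the same problem and emit a
    (prompt, chosen_step, rejected_step) pair at the first point they diverge."""
    shared = []
    for good, bad in zip(correct_steps, incorrect_steps):
        if good == bad:
            shared.append(good)            # both solutions still agree on this step
            continue
        prompt = problem + "\n" + "\n".join(shared)
        return [(prompt, good, bad)]       # first divergent step: a per-step DPO pair
    return []                              # no divergence within the shared length

pairs = step_preference_pairs(
    "Compute 3 * (2 + 5).",
    ["2 + 5 = 7", "3 * 7 = 21"],
    ["2 + 5 = 7", "3 * 7 = 24"],
)
```
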
theblackat102 (@zraytam) 's Twitter Profile Photo

A bit bored, so I decided to spend some time writing up this new paper my team recently published. If you are working on real-world deployment of LLM services, you should check it out: arxiv.org/abs/2406.08747

Yujia Qin@ICLR2025 (@tsingyoga) 's Twitter Profile Photo

What is the vision-language model (VLM) field currently working on? 🧐

VLM is a field that has been developing rapidly since late last year. There are still plenty of "gold mines" for researchers to uncover, current exploration is still very preliminary, and it is relatively easy for newcomers to large models to get started 🥰

Here are some recommended articles to help you quickly get up to speed on where the VLM field stands today 📰:

1.
Cheng Han Chiang (姜成翰) (@dcml0714) 's Twitter Profile Photo

❗ New Paper❗
📄 In '23, we proposed LLM-as-judge for NLP research
🤔 Any real-world applications?
💯 Now, we use an LLM as an automatic assignment evaluator in a course with 1000+ students at National Taiwan University, led by Hung-yi Lee (李宏毅) with me as a TA
🔗 arxiv.org/abs/2407.05216
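
A minimal sketch of what an LLM-as-judge assignment grader can look like: send a rubric plus the student answer to a chat model and parse a numeric score. The rubric, score scale, and model name below are illustrative assumptions, not the course's actual setup.

```python
# Hedged sketch of an LLM-as-judge assignment grader (illustrative rubric and model).
import re
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = """You are grading a student assignment.
Score it from 1 (poor) to 5 (excellent) for correctness and clarity.
Reply with 'Score: <number>' followed by one sentence of justification."""

def grade(assignment_text: str, student_answer: str, model: str = "gpt-4o-mini") -> int:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user",
             "content": f"Assignment:\n{assignment_text}\n\nAnswer:\n{student_answer}"},
        ],
        temperature=0,                          # deterministic grading
    )
    text = response.choices[0].message.content
    match = re.search(r"Score:\s*([1-5])", text)
    return int(match.group(1)) if match else 0  # 0 = unparseable, flag for human review
```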
Ting-En Lin (@tnlin_tw) 's Twitter Profile Photo

I’ll be at ACL 2024 in Thailand from August 11-16, presenting Tongyi CoAI at our booth. Come join us and let’s connect! 🎉🌟 #ACL2024 #NLProc @aclmeeting

Ting-En Lin (@tnlin_tw) 's Twitter Profile Photo

I will be sharing about "Tongyi CoAI: Your Personalized Conversational Agent for Complex Applications" at our ACL 2024 booth this afternoon (Aug 12, 15:30). Come join us and let’s connect! #ACL2024 #NLProc ACL 2025

I will be sharing about "Tongyi CoAI: Your Personalized Conversational Agent for Complex Applications" at our ACL 2024 booth this afternoon (Aug  12, 15:30). Come join us and let’s connect!  #ACL2024 #NLProc <a href="/aclmeeting/">ACL 2025</a>
Yingwei Ma (@yingweim98560) 's Twitter Profile Photo

🚀 The first open-source model to autonomously resolve over 30% of real GitHub issues (on SWE-bench Verified)

🌟 Announcing Lingma SWE-GPT: an open, development-process-centric LLM for automated software improvement!

📄 Paper: arxiv.org/abs/2411.00622
💻 Code: github.com/LingmaTongyi/L…
OpenAI (@openai) 's Twitter Profile Photo

GPT-4o got an update 🎉 The model’s creative writing ability has leveled up: more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded files, providing deeper insights & more thorough responses.

Jason Wei (@_jasonwei) 's Twitter Profile Photo

Nice paper from DeepMind takes a fresh angle on factuality: arxiv.org/abs/2501.03200 While most existing factuality datasets focus on public world knowledge, this paper evaluates whether responses are consistent with a provided document as context. This is an elegant and …

Haibin (@eric_haibin_lin) 's Twitter Profile Photo

Recent updates on the verl project (an RL library for LLMs):

Engine:
- Megatron Qwen & GRPO support, v0.11 upgrade
- vLLM v0.7 integration with v1 mode
- experimental SGLang integration

Algorithm & recipes:
- vision-language reasoning with Qwen2.5-VL
- PRIME, RLOO, ReMax, math-verify
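
Since GRPO support is one of the items above, here is a small standalone illustration of the group-relative advantage that GRPO is built around (this is not verl's code): sample a group of responses for one prompt, then standardize each response's reward against the group.

```python
# Illustration of GRPO's group-relative advantage; standalone sketch, not verl code.
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """For G responses sampled from the same prompt, the advantage of each response
    is its reward standardized against the group: A_i = (r_i - mean(r)) / (std(r) + eps)."""
    mean, std = group_rewards.mean(), group_rewards.std()
    return (group_rewards - mean) / (std + eps)

# Example: four sampled answers to one math prompt, rewarded 1 if correct else 0.
print(grpo_advantages(np.array([1.0, 0.0, 0.0, 1.0])))  # -> roughly [ 1. -1. -1.  1.]
```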