Lechen Zhang (@leczhang)'s Twitter Profile
Lechen Zhang

@leczhang

Incoming CS PhD @UofIllinois | MSc @UMich | BEng @SJTU1896.
Interested in #NLProc & #AI.

ID: 1566273820817768449

https://leczhang.com/ | Joined 04-09-2022 03:56:09

51 Tweets

97 Followers

303 Following

Hua Shen✨ (@huashen218) 's Twitter Profile Photo

🚀 Are you passionate about #Alignment Research? Exciting news! Join us at the ICLR 2025 Workshop on 👫<>🤖Bidirectional Human-AI Alignment (April 27 or 28, Singapore). We're inviting researchers in AI, HCI, NLP, Speech, Vision, Social Science, and beyond domains to submit their
Xin Eric Wang @ ICLR 2025 (@xwang_lk) 's Twitter Profile Photo

Happy New Year! On the last day of 2024, I want to take a moment to reflect on what’s ahead in 2025. I don’t want to talk about buzzwords like "agents", instead, I’d like to summarize my thoughts with three keywords: Interactivity, Efficiency, and Humans. - Interactivity: O1&3

Andrew Lee (@a_jy_l) 's Twitter Profile Photo

New paper 🥳🚨 Interested in inference-time scaling? In-context learning? Mech interp? LMs can solve novel in-context tasks, given sufficient examples (longer contexts). Why? Because they dynamically form *in-context representations*! 1/N

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 DeepSeek-R1 is here!

⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!

🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today!

🐋 1/n
Jungsoo Park (@jungsoo___park) 's Twitter Profile Photo

🚨 Just Out

Can LLMs extract experimental data about themselves from scientific literature to improve understanding of their behavior?

We propose a semi-automated approach for large-scale, continuously updatable meta-analysis to uncover intriguing behaviors in frontier LLMs. 🧵
Hong Chen (@_hong_chen) 's Twitter Profile Photo

How accurately do citations reflect the original research? Do authors truly engage with what they cite?

In a new study [arxiv.org/abs/2502.20581] with David Jurgens and Misha Teplitskiy, we analyze millions of citation sentence pairs to measure citation fidelity and reveal how
Yunxiang Zhang (@yunxiangzhang4) 's Twitter Profile Photo

🚨 New Benchmark Drop!
Can LLMs actually do ML research? Not toy problems, not Kaggle tweaks—but real, unsolved ML conference research competitions?
We built MLRC-BENCH to find out.
Paper: arxiv.org/abs/2504.09702
Leaderboard: huggingface.co/spaces/launch/…
Code: github.com/yunx-z/MLRC-Be…
Ayoung Lee (@o_cube01) 's Twitter Profile Photo

📢New benchmark out!

We introduce CLASH, a benchmark of 345💥high-stakes dilemmas and 3,795 perspectives to evaluate how well LLMs handle complex value reasoning.

GPT-4 and Claude? Not quite there.

📄 arxiv.org/pdf/2504.10823
🤗 huggingface.co/datasets/launc…
Hua Shen✨ (@huashen218) 's Twitter Profile Photo

✨Personal Milestone✨ Thrilled to share I’ll be a tenure-track Assistant Professor in Computer Science at NYU Shanghai, affiliated with NYU Tandon, starting Fall 2025! 😊🌏NYU Shanghai, NYU Tandon, New York University 🧠I’ll be recruiting students via NYU Courant CS & NYU Tandon CSE

Qwen (@alibaba_qwen) 's Twitter Profile Photo

Introducing Qwen3! 

We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general
Aparna Ananthasubramaniam (@aparnaananth729) 's Twitter Profile Photo

How did #IGotAThingFor become a thing? Louise Zhu, David Jurgens, Daniel Romero, and I explored the roles of networks and identity in the adoption of hashtags in our new The Web Conference paper (Poster 01, Thu 5pm)! dl.acm.org/doi/pdf/10.114… #www2025 #thewebconf2025 1/9

Muhammad Khalifa (@mkhalifaaaa) 's Twitter Profile Photo

🚨Announcing SCALR @ COLM 2025 — Call for Papers!🚨

The 1st Workshop on Test-Time Scaling and Reasoning Models (SCALR) is coming to the Conference on Language Modeling in Montreal this October!

This is the first workshop dedicated to this growing research area.

🌐 scalr-workshop.github.io
Jiaxin Pei (@jiaxin_pei) 's Twitter Profile Photo

AI Shopping/Sales Agents sound very cool! But what if both the buyer and seller use AI agents? Our recent study found that stronger agents can exploit weaker ones to get a better deal, and delegating negotiation to AI agents might lead to economic losses.
arxiv.org/abs/2506.00073
Omar Shaikh (@oshaikh13) 's Twitter Profile Photo

What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵

Tal August (@tal_august) 's Twitter Profile Photo

🚨Calling all writing tutors & instructors! Can writing tools give guidance, but not text suggestions? We built a prototype based on conversations with tutors, and would love your thoughts: 📝 Try it out tinyurl.com/writor-system 🧾 Take a short survey to enter a $20 raffle

Muhammad Khalifa (@mkhalifaaaa) 's Twitter Profile Photo

🚨 Deadline for SCALR 2025 Workshop: Test‑time Scaling & Reasoning Models at COLM '25 (Conference on Language Modeling) is approaching!🚨

scalr-workshop.github.io

🧩 Call for short papers (4 pages, non‑archival) now open on OpenReview! Submit by June 23, 2025; notifications out July 24.

Topics
Kai Zou (@zkjzou) 's Twitter Profile Photo

🔥 Excited to introduce ManyICLBench (ACL 2025) 🧐 Do many-shot ICL tasks evaluate LCLMs' ability to retrieve the most similar examples or learn from many examples? We carefully analyzed numerous tasks and categorized them. 📄 Paper: arxiv.org/abs/2411.07130 #ACL2025

Wei Hu (@weihu_) 's Twitter Profile Photo

What happens behind the "abrupt learning" curve in Transformer training? Our new work (led by Pulkit Gopalani) reveals universal characteristics of Transformers' early-phase training dynamics—uncovering the implicit biases and the degenerate state the model gets stuck in. ⬇️