Lechen Zhang (@leczhang)'s Twitter Profile
Lechen Zhang

@leczhang

Incoming CS PhD @UofIllinois | MSc @UMich | BEng @SJTU1896.
Interested in #NLProc & #AI.

ID: 1566273820817768449

https://leczhang.com/ | Joined 04-09-2022 03:56:09

51 Tweets

97 Followers

303 Following

Hua Shen✨ (@huashen218) 's Twitter Profile Photo

🚀 Are you passionate about #Alignment Research? Exciting news! Join us at the ICLR 2025 Workshop on 👫<>🤖Bidirectional Human-AI Alignment (April 27 or 28, Singapore). We're inviting researchers in AI, HCI, NLP, Speech, Vision, Social Science, and beyond domains to submit their
Xin Eric Wang @ ICLR 2025 (@xwang_lk) 's Twitter Profile Photo

Happy New Year! On the last day of 2024, I want to take a moment to reflect on what’s ahead in 2025. I don’t want to talk about buzzwords like "agents", instead, I’d like to summarize my thoughts with three keywords: Interactivity, Efficiency, and Humans. - Interactivity: O1&3

Andrew Lee (@a_jy_l) 's Twitter Profile Photo

New paper 🥳🚨 Interested in inference-time scaling? In-context learning? Mech interp? LMs can solve novel in-context tasks, given sufficient examples (longer contexts). Why? Because they dynamically form *in-context representations*! 1/N

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 DeepSeek-R1 is here!

⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!

🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today!

🐋 1/n
Jungsoo Park (@jungsoo___park) 's Twitter Profile Photo

🚨 Just Out

Can LLMs extract experimental data about themselves from scientific literature to improve understanding of their behavior?

We propose a semi-automated approach for large-scale, continuously updatable meta-analysis to uncover intriguing behaviors in frontier LLMs. 🧵
Hong Chen (@_hong_chen) 's Twitter Profile Photo

How accurately do citations reflect the original research? Do authors truly engage with what they cite?

In a new study [arxiv.org/abs/2502.20581] with David Jurgens and Misha Teplitskiy, we analyze millions of citation sentence pairs to measure citation fidelity and reveal how
Yunxiang Zhang (@yunxiangzhang4) 's Twitter Profile Photo

🚨 New Benchmark Drop!
Can LLMs actually do ML research? Not toy problems, not Kaggle tweaks—but real, unsolved ML conference research competitions?
We built MLRC-BENCH to find out.
Paper: arxiv.org/abs/2504.09702
Leaderboard: huggingface.co/spaces/launch/…
Code: github.com/yunx-z/MLRC-Be…
Ayoung Lee (@o_cube01) 's Twitter Profile Photo

📢New benchmark out!

We introduce CLASH, a benchmark of 345💥high-stakes dilemmas and 3,795 perspectives to evaluate how well LLMs handle complex value reasoning.

GPT-4 and Claude? Not quite there.

📄 arxiv.org/pdf/2504.10823
🤗 huggingface.co/datasets/launc…
Hua Shen✨ (@huashen218) 's Twitter Profile Photo

✨Personal Milestone✨ Thrilled to share I’ll be a tenure-track Assistant Professor in Computer Science at NYU Shanghai, affiliated with NYU Tandon, starting Fall 2025! 😊🌏NYU Shanghai, NYU Tandon, New York University 🧠I’ll be recruiting students via NYU Courant CS & NYU Tandon CSE

Qwen (@alibaba_qwen) 's Twitter Profile Photo

Introducing Qwen3! 

We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general
Aparna Ananthasubramaniam (@aparnaananth729) 's Twitter Profile Photo

How did #IGotAThingFor become a thing? Louise Zhu, David Jurgens, Daniel Romero, and I explored the roles of networks and identity in the adoption of hashtags in our new The Web Conference paper (Poster 01, Thu 5pm)! dl.acm.org/doi/pdf/10.114… #www2025 #thewebconf2025 1/9

Muhammad Khalifa (@mkhalifaaaa) 's Twitter Profile Photo

🚨Announcing SCALR @ COLM 2025 — Call for Papers!🚨

The 1st Workshop on Test-Time Scaling and Reasoning Models (SCALR) is coming to the Conference on Language Modeling in Montreal this October!

This is the first workshop dedicated to this growing research area.

🌐 scalr-workshop.github.io
Jiaxin Pei (@jiaxin_pei) 's Twitter Profile Photo

AI Shopping/Sales Agents sound very cool! But what if both the buyer and seller use AI agents? Our recent study found that stronger agents can exploit weaker ones to get a better deal, and delegating negotiation to AI agents might lead to economic losses.
arxiv.org/abs/2506.00073
Omar Shaikh (@oshaikh13) 's Twitter Profile Photo

What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵

Tal August (@tal_august) 's Twitter Profile Photo

🚨Calling all writing tutors & instructors! Can writing tools give guidance, but not text suggestions? We built a prototype based on conversations with tutors, and would love your thoughts: 📝 Try it out tinyurl.com/writor-system 🧾 Take a short survey to enter a $20 raffle

Muhammad Khalifa (@mkhalifaaaa) 's Twitter Profile Photo

🚨 Deadline for SCALR 2025 Workshop: Test‑time Scaling & Reasoning Models at COLM '25 (Conference on Language Modeling) is approaching!🚨

scalr-workshop.github.io

🧩 Call for short papers (4 pages, non‑archival) now open on OpenReview! Submit by June 23, 2025; notifications out July 24.

Topics
Kai Zou (@zkjzou) 's Twitter Profile Photo

🔥 Excited to introduce ManyICLBench (ACL 2025) 🧐 Do many-shot ICL tasks evaluate LCLMs' ability to retrieve the most similar examples or learn from many examples? We carefully analyzed numerous tasks and categorized them. 📄 Paper: arxiv.org/abs/2411.07130 #ACL2025

Wei Hu (@weihu_) 's Twitter Profile Photo

What happens behind the "abrupt learning" curve in Transformer training? Our new work (led by Pulkit Gopalani) reveals universal characteristics of Transformers' early-phase training dynamics—uncovering the implicit biases and the degenerate state the model gets stuck in. ⬇️