Ben Zhou
@benzhou96
Assistant Professor @SCAI_ASU Also known as Xuanyu Zhou
ID: 1033928440645382149
http://xuanyu.me 27-08-2018 04:05:24
27 Tweets
310 Followers
248 Following
During a long-chained reasoning process... Are LLMs following correct reasoning paths or relying on semantic shortcuts? What contributes to the failure of LLMs? Will prompting/RAG solve it? Check out our paper, which is accepted by #NAACL2024 at: vincentleebang.github.io/eureqa.github.…
Can Text-to-Image models understand common sense? Can they generate images that fit everyday common sense? tldr; NO, they are far less intelligent than us. Introducing Commonsense-T2I zeyofu.github.io/CommonsenseT2I/, a novel evaluation and benchmark designed to measure
LLMs can generate math proofs and solve competition-level math problems, but do they truly understand them? Introducing CounterMATH, a benchmark for assessing LLMs' mathematical reasoning via counterexample-based proofs. Paper link: arxiv.org/abs/2502.10454 Read on!
Excited to share that our new paper, "ThinkTuning", is now out! RL merely draws out behaviors already present in the base models. Sophisticated thinking behaviors like self-reflection, self-correction and other multi-step reasoning
Excited to share that two of my first-author papers were accepted to #EMNLP2025! 1) Code Execution as Grounded Supervision for LLM Reasoning (Main) 2) Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation (Findings) Huge thanks to my collaborators