Joshua Ong (@joshuaongg21) 's Twitter Profile
Joshua Ong

@joshuaongg21

ID: 1760327381540020224

21-02-2024 15:35:42

131 Tweets

20 Followers

123 Following

Joshua Ong (@joshuaongg21) 's Twitter Profile Photo

We introduce PiCSAR (Probabilistic Confidence Selection And Ranking)💡: A simple training-free method for scoring samples based on probabilistic confidence, selecting a reasoning chain with the highest confidence from multiple sampled responses.

✏️PiCSAR is generalisable across
Eleonora Giunchiglia (@e_giunchiglia) 's Twitter Profile Photo

🚀 Excited to see our work on PiCSAR out! Thrilled to have Joshua as a co-author — and even more thrilled that he’ll be joining my group this academic year. Big things ahead!

Yifu Qiu (@yifuqiu98) 's Twitter Profile Photo

Happy to share that our work is accepted by NeurIPS 2025 LAW workshop: Bridging Language, Agent, and World Models! ☀️ May see you in San Diego.

Siddarth Venkatraman (@siddarthv66) 's Twitter Profile Photo

NO verifiers. NO Tools.
Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling.

Presenting Recursive Self-Aggregation (RSA) — the strongest test-time scaling method I know of!
Then we use aggregation-aware RL to push further!! 📈📈
🧵below!
Alessio Devoto (@devoto_alessio) 's Twitter Profile Photo

Expected Attention compresses the KV Cache by estimating the attention score from future queries! Both during prefilling & decoding 🚀 All code released in our KVPress library & Leaderboard!
Zheng Zhao @NeurIPS2024 (@zhengzhao97) 's Twitter Profile Photo

Thrilled to share our latest research on verifying CoT reasonings, completed during my recent internship at FAIR AI at Meta. In this work, we introduce Circuit-based Reasoning Verification (CRV), a new white-box method to analyse and verify how LLMs reason, step-by-step.