JuYoung Suk (@scott_sjy)'s Twitter Profile
JuYoung Suk

@scott_sjy

MS student at KAIST AI

ID: 1636010791135744004

Joined: 15-03-2023 14:26:17

49 Tweets

332 Followers

1.1K Following

hyunji amy lee (@hyunji_amy_lee)

New preprint "Semiparametric Token-Sequence Co-Supervision"   

We introduce semiparametric token-sequence co-supervision, which trains an LM by simultaneously leveraging supervision from a parametric token embedding space and a nonparametric sequence embedding space.

arxiv.org/abs/2403.09024
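
As a rough illustration of the idea (not the paper's actual training code), one way to set up this kind of co-supervision in PyTorch is to add a contrastive loss over pooled sequence embeddings on top of the usual next-token cross-entropy. The function, tensor shapes, and the weighting `alpha` below are all assumptions made for the sketch.

```python
# Illustrative sketch only -- not the paper's implementation.
# (1) Parametric token supervision: standard next-token cross-entropy.
# (2) Nonparametric sequence supervision: InfoNCE over sequence embeddings.
import torch
import torch.nn.functional as F

def co_supervision_loss(token_logits, token_targets,
                        query_emb, positive_seq_emb, negative_seq_embs,
                        temperature=0.05, alpha=0.5):
    """token_logits: (B, T, V), token_targets: (B, T)
    query_emb: (B, D) pooled representation produced by the LM
    positive_seq_emb: (B, D) embedding of the gold evidence sequence
    negative_seq_embs: (B, K, D) embeddings of other sequences
    """
    # Token-level loss over the parametric vocabulary.
    ce = F.cross_entropy(token_logits.reshape(-1, token_logits.size(-1)),
                         token_targets.reshape(-1), ignore_index=-100)

    # Sequence-level loss over the nonparametric embedding space.
    q = F.normalize(query_emb, dim=-1)
    pos = F.normalize(positive_seq_emb, dim=-1)
    neg = F.normalize(negative_seq_embs, dim=-1)
    pos_sim = (q * pos).sum(-1, keepdim=True)            # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", q, neg)         # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=-1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    nce = F.cross_entropy(logits, labels)

    return alpha * ce + (1 - alpha) * nce
```

In this sketch the sequence-level side never materializes a vocabulary-sized softmax; the sequence embeddings themselves act as the supervision targets, which is what makes that half of the objective nonparametric.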
Hyeonbin Hwang (@ronalhwang)

🚨 New LLM Reasoning Paper 🚨

Q. How can LLMs self-improve their reasoning ability?

⇒ Introducing Self-Explore⛰️🧭, a training method specifically designed to help LLMs avoid reasoning pits by learning from their own outputs! [1/N]
Aran Komatsuzaki (@arankomatsuzaki)

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Presents a more powerful evaluator LM than its predecessor that closely mirrors human and GPT-4 judgements

repo: github.com/prometheus-eva…
abs: arxiv.org/abs/2405.01535
AK (@_akhaliq)

Prometheus 2

An Open Source Language Model Specialized in Evaluating Other Language Models

Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs. However, concerns including transparency, controllability, and affordability
elvis (@omarsar0)

An Open Source LM Specialized in Evaluating Other LMs

Open-source Prometheus 2 (7B & 8x7B), state-of-the-art open evaluator LLMs that closely mirror human and GPT-4 judgments. 

They support both direct assessments and pair-wise ranking formats grouped with user-defined
Seungone Kim @ NAACL2025 (@seungonekim)

#NLProc
Introducing 🔥Prometheus 2, an open-source LM specialized in evaluating other language models.

✅ Supports both direct assessment & pairwise ranking.
✅ Improved evaluation capabilities compared to its predecessor.
✅ Can assess based on user-defined evaluation criteria.
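
For readers who want to try it, here is a minimal, unofficial sketch of direct assessment with Hugging Face transformers. The model ID (prometheus-eval/prometheus-7b-v2.0) and the simplified prompt wording are assumptions for illustration; the repo linked above has the exact prompt template and score rubrics.

```python
# Unofficial sketch: direct assessment with Prometheus 2 via transformers.
# Model ID and prompt wording are assumptions; consult the repo for the
# canonical template and the 1-5 score-rubric format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prometheus-eval/prometheus-7b-v2.0"  # assumed HF repo name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = (
    "###Task Description: Evaluate the response below against the rubric "
    "and give a score from 1 to 5 with feedback.\n"
    "###Instruction: Explain why the sky is blue.\n"
    "###Response: The sky is blue because of Rayleigh scattering...\n"
    "###Rubric: Is the explanation scientifically accurate and clear?\n"
    "###Feedback:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Pairwise ranking follows the same pattern with two candidate responses in the prompt and a request to pick the better one; the repo's relative-grading template covers that case.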
Alvaro Bartolome (@alvarobartt)

🔥 Prometheus 2 was recently released by KAIST AI as an alternative that closely mirrors both human and GPT-4 evaluation and surpasses the former Prometheus!

And we at Argilla have already implemented it in `distilabel`, but first, let’s see what Prometheus 2 has to offer:
Seungone Kim @ NAACL2025 (@seungonekim)

🤔How can we systematically assess an LM's proficiency in a specific capability without using summary measures like helpfulness or simple proxy tasks like multiple-choice QA?

Introducing the ✨BiGGen Bench, a benchmark that directly evaluates nine core capabilities of LMs.
Hoyeon Chang (@hoyeon_chang)

🚨 New paper 🚨
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

I’m thrilled to announce the release of my new paper! 🎉

This research explores how LLMs acquire and retain factual knowledge during pretraining. Here are some key insights:
Doyoung Kim (@doyoungkim_ml)

🤔 Humans excel at generalizing planning to extrapolated data or rapidly adapting with limited training data. How is this possible for language models?
Introducing 🧠Cognitive Map for Language Models, a framework achieving Optimal Planning via Verbally Representing the World Model🌍
MiyoungKo (@miyoung_ko)

📢 Excited to share our latest paper on the reasoning capabilities of LLMs! Our research dives into how these models recall and utilize factual knowledge while solving complex questions. [🧵1 / 10]
arxiv.org/abs/2406.19502
Rohan Paul (@rohanpaul_ai)

Prometheus-2 is a great alternative to GPT-4 evaluation when doing fine-grained evaluation of an underlying LLM & a Reward model for Reinforcement Learning from Human Feedback (RLHF).✨

It's a state-of-the-art evaluator language model offering significant improvements over its
Seonghyeon Ye (@seonghyeonye)

🚀 First step to unlocking Generalist Robots!

Introducing 🤖LAPA🤖, a new SOTA open-sourced 7B VLA pretrained without using action labels.

💪SOTA VLA trained with Open X (outperforming OpenVLA on cross and multi embodiment)
😯LAPA enables learning from human videos, unlocking

JuYoung Suk (@scott_sjy)

MM-Eval is a new multilingual benchmark for LLM-as-Judge and reward models! We hope this work can contribute towards better multilingual alignment for LLMs :)

Nathan Lambert (@natolambert)

As more and more attention shifts back to on-policy RL for LLM post training, thx o1, (away from just using DPO-like methods for alignment) it's been clear we need a better reward model ecosystem. The good news is, we're starting to get a lot of evals. True progress only comes

Dongkeun Yoon (@dongkeun_yoon)

🙁 LLMs are overconfident even when they are dead wrong.

🧐 What about reasoning models? Can they actually tell us “My answer is only 60% likely to be correct”?

❗Our paper suggests that they can! Through extensive analysis, we investigate what enables this emergent ability.
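
A minimal sketch of what verbalized confidence looks like in practice (this is illustrative, not the paper's protocol): ask the model to append a self-reported probability, parse it, and then check whether those numbers are calibrated against actual accuracy. The prompt wording, helper names, and the `generate_fn` stand-in below are made up for the example; expected calibration error is a standard metric used here as the follow-up check.

```python
# Illustrative sketch: eliciting and checking verbalized confidence.
# `generate_fn` is a stand-in for any LLM call that maps prompt -> text.
import re

CONFIDENCE_SUFFIX = (
    "\nAfter your answer, state how likely it is to be correct on its own line "
    "as 'Confidence: NN%'."
)

def answer_with_confidence(question: str, generate_fn):
    """Returns (answer_text, confidence in [0, 1] or None if not stated)."""
    output = generate_fn(question + CONFIDENCE_SUFFIX)
    match = re.search(r"Confidence:\s*(\d{1,3})\s*%", output)
    confidence = min(int(match.group(1)), 100) / 100 if match else None
    answer = re.sub(r"Confidence:\s*\d{1,3}\s*%", "", output).strip()
    return answer, confidence

def expected_calibration_error(confidences, correctness, n_bins=10):
    """A model is well calibrated if answers given with ~60% confidence
    are actually correct ~60% of the time."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correctness):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, ok))
    total, ece = len(confidences), 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(ok for _, ok in b) / len(b)
            ece += (len(b) / total) * abs(avg_conf - acc)
    return ece
```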
Hoyeon Chang (@hoyeon_chang)

New preprint 📄 (with Jinho Park)

Can neural nets really reason compositionally, or just match patterns?  
We present the Coverage Principle: a data-centric framework that predicts when pattern-matching models will generalize (validated on Transformers). 🧵👇
Hyeonbin Hwang (@ronalhwang)

🚨 New Paper co-led with byeongguk jeon 🚨

Q. Can we adapt Language Models, trained to predict the next token, to reason at the sentence level?

I think LMs operating at a higher level of abstraction would be a promising path towards advancing their reasoning, and I am excited to share our
Zeyuan Allen-Zhu, Sc.D. (@zeyuanallenzhu)

Phase 1 of Physics of Language Models code release
✅our Part 3.1 + 4.1 = all you need to pretrain strong 8B base model in 42k GPU-hours
✅Canon layers = strong, scalable gains
✅Real open-source (data/train/weights)
✅Apache 2.0 license (commercial ok!)
🔗github.com/facebookresear…