Ehsan Shareghi (@ehsanshareghi) 's Twitter Profile
Ehsan Shareghi

@ehsanshareghi

Assistant Prof @ Monash. Working on NLProc (mostly LLMs these days). Opinions are my own.

ID: 1365097706704703488

Website: https://eehsan.github.io/ · Joined: 26-02-2021 00:33:57

96 Tweets

235 Followers

162 Following

Ehsan Shareghi (@ehsanshareghi) 's Twitter Profile Photo

Designing language agents: (1) Why should you care about uncertainty? (2) Given a small set of data, is it better to fine-tune your agent, or to calibrate its uncertainty? (3) Could you just rely on LLM's verbal uncertainty? We answer these questions: uala-agent.github.io

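A minimal sketch of the uncertainty gating this thread is about, assuming the agent exposes per-token log-probabilities for its answer; this is illustrative only, not the UALA implementation, and the threshold would be calibrated on the small held-out set mentioned above.

```python
# Minimal sketch of uncertainty-gated acting; not the UALA implementation.
# `answer_logprobs` is a hypothetical stand-in for per-token log-probabilities
# returned by whatever LLM backs the agent.
import math

def sequence_confidence(answer_logprobs: list[float]) -> float:
    """Length-normalised probability of the generated answer."""
    return math.exp(sum(answer_logprobs) / max(len(answer_logprobs), 1))

def act_or_defer(answer: str, answer_logprobs: list[float], threshold: float = 0.75) -> str:
    """Return the answer when confidence clears a calibrated threshold;
    otherwise fall back (e.g. ask the user or call an external tool)."""
    conf = sequence_confidence(answer_logprobs)
    return answer if conf >= threshold else "[defer: low confidence, escalate to tool/human]"

# Example: a confident vs. an unconfident answer.
print(act_or_defer("Paris", [-0.02, -0.05]))      # high confidence -> acted
print(act_or_defer("Lyon", [-1.3, -2.1, -0.9]))   # low confidence -> deferred
```

Calibrating the threshold on a small labelled set is the alternative to fine-tuning (or to trusting verbal uncertainty) that the questions above contrast.
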
Ehsan Shareghi (@ehsanshareghi) 's Twitter Profile Photo

Simple working idea: Taking a mixture of training data, train a task router that will guide each input to the right mode of solving. A single LoRA (not an MoE) instruction-tuned to make both Task Routing and Task Solving decisions. More: raven-lm.github.io #EACL2024 #NLProc

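A hedged sketch of the single-adapter routing-then-solving pattern described above; the LoRA hyperparameters, task tags, and prompts are illustrative assumptions, not the RAVEN configuration.

```python
# One LoRA adapter over the base model handles routing *and* solving (no MoE).
# Hyperparameters, tags, and prompts below are hypothetical.
from peft import LoraConfig

lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")

ROUTING_PROMPT = "Classify the task for the input below as one of [qa, math, summarise].\nInput: {x}\nTask:"
SOLVING_PROMPT = "Solve the input as a {task} task.\nInput: {x}\nAnswer:"

def route_then_solve(generate, x: str) -> str:
    """`generate` is any text-in/text-out callable backed by the LoRA-tuned model.
    The same adapter first makes the routing decision, then solves the task."""
    task = generate(ROUTING_PROMPT.format(x=x)).strip()
    return generate(SOLVING_PROMPT.format(x=x, task=task))

# e.g. route_then_solve(lambda p: model.generate_text(p), "What is 17 * 24?")
```
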
Yinhong Liu (@yinhongliu2) 's Twitter Profile Photo

🔥New paper!📜 Struggle to align LLM evaluators with human judgements?🤔 Introducing PairS🌟: By exploiting transitivity, we push the potential of pairwise preference in efficient ranking evaluations that has better alignment!🧑‍⚖️ 📖arxiv.org/abs/2403.16950 💻github.com/cambridgeltl/p…

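To make the transitivity idea concrete, here is a minimal sketch (not the PairS code): treating the judge's pairwise preferences as a comparator, a standard sort needs only O(n log n) comparisons instead of all n(n-1)/2 pairs. `llm_prefers` is a toy stand-in for an actual LLM judgement call.

```python
# Transitivity-based pairwise ranking in the spirit of PairS (illustrative only).
from functools import cmp_to_key

def llm_prefers(a: str, b: str) -> bool:
    """Stand-in for an LLM pairwise judgement: 'is candidate a better than b?'"""
    return len(a) > len(b)  # toy heuristic so the sketch runs without an API

def rank(candidates: list[str]) -> list[str]:
    # Sorting with a pairwise comparator needs O(n log n) judge calls,
    # because transitivity lets us skip the remaining pairs.
    cmp = lambda a, b: -1 if llm_prefers(a, b) else 1
    return sorted(candidates, key=cmp_to_key(cmp))

print(rank(["ok", "a detailed, well-supported summary", "short answer"]))
```
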
Victor Prokhorov (@victor_p91) 's Twitter Profile Photo

Interested in representation learning and Conditional Neural Processes (CNPs)? Together with Siddharth N (ExLab) and Ivan Titov we propose Pixel Space Variational Autoencoder (PPS-VAE), an amortised variational framework that casts CNP context points as latent variables.

Ehsan Shareghi (@ehsanshareghi) 's Twitter Profile Photo

This is now accepted to #ACL2024. Just a scratch on the surface of integrating uncertainty in language agents, but an important step.

Ehsan Shareghi (@ehsanshareghi) 's Twitter Profile Photo

Should all major LLM builders brace for impact?🤔 Not about the quality of this LLM but the fact that NVIDIA - as a main GPU provider - has become a model producer too! Could create a very odd dynamic in the coming years.

Ehsan Shareghi (@ehsanshareghi) 's Twitter Profile Photo

IMHO, this is the frontier of LLM (or Language Agents) for reasoning: LLMs+ICL+Tools do well on standalone Math/Logic/Coding/etc problems. BUT for reasoning in the wild there are more serious and realistic challenges to face and the research space on this is still quite thin: (1)

Ehsan Shareghi (@ehsanshareghi) 's Twitter Profile Photo

This is still work in progress, but it presents a new view on how context should be compressed and retrieved. One of the challenging next steps will involve optimising for various granularities of the information captured. More to read: arxiv.org/pdf/2409.01495

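Purely as an illustration of the compress-then-retrieve pattern and the granularity trade-off mentioned above (this is not the method of arxiv.org/pdf/2409.01495), a toy sketch where `summarise` stands in for an LLM compressor:

```python
# Generic compress-then-retrieve sketch over multiple granularities (illustrative only).
def summarise(text: str, ratio: float) -> str:
    return text[: max(1, int(len(text) * ratio))]  # toy truncation stands in for an LLM summary

def build_index(chunks: list[str], ratios=(1.0, 0.5, 0.25)) -> dict[float, list[str]]:
    """Store each chunk compressed at several granularities."""
    return {r: [summarise(c, r) for c in chunks] for r in ratios}

def retrieve(index: dict[float, list[str]], budget_chars: int) -> list[str]:
    """Pick the finest granularity whose total size fits the context budget."""
    for r in sorted(index, reverse=True):    # least compressed (finest detail) first
        if sum(len(c) for c in index[r]) <= budget_chars:
            return index[r]
    return index[min(index)]                 # fall back to the coarsest level

index = build_index(["first document chunk ...", "second document chunk ..."])
print(retrieve(index, budget_chars=40))
```
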
Ehsan Shareghi (@ehsanshareghi) 's Twitter Profile Photo

Speech (or audio to be more specific) related safety is literally unexplored beyond content. If the focus is only placed on safeguarding "what" is being said but not "how" it is said or in "which context" it is said, then we are left with very weak safety measures for speech. A

Kyunghyun Cho (@kchonyc) 's Twitter Profile Photo

congratulations, Ian Goodfellow, for the test-of-time award at NeurIPS Conference! this award reminds me of how GAN started with this one email ian sent to the Mila - Institut québécois d'IA lab mailing list in May 2014. super insightful and amazing execution!

Yinhong Liu (@yinhongliu2) 's Twitter Profile Photo

🚨 New Paper Alert! 🚨 When using LLMs for judgements, ever wondered about the consistency of those judgments? 🤔 Check out our latest work, where we quantify, evaluate, and enhance the logical/preference consistency of LLMs. 📚 🔗 Read more: arxiv.org/abs/2410.02205

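One simple way to make "quantify consistency" concrete is to count transitivity violations in the judge's pairwise preferences; this is a minimal illustrative sketch, not the exact metrics of arxiv.org/abs/2410.02205.

```python
# Toy transitivity-consistency check for pairwise LLM judgements (illustrative only).
from itertools import permutations

def transitivity_violation_rate(prefers: dict[tuple[str, str], bool], items: list[str]) -> float:
    """Fraction of ordered triples (a, b, c) where a>b and b>c hold but a>c does not."""
    triples = list(permutations(items, 3))
    checked = sum(1 for a, b, c in triples if prefers[(a, b)] and prefers[(b, c)])
    violations = sum(1 for a, b, c in triples
                     if prefers[(a, b)] and prefers[(b, c)] and not prefers[(a, c)])
    return violations / checked if checked else 0.0

# Toy judgements: the judge says A>B and B>C, but also C>A (a preference cycle).
judgements = {("A", "B"): True, ("B", "A"): False,
              ("B", "C"): True, ("C", "B"): False,
              ("A", "C"): False, ("C", "A"): True}
print(transitivity_violation_rate(judgements, ["A", "B", "C"]))  # -> 1.0, fully cyclic
```
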
Ehsan Shareghi (@ehsanshareghi) 's Twitter Profile Photo

Hao's PhD research in audio-safety red teaming of LLMs has now extended into a new exciting direction in his latest #NAACL2025 paper. In his recent work "Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models" we ask the following questions: (1) Do text-only LLMs

Ehsan Shareghi (@ehsanshareghi) 's Twitter Profile Photo

This is not a popular opinion, but I somehow feel I learned so much and yet nothing from the deepseek-r1 paper. Except for the general outline of the RL-only recipe of the Zero model, the critical details about data are vaguely described and mostly undisclosed. Credit to them for

Ehsan Shareghi (@ehsanshareghi) 's Twitter Profile Photo

"Process verification" and "process-level annotation for verification" are both critical and yet never really addressed. This again comes up in the Deepseek-R1 paper as a challenge. IMHO, I don't think the inference time scaling debate will be meaningful without thinking about

Ehsan Shareghi (@ehsanshareghi) 's Twitter Profile Photo

Dear all, if you happen to know/have an Iranian colleague, friend, or peer, please write to them. You do not need to take a side - if you are not comfortable or do not know enough - or write a lengthy message. Just a few words of moral support could mean a lot. Thank you.