McGill NLP (@mcgill_nlp)'s Twitter Profile
McGill NLP

@mcgill_nlp

mcgill-nlp.github.io/people/

ID: 1445445382423474180

Created: 05-10-2021 17:47:44

28 Tweets

98 Followers

28 Following

Amirhossein Kazemnejad (@a_kazemnejad)

VinePPO, a straightforward modification to PPO, unlocks RL’s true potential for LLM Reasoning. It beats RL-free methods (DPO and RestEM) and PPO, surpassing it in fewer steps (up to 9x), less time (up to 3x), and less KL with half the memory. Time to rethink RL post-training 🧵: [1/n]

Siva Reddy (@sivareddyg)

COLM 2025 will be in Montreal 🇨🇦! Looking forward to welcoming people working on all aspects of language models. See you in October 2025. Conference on Language Modeling (@COLM_conf)

David Ifeoluwa Adelani 🇳🇬 (@davlanade)

Join my lab! I’m currently recruiting new students (MSc & PhD) for admission in the fall of 2025 at Mila - Institut québécois d'IA mila.quebec/en/prospective… Are you interested in multilingual NLP? I would encourage you to apply. Deadline: December 1

Ian Porada (@ian_porada)

LLMs that "solve" challenge sets might still be relatively inaccurate at resolving diverse, attested instances of the same phenomenon. We show this in the case of Winograd schemas and other related pronominal ambiguities. In CoNLL 2024. #CoNLL2024 #EMNLP2024 #NLProc 1/

Siva Reddy (@sivareddyg)

I have multiple vacancies for PhD and Masters students at Mila - Institut québécois d'IA McGill NLP in NLP/ML focusing on representation learning, reasoning, multimodality and alignment. Deadline for applications is Dec 1st. More details: mila.quebec/en/prospective…

Siva Reddy (@sivareddyg)

I will be at #NeurIPS2024 Wed and Thu. Tomorrow at UBC for the Future of NLP event presenting "Learning to reason with Generative Models", covering post-training methods and inference time reasoning for LLMs and vision (diffusion) models. Happy to meet anyone interested!

Arkil Patel (@arkil_patel)

Presenting ✨ CHASE: Generating challenging synthetic data for evaluation ✨ Work w/ fantastic advisors 🇺🇦 Dzmitry Bahdanau and Siva Reddy. Thread 🧵:

Karolina Stanczak (@karstanczak)

📢 New Paper Alert! 🚀 Human alignment balances social expectations, economic incentives, and legal frameworks. What if LLM alignment worked the same way? 🤔 Our latest work explores how social, economic, and contractual alignment can address incomplete contracts in LLM alignment 🧵

Xing Han Lu (@xhluca)

Agents like OpenAI Operator can solve complex computer tasks, but what happens when users use them to cause harm, e.g. automate hate speech and spread misinformation? To find out, we introduce SafeArena (safearena.github.io), a benchmark to assess the capabilities of web

Parishad BehnamGhader (@parishadbehnam)

Instruction-following retrievers can efficiently and accurately search for harmful and sensitive information on the internet! 🌐💣 Retrievers need to be aligned too! 🚨🚨🚨 Work done with the wonderful Nicholas Meade and Siva Reddy 🔗 mcgill-nlp.github.io/malicious-ir/ Thread: 🧵👇

Nouha Dziri (@nouhadziri)

Clock is ticking ⏳⏳ submit your agent work to the first workshop for Agent Language Models #ACL2025NLP in Vienna 🎼🎶 We have an exciting lineup of speakers 🔥 🗓️ Deadline *March 31st* realm-workshop.github.io

VLMs4All - CVPR 2025 Workshop (@vlms4all)

📢 Excited to announce our upcoming workshop - Vision Language Models For All: Building Geo-Diverse and Culturally Aware Vision-Language Models (VLMs-4-All) #CVPR2025! 🌐 sites.google.com/view/vlms4all

Sara Vera Marjanović (@saraveramarjano)

Models like DeepSeek-R1 🐋 mark a fundamental shift in how LLMs approach complex problems. In our preprint on R1 Thoughtology, we study R1’s reasoning chains across a variety of tasks, investigating its capabilities, limitations, and behaviour. 🔗: mcgill-nlp.github.io/thoughtology/

Siva Reddy (@sivareddyg)

Introducing the DeepSeek-R1 Thoughtology -- the most comprehensive study of R1 reasoning chains/thoughts ✨. Probably everything you need to know about R1 thoughts. If we missed something, please let us know.

Amirhossein Kazemnejad (@a_kazemnejad)

Introducing nanoAhaMoment: Karpathy-style, single file RL for LLM library (<700 lines)

- super hackable
- no TRL / Verl, no abstraction 💆‍♂️
- Single GPU, full param tuning, 3B LLM
- Efficient (R1-zero countdown < 10h)

Comes with a from-scratch, fully spelled out YT video [1/n]

Xing Han Lu (@xhluca)

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories We are releasing the first benchmark to evaluate how well automatic evaluators, such as LLM judges, can evaluate web agent trajectories. We find that rule-based evals underreport success rates, and
