Rishabh Maheshwary (@rmahesh__)'s Twitter Profile
Rishabh Maheshwary

@rmahesh__

Applied Scientist @ServiceNow | Prev - AI Resident @AIatMeta.

ID: 1659902586

Website: https://rishabhmaheshwary.github.io/ | Joined: 10-08-2013 11:46:24

10 Tweets

129 Followers

2.2K Following

Vikas Yadav (@vikas_nlp_ua):

Thrilled to share our work has been accepted at @EMNLP2024 (Findings) 🎉🔥
- Iterative Alignment of LLMs ✅
- Curriculum DPO training ✅
- Impressive gains across Vicuna bench, WizardLM, MT-bench, and UltraFeedback
Paper - arxiv.org/abs/2403.07230
(1/2)
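
For readers curious what "curriculum DPO" means in practice: the idea (per arxiv.org/abs/2403.07230) is to order preference pairs from easy to hard before running DPO updates. A minimal sketch below, assuming each pair carries per-response scores; the function names and the margin-based difficulty proxy are illustrative assumptions, not the authors' code.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Standard DPO objective on per-sequence log-probabilities:
    # -log sigmoid(beta * ((pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)))
    logits = beta * ((policy_chosen_logps - policy_rejected_logps)
                     - (ref_chosen_logps - ref_rejected_logps))
    return -F.logsigmoid(logits).mean()

def curriculum_order(pairs):
    # Order preference pairs easy-to-hard: a larger score margin between the
    # chosen and rejected response is treated as an easier comparison.
    # The paper's actual difficulty criterion may differ; this is an assumed proxy.
    return sorted(pairs,
                  key=lambda p: p["chosen_score"] - p["rejected_score"],
                  reverse=True)
```

Training would then iterate DPO updates over batches drawn in this order, optionally re-scoring and re-sorting between rounds to make the alignment iterative.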
Srishti Gureja (@srishti_gureja):

✨ New Evaluation Benchmark for Reward Models - We Go Multilingual! ✨

Introducing M-RewardBench: A massively multilingual RM evaluation benchmark covering 23 typologically different languages across 5 tasks.
Paper, code, dataset: m-rewardbench.github.io

Our contributions:
1/9
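
As background on how RewardBench-style benchmarks are scored: the standard metric is pairwise accuracy, i.e. how often the reward model ranks the chosen response above the rejected one. A minimal sketch, with assumed field names rather than the actual M-RewardBench schema:

```python
def pairwise_accuracy(examples, reward_fn):
    # examples: iterable of dicts with 'prompt', 'chosen', 'rejected' fields
    # (assumed schema for illustration).
    # reward_fn(prompt, response) -> float score from the reward model.
    correct, total = 0, 0
    for ex in examples:
        chosen_score = reward_fn(ex["prompt"], ex["chosen"])
        rejected_score = reward_fn(ex["prompt"], ex["rejected"])
        correct += chosen_score > rejected_score
        total += 1
    return correct / max(total, 1)
```

Computing this per language is what exposes the kind of cross-lingual gaps a benchmark like this is designed to surface.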
Marzieh Fadaee (@mziizm):

Evaluation drives progress ⛰️ We're excited to share our latest work! 🌍 We built a multilingual evaluation set to see how reward models really hold up across languages and ran extensive benchmarks on top LLMs.

Cohere Labs (@cohere_labs):

🌍 As multilingual language models grow in reach and impact, the need for robust evaluation datasets intensifies.

🚨 We present a multilingual reward benchmarking dataset, designed to rigorously evaluate models and reveal any blind spots in current multilingual model training.
Angelika Romanou (@agromanou):

🚀 Introducing INCLUDE 🌍: A multilingual LLM evaluation benchmark spanning 44 languages!
Contains *newly-collected* data, prioritizing *regional knowledge*.

Setting the stage for truly global AI evaluation.
Ready to see how your model measures up?
#AI #Multilingual #LLM #NLProc
Cohere Labs (@cohere_labs):

What would it take for AI evaluations to truly support our global experiences? 🌍

Our cross-institutional paper introduces INCLUDE, a multilingual LLM evaluation benchmark of local exams capturing in-language nuances & cultural context for truly localized AI evaluation.
Sara Hooker (@sarahookr):

🔥 INCLUDE is an ambitious and critical release. Very proud of this cross-institutional collaboration. It is the most extensive collection to date of in-language examinations from across the world. 🌎🌍🌏 Critical work to ensure AI progress is not overfitting to knowledge of US exam subjects.

Shivalika Singh (@singhshiviii):

Thrilled to see INCLUDE accepted as a Spotlight at ICLR 2025! 🎉 This was a massive open science effort! Amazing work led by Angelika Romanou, Negar Foroutan, and Anna ❤️ Was lovely collaborating with them as well as Harsha, Rishabh Maheshwary, and others from the Cohere For AI community! 🙌

Cohere Labs (@cohere_labs):

One standout project, "Evaluating Reward Models in Multilingual Settings", introduced a benchmark dataset for 23 languages, showing performance gaps between English and non-English languages and highlighting the impact of translation quality. 📜: arxiv.org/abs/2410.15522

Cohere Labs (@cohere_labs):

🚀 We are excited to introduce Kaleidoscope, the largest culturally-authentic exam benchmark.

📌 Most VLM benchmarks are English-centric or rely on translations, missing linguistic & cultural nuance. Kaleidoscope expands in-language multilingual 🌎 & multimodal 👀 VLM evaluation.
Srishti Gureja (@srishti_gureja):

Our paper M-RewardBench got accepted to ACL main: arxiv.org/abs/2410.15522
We construct a first-of-its-kind multilingual RM evaluation benchmark and use it to examine the performance of several reward models in non-English settings, along with other interesting insights.

Vikas Yadav (@vikas_nlp_ua):

🎉 Our work "Variable Layerwise Quantization: A Simple and Effective Approach to Quantize LLMs" is accepted at #ACLFindings2025
📎 arxiv.org/abs/2406.17415
- Keep key layers high-precision, push others lower → compact LLMs w/ ~no accuracy loss
- Simple LIM & ZD scores rank layers
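
The mechanism is simple enough to sketch: score each layer's importance, then keep the most important layers at higher precision and quantize the rest more aggressively. The sketch below uses a generic importance dict as a stand-in; the paper's actual LIM and ZD scoring and its precision choices (arxiv.org/abs/2406.17415) may differ.

```python
def assign_bitwidths(importance_by_layer, keep_high_frac=0.25,
                     high_bits=8, low_bits=4):
    # importance_by_layer: {layer_name: float}, higher = more important
    # (in the paper this ranking comes from LIM / ZD scores; here it is
    # an assumed input).
    # Keep the top fraction of layers at high precision, push the rest lower.
    ranked = sorted(importance_by_layer,
                    key=importance_by_layer.get, reverse=True)
    n_high = max(1, int(len(ranked) * keep_high_frac))
    return {name: (high_bits if i < n_high else low_bits)
            for i, name in enumerate(ranked)}
```

A quantizer would then consume this mapping, rounding each layer's weights to its assigned bit-width, which is how the compact-model-with-minimal-accuracy-loss trade-off in the tweet is realized.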