Maor Ivgi (@maorivg) 's Twitter Profile
Maor Ivgi

@maorivg

NLP researcher / Ph.D. candidate at Tel Aviv University

ID: 1381580813767213057

Link: http://mivg.github.io · Joined: 12-04-2021 12:12:11

187 Tweets

524 Followers

180 Following

Akash Shetty (@akashlives) 's Twitter Profile Photo

🚀 Exciting news! Apple has released its own open-source LLM, DCLM-7B. Everything is open-source, including the model weights and datasets.

💡Why should you be excited?

1. The datasets and tools released as part of this research lay the groundwork for future advancements in
Ori Yoran (@oriyoran) 's Twitter Profile Photo

Can AI agents solve realistic, time-consuming web tasks such as “Which gyms near me have fitness classes on the weekend, before 7AM?” We introduce AssistantBench, a benchmark with 214 such tasks. Our new GPT-4-based agent gets just 25% accuracy! assistantbench.github.io

Achal Dave (@achalddave) 's Twitter Profile Photo

Excited to share our new-and-improved 1B models trained with DataComp-LM!

- 1.4B model trained on 4.3T tokens
- 5-shot MMLU 47.5 (base model) => 51.4 (w/ instruction tuning)
- Fully open models: public code, weights, dataset!
Maor Ivgi (@maorivg) 's Twitter Profile Photo

It’s incredible to witness the power of high-quality data. Today, we’re excited to share a fully open-source 1B model that outperforms many larger ones.

Achal Dave (@achalddave) 's Twitter Profile Photo

Training DataComp-LM models meant we needed fast training code: here's a quick summary of how we sped up training in OpenLM by 60%, reducing costs by ~40%!
Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo

DataComp-LM (DCLM) was presented today in the ICML FOMO workshop. DCLM is a data-centric benchmark for LLMs. It is also the state-of-the-art open-source LLM and the state-of-the-art open training dataset.

Probably the most important finding is that data curation algorithms that
Kevin Li (@kevinyli_) 's Twitter Profile Photo

Attention is all you need; at least the matrices are, if you want to distill Transformers into alternative architectures, like Mamba, with our new distillation method: MOHAWK!

We also release a fully subquadratic, performant 1.5B model distilled from Phi-1.5 with only 3B tokens!
Niklas Muennighoff (@muennighoff) 's Twitter Profile Photo

Releasing OLMoE - the first good Mixture-of-Experts LLM that's 100% open-source
- 1B active, 7B total params for 5T tokens
- Best small LLM & matches more costly ones like Gemma, Llama
- Open Model/Data/Code/Logs + lots of analysis & experiments

📜arxiv.org/abs/2409.02060
🧵1/9
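The "1B active, 7B total params" distinction above comes from Mixture-of-Experts routing: each token is sent to only a few experts, so only a fraction of the total parameters runs per token. A toy single-layer sketch (illustrative numbers only, not OLMoE's actual architecture):

```python
import numpy as np

# Toy top-k Mixture-of-Experts layer, illustrating "active vs. total" params.
d, n_experts, k = 4, 8, 2
rng = np.random.default_rng(0)
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # total expert params: 8 * d*d
router = rng.normal(size=(d, n_experts))

x = rng.normal(size=d)                 # one token's hidden state
logits = x @ router                    # router scores, one per expert
top = np.argsort(logits)[-k:]          # keep only the k highest-scoring experts
w = np.exp(logits[top])
w /= w.sum()                           # softmax over the selected experts
y = sum(wi * (x @ experts[i]) for wi, i in zip(w, top))
# Only k / n_experts of the expert params are "active" per token: 2/8 here.
```

With k=2 of 8 experts, the per-token compute scales with the active parameters, while capacity scales with the total, which is how a 7B-total model can run at roughly 1B-active cost.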
Ben Bogin (@ben_bogin) 's Twitter Profile Photo

📢 New Benchmark: SUPER for Setting UP and Executing tasks from Research repositories

Reproducibility is crucial in science. We introduce SUPER to evaluate LLMs' capabilities in autonomously running experiments from research repositories. ⬇️

arxiv.org/pdf/2409.07440
Theophile Gervet (@theo_gervet) 's Twitter Profile Photo

Excited to finally share what I have been working on at Mistral AI. Meet Pixtral 12B, our first-ever multimodal model:
- Drop-in replacement for Mistral Nemo 12B
- SOTA multimodal capabilities without compromising on SOTA text-only capabilities
- New 400M parameter vision encoder
Talor Abramovich (@abramovichtalor) 's Twitter Profile Photo

We're launching EnIGMA, our state-of-the-art AI agent for offensive cybersec! 
It uses tools like Ghidra & pwntools, can debug, connect to servers, and exploit vulnerabilities to solve CTF challenges.
Built with researchers from Princeton, NYU, and TAU.
enigma-agent.github.io
Ofir Press (@ofirpress) 's Twitter Profile Photo

We just gave SWE-agent offensive cybersecurity capabilities, leading to state-of-the-art results on two challenging benchmarks! Try it out, it's live now

Luca Soldaini ✈️ ICLR 25 (@soldni) 's Twitter Profile Photo

Olmo goes multimodal!

We are launching Molmo, an open family of multimodal models that rival the best closed VLMs out there 🤯

We spent the last 9 months meticulously curating PixMo, a dataset of (a) high-quality image-caption pairs and (b) multimodal instruction data.
Leshem Choshen C U @ ICLR 🤖🤗 (@lchoshen) 's Twitter Profile Photo

Scaling laws predict🦣large models by training🦟small ones, cool right?
Fortunately, they are not that complicated or costly
at least they don't have to be

We have collected 400+ models
fitted 1000+ scaling laws
and created 1 guide
for cheap & more reliable scaling law fitting:
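The core trick the tweet alludes to, predicting large-model loss from small runs, can be sketched as fitting a power law to small-model losses and extrapolating. A minimal sketch with synthetic data (the constants and the pure power-law form are illustrative assumptions; practical fits often add an irreducible-loss term, which is part of what such a guide addresses):

```python
import numpy as np

# Hypothetical final losses from small training runs: parameter count N, loss L.
N = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
L = 406.4 * N ** -0.34          # synthetic data following L(N) = a * N**(-b)

# Fit in log-log space: log L = log a - b * log N
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
a, b = np.exp(intercept), -slope

# Extrapolate to a 1B-parameter model
pred = a * 1e9 ** (-b)
```

On noiseless synthetic data the fit recovers the exponent exactly; with real (noisy, few-point) runs, the fitted exponent is sensitive to the choice of runs and functional form, which is why aggregating many fitted laws is useful.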
Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo

github.com/mlfoundations/…
I’m excited to introduce Evalchemy 🧪, a unified platform for evaluating LLMs. If you want to evaluate an LLM, you may want to run popular benchmarks on your model, like MTBench, WildBench, RepoBench, IFEval, AlpacaEval etc as well as standard pre-training
Mor Geva (@megamor2) 's Twitter Profile Photo

What's in an attention head? 🤯

We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨

A new preprint with Amit Elhelo 🧵 (1/10)
Or Patashnik (@opatashnik) 's Twitter Profile Photo

Ever wondered how a SINGLE token represents all subject regions in personalization? Many methods use this token in cross-attention, meaning all semantic parts share the same single attention value. We present Nested Attention, a mechanism that generates localized attention values
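The problem the tweet describes, every image region mixing in the same value vector for the subject token, is visible in plain single-head cross-attention. A minimal numpy sketch (illustrative only, not the Nested Attention method itself):

```python
import numpy as np

# Single-head cross-attention: 16 image-patch queries attend over 3 text-side
# tokens, one of which (index 2) is the learned subject token.
d = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(16, d))    # image-patch queries
K = rng.normal(size=(3, d))     # keys: 2 text tokens + 1 subject token
V = rng.normal(size=(3, d))     # values; V[2] is the subject token's single value

scores = Q @ K.T / np.sqrt(d)
attn = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # row-wise softmax
out = attn @ V
# Every output row mixes in the SAME vector V[2], scaled only by a scalar
# attention weight -- all semantic parts of the subject share one value.
```

Because each query can only rescale V[2], not reshape it, per-region detail must come from elsewhere, which motivates generating localized attention values instead.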
Amanda Bertsch (@abertsch72) 's Twitter Profile Photo

Super honored to win the Language Modeling SAC award! I'll be presenting this work Wednesday in the 2pm poster session in Hall 3-- would love to chat with folks there or at the rest of the conference about long context data, ICL, inference time methods, New Mexican food, etc :)