Maor Ivgi (@maorivg) 's Twitter Profile
Maor Ivgi

@maorivg

NLP researcher / Ph.D. candidate at Tel Aviv University

ID: 1381580813767213057

Link: http://mivg.github.io · Joined: 12-04-2021 12:12:11

187 Tweets

524 Followers

180 Following

Akash Shetty (@akashlives) 's Twitter Profile Photo

🚀 Exciting news! Apple has released its own open-source LLM, DCLM-7B. Everything is open-source, including the model weights and datasets.

💡Why should you be excited?

1. The datasets and tools released as part of this research lay the groundwork for future advancements in
Ori Yoran (@oriyoran) 's Twitter Profile Photo

Can AI agents solve realistic, time-consuming web tasks such as “Which gyms near me have fitness classes on the weekend, before 7AM?” We introduce AssistantBench, a benchmark with 214 such tasks. Our new GPT-4-based agent gets just 25% accuracy! assistantbench.github.io

Achal Dave (@achalddave) 's Twitter Profile Photo

Excited to share our new-and-improved 1B models trained with DataComp-LM!

- 1.4B model trained on 4.3T tokens
- 5-shot MMLU 47.5 (base model) => 51.4 (w/ instruction tuning)
- Fully open models: public code, weights, dataset!
Maor Ivgi (@maorivg) 's Twitter Profile Photo

It’s incredible to witness the power of high-quality data. Today, we’re excited to share a fully open-source 1B model that outperforms many larger ones.

Achal Dave (@achalddave) 's Twitter Profile Photo

Training DataComp-LM models meant we needed fast training code: here's a quick summary of how we sped up training in OpenLM by 60%, reducing costs by ~40%!
Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo

DataComp-LM (DCLM) was presented today in the ICML FOMO workshop. DCLM is a data-centric benchmark for LLMs. It is also the state-of-the-art open-source LLM and the state-of-the-art open training dataset.

Probably the most important finding is that data curation algorithms that
Kevin Li (@kevinyli_) 's Twitter Profile Photo

Attention is all you need; at least the matrices are, if you want to distill Transformers into alternative architectures, like Mamba, with our new distillation method: MOHAWK!

We also release a fully subquadratic, performant 1.5B model distilled from Phi-1.5 with only 3B tokens!
Niklas Muennighoff (@muennighoff) 's Twitter Profile Photo

Releasing OLMoE - the first good Mixture-of-Experts LLM that's 100% open-source
- 1B active, 7B total params for 5T tokens
- Best small LLM & matches more costly ones like Gemma, Llama
- Open Model/Data/Code/Logs + lots of analysis & experiments

📜arxiv.org/abs/2409.02060
🧵1/9
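The "1B active, 7B total params" distinction above comes from Mixture-of-Experts routing: each token is sent to only a few experts, so only a fraction of the total parameters runs per token. A toy single-layer sketch (illustrative numbers only, not OLMoE's actual architecture):

```python
import numpy as np

# Toy top-k Mixture-of-Experts layer, illustrating "active vs. total" params.
d, n_experts, k = 4, 8, 2
rng = np.random.default_rng(0)
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # total expert params: 8 * d*d
router = rng.normal(size=(d, n_experts))

x = rng.normal(size=d)                 # one token's hidden state
logits = x @ router                    # router scores, one per expert
top = np.argsort(logits)[-k:]          # keep only the k highest-scoring experts
w = np.exp(logits[top])
w /= w.sum()                           # softmax over the selected experts
y = sum(wi * (x @ experts[i]) for wi, i in zip(w, top))
# Only k / n_experts of the expert params are "active" per token: 2/8 here.
```

With k=2 of 8 experts, the per-token compute scales with the active parameters, while capacity scales with the total, which is how a 7B-total model can run at roughly 1B-active cost.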
Ben Bogin (@ben_bogin) 's Twitter Profile Photo

📢 New Benchmark: SUPER for Setting UP and Executing tasks from Research repositories

Reproducibility is crucial in science. We introduce SUPER to evaluate LLMs' capabilities in autonomously running experiments from research repositories. ⬇️

arxiv.org/pdf/2409.07440
Theophile Gervet (@theo_gervet) 's Twitter Profile Photo

Excited to finally share what I have been working on at Mistral AI. Meet Pixtral 12B, our first-ever multimodal model:
- Drop-in replacement for Mistral Nemo 12B
- SOTA multimodal capabilities without compromising on SOTA text-only capabilities
- New 400M parameter vision encoder
Talor Abramovich (@abramovichtalor) 's Twitter Profile Photo

We're launching EnIGMA, our state-of-the-art AI agent for offensive cybersec! 
It uses tools like Ghidra & pwntools, can debug, connect to servers, and exploit vulnerabilities to solve CTF challenges.
Built with researchers from Princeton, NYU, and TAU.
enigma-agent.github.io
Ofir Press (@ofirpress) 's Twitter Profile Photo

We just gave SWE-agent offensive cybersecurity capabilities, leading to state-of-the-art results on two challenging benchmarks! Try it out, it's live now

Luca Soldaini ✈️ ICLR 25 (@soldni) 's Twitter Profile Photo

Olmo goes multimodal!

We are launching Molmo, an open family of multimodal models that rival the best closed VLMs out there 🤯

We spent the last 9 months meticulously curating PixMo, a dataset of (a) high-quality image-caption pairs and (b) multimodal instruction data.
Leshem Choshen C U @ ICLR 🤖🤗 (@lchoshen) 's Twitter Profile Photo

Scaling laws predict🦣large models by training🦟small ones, cool right?
Fortunately, they are not that complicated or costly
at least they don't have to be

We have collected 400+ models
fitted 1000+ scaling laws
and created 1 guide
for cheap & more reliable scaling law fitting:
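The core trick the tweet alludes to, predicting large-model loss from small runs, can be sketched as fitting a power law to small-model losses and extrapolating. A minimal sketch with synthetic data (the constants and the pure power-law form are illustrative assumptions; practical fits often add an irreducible-loss term, which is part of what such a guide addresses):

```python
import numpy as np

# Hypothetical final losses from small training runs: parameter count N, loss L.
N = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
L = 406.4 * N ** -0.34          # synthetic data following L(N) = a * N**(-b)

# Fit in log-log space: log L = log a - b * log N
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
a, b = np.exp(intercept), -slope

# Extrapolate to a 1B-parameter model
pred = a * 1e9 ** (-b)
```

On noiseless synthetic data the fit recovers the exponent exactly; with real (noisy, few-point) runs, the fitted exponent is sensitive to the choice of runs and functional form, which is why aggregating many fitted laws is useful.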
Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo

github.com/mlfoundations/…
I’m excited to introduce Evalchemy 🧪, a unified platform for evaluating LLMs. If you want to evaluate an LLM, you may want to run popular benchmarks on your model, like MTBench, WildBench, RepoBench, IFEval, AlpacaEval etc as well as standard pre-training
Mor Geva (@megamor2) 's Twitter Profile Photo

What's in an attention head? 🤯

We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨

A new preprint with Amit Elhelo 🧵 (1/10)
Or Patashnik (@opatashnik) 's Twitter Profile Photo

Ever wondered how a SINGLE token represents all subject regions in personalization? Many methods use this token in cross-attention, meaning all semantic parts share the same single attention value. We present Nested Attention, a mechanism that generates localized attention values
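The problem the tweet describes, every image region mixing in the same value vector for the subject token, is visible in plain single-head cross-attention. A minimal numpy sketch (illustrative only, not the Nested Attention method itself):

```python
import numpy as np

# Single-head cross-attention: 16 image-patch queries attend over 3 text-side
# tokens, one of which (index 2) is the learned subject token.
d = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(16, d))    # image-patch queries
K = rng.normal(size=(3, d))     # keys: 2 text tokens + 1 subject token
V = rng.normal(size=(3, d))     # values; V[2] is the subject token's single value

scores = Q @ K.T / np.sqrt(d)
attn = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # row-wise softmax
out = attn @ V
# Every output row mixes in the SAME vector V[2], scaled only by a scalar
# attention weight -- all semantic parts of the subject share one value.
```

Because each query can only rescale V[2], not reshape it, per-region detail must come from elsewhere, which motivates generating localized attention values instead.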
Amanda Bertsch (@abertsch72) 's Twitter Profile Photo

Super honored to win the Language Modeling SAC award! I'll be presenting this work Wednesday in the 2pm poster session in Hall 3-- would love to chat with folks there or at the rest of the conference about long context data, ICL, inference time methods, New Mexican food, etc :)