David Cox (@neurobongo) 's Twitter Profile
David Cox

@neurobongo

VP, AI Models @IBMResearch; IBM Director, @MITIBMLab. Former prof and serial/parallel entrepreneur.

ID: 261515114

Link: https://www.linkedin.com/in/daviddanielcox/ · Joined: 06-03-2011 03:28:22

21.2K Tweets

12.1K Followers

1.1K Following

Ali (@alibrahimzada) 's Twitter Profile Photo

Interested in LLM-based Code Translation 🧐?

Check our CodeLingua leaderboard (codetlingua.github.io/leaderboard.ht…). We have updated the leaderboard with newly released Granite code LLMs from IBM Research. Granite models outperform Claude-3 and GPT-4 in C -> C++ translation 🔥.
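
For context, a minimal sketch of prompting a Granite code model for this kind of C -> C++ translation. The Hugging Face model id and the prompt format are assumptions for illustration, not the CodeLingua evaluation harness:

# Hedged sketch: C -> C++ translation with a Granite code model via
# Hugging Face transformers. Model id and prompt are assumptions, not
# the CodeLingua harness.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-8b-code-instruct"  # assumed HF model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

c_source = '#include <stdio.h>\nint main(void) { printf("hi\\n"); return 0; }'
messages = [{"role": "user",
             "content": f"Translate this C program to idiomatic C++:\n{c_source}"}]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
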
Leonid Karlinsky (@leokarlin) 's Twitter Profile Photo

Thanks for the highlight AK! We offer a simple and nearly data-free way to move (large quantities of) custom PEFT models within or across LLM families, or even across PEFT configurations. This is useful for LLM cloud hosting, where old base models need to be deprecated and upgraded.
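
The transfer recipe itself isn't spelled out in this excerpt, so it isn't reproduced here; as context, a minimal sketch of the kind of artifact being moved, a LoRA adapter tied to one specific base model, built with the peft library (base model and config are placeholders):

# Context sketch only: shows the artifact the tweet is about moving (a
# LoRA adapter bound to a specific base model); the near-data-free
# transfer method itself is not reproduced here.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"])
adapter = get_peft_model(base, config)
adapter.save_pretrained("my-adapter")  # adapter weights only, tied to this base
# Deprecating "gpt2" for a newer base is exactly the point where a naive
# reload of "my-adapter" breaks and a transfer method is needed.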

Yikang Shen (@yikang_shen) 's Twitter Profile Photo

Thanks for posting our work! 
(1/5) After running thousands of experiments with the WSD learning rate scheduler and μTransfer, we found that the optimal learning rate strongly correlates with the batch size and the number of tokens.
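
The fitted exponents aren't in this excerpt, so purely as an illustration of the claimed relationship, a hypothetical power-law helper (the functional form and every constant are placeholders, not the paper's fit):

# Hypothetical illustration of "optimal LR correlates with batch size and
# token count". The form lr = a * B**alpha * T**beta and all constants
# below are placeholders, not the paper's fitted values.
def optimal_lr(batch_size: int, num_tokens: int,
               a: float = 3e-2, alpha: float = 0.5, beta: float = -0.3) -> float:
    return a * (batch_size ** alpha) * (num_tokens ** beta)

print(optimal_lr(batch_size=1024, num_tokens=1_000_000_000))  # ~2e-3 with these placeholders
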
PyTorch (@pytorch) 's Twitter Profile Photo

Attending the PyTorch Conference? There are several talks and posters on distributed training. Folks from IBM, Meta, Databricks, and more explore distributed LLM training with native PyTorch components and libraries! Come join us in September for the following talks! 🧵 [1/6]

David Cox (@neurobongo) 's Twitter Profile Photo

My daughter hit me yesterday, completely out of the blue, with a detailed and thoroughly researched presentation entitled "Why I should learn to play the harp"

ollama (@ollama) 's Twitter Profile Photo

Today, IBM and Ollama announce a partnership to bring the Granite 3.0 models to Ollama! 😍

2 architectures:

Dense:
2B: ollama run granite3-dense
8B: ollama run granite3-dense:8b

Mixture of Experts:
1B: ollama run granite3-moe
3B: ollama run granite3-moe:3b

Blog post:
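
Beyond the CLI one-liners above, the same models can be called programmatically through Ollama's local REST API; a minimal sketch, assuming a default local server and a model that has already been pulled:

# Minimal sketch: querying granite3-dense:8b through Ollama's local REST
# API (default endpoint http://localhost:11434), assuming the model has
# been fetched with `ollama run granite3-dense:8b` or `ollama pull ...`.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "granite3-dense:8b",
        "prompt": "In one sentence, what is a mixture-of-experts model?",
        "stream": False,  # single JSON object instead of a token stream
    },
)
print(resp.json()["response"])
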
Daniel Newman (@danielnewmanuv) 's Twitter Profile Photo

I had the chance to dive deep into the new models coming out from IBM with its introduction of Granite 3.0. Upon initial review, IBM's work is at the leading edge of what smaller language models can do for generative AI solutions.

Replicate (@replicate) 's Twitter Profile Photo

We've partnered with IBM to bring the new Granite 3.0 language models to Replicate. These models are trained on license-permissible data collected following IBM’s AI Ethics principles for trustworthy enterprise usage.

Best of all, they're fully open source and Apache 2.0 licensed.
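
A minimal sketch of calling one of the models through Replicate's Python client; the model slug is an assumption, so check replicate.com for the exact identifier:

# Hedged sketch: running a Granite 3.0 model via Replicate's Python
# client. The model slug is assumed, and REPLICATE_API_TOKEN must be set
# in the environment.
import replicate

output = replicate.run(
    "ibm-granite/granite-3.0-8b-instruct",  # assumed slug -- verify on replicate.com
    input={"prompt": "What license are the Granite 3.0 models released under?"},
)
print("".join(output))  # language models on Replicate usually stream text chunks
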
Alina Lozovskaya (@ailozovskaya) 's Twitter Profile Photo

🚀 Introducing IBM Granite 3.0 models on the 🤗 Open LLM Leaderboard!

Granite 3.0 is IBM’s newest suite of lightweight, multilingual, and versatile open foundation models designed for enterprise and customization. These models excel in coding, reasoning, and tool usage…
Yikang Shen (@yikang_shen) 's Twitter Profile Photo

Stick-Breaking Attention: out-of-the-box length extrapolation, thanks to removing the position embedding; better performance than softmax+RoPE on almost every task; an efficient implementation similar to FlashAttention. Do we still need softmax+RoPE for language models?
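
A toy numpy sketch of the stick-breaking weighting as described here (a reading of the idea, not the paper's fused kernel): each query hands out attention mass to keys from nearest to farthest, so position is encoded by the order of the product rather than by an embedding:

# Hedged sketch of stick-breaking attention weights: for query position i,
# each earlier key j gets weight beta[i, j] * prod_{j < k < i} (1 - beta[i, k]),
# with beta = sigmoid(logits). Closer keys claim their share of the "stick"
# first, so position comes from the ordering itself -- no softmax, no RoPE.
import numpy as np

def stick_breaking_weights(logits: np.ndarray) -> np.ndarray:
    """logits: (T, T) raw query-key scores; returns causal attention weights."""
    T = logits.shape[0]
    beta = 1.0 / (1.0 + np.exp(-logits))   # sigmoid break probabilities
    weights = np.zeros_like(beta)
    for i in range(T):
        remaining = 1.0                    # unclaimed portion of the stick
        for j in range(i - 1, -1, -1):     # nearest key first
            weights[i, j] = beta[i, j] * remaining
            remaining *= 1.0 - beta[i, j]
    return weights  # rows sum to <= 1; leftover mass means "attend to nothing"

rng = np.random.default_rng(0)
print(stick_breaking_weights(rng.normal(size=(5, 5))).round(3))
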
steven (@tu7uruu) 's Twitter Profile Photo

New top ASR model on the Open ASR Leaderboard!

We've just added IBM Granite-Speech-3.3-8b and 2b to the leaderboard! 

Open weights and top scores across benchmarks.
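
A hedged sketch of trying the model with the generic transformers ASR pipeline; the model id is inferred from the tweet, and whether Granite-Speech loads through the stock pipeline (rather than a model-specific processor) is an assumption worth checking against the model card:

# Hedged sketch: transcription via the generic transformers ASR pipeline.
# The model id is assumed from the tweet, and stock-pipeline compatibility
# is an assumption -- consult the model card for the supported loading path.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="ibm-granite/granite-speech-3.3-2b",  # assumed HF model id
)
print(asr("meeting_clip.wav")["text"])  # hypothetical local audio file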