David Cox (@neurobongo) 's Twitter Profile
David Cox

@neurobongo

VP, AI Models @IBMResearch; IBM Director, @MITIBMLab. Former prof and serial/parallel entrepreneur.

ID: 261515114

Link: https://www.linkedin.com/in/daviddanielcox/ · Joined: 06-03-2011 03:28:22

21.2K Tweets

12.1K Followers

1.1K Following

Ali (@alibrahimzada) 's Twitter Profile Photo

Interested in LLM-based Code Translation 🧐?

Check our CodeLingua leaderboard (codetlingua.github.io/leaderboard.ht…). We have updated the leaderboard with newly released Granite code LLMs from IBM Research. Granite models outperform Claude-3 and GPT-4 in C -> C++ translation 🔥.
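
For context, a minimal sketch of prompting a Granite code model for this kind of C -> C++ translation. The Hugging Face model id and the prompt format are assumptions for illustration, not the CodeLingua evaluation harness:

# Hedged sketch: C -> C++ translation with a Granite code model via
# Hugging Face transformers. Model id and prompt are assumptions, not
# the CodeLingua harness.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-8b-code-instruct"  # assumed HF model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

c_source = '#include <stdio.h>\nint main(void) { printf("hi\\n"); return 0; }'
messages = [{"role": "user",
             "content": f"Translate this C program to idiomatic C++:\n{c_source}"}]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
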
Leonid Karlinsky (@leokarlin) 's Twitter Profile Photo

Thanks for the highlight AK! We offer a simple and nearly data-free way to move (large quantities of) custom PEFT models within or across LLM families, or even across PEFT configurations. This is useful for LLM cloud hosting, where old base models need to be deprecated and upgraded.
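
The transfer recipe itself isn't spelled out in this excerpt, so it isn't reproduced here; as context, a minimal sketch of the kind of artifact being moved, a LoRA adapter tied to one specific base model, built with the peft library (base model and config are placeholders):

# Context sketch only: shows the artifact the tweet is about moving (a
# LoRA adapter bound to a specific base model); the near-data-free
# transfer method itself is not reproduced here.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"])
adapter = get_peft_model(base, config)
adapter.save_pretrained("my-adapter")  # adapter weights only, tied to this base
# Deprecating "gpt2" for a newer base is exactly the point where a naive
# reload of "my-adapter" breaks and a transfer method is needed.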

Yikang Shen (@yikang_shen) 's Twitter Profile Photo

Thanks for posting our work! 
(1/5) After running thousands of experiments with the WSD learning rate scheduler and μTransfer, we found that the optimal learning rate strongly correlates with the batch size and the number of tokens.
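
The fitted exponents aren't in this excerpt, so purely as an illustration of the claimed relationship, a hypothetical power-law helper (the functional form and every constant are placeholders, not the paper's fit):

# Hypothetical illustration of "optimal LR correlates with batch size and
# token count". The form lr = a * B**alpha * T**beta and all constants
# below are placeholders, not the paper's fitted values.
def optimal_lr(batch_size: int, num_tokens: int,
               a: float = 3e-2, alpha: float = 0.5, beta: float = -0.3) -> float:
    return a * (batch_size ** alpha) * (num_tokens ** beta)

print(optimal_lr(batch_size=1024, num_tokens=1_000_000_000))  # ~2e-3 with these placeholders
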
PyTorch (@pytorch) 's Twitter Profile Photo

Attending the PyTorch Conference? There are several talks and posters on distributed training. Folks from IBM, Meta, Databricks, and more explore distributed LLM training with native PyTorch components and libraries! Come join us in September for the following talks! 🧵 [1/6]

David Cox (@neurobongo) 's Twitter Profile Photo

My daughter hit me yesterday, completely out of the blue, with a detailed and thoroughly researched presentation entitled "Why I should learn to play the harp"

ollama (@ollama) 's Twitter Profile Photo

Today, IBM and Ollama announce a partnership to bring the Granite 3.0 models to Ollama! 😍

2 architectures:

Dense:
2B: ollama run granite3-dense
8B: ollama run granite3-dense:8b

Mixture of Experts:
1B: ollama run granite3-moe
3B: ollama run granite3-moe:3b

Blog post:
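
Beyond the CLI one-liners above, the same models can be called programmatically through Ollama's local REST API; a minimal sketch, assuming a default local server and a model that has already been pulled:

# Minimal sketch: querying granite3-dense:8b through Ollama's local REST
# API (default endpoint http://localhost:11434), assuming the model has
# been fetched with `ollama run granite3-dense:8b` or `ollama pull ...`.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "granite3-dense:8b",
        "prompt": "In one sentence, what is a mixture-of-experts model?",
        "stream": False,  # single JSON object instead of a token stream
    },
)
print(resp.json()["response"])
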
Daniel Newman (@danielnewmanuv) 's Twitter Profile Photo

I had the chance to dive deep into the new models coming out from IBM with its introduction of Granite 3.0. Upon initial review, IBM's work is at the leading edge of what smaller language models can do for generative AI solutions.

Replicate (@replicate) 's Twitter Profile Photo

We've partnered with IBM to bring the new Granite 3.0 language models to Replicate. These models are trained on license-permissible data collected following IBM’s AI Ethics principles for trustworthy enterprise usage.

Best of all, they're fully open source and Apache 2.0 licensed.
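
A minimal sketch of calling one of the models through Replicate's Python client; the model slug is an assumption, so check replicate.com for the exact identifier:

# Hedged sketch: running a Granite 3.0 model via Replicate's Python
# client. The model slug is assumed, and REPLICATE_API_TOKEN must be set
# in the environment.
import replicate

output = replicate.run(
    "ibm-granite/granite-3.0-8b-instruct",  # assumed slug -- verify on replicate.com
    input={"prompt": "What license are the Granite 3.0 models released under?"},
)
print("".join(output))  # language models on Replicate usually stream text chunks
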
Alina Lozovskaya (@ailozovskaya) 's Twitter Profile Photo

🚀 Introducing IBM Granite 3.0 models on the 🤗 Open LLM Leaderboard!

Granite 3.0 is IBM’s newest suite of lightweight, multilingual, and versatile open foundation models designed for enterprise and customization. These models excel in coding, reasoning, and tool usage…
Yikang Shen (@yikang_shen) 's Twitter Profile Photo

Stick-Breaking Attention: out-of-the-box length extrapolation, thanks to removing the position embedding; better performance than softmax+RoPE on almost every task; an efficient implementation similar to FlashAttention. Do we still need softmax+RoPE for language models?
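
A toy numpy sketch of the stick-breaking weighting as described here (a reading of the idea, not the paper's fused kernel): each query hands out attention mass to keys from nearest to farthest, so position is encoded by the order of the product rather than by an embedding:

# Hedged sketch of stick-breaking attention weights: for query position i,
# each earlier key j gets weight beta[i, j] * prod_{j < k < i} (1 - beta[i, k]),
# with beta = sigmoid(logits). Closer keys claim their share of the "stick"
# first, so position comes from the ordering itself -- no softmax, no RoPE.
import numpy as np

def stick_breaking_weights(logits: np.ndarray) -> np.ndarray:
    """logits: (T, T) raw query-key scores; returns causal attention weights."""
    T = logits.shape[0]
    beta = 1.0 / (1.0 + np.exp(-logits))   # sigmoid break probabilities
    weights = np.zeros_like(beta)
    for i in range(T):
        remaining = 1.0                    # unclaimed portion of the stick
        for j in range(i - 1, -1, -1):     # nearest key first
            weights[i, j] = beta[i, j] * remaining
            remaining *= 1.0 - beta[i, j]
    return weights  # rows sum to <= 1; leftover mass means "attend to nothing"

rng = np.random.default_rng(0)
print(stick_breaking_weights(rng.normal(size=(5, 5))).round(3))
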
steven (@tu7uruu) 's Twitter Profile Photo

New top ASR model on the Open ASR Leaderboard!

We've just added IBM Granite-Speech-3.3-8b and 2b to the leaderboard! 

Open weights and top scores across benchmarks.
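
A hedged sketch of trying the model with the generic transformers ASR pipeline; the model id is inferred from the tweet, and whether Granite-Speech loads through the stock pipeline (rather than a model-specific processor) is an assumption worth checking against the model card:

# Hedged sketch: transcription via the generic transformers ASR pipeline.
# The model id is assumed from the tweet, and stock-pipeline compatibility
# is an assumption -- consult the model card for the supported loading path.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="ibm-granite/granite-speech-3.3-2b",  # assumed HF model id
)
print(asr("meeting_clip.wav")["text"])  # hypothetical local audio file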