Shashank Rajput (@shashank_r12) 's Twitter Profile
Shashank Rajput

@shashank_r12

LLM Research @DbrxMosaicAI

ID: 1955982469

Link: https://shashankrajput.github.io/
Joined: 12-10-2013 06:44:48

216 Tweets

828 Followers

653 Following

Prithviraj (Raj) Ammanabrolu (@rajammanabrolu) 's Twitter Profile Photo

I'm recruiting one (1) PhD student this year focused on multimodal embodied agents. All things vision, VLMs + RL!! Please apply to the UCSD CSE PhD app by Dec 15

Yuchen Zeng (@yzeng58) 's Twitter Profile Photo

🎉 Milestone: Our LIFT paper has hit 100+ citations! We introduced a simple method to adapt LLMs to new domains, and researchers are now achieving success with it across predictive chemistry, metamaterial physics & more! Check our work at uw-madison-lee-lab.github.io/LanguageInterf…

NVIDIA AI Developer (@nvidiaaidev) 's Twitter Profile Photo

🤔 How can we achieve GPT-3 175B-level performance with only 1.3B parameters? 🌟 New from #NVIDIAResearch: Hymba, a hybrid-head architecture that combines attention and state space (Mamba) heads in parallel to dramatically boost small language model capabilities. Hymba could revolutionize NLP

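The hybrid-head idea is easiest to see in code: an attention branch and an SSM-style branch read the same input within a layer, and their outputs are mixed. Below is a minimal numpy sketch of that parallel combination; the shapes, initialization, and the toy diagonal recurrence are illustrative assumptions, not Hymba's actual architecture.

# Toy sketch of a parallel hybrid block: an attention head and a simple
# SSM-style head read the same input and their outputs are averaged.
# Everything here is illustrative, not Hymba's actual design.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 8, 16

def causal_softmax_attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    mask = np.triu(np.ones((len(x), len(x)), dtype=bool), k=1)
    scores[mask] = -np.inf                       # causal: no attending to future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def diagonal_ssm(x, a, b, c):
    # h_t = a * h_{t-1} + b * x_t ; y_t = c * h_t   (elementwise, per channel)
    h = np.zeros(x.shape[-1])
    ys = []
    for t in range(len(x)):
        h = a * h + b * x[t]
        ys.append(c * h)
    return np.stack(ys)

x = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
a = rng.uniform(0.5, 0.99, size=d)               # per-channel decay, kept below 1 for stability
b, c = rng.normal(scale=0.1, size=(2, d))

y = 0.5 * (causal_softmax_attention(x, Wq, Wk, Wv) + diagonal_ssm(x, a, b, c))
print(y.shape)                                   # (8, 16)
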
Ahmad Al-Dahle (@ahmad_al_dahle) 's Twitter Profile Photo

Introducing Llama 3.3 – a new 70B model that delivers the performance of our 405B model but is easier & more cost-efficient to run. By leveraging the latest advancements in post-training techniques including online preference optimization, this model improves core performance at

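For readers unfamiliar with the term, "preference optimization" means training on pairs of preferred vs. rejected responses; online variants sample fresh responses from the current policy during training. The snippet below is a generic DPO-style pairwise loss shown only to illustrate the idea, not Meta's actual Llama 3.3 recipe.

# Minimal sketch of a pairwise preference objective (DPO-style), for illustration only.
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Reward margin of the policy relative to a frozen reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))   # -log sigmoid(beta * margin)

# Sequence log-probabilities for one preference pair (illustrative numbers).
print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
               ref_logp_chosen=-13.0, ref_logp_rejected=-14.0))
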
Shashank Rajput (@shashank_r12) 's Twitter Profile Photo

I'll be at NeurIPS and would love to chat about anything AI. Also, visit the Databricks booth to check out some of the work we've been doing! databricks.com/blog/databrick…

Rajko Radovanović (@rajko_rad) 's Twitter Profile Photo

At NeurIPS early? Like making GPUs go brrr? 

Join me at a luncheon tomorrow on LLM Scaling x Efficiency, 5 mins from the conference center... 

Note, folks need to have directly relevant work if not in the field. DM me for more info or for recs! 

Per the usual, I'll be doing 3
jack morris (@jxmnop) 's Twitter Profile Photo

i'm somewhat confident that both the following properties will hold of language models in 2027:

1.  tokenization will be gone, replaced with byte-level ingestion
2.  all tokens that don't need to be read or written by a human will be continuous vectors

luckily two interesting
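
For context on the first prediction: byte-level ingestion means the input ids are just the raw UTF-8 bytes of the text, so the vocabulary is fixed at 256 and no learned tokenizer is involved. A tiny illustration:

# Byte-level "tokenization": the model's input ids are the raw UTF-8 bytes,
# so the vocabulary has 256 entries and no learned tokenizer is needed.
text = "naïve café"
byte_ids = list(text.encode("utf-8"))
print(byte_ids)                          # e.g. [110, 97, 195, 175, 118, 101, 32, 99, 97, 102, 195, 169]
print(bytes(byte_ids).decode("utf-8"))   # round-trips back to the original string
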
Hongyi Wang (@hongyiwang10) 's Twitter Profile Photo

I have three Ph.D. student openings in my research group at the Rutgers Computer Science Department starting in Fall 2025. If you are interested in working with me on efficient algorithms and systems for LLMs, foundation models, and AI4Science, please apply at: grad.rutgers.edu/academics/prog… The deadline is

Databricks (@databricks) 's Twitter Profile Photo

Databricks research scientist Shashank Rajput shares approaches in LLMs:
- How RAG enhances accuracy
- Evolution of attention mechanisms
- Practical applications & trade-offs of Mamba architectures
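
As a rough illustration of the first point, the core retrieval step of RAG is: embed the query and the documents, keep the top-k most similar documents, and prepend them to the prompt so the model can ground its answer. The sketch below uses random embeddings and a hypothetical prompt as stand-ins; a real system would use a trained embedding model and an actual LLM call.

# Minimal RAG retrieval sketch: pick the top-k documents by cosine similarity
# and prepend them to the prompt. Embeddings here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
docs = ["Doc about attention", "Doc about Mamba", "Doc about RAG"]
doc_emb = rng.normal(size=(len(docs), 64))
query_emb = rng.normal(size=64)

def top_k(query_emb, doc_emb, k=2):
    sims = doc_emb @ query_emb / (np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(query_emb))
    return np.argsort(-sims)[:k]

context = "\n".join(docs[i] for i in top_k(query_emb, doc_emb))
prompt = f"Context:\n{context}\n\nQuestion: How does RAG improve accuracy?"
print(prompt)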

Mahesh Sathiamoorthy (@madiator) 's Twitter Profile Photo

Nice to see my previous work that I led at Google DeepMind covered by VentureBeat (in light of a new work from Meta). 

Context: We had introduced the novel idea of Generative Retrieval for recommender systems to the world in our NeurIPS 2023 paper called TIGER (Transformer
Mahesh Sathiamoorthy (@madiator) 's Twitter Profile Photo

We are happy to announce Curator, an open-source library designed to streamline synthetic data generation! High-quality synthetic data generation is essential in training and evaluating LLMs/agents/RAG pipelines these days, but tooling around this is still entirely lacking! So

Mahesh Sathiamoorthy (@madiator) 's Twitter Profile Photo

Introducing Bespoke-Stratos-32B, our reasoning model distilled from DeepSeek-R1 using Berkeley NovaSky’s Sky-T1 recipe. 

The model outperforms Sky-T1 and o1-preview in reasoning (Math and Code) benchmarks and almost reaches the performance of DeepSeek-R1-Distill-Qwen-32B while
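
The general shape of this kind of reasoning distillation is: sample reasoning traces from a strong teacher, keep only the traces whose final answer verifies, and fine-tune the student on the survivors. The sketch below uses placeholder functions to show that loop; it is not the actual Bespoke or Sky-T1 pipeline.

# Rough shape of reasoning distillation: sample traces from a teacher model,
# keep only those whose final answer verifies, and use the survivors as SFT data.
# `teacher_generate` and `extract_answer` are placeholders, not real APIs.

def teacher_generate(problem: str) -> str:
    # Stand-in for a call to a strong reasoning model (e.g. via an API).
    return f"<think>work through: {problem}</think> Final answer: 4"

def extract_answer(trace: str) -> str:
    return trace.rsplit("Final answer:", 1)[-1].strip()

problems = [("What is 2 + 2?", "4"), ("What is 3 * 5?", "15")]

sft_data = []
for question, gold in problems:
    trace = teacher_generate(question)
    if extract_answer(trace) == gold:          # rejection sampling: keep verified traces only
        sft_data.append({"prompt": question, "completion": trace})

print(len(sft_data), "verified traces ready for student fine-tuning")
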
Mahesh Sathiamoorthy (@madiator) 's Twitter Profile Photo

We are announcing Open Thoughts, our large-scale open-source effort to curate the best open reasoning datasets!

DeepSeek-R1 is amazing but we still don't have access to high-quality open reasoning datasets. These datasets are crucial if you want to build your reasoning models!
DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

💡 With
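
A rough way to picture the coarse-then-fine idea: pool each block of keys into a single summary, score the summaries against the query, then run exact attention only over the tokens of the top-scoring blocks. The numpy sketch below is a simplification for intuition, not NSA's actual kernels, scoring, or training setup.

# Toy two-stage sparse attention: (1) compress each block of keys into one
# summary (coarse), (2) keep only the top-scoring blocks and attend to their
# tokens exactly (fine). A simplification of the idea, not NSA's actual design.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d, block, top_blocks = 64, 32, 8, 2

q = rng.normal(size=d)                       # single query vector
K = rng.normal(size=(seq_len, d))
V = rng.normal(size=(seq_len, d))

# Coarse stage: mean-pool each block of keys and score the block summaries.
K_blocks = K.reshape(seq_len // block, block, d).mean(axis=1)
block_scores = K_blocks @ q
keep = np.argsort(-block_scores)[:top_blocks]

# Fine stage: gather the tokens of the selected blocks and attend exactly.
idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in keep])
scores = K[idx] @ q / np.sqrt(d)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
out = weights @ V[idx]
print(out.shape)                             # (32,); attends to only 16 of the 64 tokens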