Rameswar Panda (@rpanda89)'s Twitter Profile
Rameswar Panda

@rpanda89

Distinguished Engineer, IBM Research

ID: 3251456054

Website: https://rpand002.github.io/
Joined: 21-06-2015 09:07:45

57 Tweets

1.1K Followers

478 Following

IIT Gandhinagar (@iitgn)'s Twitter Profile Photo

We are happy to share that the paper submission deadline for the 13th edition of the ‘Indian Conference on Computer Vision, Graphics and Image Processing’ (#ICVGIP) has been extended till August 21. Website: bit.ly/3wD4yCR Shanmuganathan Raman Vineet Vashista udit bhatia

Rameswar Panda (@rpanda89)'s Twitter Profile Photo

We at MIT-IBM Watson AI Lab are currently looking to hire PhD candidates for a 2023 summer internship to work on efficient training and inference of large language (and/or vision) models. Please DM or send me an email if you are interested.

Rameswar Panda (@rpanda89)'s Twitter Profile Photo

Happy to share that 3 papers on Efficient AI were accepted to ICLR 2023, one as a "notable-top-25%" paper (Spotlight). Huge thanks to all my co-authors. Stay tuned for more details! Work done at MIT-IBM Watson AI Lab. #ICLR2023

Rameswar Panda (@rpanda89)'s Twitter Profile Photo

We at MIT-IBM Watson AI Lab are currently looking for a research software engineer to work on efficient large language models and develop prototype solutions to real-world problems, while publishing papers in top AI conferences. Apply at: krb-sjobs.brassring.com/TGnewUI/Search… #NLP #efficiency #LLMs

Rameswar Panda (@rpanda89)'s Twitter Profile Photo

We at MIT-IBM Watson AI Lab are currently looking for a senior AI researcher to work on efficient large language models and develop prototype solutions to real-world problems, while publishing papers in top AI conferences. Apply at: careers.ibm.com/job/18637769/s… #NLP #efficiency #LLMs #AI

Rameswar Panda (@rpanda89)'s Twitter Profile Photo

Our team is currently looking to hire PhD candidates for a 2024 summer internship to work on efficient training and inference of large language (and/or multimodal) models. Please DM or send me an email if you are interested. MIT-IBM Watson AI Lab IBM Research

Yikang Shen (@yikang_shen)'s Twitter Profile Photo

Yes, our goal is to create really useful code LLMs for real production use cases, not just for getting some kind of SOTA on HumanEval (but we still get it 😉).

Yikang Shen (@yikang_shen)'s Twitter Profile Photo

JetMoE and IBM Granite Code models are now natively available in Hugging Face Transformers v4.41! github.com/huggingface/tr…
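As a quick illustration of what "natively available" means here, a minimal sketch of loading one of these checkpoints through the standard Transformers API might look like the following; the checkpoint id below is an assumed example, so check the Hugging Face Hub for the actual model names.

```python
# Minimal sketch: load a Granite Code model via the standard Transformers API.
# Assumes transformers >= 4.41; the checkpoint id below is an assumed example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3b-code-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```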

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention: shows that it is possible to share key and value heads between adjacent layers without performance degradation. arxiv.org/abs/2405.12981
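To make the idea concrete, here is a rough, illustrative sketch (not the paper's code) of cross-layer attention: the second layer in a pair reuses the keys and values computed by the first layer and only adds its own queries, so just one KV set per pair needs to be cached. All module and parameter names are made up for the example.

```python
# Illustrative sketch of cross-layer KV sharing (not the paper's implementation):
# the second layer in each pair reuses the key/value tensors computed by the
# first layer instead of projecting its own, so only one KV set is cached.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLABlockPair(nn.Module):
    """Two attention layers that share a single set of keys/values."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q0 = nn.Linear(d_model, d_model)
        self.q1 = nn.Linear(d_model, d_model)
        self.kv = nn.Linear(d_model, 2 * d_model)  # only the first layer projects K/V
        self.o0 = nn.Linear(d_model, d_model)
        self.o1 = nn.Linear(d_model, d_model)

    def _split(self, x, b, t):
        return x.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

    def forward(self, x):
        b, t, _ = x.shape
        k, v = self.kv(x).chunk(2, dim=-1)
        k, v = self._split(k, b, t), self._split(v, b, t)  # computed once, reused below

        q0 = self._split(self.q0(x), b, t)
        x = x + self.o0(F.scaled_dot_product_attention(q0, k, v, is_causal=True)
                        .transpose(1, 2).reshape(b, t, -1))

        q1 = self._split(self.q1(x), b, t)                 # second layer: queries only
        x = x + self.o1(F.scaled_dot_product_attention(q1, k, v, is_causal=True)
                        .transpose(1, 2).reshape(b, t, -1))
        return x

pair = CLABlockPair(d_model=256, n_heads=8)
y = pair(torch.randn(2, 16, 256))
```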

Mayank Mishra (@mayankmish98)'s Twitter Profile Photo

We have released 4-bit GGUF versions of all Granite Code models for local inference. 💻 The models can be found here: huggingface.co/collections/ib…
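For anyone who has not run GGUF files locally before, a minimal inference sketch with llama-cpp-python might look like the following; the .gguf file name is an assumed placeholder for whichever 4-bit file you download from the collection.

```python
# Minimal local-inference sketch using llama-cpp-python (pip install llama-cpp-python).
# The .gguf file name is an assumed placeholder for one of the 4-bit Granite Code files.
from llama_cpp import Llama

llm = Llama(model_path="granite-3b-code-base.Q4_K_M.gguf", n_ctx=4096)
out = llm("def quicksort(arr):", max_tokens=128, temperature=0.2)
print(out["choices"][0]["text"])
```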

Yikang Shen (@yikang_shen)'s Twitter Profile Photo

Thanks for posting our work! (1/5) After running thousands of experiments with the WSD learning rate scheduler and μTransfer, we found that the optimal learning rate strongly correlates with the batch size and the number of tokens.
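For readers unfamiliar with WSD, a minimal sketch of a warmup-stable-decay schedule is shown below: linear warmup, a long constant "stable" phase, then a decay toward a minimum rate. The exact decay shape and hyperparameters used in the thread's experiments are not given here, so the numbers are purely illustrative.

```python
# Minimal sketch of a warmup-stable-decay (WSD) learning-rate schedule:
# linear warmup, a constant ("stable") phase, then a linear decay to min_lr.
# The decay shape used in the actual experiments may differ; this is illustrative.
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    if step < warmup_steps:                    # warmup: 0 -> max_lr
        return max_lr * step / max(warmup_steps, 1)
    if step < warmup_steps + stable_steps:     # stable: hold max_lr
        return max_lr
    done = step - warmup_steps - stable_steps  # decay: max_lr -> min_lr
    frac = min(done / max(decay_steps, 1), 1.0)
    return max_lr + (min_lr - max_lr) * frac

# Example: 1k warmup, 90k stable, 9k decay steps at a peak LR of 3e-4.
lrs = [wsd_lr(s, 3e-4, 1_000, 90_000, 9_000) for s in range(100_000)]
```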

Rameswar Panda (@rpanda89)'s Twitter Profile Photo

🚨Hiring🚨 We are looking for research scientists and engineers to join IBM Research (Cambridge, Bangalore). We train large language models and do fundamental research on directions related to LLMs. Please DM me your CV and a brief introduction of yourself if you are interested!

Songlin Yang (@songlinyang4)'s Twitter Profile Photo

📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381

Shawn Tan (@tanshawn)'s Twitter Profile Photo

If you want to fine-tune the Granite 4.0 MoE models, Unsloth has a ready-to-go recipe here! If you're gonna roll your own, I've updated scattermoe to inject a forward pass to the Huggingface implementation that uses scattermoe. github.com/shawntan/scatt…
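If you do roll your own, a plain Hugging Face Trainer skeleton along these lines is one possible starting point. This is a generic sketch, not the Unsloth recipe and not the scattermoe-patched forward pass; the checkpoint id and the dataset file are placeholders.

```python
# Generic fine-tuning skeleton with the Hugging Face Trainer (not the Unsloth recipe,
# not the scattermoe patch). Checkpoint id and training data are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "ibm-granite/granite-4.0-moe"  # placeholder: use the actual Granite 4.0 MoE id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Any causal-LM text dataset works; "train.txt" is a placeholder file of raw text.
ds = load_dataset("text", data_files={"train": "train.txt"})["train"]
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
            remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="granite-moe-ft",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1,
                           learning_rate=1e-5),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```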
