Rameswar Panda (@rpanda89)'s Twitter Profile
Rameswar Panda

@rpanda89

Distinguished Engineer, IBM Research

ID: 3251456054

Website: https://rpand002.github.io/
Joined: 21-06-2015 09:07:45

57 Tweets

1.1K Followers

478 Following

IIT Gandhinagar (@iitgn)'s Twitter Profile Photo

We are happy to share that the paper submission deadline for the 13th edition of the ‘Indian Conference on Computer Vision, Graphics and Image Processing’ (#ICVGIP) has been extended till August 21. Website: bit.ly/3wD4yCR Shanmuganathan Raman Vineet Vashista udit bhatia

Rameswar Panda (@rpanda89)'s Twitter Profile Photo

We at MIT-IBM Watson AI Lab are currently looking to hire PhD candidates for a 2023 summer internship to work on efficient training and inference of large language (and/or vision) models. Please DM or send me an email if you are interested.

Rameswar Panda (@rpanda89)'s Twitter Profile Photo

Happy to share that 3 papers on Efficient AI were accepted to ICLR 2023, one as a "notable-top-25%" paper (Spotlight). Huge thanks to all my co-authors. Stay tuned for more details! Work done at MIT-IBM Watson AI Lab. #ICLR2023

Rameswar Panda (@rpanda89)'s Twitter Profile Photo

We at MIT-IBM Watson AI Lab are currently looking for a research software engineer to work on efficient large language models and develop prototype solutions to real-world problems, while publishing papers in top AI conferences. Apply at: krb-sjobs.brassring.com/TGnewUI/Search… #NLP #efficiency #LLMs

Rameswar Panda (@rpanda89)'s Twitter Profile Photo

We at MIT-IBM Watson AI Lab are currently looking for a senior AI researcher to work on efficient large language models and develop prototype solutions to real-world problems, while publishing papers in top AI conferences. Apply at: careers.ibm.com/job/18637769/s… #NLP #efficiency #LLMs #AI

Rameswar Panda (@rpanda89)'s Twitter Profile Photo

Our team is currently looking to hire PhD candidates for a 2024 summer internship to work on efficient training and inference of large language (and/or multimodal) models. Please DM or send me an email if you are interested. MIT-IBM Watson AI Lab IBM Research

Yikang Shen (@yikang_shen)'s Twitter Profile Photo

Yes, our goal is to create really useful code LLMs for real production use cases, not just for getting some kind of SOTA on HumanEval (but we still get it 😉).

Yikang Shen (@yikang_shen)'s Twitter Profile Photo

JetMoE and IBM Granite Code models are now natively available in Hugging Face Transformers v4.41! github.com/huggingface/tr…
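As a quick illustration of what "natively available" means here, a minimal sketch of loading one of these checkpoints through the standard Transformers API might look like the following; the checkpoint id below is an assumed example, so check the Hugging Face Hub for the actual model names.

```python
# Minimal sketch: load a Granite Code model via the standard Transformers API.
# Assumes transformers >= 4.41; the checkpoint id below is an assumed example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3b-code-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```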

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention: shows that it is possible to share key and value heads between adjacent layers without performance degradation. arxiv.org/abs/2405.12981
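To make the idea concrete, here is a rough, illustrative sketch (not the paper's code) of cross-layer attention: the second layer in a pair reuses the keys and values computed by the first layer and only adds its own queries, so just one KV set per pair needs to be cached. All module and parameter names are made up for the example.

```python
# Illustrative sketch of cross-layer KV sharing (not the paper's implementation):
# the second layer in each pair reuses the key/value tensors computed by the
# first layer instead of projecting its own, so only one KV set is cached.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLABlockPair(nn.Module):
    """Two attention layers that share a single set of keys/values."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q0 = nn.Linear(d_model, d_model)
        self.q1 = nn.Linear(d_model, d_model)
        self.kv = nn.Linear(d_model, 2 * d_model)  # only the first layer projects K/V
        self.o0 = nn.Linear(d_model, d_model)
        self.o1 = nn.Linear(d_model, d_model)

    def _split(self, x, b, t):
        return x.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

    def forward(self, x):
        b, t, _ = x.shape
        k, v = self.kv(x).chunk(2, dim=-1)
        k, v = self._split(k, b, t), self._split(v, b, t)  # computed once, reused below

        q0 = self._split(self.q0(x), b, t)
        x = x + self.o0(F.scaled_dot_product_attention(q0, k, v, is_causal=True)
                        .transpose(1, 2).reshape(b, t, -1))

        q1 = self._split(self.q1(x), b, t)                 # second layer: queries only
        x = x + self.o1(F.scaled_dot_product_attention(q1, k, v, is_causal=True)
                        .transpose(1, 2).reshape(b, t, -1))
        return x

pair = CLABlockPair(d_model=256, n_heads=8)
y = pair(torch.randn(2, 16, 256))
```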

Mayank Mishra (@mayankmish98)'s Twitter Profile Photo

We have released 4-bit GGUF versions of all Granite Code models for local inference. 💻 The models can be found here: huggingface.co/collections/ib…
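For anyone who has not run GGUF files locally before, a minimal inference sketch with llama-cpp-python might look like the following; the .gguf file name is an assumed placeholder for whichever 4-bit file you download from the collection.

```python
# Minimal local-inference sketch using llama-cpp-python (pip install llama-cpp-python).
# The .gguf file name is an assumed placeholder for one of the 4-bit Granite Code files.
from llama_cpp import Llama

llm = Llama(model_path="granite-3b-code-base.Q4_K_M.gguf", n_ctx=4096)
out = llm("def quicksort(arr):", max_tokens=128, temperature=0.2)
print(out["choices"][0]["text"])
```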

Yikang Shen (@yikang_shen)'s Twitter Profile Photo

Thanks for posting our work! (1/5) After running thousands of experiments with the WSD learning rate scheduler and μTransfer, we found that the optimal learning rate strongly correlates with the batch size and the number of tokens.
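For readers unfamiliar with WSD, a minimal sketch of a warmup-stable-decay schedule is shown below: linear warmup, a long constant "stable" phase, then a decay toward a minimum rate. The exact decay shape and hyperparameters used in the thread's experiments are not given here, so the numbers are purely illustrative.

```python
# Minimal sketch of a warmup-stable-decay (WSD) learning-rate schedule:
# linear warmup, a constant ("stable") phase, then a linear decay to min_lr.
# The decay shape used in the actual experiments may differ; this is illustrative.
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    if step < warmup_steps:                    # warmup: 0 -> max_lr
        return max_lr * step / max(warmup_steps, 1)
    if step < warmup_steps + stable_steps:     # stable: hold max_lr
        return max_lr
    done = step - warmup_steps - stable_steps  # decay: max_lr -> min_lr
    frac = min(done / max(decay_steps, 1), 1.0)
    return max_lr + (min_lr - max_lr) * frac

# Example: 1k warmup, 90k stable, 9k decay steps at a peak LR of 3e-4.
lrs = [wsd_lr(s, 3e-4, 1_000, 90_000, 9_000) for s in range(100_000)]
```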

Rameswar Panda (@rpanda89)'s Twitter Profile Photo

🚨Hiring🚨 We are looking for research scientists and engineers to join IBM Research (Cambridge, Bangalore). We train large language models and do fundamental research on directions related to LLMs. Please DM me your CV and a brief introduction of yourself if you are interested!

Songlin Yang (@songlinyang4)'s Twitter Profile Photo

📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381

Shawn Tan (@tanshawn)'s Twitter Profile Photo

If you want to fine-tune the Granite 4.0 MoE models, Unsloth has a ready-to-go recipe here! If you're gonna roll your own, I've updated scattermoe to inject a forward pass to the Huggingface implementation that uses scattermoe. github.com/shawntan/scatt…
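If you do roll your own, a plain Hugging Face Trainer skeleton along these lines is one possible starting point. This is a generic sketch, not the Unsloth recipe and not the scattermoe-patched forward pass; the checkpoint id and the dataset file are placeholders.

```python
# Generic fine-tuning skeleton with the Hugging Face Trainer (not the Unsloth recipe,
# not the scattermoe patch). Checkpoint id and training data are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "ibm-granite/granite-4.0-moe"  # placeholder: use the actual Granite 4.0 MoE id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Any causal-LM text dataset works; "train.txt" is a placeholder file of raw text.
ds = load_dataset("text", data_files={"train": "train.txt"})["train"]
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
            remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="granite-moe-ft",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1,
                           learning_rate=1e-5),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```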
