Shrutimoy Das (@shrutimoy)'s Twitter Profile
Shrutimoy Das

@shrutimoy

PhD Student at IIT Gandhinagar

ID: 1405624883254493186

Joined: 17-06-2021 20:34:36

40 Tweets

79 Followers

335 Following

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Are you wondering how large language models like ChatGPT and InstructGPT actually work? One of the secret ingredients is RLHF - Reinforcement Learning from Human Feedback. Let's dive into how RLHF works in 8 tweets!
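
The 8-tweet thread itself isn't reproduced here. As a very rough sketch of the RLHF loop the tweet alludes to (not the actual ChatGPT/InstructGPT training code), the toy below samples responses from a tiny policy, scores them with a made-up stand-in for a learned reward model, and nudges the policy toward high-reward responses while penalizing drift from the frozen reference policy:

```python
# Toy RLHF-style update: sample -> score with a "reward model" -> nudge the
# policy toward high-reward outputs while staying close to the reference.
# The response list and reward values are invented for this illustration.
import numpy as np

rng = np.random.default_rng(0)

responses = ["helpful answer", "rude answer", "off-topic answer"]
reward = np.array([1.0, -1.0, -0.5])   # stand-in for a learned reward model

ref_logits = np.zeros(3)               # frozen reference (SFT) policy
logits = ref_logits.copy()             # trainable policy, starts at the reference

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

beta, lr = 0.1, 0.5                    # KL penalty weight, step size
for step in range(200):
    probs = softmax(logits)
    i = rng.choice(3, p=probs)         # sample a response from the current policy
    # KL-shaped reward: reward-model score minus a penalty for drifting
    # away from the reference policy (the core RLHF objective).
    r = reward[i] - beta * (np.log(probs[i]) - np.log(softmax(ref_logits)[i]))
    # REINFORCE-style update: raise the log-prob of the sampled response
    # in proportion to its penalized reward.
    grad = -probs
    grad[i] += 1.0
    logits += lr * r * grad

print({resp: round(p, 2) for resp, p in zip(responses, softmax(logits))})
```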

Divy Thakkar (@divy93t)'s Twitter Profile Photo

Research Week with Google - officially a wrap! Extremely energising to be with students and see their research curiosity! Till next time! Special thanks to our amazing speakers, ACs, organisers and Program Chairs!

Ben Grimmer (@prof_grimmer)'s Twitter Profile Photo

I've proven the strangest result of my career... The classic idea that gradient descent's rate is best with constant stepsizes 1/L is wrong. The idea that we need stepsizes in (0,2/L) for convergence is wrong. Periodic long steps are better, provably. arxiv.org/abs/2307.06324
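
The paper's actual long-step schedule is a specific, carefully constructed pattern; the sketch below only illustrates the headline point on a random convex quadratic, interleaving an occasional step longer than 2/L with ordinary 1/L steps. The schedule [1/L, 1/L, 1/L, 3/L] is my own illustrative choice, not the paper's:

```python
# Toy comparison of constant-stepsize gradient descent vs. a schedule with
# periodic long steps, on a random smooth convex quadratic.
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
H = A.T @ A / n                      # positive semidefinite Hessian
L = np.linalg.eigvalsh(H).max()      # smoothness constant
x_star = rng.standard_normal(n)
b = H @ x_star                       # minimizer of f(x) = 0.5 x'Hx - b'x is x_star

def f(x):
    return 0.5 * x @ H @ x - b @ x

def run(stepsizes, T=400):
    x = np.zeros(n)
    for t in range(T):
        x -= stepsizes[t % len(stepsizes)] * (H @ x - b)   # gradient step
    return f(x) - f(x_star)                                # optimality gap

print("constant 1/L      :", run([1 / L]))
print("periodic long step:", run([1 / L, 1 / L, 1 / L, 3 / L]))
```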

Petar Veličković (@petarv_93)'s Twitter Profile Photo

Scientific discovery in the Age of AI 🧪🤖🧑‍🔬✨ ...now published in nature! It's been fantastic writing this survey-spinoff of the AI for Science workshops with these amazing coauthors! Thanks Marinka Zitnik for always keeping our spirits high! 😊 nature.com/articles/s4158…

Shubhendu Trivedi (@_onionesque)'s Twitter Profile Photo

This probably keeps getting shared here all the time, but it's worth resharing: An excellent set of lectures on high dimensional probability and concentration inequalities by Roman Vershynin. These complement his great book well. math.uci.edu/~rvershyn/teac…
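
As a quick numerical taste of the concentration phenomena those lectures cover (this example is mine, not taken from the notes): the Euclidean norm of a standard Gaussian vector in R^n concentrates tightly around sqrt(n), with fluctuations that stay of constant order as the dimension grows.

```python
# Empirical check of a classic high-dimensional concentration fact:
# ||g||_2 for g ~ N(0, I_n) is close to sqrt(n), with O(1) fluctuations.
import numpy as np

rng = np.random.default_rng(0)
for n in [10, 100, 1000, 10000]:
    norms = np.linalg.norm(rng.standard_normal((2000, n)), axis=1)
    print(f"n={n:6d}  mean ||g|| = {norms.mean():8.2f}  "
          f"sqrt(n) = {np.sqrt(n):8.2f}  std = {norms.std():5.2f}")
```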

elvis (@omarsar0)'s Twitter Profile Photo

LLMs as Optimizers This is a really neat idea. This new paper from Google DeepMind proposes an approach where the optimization problem is described in natural language. An LLM is then instructed to iteratively generate new solutions based on the defined problem and previously
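
A rough sketch of the loop the tweet describes (not the paper's code): the problem statement and the scored solutions found so far are packed into a meta-prompt, and the LLM is asked for a better candidate. Here `call_llm` and `score` are hypothetical stand-ins for an LLM API call and a task-specific evaluation function.

```python
# Sketch of "LLM as optimizer": iteratively ask an LLM for new solutions,
# conditioning on the natural-language problem and previously scored attempts.
def optimize_with_llm(problem_description, score, call_llm, n_steps=10):
    history = []                                     # (solution, score) pairs
    for _ in range(n_steps):
        # Meta-prompt: problem statement plus solutions tried so far,
        # ordered so the best-scoring ones appear last.
        prompt = problem_description + "\n\nPrevious solutions (solution -> score):\n"
        for sol, s in sorted(history, key=lambda p: p[1]):
            prompt += f"{sol} -> {s}\n"
        prompt += "\nPropose a new solution that scores higher than all of the above."
        candidate = call_llm(prompt)                 # hypothetical LLM call
        history.append((candidate, score(candidate)))
    return max(history, key=lambda p: p[1])          # best solution found
```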

AI Coffee Break with Letitia (@aicoffeebreak)'s Twitter Profile Photo

How does LoRA work? Low-Rank Adaptation for Parameter-Efficient LLM Finetuning explained. 👇 📺 youtu.be/KEv-F5UkhxU Great work by Edward Hu, Yelong Shen and collaborators! 👏 arxiv.org/abs/2106.09685
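
A minimal sketch of the LoRA idea from the linked paper (not the official implementation): the pretrained weight W stays frozen, and only a low-rank update B @ A is trained, scaled by alpha/r. With B initialized to zero, training starts exactly from the original model.

```python
# LoRA forward pass in miniature: frozen W plus a trainable low-rank update.
import numpy as np

d_out, d_in, r, alpha = 768, 768, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in)) * 0.02   # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01       # trainable, r x d_in
B = np.zeros((d_out, r))                        # trainable, d_out x r (starts at 0)

def lora_linear(x):
    # Full finetuning would update all d_out*d_in entries of W;
    # LoRA only trains the r*(d_in + d_out) entries of A and B.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
print(lora_linear(x).shape)                     # (768,)
print("trainable params:", A.size + B.size, "vs full:", W.size)
```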

Ben Grimmer (@prof_grimmer)'s Twitter Profile Photo

The new strangest results of my career (with Kevin Shu and Alex Wang). Gradient descent can accelerate (in big-O!) by just periodically taking longer steps. No momentum needed to beat O(1/T) in smooth convex opt! Paper: arxiv.org/abs/2309.09961 [1/3]

MIT CSAIL (@mit_csail)'s Twitter Profile Photo

“The best learners are the people who push through the discomfort of being objectively bad at something.” — Tommy Collison

Shubhajit Roy (@royshubhajit)'s Twitter Profile Photo

Motivation to work on #connectivity by the “Ethernet Man” @BobMetcalfe12. Also awesome talk by Prof. Saket Saurabh, Faculty #IMSc #Chennai on Bad algorithms. Again, thanks to ACM India, Association for Computing Machinery and Infosys

Prof. Anima Anandkumar (@animaanandkumar)'s Twitter Profile Photo

For the first time, we show that the Llama 7B LLM can be trained on a single consumer-grade GPU (RTX 4090) with only 24GB memory. This represents more than 82.5% reduction in memory for storing optimizer states during training. Training LLMs from scratch currently requires huge
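
The tweet is cut off before it names the mechanism, but the saving it quotes is specifically in optimizer states. Assuming this refers to projecting gradients onto a low-rank subspace (the GaLore idea) so that Adam's moment buffers live in that smaller space, the sketch below is a rough illustration of where the memory reduction comes from; the real method also refreshes the projection periodically via SVD.

```python
# Rough illustration (not the paper's code): Adam keeps two moment buffers the
# same shape as each gradient. Storing them in a rank-r projected space of an
# (m x n) weight costs 2*r*n floats instead of 2*m*n.
import numpy as np

m, n, r = 1024, 1024, 64
rng = np.random.default_rng(0)

grad = rng.standard_normal((m, n))
U, _, _ = np.linalg.svd(grad, full_matrices=False)
P = U[:, :r]                               # projection onto top-r gradient subspace

g_low = P.T @ grad                         # (r, n) projected gradient
m1, m2 = np.zeros_like(g_low), np.zeros_like(g_low)   # Adam moments, kept low-rank
m1 = 0.9 * m1 + 0.1 * g_low
m2 = 0.999 * m2 + 0.001 * g_low**2
update = P @ (m1 / (np.sqrt(m2) + 1e-8))   # project the step back to (m, n)

full_state = 2 * m * n                     # floats Adam would normally store
low_state = 2 * r * n
print(f"optimizer-state reduction: {1 - low_state / full_state:.1%}")  # 93.8% here
```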

Quanta Magazine (@quantamagazine)'s Twitter Profile Photo

In two recent papers, researchers have improved upon the best-known speed for matrix multiplication. Steve Nadis reports: quantamagazine.org/new-breakthrou…
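
The new papers improve the theoretical exponent with techniques far beyond anything shown here. As a basic, classical illustration that matrix multiplication can beat the naive O(n^3) operation count at all (unrelated to the methods in those papers), here is Strassen's recursion:

```python
# Strassen's algorithm: 7 recursive block products per level instead of 8,
# giving an O(n^2.81) multiplication. Shown only as a classical example of
# sub-cubic matrix multiplication.
import numpy as np

def strassen(A, B):
    n = A.shape[0]
    if n <= 64:                       # fall back to the naive product for small blocks
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.default_rng(0).standard_normal((256, 256))
B = np.random.default_rng(1).standard_normal((256, 256))
print(np.allclose(strassen(A, B), A @ B))   # True, up to floating-point error
```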