Kshitij Gupta (@kshitijkgupta)'s Twitter Profile
Kshitij Gupta

@kshitijkgupta

Passionate about AGI | Interested in Scaling Laws, Multimodal Foundation Models, Memory & Reasoning! @Mila_Quebec | prev @DeepMind, @Microsoft

ID: 1524900686042996736

12-05-2022 23:53:53

19 Tweets

525 Followers

203 Following

CoLLAs 2025 (@collas_conf)

We are thrilled to release the list of invited speakers at CoLLAs 2022: Yoshua Bengio, Rich Caruana, Claudia Clopath, Abhinav Gupta, Hugo Larochelle, Hanie Sedghi, Tinne Tuytelaars. Our registrations are also now open: lifelong-ml.cc/registration

Kshitij Gupta (@kshitijkgupta)

Excited to be here! Quick intro: Student at Mila - Institut québécois d'IA, advised by Sarath Chandar and Irina Rish! Passionate about building AI agents! Currently working in Sequential Decision Making, Scaling Laws, Reasoning, Memory, and Planning! Love exploring and learning new things!

Kshitij Gupta (@kshitijkgupta)

This is super exciting work by Google AI! Chain of thought prompting and step-by-step reasoning can help LLMs break down complex multi-step problems and iteratively reuse their knowledge to solve each sub-problem! Solving problems beyond what was seen during pretraining stage!

Kshitij Gupta (@kshitijkgupta)

What I find most exciting about this work: - Efficiently captures long contexts without O(T^2) complexity. - Encourages capturing only relevant information from the past. - Top-down information introduces feedback and recurrence into Transformers, helping them model sequences better!

David Krueger (@davidskrueger)

A new paper from my student Ethan Caballero, Kshitij Gupta, Irina Rish, and yours truly! I'm really impressed with the empirical results. The TL;DR is that we replace "linear on a log-log plot" with "piecewise linear on a log-log plot".

Kshitij Gupta (@kshitijkgupta)

Very excited to share Broken Neural Scaling Laws! We decompose scaling trends and model them with smoothly broken power laws. This gives SotA extrapolation results on a wide set of tasks! Work done with amazing collaborators: Ethan Caballero, Irina Rish, and David Krueger.

Kshitij Gupta (@kshitijkgupta)

Happy new year everyone! 2022 was a wild ride, but I can’t wait to see what 2023 has in store as we work toward AGI. Looking forward to building multi-modal agents that can interact with external worlds and tools, and reason and solve new tasks! Here’s to an even wilder new year!

Kshitij Gupta (@kshitijkgupta)

Excited to share a sneak peek of what I have been exploring lately: How LLMs can use external tools and memory to iteratively design, implement, and debug code. Even more exciting results, features, and analyses coming out soon! kshitijkg.github.io/blog/jekyll/up… #LLMs #ChatGPT #code

Ethan Caballero is busy (@ethancaballero)

New version of Broken Neural Scaling Laws (BNSL) is out with accurate extrapolation results for the scaling behaviors listed in this attached picture: arxiv.org/abs/2210.14891 arxiv.org/pdf/2210.14891… Plots of all extrapolations are in this 🧵. Any other extrapolations you want?

Ethan Caballero is busy (@ethancaballero)

Want to Superforecast AGI? Stop by "Broken Neural Scaling Laws" (arxiv.org/abs/2210.14891) poster at ICLR Conference poster session at MH1-2-3-4 #27 at 11:30AM - 1:30PM on Monday (iclr.cc/virtual/2023/p…) & at ICLR Me-FoMo Workshop poster session at AD10 at 1PM - 2PM on Thursday
