Aleksa Gordić (水平问题) (@gordic_aleksa)'s Twitter Profile
Aleksa Gordić (水平问题)

@gordic_aleksa

ex @GoogleDeepMind @Microsoft

proud father of 16 H100s

flirting with LLMs, tensor core maximalist

ID: 907007346546810881

Link: https://gordicaleksa.com/ · Joined: 10-09-2017 22:26:17

4.4K Tweets · 22.22K Followers · 223 Following

Aleksa Gordić (水平问题) (@gordic_aleksa):

we're now basically building infinite RL environments - a simulation of our world. once that grand project is completed, we'll have given birth to a superintelligence. at that same point in time, a new universe is born, and evolution will start in this new world, ultimately

Aleksa Gordić (水平问题) (@gordic_aleksa):

getting to human-level performance took:
- 15 years on MNIST
- 7 years on ImageNet
- 9 months on GLUE

and now it feels like we're talking days/weeks (and the benchmarks are increasingly challenging). once it becomes real-time, do we even care what you call it?

Ethan (@torchcompiled):

This is a really great point and something I totally forgot about. I've often been thinking there may be future optimization algorithms that deviate from SGD, or take something of a hybrid approach. However, if you keep stacking parameters, local minima are less of an enemy
