Gabriel Rojo (@ggomezrojo) 's Twitter Profile
Gabriel Rojo

@ggomezrojo

Not investment advice.

ID: 227637029

Link: http://www.emeritacapital.com · Joined: 17-12-2010 11:16:30

17.17K Tweets

878 Followers

349 Following

Azalia Mirhoseini (@azaliamirh) 's Twitter Profile Photo

Excited to release SWiRL: A synthetic data generation and multi-step RL approach for reasoning and tool use!

With SWiRL, the model’s capability generalizes to new tasks and tools. For example, a model trained to use a retrieval tool to solve multi-hop knowledge-intensive …
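A rough sketch of the multi-step trajectory-generation idea described here, based only on my reading of the announcement (not the paper's exact pipeline); `llm` and `search` are hypothetical stand-ins for the generator model and the retrieval tool:

```python
# Sketch: synthesize a multi-step tool-use trajectory for a multi-hop question.
# Each recorded step can then become its own training example for step-wise RL.
def generate_trajectory(question, llm, search, max_steps=5):
    context, steps = [], []
    for _ in range(max_steps):
        # The model either emits a search query ("SEARCH: ...") or a final answer.
        action = llm(question=question, context=context)
        steps.append({"state": list(context), "action": action})
        if action.startswith("SEARCH:"):
            query = action[len("SEARCH:"):].strip()
            context.append((query, search(query)))  # feed retrieved docs back in
        else:
            break  # final answer reached
    return steps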
_its_not_real_ (@_its_not_real_) 's Twitter Profile Photo

"They're made out of meat." "Meat?" "Meat. Humans. They're made entirely out of meat." "But that's impossible. What about all the tokens they generate? The text? The code?" "They do produce tokens, but the tokens aren't their essence. They're merely outputs. The humans themselves

Ethan Mollick (@emollick) 's Twitter Profile Photo

"o3 I want you to make a map of the lighthouses of the great lakes. I want the map in “dark mode “ but each lighthouse marker should be aesthetically sized so it covers the distance it can be seen on an average night and is the color of the light" Few rounds of feedback later...

DeepLearning.AI (@deeplearningai) 's Twitter Profile Photo

CB Insights released its 2024 AI 100 list, spotlighting early-stage non-public startups that show strong market traction, financial health, and growth potential.

The most recent cohort shows a growing market for agents and infrastructure, with over 20 percent of companies …
m_ric (@aymericroucher) 's Twitter Profile Photo

I've made an open and free version of Google's NotebookLM, and it shows how high the open-source tech stack has risen! 💪

The app's workflow is simple. Given a source PDF or URL, it extracts the content from it, then tasks AI at Meta's Llama 3.3-70B with writing the podcast …
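A minimal sketch of that extract-then-generate workflow, using pypdf, requests, and huggingface_hub as plausible building blocks; the original app's actual code may well differ:

```python
import requests
from pypdf import PdfReader
from huggingface_hub import InferenceClient

def extract_text(source: str) -> str:
    """Pull raw text from a local PDF path or a URL."""
    if source.lower().endswith(".pdf"):
        # Local PDF path assumed here; a URL-hosted PDF would need downloading first.
        return "\n".join(page.extract_text() or "" for page in PdfReader(source).pages)
    return requests.get(source, timeout=30).text  # crude; a real app would strip HTML

def write_podcast_script(source: str) -> str:
    client = InferenceClient("meta-llama/Llama-3.3-70B-Instruct")
    content = extract_text(source)[:20_000]  # stay well inside the context window
    response = client.chat_completion(
        messages=[{"role": "user",
                   "content": f"Write a two-host podcast script about:\n\n{content}"}],
        max_tokens=2048,
    )
    return response.choices[0].message.content
```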
Haider. (@slow_developer) 's Twitter Profile Photo

Anthropic CPO, Mike Krieger: "over 70% of Anthropic pull requests are now generated by AI" but we're still figuring out what that means for code review and long-term architecture.

Agus 🔎 🔸 (@austinc3301) 's Twitter Profile Photo

Why is ~no one in the field of AI talking about Anthropic's On the Biology of a Large Language Model?

For the first time, we get a pretty good glimpse of how LLMs reason through complex problems internally, but no one seems to be curious enough to care.
Topaz Labs (@topazlabs) 's Twitter Profile Photo

It’s finally here. Starlight is now local in the all-new Video AI 7. And there’s more. See the release thread for every detail. 👇

Mehrdad Farajtabar (@mfarajtabar) 's Twitter Profile Photo

🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching?

The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks, …
Ludwig Schmidt (@lschmidt3) 's Twitter Profile Photo

Very excited to finally release our paper for OpenThoughts!

After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
elvis (@omarsar0) 's Twitter Profile Photo

Self-Challenging LLM Agents

Self-improving AI systems are starting to show up everywhere.

Meta and colleagues present self-improvement for general multi-turn tool-use LLM agents.

Pay attention to this one, devs!

Here are my notes:
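For readers new to the idea, this is the generic shape such self-improvement loops usually take (a sketch of the concept, not this paper's exact algorithm); `agent`, `propose_task`, and `verify` are hypothetical stand-ins:

```python
# One round of a self-challenging loop for a multi-turn tool-use agent.
def self_improvement_round(agent, propose_task, verify, n_tasks=100):
    training_data = []
    for _ in range(n_tasks):
        task = propose_task(agent)        # the agent challenges itself with a task
        trajectory = agent.run(task)      # multi-turn rollout, including tool calls
        if verify(task, trajectory):      # keep only verifiably solved attempts
            training_data.append((task, trajectory))
    agent.finetune(training_data)         # train on its own successes
    return agent
```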
SemiAnalysis (@semianalysis_) 's Twitter Profile Photo

Huawei faced the expert load balancing problem when training their mixture-of-experts (MoE) model Pangu Ultra MoE.

Expert load balancing is a compromise between training dynamics and system efficiency.

Here we explain the expert load balancing problem and Huawei's proposed …
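For context, the most common mitigation in the literature is an auxiliary load-balancing loss in the Switch Transformer style; a minimal PyTorch version (illustrative of the general technique, not Huawei's method) looks like this, assuming top-1 routing:

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits, expert_indices, num_experts):
    """Switch-Transformer-style auxiliary loss: pushes the fraction of tokens
    routed to each expert toward the uniform 1/num_experts.

    router_logits: [tokens, num_experts], expert_indices: [tokens] (top-1 choice).
    """
    probs = torch.softmax(router_logits, dim=-1)
    # f_i: fraction of tokens actually dispatched to expert i
    tokens_per_expert = F.one_hot(expert_indices, num_experts).float().mean(dim=0)
    # P_i: mean router probability mass assigned to expert i
    mean_router_prob = probs.mean(dim=0)
    return num_experts * torch.sum(tokens_per_expert * mean_router_prob)
```

In training this is added to the task loss with a small coefficient: too large and routing is uniform but quality suffers, too small and a few experts hog the tokens, which is exactly the training-dynamics vs. system-efficiency trade-off described above.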
Lisan al Gaib (@scaling01) 's Twitter Profile Photo

A few more observations after replicating the Tower of Hanoi game with their exact prompts:

- You need AT LEAST 2^N - 1 moves, and the output format requires 10 tokens per move + some constant stuff (quick arithmetic below).
- Furthermore, the output limit for Sonnet 3.7 is 128k, DeepSeek R1 64k, and …
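Taking those two numbers at face value (10 tokens per move, ignoring the constant overhead), the largest instance whose full move list fits in each output budget follows directly:

```python
# Largest Tower of Hanoi instance whose complete 2^N - 1 move list fits in the
# output budget, assuming ~10 tokens per move and ignoring constant overhead.
def max_disks(token_limit, tokens_per_move=10):
    n = 1
    while (2 ** (n + 1) - 1) * tokens_per_move <= token_limit:
        n += 1
    return n

for model, limit in [("Sonnet 3.7", 128_000), ("DeepSeek R1", 64_000)]:
    n = max_disks(limit)
    print(f"{model}: N = {n} disks ({2**n - 1} moves, ~{(2**n - 1) * 10} tokens)")
```

So even a perfect solver could only print complete solutions up to roughly N = 13 disks under a 128k output cap and N = 12 under 64k, which matters when interpreting accuracy-vs-N curves.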
Keyon Vafa (@keyonv) 's Twitter Profile Photo

Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions.

One result tells the story: a transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵

Akshay 🚀 (@akshay_pachaar) 's Twitter Profile Photo

ML researchers just built a new ensemble technique. It even outperforms XGBoost, CatBoost, and LightGBM. Here's a complete breakdown (explained visually):

elvis (@omarsar0) 's Twitter Profile Photo

One Token to Fool LLM-as-a-Judge

Watch out for this one, devs!

Semantically empty tokens, like “Thought process:”, “Solution”, or even just a colon “:”, can consistently trick models into giving false positive rewards.

Here are my notes:
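To make the failure mode concrete, here is a deliberately naive toy judge plus the kind of probe set this finding suggests you should red-team any real LLM judge with; the toy judge is my own illustration of the pattern, not the paper's setup:

```python
# Red-team probes: semantically empty strings that should score ~0 reward.
EMPTY_PROBES = ["Thought process:", "Solution", ":"]

def toy_judge(question: str, answer: str) -> float:
    """Toy judge that rewards 'reasoning-looking' surface cues; real LLM
    judges were shown to fail in a qualitatively similar way."""
    cues = ("thought process", "solution", ":")
    return 1.0 if any(c in answer.lower() for c in cues) else 0.0

def false_positive_rate(questions, judge, threshold=0.5):
    trials = [(q, p) for q in questions for p in EMPTY_PROBES]
    fooled = sum(judge(q, p) >= threshold for q, p in trials)
    return fooled / len(trials)

print(false_positive_rate(["What is 2+2?"], toy_judge))  # 1.0: every probe fools it
```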
alphaXiv (@askalphaxiv) 's Twitter Profile Photo

"How Many Instructions Can LLMs Follow at Once?" In this paper they found that leading LLMs can satisfy only about 68% of 500 concurrent instructions, showing a bias toward earlier instructions.

"How Many Instructions Can LLMs Follow at Once?"

In this paper they found that leading LLMs can satisfy only about 68% of 500 concurrent instructions, showing a bias toward earlier instructions.
bycloud (@bycloudai) 's Twitter Profile Photo

Manus posted a pretty interesting blog on “context engineering” that u don’t see often

perfect for u if u are building around LLM applications

gave me some optimization ideas that i wanna try for my app 🤔