Florent BARTOCCIONI (@fbartoc) 's Twitter Profile
Florent BARTOCCIONI

@fbartoc

Building world models at valeoAI

ID: 1258383960721231872

Link: http://f-barto.github.io | Joined: 07-05-2020 13:11:43

209 Tweets

41 Followers

672 Following

RoboPapers (@robopapers) 's Twitter Profile Photo

Full episode dropping soon! Geeking out with Paul Zhou on AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World auto-eval.github.io Co-hosted by Chris Paxton & Michael Cho - Rbt/Acc

Hongyang Li (@francislee2020) 's Twitter Profile Photo

Introducing #UniVLA, a unified VLA framework that enables policy learning across different environments, showing consistent improvements on multiple manipulation and navigation tasks. #RSS2025 github.com/OpenDriveLab/U…

Michael Cho - Rbt/Acc (@micoolcho) 's Twitter Profile Photo

This makes the past 3 years of work and often months away from my family worth it. A big shoutout to noriaki_hirose, Lydia Ignatova, Kyle Stachowicz, Catherine Glossop, Dhruv Shah, Sergey Levine for giving meaning to the work we do at FrodoBots. While some see our attempts in robotic

Rudy Gilman (@rgilman33) 's Twitter Profile Photo

The secret life of SwiGLU  

Simple neurons like those using ReLU, GELU or SiLU create a new dimension, then slice across that same dimension to lop off part of the space.  

A gated neuron, on the other hand, can align the knife however it wants. In DINO-v2 what's interesting is
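The geometric picture above can be sketched numerically. A minimal toy illustration (made-up weights, not DINO-v2's actual MLP):

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

# "Simple" neuron: project the input onto one direction w, then the
# activation cuts along that SAME direction: the knife is tied to the
# dimension the projection just created.
def simple_neuron(x, w):
    return silu(x @ w)

# Gated (SwiGLU-style) neuron: the value path x @ w and the gate path
# x @ v use independent directions, so the gate can place the cut
# along any direction v, not just along w.
def gated_neuron(x, w, v):
    return (x @ w) * silu(x @ v)
```

In the gated case the zero set of the output is controlled by v while the magnitude is controlled by w, which is the "align the knife however it wants" freedom the tweet describes.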
OpenDriveLab (@opendrivelab) 's Twitter Profile Photo

💥 Forget slow autoregression and skip rigid full-sequence denoising! Nexus is a next-gen predictive pipeline for realistic, safety-critical driving scene generation. What’s new?
✅ Decoupled diffusion → fast updates, goal-driven control
✅ Noise-masking training → inject

Edward Milsom (@edward_milsom) 's Twitter Profile Photo

To address the "parameterisation lottery" (ideas win because they work well with popular choices of e.g. learning rates) I think empirical hyperparameter transfer methods are crucial. Rules like mu-P require you to derive them first, which is painful... x.com/edward_milsom/…

Seohong Park (@seohong_park) 's Twitter Profile Photo

We found a way to do RL *only* with BC policies.

The idea is simple:

1. Train a BC policy π(a|s)
2. Train a conditional BC policy π(a|s, z)
3. Amplify(!) the difference between π(a|s, z) and π(a|s) using CFG

Here, z can be anything (e.g., goals for goal-conditioned RL).

🧵↓
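Step 3 above can be sketched for a discrete action space. A minimal illustration of the CFG-style amplification (the probability tables and guidance weight are made up, not from the thread):

```python
import numpy as np

# Classifier-free-guidance-style amplification in log space:
#   logits_w = log pi(a|s) + w * (log pi(a|s,z) - log pi(a|s))
# with guidance weight w > 1; w = 1 recovers pi(a|s,z).
def cfg_amplify(logp_uncond, logp_cond, w):
    logits = logp_uncond + w * (logp_cond - logp_uncond)
    p = np.exp(logits - logits.max())   # renormalize into a distribution
    return p / p.sum()

logp_uncond = np.log(np.array([0.5, 0.3, 0.2]))  # pi(a|s)
logp_cond   = np.log(np.array([0.3, 0.6, 0.1]))  # pi(a|s,z)
p_amp = cfg_amplify(logp_uncond, logp_cond, w=3.0)
```

With w=3 the action the conditional policy prefers is pushed well past its original 0.6 probability, which is the "amplify the difference" step.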
jack morris (@jxmnop) 's Twitter Profile Photo

new paper from our work at Meta!

**GPT-style language models memorize 3.6 bits per param**

we compute capacity by measuring total bits memorized, using some theory from Shannon (1953)

shockingly, the memorization-datasize curves look like this:
    ___________
   /
  /

(🧵)
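A back-of-the-envelope use of the headline figure. The 124M parameter count below is an illustrative assumption (roughly GPT-2-small-sized), not a number from the thread:

```python
# ~3.6 bits memorized per parameter, per the headline claim.
BITS_PER_PARAM = 3.6

def memorization_capacity_bits(num_params):
    return BITS_PER_PARAM * num_params

num_params = 124_000_000
cap_bits = memorization_capacity_bits(num_params)
cap_megabytes = cap_bits / 8 / 1e6   # ~55.8 MB of raw memorized data
```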
Xun Huang (@xunhuang1995) 's Twitter Profile Photo

Real-time video generation is finally real — without sacrificing quality. Introducing Self-Forcing, a new paradigm for training autoregressive diffusion models. The key to high quality? Simulate the inference process during training by unrolling transformers with KV caching.
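The training pattern described ("unroll with KV caching during training") can be sketched with a toy model. This is a hypothetical illustration, not the Self-Forcing implementation:

```python
import numpy as np

# Toy autoregressive "model": a linear next-frame predictor with a
# running cache standing in for transformer KV state.
class ToyARModel:
    def __init__(self, dim, seed=0):
        self.W = np.random.default_rng(seed).normal(size=(dim, dim)) * 0.1

    def init_cache(self):
        return []  # stands in for a transformer KV cache

    def step(self, frame, cache):
        cache.append(frame)            # analogous to appending K/V entries
        return self.W @ frame, cache   # predict the next frame

def self_forcing_loss(model, first_frame, targets):
    # Unroll on the model's OWN outputs, as it would run at inference,
    # instead of teacher-forcing on ground-truth frames.
    cache, frame, losses = model.init_cache(), first_frame, []
    for tgt in targets:
        frame, cache = model.step(frame, cache)   # condition on own output
        losses.append(np.mean((frame - tgt) ** 2))
    return float(np.mean(losses))
```

Teacher forcing would pass the ground-truth previous frame into `step`; here the loss sees the compounding of the model's own errors, matching the inference-time distribution.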

Robert Lange (@roberttlange) 's Twitter Profile Photo

Text-to-LoRA: What if you no longer had to fine-tune your LLM for every single downstream task?

🚀 Stoked to share our work on instant LLM adaptation using meta-learned hypernetworks 📝 →  🔥

The idea is simple yet elegant: We text-condition a hypernetwork to output LoRA
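The idea above, a text-conditioned hypernetwork emitting LoRA factors, can be sketched as follows. All shapes and the linear hypernetwork are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

# A "hypernetwork" maps a task-description embedding to LoRA factors
# A (r x d_in) and B (d_out x r); the adapted weight is W + B @ A.
d_in, d_out, r, d_task = 32, 32, 4, 16
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))               # frozen base weight
H_A = rng.normal(size=(d_task, r * d_in)) * 0.01
H_B = rng.normal(size=(d_task, d_out * r)) * 0.01

def lora_from_task(task_emb):
    A = (task_emb @ H_A).reshape(r, d_in)
    B = (task_emb @ H_B).reshape(d_out, r)
    return A, B

task_emb = rng.normal(size=d_task)  # stand-in for an encoded task description
A, B = lora_from_task(task_emb)
W_adapted = W + B @ A               # rank-<=r update, no fine-tuning step
```

The point of the design is that adaptation becomes a single forward pass of the hypernetwork rather than a per-task fine-tuning run.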
Ethan (@torchcompiled) 's Twitter Profile Photo

Modeling dolphin language is cool. Translating it into human speak is cooler. 

Somewhere you're gonna want to figure out how to align the latent space of dolphin language with that of human language in an unpaired, unbiased manner.
Brian Christian (@brianchristian) 's Twitter Profile Photo

Reward models (RMs) are the moral compass of LLMs – but no one has x-rayed them at scale. We just ran the first exhaustive analysis of 10 leading RMs, and the results were...eye-opening. Wild disagreement, base-model imprint, identity-term bias, mere-exposure quirks & more: 🧵

leloy! (@leloykun) 's Twitter Profile Photo

This effect seems to just be an artifact of SGD/Adam/AdamW/etc.; more modern optimizers, e.g. Muon/Shampoo/PSGD, don't have this 'issue'.

The crux is that the raw 'gradients' we get from backpropagation tend to have low (stable) rank, and optimizers like SGD/AdamW preserve
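The low-rank claim, and what an orthogonalizing optimizer does differently, can be sketched numerically. A toy illustration using an SVD (Muon itself approximates the orthogonalization with a Newton-Schulz iteration):

```python
import numpy as np

rng = np.random.default_rng(0)
# A hypothetical low-rank "gradient": rank <= 4 inside a 64x64 matrix.
g = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 64))

# An SGD step is a scalar multiple of g, so it has exactly the same
# rank: the update only ever touches a 4-dimensional subspace.
sgd_update = -0.01 * g

# A Muon-style step instead replaces g by its orthogonal polar factor,
# making all nonzero singular values equal, so the step is spread
# evenly across the directions the gradient contains.
u, s, vt = np.linalg.svd(g, full_matrices=False)
keep = s > 1e-8 * s.max()            # drop numerically-zero directions
muon_update = (u * keep) @ vt
```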
Y Combinator (@ycombinator) 's Twitter Profile Photo

François Chollet on the ARC Prize and how we get to AGI. At AI Startup School in San Francisco.
00:00 - The Falling Cost of Compute
00:57 - Deep-Learning’s Scaling Era & Benchmarks
01:59 - The ARC Benchmark
03:02 - The 2024 Shift to Test-Time Adaptation
05:01 - What

Arthur Douillard (@ar_douillard) 's Twitter Profile Photo

I'll discuss distributed learning on Saturday, July 12. First, I'll cover current methods that need high bandwidth, then next-generation methods for decentralized learning.

Soumith Chintala (@soumithchintala) 's Twitter Profile Photo

considering Muon is so popular and validated at scale, we've just decided to welcome a PR for it in PyTorch core by default.
If anyone wants to take a crack at it... 
github.com/pytorch/pytorc…
Shashank (@shawshank_v) 's Twitter Profile Photo

Can open-data models beat DINOv2? Today we release Franca, a fully open-sourced vision foundation model. Franca with a ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, and DINOv2 on various benchmarks, setting a new standard for open-source research 🧵
