Brandon Trabucco @ ICLR (@brandontrabucco)'s Twitter Profile
Brandon Trabucco @ ICLR

@brandontrabucco

AI/ML PhD Student at @mldcmu advised by @rsalakhu, Deep Learning, recipient of the @NDSEG Fellowship, musician soundcloud.com/brandontrabucco

ID: 2801069142

Link: https://btrabuc.co | Joined: 10-09-2014 04:42:30

289 Tweets

645 Followers

298 Following

Brandon Trabucco @ ICLR (@brandontrabucco):

With the success of LLM agents like OpenAI Operator, we are entering a new scaling era, but how do we train these agent models? We present InSTA, the largest training environment for LLM agents, containing live web navigation tasks for 150k diverse websites in multiple

Sachin Goyal (@goyalsachin007):

Think your LLM loves endless pre-training? 🚨 Think again! Plot twist ahead! 🎢 While it (obviously) gives better base models, it might not necessarily give a better starting point for all the fancy post-training we do these days (instruction FT, multimodal FT, etc.). 👀

Bowen Wang (@bowenwangnlp):

🎮 Computer Use Agent Arena is LIVE! 🚀 🔥 Easiest way to test computer-use agents in the wild without any setup 🌟 Compare top VLMs: OpenAI Operator, Claude 3.7, Gemini 2.5 Pro, Qwen 2.5 VL, and more 🕹️ Test agents on 100+ real apps & websites with one-click config 🔒 Safe & free

Agentica Project (@agentica_):

Introducing DeepCoder-14B-Preview - our fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math. The best part is, we're releasing everything: not just the model, but the dataset, code, and training recipe—so you can train it yourself! 🔥 Links below:

Brandon Trabucco @ ICLR (@brandontrabucco):

๐ŸŒ Building web-scale agents, and tired of Math and Coding tasks? Come chat with us at ICLR in Singapore. We are presenting InSTA at the DATA-FM workshop in the second Oral session, April 28th 2:30pm. InSTA is the largest environment for training agents, spanning 150k live

Christina Baek (@_christinabaek):

When we train models to do QA, are we robustly improving context dependency? No! In our ICLR Oral (Fri 11 AM), we show that if the base model knows the facts already, it shortcuts and learns to ignore the context completely! Visit us to learn more about knowledge conflicts 😀

Pratyush Maini (@pratyushmaini):

Join me & @hbxnov at #ICLR2025 for our very purple poster on risks of LLM evals by private companies! 🕒 Today, 10am | 🪧 #219 Beyond Llama drama, LMSYS incorporation & ARC-AGI train/test fiasco, we discuss irreducible biases—even when firms act in good faith. Come say hi! 💜

Brandon Trabucco @ ICLR (@brandontrabucco):

Building LLM Agents? Come to my talk at the #ICLR DATA-FM workshop today at 2:30pm, Hall 4, Section 4. I'll be presenting InSTA, our work building the largest environment for agents on the live internet. arxiv.org/abs/2502.06776 #Agents #LLM

Tianhao Wang ("Jiachen") @ICLR (@jiachenwang97):

It was challenging to organize the workshop as the sole in-person organizer, and I'm deeply grateful to everyone for their incredible support in making it a great success. Danqi Chen Peter Henderson Kyle Lo Vahab Mirrokni Bryan Kian Hsiang Low Xinran Gu Brandon Trabucco Zheng Xu, Edward Yeo,

MIT Media Lab (@medialab):

30+ years of Media Lab students, alumni, and postdocs at CHI 2025 in Yokohama! Photo courtesy of Professor Pattie Maes. #chi2025

Stefano Ermon (@stefanoermon):

They're here. 🔥 Inception's diffusion LLMs — lightning fast, state-of-the-art, and now public. Go build the future → platform.inceptionlabs.ai #GenAI #dLLMs #diffusion

Xin Eric Wang @ ICLR 2025 (@xwang_lk):

๐˜๐˜ถ๐˜ฎ๐˜ข๐˜ฏ๐˜ด ๐˜ต๐˜ฉ๐˜ช๐˜ฏ๐˜ฌ ๐˜ง๐˜ญ๐˜ถ๐˜ช๐˜ฅ๐˜ญ๐˜บโ€”๐˜ฏ๐˜ข๐˜ท๐˜ช๐˜จ๐˜ข๐˜ต๐˜ช๐˜ฏ๐˜จ ๐˜ข๐˜ฃ๐˜ด๐˜ต๐˜ณ๐˜ข๐˜ค๐˜ต ๐˜ค๐˜ฐ๐˜ฏ๐˜ค๐˜ฆ๐˜ฑ๐˜ต๐˜ด ๐˜ฆ๐˜ง๐˜ง๐˜ฐ๐˜ณ๐˜ต๐˜ญ๐˜ฆ๐˜ด๐˜ด๐˜ญ๐˜บ, ๐˜ง๐˜ณ๐˜ฆ๐˜ฆ ๐˜ง๐˜ณ๐˜ฐ๐˜ฎ ๐˜ณ๐˜ช๐˜จ๐˜ช๐˜ฅ ๐˜ญ๐˜ช๐˜ฏ๐˜จ๐˜ถ๐˜ช๐˜ด๐˜ต๐˜ช๐˜ค ๐˜ฃ๐˜ฐ๐˜ถ๐˜ฏ๐˜ฅ๐˜ข๐˜ณ๐˜ช๐˜ฆ๐˜ด. But current reasoning models remain constrained by discrete tokens, limiting their full

๐˜๐˜ถ๐˜ฎ๐˜ข๐˜ฏ๐˜ด ๐˜ต๐˜ฉ๐˜ช๐˜ฏ๐˜ฌ ๐˜ง๐˜ญ๐˜ถ๐˜ช๐˜ฅ๐˜ญ๐˜บโ€”๐˜ฏ๐˜ข๐˜ท๐˜ช๐˜จ๐˜ข๐˜ต๐˜ช๐˜ฏ๐˜จ ๐˜ข๐˜ฃ๐˜ด๐˜ต๐˜ณ๐˜ข๐˜ค๐˜ต ๐˜ค๐˜ฐ๐˜ฏ๐˜ค๐˜ฆ๐˜ฑ๐˜ต๐˜ด ๐˜ฆ๐˜ง๐˜ง๐˜ฐ๐˜ณ๐˜ต๐˜ญ๐˜ฆ๐˜ด๐˜ด๐˜ญ๐˜บ, ๐˜ง๐˜ณ๐˜ฆ๐˜ฆ ๐˜ง๐˜ณ๐˜ฐ๐˜ฎ ๐˜ณ๐˜ช๐˜จ๐˜ช๐˜ฅ ๐˜ญ๐˜ช๐˜ฏ๐˜จ๐˜ถ๐˜ช๐˜ด๐˜ต๐˜ช๐˜ค ๐˜ฃ๐˜ฐ๐˜ถ๐˜ฏ๐˜ฅ๐˜ข๐˜ณ๐˜ช๐˜ฆ๐˜ด. But current reasoning models remain constrained by discrete tokens, limiting their full
David Bau (@davidbau):

Dear MAGA friends, I have been worrying about STEM in the US a lot, because right now the Senate is writing new laws that cut 75% of the STEM budget in the US. Sorry for the long post, but the issue is really important, and I want to share what I know about it. The entire

Jason Weston (@jaseweston):

🚨 Self-Challenging Language Model Agents 🚨 📝: arxiv.org/abs/2506.01716 A new paradigm to train LLM agents to use different tools with challenging self-generated data ONLY: Self-challenging agents (SCA) both propose new tasks and solve them, using self-generated verifiers to

Chhavi Yadav (@chhaviyadav_):

Upon graduation, I paused to reflect on what my PhD had truly taught me. Was it just how to write papers, respond to brutal reviewer comments, and survive without much sleep? Or did it leave a deeper imprint on me—beyond the metrics and milestones? Turns out, it did. A

Gokul Swamy (@g_k_swamy):

Say ahoy to SAILOR ⛵: a new paradigm of *learning to search* from demonstrations, enabling test-time reasoning about how to recover from mistakes w/o any additional human feedback! SAILOR ⛵ outperforms Diffusion Policies trained via behavioral cloning on 5-10x data!