Hossein Kashiani (@hossein_serein)'s Twitter Profile
Hossein Kashiani

@hossein_serein

PhD student. Interested in Computer Vision.

ID: 1268271506733309958

Joined: 03-06-2020 20:01:46

164 Tweets

128 Followers

2.2K Following

Jia-Bin Huang (@jbhuang0604)

Gave a tutorial at the Artificial Intelligence Summer School in Turkey yesterday. 

It's weird bc I can't see the audience via Zoom, and the jokes aren't even remotely funny. 

Check out the concise version here: youtube.com/watch?v=i2qSxM…
Ruizhe Li (@liruizhe94)

The research area of interpretability in LLMs is growing very fast, and it can be very hard for beginners to know where to start and for researchers to follow the latest research progress. I try to collect all relevant resources here: github.com/ruizheliUOA/Aw…

Pascal Mettes (@pascalmettes)

Vision-language models benefit from hyperbolic embeddings for standard tasks, but did you know that hyperbolic vision-language models also have surprising properties?

Our new #TMLR paper shows 3 intriguing properties.

w/ Sarah Ibrahimi, Mina Ghadimi, @nannevn, marcel worring
Atsuyuki Miyai @UTokyo (@atsumiyaiam)

How are OOD Detection, Open-set Recognition, Anomaly Detection, etc. evolving in the CLIP & GPT-4V eras?🤔

📣 Check our preprint survey on Generalized OOD detection v2! 
arxiv.org/abs/2407.21794

We encapsulate the evolution of these tasks in the VLM era, identifying challenges!
Andreas Kirsch 🇺🇦 (@blackhc)

Excited to publish a Python package that turns Andrej Karpathy's "A Recipe for Training Neural Networks" into easy-to-use diagnostics code! 🔧

No more randomly poking around in your custom PyTorch DNN to debug it.

Get simple diagnostics for your neural nets 🫶

#PyTorch 

1/
Chip Huyen (@chipro)

It’s done! 150,000 words, 200+ illustrations, 250 footnotes, and over 1200 reference links.

My editor just told me the manuscript has been sent to the printers. 

- The ebook will be coming out later this week.
- Paperback copies should be available in a few weeks (hopefully
wh (@nrehiew_)

Another diffusion paper from ICLR 2025 with 8/8/8/8 review scores.

tl;dr: they introduce shortcut models, which can generate images under any amount of inference-time compute, including a single step.
Ravid Shwartz Ziv (@ziv_ravid)

Interested in representation in intermediate layers of LLMs? What is a good representation, and how can it be measured?

Visit Oscar's poster on "Does Representation Matter? Exploring Intermediate Layers in Large Language Models"
Andrej Karpathy (@karpathy)

DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being

AI at Meta (@aiatmeta)

New from Meta FAIR — Byte Latent Transformer: Patches Scale Better Than Tokens introduces BLT, which for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency & robustness.

Paper ➡️ go.fb.me/w23lmz
Sebastian Raschka (@rasbt)

Happy New Year! To kick off the year, I've finally been able to format and upload the draft of my AI Research Highlights of 2024 article. It covers a variety of topics, from mixture-of-experts models to new LLM scaling laws for precision: magazine.sebastianraschka.com/p/ai-research-…

Lei Li (@lileics)

Is DPO better than PPO? What are the important ingredients for properly training RLHF for LLMs? How do we train them efficiently? Yi Wu from Tsinghua University gave an excellent talk on Effective RL training for LLMs at CMU LTI before #NeurIPS2024. youtube.com/watch?v=T1SeqB… Language Technologies Institute | @CarnegieMellon

Fazl Barez (@fazlbarez)

🚨 New Paper Alert: Open Problems in Machine Unlearning for AI Safety 🚨

Can AI truly "forget"? While unlearning promises data removal, controlling emergent capabilities is an inherent challenge. Here's why it matters: 👇

Paper: arxiv.org/pdf/2501.04952
1/8
Morgan Brown (@morganb)

🧵 Finally had a chance to dig into DeepSeek’s r1… Let me break down why DeepSeek's AI innovations are blowing people's minds (and possibly threatening Nvidia's $2T market cap) in simple terms...

NYU Center for Data Science (@nyudatascience)

Recent work by Md Rifat Arefin, Oscar Skean, & CDS' Ravid Shwartz Ziv and Yann LeCun shows intermediate layers in LLMs often outperform the final layer for downstream tasks. Using info theory & geometric analysis, they reveal why this happens & how it impacts models. nyudatascience.medium.com/middle-layers-…

david (@dav1d_bai)

New blog post! We can use inference-time compute to reduce hallucination rates in reasoning models by injecting an interruption token and sampling in parallel. (1/n)
Pavlo Molchanov (@pavlomolchanov)

🔥 Vision encoder upgrade: RADIOv2.5 = DFN_CLIP + DINOv2 + SAM + SigLIP + ToMe + multi-res training + teacher loss balancing + smart augmentations, CVPR2025.

Current foundation models have too many limitations: i) tailored for a single task, ii) not flexible on resolution (like