Hossein Kashiani (@hossein_serein)'s Twitter Profile
Hossein Kashiani

@hossein_serein

PhD student. Interested in Computer Vision.

ID: 1268271506733309958

Joined: 03-06-2020 20:01:46

164 Tweets

128 Followers

2.2K Following

Jia-Bin Huang (@jbhuang0604)

Gave a tutorial at the Artificial Intelligence Summer School in Turkey yesterday. 

It's weird bc I can't see the audience via Zoom, and the jokes aren't even remotely funny. 

Check out the concise version here: youtube.com/watch?v=i2qSxM…
Ruizhe Li (@liruizhe94)

The research area of interpretability in LLMs is growing very fast, and it can be very hard for beginners to know where to start and for researchers to follow the latest research progress. I try to collect all relevant resources here: github.com/ruizheliUOA/Aw…

Pascal Mettes (@pascalmettes)

Vision-language models benefit from hyperbolic embeddings for standard tasks, but did you know that hyperbolic vision-language models also have surprising properties?

Our new #TMLR paper shows 3 intriguing properties.

w/ Sarah Ibrahimi, Mina Ghadimi, @nannevn, marcel worring
Atsuyuki Miyai @UTokyo (@atsumiyaiam)

How are OOD Detection, Open-set Recognition, Anomaly Detection, etc. evolving in the CLIP & GPT-4V eras?🤔

📣 Check our preprint survey on Generalized OOD detection v2! 
arxiv.org/abs/2407.21794

We encapsulate the evolution of these tasks in the VLM era, identifying challenges!
Andreas Kirsch 🇺🇦 (@blackhc)

Excited to publish a Python package that turns Andrej Karpathy's "A Recipe for Training Neural Networks" into easy-to-use diagnostics code! 🔧

No more randomly poking around in your custom PyTorch DNN to debug it.

Get simple diagnostics for your neural nets 🫶

#PyTorch 

1/
Chip Huyen (@chipro)

It’s done! 150,000 words, 200+ illustrations, 250 footnotes, and over 1200 reference links.

My editor just told me the manuscript has been sent to the printers. 

- The ebook will be coming out later this week.
- Paperback copies should be available in a few weeks (hopefully
wh (@nrehiew_)

Another diffusion paper from ICLR 2025 with 8/8/8/8 review scores.

tl;dr: they introduce shortcut models, which can generate images under any amount of inference-time compute, including a single step.
Ravid Shwartz Ziv (@ziv_ravid)

Interested in representation in intermediate layers of LLMs? What is a good representation, and how can it be measured?

Visit Oscar's poster on "Does Representation Matter? Exploring Intermediate Layers in Large Language Models"
Andrej Karpathy (@karpathy)

DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being

AI at Meta (@aiatmeta)

New from Meta FAIR — Byte Latent Transformer: Patches Scale Better Than Tokens introduces BLT, which for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency & robustness.

Paper ➡️ go.fb.me/w23lmz
Sebastian Raschka (@rasbt)

Happy New Year! To kick off the year, I've finally been able to format and upload the draft of my AI Research Highlights of 2024 article. It covers a variety of topics, from mixture-of-experts models to new LLM scaling laws for precision: magazine.sebastianraschka.com/p/ai-research-…

Lei Li (@lileics)

Is DPO better than PPO? What are the important ingredients for properly training RLHF for LLMs? How do we train them efficiently? Yi Wu from Tsinghua University gave an excellent talk on Effective RL training for LLMs at CMU LTI before #NeurIPS2024. youtube.com/watch?v=T1SeqB… Language Technologies Institute | @CarnegieMellon

Fazl Barez (@fazlbarez)

🚨 New Paper Alert: Open Problems in Machine Unlearning for AI Safety 🚨

Can AI truly "forget"? While unlearning promises data removal, controlling emergent capabilities is an inherent challenge. Here's why it matters: 👇

Paper: arxiv.org/pdf/2501.04952
1/8
Morgan Brown (@morganb)

🧵 Finally had a chance to dig into DeepSeek’s r1… Let me break down why DeepSeek's AI innovations are blowing people's minds (and possibly threatening Nvidia's $2T market cap) in simple terms...

NYU Center for Data Science (@nyudatascience)

Recent work by Md Rifat Arefin, Oscar Skean, & CDS' Ravid Shwartz Ziv and Yann LeCun shows intermediate layers in LLMs often outperform the final layer for downstream tasks. Using info theory & geometric analysis, they reveal why this happens & how it impacts models. nyudatascience.medium.com/middle-layers-…

david (@dav1d_bai)

New blog post! We can use inference-time compute to reduce hallucination rates in reasoning models by injecting an interruption token and sampling in parallel. (1/n)
Pavlo Molchanov (@pavlomolchanov)

🔥 Vision encoder upgrade: RADIOv2.5 = DFN_CLIP + DINOv2 + SAM + SigLIP + ToMe + multi-res training + teacher loss balancing + smart augmentations, CVPR2025.

Current foundation models have too many limitations: i) tailored for a single task, ii) not flexible on resolution (like