ExpeL is now accepted at #AAAI24! The code and camera-ready version will be updated promptly. Thanks to all the collaborators, and see you in Vancouver!
Check us out at our #NeurIPS2023 poster! We investigate the Q-value divergence phenomenon in offline RL and find self-excitation to be the main cause. Using LayerNorm in RL models can fundamentally prevent it from happening. arxiv.org/pdf/2310.04411…
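A minimal sketch of the idea (an assumed architecture, not the paper's exact critic): LayerNorm placed after each hidden layer of the Q-network, which keeps feature magnitudes bounded and blocks the self-excitation loop.

```python
import torch
import torch.nn as nn

# Sketch of a Q-network with LayerNorm after each hidden layer; the width
# and depth here are illustrative, not the paper's configuration.
class QNetwork(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),  # normalize features before activation
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # scalar Q(s, a)
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))
```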
Excited to share our #NeurIPS2023 spotlight paper! 🌟 It proposes a novel offline-to-online RL algorithm that efficiently utilizes collected samples by training a family of policies offline and selecting suitable ones online. Check out our paper for details! arxiv.org/abs/2310.17966
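A hedged sketch of the "family of policies" idea (heavily simplified; `beta` and `select_action` are illustrative names, and the paper's actual selection mechanism may differ): one network conditioned on a balance coefficient acts as the policy family offline, and the critic scores family members per state online.

```python
import torch
import torch.nn as nn

# One network conditioned on a coefficient `beta` stands in for a family
# of policies; different betas give different members.
class ConditionedPolicy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, beta], dim=-1))

def select_action(policy, critic, state, betas):
    """Online selection: for each state in the batch, pick the family
    member whose action the critic scores highest."""
    candidates = [policy(state, b.expand(state.shape[0], 1)) for b in betas]
    actions = torch.stack(candidates)                             # (K, B, A)
    scores = torch.stack([critic(state, a) for a in candidates])  # (K, B, 1)
    best = scores.squeeze(-1).argmax(dim=0)                       # (B,)
    return actions[best, torch.arange(state.shape[0])]
```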
Our recent work: Agent Attention!
[High Performance & Linear Complexity]
[Doubles the speed of Stable Diffusion and enhances generation quality; no additional fine-tuning required]
Paper and code:
huggingface.co/papers/2312.08…
arxiv.org/abs/2312.08874
github.com/LeapLabTHU/Age…
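A minimal sketch of the agent-attention computation (the shapes and the pooling choice are simplifying assumptions; the released module differs): a small set of agent tokens first aggregates information from all keys/values, then broadcasts it back to every query, so the cost is linear in sequence length.

```python
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, num_agents=49):
    """q, k, v: (batch, N, dim). Two softmax attentions routed through
    n << N agent tokens cost O(N*n) rather than O(N^2)."""
    B, N, d = q.shape
    scale = d ** -0.5
    # Agent tokens: here simply the queries average-pooled down to n tokens.
    a = F.adaptive_avg_pool1d(q.transpose(1, 2), num_agents).transpose(1, 2)
    # 1) Agent aggregation: agents attend to all keys/values (n x N).
    v_a = F.softmax((a @ k.transpose(1, 2)) * scale, dim=-1) @ v
    # 2) Agent broadcast: each query attends to the n agents (N x n).
    return F.softmax((q @ a.transpose(1, 2)) * scale, dim=-1) @ v_a
```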
EfficientTrain++ is accepted by TPAMI 2024 🤩
🔥An off-the-shelf, easy-to-implement algorithm for efficiently training visual foundation backbones!
🔥1.5−3.0× lossless training/pre-training speedup on ImageNet-1K/22K!
Paper&Code:
arxiv.org/abs/2405.08768
github.com/LeapLabTHU/Eff…
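A hedged sketch of the low-frequency-cropping operation underlying this line of work (simplified; the released code implements the full training curriculum): early in training, inputs are reduced to their low-frequency components, which also shrinks their resolution and hence the compute per step.

```python
import torch

def low_freq_crop(images: torch.Tensor, out_size: int) -> torch.Tensor:
    """Keep only the central out_size x out_size block of the centered
    2D spectrum of (B, C, H, W) images, then invert back to pixels."""
    spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    H, W = images.shape[-2:]
    top, left = (H - out_size) // 2, (W - out_size) // 2
    crop = spec[..., top:top + out_size, left:left + out_size]
    out = torch.fft.ifft2(torch.fft.ifftshift(crop, dim=(-2, -1)))
    # Rescale so pixel magnitudes are preserved after cropping the spectrum.
    return out.real * (out_size * out_size) / (H * W)
```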
📢Excited to share our recent work on Large Multimodal Models: ConvLLaVA. Instead of encoding multiple image patches or using multiple encoders, we use a hierarchical backbone, ConvNeXt, to realize high-resolution understanding.
arxiv.org/pdf/2405.15738
ConvLLaVA
Hierarchical Backbones as Visual Encoder for Large Multimodal Models
High-resolution Large Multimodal Models (LMMs) encounter the challenges of excessive visual tokens and quadratic visual complexity. Current high-resolution LMMs address the quadratic complexity while still generating excessive visual tokens.
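Illustrative token arithmetic (assumed patch size and stride, not ConvLLaVA's exact configs) showing why a hierarchical encoder helps: a final stage that downsamples 32x emits far fewer visual tokens than 14x14 patchification at the same resolution.

```python
def vit_tokens(res: int, patch: int = 14) -> int:
    # Plain ViT: one token per patch, so count grows quadratically with res.
    return (res // patch) ** 2

def hierarchical_tokens(res: int, stride: int = 32) -> int:
    # Hierarchical backbone: one token per stride x stride region.
    return (res // stride) ** 2

for res in (336, 672, 1344):
    print(res, vit_tokens(res), hierarchical_tokens(res))
# 336 -> 576 vs 100 tokens; 672 -> 2304 vs 441; 1344 -> 9216 vs 1764.
```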