Xibin Zhou (@xibinbayeszhou) 's Twitter Profile
Xibin Zhou

@xibinbayeszhou

Ph.D. candidate at Westlake University in Hangzhou, China. Currently working on AI-assisted biology.

ID: 1569321833593466880

Joined: 12-09-2022 13:47:49

61 Tweets

51 Followers

107 Following

fajie yuan (@duguyuan) 's Twitter Profile Photo

Excited to share ProTrek, a fast & accurate protein search tool!
30x/60x better seq-func/func-seq retrieval
100x faster than Foldseek & MMseqs2
9 tasks: seq-struc, seq-func, struc-func, etc.
Beats ESM2 in 9/11 tasks
Thanks to Sergey Ovchinnikov chentongwang biorxiv.org/content/10.110…

fajie yuan (@duguyuan) 's Twitter Profile Photo

Introducing ProTrek, a 3-modal PLM for protein seq, struc, and func:

✨ Trained on 40M protein-text pairs, 100x larger than ProteinCLIP, ProtST, ProteinCLAP
🚀 30x/60x better accuracy than ProtST, ProteinCLAP
⚡ 100x faster than Foldseek, MMseqs2 for similar-function searches
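To make the "3-modal" idea concrete, here is a minimal sketch of CLIP-style contrastive alignment between two of the modalities (protein and text). The encoders, dimensions, and batch are stand-ins; this shows the general technique, not ProTrek's actual training code.

```python
# Minimal sketch of CLIP-style contrastive alignment between a protein
# encoder and a text encoder. Illustrative only -- not ProTrek's code.
import torch
import torch.nn.functional as F

def contrastive_loss(protein_emb, text_emb, temperature=0.07):
    # Normalize so inner products become cosine similarities.
    p = F.normalize(protein_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = p @ t.T / temperature        # (batch, batch) similarity matrix
    targets = torch.arange(p.size(0))     # matching pairs lie on the diagonal
    # Symmetric InfoNCE: protein->text and text->protein directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Toy usage with random embeddings standing in for encoder outputs.
protein_emb = torch.randn(8, 256)
text_emb = torch.randn(8, 256)
loss = contrastive_loss(protein_emb, text_emb)
```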
fajie yuan (@duguyuan) 's Twitter Profile Photo

ProTrek supports 9 search tasks (see b): seq-struc, seq-func, struc-func, struc-seq, func-struc, func-seq, seq-seq, struc-struc, func-func. It's the most universal search tool and a great supplement to Foldseek & MMseqs2 for similar-function protein searches. 1/

fajie yuan (@duguyuan) 's Twitter Profile Photo

ProTrek is also a universal protein language model, supporting diverse tasks like ESM2 and SaProt. It outperforms ESM2 in 9/11 downstream tasks. 2/

fajie yuan (@duguyuan) 's Twitter Profile Photo

ProTrek was trained on the largest protein-text dataset: 14M precise pairs and 25M noisy pairs. Thanks to data scaling, it achieves 30x/60x improvement in seq-to-text & text-to-seq search over 2 recent models (ProtST, ProteinCLAP) that used only 500k pairs. 3/

fajie yuan (@duguyuan) 's Twitter Profile Photo

ProTrek uses a maximum inner product search (MIPS) algorithm, completing searches over billion-scale databases in seconds, 100x faster than Foldseek and MMseqs2.

It also shows significant accuracy improvements when searching for proteins with similar functions. 4/
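For a rough picture of how MIPS retrieval over precomputed embeddings looks in practice, here is a toy example with FAISS's exact inner-product index. The dimensions and data are made up; this is generic MIPS, not ProTrek's pipeline.

```python
# Toy maximum inner product search (MIPS) over precomputed embeddings
# using FAISS. Generic illustration, not ProTrek's actual index.
import numpy as np
import faiss

dim = 128
db = np.random.randn(100_000, dim).astype("float32")  # stand-in protein embeddings
query = np.random.randn(1, dim).astype("float32")     # stand-in query embedding

index = faiss.IndexFlatIP(dim)        # exact inner-product index
index.add(db)
scores, ids = index.search(query, 5)  # top-5 highest inner products
print(ids[0], scores[0])
```

At billion scale an approximate index (e.g., IVF or HNSW) would replace the flat one, but the search call itself is unchanged.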
fajie yuan (@duguyuan) 's Twitter Profile Photo

🚀 We've released the model weights of ProTrek on Huggingface and GitHub: 🔗Git: github.com/westlake-repl/… 🔗Huggingface: huggingface.co/westlake-repl/… 🎉Check out the demo on the Swiss-Prot database: huggingface.co/spaces/westlak… 🌐 Access to UniProt coming soon with more engineering! 5/
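The repository names in the tweet are truncated, so the snippet below only shows the generic way to pull released weights from the Hugging Face Hub; the repo id is a hypothetical placeholder, not the real one.

```python
# Generic download of released weights from the Hugging Face Hub.
# "westlake-repl/<protrek-model>" is a placeholder, not the real repo id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="westlake-repl/<protrek-model>")
print("weights downloaded to", local_dir)
```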

Xibin Zhou (@xibinbayeszhou) 's Twitter Profile Photo

Why doesn't o1-preview think any more??? The questions were counting the r's in strawberry and whether 9.11 is bigger than 9.9. The answers it gave were 2, and that 9.11 is bigger. What's wrong with my o1?

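For reference, both questions have trivial ground truths that can be checked directly:

```python
# Ground truth for the two test questions in the tweet.
print("strawberry".count("r"))  # 3, not 2
print(9.11 > 9.9)               # False: 9.9 is the larger number
```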
Sebastian Raschka (@rasbt) 's Twitter Profile Photo

"What Matters In Transformers?" is an interesting paper (arxiv.org/abs/2406.15786) that finds you can actually remove half of the attention layers in LLMs like Llama without noticeably reducing modeling performance. The concept is relatively simple. The authors delete attention

"What Matters In Transformers?" is an interesting paper (arxiv.org/abs/2406.15786) that finds you can actually remove half of the attention layers in LLMs like Llama without noticeably reducing modeling performance.

The concept is relatively simple. The authors delete attention
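A minimal sketch of the general idea being described: the attention sub-layer in selected transformer blocks is replaced by a no-op so only the MLP (and residual) path remains. This is illustrative PyTorch, not the paper's procedure or Llama's code.

```python
# Sketch of dropping the attention sub-layer in selected transformer blocks.
# Illustrative pseudostructure, not the paper's code or Llama's classes.
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim, use_attention=True):
        super().__init__()
        self.use_attention = use_attention
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        if self.use_attention:          # attention path can be skipped entirely
            h = self.norm1(x)
            x = x + self.attn(h, h, h)[0]
        x = x + self.mlp(self.norm2(x))
        return x

# e.g. keep attention in every other block ("remove half of the attention layers")
blocks = nn.ModuleList([Block(256, use_attention=(i % 2 == 0)) for i in range(12)])
```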
Dimitriadis Nikos @ ICLR (@nikdimitriadis) 's Twitter Profile Photo

Fine-tuning pre-trained models leads to catastrophic forgetting: gains on one task cause losses on others. These issues worsen in multi-task merging scenarios. Enter LiNeS 📈, a method to solve them with ease. 🔥 🌐: lines-merging.github.io 📜: arxiv.org/abs/2410.17146 🧵 1/11
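For context on the "multi-task merging" setting mentioned above, here is a bare-bones task-vector merge (add each fine-tuned model's delta onto the base weights). This is the standard background setup in which interference arises, not LiNeS itself; see the linked paper for its actual method.

```python
# Bare-bones multi-task merging by task arithmetic: add each fine-tuned
# checkpoint's delta ("task vector") back onto the base weights.
# Background setting only -- not LiNeS itself.
import torch

def merge(base_state, finetuned_states, alpha=0.5):
    merged = {k: v.clone() for k, v in base_state.items()}
    for ft in finetuned_states:
        for k in merged:
            merged[k] += alpha * (ft[k] - base_state[k])  # this model's task vector
    return merged

# Toy usage with two "fine-tuned" checkpoints of a one-parameter model.
base = {"w": torch.zeros(3)}
ft_a = {"w": torch.tensor([1.0, 0.0, 0.0])}
ft_b = {"w": torch.tensor([0.0, 1.0, 0.0])}
print(merge(base, [ft_a, ft_b]))
```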

Science Robotics (@scirobotics) 's Twitter Profile Photo

A new Viewpoint in Science #Robotics discusses how #automating physical tasks in the #laboratory could enable faster and safer scientific progress. bit.ly/4hvsg9Q

Jorge Bravo (@bravo_abad) 's Twitter Profile Photo

Unveiling chemical knowledge in Large Language Models

Molecules remain vital to advances in medicine and materials, but designing or describing them can be time-intensive. Data-driven approaches now help propose candidate structures, predict properties, or interpret features.
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

HALOGEN is a comprehensive benchmark with automated verifiers that decomposes LLM outputs into atomic facts and analyzes them to detect and classify hallucinations across diverse tasks.

Methods in this Paper 🔧:

→ HALOGEN tests LLMs on 9 different domains like coding,
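A toy illustration of the decompose-then-verify idea: split a response into atomic claims and count how many are unsupported by a reference set. The sentence splitter and reference lookup are naive stand-ins, not HALOGEN's actual verifiers.

```python
# Toy decompose-then-verify loop: split a model response into atomic claims
# and count how many lack support in a reference set. Stand-in logic,
# not HALOGEN's automated verifiers.
def atomic_claims(response: str) -> list[str]:
    # naive splitter: one claim per sentence
    return [s.strip() for s in response.split(".") if s.strip()]

def hallucination_rate(response: str, reference: set[str]) -> float:
    claims = atomic_claims(response)
    unsupported = [c for c in claims if c not in reference]
    return len(unsupported) / max(len(claims), 1)

reference = {"Python was created by Guido van Rossum"}
response = "Python was created by Guido van Rossum. Python was first released in 1989."
print(hallucination_rate(response, reference))  # 0.5 -> one of two claims unsupported
```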
AK (@_akhaliq) 's Twitter Profile Photo

Google presents Evolving Deeper LLM Thinking

Controlling for inference cost, we find that Mind Evolution significantly outperforms other inference strategies such as Best-of-N and Sequential Revision in natural language planning tasks. In the TravelPlanner and Natural Plan
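As background on the Best-of-N baseline named above: sample N candidate answers and keep the highest-scoring one under some evaluator. The generate and score functions below are hypothetical stand-ins, not the paper's actual components.

```python
# Best-of-N inference baseline: sample N candidates, keep the best-scoring one.
# `generate` and `score` are hypothetical stand-ins for an LLM call and an evaluator.
import random

def generate(prompt: str) -> str:
    return f"{prompt} -> plan #{random.randint(0, 999)}"  # stand-in LLM sample

def score(candidate: str) -> float:
    return random.random()                                # stand-in evaluator

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Plan a 3-day trip"))
```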
fajie yuan (@duguyuan) 's Twitter Profile Photo

🚀 Exciting update! We've integrated ProTrek as the RAG retriever for Evolla's online demo. Now Evolla is even cooler, ready to tackle real-world protein function challenges! 🧬✨ Try it out: chat-protein.com GitHub: github.com/westlake-repl/… Feedback welcome!
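A generic retrieval-augmented generation (RAG) loop looks roughly like this: embed the question, retrieve the closest database entries, and prepend them to the prompt. The embed, database, and chat pieces below are hypothetical placeholders, not Evolla's or ProTrek's APIs.

```python
# Generic RAG loop: retrieve the most relevant database entries for a question,
# then condition the answer on them. `embed`, `database`, and `chat` are
# hypothetical placeholders, not the Evolla/ProTrek APIs.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)                  # stand-in encoder

database = {desc: embed(desc) for desc in [
    "Catalyzes the hydrolysis of ATP",
    "Membrane transporter for glucose",
    "DNA-binding transcription factor",
]}

def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    return sorted(database, key=lambda d: -float(q @ database[d]))[:k]

def chat(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return prompt                                  # a real system would call an LLM here

print(chat("What does this protein do?"))
```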