Xibin Zhou (@xibinbayeszhou) 's Twitter Profile
Xibin Zhou

@xibinbayeszhou

Ph.D. candidate at Westlake University in Hangzhou, China. Currently working on AI-assisted biology.

ID: 1569321833593466880

Joined: 12-09-2022 13:47:49

61 Tweets

51 Followers

107 Following

fajie yuan (@duguyuan) 's Twitter Profile Photo

Excited to share ProTrek, a fast & accurate protein search tool!
30x/60x better seq-func/func-seq retrieval
100x faster than Foldseek & MMseqs2
9 tasks: seq-struc, seq-func, struc-func, etc.
Beats ESM2 in 9/11 tasks
Thanks to Sergey Ovchinnikov chentongwang biorxiv.org/content/10.110…

fajie yuan (@duguyuan) 's Twitter Profile Photo

Introducing ProTrek, a 3-modal PLM for protein seq, struc, and func:

✨ Trained on 40M protein-text pairs, 100x larger than ProteinCLIP, ProtST, ProteinCLAP
🚀 30x/60x better accuracy than ProtST, ProteinCLAP
⚡ 100x faster than Foldseek, MMseqs2 for similar-function searches
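To make the "3-modal" idea concrete, here is a minimal sketch of CLIP-style contrastive alignment between two of the modalities (protein and text). The encoders, dimensions, and batch are stand-ins; this shows the general technique, not ProTrek's actual training code.

```python
# Minimal sketch of CLIP-style contrastive alignment between a protein
# encoder and a text encoder. Illustrative only -- not ProTrek's code.
import torch
import torch.nn.functional as F

def contrastive_loss(protein_emb, text_emb, temperature=0.07):
    # Normalize so inner products become cosine similarities.
    p = F.normalize(protein_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = p @ t.T / temperature        # (batch, batch) similarity matrix
    targets = torch.arange(p.size(0))     # matching pairs lie on the diagonal
    # Symmetric InfoNCE: protein->text and text->protein directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Toy usage with random embeddings standing in for encoder outputs.
protein_emb = torch.randn(8, 256)
text_emb = torch.randn(8, 256)
loss = contrastive_loss(protein_emb, text_emb)
```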
fajie yuan (@duguyuan) 's Twitter Profile Photo

ProTrek supports 9 search tasks (see b): seq-struc, seq-func, struc-func, struc-seq, func-struc, func-seq, seq-seq, struc-struc, func-func. It's the most universal search tool and a great supplement to Foldseek & MMseqs2 for similar-function protein searches. 1/

fajie yuan (@duguyuan) 's Twitter Profile Photo

ProTrek is also a universal protein language model, supporting diverse tasks like ESM2 and SaProt. It outperforms ESM2 in 9/11 downstream tasks. 2/

fajie yuan (@duguyuan) 's Twitter Profile Photo

ProTrek was trained on the largest protein-text dataset: 14M precise pairs and 25M noisy pairs. Thanks to data scaling, it achieves 30x/60x improvement in seq-to-text & text-to-seq search over 2 recent models (ProtST, ProteinCLAP) that used only 500k pairs. 3/

fajie yuan (@duguyuan) 's Twitter Profile Photo

ProTrek uses a maximum inner product search (MIPS) algorithm, completing searches over billion-scale databases in seconds, 100x faster than Foldseek and MMseqs2.

It also shows significant accuracy improvements when searching for proteins with similar functions. 4/
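For a rough picture of how MIPS retrieval over precomputed embeddings looks in practice, here is a toy example with FAISS's exact inner-product index. The dimensions and data are made up; this is generic MIPS, not ProTrek's pipeline.

```python
# Toy maximum inner product search (MIPS) over precomputed embeddings
# using FAISS. Generic illustration, not ProTrek's actual index.
import numpy as np
import faiss

dim = 128
db = np.random.randn(100_000, dim).astype("float32")  # stand-in protein embeddings
query = np.random.randn(1, dim).astype("float32")     # stand-in query embedding

index = faiss.IndexFlatIP(dim)        # exact inner-product index
index.add(db)
scores, ids = index.search(query, 5)  # top-5 highest inner products
print(ids[0], scores[0])
```

At billion scale an approximate index (e.g., IVF or HNSW) would replace the flat one, but the search call itself is unchanged.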
fajie yuan (@duguyuan) 's Twitter Profile Photo

🚀 We've released the model weights of ProTrek on Huggingface and GitHub: 🔗Git: github.com/westlake-repl/… 🔗Huggingface: huggingface.co/westlake-repl/… 🎉Check out the demo on the Swiss-Prot database: huggingface.co/spaces/westlak… 🌐 Access to UniProt coming soon with more engineering! 5/
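The repository names in the tweet are truncated, so the snippet below only shows the generic way to pull released weights from the Hugging Face Hub; the repo id is a hypothetical placeholder, not the real one.

```python
# Generic download of released weights from the Hugging Face Hub.
# "westlake-repl/<protrek-model>" is a placeholder, not the real repo id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="westlake-repl/<protrek-model>")
print("weights downloaded to", local_dir)
```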

Xibin Zhou (@xibinbayeszhou) 's Twitter Profile Photo

Why doesn't o1-preview think any more??? The questions were counting the r's in strawberry and whether 9.11 is bigger than 9.9. The answers it gave were 2, and that 9.11 is bigger. What's wrong with my o1?

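For reference, both questions have trivial ground truths that can be checked directly:

```python
# Ground truth for the two test questions in the tweet.
print("strawberry".count("r"))  # 3, not 2
print(9.11 > 9.9)               # False: 9.9 is the larger number
```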
Sebastian Raschka (@rasbt) 's Twitter Profile Photo

"What Matters In Transformers?" is an interesting paper (arxiv.org/abs/2406.15786) that finds you can actually remove half of the attention layers in LLMs like Llama without noticeably reducing modeling performance. The concept is relatively simple. The authors delete attention

"What Matters In Transformers?" is an interesting paper (arxiv.org/abs/2406.15786) that finds you can actually remove half of the attention layers in LLMs like Llama without noticeably reducing modeling performance.

The concept is relatively simple. The authors delete attention
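A minimal sketch of the general idea being described: the attention sub-layer in selected transformer blocks is replaced by a no-op so only the MLP (and residual) path remains. This is illustrative PyTorch, not the paper's procedure or Llama's code.

```python
# Sketch of dropping the attention sub-layer in selected transformer blocks.
# Illustrative pseudostructure, not the paper's code or Llama's classes.
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim, use_attention=True):
        super().__init__()
        self.use_attention = use_attention
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        if self.use_attention:          # attention path can be skipped entirely
            h = self.norm1(x)
            x = x + self.attn(h, h, h)[0]
        x = x + self.mlp(self.norm2(x))
        return x

# e.g. keep attention in every other block ("remove half of the attention layers")
blocks = nn.ModuleList([Block(256, use_attention=(i % 2 == 0)) for i in range(12)])
```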
Dimitriadis Nikos @ ICLR (@nikdimitriadis) 's Twitter Profile Photo

Fine-tuning pre-trained models leads to catastrophic forgetting: gains on one task cause losses on others. These issues worsen in multi-task merging scenarios. Enter LiNeS 📈, a method to solve them with ease. 🔥 🌐: lines-merging.github.io 📜: arxiv.org/abs/2410.17146 🧵 1/11
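For context on the "multi-task merging" setting mentioned above, here is a bare-bones task-vector merge (add each fine-tuned model's delta onto the base weights). This is the standard background setup in which interference arises, not LiNeS itself; see the linked paper for its actual method.

```python
# Bare-bones multi-task merging by task arithmetic: add each fine-tuned
# checkpoint's delta ("task vector") back onto the base weights.
# Background setting only -- not LiNeS itself.
import torch

def merge(base_state, finetuned_states, alpha=0.5):
    merged = {k: v.clone() for k, v in base_state.items()}
    for ft in finetuned_states:
        for k in merged:
            merged[k] += alpha * (ft[k] - base_state[k])  # this model's task vector
    return merged

# Toy usage with two "fine-tuned" checkpoints of a one-parameter model.
base = {"w": torch.zeros(3)}
ft_a = {"w": torch.tensor([1.0, 0.0, 0.0])}
ft_b = {"w": torch.tensor([0.0, 1.0, 0.0])}
print(merge(base, [ft_a, ft_b]))
```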

Science Robotics (@scirobotics) 's Twitter Profile Photo

A new Viewpoint in Science #Robotics discusses how #automating physical tasks in the #laboratory could enable faster and safer scientific progress. bit.ly/4hvsg9Q

Jorge Bravo (@bravo_abad) 's Twitter Profile Photo

Unveiling chemical knowledge in Large Language Models

Molecules remain vital to advances in medicine and materials, but designing or describing them can be time-intensive. Data-driven approaches now help propose candidate structures, predict properties, or interpret features.
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

HALOGEN is a comprehensive benchmark with automated verifiers that decomposes LLM outputs into atomic facts and analyzes them to detect and classify hallucinations across diverse tasks.

Methods in this Paper 🔧:

→ HALOGEN tests LLMs on 9 different domains like coding,
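A toy illustration of the decompose-then-verify idea: split a response into atomic claims and count how many are unsupported by a reference set. The sentence splitter and reference lookup are naive stand-ins, not HALOGEN's actual verifiers.

```python
# Toy decompose-then-verify loop: split a model response into atomic claims
# and count how many lack support in a reference set. Stand-in logic,
# not HALOGEN's automated verifiers.
def atomic_claims(response: str) -> list[str]:
    # naive splitter: one claim per sentence
    return [s.strip() for s in response.split(".") if s.strip()]

def hallucination_rate(response: str, reference: set[str]) -> float:
    claims = atomic_claims(response)
    unsupported = [c for c in claims if c not in reference]
    return len(unsupported) / max(len(claims), 1)

reference = {"Python was created by Guido van Rossum"}
response = "Python was created by Guido van Rossum. Python was first released in 1989."
print(hallucination_rate(response, reference))  # 0.5 -> one of two claims unsupported
```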
AK (@_akhaliq) 's Twitter Profile Photo

Google presents Evolving Deeper LLM Thinking

Controlling for inference cost, we find that Mind Evolution significantly outperforms other inference strategies such as Best-of-N and Sequential Revision in natural language planning tasks. In the TravelPlanner and Natural Plan
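As background on the Best-of-N baseline named above: sample N candidate answers and keep the highest-scoring one under some evaluator. The generate and score functions below are hypothetical stand-ins, not the paper's actual components.

```python
# Best-of-N inference baseline: sample N candidates, keep the best-scoring one.
# `generate` and `score` are hypothetical stand-ins for an LLM call and an evaluator.
import random

def generate(prompt: str) -> str:
    return f"{prompt} -> plan #{random.randint(0, 999)}"  # stand-in LLM sample

def score(candidate: str) -> float:
    return random.random()                                # stand-in evaluator

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Plan a 3-day trip"))
```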
fajie yuan (@duguyuan) 's Twitter Profile Photo

🚀 Exciting update! We've integrated ProTrek as the RAG retriever for Evolla's online demo. Now Evolla is even cooler, ready to tackle real-world protein function challenges! 🧬✨ Try it out: chat-protein.com GitHub: github.com/westlake-repl/… Feedback welcome!
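A generic retrieval-augmented generation (RAG) loop looks roughly like this: embed the question, retrieve the closest database entries, and prepend them to the prompt. The embed, database, and chat pieces below are hypothetical placeholders, not Evolla's or ProTrek's APIs.

```python
# Generic RAG loop: retrieve the most relevant database entries for a question,
# then condition the answer on them. `embed`, `database`, and `chat` are
# hypothetical placeholders, not the Evolla/ProTrek APIs.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)                  # stand-in encoder

database = {desc: embed(desc) for desc in [
    "Catalyzes the hydrolysis of ATP",
    "Membrane transporter for glucose",
    "DNA-binding transcription factor",
]}

def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    return sorted(database, key=lambda d: -float(q @ database[d]))[:k]

def chat(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return prompt                                  # a real system would call an LLM here

print(chat("What does this protein do?"))
```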