Yujie Qian (@yujie_qian) Twitter Tweets • TwiCopy

MIT Jameel Clinic for AI & Health

2 years ago

Want to train chemistry prediction models but struggling w/ data extraction? Yujie Qian, Zed Li et al. propose TextReact, which retrieves text descriptions relevant for a chem. reaction & aligns them w/ their molecular representation #EMNLP2023 arxiv.org/abs/2312.04881

Tengyu Ma

@tengyuma

2 years ago

OpenAI’s embedding v3 were out 🎉! Curious about its quality? We tested on 11 code retrieval datasets & 9 industry-domain datasets:

1. OpenAI v3 > ada-002 & cohere (except v3-small on code)
2. voyage-code-2 is the best with + 14% margin on code & + 3% on industry docs 🚀

Voyage AI (part of MongoDB)

@voyageai

2 years ago

Rerankers refine the retrieval in RAG. 🆕📢 Excited to announce our first reranker, rerank-lite-1: state-of-the-art in retrieval accuracy on 27 datasets across domains (law, finance, tech, long docs, etc.), enhancing various search methods, vector-based or lexical. 🧵

Tengyu Ma

@tengyuma

2 years ago

🆕📢 Voyage_AI_'s new embedding model for legal and long-context retrieval and RAG: voyage-law-2!

1. 🥇 #1 on MTEB legal retrieval benchmark with a large margin
2. 📜 Best quality for long-context (16K)
3. ✨ Improved quality across domains
4. 🛒 On AWS Marketplace

#RAG #LLMs

Voyage AI (part of MongoDB)

@voyageai

2 years ago

🆕 📢 voyage-large-2-instruct embedding model tops the MTEB leaderboard 🥇! huggingface.co/spaces/mteb/le… — embedding dimension = 1024 (4x smaller than any other non-Voyage model in top-5) — 16K context length (2x of OpenAI v3 large) blog: blog.voyageai.com/2024/05/05/voy… #RAG #LLM

Voyage AI (part of MongoDB)

@voyageai

2 years ago

🆕📢 We are thrilled to launch rerank-1, our best general-purpose and multilingual reranker! It refines the ranking of your search results with cross-encoder transformers. It outperforms Cohere's english-v3 on English datasets and multilingual-v3 on multilingual datasets 🚀.

Voyage AI (part of MongoDB)

@voyageai

a year ago

🆕 📢 Launching Voyage AI’s new embedding model for finance retrieval and RAG: voyage-finance-2! 1. ✨ Superior finance retrieval quality with an average of 7% gain over OpenAI and 12% over cohere. 2. 📚 32K context length 3. 🛒 On AWS Marketplace #RAG #LLM

Yujie Qian

@yujie_qian

a year ago

The most severe academic misconduct I’ve seen in recent years

Voyage AI (part of MongoDB)

@voyageai

a year ago

🌍📢 Launching our multilingual embeddings, voyage-multilingual-2! 👑 Average 5.6% gain on evaluated languages, including French, German, Japanese, Spanish, and Korean 📚 32K context length 🛒 On AWS Marketplace Check us out! 👉🏼 The first 50M tokens are on us. #RAG #LLM

JCIM & JCTC Journals

@jcim_jctc

a year ago

OpenChemIE: An Information Extraction Toolkit for Chemistry Literature pubs.acs.org/doi/10.1021/ac… Yujie Qian Connor W. Coley Regina Barzilay #JCIM Vol64 Issue14 #MachineLearning #DeepLearning

Voyage AI (part of MongoDB)

@voyageai

a year ago

📢 Announcing a new generation of Voyage embedding models: voyage-3 and voyage-3-lite!

When compared with OpenAI's v3 large:
voyage-3: + 7.5% accuracy, 2.2× cheaper, 3× smaller embeddings, 4× context
voyage-3-lite: + 3.8% accuracy, 6× cheaper, 6× smaller embeddings, 4× context

Voyage AI (part of MongoDB)

@voyageai

a year ago

📢 Announcing a new generation of natively multilingual Voyage rerankers: rerank-2 and rerank-2-lite! Adding them on top of OpenAI's latest v3 large embeddings improves accuracy by 13.89% and 11.86%, 2.3x and 1.7x the gain attained by latest cohere reranker (English v3), resp.

Voyage AI (part of MongoDB)

@voyageai

a year ago

Thrilled to share that we've closed $28M in funding, led by CRV, with continued support from Wing VC and sarah guo // conviction. Also excited to onboard strategic partners SnowflakeDB and Databricks! voyage.ai Building the world’s best models for RAG and search 🧵🧵🧵:

Voyage AI (part of MongoDB)

@voyageai

a year ago

📢 Announcing voyage-multimodal-3, our first multimodal embedding model! It vectorizes interleaved text & images, capturing key visual features from screenshots of PDFs, slides, tables, figures, etc. +19.63% accuracy gain on 3 multimodal retrieval tasks (20 datasets)! 🧵🧵

Voyage AI (part of MongoDB)

@voyageai

a year ago

Vector-based code retrieval is a critical building block in code assistants and agents. However, many people complained about the lack of diverse, high-quality evaluation datasets for it. We surveyed existing ones and proposed some methods to build better ones. 🧵🧵

Voyage AI (part of MongoDB)

@voyageai

a year ago

📢 Announcing voyage-code-3 embedding model! 1. more accurate: + 14% gain over OpenAI-v3-large 2. flexible dimension (Matryoshka): 256-2048 3. quantized embeddings: float, int8, binary 4. new Pareto frontier: (binary,256 dim.) is 6% better than OpenAI (float,3072 dim.) 🧵🧵
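
The "flexible dimension (Matryoshka)" and "quantized embeddings" items can be illustrated with a generic sketch. This is not Voyage AI's implementation: the function names are hypothetical, and the quantization rules (scaling by 127 for int8, sign bit for binary) are common textbook choices assumed here, not details given in the announcement.

```python
import math

def truncate_and_normalize(vec, dim):
    """Matryoshka-style shortening: keep the first `dim` values, rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]

def quantize_int8(vec):
    """Assumed scheme: map floats in [-1, 1] to integers in [-127, 127]."""
    return [max(-127, min(127, round(x * 127))) for x in vec]

def quantize_binary(vec):
    """Assumed scheme: keep one sign bit per value, packed 8 values per byte."""
    bits = [1 if x > 0 else 0 for x in vec]
    bits += [0] * (-len(bits) % 8)  # pad to a whole number of bytes
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        packed.append(byte)
    return bytes(packed)

# A toy 2048-dimensional vector shortened to 256 dimensions, then binarized.
full = [math.sin(i) for i in range(2048)]
short = truncate_and_normalize(full, 256)
print(len(quantize_binary(short)), "bytes per vector")
```

Under these assumptions a 256-dimensional binary vector occupies 32 bytes, which is the kind of compact representation the "(binary,256 dim.)" item refers to.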

Jiaxuan You

@youjiaxuan

a year ago

Shame on the MIT Media Lab Professor. Racist activities shouldn't have a place at a top AI conference, NeurIPS Conference. Imagine changing "Chinese" to other races or ethnic groups: she would probably be fired by Massachusetts Institute of Technology (MIT). This is not just about Chinese people. Who knows who will be the next victim?

Voyage AI (part of MongoDB)

@voyageai

a year ago

📢 Announcing the new SOTA voyage-3-large embedding model! • 9.74% over OpenAI and +20.71% over Cohere • flexible dim. (256-2048) and quantizations (float, int8, binary) • 8.56% over OpenAI with 1/24x storage cost • 1.16% over OpenAI with 1/192x storage cost ($10K → $52)
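
The storage ratios above can be checked with simple arithmetic. The sketch below assumes a baseline of 3072 float dimensions (the OpenAI configuration named in the voyage-code-3 announcement), 32 bits per float, and 512-dimensional int8 and binary vectors as the compared settings. The announcement does not state these settings, so they are one pairing that happens to reproduce the quoted figures; the helper names are hypothetical.

```python
# Assumed bits per stored value for each precision.
BITS_PER_VALUE = {"float": 32, "int8": 8, "binary": 1}

def bytes_per_vector(dim, precision):
    """Raw storage for one embedding, ignoring index overhead."""
    return dim * BITS_PER_VALUE[precision] // 8

def storage_ratio(baseline, candidate):
    """How many times smaller the candidate is than the baseline."""
    return bytes_per_vector(*baseline) / bytes_per_vector(*candidate)

baseline = (3072, "float")  # 12288 bytes per vector
print(storage_ratio(baseline, (512, "int8")))    # 24.0
print(storage_ratio(baseline, (512, "binary")))  # 192.0
print(round(10_000 / 192))                       # 52
```

With these assumptions, a $10K storage bill divided by 192 comes to about $52, matching the figure in the announcement.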

Tengyu Ma

@tengyuma

9 months ago

We joined MongoDB! Voyage AI by MongoDB’s best-in-class embedding models and rerankers will be part of MongoDB’s best-in-class database, powering mission-critical AI applications with high-quality semantic retrieval capability. A huge thank you to everyone with us on this journey, and to

Voyage AI (part of MongoDB)

@voyageai

6 months ago

📢 Meet voyage-3.5 and voyage-3.5-lite! • flexible dim. and quantizations • voyage-3.5 & 3.5-lite (int8, 2048 dim.) are 8% & 6% more accurate than OpenAI-v3-large, and 2.2x & 6.5x cheaper, resp. Also 83% less vectorDB cost! • 3.5-lite ~ Cohere-v4 in quality, but 83% cheaper.
