Yujie Qian (@yujie_qian)'s Twitter Profile
Yujie Qian

@yujie_qian

Founding Research Scientist @VoyageAI | PhD @MIT_CSAIL

ID: 747802915407011840

Link: http://people.csail.mit.edu/yujieq/ · Joined: 28-06-2016 14:44:43

62 Tweets

292 Followers

201 Following

MIT Jameel Clinic for AI & Health (@aihealthmit)

Want to train chemistry prediction models but struggling w/ data extraction? Yujie Qian, Zed Li et al. propose TextReact, which retrieves text descriptions relevant to a chem. reaction & aligns them w/ the reaction's molecular representation #EMNLP2023 arxiv.org/abs/2312.04881
Tengyu Ma (@tengyuma)

OpenAI's v3 embeddings are out 🎉! Curious about their quality? We tested them on 11 code retrieval datasets & 9 industry-domain datasets:

1. OpenAI v3 > ada-002 & Cohere (except v3-small on code)
2. voyage-code-2 is the best, with a +14% margin on code & +3% on industry docs 🚀
Voyage AI (part of MongoDB) (@voyageai)

Rerankers refine the retrieval in RAG.

🆕📢 Excited to announce our first reranker, rerank-lite-1: state-of-the-art retrieval accuracy on 27 datasets across domains (law, finance, tech, long docs, etc.), enhancing various search methods, vector-based or lexical. 🧵
Tengyu Ma (@tengyuma)

🆕📢 Voyage_AI_'s new embedding model for legal and long-context retrieval and RAG: voyage-law-2!

1.🥇 #1 on the MTEB legal retrieval benchmark by a large margin
2.📜 Best quality for long context (16K)
3.✨ Improved quality across domains
4.🛒 On AWS Marketplace

#RAG #LLMs
Voyage AI (part of MongoDB) (@voyageai)

🆕 📢 voyage-large-2-instruct embedding model tops the MTEB leaderboard 🥇! huggingface.co/spaces/mteb/le…

• embedding dimension = 1024 (4x smaller than any other non-Voyage model in the top 5)
• 16K context length (2x that of OpenAI v3 large)

blog: blog.voyageai.com/2024/05/05/voy…

#RAG #LLM
Voyage AI (part of MongoDB) (@voyageai)

🆕📢 We are thrilled to launch rerank-1, our best general-purpose and multilingual reranker! It refines the ranking of your search results with cross-encoder transformers.

It outperforms Cohere's english-v3 on English datasets and multilingual-v3 on multilingual datasets 🚀.
Voyage AI (part of MongoDB) (@voyageai)

🆕 📢 Launching Voyage AI’s new embedding model for finance retrieval and RAG: voyage-finance-2!

1. ✨ Superior finance retrieval quality, with an average gain of 7% over OpenAI and 12% over Cohere.
2. 📚 32K context length
3. 🛒 On AWS Marketplace

#RAG #LLM
Voyage AI (part of MongoDB) (@voyageai)

🌍📢 Launching our multilingual embeddings, voyage-multilingual-2!

👑 Average 5.6% gain on evaluated languages, including French, German, Japanese, Spanish, and Korean
📚 32K context length
🛒 On AWS Marketplace

Check us out! 👉🏼 The first 50M tokens are on us.
#RAG #LLM
JCIM & JCTC Journals (@jcim_jctc)

OpenChemIE: An Information Extraction Toolkit for Chemistry Literature pubs.acs.org/doi/10.1021/ac… Yujie Qian, Connor W. Coley, Regina Barzilay #JCIM Vol64 Issue14 #MachineLearning #DeepLearning

Voyage AI (part of MongoDB) (@voyageai)

📢 Announcing a new generation of Voyage embedding models: voyage-3 and voyage-3-lite!

Compared with OpenAI's v3 large:
voyage-3: +7.5% accuracy, 2.2× cheaper, 3× smaller embeddings, 4× context
voyage-3-lite: +3.8% accuracy, 6× cheaper, 6× smaller embeddings, 4× context
Voyage AI (part of MongoDB) (@voyageai)

📢 Announcing a new generation of natively multilingual Voyage rerankers: rerank-2 and rerank-2-lite!

Adding them on top of OpenAI's latest v3 large embeddings improves accuracy by 13.89% and 11.86%, respectively: 2.3x and 1.7x the gain attained by the latest Cohere reranker (English v3).
Voyage AI (part of MongoDB) (@voyageai)

Thrilled to share that we've closed $28M in funding, led by CRV, with continued support from Wing VC and sarah guo // conviction. Also excited to onboard strategic partners SnowflakeDB and Databricks! voyage.ai: Building the world’s best models for RAG and search 🧵🧵🧵:

Voyage AI (part of MongoDB) (@voyageai)

📢 Announcing voyage-multimodal-3, our first multimodal embedding model!

It vectorizes interleaved text & images, capturing key visual features from screenshots of PDFs, slides, tables, figures, etc.

+19.63% accuracy gain on 3 multimodal retrieval tasks (20 datasets)! 🧵🧵
Voyage AI (part of MongoDB) (@voyageai)

Vector-based code retrieval is a critical building block in code assistants and agents. However, many people have complained about the lack of diverse, high-quality evaluation datasets for it. We surveyed existing ones and proposed some methods to build better ones. 🧵🧵
Voyage AI (part of MongoDB) (@voyageai)

📢 Announcing the voyage-code-3 embedding model!

1. More accurate: +14% gain over OpenAI-v3-large
2. Flexible dimensions (Matryoshka): 256-2048
3. Quantized embeddings: float, int8, binary
4. New Pareto frontier: (binary, 256 dim.) is 6% better than OpenAI (float, 3072 dim.) 🧵🧵
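The Matryoshka and binary-quantization points above can be illustrated with a small client-side sketch. It is not Voyage's implementation. It assumes that Matryoshka-style embeddings stay meaningful when truncated to a prefix and re-normalized, and it uses plain sign-based binarization with Hamming distance as a stand-in for whatever quantization the model actually applies. It also assumes "float" means 32-bit floats when comparing storage.

```python
from __future__ import annotations

import math


def truncate_and_renormalize(vec: list[float], dim: int) -> list[float]:
    """Matryoshka-style truncation: keep the first `dim` coordinates, then L2-normalize."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def binarize(vec: list[float]) -> int:
    """Sign-based binary quantization packed into an int: bit i is 1 if vec[i] > 0."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count of differing bits; a common similarity proxy for binary embeddings."""
    return bin(a ^ b).count("1")


def bytes_per_vector(dim: int, bits_per_value: int) -> int:
    """Raw storage for one vector, ignoring index and metadata overhead."""
    return dim * bits_per_value // 8


if __name__ == "__main__":
    # Toy 6-dim vectors (hypothetical data), truncated to their first 4 dims.
    doc = truncate_and_renormalize([0.9, -0.1, 0.3, 0.2, -0.4, 0.1], dim=4)
    query = truncate_and_renormalize([0.8, -0.2, 0.1, -0.3, 0.5, 0.0], dim=4)
    print(hamming_distance(binarize(doc), binarize(query)))  # prints 1
    # Assuming 32-bit floats: (float, 3072 dim.) vs (binary, 256 dim.).
    print(bytes_per_vector(3072, 32) // bytes_per_vector(256, 1))  # prints 384
```

Under these assumptions, the (binary, 256 dim.) setting takes 32 bytes per vector versus 12,288 bytes for (float, 3072 dim.), a 384x reduction in raw storage.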
Jiaxuan You (@youjiaxuan)

Shame on the MIT Media Lab professor. Racist activities have no place at a top AI conference like NeurIPS Conference. Imagine changing "Chinese" to another race or ethnic group: she would probably be fired by Massachusetts Institute of Technology (MIT). This is not just about Chinese people. Who knows who will be the next victim?

Voyage AI (part of MongoDB) (@voyageai)

📢 Announcing the new SOTA voyage-3-large embedding model!

• +9.74% over OpenAI and +20.71% over Cohere
• flexible dim. (256-2048) and quantizations (float, int8, binary)
• +8.56% over OpenAI with 1/24x the storage cost
• +1.16% over OpenAI with 1/192x the storage cost ($10K → $52)
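The storage ratios in the last two bullets can be sanity-checked with simple arithmetic. The tweet does not say which dimension and precision settings produce them, so everything below is an assumption:

* The OpenAI baseline is taken to be 3072-dim float32 vectors.
* int8 at 512 dims and binary at 512 dims are hypothetical configurations. They were chosen only because they give exactly 1/24 and 1/192.
* Cost is treated as scaling linearly with stored bytes, which ignores index overhead.

```python
from fractions import Fraction

# Assumed baseline: OpenAI v3 large at 3072 dimensions stored as 32-bit floats.
BASELINE_DIM = 3072
BASELINE_BITS = 32


def bytes_per_vector(dim: int, bits_per_value: int) -> int:
    """Raw bytes for one embedding, ignoring index and metadata overhead."""
    return dim * bits_per_value // 8


def storage_ratio(dim: int, bits_per_value: int) -> Fraction:
    """Storage of a (dim, precision) configuration relative to the assumed baseline."""
    return Fraction(bytes_per_vector(dim, bits_per_value),
                    bytes_per_vector(BASELINE_DIM, BASELINE_BITS))


def scaled_cost(baseline_cost: float, dim: int, bits_per_value: int) -> float:
    """Simplification: storage cost assumed to scale linearly with bytes stored."""
    return baseline_cost * float(storage_ratio(dim, bits_per_value))


if __name__ == "__main__":
    # Hypothetical configurations that reproduce the quoted ratios.
    print(storage_ratio(512, 8))  # int8, 512 dims -> prints 1/24
    print(storage_ratio(512, 1))  # binary, 512 dims -> prints 1/192
    print(round(scaled_cost(10_000, 512, 1), 2))  # prints 52.08
```

Under these assumptions, $10K × 1/192 ≈ $52.08, which matches the "$10K → $52" figure. Other dimension/precision combinations could give the same ratios.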
Tengyu Ma (@tengyuma)

We joined MongoDB! Voyage AI by MongoDB’s best-in-class embedding models and rerankers will be part of MongoDB’s best-in-class database, powering mission-critical AI applications with high-quality semantic retrieval capability.

A huge thank you to everyone with us on this journey, and to
Voyage AI (part of MongoDB) (@voyageai)

📢 Meet voyage-3.5 and voyage-3.5-lite!
• flexible dim. and quantizations
• voyage-3.5 & 3.5-lite (int8, 2048 dim.) are 8% & 6% more accurate than OpenAI-v3-large, and 2.2x & 6.5x cheaper, respectively. Also 83% less vector DB cost!
• 3.5-lite ~ Cohere-v4 in quality, but 83% cheaper.