Xinyan Velocity Yu (@xinyanvyu) 's Twitter Profile
Xinyan Velocity Yu

@xinyanvyu

#NLProc PhD @usc, BS/MS @uwcse | Previously @Meta @Microsoft @Pinterest | Doing random walks in Seattle | Opinions are my own.

ID: 1288519465756315648

Link: https://velocitycavalry.github.io
Joined: 29-07-2020 16:59:25

136 Tweets

915 Followers

804 Following

CLS (@chengleisi) 's Twitter Profile Photo

Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas?

After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers.
Felix Hill (@felixhill84) 's Twitter Profile Photo

Do you work in AI? Do you find things uniquely stressful right now, like never before? Have you ever suffered from a mental illness? Read my personal experience of those challenges here: docs.google.com/document/d/1aE…

Deqing Fu (@deqingfu) 's Twitter Profile Photo

It's cool to see Google DeepMind's new research showing similar findings to ours from back in April.

IsoBench (isobench.github.io, accepted to Conference on Language Modeling 2024) was curated to show the performance gap across modalities and multimodal models' preference for the text modality.
Zhaofeng Wu @ ICLR (@zhaofeng_wu) 's Twitter Profile Photo

💡We find that models “think” 💭 in English (or in general, their dominant language) when processing distinct non-English or even non-language data types 🤯 like texts in other languages, arithmetic expressions, code, visual inputs, & audio inputs ‼️ 🧵⬇️arxiv.org/abs/2411.04986

Xinyan Velocity Yu (@xinyanvyu) 's Twitter Profile Photo

Like how we might have a semantic "hub" in our brain, we find models tend to process 🤔 non-English & even non-language data (text, code, images, audio, etc.) in their dominant language, too! Thank you Zhaofeng Wu for the wonderful collaboration!

Xinyan Velocity Yu (@xinyanvyu) 's Twitter Profile Photo

Akari is extremely intelligent, caring, thoughtful, organized, and supportive. Her work is very impactful as well. Any school that hires her secures a gem!!

Zhaofeng Wu @ ICLR (@zhaofeng_wu) 's Twitter Profile Photo

We have released our code at github.com/ZhaofengWu/sem…. We hope this will be useful for future studies on understanding how LMs work!

Ollie Liu (@olliezliu) 's Twitter Profile Photo

Introducing METAGENE-1🧬, an open-source 7B-parameter metagenomics foundation model pretrained on 1.5 trillion base pairs. Built for pandemic monitoring, pathogen detection, and biosurveillance, with SOTA results across many genomics tasks.
🧵1/
Zhaofeng Wu @ ICLR (@zhaofeng_wu) 's Twitter Profile Photo

To appear @ #ICLR2025! We show that LMs represent semantically-equivalent inputs across languages, modalities, etc. similarly. This shared representation space is structured by the LM's dominant language, which is also relevant to recent phenomena where LMs "think" in Chinese 🀄️ in English 🔠 contexts
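To make the "dominant-language hub" claim concrete, here is a minimal logit-lens-style sketch of how one might peek at which vocabulary token each layer's hidden state sits closest to for a non-English input. This is my own illustration under assumptions, not the authors' released code; the model name (gpt2) and all variable names are placeholders, and a multilingual LM would be more informative in practice.

```python
# Hypothetical logit-lens-style probe (illustration only, not the paper's code):
# decode each layer's hidden state through the unembedding matrix and report
# which vocabulary token the intermediate representation is nearest to.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; swap in a multilingual causal LM for a real check
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Le chat est assis sur le tapis."  # non-English input
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

unembed = model.get_output_embeddings().weight  # (vocab_size, hidden_size)
last = inputs["input_ids"].shape[1] - 1         # probe the final input position

for layer, h in enumerate(out.hidden_states):
    # GPT-2-specific: apply the final layer norm before unembedding, as in logit lens
    normed = model.transformer.ln_f(h[0, last])
    top = tok.decode(int((normed @ unembed.T).argmax()))
    print(f"layer {layer:2d}: nearest token = {top!r}")
```

If the intermediate layers consistently decode to English tokens even for French input, that is the kind of dominant-language structure the tweet describes.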

Ziyi Liu (@liuziyi93) 's Twitter Profile Photo

[1/x]
Humans naturally understand implicit cultural values in conversation—but can LLMs do the same? We are excited to introduce CQ-Bench, a benchmark for evaluating LLMs’ cultural intelligence through dialogue. Details below 🧵👇
Yung-Sung Chuang (@yungsungchuang) 's Twitter Profile Photo

🚨Do passage rerankers really need explicit reasoning?🤔—Maybe Not!

Our findings:
⚖️Standard rerankers outperform those w/ step-by-step reasoning!
🚫Disabling reasoning in the reasoning reranker actually improves reranking accuracy!🤯
👇But, why?

📰arxiv.org/abs/2505.16886

(1/6)
Kaiser Sun (@kaiserwholearns) 's Twitter Profile Photo

What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint📑
TLDR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation.📑🧵⬇️ 1/8
#NLProc #LLM #AIResearch