Xiaodong Liu (@allenlao) 's Twitter Profile
Xiaodong Liu

@allenlao

Deep Learning and NLP Researcher: interested in machine learning, NLP, dogs, and cats. Opinions are my own.

ID: 535040353

Link: https://github.com/namisan · Joined: 24-03-2012 05:43:31

150 Tweets

470 Followers

254 Following

Xiaodong Liu (@allenlao) 's Twitter Profile Photo

Our recent work on large LM pretraining obtains SOTA on both GLUE and SuperGLUE. Notably, it is the first to achieve human parity on MNLI and RTE, the last two GLUE tasks on which human parity had not yet been reached. I'm cleaning up the code for SuperGLUE and will release it once that's done.

Microsoft Research (@msftresearch) 's Twitter Profile Photo

Current benchmarks may yield imprecise readings of AI models’ natural language understanding. Two new NLU benchmarks aim for more accurate evaluations. #NeurIPS2021 msft.it/6003kf7QR

Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

The Algorithms team at MSR Redmond is looking for someone with hands-on experience in NLP and deep learning tools to implement and optimize differentially private learning algorithms! A great opportunity to work with a fantastic team. careers.microsoft.com/us/en/job/1260…

Microsoft Research (@msftresearch) 's Twitter Profile Photo

When a neural network is too large to pretrain more than once, tuning its hyperparameters is practically impossible. Today, we announce μTransfer—a new technique that can tune the 6.7 billion parameter GPT-3 model using only 7% of the pretraining compute: msft.it/6009wwxJD

Greg Yang (@thegregyang) 's Twitter Profile Photo

1/ You can't train GPT-3 on a single GPU, much less tune its hyperparameters (HPs). But what if I tell you… …you *can* tune its HPs on a single GPU thanks to new theoretical advances? paper arxiv.org/abs/2203.03466 code github.com/microsoft/mup blog microsoft.com/en-us/research…

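For readers curious what "tuning on a single GPU" looks like in practice: under μP, the optimal learning rate and related hyperparameters become approximately width-independent, so you sweep them on a narrow proxy model and reuse them at full width. Below is a minimal PyTorch sketch of that workflow; the toy MLP, the 1/fan_in initialization, and the per-width Adam learning-rate scaling are a simplified paraphrase of the μP rules, not the actual mup package API (see the paper and github.com/microsoft/mup for the real thing).

```python
# Sketch: width-independent hyperparameter transfer in the spirit of muTransfer.
# Assumptions: a plain MLP stands in for the real model; the rules below
# (1/fan_in init variance, base_width/width Adam learning-rate scaling for
# matrix-like parameters) are a simplification of muP for illustration only.
import torch
import torch.nn as nn

BASE_WIDTH = 256  # width of the small proxy model used for tuning


def build_mlp(width: int, d_in: int = 32, d_out: int = 10) -> nn.Sequential:
    """A toy MLP whose hidden width we scale up after tuning."""
    model = nn.Sequential(
        nn.Linear(d_in, width),
        nn.ReLU(),
        nn.Linear(width, width),
        nn.ReLU(),
        nn.Linear(width, d_out),
    )
    for m in model:
        if isinstance(m, nn.Linear):
            # 1/fan_in variance keeps activations O(1) as width grows.
            nn.init.normal_(m.weight, std=m.in_features ** -0.5)
            nn.init.zeros_(m.bias)
    return model


def make_optimizer(model: nn.Sequential, base_lr: float, width: int) -> torch.optim.Adam:
    """Scale the Adam learning rate of matrix-like parameters by base_width/width,
    so a base_lr tuned on the proxy model transfers to wider models."""
    matrix_params, other_params = [], []
    for p in model.parameters():
        (matrix_params if p.ndim >= 2 else other_params).append(p)
    return torch.optim.Adam([
        {"params": matrix_params, "lr": base_lr * BASE_WIDTH / width},
        {"params": other_params, "lr": base_lr},
    ])


# Tune base_lr on the small model, then reuse it unchanged for the wide model.
base_lr = 3e-3  # e.g. found by sweeping on the proxy
small, large = build_mlp(BASE_WIDTH), build_mlp(4096)
opt_small = make_optimizer(small, base_lr, BASE_WIDTH)
opt_large = make_optimizer(large, base_lr, 4096)
```
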
Databricks Mosaic Research (@dbrxmosaicai) 's Twitter Profile Photo

Today, an exciting paper from Microsoft Research: Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer arxiv.org/abs/2203.03466 While it's too early to say, this may be remembered as the single biggest efficiency advancement in hyperparameter tuning.

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Today Meta AI is sharing OPT-175B, the first 175-billion-parameter language model to be made available to the broader AI research community. OPT-175B can generate creative text on a vast range of topics. Learn more & request access: ai.facebook.com/blog/democrati…

Nathan Benaich (@nathanbenaich) 's Twitter Profile Photo

🤓In 2017, Google researchers introduced the Transformer in "Attention Is All You Need", which took AI by storm. 5 startups were born: Adept (🏦 Air Street Capital), Inceptive, NEAR Protocol, @CohereAI, CharacterAI. Only 1 of the 8 authors remains at Google AI; another is at OpenAI. 😉

Yaqing Wang (@yaqing_wang) 's Twitter Profile Photo

🚨[New Paper] Check out our recent work on parameter-efficient fine-tuning. We introduce a new method that boosts Adapter performance enough to outperform full-model fine-tuning. Great collaboration with Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Awadallah, and Jianfeng Gao.
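
For background, adapter-based fine-tuning keeps the pretrained backbone frozen and trains only small bottleneck modules inserted into each layer. A minimal PyTorch sketch of a residual bottleneck adapter is below; the module, dimensions, and initialization are illustrative and are not the specific method introduced in the paper.

```python
# Sketch: a residual bottleneck adapter for parameter-efficient fine-tuning.
# Illustrative only -- not the specific method introduced in the paper.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Down-project, nonlinearity, up-project, plus a residual connection.
    Only these few parameters are trained; the backbone stays frozen."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        nn.init.zeros_(self.up.weight)  # start as a (near) identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))


# Usage: freeze the backbone, train only the adapters (and the task head).
# backbone = ...  # e.g. a pretrained transformer with adapters inserted per layer
# for p in backbone.parameters():
#     p.requires_grad = False
adapter = Adapter(d_model=768)
hidden = torch.randn(2, 16, 768)  # (batch, seq_len, d_model)
out = adapter(hidden)             # same shape, adapted representation
```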

William Fedus (@liamfedus) 's Twitter Profile Photo

Today we're releasing all Switch Transformer models in T5X/JAX, including the 1.6T param Switch-C and the 395B param Switch-XXL models. Pleased to have these open-sourced! github.com/google-researc… All thanks to the efforts of James Lee-Thorp, Adam Roberts, and Hyung Won Chung
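
For context on how a 1.6T-parameter model stays trainable: Switch Transformers replace each feed-forward block with a set of experts and route every token to exactly one of them, so compute per token stays roughly constant as parameter count grows. A minimal PyTorch sketch of top-1 ("switch") routing is below; it is illustrative only and omits the capacity limits, load-balancing loss, and distributed expert placement used in the released T5X/JAX models.

```python
# Sketch: top-1 ("switch") routing over a small set of expert FFNs.
# Illustrative only: the released Switch Transformer models are T5X/JAX and use
# capacity limits, load-balancing losses, and distributed experts.
import torch
import torch.nn as nn


class SwitchFFN(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is sent to exactly one expert.
        probs = torch.softmax(self.router(x), dim=-1)
        gate, idx = probs.max(dim=-1)  # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # Scale by the gate value so the router receives gradients.
                out[mask] = gate[mask, None] * expert(x[mask])
        return out


moe = SwitchFFN(d_model=128, d_ff=512, n_experts=4)
tokens = torch.randn(32, 128)
y = moe(tokens)  # (32, 128): each token processed by its selected expert
```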

MMitchell (@mmitchell_ai) 's Twitter Profile Photo

Q: ACM FAccT (main AI Ethics conf) was $10,000 short. They also turned down Google sponsorship due to G's continued refusal to address structural discrimination & trauma to me & @timnitGebru (@dair-community.social/bsky.social) specifically. Is there any issue w/ me starting a GoFundMe to make up the diff?

Tuo Zhao (@tourzhao) 's Twitter Profile Photo

Need scalable and efficient large language models for long sequences? Check out our SPADE models at arxiv.org/abs/2212.08136. By leveraging a state space layer, SPADE addresses the limited long-range dependency modeling of transformer models that use local attention. (1/3)

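The thread describes SPADE as pairing a state space layer (for global, long-range mixing) with efficient local attention. A heavily simplified PyTorch sketch of such a hybrid layer is below; the per-channel exponential-moving-average branch stands in for the actual S4-style state space layer, and chunked attention stands in for the paper's local attention, so treat it as an illustration of the idea rather than the SPADE architecture.

```python
# Sketch: a SPADE-style hybrid layer -- a global state-space-like branch plus
# local (chunked) attention. Heavily simplified: the per-channel exponential
# moving average below stands in for the real S4 state space layer, and
# non-overlapping chunked attention stands in for the paper's local attention.
import torch
import torch.nn as nn


class HybridLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8, chunk: int = 64):
        super().__init__()
        self.chunk = chunk
        # Global branch: a learned per-channel decay, i.e. a 1-dim linear "state space".
        self.decay_logit = nn.Parameter(torch.randn(d_model))
        # Local branch: full attention within fixed-size chunks only.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(2 * d_model, d_model)

    def global_branch(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); causal EMA: h_t = a * h_{t-1} + (1 - a) * x_t
        a = torch.sigmoid(self.decay_logit)
        h = torch.zeros_like(x[:, 0])
        outs = []
        for t in range(x.size(1)):
            h = a * h + (1 - a) * x[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)

    def local_branch(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        xc = x.reshape(b * (s // self.chunk), self.chunk, d)
        attn_out, _ = self.attn(xc, xc, xc)
        return attn_out.reshape(b, s, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.cat([self.global_branch(x), self.local_branch(x)], dim=-1))


layer = HybridLayer(d_model=256)
x = torch.randn(2, 512, 256)  # seq_len must be a multiple of `chunk` in this sketch
y = layer(x)                  # (2, 512, 256)
```
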
Yann LeCun (@ylecun) 's Twitter Profile Photo

LLMs are still making sh*t up. That's fine if you use them as writing assistants. Not good as question answerers, search engines, etc. RLHF merely mitigates the most frequent mistakes without actually fixing the problem.

Jeff Dean (@jeffdean) 's Twitter Profile Photo

Bard is now available in the US and UK, w/more countries to come. It’s great to see early Google AI work reflected in it—advances in sequence learning, large neural nets, Transformers, responsible AI techniques, dialog systems & more. You can try it at bard.google.com

Rada Mihalcea (@radamihalcea) 's Twitter Profile Photo

Drago loved his family and was a deeply caring father. His daughter, Victoria, has a disability and requires extensive care. We are raising money to help Drago’s family continue to provide Victoria with the care she needs. Any help will be appreciated! 🙏🏼 gofund.me/34bde687

Geoffrey Hinton (@geoffreyhinton) 's Twitter Profile Photo

In the NYT today, Cade Metz implies that I left Google so that I could criticize Google. Actually, I left so that I could talk about the dangers of AI without considering how this impacts Google. Google has acted very responsibly.