Xiaodong Liu (@allenlao) 's Twitter Profile
Xiaodong Liu

@allenlao

Deep Learning and NLP Researcher: interested in machine learning, NLP, dogs, and cats. Opinions are my own.

ID: 535040353

Link: https://github.com/namisan · Joined: 24-03-2012 05:43:31

150 Tweets

470 Followers

254 Following

Xiaodong Liu (@allenlao) 's Twitter Profile Photo

Our recent work on large LM pretraining obtains SOTA on both GLUE and SuperGLUE. Notably, it is the first to achieve human parity on MNLI and RTE, the last two GLUE tasks on which human parity had not yet been reached. I'm cleaning up the code for SuperGLUE and will release it once that's done.

Microsoft Research (@msftresearch) 's Twitter Profile Photo

Current benchmarks may yield imprecise readings of AI models’ natural language understanding. Two new NLU benchmarks aim for more accurate evaluations. #NeurIPS2021 msft.it/6003kf7QR

Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

The Algorithms team at MSR Redmond is looking for someone with hands-on experience in NLP and deep learning tools to implement and optimize differentially private learning algorithms! A great opportunity to work with a fantastic team. careers.microsoft.com/us/en/job/1260…

Microsoft Research (@msftresearch) 's Twitter Profile Photo

When a neural network is too large to pretrain more than once, tuning its hyperparameters is practically impossible. Today, we announce μTransfer—a new technique that can tune the 6.7 billion parameter GPT-3 model using only 7% of the pretraining compute: msft.it/6009wwxJD

Greg Yang (@thegregyang) 's Twitter Profile Photo

1/ You can't train GPT-3 on a single GPU, much less tune its hyperparameters (HPs). But what if I tell you… …you *can* tune its HPs on a single GPU thanks to new theoretical advances? paper arxiv.org/abs/2203.03466 code github.com/microsoft/mup blog microsoft.com/en-us/research…

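For readers curious what "tuning on a single GPU" looks like in practice: under μP, the optimal learning rate and related hyperparameters become approximately width-independent, so you sweep them on a narrow proxy model and reuse them at full width. Below is a minimal PyTorch sketch of that workflow; the toy MLP, the 1/fan_in initialization, and the per-width Adam learning-rate scaling are a simplified paraphrase of the μP rules, not the actual mup package API (see the paper and github.com/microsoft/mup for the real thing).

```python
# Sketch: width-independent hyperparameter transfer in the spirit of muTransfer.
# Assumptions: a plain MLP stands in for the real model; the rules below
# (1/fan_in init variance, base_width/width Adam learning-rate scaling for
# matrix-like parameters) are a simplification of muP for illustration only.
import torch
import torch.nn as nn

BASE_WIDTH = 256  # width of the small proxy model used for tuning


def build_mlp(width: int, d_in: int = 32, d_out: int = 10) -> nn.Sequential:
    """A toy MLP whose hidden width we scale up after tuning."""
    model = nn.Sequential(
        nn.Linear(d_in, width),
        nn.ReLU(),
        nn.Linear(width, width),
        nn.ReLU(),
        nn.Linear(width, d_out),
    )
    for m in model:
        if isinstance(m, nn.Linear):
            # 1/fan_in variance keeps activations O(1) as width grows.
            nn.init.normal_(m.weight, std=m.in_features ** -0.5)
            nn.init.zeros_(m.bias)
    return model


def make_optimizer(model: nn.Sequential, base_lr: float, width: int) -> torch.optim.Adam:
    """Scale the Adam learning rate of matrix-like parameters by base_width/width,
    so a base_lr tuned on the proxy model transfers to wider models."""
    matrix_params, other_params = [], []
    for p in model.parameters():
        (matrix_params if p.ndim >= 2 else other_params).append(p)
    return torch.optim.Adam([
        {"params": matrix_params, "lr": base_lr * BASE_WIDTH / width},
        {"params": other_params, "lr": base_lr},
    ])


# Tune base_lr on the small model, then reuse it unchanged for the wide model.
base_lr = 3e-3  # e.g. found by sweeping on the proxy
small, large = build_mlp(BASE_WIDTH), build_mlp(4096)
opt_small = make_optimizer(small, base_lr, BASE_WIDTH)
opt_large = make_optimizer(large, base_lr, 4096)
```
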
Databricks Mosaic Research (@dbrxmosaicai) 's Twitter Profile Photo

Today, an exciting paper from Microsoft Research: Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer arxiv.org/abs/2203.03466 While it's too early to say, this may be remembered as the single biggest efficiency advancement in hyperparameter tuning.

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Today Meta AI is sharing OPT-175B, the first 175-billion-parameter language model to be made available to the broader AI research community. OPT-175B can generate creative text on a vast range of topics. Learn more & request access: ai.facebook.com/blog/democrati…

Nathan Benaich (@nathanbenaich) 's Twitter Profile Photo

🤓In 2017, Google researchers introduced the Transformer in "Attention Is All You Need", which took AI by storm. 5 startups were born: Adept (🏦 Air Street Capital), Inceptive, NEAR Protocol, @CohereAI, CharacterAI. Only 1 of the 8 authors remains at Google AI; another is at OpenAI. 😉

Yaqing Wang (@yaqing_wang) 's Twitter Profile Photo

🚨[New Paper] Check out our recent work on parameter-efficient fine-tuning. We introduce a new method that boosts Adapter performance enough to outperform full-model fine-tuning. Great collaboration with Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Awadallah, and Jianfeng Gao.
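
For background, adapter-based fine-tuning keeps the pretrained backbone frozen and trains only small bottleneck modules inserted into each layer. A minimal PyTorch sketch of a residual bottleneck adapter is below; the module, dimensions, and initialization are illustrative and are not the specific method introduced in the paper.

```python
# Sketch: a residual bottleneck adapter for parameter-efficient fine-tuning.
# Illustrative only -- not the specific method introduced in the paper.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Down-project, nonlinearity, up-project, plus a residual connection.
    Only these few parameters are trained; the backbone stays frozen."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        nn.init.zeros_(self.up.weight)  # start as a (near) identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))


# Usage: freeze the backbone, train only the adapters (and the task head).
# backbone = ...  # e.g. a pretrained transformer with adapters inserted per layer
# for p in backbone.parameters():
#     p.requires_grad = False
adapter = Adapter(d_model=768)
hidden = torch.randn(2, 16, 768)  # (batch, seq_len, d_model)
out = adapter(hidden)             # same shape, adapted representation
```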

William Fedus (@liamfedus) 's Twitter Profile Photo

Today we're releasing all Switch Transformer models in T5X/JAX, including the 1.6T param Switch-C and the 395B param Switch-XXL models. Pleased to have these open-sourced! github.com/google-researc… All thanks to the efforts of James Lee-Thorp, Adam Roberts, and Hyung Won Chung
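
For context on how a 1.6T-parameter model stays trainable: Switch Transformers replace each feed-forward block with a set of experts and route every token to exactly one of them, so compute per token stays roughly constant as parameter count grows. A minimal PyTorch sketch of top-1 ("switch") routing is below; it is illustrative only and omits the capacity limits, load-balancing loss, and distributed expert placement used in the released T5X/JAX models.

```python
# Sketch: top-1 ("switch") routing over a small set of expert FFNs.
# Illustrative only: the released Switch Transformer models are T5X/JAX and use
# capacity limits, load-balancing losses, and distributed experts.
import torch
import torch.nn as nn


class SwitchFFN(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is sent to exactly one expert.
        probs = torch.softmax(self.router(x), dim=-1)
        gate, idx = probs.max(dim=-1)  # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # Scale by the gate value so the router receives gradients.
                out[mask] = gate[mask, None] * expert(x[mask])
        return out


moe = SwitchFFN(d_model=128, d_ff=512, n_experts=4)
tokens = torch.randn(32, 128)
y = moe(tokens)  # (32, 128): each token processed by its selected expert
```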

MMitchell (@mmitchell_ai) 's Twitter Profile Photo

Q: ACM FAccT (main AI Ethics conf) was $10,000 short. They also turned down Google sponsorship due to G's continued refusal to address structural discrimination & trauma to me & @timnitGebru (@dair-community.social/bsky.social) specifically. Is there any issue w/ me starting a GoFundMe to make up the diff?

Tuo Zhao (@tourzhao) 's Twitter Profile Photo

Need scalable and efficient large language models for long sequences? Check out our SPADE models at arxiv.org/abs/2212.08136. By leveraging a state space layer, SPADE addresses the limited long-range dependency modeling of transformer models that use local attention. (1/3)

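The thread describes SPADE as pairing a state space layer (for global, long-range mixing) with efficient local attention. A heavily simplified PyTorch sketch of such a hybrid layer is below; the per-channel exponential-moving-average branch stands in for the actual S4-style state space layer, and chunked attention stands in for the paper's local attention, so treat it as an illustration of the idea rather than the SPADE architecture.

```python
# Sketch: a SPADE-style hybrid layer -- a global state-space-like branch plus
# local (chunked) attention. Heavily simplified: the per-channel exponential
# moving average below stands in for the real S4 state space layer, and
# non-overlapping chunked attention stands in for the paper's local attention.
import torch
import torch.nn as nn


class HybridLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8, chunk: int = 64):
        super().__init__()
        self.chunk = chunk
        # Global branch: a learned per-channel decay, i.e. a 1-dim linear "state space".
        self.decay_logit = nn.Parameter(torch.randn(d_model))
        # Local branch: full attention within fixed-size chunks only.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(2 * d_model, d_model)

    def global_branch(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); causal EMA: h_t = a * h_{t-1} + (1 - a) * x_t
        a = torch.sigmoid(self.decay_logit)
        h = torch.zeros_like(x[:, 0])
        outs = []
        for t in range(x.size(1)):
            h = a * h + (1 - a) * x[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)

    def local_branch(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        xc = x.reshape(b * (s // self.chunk), self.chunk, d)
        attn_out, _ = self.attn(xc, xc, xc)
        return attn_out.reshape(b, s, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.cat([self.global_branch(x), self.local_branch(x)], dim=-1))


layer = HybridLayer(d_model=256)
x = torch.randn(2, 512, 256)  # seq_len must be a multiple of `chunk` in this sketch
y = layer(x)                  # (2, 512, 256)
```
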
Yann LeCun (@ylecun) 's Twitter Profile Photo

LLMs are still making sh*t up. That's fine if you use them as writing assistants. Not good as question answerers, search engines, etc. RLHF merely mitigates the most frequent mistakes without actually fixing the problem.

Jeff Dean (@jeffdean) 's Twitter Profile Photo

Bard is now available in the US and UK, w/more countries to come. It’s great to see early Google AI work reflected in it—advances in sequence learning, large neural nets, Transformers, responsible AI techniques, dialog systems & more. You can try it at bard.google.com

Rada Mihalcea (@radamihalcea) 's Twitter Profile Photo

Drago loved his family and was a deeply caring father. His daughter, Victoria, has a disability and requires extensive care. We are raising money to help Drago’s family continue to provide Victoria with the care she needs. Any help will be appreciated! 🙏🏼 gofund.me/34bde687

Geoffrey Hinton (@geoffreyhinton) 's Twitter Profile Photo

In the NYT today, Cade Metz implies that I left Google so that I could criticize Google. Actually, I left so that I could talk about the dangers of AI without considering how this impacts Google. Google has acted very responsibly.