DeepSPIN (@deep_spin)'s Twitter Profile
DeepSPIN

@deep_spin

Deep structured prediction in NLP. ERC project coordinated by @andre_t_martins. Instituto de Telecomunicações.

ID: 1026171704744325120

Link: https://deep-spin.github.io/ · Joined: 05-08-2018 18:22:54

23 Tweets

349 Followers

73 Following

DeepSPIN (@deep_spin):

"Structure Back in Play, Translation Wants More Context" DeepSPINner André Martins writes on the Unbabel R&D blog his notes from this year's #icml2018 and #ACL2018: medium.com/unbabel/icml-a…

timorous bestie 😷 (@vnfrombucharest):


Towards Dynamic Computation Graphs via Sparse Latent Structure: #emnlp2018 + André Martins, Claire Cardie

- marginalize over structured latent variables with SparseMAP
- the computation graph is a function of the discrete structure
- e.g. a latent dependency TreeLSTM

pdf arxiv.org/abs/1809.00653
code github.com/vene/sparsemap…
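The sketch below (not the authors' code) illustrates the idea under simplifying assumptions: a sparse distribution is computed over a small enumerated set of candidate dependency structures, and the downstream computation graph is built only for structures that receive nonzero probability. The actual SparseMAP solver in the linked repo handles combinatorially many structures via an active-set method; plain sparsemax over an enumerated candidate set stands in for it here, and all helper names (run_tree_model, the candidate list, the scorer) are hypothetical.

```python
# Toy illustration: sparse distribution over latent structures -> the
# computation graph depends on which discrete structures survive.
import torch

def sparsemax(z: torch.Tensor) -> torch.Tensor:
    """Euclidean projection of a score vector onto the probability simplex."""
    z_sorted, _ = torch.sort(z, descending=True)
    k = torch.arange(1, z.numel() + 1, dtype=z.dtype)
    cssv = torch.cumsum(z_sorted, dim=0) - 1.0
    support = z_sorted - cssv / k > 0          # entries kept in the support
    rho = k[support][-1]
    tau = cssv[support][-1] / rho
    return torch.clamp(z - tau, min=0.0)

def run_tree_model(words: torch.Tensor, heads: list) -> torch.Tensor:
    """Toy stand-in for a TreeLSTM: mix each word with its head's embedding."""
    mixed = torch.stack([words[i] + 0.5 * words[h] for i, h in enumerate(heads)])
    return mixed.mean(dim=0)

# Three candidate dependency structures for a 4-word sentence
# (toy head assignments, not necessarily valid trees).
candidates = [[1, 1, 1, 1], [3, 0, 1, 3], [2, 2, 3, 3]]
words = torch.randn(4, 16)                      # toy word representations
scorer = torch.nn.Linear(16, len(candidates))   # one logit per candidate structure

p = sparsemax(scorer(words.mean(dim=0)))        # sparse distribution over structures
# Zero-probability candidates are never evaluated, so the graph that is built
# (and backpropagated through) is a function of the surviving discrete structures.
output = sum(p[i] * run_tree_model(words, heads)
             for i, heads in enumerate(candidates) if p[i] > 0)
print("structure probs:", p)
```

Because sparsemax zeroes out most candidates, only a handful of tree-shaped computation graphs are instantiated per example, which is what "CG a function of discrete structure" refers to in the tweet.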
DeepSPIN (@deep_spin):

DeepSPIN talks at #emnlp2018!
- Thu, 11:00 AM, talk @ BlackboxNLP: Interpretable Structure Induction via Sparse Attention. Peters/Niculae/Martins.
- Fri, 3:36 PM, main conference talk @ ML(3B): Towards Dynamic Computation Graphs via Sparse Latent Structure. Niculae/Martins/Cardie.

DeepSPIN (@deep_spin):

A nice write-up of the challenges of lemmatization by DeepSPINner Erick! Multilingual examples reveal complexities that are hard to imagine when focusing only on English.

timorous bestie 😷 (@vnfrombucharest):


Adaptively Sparse Transformers
@emnlp2019, with Gonçalo Correia and André Martins

α-entmax attention:
α=1 is softmax, α=2 is sparsemax, continuous in between.
The twist: we learn α for each head, with gradients! Some heads become dense, some sparse.

arxiv.org/abs/1909.00015
github.com/deep-spin/entm…
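A minimal sketch of the per-head learned-α idea, assuming the entmax package from the linked deep-spin repo is installed (pip install entmax). The entmax_bisect call and the sigmoid parametrization of α into (1, 2) follow my reading of the tweet and paper and may differ in detail from the released code; the module and shapes below are illustrative, not the authors' implementation.

```python
import torch
from entmax import entmax_bisect  # assumption: top-level import in the entmax package

class AdaptiveSparseAttention(torch.nn.Module):
    """Hypothetical attention head wrapper with a learnable α per head."""

    def __init__(self, num_heads: int):
        super().__init__()
        # One learnable scalar per head; mapped into (1, 2) so that
        # α near 1 behaves like softmax and α = 2 is sparsemax.
        self.alpha_logit = torch.nn.Parameter(torch.zeros(num_heads))

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # scores: (batch, heads, queries, keys)
        batch, heads, q_len, k_len = scores.shape
        alpha = 1.0 + torch.sigmoid(self.alpha_logit)             # (heads,), in (1, 2)
        alpha = alpha.view(1, heads, 1, 1).expand(batch, heads, q_len, 1)
        # α-entmax over the key dimension; gradients flow into α as well,
        # so each head can drift toward dense or sparse attention.
        return entmax_bisect(scores, alpha=alpha, dim=-1)

if __name__ == "__main__":
    attn = AdaptiveSparseAttention(num_heads=8)
    scores = torch.randn(2, 8, 5, 5)              # (batch, heads, queries, keys)
    probs = attn(scores)
    print(probs.sum(-1))                          # each row sums to 1
    print((probs == 0).float().mean())            # fraction of exactly-zero weights
```

At initialization α = 1.5 for every head (entmax-1.5-like behavior); during training each head's α is free to move toward the softmax-like or sparsemax-like end of the range.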