transformer Read
Questions What’s the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions ? Extended Learning Attention mechanisms with a recurrent network.
Bahdanau, Dzmitry. “Neural machine translation by jointly learning to align and translate.” arXiv preprint arXiv:1409.0473 (2014).
Use convolutional neural networks to compute hidden representations in parallel.
Gehring, Jonas, et al. “Convolutional sequence to sequence learning.” International conference on machine learning. PMLR, 2017.