This short tutorial covers the basics of the Transformer, a neural network architecture designed for handling sequential data in machine learning.
Timestamps:
0:00 - Intro
1:18 - Motivation for developing the Transformer
2:44 - Input embeddings (start of encoder walk-through)
3:29 - Attention
6:29 - Multi-head attention
7:55 - Positional encodings
9:59 - Add & norm, feedforward, & stacking encoder layers
11:14 - Masked multi-head attention (start of decoder walk-through)
12:35 - Cross-attention
13:38 - Decoder output & prediction probabilities
14:46 - Complexity analysis
16:00 - Transformers as graph neural networks
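The scaled dot-product attention covered in the video (2:44–6:29) can be sketched in a few lines of NumPy. This is an illustrative single-head version; the function and variable names are this sketch's own, not taken from the video:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q, K: (seq_len, d_k); V: (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity scores, (seq_len, seq_len)
    # Numerically stable row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted average of value rows

# Toy example: 3 tokens, 4-dimensional queries/keys/values.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

Multi-head attention (6:29) simply runs several such attention functions in parallel on learned linear projections of Q, K, and V, then concatenates the results.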
Original Transformer paper:
Attention Is All You Need -
Other papers mentioned:
(GPT-3) Language Models are Few-Shot Learners -
(DALL-E) Zero-Shot Text-to-Image Generation -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding -