Generating neural machine instructions for multi-head attention
An animal can do training and inference every day of its existence until the day of its death. A forward pass is all you need.
Here is the current model configuration that I use for multi-head attention:
let sequence_length = 6;
let vocab_size = 20;
let n_embd = 64;
let num_heads = 8;
You can see the full model here: https://github.com/sebhtml/novigrad/blob/8df6192829edd9cda3a221e12affa770fbda2a1b/src/models/mega_man_attention.rs
The model struct is built from Embedding, MultiHeadAttention, Linear, and Softmax:
struct MegaManAttentionModel {
    input_shape: Vec<usize>,
    output_shape: Vec<usize>,
    vocab_size: usize,
    sequence_length: usize,
    embedding: Embedding,
    multi_head_attention: MultiHeadAttention,
    linear: Linear,
    softmax: Softmax,
}
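The layer list can be used to sanity-check the parameter count that Novigrad reports in the listing further down (Parameters: 20888). The sketch below is one decomposition that reproduces that number. It is inferred from the configuration and the listing, not taken from the Novigrad source: it assumes a head dimension of n_embd / num_heads, biases stored as full [sequence_length, columns] matrices, and one [sequence_length, sequence_length] mask per head that is counted as a parameter. The function name parameter_count is hypothetical.

```rust
// Hypothetical helper: one parameter layout that adds up to the reported total.
// Every shape below is an assumption inferred from the configuration and the listing.
fn parameter_count(
    sequence_length: usize,
    vocab_size: usize,
    n_embd: usize,
    num_heads: usize,
) -> usize {
    // Assumed: the embedding width is split evenly across heads (64 / 8 = 8).
    let head_dim = n_embd / num_heads;

    // Embedding table (t0 in the listing).
    let embedding = vocab_size * n_embd;

    // Assumed: each query/key/value projection has a weight [n_embd, head_dim]
    // and a bias [sequence_length, head_dim].
    let projection = n_embd * head_dim + sequence_length * head_dim;
    // Assumed: each head also owns a [sequence_length, sequence_length] mask.
    let mask = sequence_length * sequence_length;
    let per_head = 3 * projection + mask;

    // Projection applied after Concat (t57, t58 in the listing).
    let output_projection = n_embd * n_embd + sequence_length * n_embd;
    // Final Linear layer (t59, t60 in the listing).
    let linear = n_embd * vocab_size + sequence_length * vocab_size;

    embedding + num_heads * per_head + output_projection + linear
}

fn main() {
    let total = parameter_count(6, 20, 64, 8);
    println!("parameters: {}", total); // prints "parameters: 20888"
}
```

Under the same assumptions there are 1 + 8 * 7 + 2 + 2 = 61 parameter tensors, which matches the tensors t0 to t60 that appear as weight, bias, or mask operands in the listing.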
The instructions generated by Novigrad are below.
Notes:
- INSTRUCTION MatMul t170 t166 t171 multiplies tensors t170 and t166 and writes the result to tensor t171.
- MatMulBackward is essentially two GEMM (general matrix multiply) calls, one for the gradient of each operand.
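To make the second note concrete: for C = A * B with upstream gradient dC, the operand gradients are dA = dC * B^T and dB = A^T * dC, which is two matrix multiplications. The sketch below shows this on a plain row-major matrix. Matrix, matmul, and matmul_backward are hypothetical names for illustration, not Novigrad's API, and Novigrad's actual kernels may handle transposes differently.

```rust
// Minimal row-major matrix, for illustration only (not Novigrad's tensor type).
#[derive(Debug, Clone, PartialEq)]
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

impl Matrix {
    fn new(rows: usize, cols: usize, data: Vec<f32>) -> Self {
        assert_eq!(rows * cols, data.len());
        Matrix { rows, cols, data }
    }

    fn transpose(&self) -> Matrix {
        let mut data = vec![0.0f32; self.data.len()];
        for r in 0..self.rows {
            for c in 0..self.cols {
                data[c * self.rows + r] = self.data[r * self.cols + c];
            }
        }
        Matrix::new(self.cols, self.rows, data)
    }
}

// Forward: C = A * B (one GEMM).
fn matmul(a: &Matrix, b: &Matrix) -> Matrix {
    assert_eq!(a.cols, b.rows);
    let mut data = vec![0.0f32; a.rows * b.cols];
    for i in 0..a.rows {
        for k in 0..a.cols {
            for j in 0..b.cols {
                data[i * b.cols + j] += a.data[i * a.cols + k] * b.data[k * b.cols + j];
            }
        }
    }
    Matrix::new(a.rows, b.cols, data)
}

// Backward: two GEMMs, one per operand.
// grad_a = grad_c * B^T and grad_b = A^T * grad_c.
fn matmul_backward(a: &Matrix, b: &Matrix, grad_c: &Matrix) -> (Matrix, Matrix) {
    let grad_a = matmul(grad_c, &b.transpose());
    let grad_b = matmul(&a.transpose(), grad_c);
    (grad_a, grad_b)
}

fn main() {
    let a = Matrix::new(2, 3, vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);
    let b = Matrix::new(3, 2, vec![1.0, 0.0, 0.0, 1.0, 1.0, 1.0]);
    let c = matmul(&a, &b);
    let grad_c = Matrix::new(2, 2, vec![1.0; 4]);
    let (grad_a, grad_b) = matmul_backward(&a, &b, &grad_c);
    println!("{:?}\n{:?}\n{:?}", c, grad_a, grad_b);
}
```

In the listing, MatMulBackward names the two forward operands, the forward output, and then the same two operands again, presumably as the tensors that receive the gradients.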
Booting Neural Machine...
Neural program compiled with Novigrad
Tensors: 179
Parameters: 20888
Input size: [6, 20]
Output size: [6, 20]
Instructions: 476
------------------------------
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t84
INSTRUCTION MatMul t83 t1 t84
INSTRUCTION Zero t85
INSTRUCTION Add t84 t2 t85
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t86
INSTRUCTION MatMul t83 t3 t86
INSTRUCTION Zero t87
INSTRUCTION Add t86 t4 t87
INSTRUCTION Zero t90
INSTRUCTION MatMul t85 t87 t90
INSTRUCTION Zero t91
INSTRUCTION Scale t90 t91
INSTRUCTION Zero t92
INSTRUCTION Add t91 t7 t92
INSTRUCTION Zero t93
INSTRUCTION Softmax t92 t93
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t88
INSTRUCTION MatMul t83 t5 t88
INSTRUCTION Zero t89
INSTRUCTION Add t88 t6 t89
INSTRUCTION Zero t94
INSTRUCTION MatMul t93 t89 t94
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t95
INSTRUCTION MatMul t83 t8 t95
INSTRUCTION Zero t96
INSTRUCTION Add t95 t9 t96
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t97
INSTRUCTION MatMul t83 t10 t97
INSTRUCTION Zero t98
INSTRUCTION Add t97 t11 t98
INSTRUCTION Zero t101
INSTRUCTION MatMul t96 t98 t101
INSTRUCTION Zero t102
INSTRUCTION Scale t101 t102
INSTRUCTION Zero t103
INSTRUCTION Add t102 t14 t103
INSTRUCTION Zero t104
INSTRUCTION Softmax t103 t104
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t99
INSTRUCTION MatMul t83 t12 t99
INSTRUCTION Zero t100
INSTRUCTION Add t99 t13 t100
INSTRUCTION Zero t105
INSTRUCTION MatMul t104 t100 t105
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t106
INSTRUCTION MatMul t83 t15 t106
INSTRUCTION Zero t107
INSTRUCTION Add t106 t16 t107
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t108
INSTRUCTION MatMul t83 t17 t108
INSTRUCTION Zero t109
INSTRUCTION Add t108 t18 t109
INSTRUCTION Zero t112
INSTRUCTION MatMul t107 t109 t112
INSTRUCTION Zero t113
INSTRUCTION Scale t112 t113
INSTRUCTION Zero t114
INSTRUCTION Add t113 t21 t114
INSTRUCTION Zero t115
INSTRUCTION Softmax t114 t115
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t110
INSTRUCTION MatMul t83 t19 t110
INSTRUCTION Zero t111
INSTRUCTION Add t110 t20 t111
INSTRUCTION Zero t116
INSTRUCTION MatMul t115 t111 t116
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t117
INSTRUCTION MatMul t83 t22 t117
INSTRUCTION Zero t118
INSTRUCTION Add t117 t23 t118
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t119
INSTRUCTION MatMul t83 t24 t119
INSTRUCTION Zero t120
INSTRUCTION Add t119 t25 t120
INSTRUCTION Zero t123
INSTRUCTION MatMul t118 t120 t123
INSTRUCTION Zero t124
INSTRUCTION Scale t123 t124
INSTRUCTION Zero t125
INSTRUCTION Add t124 t28 t125
INSTRUCTION Zero t126
INSTRUCTION Softmax t125 t126
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t121
INSTRUCTION MatMul t83 t26 t121
INSTRUCTION Zero t122
INSTRUCTION Add t121 t27 t122
INSTRUCTION Zero t127
INSTRUCTION MatMul t126 t122 t127
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t128
INSTRUCTION MatMul t83 t29 t128
INSTRUCTION Zero t129
INSTRUCTION Add t128 t30 t129
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t130
INSTRUCTION MatMul t83 t31 t130
INSTRUCTION Zero t131
INSTRUCTION Add t130 t32 t131
INSTRUCTION Zero t134
INSTRUCTION MatMul t129 t131 t134
INSTRUCTION Zero t135
INSTRUCTION Scale t134 t135
INSTRUCTION Zero t136
INSTRUCTION Add t135 t35 t136
INSTRUCTION Zero t137
INSTRUCTION Softmax t136 t137
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t132
INSTRUCTION MatMul t83 t33 t132
INSTRUCTION Zero t133
INSTRUCTION Add t132 t34 t133
INSTRUCTION Zero t138
INSTRUCTION MatMul t137 t133 t138
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t139
INSTRUCTION MatMul t83 t36 t139
INSTRUCTION Zero t140
INSTRUCTION Add t139 t37 t140
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t141
INSTRUCTION MatMul t83 t38 t141
INSTRUCTION Zero t142
INSTRUCTION Add t141 t39 t142
INSTRUCTION Zero t145
INSTRUCTION MatMul t140 t142 t145
INSTRUCTION Zero t146
INSTRUCTION Scale t145 t146
INSTRUCTION Zero t147
INSTRUCTION Add t146 t42 t147
INSTRUCTION Zero t148
INSTRUCTION Softmax t147 t148
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t143
INSTRUCTION MatMul t83 t40 t143
INSTRUCTION Zero t144
INSTRUCTION Add t143 t41 t144
INSTRUCTION Zero t149
INSTRUCTION MatMul t148 t144 t149
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t150
INSTRUCTION MatMul t83 t43 t150
INSTRUCTION Zero t151
INSTRUCTION Add t150 t44 t151
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t152
INSTRUCTION MatMul t83 t45 t152
INSTRUCTION Zero t153
INSTRUCTION Add t152 t46 t153
INSTRUCTION Zero t156
INSTRUCTION MatMul t151 t153 t156
INSTRUCTION Zero t157
INSTRUCTION Scale t156 t157
INSTRUCTION Zero t158
INSTRUCTION Add t157 t49 t158
INSTRUCTION Zero t159
INSTRUCTION Softmax t158 t159
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t154
INSTRUCTION MatMul t83 t47 t154
INSTRUCTION Zero t155
INSTRUCTION Add t154 t48 t155
INSTRUCTION Zero t160
INSTRUCTION MatMul t159 t155 t160
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t161
INSTRUCTION MatMul t83 t50 t161
INSTRUCTION Zero t162
INSTRUCTION Add t161 t51 t162
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t163
INSTRUCTION MatMul t83 t52 t163
INSTRUCTION Zero t164
INSTRUCTION Add t163 t53 t164
INSTRUCTION Zero t167
INSTRUCTION MatMul t162 t164 t167
INSTRUCTION Zero t168
INSTRUCTION Scale t167 t168
INSTRUCTION Zero t169
INSTRUCTION Add t168 t56 t169
INSTRUCTION Zero t170
INSTRUCTION Softmax t169 t170
INSTRUCTION Zero t83
INSTRUCTION MatMul t81 t0 t83
INSTRUCTION Zero t165
INSTRUCTION MatMul t83 t54 t165
INSTRUCTION Zero t166
INSTRUCTION Add t165 t55 t166
INSTRUCTION Zero t171
INSTRUCTION MatMul t170 t166 t171
INSTRUCTION Zero t172
INSTRUCTION Concat t94 t105 t116 t127 t138 t149 t160 t171 t172
INSTRUCTION Zero t173
INSTRUCTION MatMul t172 t57 t173
INSTRUCTION Zero t174
INSTRUCTION Add t173 t58 t174
INSTRUCTION Zero t175
INSTRUCTION MatMul t174 t59 t175
INSTRUCTION Zero t176
INSTRUCTION Add t175 t60 t176
INSTRUCTION Zero t177
INSTRUCTION Softmax t176 t177
INSTRUCTION Zero t178
INSTRUCTION CrossEntropyLoss t82 t177 t178
INSTRUCTION CrossEntropyLossBackward t82 t177 t177
INSTRUCTION Clip t177
INSTRUCTION IdentityBackward t177 t176
INSTRUCTION Clip t176
INSTRUCTION AddBackward t176 t175 t60
INSTRUCTION Clip t175 t60
INSTRUCTION MatMulBackward t174 t59 t175 t174 t59
INSTRUCTION Clip t174 t59
INSTRUCTION AddBackward t174 t173 t58
INSTRUCTION Clip t173 t58
INSTRUCTION MatMulBackward t172 t57 t173 t172 t57
INSTRUCTION Clip t172 t57
INSTRUCTION ConcatBackward t172 t94 t105 t116 t127 t138 t149 t160 t171
INSTRUCTION Clip t94 t105 t116 t127 t138 t149 t160 t171
INSTRUCTION MatMulBackward t170 t166 t171 t170 t166
INSTRUCTION Clip t170 t166
INSTRUCTION AddBackward t166 t165 t55
INSTRUCTION Clip t165 t55
INSTRUCTION MatMulBackward t83 t54 t165 t83 t54
INSTRUCTION Clip t83 t54
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION SoftmaxBackward t170 t169
INSTRUCTION Clip t169
INSTRUCTION AddBackward t169 t168 t56
INSTRUCTION Clip t168 t56
INSTRUCTION Identity t168 t167
INSTRUCTION Clip t167
INSTRUCTION MatMulBackward t162 t164 t167 t162 t164
INSTRUCTION Clip t162 t164
INSTRUCTION AddBackward t164 t163 t53
INSTRUCTION Clip t163 t53
INSTRUCTION MatMulBackward t83 t52 t163 t83 t52
INSTRUCTION Clip t83 t52
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION AddBackward t162 t161 t51
INSTRUCTION Clip t161 t51
INSTRUCTION MatMulBackward t83 t50 t161 t83 t50
INSTRUCTION Clip t83 t50
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION MatMulBackward t159 t155 t160 t159 t155
INSTRUCTION Clip t159 t155
INSTRUCTION AddBackward t155 t154 t48
INSTRUCTION Clip t154 t48
INSTRUCTION MatMulBackward t83 t47 t154 t83 t47
INSTRUCTION Clip t83 t47
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION SoftmaxBackward t159 t158
INSTRUCTION Clip t158
INSTRUCTION AddBackward t158 t157 t49
INSTRUCTION Clip t157 t49
INSTRUCTION Identity t157 t156
INSTRUCTION Clip t156
INSTRUCTION MatMulBackward t151 t153 t156 t151 t153
INSTRUCTION Clip t151 t153
INSTRUCTION AddBackward t153 t152 t46
INSTRUCTION Clip t152 t46
INSTRUCTION MatMulBackward t83 t45 t152 t83 t45
INSTRUCTION Clip t83 t45
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION AddBackward t151 t150 t44
INSTRUCTION Clip t150 t44
INSTRUCTION MatMulBackward t83 t43 t150 t83 t43
INSTRUCTION Clip t83 t43
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION MatMulBackward t148 t144 t149 t148 t144
INSTRUCTION Clip t148 t144
INSTRUCTION AddBackward t144 t143 t41
INSTRUCTION Clip t143 t41
INSTRUCTION MatMulBackward t83 t40 t143 t83 t40
INSTRUCTION Clip t83 t40
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION SoftmaxBackward t148 t147
INSTRUCTION Clip t147
INSTRUCTION AddBackward t147 t146 t42
INSTRUCTION Clip t146 t42
INSTRUCTION Identity t146 t145
INSTRUCTION Clip t145
INSTRUCTION MatMulBackward t140 t142 t145 t140 t142
INSTRUCTION Clip t140 t142
INSTRUCTION AddBackward t142 t141 t39
INSTRUCTION Clip t141 t39
INSTRUCTION MatMulBackward t83 t38 t141 t83 t38
INSTRUCTION Clip t83 t38
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION AddBackward t140 t139 t37
INSTRUCTION Clip t139 t37
INSTRUCTION MatMulBackward t83 t36 t139 t83 t36
INSTRUCTION Clip t83 t36
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION MatMulBackward t137 t133 t138 t137 t133
INSTRUCTION Clip t137 t133
INSTRUCTION AddBackward t133 t132 t34
INSTRUCTION Clip t132 t34
INSTRUCTION MatMulBackward t83 t33 t132 t83 t33
INSTRUCTION Clip t83 t33
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION SoftmaxBackward t137 t136
INSTRUCTION Clip t136
INSTRUCTION AddBackward t136 t135 t35
INSTRUCTION Clip t135 t35
INSTRUCTION Identity t135 t134
INSTRUCTION Clip t134
INSTRUCTION MatMulBackward t129 t131 t134 t129 t131
INSTRUCTION Clip t129 t131
INSTRUCTION AddBackward t131 t130 t32
INSTRUCTION Clip t130 t32
INSTRUCTION MatMulBackward t83 t31 t130 t83 t31
INSTRUCTION Clip t83 t31
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION AddBackward t129 t128 t30
INSTRUCTION Clip t128 t30
INSTRUCTION MatMulBackward t83 t29 t128 t83 t29
INSTRUCTION Clip t83 t29
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION MatMulBackward t126 t122 t127 t126 t122
INSTRUCTION Clip t126 t122
INSTRUCTION AddBackward t122 t121 t27
INSTRUCTION Clip t121 t27
INSTRUCTION MatMulBackward t83 t26 t121 t83 t26
INSTRUCTION Clip t83 t26
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION SoftmaxBackward t126 t125
INSTRUCTION Clip t125
INSTRUCTION AddBackward t125 t124 t28
INSTRUCTION Clip t124 t28
INSTRUCTION Identity t124 t123
INSTRUCTION Clip t123
INSTRUCTION MatMulBackward t118 t120 t123 t118 t120
INSTRUCTION Clip t118 t120
INSTRUCTION AddBackward t120 t119 t25
INSTRUCTION Clip t119 t25
INSTRUCTION MatMulBackward t83 t24 t119 t83 t24
INSTRUCTION Clip t83 t24
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION AddBackward t118 t117 t23
INSTRUCTION Clip t117 t23
INSTRUCTION MatMulBackward t83 t22 t117 t83 t22
INSTRUCTION Clip t83 t22
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION MatMulBackward t115 t111 t116 t115 t111
INSTRUCTION Clip t115 t111
INSTRUCTION AddBackward t111 t110 t20
INSTRUCTION Clip t110 t20
INSTRUCTION MatMulBackward t83 t19 t110 t83 t19
INSTRUCTION Clip t83 t19
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION SoftmaxBackward t115 t114
INSTRUCTION Clip t114
INSTRUCTION AddBackward t114 t113 t21
INSTRUCTION Clip t113 t21
INSTRUCTION Identity t113 t112
INSTRUCTION Clip t112
INSTRUCTION MatMulBackward t107 t109 t112 t107 t109
INSTRUCTION Clip t107 t109
INSTRUCTION AddBackward t109 t108 t18
INSTRUCTION Clip t108 t18
INSTRUCTION MatMulBackward t83 t17 t108 t83 t17
INSTRUCTION Clip t83 t17
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION AddBackward t107 t106 t16
INSTRUCTION Clip t106 t16
INSTRUCTION MatMulBackward t83 t15 t106 t83 t15
INSTRUCTION Clip t83 t15
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION MatMulBackward t104 t100 t105 t104 t100
INSTRUCTION Clip t104 t100
INSTRUCTION AddBackward t100 t99 t13
INSTRUCTION Clip t99 t13
INSTRUCTION MatMulBackward t83 t12 t99 t83 t12
INSTRUCTION Clip t83 t12
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION SoftmaxBackward t104 t103
INSTRUCTION Clip t103
INSTRUCTION AddBackward t103 t102 t14
INSTRUCTION Clip t102 t14
INSTRUCTION Identity t102 t101
INSTRUCTION Clip t101
INSTRUCTION MatMulBackward t96 t98 t101 t96 t98
INSTRUCTION Clip t96 t98
INSTRUCTION AddBackward t98 t97 t11
INSTRUCTION Clip t97 t11
INSTRUCTION MatMulBackward t83 t10 t97 t83 t10
INSTRUCTION Clip t83 t10
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION AddBackward t96 t95 t9
INSTRUCTION Clip t95 t9
INSTRUCTION MatMulBackward t83 t8 t95 t83 t8
INSTRUCTION Clip t83 t8
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION MatMulBackward t93 t89 t94 t93 t89
INSTRUCTION Clip t93 t89
INSTRUCTION AddBackward t89 t88 t6
INSTRUCTION Clip t88 t6
INSTRUCTION MatMulBackward t83 t5 t88 t83 t5
INSTRUCTION Clip t83 t5
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION SoftmaxBackward t93 t92
INSTRUCTION Clip t92
INSTRUCTION AddBackward t92 t91 t7
INSTRUCTION Clip t91 t7
INSTRUCTION Identity t91 t90
INSTRUCTION Clip t90
INSTRUCTION MatMulBackward t85 t87 t90 t85 t87
INSTRUCTION Clip t85 t87
INSTRUCTION AddBackward t87 t86 t4
INSTRUCTION Clip t86 t4
INSTRUCTION MatMulBackward t83 t3 t86 t83 t3
INSTRUCTION Clip t83 t3
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
INSTRUCTION AddBackward t85 t84 t2
INSTRUCTION Clip t84 t2
INSTRUCTION MatMulBackward t83 t1 t84 t83 t1
INSTRUCTION Clip t83 t1
INSTRUCTION MatMulBackward t81 t0 t83 t81 t0
INSTRUCTION Clip t81 t0
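Reading the forward part of the listing, each of the eight heads repeats the same pattern: MatMul and Add for the query and key projections, a MatMul of the two, Scale, an Add with a per-head tensor (t7, t14, ...), Softmax, MatMul and Add for the value projection, and a final MatMul of the attention weights with the value. A minimal sketch of that pattern for one head follows. It assumes that the key is transposed in the score product, that Scale multiplies by 1/sqrt(head_dim), and that the added tensor is an additive mask. The listing confirms none of these, and all names (Head, attention_head, and the helpers) are hypothetical.

```rust
type Matrix = Vec<Vec<f32>>;

fn matmul(a: &Matrix, b: &Matrix) -> Matrix {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut c = vec![vec![0.0f32; m]; n];
    for i in 0..n {
        for p in 0..k {
            for j in 0..m {
                c[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    c
}

fn transpose(a: &Matrix) -> Matrix {
    (0..a[0].len())
        .map(|j| a.iter().map(|row| row[j]).collect())
        .collect()
}

fn add(a: &Matrix, b: &Matrix) -> Matrix {
    a.iter()
        .zip(b)
        .map(|(ra, rb)| ra.iter().zip(rb).map(|(x, y)| x + y).collect())
        .collect()
}

// Row-wise softmax, shifted by the row maximum for numerical stability.
fn softmax_rows(a: &Matrix) -> Matrix {
    a.iter()
        .map(|row| {
            let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = row.iter().map(|x| (x - max).exp()).collect();
            let sum: f32 = exps.iter().sum();
            exps.iter().map(|e| e / sum).collect()
        })
        .collect()
}

// Hypothetical container for the seven per-head tensors (e.g. t1..t7).
struct Head {
    wq: Matrix,
    bq: Matrix,
    wk: Matrix,
    bk: Matrix,
    wv: Matrix,
    bv: Matrix,
    mask: Matrix, // assumed additive mask
}

fn attention_head(x: &Matrix, h: &Head) -> Matrix {
    let q = add(&matmul(x, &h.wq), &h.bq); // MatMul, Add
    let k = add(&matmul(x, &h.wk), &h.bk); // MatMul, Add
    let scores = matmul(&q, &transpose(&k)); // MatMul (transpose assumed)
    let scale = 1.0 / (h.wq[0].len() as f32).sqrt(); // Scale (factor assumed)
    let scaled: Matrix = scores
        .iter()
        .map(|row| row.iter().map(|s| s * scale).collect())
        .collect();
    let masked = add(&scaled, &h.mask); // Add
    let weights = softmax_rows(&masked); // Softmax
    let v = add(&matmul(x, &h.wv), &h.bv); // MatMul, Add
    matmul(&weights, &v) // MatMul
}

fn main() {
    let x = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let zeros = vec![vec![0.0f32; 2]; 2];
    let identity = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let head = Head {
        wq: zeros.clone(),
        bq: zeros.clone(),
        wk: zeros.clone(),
        bk: zeros.clone(),
        wv: identity,
        bv: zeros,
        // Causal mask chosen for the example: row 0 cannot attend to row 1.
        mask: vec![vec![0.0, f32::NEG_INFINITY], vec![0.0, 0.0]],
    };
    println!("{:?}", attention_head(&x, &head)); // prints [[1.0, 2.0], [2.0, 3.0]]
}
```

In this toy input the scores are all zero, so the first row attends only to itself because of the mask, and the second row averages both value rows.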