An animal can do training and inference every day of its existence until the day of its death. A forward pass is all you need.

Here is the current model configuration that I use for multi-head attention:

```rust
let sequence_length = 6;
let vocab_size = 20;
let n_embd = 64;
let num_heads = 8;
```

You can see the full model here: https://github.com/sebhtml/novigrad/blob/8df6192829edd9cda3a221e12affa770fbda2a1b/src/models/mega_man_attention.rs

The struct uses Embedding, MultiHeadAttention, Linear, and Softmax:

```rust
struct MegaManAttentionModel {
    input_shape: Vec<usize>,
    output_shape: Vec<usize>,
    vocab_size: usize,
    sequence_length: usize,
    embedding: Embedding,
    multi_head_attention: MultiHeadAttention,
    linear: Linear,
    softmax: Softmax,
}
```

The instructions generated by Novigrad are below. Notes:

- INSTRUCTION MatMul t170 t166 t171 means that we do a matrix multiplication between tensors t170 and t166, and the result is written to tensor t171.
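To make the instruction format concrete, here is a minimal sketch of how such a three-address MatMul instruction could be executed. The `Tensor` struct and the `mat_mul` function below are illustrative assumptions for this post, not Novigrad's actual types or API.

```rust
/// Illustrative tensor type: row-major storage with explicit dimensions.
/// This is an assumption for the sketch, not Novigrad's Tensor.
struct Tensor {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

/// Sketch of `MatMul a b out`: multiply `a` by `b` and write the
/// product into `out`, as in `INSTRUCTION MatMul t170 t166 t171`.
fn mat_mul(a: &Tensor, b: &Tensor, out: &mut Tensor) {
    assert_eq!(a.cols, b.rows);
    out.rows = a.rows;
    out.cols = b.cols;
    out.data = vec![0.0; a.rows * b.cols];
    for i in 0..a.rows {
        for k in 0..a.cols {
            let a_ik = a.data[i * a.cols + k];
            for j in 0..b.cols {
                out.data[i * b.cols + j] += a_ik * b.data[k * b.cols + j];
            }
        }
    }
}

fn main() {
    let a = Tensor { rows: 2, cols: 3, data: vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0] };
    let b = Tensor { rows: 3, cols: 2, data: vec![1.0, 0.0, 0.0, 1.0, 1.0, 1.0] };
    let mut out = Tensor { rows: 0, cols: 0, data: vec![] };
    mat_mul(&a, &b, &mut out);
    println!("{:?}", out.data); // [4.0, 5.0, 10.0, 11.0]
}
```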
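And going back to the hyperparameters at the top of this section, here is a small runnable sketch of the tensor shapes they imply. The per-head breakdown is standard transformer bookkeeping, not something taken from Novigrad's code.

```rust
fn main() {
    let sequence_length = 6;
    let vocab_size = 20;
    let n_embd = 64;
    let num_heads = 8;

    // Each head attends within a subspace of the embedding.
    let head_dim = n_embd / num_heads; // 64 / 8 = 8
    assert_eq!(head_dim * num_heads, n_embd);

    // Shapes flowing through the model (batch dimension omitted):
    println!("token ids:                 ({sequence_length},)");
    println!("embeddings:                ({sequence_length}, {n_embd})");
    println!("Q/K/V per head:            ({sequence_length}, {head_dim})");
    println!("attention scores per head: ({sequence_length}, {sequence_length})");
    println!("logits:                    ({sequence_length}, {vocab_size})");
}
```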