Posts

Showing posts from May, 2024

Generating neural machine instructions for multi-head attention

An animal can do training and inference every day of its existence until the day of its death. A forward pass is all you need.

Here is the current model configuration that I use for multi-head attention:

    let sequence_length = 6;
    let vocab_size = 20;
    let n_embd = 64;
    let num_heads = 8;

You can see the full model here: https://github.com/sebhtml/novigrad/blob/8df6192829edd9cda3a221e12affa770fbda2a1b/src/models/mega_man_attention.rs

The struct uses Embedding, MultiHeadAttention, Linear, and Softmax:

    struct MegaManAttentionModel {
        input_shape: Vec<usize>,
        output_shape: Vec<usize>,
        vocab_size: usize,
        sequence_length: usize,
        embedding: Embedding,
        multi_head_attention: MultiHeadAttention,
        linear: Linear,
        softmax: Softmax,
    }

The instructions generated by Novigrad are listed below. Notes:

- INSTRUCTION    MatMul    t170 t166    t171    means that a matrix multiplication is done between tensors t170 and t166 and the result is written to tensor t171.
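To make that instruction notation concrete, here is a minimal sketch, not Novigrad's actual code, of how an instruction such as MatMul t170 t166 t171 could be interpreted: two input tensor ids, one output tensor id, and an executor that looks the inputs up in a registry, multiplies them, and writes the product under the output id. The Tensor, OpCode, Instruction, and execute names, as well as the shapes used in main, are assumptions made up for this example.

    use std::collections::HashMap;

    // Row-major 2-D tensor, for illustration only.
    struct Tensor {
        rows: usize,
        cols: usize,
        values: Vec<f32>,
    }

    // The only opcode needed for this example.
    enum OpCode {
        MatMul,
    }

    // One generated instruction: an opcode, input tensor ids, and an output tensor id.
    struct Instruction {
        opcode: OpCode,
        inputs: Vec<usize>,
        output: usize,
    }

    // Execute one instruction against a registry of tensors keyed by id.
    fn execute(instruction: &Instruction, tensors: &mut HashMap<usize, Tensor>) {
        match instruction.opcode {
            OpCode::MatMul => {
                let (rows, cols, values) = {
                    let a = &tensors[&instruction.inputs[0]];
                    let b = &tensors[&instruction.inputs[1]];
                    assert_eq!(a.cols, b.rows);
                    let mut values = vec![0.0f32; a.rows * b.cols];
                    for i in 0..a.rows {
                        for j in 0..b.cols {
                            let mut sum = 0.0;
                            for k in 0..a.cols {
                                sum += a.values[i * a.cols + k] * b.values[k * b.cols + j];
                            }
                            values[i * b.cols + j] = sum;
                        }
                    }
                    (a.rows, b.cols, values)
                };
                // The result is written to the output tensor, e.g. t171.
                tensors.insert(instruction.output, Tensor { rows, cols, values });
            }
        }
    }

    fn main() {
        let mut tensors: HashMap<usize, Tensor> = HashMap::new();
        // Shapes and contents are made up for the example.
        tensors.insert(170, Tensor { rows: 2, cols: 3, values: vec![1.0; 6] });
        tensors.insert(166, Tensor { rows: 3, cols: 2, values: vec![1.0; 6] });
        let matmul = Instruction {
            opcode: OpCode::MatMul,
            inputs: vec![170, 166],
            output: 171,
        };
        execute(&matmul, &mut tensors);
        println!("t171 = {:?}", tensors[&171].values);
    }

In this sketch, the same registry-of-tensors idea would extend to other opcodes that the model needs, such as embedding lookup, linear, and softmax, each reading its input ids and writing one output id.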

AI: bump vocab_size from 256 to 34816 using byte pair encoding

BPE My byte-pair encoding (BPE) implementation works.     PR: https://github.com/sebhtml/novigrad/pull/17 I increased vocab_size from 256 to 34816 with BPE.   GPU Thanks to NVIDIA, I can do this on my laptop that has a NVIDIA RTX 4060.   Tokens The text I am using has 78970 characters. With my BPE implementation, this text is encoded into 44513 tokens. Loss The loss goes from 108.2 to 0. Epoch 0 Total_error 108.20975, change: NaN Epoch 100 Total_error 0, change: -1 Epoch 300 Total_error 0, change: NaN     Example 0 input_text: {{short description|Video game franchise}} {{About|the vi expected_output_text: de input_tokens: [22796, 30981, 11071, 20834, 19622, 31405, 30073, 11, 23050, 12550, 13, 32014, 19623, 16841, 6070, 17, 125, 28674, 1460, 15833, 20052, 8, 22923, 20, 22797, 21, 24292, 31531, 13, 5289, 126, 24990] expected_output_token: [19624]   Example 0 before training: Loss 11.230275 actual_output_token: [13142] actual_output_text: me   Example 0 after training: Loss -0 actual_ou