AI: bump vocab_size from 256 to 34816 using byte pair encoding

BPE

My byte-pair encoding (BPE) implementation works.


PR: https://github.com/sebhtml/novigrad/pull/17

I increased vocab_size from 256 to 34816 with BPE.
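
The core idea of BPE is to start from the 256 raw byte values and repeatedly merge the most frequent adjacent pair of tokens into a new token ID until the vocabulary reaches the target size. Here is a minimal Rust sketch of that training loop; it illustrates the idea, it is not the code from the PR, and the function name is mine.

use std::collections::HashMap;

// Learn BPE merges: start from raw bytes (IDs 0..=255) and repeatedly merge
// the most frequent adjacent pair into a new token ID until the vocabulary
// reaches target_vocab_size. Returns the merges in the order they were learned.
// Illustrative sketch, not the implementation from the PR.
fn learn_bpe(text: &str, target_vocab_size: u32) -> Vec<((u32, u32), u32)> {
    let mut tokens: Vec<u32> = text.bytes().map(|b| b as u32).collect();
    let mut merges: Vec<((u32, u32), u32)> = Vec::new();
    let mut next_id: u32 = 256;

    while next_id < target_vocab_size {
        // Count adjacent token pairs.
        let mut counts: HashMap<(u32, u32), usize> = HashMap::new();
        for pair in tokens.windows(2) {
            *counts.entry((pair[0], pair[1])).or_insert(0) += 1;
        }
        // Pick the most frequent pair; stop when no pair occurs at least twice.
        let Some((&best_pair, &count)) = counts.iter().max_by_key(|(_, &c)| c) else {
            break;
        };
        if count < 2 {
            break;
        }
        // Replace every occurrence of best_pair with the new token ID.
        let mut merged = Vec::with_capacity(tokens.len());
        let mut i = 0;
        while i < tokens.len() {
            if i + 1 < tokens.len() && (tokens[i], tokens[i + 1]) == best_pair {
                merged.push(next_id);
                i += 2;
            } else {
                merged.push(tokens[i]);
                i += 1;
            }
        }
        tokens = merged;
        merges.push((best_pair, next_id));
        next_id += 1;
    }
    merges
}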


GPU

Thanks to NVIDIA, I can do this on my laptop, which has an NVIDIA RTX 4060.




Tokens

The text I am using has 78970 characters.

With my BPE implementation, this text is encoded into 44513 tokens, which is about 1.77 characters per token.
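
Encoding new text amounts to replaying the learned merges, in the order they were learned, over the byte sequence of the text. Here is a sketch that pairs with the learn_bpe sketch above; again, this is an illustration, not the actual novigrad encoder.

// Encode text by replaying the learned merges in the order they were learned.
// Illustrative sketch that matches the learn_bpe sketch above.
fn encode(text: &str, merges: &[((u32, u32), u32)]) -> Vec<u32> {
    let mut tokens: Vec<u32> = text.bytes().map(|b| b as u32).collect();
    for &(pair, new_id) in merges {
        let mut merged = Vec::with_capacity(tokens.len());
        let mut i = 0;
        while i < tokens.len() {
            if i + 1 < tokens.len() && (tokens[i], tokens[i + 1]) == pair {
                merged.push(new_id);
                i += 2;
            } else {
                merged.push(tokens[i]);
                i += 1;
            }
        }
        tokens = merged;
    }
    tokens
}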



Loss

The loss goes from 108.2 to 0.

Epoch 0 Total_error 108.20975, change: NaN
Epoch 100 Total_error 0, change: -1
Epoch 300 Total_error 0, change: NaN

Example 0

input_text: {{short description|Video game franchise}}

{{About|the vi

expected_output_text: de


input_tokens: [22796, 30981, 11071, 20834, 19622, 31405, 30073, 11, 23050, 12550, 13, 32014, 19623, 16841, 6070, 17, 125, 28674, 1460, 15833, 20052, 8, 22923, 20, 22797, 21, 24292, 31531, 13, 5289, 126, 24990]


expected_output_token: [19624]


Example 0 before training:

Loss 11.230275
actual_output_token: [13142]
actual_output_text: me


Example 0 after training:

Loss -0
actual_output_token: [19624]
actual_output_text: de


Weirdness

The weird thing is that the neural network can generate a token ID that is not known to the BPE decoder. When that happens, I chose to simply emit '?'.
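
For reference, here is a minimal sketch of that decoding fallback, assuming the decoder keeps a map from token ID to the bytes it stands for. The names here are hypothetical, not the actual novigrad API.

use std::collections::HashMap;

// Decode token IDs back to text. If the network emits an ID that the
// vocabulary does not know about, fall back to '?' instead of failing.
// Hypothetical sketch; not the actual novigrad decoder.
fn decode(tokens: &[u32], vocab: &HashMap<u32, Vec<u8>>) -> String {
    let mut bytes: Vec<u8> = Vec::new();
    for token in tokens {
        match vocab.get(token) {
            Some(piece) => bytes.extend_from_slice(piece),
            None => bytes.push(b'?'),
        }
    }
    String::from_utf8_lossy(&bytes).into_owned()
}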
