AI: bump vocab_size from 256 to 34816 using byte pair encoding

BPE

My byte-pair encoding (BPE) implementation works.

PR: https://github.com/sebhtml/novigrad/pull/17

I increased vocab_size from 256 to 34816 with BPE.
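
A vocab_size of 34816 presumably means the tokenizer starts from the 256 possible byte values and learns 34816 - 256 = 34560 merges on top of them. Here is a minimal sketch of that kind of greedy merge-learning loop, assuming a plain byte-level BPE; the function name and signature are mine, not the actual novigrad code.

    use std::collections::HashMap;

    // Learn BPE merges on top of the 256 byte tokens until the vocabulary
    // reaches target_vocab_size. Hypothetical sketch, not the novigrad code.
    fn learn_bpe_merges(text: &str, target_vocab_size: u32) -> Vec<((u32, u32), u32)> {
        // Start from raw bytes: token IDs 0..=255.
        let mut tokens: Vec<u32> = text.bytes().map(|b| b as u32).collect();
        let mut merges: Vec<((u32, u32), u32)> = Vec::new();
        let mut next_id: u32 = 256;

        while next_id < target_vocab_size {
            // Count every adjacent pair of token IDs.
            let mut counts: HashMap<(u32, u32), usize> = HashMap::new();
            for pair in tokens.windows(2) {
                *counts.entry((pair[0], pair[1])).or_insert(0) += 1;
            }
            // Pick the most frequent pair; stop once nothing repeats.
            let Some((&best, &count)) = counts.iter().max_by_key(|(_, &c)| c) else { break };
            if count < 2 {
                break;
            }
            // Replace every occurrence of the best pair with a new token ID.
            let mut merged = Vec::with_capacity(tokens.len());
            let mut i = 0;
            while i < tokens.len() {
                if i + 1 < tokens.len() && (tokens[i], tokens[i + 1]) == best {
                    merged.push(next_id);
                    i += 2;
                } else {
                    merged.push(tokens[i]);
                    i += 1;
                }
            }
            tokens = merged;
            merges.push((best, next_id));
            next_id += 1;
        }
        merges
    }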

GPU

Thanks to NVIDIA, I can do this on my laptop, which has an NVIDIA RTX 4060.

Tokens

The text I am using has 78970 characters.

With my BPE implementation, this text is encoded into 44513 tokens.
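
That works out to roughly 78970 / 44513 ≈ 1.77 characters per token on average.

To encode, the learned merges can simply be replayed in the order they were learned. A minimal sketch, reusing the merges returned by the hypothetical learn_bpe_merges above; this is an assumption about how the encoder works, not the actual novigrad code.

    // Encode text by replaying the learned merges in the order they were learned.
    fn encode(text: &str, merges: &[((u32, u32), u32)]) -> Vec<u32> {
        // Start from raw bytes, exactly like during merge learning.
        let mut tokens: Vec<u32> = text.bytes().map(|b| b as u32).collect();
        for &(pair, new_id) in merges {
            let mut merged = Vec::with_capacity(tokens.len());
            let mut i = 0;
            while i < tokens.len() {
                if i + 1 < tokens.len() && (tokens[i], tokens[i + 1]) == pair {
                    merged.push(new_id);
                    i += 2;
                } else {
                    merged.push(tokens[i]);
                    i += 1;
                }
            }
            tokens = merged;
        }
        tokens
    }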

Loss

The loss drops from 108.2 to 0 by epoch 100.

Epoch 0 Total_error 108.20975, change: NaN
Epoch 100 Total_error 0, change: -1
Epoch 300 Total_error 0, change: NaN

Example 0

input_text: {{short description|Video game franchise}}

{{About|the vi

expected_output_text: de


input_tokens: [22796, 30981, 11071, 20834, 19622, 31405, 30073, 11, 23050, 12550, 13, 32014, 19623, 16841, 6070, 17, 125, 28674, 1460, 15833, 20052, 8, 22923, 20, 22797, 21, 24292, 31531, 13, 5289, 126, 24990]


expected_output_token: [19624]

Example 0 before training:

Loss 11.230275
actual_output_token: [13142]
actual_output_text: me

Example 0 after training:

Loss -0
actual_output_token: [19624]
actual_output_text: de

Weirdness

The weird thing is that the neural network can generate a token ID that is not known to the BPE decoder. When that happens, I chose to simply emit '?'.
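
One likely explanation is that the output layer has vocab_size = 34816 logits, while the BPE vocabulary actually learned from this text may not fill every one of those slots, so an argmax over the logits can land on an ID that has no decoder entry. A minimal sketch of the '?' fallback, assuming the decoder is a map from token ID to the bytes that token represents; the map and function names are mine, not the actual novigrad code.

    use std::collections::HashMap;

    // Decode token IDs back into text, emitting '?' for any ID that the
    // decoder does not know about.
    fn decode(token_ids: &[u32], id_to_bytes: &HashMap<u32, Vec<u8>>) -> String {
        let mut bytes: Vec<u8> = Vec::new();
        for id in token_ids {
            match id_to_bytes.get(id) {
                Some(piece) => bytes.extend_from_slice(piece),
                None => bytes.push(b'?'),
            }
        }
        String::from_utf8_lossy(&bytes).into_owned()
    }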