Learning to solve example 1 of puzzle 3aa6fb7a in the ARC Prize
Hi dear reader,

Puzzle rules

Example 1 of puzzle 3aa6fb7a of the ARC Prize looks like this:

A playout starts with a blank board of width 7 and height 7, so there are 49 cells. Each cell can be blank or contain one of 10 possible colors. The colors are identified by the numbers 0 to 9.

I added a rule: each cell must be changed exactly once. The game ends when all cells have been changed. The player wins if all cell colors match the cell colors in the expected output.

Some numbers:

Each cell has 11 possible values: 10 colors + the blank state.
There are 49! (6.0828186e+62) distinct possible orders of cell changes in a playout.
There are 49!*10 (6.0828186e+63)
There are 49 * 10 = 490 possible first actions (any of the 49 cells, painted with any of the 10 colors).
There are 11^49 (1.0671896e+51) possible board states.
The number of possible pairs (s, a) is even larger than the number of board states and is upper-bounded by 11^49 * 49 * 10 = 5.2292289e+53.

Given a board state s and a proposed action a, an action value is calculat