[At decision squares, the 5×5 rand-region cheese maze network will put max cumulative probability on the maximal-advantage action at least] 25% of the time
Created by TurnTrout on 2023-02-09; known on 2023-02-16
- TurnTrout estimated 90% on 2023-02-09
- TurnTrout on 2023-02-09
- peligrietzer estimated 95% on 2023-02-09
- uli estimated 97% and said “Decision squares usually have 3 different choices, so a-priori 33% chance, and it seems unlikely to get less likely under pressure from gradient descent” on 2023-02-12
- rhaps0dy estimated 98% and said “Thanks uli for precomputing priors for me” on 2023-03-02
- TurnTrout said “Although note that there are, technically, always 5 different actions available, but some of them might have the same effects (eg going into wall is same as no-op).” on 2023-03-02
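A minimal sketch of the metric as the comments above describe it. It assumes that "cumulative probability" means the policy's probability summed over all actions that have the same effect at a square, so walking into a wall and no-op count as one choice. The action names, the effect labels, and the helpers `cumulative_probs`, `picks_best_choice`, and `chance_baseline` are hypothetical. They are not taken from the maze network's actual code.

```python
from collections import defaultdict
from typing import Dict, Hashable

# Hypothetical labels for the 5 available actions (not the environment's real names).
ACTIONS = ("up", "down", "left", "right", "noop")


def cumulative_probs(action_probs: Dict[str, float],
                     effects: Dict[str, Hashable]) -> Dict[Hashable, float]:
    """Sum the policy's probability over actions that share the same effect."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for action, p in action_probs.items():
        totals[effects[action]] += p
    return dict(totals)


def picks_best_choice(action_probs: Dict[str, float],
                      effects: Dict[str, Hashable],
                      best_action: str) -> bool:
    """True if the effect with the most cumulative probability is the
    effect of the maximal-advantage action."""
    totals = cumulative_probs(action_probs, effects)
    return max(totals, key=totals.get) == effects[best_action]


def chance_baseline(n_choices: int) -> float:
    """Chance of landing on the best choice when picking uniformly among n_choices."""
    return 1 / n_choices


# Hypothetical decision square: "up" is a wall, so it has the same effect as no-op.
effects = {"up": "stay", "down": "down", "left": "left",
           "right": "right", "noop": "stay"}
probs = {"up": 0.05, "down": 0.10, "left": 0.60, "right": 0.15, "noop": 0.10}

print(picks_best_choice(probs, effects, best_action="left"))  # True
```

With 3 open choices, `chance_baseline(3)` gives the 1/3 prior from uli's comment, already above the 25% threshold. TurnTrout's caveat also shows up here: a policy that is uniform over the 5 raw actions puts 0.4 cumulative probability on "stay" in this hypothetical square, more than on any single direction, because two actions share that effect.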