[At decision squares, the 5×5 rand-region cheese maze network will put max cumulative probability on the maximal-advantage action at least] 25% of the time

Created by TurnTrout on 2023-02-09; known on 2023-02-16

TurnTrout estimated 90% on 2023-02-09
peligrietzer estimated 95% on 2023-02-09
uli estimated 97% and said “Decision squares usually have 3 different choices, so a priori 33% chance, and it seems unlikely to get less likely under pressure from gradient descent” on 2023-02-12
TurnTrout changed the deadline from “on 2023-02-16” on 2023-03-01
rhaps0dy estimated 98% and said “Thanks uli for precomputing priors for me” on 2023-03-02
TurnTrout said “Although note that there are, technically, always 5 different actions available, but some of them might have the same effects (e.g. going into a wall is the same as a no-op).” on 2023-03-02