[At decision squares, the 5×5 rand-region cheese maze network will put max cumulative probability on the maximal-advantage action at least] 95% of the time

Created by TurnTrout on 2023-02-09; known on 2023-02-16

  • TurnTrout estimated 12% on 2023-02-09
  • peligrietzer estimated 5% on 2023-02-09
  • uli estimated 20% and said “I don’t know the details of the training process, but given the policy is (roughly) trained by supervised learning to argmax over the advantage, it doesn’t seem that unlikely” on 2023-02-12
  • TurnTrout changed the deadline from “on 2023-02-16” on 2023-03-01
  • rhaps0dy estimated 15% on 2023-03-02
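
For readers who want to see what the predicted quantity measures, here is a minimal sketch of one way it could be computed. It is not the authors' evaluation code. The function name `agreement_rate`, the data layout, and the `action_groups` mapping (which raw action indices count toward the same move) are all hypothetical and chosen only for illustration. At each decision square it sums the policy's probability over each move's raw actions, then checks whether the most probable move is also the move with the highest advantage.

```python
def agreement_rate(action_probs, advantages, action_groups):
    """Fraction of decision squares where the move with the highest
    cumulative policy probability is also the maximal-advantage move.

    action_probs:  one list of raw-action probabilities per decision square
    advantages:    one dict {move: advantage} per decision square
    action_groups: dict {move: [raw action indices]} (hypothetical layout)
    """
    if not action_probs:
        raise ValueError("need at least one decision square")
    if len(action_probs) != len(advantages):
        raise ValueError("action_probs and advantages must align")

    hits = 0
    for probs, adv in zip(action_probs, advantages):
        # Cumulative probability: sum over raw actions mapped to the same move.
        cumulative = {
            move: sum(probs[i] for i in idxs)
            for move, idxs in action_groups.items()
        }
        policy_move = max(cumulative, key=cumulative.get)
        best_move = max(adv, key=adv.get)
        hits += policy_move == best_move
    return hits / len(action_probs)
```

Under this reading, the prediction resolves positively if the returned rate is at least 0.95 over the decision squares evaluated.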