alpha-zero-boosted

A "build to learn" Alpha Zero implementation using Gradient Boosted Decision Trees (LightGBM) (by cgreer)

Alpha-zero-boosted Alternatives

Similar projects and alternatives to alpha-zero-boosted

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. A higher count therefore suggests a better alternative to alpha-zero-boosted or a more similar project.

alpha-zero-boosted reviews and mentions

Posts with mentions or reviews of alpha-zero-boosted. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-02-15.
  • DeepMind has open-sourced the heart of AlphaGo and AlphaZero
    4 projects | news.ycombinator.com | 15 Feb 2023
    > I came up with a nifty implementation in Python that outperforms the naive impl by 30x, allowing a pure python MCTS/NN interop implementation. See https://www.moderndescartes.com/essays/deep_dive_mcts/

    Great post!

    Chasing pointers in the MCTS tree is definitely a slow approach, although there are typically < 900 "considerations" per move for AlphaZero. I've found that getting value/policy predictions from a neural network (or GBDT [1]) for the node expansions during those considerations is at least an order of magnitude slower than the MCTS tree-hopping logic.

    [1] https://github.com/cgreer/alpha-zero-boosted
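
    As a rough illustration of the point above, the sketch below (a minimal, hypothetical Python example, not code from the repo; the 8-feature state encoding, the toy LightGBM value model, and all hyperparameters are invented) runs a bare-bones PUCT "consideration" loop in which nearly all of the wall-clock time goes to evaluate(), the one model prediction per expanded node, rather than to the pointer-chasing tree traversal:

      import math
      import numpy as np
      import lightgbm as lgb

      rng = np.random.default_rng(0)
      # Stand-in value model: a tiny GBDT trained on random data.
      value_model = lgb.LGBMRegressor(n_estimators=10).fit(
          rng.normal(size=(200, 8)), rng.normal(size=200)
      )

      class Node:
          def __init__(self, features, prior=1.0):
              self.features = features      # hypothetical 8-dim state encoding
              self.prior = prior
              self.visits = 0
              self.value_sum = 0.0
              self.children = []

      def evaluate(node):
          # The expensive step: one GBDT (or NN) prediction per expanded node.
          return float(value_model.predict(node.features.reshape(1, -1))[0])

      def puct(parent, child, c=1.5):
          q = child.value_sum / child.visits if child.visits else 0.0
          u = c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
          return q + u

      def consider(root, num_considerations=800):
          for _ in range(num_considerations):
              node, path = root, [root]
              while node.children:          # cheap tree-hopping
                  node = max(node.children, key=lambda ch: puct(node, ch))
                  path.append(node)
              node.children = [Node(rng.normal(size=8), prior=0.25) for _ in range(4)]
              value = evaluate(node)        # dominates the runtime
              for n in path:                # backpropagate the value
                  n.visits += 1
                  n.value_sum += value

      root = Node(rng.normal(size=8))
      consider(root, num_considerations=100)
      print(root.visits)

    Either way the search logic itself is rarely the bottleneck once a learned model sits inside the expansion step, which is the imbalance the comment describes.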

  • MuZero: Mastering Go, chess, shogi and Atari without rules
    3 projects | news.ycombinator.com | 23 Dec 2020
    What you can do is check out the algorithm at a particular stage of development. AlphaZero and friends start out not being very good at the game, then over time they learn and become superhuman. You typically checkpoint the model weights at various stages, so early on the algorithm would play like a 600 Elo chess player and eventually reach superhuman Elo. So if you wanted to train, you could gradually play against versions of the algorithm, loading up the weights at increasing difficulty stages, until you can beat them.

    I implemented AlphaZero (but not MuZero yet) using GBDTs instead of NNs here, if you're curious how that would work: https://github.com/cgreer/alpha-zero-boosted. Instead of saving "weights", for a GBDT you save the split points of the value/policy models, but the concept is the same.
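
    To make the GBDT checkpointing concrete, here is a minimal, hypothetical sketch (the file name, training data, and parameters are invented, and lgb.train merely stands in for however the repo actually fits its value/policy models): LightGBM serializes its trees, i.e. the split points and leaf values, to a plain text file, and a later process can reload that file to field an opponent frozen at that stage of training:

      import numpy as np
      import lightgbm as lgb

      rng = np.random.default_rng(0)
      X, y = rng.normal(size=(500, 8)), rng.normal(size=500)

      # Fit a value model for one self-play "generation", then checkpoint it.
      gen_1 = lgb.train(
          {"objective": "regression", "verbosity": -1},
          lgb.Dataset(X, label=y),
          num_boost_round=20,
      )
      gen_1.save_model("value_model_gen_001.txt")   # stores split points, not weights

      # Later (or in a different process), load the frozen generation to play against.
      opponent = lgb.Booster(model_file="value_model_gen_001.txt")
      print(opponent.predict(X[:3]))

    Each self-play generation would leave behind one such file, giving a ladder of fixed-strength opponents to load at whatever difficulty stage you want to practice against.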

Stats

Basic alpha-zero-boosted repo stats
  Mentions: 2
  Stars: 79
  Activity: 3.2
  Last commit: almost 4 years ago
