mars
alpha-zero-boosted
| | mars | alpha-zero-boosted |
|---|---|---|
| Mentions | - | 2 |
| Stars | 2,675 | 79 |
| Growth | 0.2% | - |
| Activity | 5.7 | 3.2 |
| Last commit | 4 months ago | almost 4 years ago |
| Language | Python | Python |
| License | Apache License 2.0 | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mars
We haven't tracked posts mentioning mars yet.
Tracking mentions began in Dec 2020.
alpha-zero-boosted
- DeepMind has open-sourced the heart of AlphaGo and AlphaZero
> I came up with a nifty implementation in Python that outperforms the naive impl by 30x, allowing a pure python MCTS/NN interop implementation. See https://www.moderndescartes.com/essays/deep_dive_mcts/
Great post!
Chasing pointers in the MCTS tree is definitely a slow approach, although AlphaZero typically makes fewer than 900 "considerations" per move. I've found that getting value/policy predictions from a neural network (or GBDT[1]) for the node expansions during those considerations is at least an order of magnitude slower than the MCTS tree-hopping logic.
[1] https://github.com/cgreer/alpha-zero-boosted
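To make the "considerations" above concrete, here is a minimal sketch of one MCTS consideration in Python: descend the tree with a PUCT-style score, expand one leaf using a value/policy evaluation, and back the value up. It is not the code from alpha-zero-boosted. `Node`, `consider`, `run_search`, the toy game, and the uniform `evaluate` stub are all hypothetical names chosen for illustration. In a real setup `evaluate` would call the neural network or GBDT, which is the step the comment describes as dominating the runtime.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                      # policy probability assigned by the evaluator
    visits: int = 0
    value_sum: float = 0.0            # summed values, from this node's player to move
    children: dict = field(default_factory=dict)

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.5):
    """Return the (move, child) pair with the highest PUCT-style score."""
    sqrt_total = math.sqrt(max(1, node.visits))

    def score(item):
        _, child = item
        explore = c_puct * child.prior * sqrt_total / (1 + child.visits)
        # The child's value is from the opponent's perspective, so negate it.
        return -child.mean_value() + explore

    return max(node.children.items(), key=score)


def consider(root, state, evaluate, legal_moves, apply_move):
    """One consideration: descend to a leaf, expand it, back up the value."""
    path = [root]
    node = root
    while node.children:              # the "tree-hopping" part
        move, node = select_child(node)
        state = apply_move(state, move)
        path.append(node)
    policy, value = evaluate(state)   # the expensive model call in practice
    for move in legal_moves(state):
        node.children[move] = Node(prior=policy[move])
    for visited in reversed(path):
        visited.visits += 1
        visited.value_sum += value
        value = -value                # flip perspective at each ply


# Toy stand-ins (assumptions): a two-move game that ends after five plies.
def legal_moves(state):
    return (0, 1) if len(state) < 5 else ()


def apply_move(state, move):
    return state + (move,)


def evaluate(state):
    # Stub for a neural network or GBDT: uniform policy, neutral value.
    return {0: 0.5, 1: 0.5}, 0.0


def run_search(num_considerations=100):
    root = Node(prior=1.0)
    for _ in range(num_considerations):
        consider(root, (), evaluate, legal_moves, apply_move)
    return root
```

Timing `evaluate` separately from the `while node.children` loop is one way to check the claim for a given model, since the sketch keeps the two steps apart.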
- MuZero: Mastering Go, chess, shogi and Atari without rules
What you can do is check out the algorithm at particular stages of development. AlphaZero & friends start out not being very good at the game, then over time they learn and become superhuman. You typically checkpoint the weights for the model at various stages. So early on, the algo would play like a 600 Elo chess player and then eventually reach superhuman Elo levels. So if you wanted to train, you could load the weights from successive difficulty stages and play against each version of the algo until you can beat it.
I implemented AlphaZero (but not MuZero yet) using GBDTs instead of NNs here if you're curious about how it would work: https://github.com/cgreer/alpha-zero-boosted. Instead of saving the "weights" for a GBDT, you save the split points for the value/policy models, but the concept is the same.
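The checkpointing idea in these two comments can be sketched in a few lines of Python. This is an illustrative sketch, not how alpha-zero-boosted stores its models: the function names, the `generation_NNNN.pkl` file layout, and the use of `pickle` are assumptions, and the model is any picklable object (network weights or GBDT split points). It also assumes later generations are stronger, which is the usual outcome of self-play training but is not guaranteed.

```python
import pickle
from pathlib import Path


def save_checkpoint(model, generation, directory):
    """Write one training snapshot to <directory>/generation_NNNN.pkl."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"generation_{generation:04d}.pkl"
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path


def list_generations(directory):
    """Return the saved generation numbers in ascending order."""
    files = Path(directory).glob("generation_*.pkl")
    return sorted(int(path.stem.split("_")[1]) for path in files)


def load_checkpoint(generation, directory):
    """Load the snapshot saved for the given generation."""
    path = Path(directory) / f"generation_{generation:04d}.pkl"
    with path.open("rb") as handle:
        return pickle.load(handle)


def next_opponent(directory, beaten):
    """Return the earliest saved generation not yet beaten, or None."""
    for generation in list_generations(directory):
        if generation not in beaten:
            return generation
    return None
```

A training ladder would call `next_opponent` after each win, load that generation with `load_checkpoint`, and repeat until it returns `None`. Only unpickle checkpoint files you created yourself, since `pickle` can execute arbitrary code on load.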
What are some alternatives?
modin - Modin: Scale your Pandas workflows by changing a single line of code
KataGo - GTP engine and self-play learning in Go
eland - Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
neural_network_chess - Free Book about Deep-Learning approaches for Chess (like AlphaZero, Leela Chess Zero and Stockfish NNUE)
xarray - N-D labeled arrays and datasets in Python
katrain - Improve your Baduk skills by training with KataGo!
Python-Schema-Matching - A Python tool using XGBoost and sentence-transformers to perform schema matching tasks on tables.
adversarial-robustness-toolbox - Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
scikit-survival - Survival analysis built on top of scikit-learn
leela-zero - Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper.
dmatrix2np - Convert XGBoost's DMatrix format to np.array
mctx - Monte Carlo tree search in JAX