EfficientZero
minihack
Metric | EfficientZero | minihack
---|---|---
Mentions | 9 | 5
Stars | 823 | 439
Growth | - | 4.6%
Activity | 0.0 | 6.8
Last commit | 3 months ago | 19 days ago
Language | Python | Python
License | GNU General Public License v3.0 only | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
EfficientZero
- [D] GPT-3T: Can we train language models to think further ahead?
Here's an algorithm that is more sample-efficient: https://github.com/YeWR/EfficientZero
- MuZero learns to play Teamfight Tactics
Using multiprocessing to run more GPU workers could help. My code, based on EfficientZero (https://github.com/YeWR/EfficientZero), drives CPU and GPU utilization to 90%. It uses Ray for multiprocessing and splits Reanalyze into CPU and GPU workers to maximize resource utilization. By the way, it's not converging to the optimal policy well: it gets stuck at 50% of the optimal episode return with a small amount of training. Have you had this issue before?
- Anyone found any working replication repo for MuZero?
- [D] Most important AI Paper´s this year so far in my opinion + Proto AGI speculation at the end
Mastering Atari Games with Limited Data – EfficientZero (human sample efficiency!) Paper: https://arxiv.org/abs/2111.00210 LessWrong article about the paper: https://www.lesswrong.com/posts/mRwJce3npmzbKfxws/efficientzero-how-it-works GitHub: https://github.com/YeWR/EfficientZero
minihack
- Difficult RL generalization benchmarks
- Anyone found any working replication repo for MuZero?
I have an implementation of Stochastic MuZero in JAX. It's been tested solely in MiniHack environments, but can be made to work in other environments by changing the representation function.
- Best GridWorld environment?
If you want something as simple as possible, I'd go with MiniGrid, and if you want to have a richer world with more complex settings, then MiniHack.
What are some alternatives?
DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
flash-attention-jax - Implementation of Flash Attention in Jax
gym-simplegrid - Simple Gridworld Gymnasium Environment
XMem - [ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
flash-attention - Fast and memory-efficient exact attention
RHO-Loss
CodeRL - This is the official code for the paper CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (NeurIPS22).
msn - Masked Siamese Networks for Label-Efficient Learning (https://arxiv.org/abs/2204.07141)
Minigrid - Simple and easily configurable grid world environments for reinforcement learning
mctx - Monte Carlo tree search in JAX
perceiver-ar
ede - Code for the paper "Uncertainty-Driven Exploration for Generalization in Reinforcement Learning".