Ray
stable-baselines
Our great sponsors
Ray | stable-baselines | |
---|---|---|
42 | 10 | |
30,988 | 4,000 | |
2.8% | - | |
10.0 | 0.0 | |
about 15 hours ago | over 1 year ago | |
Python | Python | |
Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Ray
-
Open Source Advent Fun Wraps Up!
22. Ray | Github | tutorial
-
Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Custom Models
Training times for GSM8k are mentioned here: https://github.com/ray-project/ray/tree/master/doc/source/te...
- Ray – an open source project for scaling AI workloads
-
Methods to keep agents inside grid world.
Here's a reference from RLlib that points to docs and an example, and here's one from one of my projects that includes all my own implementations
-
TransformerXL + PPO Baseline + MemoryGym
RLlib
- Is dynamic action masking possible in Rllib?
-
AWS re:Invent 2022 Recap | Data & Analytics services
⦿ AWS Glue Data Quality - Automatic data quality rule recommendations based on your data AWS Glue for Ray - Data integration with Ray (ray.io), a popular new open-source compute framework that helps you scale Python workloads
-
Think about it for a second
https://ray.io (just dropping the link)
-
Elixir Livebook now as a desktop app
I've wondered whether it's easier to add data analyst stuff to Elixir that Python seems to have, or add features to Python that Erlang (and by extension Elixir) provides out of the box.
By what I can see, if you want multiprocessing on Python in an easier way (let's say running async), you have to use something like ray core[0], then if you want multiple machines you need redis(?). Elixir/Erlang supports this out of the box.
Explorer[1] is an interesting approach, where it uses Rust via Rustler (Elixir library to call Rust code) and uses Polars as its dataframe library. I think Rustler needs to be reworked for this usecase, as it can be slow to return data. I made initial improvements which drastically improves encoding (https://github.com/elixir-nx/explorer/pull/282 and https://github.com/elixir-nx/explorer/pull/286, tldr 20+ seconds down to 3).
-
Learn various techniques to reduce data processing time by using multiprocessing, joblib, and tqdm concurrent
Adding these for anyone who had a similar question about Ray vs dask 1, 2, 3
stable-baselines
-
Distributed implementation tips
As underlined by gold-panda, you can give a try with multiprocessing. I once implemented a version based on what is done in stable_baselines v1 (https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/vec_env/subproc_vec_env.py)
-
GAIL without actions?
Found relevant code at https://github.com/hill-a/stable-baselines + all code implementations here
-
Best framework to use if learning today
Depends what you wanna do. Universal answer would be https://stable-baselines.readthedocs.io/
-
weird mean reward graph
As you will see here it is recommended to augment this safety measure with target kl_divergence, that will ensure even smoother learning and enforce early stopping to prevent learning collapses.
-
Nvidia ISAAC gym/RL
Code for https://arxiv.org/abs/1707.06347 found: https://github.com/hill-a/stable-baselines
- Bounds for observation
-
Understanding multi agent learning in OpenAI gym and stable-baselines
I haven't read the code, but stable-baselines doesn't support multi-agent environments (https://github.com/hill-a/stable-baselines/issues/423), so I think they're trying to make learning multi-agent easier with Environment.train().
- Using Reinforment Learning to beat the first boss in Dark souls 3 with Proximal Policy Optimization
-
Reinforcement Learning Crash Course (Free)
- https://github.com/hill-a/stable-baselines (Tensorflow)
-
JAX Implementations of Actor-Critic Algorithms
- tf2 speed: https://github.com/hill-a/stable-baselines/issues/576#issuecomment-573331715
What are some alternatives?
optuna - A hyperparameter optimization framework
stable-baselines3 - PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
rl-baselines3-zoo - A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
Faust - Python Stream Processing
Super-mario-bros-PPO-pytorch - Proximal Policy Optimization (PPO) algorithm for Super Mario Bros
gevent - Coroutine-based concurrency library for Python
Tic-Tac-Toe-Gym - This is the Tic-Tac-Toe game made with Python using the PyGame library and the Gym library to implement the AI with Reinforcement Learning
SCOOP (Scalable COncurrent Operations in Python) - SCOOP (Scalable COncurrent Operations in Python)
DI-engine - OpenDILab Decision AI Engine
Thespian Actor Library - Python Actor concurrency library
gym