- rliable
  [NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
- bsuite
  bsuite is a collection of carefully designed experiments that investigate core capabilities of a reinforcement learning (RL) agent.
First off, "performance" is a loosely defined term, so nail down what you mean by it and make sure your measurements of it are reliable. Check out https://github.com/google-research/rliable.
Second, if you're interested in evaluating the robustness of the algo implementation, then projects like https://github.com/deepmind/bsuite might help highlight issues, although they may not be relevant to your problem.
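To make the first point concrete, here's a rough sketch of the kind of thing "reliable measurement" means with only a few seeds: report an interquartile mean (IQM) with a bootstrap confidence interval instead of a bare mean. This is plain standard-library Python, not the rliable API. The function names `iqm` and `bootstrap_ci` and the example scores are made up for illustration, and it does a simple percentile bootstrap over the runs of a single task, whereas rliable uses a stratified bootstrap across tasks.

```python
import random
from statistics import mean


def iqm(scores):
    """Interquartile mean: drop the lowest and highest 25% of runs, average the rest."""
    s = sorted(scores)
    n = len(s)
    cut = n // 4  # number of runs trimmed from each end
    return mean(s[cut:n - cut])


def bootstrap_ci(scores, stat=iqm, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` over per-seed scores."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    # Resample the runs with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(scores, k=n)) for _ in range(reps))
    lo = estimates[int((alpha / 2) * reps)]
    hi = estimates[int((1 - alpha / 2) * reps) - 1]
    return lo, hi


# Hypothetical normalized returns from 8 seeds; one seed is an outlier.
scores = [0.2, 0.5, 0.55, 0.6, 0.62, 0.65, 0.7, 3.0]

point = iqm(scores)
low, high = bootstrap_ci(scores)
print(f"mean={mean(scores):.3f}  IQM={point:.3f}  95% CI=({low:.3f}, {high:.3f})")
```

With these made-up numbers the plain mean gets dragged up by the one lucky seed, while the IQM stays near the bulk of the runs, and the width of the interval tells you how much to trust it.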