Problem with Truncated Quantile Critics (TQC) and n-step learning algorithm.

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews

Our great sponsors

tmrl

11 422 6.3 Python

Reinforcement Learning for real-time applications - host of the TrackMania Roborace League

Hi all! I'm implementing a TQC with n-step learning in Trackmania (I forked original repo from here: https://github.com/trackmania-rl/tmrl, my modified version here: https://github.com/Pheoxis/AITrackmania/tree/main). It compiles, but I am pretty sure that I implemented n-step learning incorrectly, but as a beginner I don't know what I did wrong. Here's my code before implementing n-step algorithm: https://github.com/Pheoxis/AITrackmania/blob/main/tmrl/custom/custom_algorithms.py. If anyone checked what I did wrong, I would be very grateful. I will also attach some plots from my last training and outputs from printed lines (print.txt), maybe it will help :) If you need any additional information feel free to ask.

AITrackmania

1 2 8.3 Python

Hi all! I'm implementing a TQC with n-step learning in Trackmania (I forked original repo from here: https://github.com/trackmania-rl/tmrl, my modified version here: https://github.com/Pheoxis/AITrackmania/tree/main). It compiles, but I am pretty sure that I implemented n-step learning incorrectly, but as a beginner I don't know what I did wrong. Here's my code before implementing n-step algorithm: https://github.com/Pheoxis/AITrackmania/blob/main/tmrl/custom/custom_algorithms.py. If anyone checked what I did wrong, I would be very grateful. I will also attach some plots from my last training and outputs from printed lines (print.txt), maybe it will help :) If you need any additional information feel free to ask.

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
softlearning

4 1,152 0.0 Python

Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

# see https://github.com/rail-berkeley/softlearning/issues/60

stable-baselines3-contrib

6 427 6.7 Python

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code

# https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/sb3_contrib/tqc/tqc.py :

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project