Hi, these days there are many implementations that store neither the next state nor the "done" flag in the replay buffer, to save memory; the official DrQ-v2 implementation is one example. But is it okay to throw out "done" in the replay buffer? From my point of view, this causes some problems with the done-related signal. Or did I read the implementation code wrong?
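To illustrate the memory-saving pattern the question describes, here is a minimal sketch of a buffer that stores each observation once and recovers the next observation as `obs[t + 1]`, with no separate `next_obs` array and no `done` flag. The class and method names are hypothetical, this is not DrQ-v2's actual code, and it assumes a single continuous stream of steps with no episode boundaries or wraparound.

```python
import numpy as np


class NextIndexReplayBuffer:
    """Hypothetical buffer: next_obs is read from obs[t + 1], nothing else is stored.

    Assumes one continuous stream of transitions (no episode boundaries),
    so no `done` flag is kept at all.
    """

    def __init__(self, capacity, obs_dim, act_dim):
        # One extra slot: the observation that follows the last transition.
        self.obs = np.zeros((capacity + 1, obs_dim), dtype=np.float32)
        self.act = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rew = np.zeros(capacity, dtype=np.float32)
        self.capacity = capacity
        self.size = 0

    def add_first(self, obs):
        # Store the initial observation before any transition.
        self.obs[0] = obs

    def add(self, action, reward, next_obs):
        if self.size >= self.capacity:
            raise IndexError("buffer full (this sketch has no wraparound)")
        i = self.size
        self.act[i] = action
        self.rew[i] = reward
        self.obs[i + 1] = next_obs  # doubles as the "current" obs of step i + 1
        self.size += 1

    def sample(self, batch_size, rng):
        # Valid transition indices are 0 .. size - 1; next_obs is one slot ahead.
        idx = rng.integers(0, self.size, size=batch_size)
        return self.obs[idx], self.act[idx], self.rew[idx], self.obs[idx + 1]
```

The design choice is the trade-off the question is about: each observation is stored once instead of twice, but the buffer itself can no longer tell you whether a transition ended an episode.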
Sometimes it is more than okay; it may even be necessary. In tmrl, for instance, we do exactly that: we are in a partially observable environment where we cannot tell whether the next state will be terminal, and what we actually try to optimize is an infinite sum of discounted rewards.
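The difference the answer points to shows up in the one-step TD target. Below is a minimal sketch (the function name `td_target` is hypothetical and not taken from tmrl): without a done mask the target always bootstraps, which matches optimizing an infinite discounted sum; with a done mask the bootstrap term is zeroed for transitions flagged as terminal.

```python
import numpy as np


def td_target(reward, next_value, gamma, done=None):
    """Hypothetical one-step TD target.

    done=None: always bootstrap, r + gamma * V(s'), i.e. treat the task as an
    infinite discounted sum with no terminal states.
    done given: r + gamma * (1 - done) * V(s'), cutting the bootstrap at
    transitions marked terminal.
    """
    reward = np.asarray(reward, dtype=np.float64)
    next_value = np.asarray(next_value, dtype=np.float64)
    if done is None:
        return reward + gamma * next_value
    mask = 1.0 - np.asarray(done, dtype=np.float64)
    return reward + gamma * mask * next_value


# Same transitions, with and without a terminal flag on the second one.
always_bootstrap = td_target([1.0, 1.0], [10.0, 10.0], gamma=0.9)
masked = td_target([1.0, 1.0], [10.0, 10.0], gamma=0.9, done=[0, 1])
```

If the environment really has terminal states, dropping the mask biases the target for those transitions; if it does not (or, as in tmrl, termination cannot be observed), always bootstrapping is the consistent choice.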