In simpler setups (such as simulating a walking ant in MuJoCo), you can feasibly get away with a very simple reward: a positive reward for moving towards some goal, a small negative reward for making no forward progress, and a large negative reward for moving away from the goal. The agent only knows (a) the current angles of its joints (which it can apply force to) and (b) its current position relative to the goal. Through a lot of training with these simple rules, the agent can learn to walk towards the goal. Note that it doesn't explicitly learn to walk. It just figures out how to actuate its joints to move towards the goal as quickly as possible, which, as it turns out, is walking (or, in the case of the example GIF I linked to, more like skipping).
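To make the three-case reward concrete, here is a minimal sketch in plain Python. It isn't taken from MuJoCo or any particular environment: the function name `shaped_reward`, the reward magnitudes, and the `eps` threshold for "no progress" are all assumptions picked for illustration.

```python
import math

def shaped_reward(prev_pos, curr_pos, goal, eps=1e-3,
                  progress_reward=1.0, stall_penalty=-0.1, retreat_penalty=-1.0):
    """Hypothetical three-case reward based on change in distance to the goal.

    All default values are illustrative assumptions, not tuned settings.
    """
    prev_dist = math.dist(prev_pos, goal)  # distance to goal before the step
    curr_dist = math.dist(curr_pos, goal)  # distance to goal after the step
    progress = prev_dist - curr_dist       # positive means the agent got closer

    if progress > eps:
        return progress_reward   # moved towards the goal
    if progress < -eps:
        return retreat_penalty   # moved away from the goal: large penalty
    return stall_penalty         # no meaningful progress: small penalty
```

In a real setup you would call something like this once per simulation step with the agent's position before and after the action. A common variation is to return the raw `progress` value instead of a fixed constant, so that faster movement towards the goal earns more reward.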
Just saw that our video was posted here. For people interested in the research, here is the project website with the research paper: https://danijar.com/daydreamer