First, you should structure your code in a more standard way. Specifically, reinforcement learning is typically modeled as a Markov decision process (MDP). Writing your code to follow this model will make debugging easier, both for you and for others, and will keep your implementation compatible with other frameworks. Check out OpenAI Gym to see how they structure things. Obviously, if you're just experimenting/learning then this is less critical, but you'll save yourself a lot of grief if you get into the habit sooner rather than later.
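To make the MDP structure concrete, here is a minimal sketch of a tic-tac-toe environment in the Gym style, with `reset()` and a `step()` that returns `(observation, reward, done, info)`. The class and method names are illustrative, not from any particular library:

```python
class TicTacToeEnv:
    """Illustrative Gym-style tic-tac-toe environment (sketch).

    Board cells hold 0 (empty), 1 (player X), or -1 (player O).
    Actions are cell indices 0-8. The reward returned by step() is
    from the perspective of the player who just moved: +1 win, -1 loss.
    """

    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
                 (0, 4, 8), (2, 4, 6)]              # diagonals

    def reset(self):
        self.board = [0] * 9
        self.player = 1
        return list(self.board)

    def legal_actions(self):
        return [a for a in range(9) if self.board[a] == 0]

    def step(self, action):
        assert self.board[action] == 0, "illegal move"
        self.board[action] = self.player
        winner = self._winner()
        done = winner != 0 or not self.legal_actions()
        # Reward for the player who just moved (self.player not yet flipped).
        reward = 1.0 if winner == self.player else 0.0
        self.player = -self.player
        return list(self.board), reward, done, {}

    def _winner(self):
        for i, j, k in self.WIN_LINES:
            s = self.board[i] + self.board[j] + self.board[k]
            if s == 3:
                return 1
            if s == -3:
                return -1
        return 0
```

An agent loop then just alternates `env.step(action)` until `done`, which is exactly the interface DQN implementations expect.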
I'm not sure if this is helpful, but there is a reference implementation in OpenSpiel. If you look at python/examples/breakthrough_dqn.py and change game = "breakthrough" to game = "tic_tac_toe", and then env_configs = {}, it runs DQN and evaluates against a random player. After 200k episodes, I'm getting an average reward of 0.995 for player 1 and around 0.9 for player 2. If you average over seats that's about 0.95, but this is in [-1, 1], so it's roughly 0.975 in [0, 1]. Keep in mind that's not really the win percentage, since it doesn't separate wins, draws, and losses (though you can easily modify it if you want), but the win percentage won't be any higher than that. The code base also has Connect Four, so you can try that too. And if you really want the true upper bound, you can solve the game using value iteration to get the optimal policy and then do expectimax over a random policy versus the optimal greedy policy (or estimate it from many Monte Carlo samples). The win percentage may still rise with more training, but no matter how you slice it, you're not going to get to 100% wins :)
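The rescaling in the paragraph above is just arithmetic; a quick sketch (the two reward figures are the ones quoted in the post, the second only approximate):

```python
# Average the per-seat rewards, then map the [-1, 1] reward scale to a
# [0, 1] score, where a win counts 1, a draw 0.5, and a loss 0.
p1_reward = 0.995   # average reward for player 1 (from the post)
p2_reward = 0.9     # approximate average reward for player 2
avg = (p1_reward + p2_reward) / 2   # about 0.95, on the [-1, 1] scale
score = (avg + 1) / 2               # about 0.975, on the [0, 1] scale
```

Because draws contribute 0.5 to this score, it upper-bounds the true win percentage rather than equaling it.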