Problems making a Tic-Tac-Toe bot

This page summarizes the projects mentioned and recommended in the original post on /r/reinforcementlearning

  • gym

    A toolkit for developing and comparing reinforcement learning algorithms.

  • First, you should restructure your code around something more standard. Specifically, reinforcement learning is typically modeled as a Markov decision process (MDP). Writing your code to follow this model will make debugging easier (for you and for anyone helping you) and will keep your implementation compatible with other frameworks. Check out OpenAI Gym to see how they structure things (a minimal sketch of a Gym-style environment appears after this list). Obviously, if you're just experimenting/learning this is less critical, but you'll save yourself a lot of grief if you get into the habit sooner rather than later.

  • open_spiel

    OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

  • I'm not sure if this is helpful, but there is a reference implementation in OpenSpiel. If you look at python/examples/breakthrough_dqn.py and change game = "breakthrough" to game = "tic_tac_toe", and then env_configs = {} (the exact two-line change is shown after this list), this runs DQN and evaluates against random. After 200k episodes, I'm getting an average reward of 0.995 for player 1 and around 0.9 for player 2. If you average over seats that's 0.95, but this is in [-1,1], so that's 0.975 in [0,1]. Keep in mind that's not really the percentage of wins, since it's not separating wins, draws, and losses -- though you can easily modify that if you want -- but the win percentage won't be any higher than that. The code base also has Connect Four, so you can try that too. And if you really want the true upper bound, you can solve the game with value iteration to get the optimal policy and then do expectimax over a random policy versus the optimal greedy policy (or estimate it from many Monte Carlo samples). The win percentage may still rise with more training, but no matter how you slice it, you're not going to get to 100% wins :)
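
For reference, the two-line change described in the comment above, applied to open_spiel's python/examples/breakthrough_dqn.py (only these two assignments change; the rest of the script stays as shipped):

    # python/examples/breakthrough_dqn.py -- switch the trained game to Tic-Tac-Toe
    game = "tic_tac_toe"  # was: game = "breakthrough"
    env_configs = {}      # Tic-Tac-Toe takes no extra environment configuration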
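
To make the MDP advice from the first comment concrete, here is a minimal sketch of what a Gym-style Tic-Tac-Toe environment might look like. It assumes the classic gym.Env reset()/step() API with 4-tuple step returns; the TicTacToeEnv class and its sign-flipping self-play perspective are illustrative choices, not code from the thread:

    import numpy as np
    import gym
    from gym import spaces

    class TicTacToeEnv(gym.Env):
        """Minimal Tic-Tac-Toe MDP following the classic gym.Env API (sketch).

        Board cells: 0 = empty, 1 = the player to move, -1 = the opponent.
        The board is negated after every move, so the acting player always
        sees itself as +1 and one agent can play both seats in self-play.
        """
        WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
                     (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
                     (0, 4, 8), (2, 4, 6)]              # diagonals

        def __init__(self):
            self.action_space = spaces.Discrete(9)  # one action per cell
            self.observation_space = spaces.Box(-1, 1, (9,), dtype=np.int8)
            self.board = np.zeros(9, dtype=np.int8)

        def reset(self):
            self.board[:] = 0
            return self.board.copy()

        def step(self, action):
            if self.board[action] != 0:  # illegal move: end episode with a loss
                return self.board.copy(), -1.0, True, {"illegal": True}
            self.board[action] = 1
            if any(self.board[list(line)].sum() == 3 for line in self.WIN_LINES):
                return self.board.copy(), 1.0, True, {}   # the mover just won
            if not (self.board == 0).any():
                return self.board.copy(), 0.0, True, {}   # board full: draw
            self.board *= -1                              # switch perspective
            return self.board.copy(), 0.0, False, {}

A random-vs-random rollout then reads:

    env = TicTacToeEnv()
    obs, done = env.reset(), False
    while not done:
        legal = np.flatnonzero(obs == 0)  # indices of empty cells
        obs, reward, done, info = env.step(np.random.choice(legal))

One subtlety this sketch glosses over: each step's reward is from the perspective of the player who just moved, so the seat that made the second-to-last move never directly sees a -1 for losing; a real training loop has to propagate the terminal reward to both seats.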
