REINFORCEMENT LEARNING • AI GAME AGENT

Tic-Tac-Toe AI

Reinforcement Learning agent for Tic-Tac-Toe.
Implemented with DQN and DDQN to learn optimal strategies through self-play. Formed the basis of my undergraduate thesis.

Overview

This project explored Deep Reinforcement Learning applied to the classic Tic-Tac-Toe game. The goal was to build an AI agent that learns from scratch via self-play, without being explicitly programmed with strategies. The agent was trained with DQN (Deep Q-Network) and then with DDQN (Double DQN), which stabilizes learning by reducing the Q-value overestimation that plain DQN is prone to.
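To make the DQN vs. DDQN difference concrete, here is a minimal NumPy sketch of the two target computations. It is an illustration, not the thesis code: the function names, the discount factor `gamma=0.99`, and the toy Q-values are all assumptions, and it ignores illegal-move masking and how the opponent's reply is folded into the next state.

```python
import numpy as np

def dqn_target(reward, done, q_next_target, gamma=0.99):
    """Plain DQN: the target network both selects and scores the next action."""
    return reward + gamma * (1.0 - done) * q_next_target.max(axis=1)

def ddqn_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double DQN: the online network selects, the target network scores."""
    best = q_next_online.argmax(axis=1)
    chosen = q_next_target[np.arange(len(best)), best]
    return reward + gamma * (1.0 - done) * chosen

# Toy batch: two transitions, three actions each (made-up numbers).
reward = np.array([0.0, 1.0])
done = np.array([0.0, 1.0])  # the second transition ends the game
q_next_online = np.array([[0.2, 0.5, 0.1], [0.3, 0.1, 0.4]])
q_next_target = np.array([[0.9, 0.4, 0.3], [0.2, 0.6, 0.1]])

y_dqn = dqn_target(reward, done, q_next_target)
y_ddqn = ddqn_target(reward, done, q_next_online, q_next_target)
print(y_dqn, y_ddqn)
```

For the same batch, the DDQN target can never exceed the DQN target, because the value it reads from the target network is at most that network's maximum. That is the mechanism behind the reduced overestimation.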

Technical Highlights

Environment: Custom Tic-Tac-Toe board state simulator
Algorithms: DQN, Double DQN
Training: Self-play episodes with ε-greedy exploration
Loss Function: Mean Squared Error (MSE) for Q-value updates
Replay Buffer: Experience replay with target network updates
Frameworks: Python, TensorFlow, NumPy
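
The pieces listed above could fit together roughly as follows. This is a hedged sketch rather than the project's actual code: the class and function names (`TicTacToe`, `epsilon_greedy`, `ReplayBuffer`), the board encoding, the reward values, and the buffer capacity are all assumptions made for illustration, and a zero vector stands in for the network's Q-values.

```python
import random
from collections import deque

import numpy as np

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

class TicTacToe:
    """Board is a length-9 array: 0 = empty, 1 = X, -1 = O."""

    def __init__(self):
        self.board = np.zeros(9, dtype=np.int8)
        self.player = 1

    def legal_moves(self):
        return [i for i in range(9) if self.board[i] == 0]

    def step(self, action):
        """Place the mover's mark; return (reward, done) from the mover's view."""
        if self.board[action] != 0:
            raise ValueError("illegal move")
        self.board[action] = self.player
        if any(all(self.board[i] == self.player for i in line) for line in WIN_LINES):
            return 1.0, True   # mover wins
        if not self.legal_moves():
            return 0.0, True   # draw
        self.player = -self.player
        return 0.0, False

def epsilon_greedy(q_values, legal, epsilon):
    """Random legal move with probability epsilon, else the best legal move."""
    if random.random() < epsilon:
        return random.choice(legal)
    return max(legal, key=lambda a: q_values[a])

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity=10_000):
        self.items = deque(maxlen=capacity)

    def push(self, *transition):
        self.items.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.items), batch_size)

    def __len__(self):
        return len(self.items)

# One fully random self-play game (epsilon = 1.0) to fill the buffer.
env, buffer, done = TicTacToe(), ReplayBuffer(), False
while not done:
    state = env.board.copy() * env.player  # board as seen by the mover
    q_values = np.zeros(9)                 # stand-in for the network output
    action = epsilon_greedy(q_values, env.legal_moves(), epsilon=1.0)
    reward, done = env.step(action)
    buffer.push(state, action, reward, env.board.copy(), done)
```

Restricting both the random and the greedy branch to `legal_moves()` is one simple way to keep the agent from ever choosing an occupied square; penalizing illegal moves through the reward is an alternative design.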

Results

The agent successfully learned optimal Tic-Tac-Toe strategies after extensive training. With DDQN, stability improved compared to baseline DQN, reducing over-optimistic moves. The trained agent achieved near-perfect play against human and random opponents, consistently forcing a win or draw.

Challenges & Solutions

The main challenge was balancing exploration against exploitation. Early models either over-explored with random moves or overfit to short-term patterns. Decaying ε over the course of training, combined with target network updates, balanced the two. Using DDQN further stabilized training by addressing Q-value overestimation.
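
A decaying ε schedule and a target network sync can be sketched as below. The exponential form, the constants (`eps_end=0.05`, `decay=0.999`, `sync_every=500`), and the hard-copy sync are assumptions chosen for illustration; the values actually used in the project are not stated here. A plain dictionary stands in for the network weights.

```python
import copy

def epsilon_at(episode, eps_start=1.0, eps_end=0.05, decay=0.999):
    """Exponentially decay epsilon per episode, floored at eps_end."""
    return max(eps_end, eps_start * decay ** episode)

def should_sync_target(step, sync_every=500):
    """True on every sync_every-th training step."""
    return step > 0 and step % sync_every == 0

# Stand-in "weights": the online copy changes every step,
# the target copy only changes when it is synced.
online_weights = {"w": [0.0]}
target_weights = copy.deepcopy(online_weights)
syncs = 0
for step in range(1, 1501):
    online_weights["w"][0] += 0.01  # placeholder for a gradient update
    if should_sync_target(step):
        target_weights = copy.deepcopy(online_weights)
        syncs += 1

for episode in (0, 1000, 5000):
    print(episode, epsilon_at(episode))
```

Early episodes are almost entirely random, which covers the small state space quickly, while the floor at `eps_end` keeps a little exploration late in training. Holding the target weights fixed between syncs keeps the regression target from shifting on every update.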