## Developing an Agent
In order for the game to properly load your agent, you must install it. There are several ways to do so:

- Create a new agent within `src/splendor/agents`; when installing splendor (i.e. when invoking `pip install .`) your agent will be installed as well.
- Create a new package, develop your agent there, and then install it.
- Create a new agent within `src/splendor/agents` and, ONLY DURING DEVELOPMENT, install splendor using `pip install -e .` (instead of `pip install .`), which allows you to edit and adjust your agent as you please without having to re-install the package.
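Below is a minimal sketch of what such an agent module might look like. The class name `myAgent` and its id argument mirror `splendor.agents.generic.random.myAgent` used later in this document, but the rest of the interface (the `SelectAction` method and its signature) is an assumption; check the existing agents under `src/splendor/agents` for the exact interface the game expects.

```python
# src/splendor/agents/my_agent.py -- illustrative sketch, not the official template.
import random


class myAgent:
    def __init__(self, _id):
        self.id = _id  # the seat/turn index assigned to this agent by the game

    def SelectAction(self, actions, game_state, game_rule):
        # Placeholder policy: choose a uniformly random legal action.
        # Replace this with your own decision logic.
        return random.choice(actions)
```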
## Training Our Agents
### Training The Genetic Algorithm Agent
To train the genetic algorithm agent with the following hyper-parameters:

- a population size of 24 in each generation (the population size should be a multiple of 12),
- 20 generations of training,
- a mutation rate of 0.1,
- a fixed random seed,

use the following command:

```
evolve --population-size 24 --generations 20 --mutation-rate 0.1 --seed 1234
```
### Training The PPO Agent
To train the PPO agent, run the following command:

```
ppo
```

This command will train the PPO agent with the default training hyper-parameters.

```
ppo --device cuda --working-dir runs --transfer-learning --opponent minimax
```

This command will use the GPU during training, use the installed weights to initialize the network, and train the PPO agent against MiniMax. Furthermore, all the generated files (weights stored in `.pth` files and `stats.csv`) will be written to the `runs/` directory.
There are multiple available architectures for the neural network to be used by the PPO agent, for instance:

- MLP - Multi-Layered Perceptron, also known as a Fully-Connected Feed-Forward network.
- Self-Attention and then MLP.
- GRU (Gated Recurrent Unit) and then MLP.
- LSTM (Long Short-Term Memory) and then MLP.
By default the MLP architecture will be used; however, you can train the PPO agent with a different architecture via the `--architecture` flag or its shortcuts `-a` and `--arch`. Here is an example:

```
ppo --device cpu --opponent random --architecture gru
```
There are also multiple opponents available to train or evaluate against, such as:

- random
- minimax
- itself
- PPO (with MLP)
- PPO (with Self-Attention)
- PPO (with GRU)
- PPO (with LSTM)
## SplendorEnv - an OpenAI gym compatible simulator for the game Splendor
`gym`, or currently `gymnasium` (previously maintained by OpenAI and now by the Farama Foundation), is a framework providing an API standard for reinforcement learning along with a diverse collection of reference environments.
We've made a custom `gym.Env` and registered it as one of the `gym` environments. This comes in handy when training agents such as DQN or PPO.
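For context, registering a custom environment with `gymnasium` typically looks like the snippet below. This is only an illustration of the mechanism, not the project's actual registration code; the entry-point path shown here is hypothetical, and the real registration happens when `splendor.Splendor.gym` is imported.

```python
# Illustrative only -- not the project's actual registration code.
from gymnasium.envs.registration import register

register(
    id="splendor-v1",  # the id later passed to gym.make(...)
    entry_point="splendor.Splendor.gym.envs:SplendorEnv",  # hypothetical module path
)
```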
How to create an instance of `SplendorEnv`:

1. Import `gymnasium`:

   ```python
   import gymnasium as gym
   ```

2. Register `SplendorEnv` to `gym`:

   ```python
   import splendor.Splendor.gym
   ```

3. Define the opponents. When creating an instance of `SplendorEnv` you should tell it which agents will be used as opponents to you (the one who uses the env). In the following example we'll use a single random agent as an opponent:

   ```python
   from splendor.agents.generic.random import myAgent

   opponents = [myAgent(0)]
   ```

4. Create the environment:

   ```python
   env = gym.make("splendor-v1", agents=opponents)
   ```
### Custom features of SplendorEnv

- Every call to `env.step(action)` simulates (using `SplendorGameRule`) the turns of all the opponents.
- When calling `env.reset()`, `SplendorEnv` will return the feature vector of the initial state AND the turn of our agent via the second returned variable (the `dict`), which has a key called `my_turn`.
- `SplendorEnv` has several custom properties:
  - `state` - the actual `SplendorState`, not the feature vector.
  - `my_turn` - the turn of the agent, same as the value returned by `env.reset()`.
- `SplendorEnv` has several custom methods:
  - `get_legal_actions_mask` - a method for getting a mask vector which masks all the illegal actions of `splendor.Splendor.gym.envs.actions.ALL_ACTIONS`.
You can access those like this:

```python
env.unwrapped.my_turn
```
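Putting the pieces together, here is a rough sketch of a random-play loop against the environment. It assumes the standard `gymnasium` five-value return from `env.step`, that `get_legal_actions_mask` takes no arguments, and that an action is passed as an index into `ALL_ACTIONS` (the same indexing the mask uses); verify these assumptions against the actual environment before relying on this.

```python
import numpy as np
import gymnasium as gym

import splendor.Splendor.gym  # registers "splendor-v1"
from splendor.agents.generic.random import myAgent

opponents = [myAgent(0)]
env = gym.make("splendor-v1", agents=opponents)

obs, info = env.reset()
my_turn = info["my_turn"]  # also available as env.unwrapped.my_turn

done = False
while not done:
    # Mask out illegal actions and pick a random legal one.
    mask = env.unwrapped.get_legal_actions_mask()
    action = np.random.choice(np.flatnonzero(mask))

    # step() also simulates the opponents' turns before returning.
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```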