splendor.splendor.gym.envs package

Submodules

splendor.splendor.gym.envs.actions module

Combinatorially define all possible actions in the game.

class splendor.splendor.gym.envs.actions.Action(type_enum: ActionEnum, collected_gems: dict[Literal['black', 'red', 'yellow', 'green', 'blue', 'white'], int] | None = None, returned_gems: dict[Literal['black', 'red', 'yellow', 'green', 'blue', 'white'], int] | None = None, position: CardPosition | None = None, noble_index: int | None = None)[source]

Bases: object

Represent an action as a dataclass with a more comfortable API.

collected_gems: dict[Literal['black', 'red', 'yellow', 'green', 'blue', 'white'], int] | None = None
noble_index: int | None = None
position: CardPosition | None = None
returned_gems: dict[Literal['black', 'red', 'yellow', 'green', 'blue', 'white'], int] | None = None
classmethod to_action_element(action: CollectAction | ReserveAction | BuyAction, state: SplendorState, agent_index: int) ActionTypeVar[source]

Convert an action in SplendorGameRule format to Action.

type_enum: ActionEnum
class splendor.splendor.gym.envs.actions.ActionEnum(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Enum for all action types.

BUY_AVAILABLE = 5
BUY_RESERVE = 6
COLLECT_DIFF = 3
COLLECT_SAME = 2
PASS = 1
RESERVE = 4
class splendor.splendor.gym.envs.actions.CardPosition(tier: int, card_index: int, reserved_index: int)[source]

Bases: object

Dataclass representing where a card is located on the board.

card_index: int
reserved_index: int
tier: int
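
As a rough illustration of what "combinatorially define" can mean for the gem-collecting actions, the sketch below enumerates collect choices with itertools. The ActionEnum values are copied from the listing above. Everything else is an assumption made for illustration: the helper name enumerate_collect_actions, the (type, collected_gems) tuple layout, treating yellow as a wildcard that cannot be collected directly, and ignoring returned gems. It is not the package's actual construction of ALL_ACTIONS.

```python
from enum import Enum
from itertools import combinations


class ActionEnum(Enum):
    """Local stand-in mirroring the values documented above."""
    PASS = 1
    COLLECT_SAME = 2
    COLLECT_DIFF = 3
    RESERVE = 4
    BUY_AVAILABLE = 5
    BUY_RESERVE = 6


# Assumption: yellow is the wildcard colour and is never collected directly.
COLLECTABLE = ("black", "red", "green", "blue", "white")


def enumerate_collect_actions():
    """Hypothetical helper: list (action type, collected_gems) pairs."""
    actions = []
    # Two gems of a single colour.
    for colour in COLLECTABLE:
        actions.append((ActionEnum.COLLECT_SAME, {colour: 2}))
    # One gem each of 1, 2 or 3 distinct colours.
    for size in (1, 2, 3):
        for combo in combinations(COLLECTABLE, size):
            actions.append((ActionEnum.COLLECT_DIFF, {c: 1 for c in combo}))
    return actions


collect_actions = enumerate_collect_actions()
print(len(collect_actions))  # 5 same-colour + 5 + 10 + 10 different-colour = 30
```

Each fixed position in such a list can then serve as an action index, which is the kind of integer that step() accepts.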

splendor.splendor.gym.envs.splendor_env module

Implementation of Splendor as a gym.Env.

class splendor.splendor.gym.envs.splendor_env.SplendorEnv(agents: list[Agent], shuffle_turns: bool = True, fixed_turn: int | None = None, render_mode: str | None = None)[source]

Bases: Env

Custom gym.Env for the game Splendor.

Create an array of shape (len(ALL_ACTIONS),) whose values are 0 or 1. If mask[i] == 1, the i-th action is legal; otherwise it is illegal (legality is determined by SplendorGameRule).

render() None[source]

Compute the render frames as specified by render_mode during the initialization of the environment.

The environment’s metadata render modes (env.metadata[“render_modes”]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are available through gymnasium.make, which automatically applies a wrapper to collect rendered frames.

Note:

As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.

By convention, if the render_mode is:

  • None (default): no render is computed.

  • “human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.

  • “rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.

  • “ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).

  • “rgb_array_list” and “ansi_list”: List-based versions of render modes are possible (except human) through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The collected frames are popped after render() or reset() is called.

Note:

Make sure that your class’s metadata "render_modes" key includes the list of supported modes.

Changed in version 0.25.0: The render function no longer accepts parameters; instead, they should be specified when the environment is initialised, e.g., gymnasium.make("CartPole-v1", render_mode="human")

reset(*, seed: int | None = None, options: dict | None = None) tuple[ndarray[Any, dtype[_ScalarType_co]], dict[str, int]][source]

Reset the environment and create a new game.

Parameters:
  • seed – the seed to use in np_random.

  • options – ignored; both this parameter and seed are accepted in order to comply with the gym.Env signature.

Returns:

The initial observation of a new game and a dictionary containing the id (turn) of my agent.

Note:

The order of turns is randomly chosen each time reset is called.

property state: SplendorState

Return the current game state itself, not the feature vector of that state.

step(action: int) tuple[ndarray[Any, dtype[_ScalarType_co]], float, bool, bool, dict][source]

Run one time-step of the environment’s dynamics.

Parameters:
  • action – the index of the action to take.

Returns:

The new state (successor), the reward given, a flag indicating whether or not the game ended, truncated (will be ignored), additional information (will be ignored).

Note:

This method returns two redundant values (truncated and info) only in order to comply with the gym.Env.step signature.
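
The reset/step contract described above can be sketched as a plain episode loop. To keep the example self-contained, it uses StubEnv, a hypothetical stand-in that only mimics the documented return shapes of reset() and step(); it does not implement Splendor, and the "my_turn" key, the horizon parameter and the run_episode helper are illustrative names, not part of the package.

```python
class StubEnv:
    """Hypothetical stand-in with the same reset/step shapes as SplendorEnv."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        # (observation, info); the key name "my_turn" is an assumption.
        return [0.0], {"my_turn": 0}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        reward = 1.0 if terminated else 0.0
        # (observation, reward, terminated, truncated, info)
        return [float(self.t)], reward, terminated, False, {}


def run_episode(env, choose_action, seed=None):
    """Play one episode and return the accumulated reward."""
    observation, _info = env.reset(seed=seed)
    total_reward = 0.0
    terminated = False
    while not terminated:
        action = choose_action(observation)
        # truncated and info are unpacked but deliberately ignored.
        observation, reward, terminated, _truncated, _info = env.step(action)
        total_reward += reward
    return total_reward


total = run_episode(StubEnv(horizon=3), lambda observation: 0)
print(total)  # 1.0
```

With the real environment, choose_action would normally restrict itself to legal action indices rather than always returning 0.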

property turn: int

Return the turn of the player.

splendor.splendor.gym.envs.utils module

Collection of useful utility functions used in the implementation of SplendorEnv.

splendor.splendor.gym.envs.utils.build_action(action_index: int, state: SplendorState, agent_index: int) dict[source]

Construct the action to be taken from its action index in the ALL_ACTIONS list.

Returns:

the corresponding action to the action_index, in the format required by SplendorGameRule.

Note:

When this function builds a buying action, it does not take into account the wildcard gems (yellow) or the owned cards when computing returned_gems - this can lead to a broken state where a player has a negative amount of gems…
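
To make the caveat above concrete, the sketch below shows one way a payment could be computed so that a player never returns more gems than they own. It follows the standard Splendor rules (owned cards discount the cost per colour, yellow gems cover any shortfall). The function gems_to_return and its dictionary arguments are hypothetical; this is not what build_action does, and it is not part of the package.

```python
COLOURS = ("black", "red", "green", "blue", "white")


def gems_to_return(cost, gems, bonuses):
    """Hypothetical helper: gems a buyer hands back, never more than owned."""
    returned = {}
    wildcards_needed = 0
    for colour in COLOURS:
        # Owned cards act as a permanent discount on that colour.
        due = max(cost.get(colour, 0) - bonuses.get(colour, 0), 0)
        # Pay with regular gems first, up to what the player holds.
        paid = min(due, gems.get(colour, 0))
        if paid:
            returned[colour] = paid
        # Any shortfall must be covered by wildcard (yellow) gems.
        wildcards_needed += due - paid
    if wildcards_needed > gems.get("yellow", 0):
        raise ValueError("card is not affordable")
    if wildcards_needed:
        returned["yellow"] = wildcards_needed
    return returned


example = gems_to_return(
    cost={"red": 3, "blue": 2},
    gems={"red": 1, "blue": 1, "yellow": 2},
    bonuses={"blue": 1},
)
print(example)  # {'red': 1, 'blue': 1, 'yellow': 2}
```

Subtracting the raw card cost instead would ask this player for 3 red and 2 blue gems, leaving negative counts of both.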

splendor.splendor.gym.envs.utils.create_action_mapping(legal_actions: list[CollectAction | ReserveAction | BuyAction], state: SplendorState, agent_index: int) dict[int, CollectAction | ReserveAction | BuyAction][source]

Create the mapping from action indices to legal actions. It is used by both SplendorEnv and the PPO agent.

Create an array of shape (len(ALL_ACTIONS),) whose values are 0 or 1. If mask[i] == 1, the i-th action is legal; otherwise it is illegal.
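
The relationship between the index-to-action mapping and the 0/1 mask can be sketched as follows. A plain Python list stands in for the array, the string values stand in for real action objects, and build_mask and best_legal_action are hypothetical helpers written for this example rather than functions of the package.

```python
def build_mask(action_mapping, num_actions):
    """Hypothetical helper: 1 at every index present in the mapping, else 0."""
    mask = [0] * num_actions
    for index in action_mapping:
        mask[index] = 1
    return mask


def best_legal_action(scores, mask):
    """Pick the highest-scoring index among those the mask marks as legal."""
    legal = [i for i, flag in enumerate(mask) if flag == 1]
    return max(legal, key=lambda i: scores[i])


# Placeholder mapping: action index -> legal action (strings for brevity).
mapping = {0: "pass", 3: "collect", 5: "buy"}
mask = build_mask(mapping, num_actions=8)
print(mask)  # [1, 0, 0, 1, 0, 1, 0, 0]

# Index 1 has the top score but is illegal, so index 5 is chosen instead.
scores = [0.1, 0.9, 0.5, 0.2, 0.0, 0.7, 0.3, 0.8]
chosen = best_legal_action(scores, mask)
print(chosen, mapping[chosen])  # 5 buy
```

The mapping is then used to turn the chosen index back into the action object expected by the game rule.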

Module contents

Import SplendorEnv whenever someone imports splendor.splendor.gym.envs.

class splendor.splendor.gym.envs.SplendorEnv(agents: list[Agent], shuffle_turns: bool = True, fixed_turn: int | None = None, render_mode: str | None = None)[source]

Bases: Env

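Custom gym.Env for the game Splendor. This is the same class as splendor.splendor.gym.envs.splendor_env.SplendorEnv, imported at package level.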
