splendor.agents.our_agents.ppo.ppo_rnn.lstm package

Submodules

splendor.agents.our_agents.ppo.ppo_rnn.lstm.constants module

Constants for PPO with LSTM.

splendor.agents.our_agents.ppo.ppo_rnn.lstm.network module

PPO with LSTM (Long Short-Term Memory) implementation.

class splendor.agents.our_agents.ppo.ppo_rnn.lstm.network.PpoLstm(input_dim: int, output_dim: int, hidden_layers_dims: list[int] | None = None, dropout: float = 0.2, hidden_state_dim: int = 64, recurrent_layers_num: int = 1)

Bases: RecurrentPPO

Implementation of the PPO network architecture using an LSTM.

forward(x: Float[Tensor, 'batch sequence features'] | Float[Tensor, 'batch features'] | Float[Tensor, 'features'], action_mask: Float[Tensor, 'batch actions'] | Float[Tensor, 'actions'], hidden_state: tuple[Float[Tensor, 'batch num_layers hidden_dim'], Float[Tensor, 'batch num_layers hidden_dim']], *args, **kwargs) → tuple[Float[Tensor, 'batch actions'], Float[Tensor, 'batch 1'], Float[Tensor, 'batch num_layers hidden_dim'], Float[Tensor, 'batch num_layers hidden_dim']]

Pass the input through the network to obtain predictions.

Parameters:

x – the input to the network. Expected shape: (features,), (batch_size, features) or (batch_size, sequence_length, features).
action_mask – a binary mask tensor in which 1 marks a valid action and 0 marks an invalid action. Expected shape: (actions,) or (batch_size, actions), where actions equals len(ALL_ACTIONS), defined in Engine.Splendor.gym.envs.actions.
hidden_state – a tuple of the LSTM hidden state and cell state. Expected shape of each: (batch_size, num_layers, hidden_state_dim) or (num_layers, hidden_state_dim).

Returns:

the action probabilities, the value estimate, the next hidden state and the next cell state.
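These docs do not state how action_mask is turned into the returned action probabilities. A common approach, assumed here, is to set the logits of invalid actions to -inf before the softmax so that those actions receive exactly zero probability. The helper masked_action_probabilities is hypothetical and works on plain Python lists for a single (unbatched) step.

```python
import math


def masked_action_probabilities(logits: list[float], action_mask: list[float]) -> list[float]:
    """Softmax over logits in which actions with mask value 0 get probability 0.

    Hypothetical helper illustrating logit masking; not part of this package.
    """
    if len(logits) != len(action_mask):
        raise ValueError("logits and action_mask must have the same length")
    # Invalid actions get a logit of -inf, so exp() maps them to exactly 0.
    masked = [x if m != 0 else -math.inf for x, m in zip(logits, action_mask)]
    peak = max(masked)
    if peak == -math.inf:
        raise ValueError("action_mask must mark at least one valid action")
    # Subtracting the largest valid logit keeps exp() from overflowing.
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Four actions; only actions 0 and 2 are valid this turn.
probs = masked_action_probabilities([2.0, 1.0, 0.5, 3.0], [1, 0, 1, 0])
```

In PyTorch the same idea is commonly written as logits.masked_fill(action_mask == 0, float("-inf")).softmax(dim=-1). Whether PpoLstm masks logits this way or renormalizes probabilities after masking is not specified here.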

init_hidden_state(device: device) → tuple[Float[Tensor, 'num_layers hidden_dim'], Float[Tensor, 'num_layers hidden_dim']]: return the initial hidden state and initial cell state to be used.
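forward accepts unbatched and batched inputs, while init_hidden_state returns an unbatched state of shape (num_layers, hidden_dim). The hypothetical helper batched_shapes below (not part of this package) works on shape tuples only and shows one consistent way to promote both to batched layouts. Treating a (batch_size, features) input as a single time step per sample is an assumption.

```python
from __future__ import annotations


def batched_shapes(
    x_shape: tuple[int, ...],
    hidden_shape: tuple[int, ...],
) -> tuple[tuple[int, int, int], tuple[int, int, int]]:
    """Promote an input shape to (batch, sequence, features) and a hidden
    shape to (batch, num_layers, hidden_dim).

    Hypothetical helper mirroring the shapes listed for forward().
    """
    if len(x_shape) == 1:  # (features,) -> one sample, one time step
        x3 = (1, 1, x_shape[0])
    elif len(x_shape) == 2:  # (batch, features) -> one time step per sample (assumed)
        x3 = (x_shape[0], 1, x_shape[1])
    elif len(x_shape) == 3:  # already (batch, sequence, features)
        x3 = (x_shape[0], x_shape[1], x_shape[2])
    else:
        raise ValueError(f"unsupported input rank {len(x_shape)}")

    batch = x3[0]
    if len(hidden_shape) == 2:  # (num_layers, hidden_dim), e.g. from init_hidden_state
        h3 = (batch, hidden_shape[0], hidden_shape[1])
    elif len(hidden_shape) == 3:  # (batch, num_layers, hidden_dim)
        if hidden_shape[0] != batch:
            raise ValueError("hidden state batch size must match the input batch size")
        h3 = (hidden_shape[0], hidden_shape[1], hidden_shape[2])
    else:
        raise ValueError(f"unsupported hidden state rank {len(hidden_shape)}")
    return x3, h3
```

Note that torch.nn.LSTM itself expects h_0 and c_0 shaped (num_layers, batch, hidden_size) even when batch_first=True, so the (batch_size, num_layers, hidden_state_dim) layout documented for forward presumably involves a transpose inside the network; these docs do not show where.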

splendor.agents.our_agents.ppo.ppo_rnn.lstm.ppo_agent module

Implementation of PPO agent with LSTM.

class splendor.agents.our_agents.ppo.ppo_rnn.lstm.ppo_agent.PpoLstmAgent(_id: int, load_net: bool = True)

Bases: PPOAgentBase

PPO agent with LSTM.

SelectAction(actions: list[CollectAction | ReserveAction | BuyAction], game_state: SplendorState, game_rule: SplendorGameRule) → CollectAction | ReserveAction | BuyAction: select an action to play from the given actions.

load() → PPOBase: load the weights of the network.

load_policy(policy: Module) → None: Use a given policy as the agent’s network policy.

splendor.agents.our_agents.ppo.ppo_rnn.lstm.ppo_agent.myAgent: alias of PpoLstmAgent
