Gymnasium

Ver. 1.0.2 (2023-08-21)

This module provides wrapper classes for environments from the Farama Foundation's Gymnasium package.

See also: https://github.com/Farama-Foundation/Gymnasium

class mlpro.wrappers.gymnasium.WrEnvGYM2MLPro(p_gym_env, p_state_space: MSpace = None, p_action_space: MSpace = None, p_seed=None, p_visualize: bool = True, p_logging=True)

Bases: Wrapper, Environment

This class is a ready-to-use wrapper for Gym environments. Objects of this type can be treated as MLPro environment objects. The encapsulated Gym environment must be compatible with class gym.Env.

Parameters:
  • p_gym_env (Env) – Gym environment object

  • p_state_space (MSpace) – Optional external state space object that meets the state space of the Gym environment

  • p_action_space (MSpace) – Optional external action space object that meets the action space of the Gym environment

  • p_seed – Seed parameter for an internal random generator. Default = None.

  • p_visualize (bool) – Boolean switch for env/agent visualisation. Default = True.

  • p_logging – Log level (see constants of class Log). Default = Log.C_LOG_ALL.
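
A minimal usage sketch, assuming Gymnasium's CartPole-v1 environment is available; parameter values other than the documented defaults are illustrative only:

    import gymnasium as gym
    from mlpro.wrappers.gymnasium import WrEnvGYM2MLPro

    # Wrap a native Gymnasium environment so that it behaves like an MLPro environment
    gym_env   = gym.make('CartPole-v1')
    mlpro_env = WrEnvGYM2MLPro(p_gym_env=gym_env,
                               p_visualize=False,
                               p_logging=False)

    mlpro_env.reset(p_seed=1)   # public reset inherited from MLPro's Environment class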

C_TYPE = 'Wrapper Gym2MLPro'
C_WRAPPED_PACKAGE = 'gymnasium'
C_MINIMUM_VERSION = '0.28.1'
C_PLOT_ACTIVE: bool = True
_reduce_state(p_state: dict, p_path: str, p_os_sep: str, p_filename_stub: str)

The embedded Gym env itself can’t be pickled due to its dependencies on Pygame. That’s why the current env instance needs to be removed before pickling the object.

See also: https://stackoverflow.com/questions/52336196/how-to-save-object-using-pygame-surfaces-to-file-using-pickle

_complete_state(p_path: str, p_os_sep: str, p_filename_stub: str)

Custom method to complete the object state (=self) from external data sources. This method is called by standard method __setstate__() during unpickling the object from an external file.

Parameters:
  • p_path (str) – Path of the object pickle file (and further optional related files)

  • p_os_sep (str) – OS-specific path separator.

  • p_filename_stub (str) – Filename stub to be used for further optional custom data files
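
The pair _reduce_state()/_complete_state() follows the usual Python pattern of dropping an unpicklable member before pickling and recreating it after unpickling. The following sketch illustrates the pattern only; the class name PersistableWrapper is hypothetical and not part of MLPro:

    import gymnasium as gym

    class PersistableWrapper:
        """Illustrative only: remove an unpicklable member before pickling and
        recreate it after unpickling, analogous to _reduce_state()/_complete_state()."""

        def __init__(self, env_id):
            self._env_id = env_id
            self._env    = gym.make(env_id)       # e.g. a Pygame-based Gym environment

        def __getstate__(self):
            state = self.__dict__.copy()
            state['_env'] = None                  # strip the unpicklable member
            return state

        def __setstate__(self, state):
            self.__dict__.update(state)
            self._env = gym.make(self._env_id)    # recreate it from the persisted id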

static recognize_space(p_gym_space) → ESpace

Detects a Gym space and transforms it into an MLPro space, so that the transformed space is directly compatible with MLPro.

Parameters:

p_gym_space (Gym space, incl. container spaces Tuple and Dict) – Gym space to be transformed. Spaces are used in Gym to define the format of valid actions and observations.

Returns:

space – MLPro compatible space.

Return type:

ESpace
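
Illustrative use; the call to get_num_dim() on the resulting ESpace is an assumption based on MLPro's set/space API:

    import gymnasium as gym
    from mlpro.wrappers.gymnasium import WrEnvGYM2MLPro

    gym_env = gym.make('CartPole-v1')

    # Transform the Gym observation space into an MLPro ESpace
    state_space = WrEnvGYM2MLPro.recognize_space(gym_env.observation_space)
    print(state_space.get_num_dim())   # 4 dimensions for CartPole-v1 observations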

static setup_spaces()

Static method to set up the state and action spaces. To be optionally defined by the user.

_reset(p_seed=None)

Custom method to reset the environment to an initial/defined state.

Parameters:

p_seed (int) – Seed parameter for an internal random generator. Default = None.

simulate_reaction(p_state: State, p_action: Action) → State

Simulates a state transition based on a state and an action. The simulation step itself is carried out either by an internal custom implementation in method _simulate_reaction() or by an embedded adaptive function.

Parameters:
  • p_state (State) – Current state.

  • p_action (Action) – Action.

Returns:

state – Subsequent state after transition

Return type:

State

compute_reward(p_state_old: State = None, p_state_new: State = None) → Reward

Computes a reward for the state transition, given by two successive states. The reward computation itself is carried out either by a custom implementation in method _compute_reward() or by an embedded adaptive function.

Parameters:
  • p_state_old (State) – Optional state before transition. If None the internal previous state of the environment is used.

  • p_state_new (State) – Optional state after transition. If None, the internal current state of the environment is used.

Returns:

Reward object.

Return type:

Reward
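
A manual interaction cycle might look as follows. This is a sketch only: the import path of class Action and its constructor signature (p_agent_id, p_action_space, p_values) are assumptions based on MLPro's RL model classes:

    import numpy as np
    import gymnasium as gym
    from mlpro.rl import Action                    # import path assumed
    from mlpro.wrappers.gymnasium import WrEnvGYM2MLPro

    env = WrEnvGYM2MLPro(p_gym_env=gym.make('CartPole-v1'), p_visualize=False, p_logging=False)
    env.reset(p_seed=1)

    # Build an MLPro action in the wrapped action space (agent id 0, one discrete value)
    action = Action(p_agent_id=0, p_action_space=env.get_action_space(), p_values=np.array([1]))

    new_state = env.simulate_reaction(p_state=env.get_state(), p_action=action)
    reward    = env.compute_reward()               # uses the internally stored old/new states
    print(reward.get_overall_reward())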

compute_success(p_state: State) → bool

Assesses the given state whether it is a ‘success’ state. Assessment is carried out either by a custom implementation in method _compute_success() or by an embedded adaptive function.

Parameters:

p_state (State) – State to be assessed.

Returns:

True, if the given state is a ‘success’ state. False otherwise.

Return type:

bool

compute_broken(p_state: State) → bool

Assesses the given state whether it is a ‘broken’ state. Assessment is carried out either by a custom implementation in method _compute_broken() or by an embedded adaptive function.

Parameters:

p_state (State) – State to be assessed.

Returns:

True, if the given state is a ‘broken’ state. False otherwise.

Return type:

bool

init_plot(p_figure: Figure = None, p_plot_settings: list = Ellipsis, p_plot_depth: int = 0, p_detail_level: int = 0, p_step_rate: int = 0, **p_kwargs)

Initializes the plot functionalities of the class.

Parameters:
  • p_figure (Matplotlib.figure.Figure, optional) – Optional MatPlotLib host figure, where the plot shall be embedded. The default is None.

  • p_plot_settings (PlotSettings) – Optional plot settings. If None, the default view is plotted (see attribute C_PLOT_DEFAULT_VIEW).

update_plot(**p_kwargs)

Updates the current plot by using the render functionality of the embedded Gym environment.
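
A short sketch of the plot workflow, assuming a Gym environment created with render_mode='human':

    import gymnasium as gym
    from mlpro.wrappers.gymnasium import WrEnvGYM2MLPro

    env = WrEnvGYM2MLPro(p_gym_env=gym.make('CartPole-v1', render_mode='human'),
                         p_visualize=True, p_logging=False)
    env.reset(p_seed=0)

    env.init_plot()      # prepare plotting; the window itself is managed by Gym/Pygame
    env.update_plot()    # trigger one render of the current state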

get_cycle_limit()

Returns the cycle limit of the environment.

Returns:

Cycle limit of the environment.

Return type:

float

class mlpro.wrappers.gymnasium.WrEnvMLPro2GYM(p_mlpro_env: Environment, p_state_space: MSpace = None, p_action_space: MSpace = None, p_render_mode: str = None, p_logging=True)

Bases: Wrapper, Env

This class is a ready-to-use wrapper that exposes MLPro environments as Gym environments. Objects of this type can be treated as gym.Env objects. The encapsulated MLPro environment must be compatible with class Environment.

Parameters:
  • p_mlpro_env (Environment) – MLPro’s Environment object

  • p_state_space (MSpace) – Optional external state space object that meets the state space of the MLPro environment

  • p_action_space (MSpace) – Optional external action space object that meets the action space of the MLPro environment

  • p_render_mode (str) – Render mode to be handled by the environment, for instance ‘human’, ‘rgb_array’, or ‘single_rgb_array’. Default = None.

  • p_logging – Log level (see constants of class Log). Default = Log.C_LOG_ALL.
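
A minimal instantiation sketch; the GridWorld import path is an assumption, and any environment compatible with MLPro's Environment class can be used instead:

    from mlpro.rl.pool.envs.gridworld import GridWorld     # import path assumed
    from mlpro.wrappers.gymnasium import WrEnvMLPro2GYM

    mlpro_env = GridWorld(p_logging=False)
    gym_env   = WrEnvMLPro2GYM(p_mlpro_env=mlpro_env, p_render_mode=None, p_logging=False)

    print(gym_env.observation_space)   # Gym spaces derived via recognize_space()
    print(gym_env.action_space)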

C_TYPE = 'Wrapper MLPro2Gym'
C_WRAPPED_PACKAGE = 'gymnasium'
C_MINIMUM_VERSION = '0.28.1'
metadata: dict[str, Any] = {'render.modes': ['human', 'rgb_array']}
static recognize_space(p_mlpro_space)

Detects an MLPro space and transforms it into a Gym space, so that the transformed space is directly compatible with Gym.

Parameters:

p_mlpro_space (ESpace) – MLPro compatible space.

Returns:

space – Gym space defining the format of valid actions and observations.

Return type:

container spaces (Tuple and Dict)

step(action)

Executes one time step within the environment.

Parameters:

action (ActType) – an action provided by the agent.

Returns:

  • obs (object) – This will be an element of the environment’s observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.

  • reward.get_overall_reward() (float) – The amount of reward returned as a result of taking the action.

  • terminated (bool) – whether a terminal state (as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results.

  • truncated (bool) – whether a truncation condition outside the scope of the MDP is satisfied. Typically a timelimit, but could also be used to indicate agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached.

  • info (dict) – It contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward. It also can contain information that distinguishes truncation and termination, however this is deprecated in favour of returning two booleans, and will be removed in a future version.

reset(seed=None, options=None)

Resets the environment to an initial state and returns the initial observation, following the current Gymnasium reset API.

Parameters:
  • seed (int, optional) – The seed that is used to initialize the environment’s PRNG. If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again. The default is None.

  • options (dict, optional) – Additional information to specify how the environment is reset (optional, depending on the specific environment). The default is None.

Returns:

  • obs (object) – This will be an element of the environment’s observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.

  • info (dict) – It contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward. It also can contain information that distinguishes truncation and termination, however this is deprecated in favour of returning two booleans, and will be removed in a future version.
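
The wrapper follows the current Gymnasium API with a five-element step() return and a two-element reset() return. A usage sketch, again assuming the GridWorld pool environment:

    from mlpro.rl.pool.envs.gridworld import GridWorld     # import path assumed
    from mlpro.wrappers.gymnasium import WrEnvMLPro2GYM

    env = WrEnvMLPro2GYM(p_mlpro_env=GridWorld(p_logging=False), p_logging=False)

    obs, info = env.reset(seed=42)                          # new-style reset: (obs, info)
    for _ in range(100):
        action = env.action_space.sample()                  # random Gym-style action
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            break
    env.close()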

render(mode='human')

Computes the render frames as specified by the render_mode attribute set during initialization of the environment.

Parameters:

mode (str, optional) – Render mode to be handled by the environment, for instance ‘human’, ‘rgb_array’, or ‘single_rgb_array’. The default is ‘human’.

Returns:

True if rendering was successful, False otherwise.

Return type:

bool

close()

Override close in your subclass to perform any necessary cleanup. Environments will automatically close() themselves when garbage collected or when the program exits.

Cross References