RL-ENV - Environments

[Image: MLPro-RL-Env class diagram]

Ver. 1.7.4 (2023-05-30)

This module provides model classes for environments.

class mlpro.rl.models_env.Reward(p_type=0, p_value=0)

Bases: TStamp

Objects of this class represent rewards of environments. The internal structure depends on the reward type. Three types are supported as listed below.

Parameters:
  • p_type – Reward type (default: C_TYPE_OVERALL)

  • p_value – Overall reward value (reward type C_TYPE_OVERALL only)

C_TYPE_OVERALL = 0
C_TYPE_EVERY_AGENT = 1
C_TYPE_EVERY_ACTION = 2
C_VALID_TYPES = [0, 1, 2]
get_type()
is_rewarded(p_agent_id) → bool
set_overall_reward(p_reward) → bool
get_overall_reward()
add_agent_reward(p_agent_id, p_reward) → bool
get_agent_reward(p_agent_id)
add_action_reward(p_agent_id, p_action_id, p_reward) → bool
get_action_reward(p_agent_id, p_action_id)
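
A minimal usage sketch of the three reward types (the agent and action ids are hypothetical; only the constructor and methods documented above are used):

  from mlpro.rl.models_env import Reward

  # Overall reward: a single scalar value for the whole scenario
  reward = Reward(p_type=Reward.C_TYPE_OVERALL, p_value=1.0)
  overall = reward.get_overall_reward()

  # Agent-wise reward: one value per agent id
  reward = Reward(p_type=Reward.C_TYPE_EVERY_AGENT)
  reward.add_agent_reward(p_agent_id=0, p_reward=0.5)
  rewarded = reward.is_rewarded(p_agent_id=0)
  r_agent  = reward.get_agent_reward(p_agent_id=0)

  # Action-wise reward: one value per (agent id, action id) pair
  reward = Reward(p_type=Reward.C_TYPE_EVERY_ACTION)
  reward.add_action_reward(p_agent_id=0, p_action_id=0, p_reward=0.1)
  r_action = reward.get_action_reward(p_agent_id=0, p_action_id=0)
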
class mlpro.rl.models_env.FctReward(p_logging=True)

Bases: Log

Template class for reward functions.

Parameters:

p_logging – Log level (see class Log for more details).

C_TYPE = 'Fct Reward'
compute_reward(p_state_old: State = None, p_state_new: State = None) → Reward

Computes a reward based on a predecessor and successor state. Custom method _compute_reward() is called.

Parameters:
  • p_state_old (State) – Predecessor state.

  • p_state_new (State) – Successor state.

Returns:

r – Reward

Return type:

Reward

_compute_reward(p_state_old: State = None, p_state_new: State = None) → Reward

Custom reward method. See method compute_reward() for further details.
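
A sketch of a custom reward function implementing _compute_reward(). The distance-based logic, the state access via get_values(), and the import location of State are assumptions for illustration, not part of this module's API:

  from mlpro.bf.systems import State            # assumed location of State
  from mlpro.rl.models_env import FctReward, Reward


  class MyRewardFct(FctReward):
      # Illustrative reward function: rewards moving the state towards the origin

      def _compute_reward(self, p_state_old: State = None, p_state_new: State = None) -> Reward:
          reward = Reward(p_type=Reward.C_TYPE_OVERALL)

          # Assumption: states expose their numeric values via get_values()
          dist_old = sum(abs(v) for v in p_state_old.get_values())
          dist_new = sum(abs(v) for v in p_state_new.get_values())

          # Positive overall reward for reducing the distance to the origin
          reward.set_overall_reward(dist_old - dist_new)
          return reward

Such a function can be used standalone via compute_reward() or handed to an environment through the p_fct_reward parameter of EnvBase/Environment (see below).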

class mlpro.rl.models_env.EnvBase(p_mode=0, p_latency: timedelta = None, p_fct_strans: FctSTrans = None, p_fct_reward: FctReward = None, p_fct_success: FctSuccess = None, p_fct_broken: FctBroken = None, p_mujoco_file=None, p_frame_skip: int = 1, p_state_mapping=None, p_action_mapping=None, p_camera_conf: tuple = (None, None, None), p_visualize: bool = False, p_logging=True)

Bases: FctReward, System

Base class for all environment classes. It defines the interface and elementary properties for an environment in the context of reinforcement learning.

Parameters:
  • p_latency (timedelta) – Optional latency of environment. If not provided, the internal value of constant C_LATENCY is used by default.

  • p_fct_strans (FctSTrans) – Optional external function for state transition.

  • p_fct_reward (FctReward) – Optional external function for reward computation.

  • p_fct_success (FctSuccess) – Optional external function for state evaluation ‘success’.

  • p_fct_broken (FctBroken) – Optional external function for state evaluation ‘broken’.

  • p_mujoco_file – Path to XML file for MuJoCo model.

  • p_frame_skip (int) – Number of frames to be skipped per step. Default = 1.

  • p_state_mapping – State mapping if the MLPro state and MuJoCo state have different naming.

  • p_action_mapping – Action mapping if the MLPro action and MuJoCo action have different naming.

  • p_use_radian (bool) – Use radians if the action and the state are based on radian units. Default = True.

  • p_camera_conf (tuple) – Default camera configuration on MuJoCo Simulation (xyz position, elevation, distance).

  • p_visualize (bool) – Boolean switch for env/agent visualisation. Default = False.

  • p_logging – Log level (see class Log for more details).

_latency

Latency of the environment.

Type:

timedelta

_state

Current state of environment.

Type:

State

_prev_state

Previous state of environment.

Type:

State

_last_action

Last action.

Type:

Action

_last_reward

Last reward.

Type:

Reward

_afct_strans

Internal adaptive state transition function.

Type:

AFctSTrans

_afct_reward

Internal adaptive reward function.

Type:

AFctReward

_afct_success

Internal adaptive function for state evaluation ‘success’.

Type:

AFctSuccess

_afct_broken

Internal adaptive function for state evaluation ‘broken’.

Type:

AFctBroken

C_TYPE = 'Environment Base'
C_REWARD_TYPE = 0
switch_logging(p_logging)

Sets new log level.

Parameters:

p_logging – Log level (constant C_LOG_LEVELS contains valid values)

get_reward_type()
get_last_reward() → Reward
get_functions()
get_cycle_limit() → int

Returns limit of cycles per training episode. To be implemented in child classes.

_process_action(p_action: Action) → bool

Custom method for state transition. To be implemented in a child class. See method process_action() for further details.

compute_reward(p_state_old: State = None, p_state_new: State = None) → Reward

Computes a reward for the state transition, given by two successive states. The reward computation itself is carried out either by a custom implementation in method _compute_reward() or by an embedded adaptive function.

Parameters:
  • p_state_old (State) – Optional state before transition. If None, the internal previous state of the environment is used.

  • p_state_new (State) – Optional state after transition. If None, the internal current state of the environment is used.

Returns:

Reward object.

Return type:

Reward
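
A short sketch of this delegation, assuming a concrete environment subclass MyEnvironment (like the one sketched at the end of this section) whose reward logic is supplied externally via p_fct_reward:

  # Hypothetical setup: reward logic injected as an external FctReward
  my_env = MyEnvironment(p_fct_reward=MyRewardFct(), p_visualize=False)

  # Explicit states (state_old, state_new: hypothetical State objects) ...
  reward = my_env.compute_reward(p_state_old=state_old, p_state_new=state_new)

  # ... or no arguments: the internal previous/current state of the environment is used
  reward = my_env.compute_reward()
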

class mlpro.rl.models_env.Environment(p_mode=0, p_latency: timedelta = None, p_fct_strans: FctSTrans = None, p_fct_reward: FctReward = None, p_fct_success: FctSuccess = None, p_fct_broken: FctBroken = None, p_mujoco_file=None, p_frame_skip: int = 1, p_state_mapping=None, p_action_mapping=None, p_camera_conf: tuple = (None, None, None), p_visualize: bool = False, p_logging=True)

Bases: EnvBase

This class represents the central environment model to be reused/inherited in custom RL projects.

Parameters:
  • p_mode – Mode of environment. Possible values are Mode.C_MODE_SIM (default) or Mode.C_MODE_REAL.

  • p_latency (timedelta) – Optional latency of environment. If not provided, the internal value of constant C_LATENCY is used by default.

  • p_fct_strans (FctSTrans) – Optional external function for state transition.

  • p_fct_reward (FctReward) – Optional external function for reward computation.

  • p_fct_success (FctSuccess) – Optional external function for state evaluation ‘success’.

  • p_fct_broken (FctBroken) – Optional external function for state evaluation ‘broken’.

  • p_mujoco_file – Path to XML file for MuJoCo model.

  • p_frame_skip (int) – Number of frames to be skipped per step. Default = 1.

  • p_state_mapping – State mapping if the MLPro state and MuJoCo state have different naming.

  • p_action_mapping – Action mapping if the MLPro action and MuJoCo action have different naming.

  • p_use_radian (bool) – Use radians if the action and the state are based on radian units. Default = True.

  • p_camera_conf (tuple) – Default camera configuration on MuJoCo Simulation (xyz position, elevation, distance).

  • p_visualize (bool) – Boolean switch for env/agent visualisation. Default = False.

  • p_logging – Log level (see class Log for more details)

C_TYPE = 'Environment'
C_CYCLE_LIMIT = 0
static setup_spaces()

Static template method to set up and return state and action space of environment.

Returns:

  • state_space (MSpace) – State space object

  • action_space (MSpace) – Action space object

get_cycle_limit() → int

Returns limit of cycles per training episode.
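
A minimal custom environment sketch based on the template methods documented above. The one-dimensional spaces, the ESpace/Dimension/State/Action usage and their import locations, the state handling via get_state_space()/_set_state(), and the simple dynamics are illustrative assumptions; only setup_spaces(), _process_action(), _compute_reward() and C_CYCLE_LIMIT come from this section:

  from mlpro.bf.math import ESpace, Dimension       # assumed location of space classes
  from mlpro.bf.systems import State, Action        # assumed location of State/Action
  from mlpro.rl.models_env import Environment, Reward


  class MyEnvironment(Environment):
      # Illustrative 1D environment: the agent pushes a point towards the origin

      C_NAME        = 'My Environment'
      C_CYCLE_LIMIT = 100                            # cycles per training episode

      @staticmethod
      def setup_spaces():
          # One-dimensional state and action spaces (illustrative)
          state_space  = ESpace()
          state_space.add_dim(Dimension(p_name_short='x', p_boundaries=[-10, 10]))

          action_space = ESpace()
          action_space.add_dim(Dimension(p_name_short='dx', p_boundaries=[-1, 1]))

          return state_space, action_space

      def _process_action(self, p_action: Action) -> bool:
          # Illustrative dynamics: shift the single state value by the action value
          x  = self._state.get_values()[0]
          dx = p_action.get_sorted_values()[0]

          new_state = State(self.get_state_space())
          new_state.set_values([x + dx])
          self._set_state(new_state)
          return True

      def _compute_reward(self, p_state_old: State = None, p_state_new: State = None) -> Reward:
          # Reward the reduction of the distance to the origin (illustrative)
          reward = Reward(p_type=Reward.C_TYPE_OVERALL)
          reward.set_overall_reward(abs(p_state_old.get_values()[0]) - abs(p_state_new.get_values()[0]))
          return reward

With C_CYCLE_LIMIT overridden as above, get_cycle_limit() is expected to return this limit (100 cycles per training episode).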