RL-ENV - Environments

[Image: MLPro-RL-Env class diagram]

Ver. 1.7.4 (2023-05-30)

This module provides model classes for environments.

class mlpro.rl.models_env.Reward(p_type=0, p_value=0)

Bases: TStamp

Objects of this class represent rewards of environments. The internal structure depends on the reward type. Three types are supported as listed below.

Parameters:
  • p_type – Reward type (default: C_TYPE_OVERALL)

  • p_value – Overall reward value (reward type C_TYPE_OVERALL only)

C_TYPE_OVERALL = 0
C_TYPE_EVERY_AGENT = 1
C_TYPE_EVERY_ACTION = 2
C_VALID_TYPES = [0, 1, 2]
get_type()
is_rewarded(p_agent_id) → bool
set_overall_reward(p_reward) → bool
get_overall_reward()
add_agent_reward(p_agent_id, p_reward) → bool
get_agent_reward(p_agent_id)
add_action_reward(p_agent_id, p_action_id, p_reward) → bool
get_action_reward(p_agent_id, p_action_id)
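
A minimal usage sketch of the three reward types (the agent and action ids are hypothetical; only the constructor and methods documented above are used):

  from mlpro.rl.models_env import Reward

  # Overall reward: a single scalar value for the whole scenario
  reward = Reward(p_type=Reward.C_TYPE_OVERALL, p_value=1.0)
  overall = reward.get_overall_reward()

  # Agent-wise reward: one value per agent id
  reward = Reward(p_type=Reward.C_TYPE_EVERY_AGENT)
  reward.add_agent_reward(p_agent_id=0, p_reward=0.5)
  rewarded = reward.is_rewarded(p_agent_id=0)
  r_agent  = reward.get_agent_reward(p_agent_id=0)

  # Action-wise reward: one value per (agent id, action id) pair
  reward = Reward(p_type=Reward.C_TYPE_EVERY_ACTION)
  reward.add_action_reward(p_agent_id=0, p_action_id=0, p_reward=0.1)
  r_action = reward.get_action_reward(p_agent_id=0, p_action_id=0)
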
class mlpro.rl.models_env.FctReward(p_logging=True)

Bases: Log

Template class for reward functions.

Parameters:

p_logging – Log level (see class Log for more details).

C_TYPE = 'Fct Reward'
compute_reward(p_state_old: State = None, p_state_new: State = None) → Reward

Computes a reward based on a predecessor and successor state. Custom method _compute_reward() is called.

Parameters:
  • p_state_old (State) – Predecessor state.

  • p_state_new (State) – Successor state.

Returns:

r – Reward

Return type:

Reward

_compute_reward(p_state_old: State = None, p_state_new: State = None) → Reward

Custom reward method. See method compute_reward() for further details.
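
A sketch of a custom reward function implementing _compute_reward(). The distance-based logic, the state access via get_values(), and the import location of State are assumptions for illustration, not part of this module's API:

  from mlpro.bf.systems import State            # assumed location of State
  from mlpro.rl.models_env import FctReward, Reward


  class MyRewardFct(FctReward):
      # Illustrative reward function: rewards moving the state towards the origin

      def _compute_reward(self, p_state_old: State = None, p_state_new: State = None) -> Reward:
          reward = Reward(p_type=Reward.C_TYPE_OVERALL)

          # Assumption: states expose their numeric values via get_values()
          dist_old = sum(abs(v) for v in p_state_old.get_values())
          dist_new = sum(abs(v) for v in p_state_new.get_values())

          # Positive overall reward for reducing the distance to the origin
          reward.set_overall_reward(dist_old - dist_new)
          return reward

Such a function can be used standalone via compute_reward() or handed to an environment through the p_fct_reward parameter of EnvBase/Environment (see below).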

class mlpro.rl.models_env.EnvBase(p_mode=0, p_latency: timedelta = None, p_fct_strans: FctSTrans = None, p_fct_reward: FctReward = None, p_fct_success: FctSuccess = None, p_fct_broken: FctBroken = None, p_mujoco_file=None, p_frame_skip: int = 1, p_state_mapping=None, p_action_mapping=None, p_camera_conf: tuple = (None, None, None), p_visualize: bool = False, p_logging=True)

Bases: FctReward, System

Base class for all environment classes. It defines the interface and elementary properties for an environment in the context of reinforcement learning.

Parameters:
  • p_latency (timedelta) – Optional latency of environment. If not provided, the internal value of constant C_LATENCY is used by default.

  • p_fct_strans (FctSTrans) – Optional external function for state transition.

  • p_fct_reward (FctReward) – Optional external function for reward computation.

  • p_fct_success (FctSuccess) – Optional external function for state evaluation ‘success’.

  • p_fct_broken (FctBroken) – Optional external function for state evaluation ‘broken’.

  • p_mujoco_file – Path to XML file for MuJoCo model.

  • p_frame_skip (int) – Number of frames to be skipped per step. Default = 1.

  • p_state_mapping – State mapping if the MLPro state and MuJoCo state have different naming.

  • p_action_mapping – Action mapping if the MLPro action and MuJoCo action have different naming.

  • p_use_radian (bool) – Use radians if the action and the state are based on radian units. Default = True.

  • p_camera_conf (tuple) – Default camera configuration on MuJoCo Simulation (xyz position, elevation, distance).

  • p_visualize (bool) – Boolean switch for env/agent visualisation. Default = False.

  • p_logging – Log level (see class Log for more details).

_latency

Latency of the environment.

Type:

timedelta

_state

Current state of environment.

Type:

State

_prev_state

Previous state of environment.

Type:

State

_last_action

Last action.

Type:

Action

_last_reward

Last reward.

Type:

Reward

_afct_strans

Internal adaptive state transition function.

Type:

AFctSTrans

_afct_reward

Internal adaptive reward function.

Type:

AFctReward

_afct_success

Internal adaptive function for state evaluation ‘success’.

Type:

AFctSuccess

_afct_broken

Internal adaptive function for state evaluation ‘broken’.

Type:

AFctBroken

C_TYPE = 'Environment Base'
C_REWARD_TYPE = 0
switch_logging(p_logging)

Sets new log level.

Parameters:

p_logging – Log level (constant C_LOG_LEVELS contains valid values)

get_reward_type()
get_last_reward() → Reward
get_functions()
get_cycle_limit() → int

Returns limit of cycles per training episode. To be implemented in child classes.

_process_action(p_action: Action) → bool

Custom method for state transition. To be implemented in a child class. See method process_action() for further details.

compute_reward(p_state_old: State = None, p_state_new: State = None) → Reward

Computes a reward for the state transition, given by two successive states. The reward computation itself is carried out either by a custom implementation in method _compute_reward() or by an embedded adaptive function.

Parameters:
  • p_state_old (State) – Optional state before transition. If None, the internal previous state of the environment is used.

  • p_state_new (State) – Optional state after transition. If None, the internal current state of the environment is used.

Returns:

Reward object.

Return type:

Reward
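
A short sketch of this delegation, assuming a concrete environment subclass MyEnvironment (like the one sketched at the end of this section) whose reward logic is supplied externally via p_fct_reward:

  # Hypothetical setup: reward logic injected as an external FctReward
  my_env = MyEnvironment(p_fct_reward=MyRewardFct(), p_visualize=False)

  # Explicit states (state_old, state_new: hypothetical State objects) ...
  reward = my_env.compute_reward(p_state_old=state_old, p_state_new=state_new)

  # ... or no arguments: the internal previous/current state of the environment is used
  reward = my_env.compute_reward()
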

class mlpro.rl.models_env.Environment(p_mode=0, p_latency: timedelta = None, p_fct_strans: FctSTrans = None, p_fct_reward: FctReward = None, p_fct_success: FctSuccess = None, p_fct_broken: FctBroken = None, p_mujoco_file=None, p_frame_skip: int = 1, p_state_mapping=None, p_action_mapping=None, p_camera_conf: tuple = (None, None, None), p_visualize: bool = False, p_logging=True)

Bases: EnvBase

This class represents the central environment model to be reused/inherited in custom RL projects.

Parameters:
  • p_mode – Mode of environment. Possible values are Mode.C_MODE_SIM (default) or Mode.C_MODE_REAL.

  • p_latency (timedelta) – Optional latency of environment. If not provided, the internal value of constant C_LATENCY is used by default.

  • p_fct_strans (FctSTrans) – Optional external function for state transition.

  • p_fct_reward (FctReward) – Optional external function for reward computation.

  • p_fct_success (FctSuccess) – Optional external function for state evaluation ‘success’.

  • p_fct_broken (FctBroken) – Optional external function for state evaluation ‘broken’.

  • p_mujoco_file – Path to XML file for MuJoCo model.

  • p_frame_skip (int) – Number of frames to be skipped per step. Default = 1.

  • p_state_mapping – State mapping if the MLPro state and MuJoCo state have different naming.

  • p_action_mapping – Action mapping if the MLPro action and MuJoCo action have different naming.

  • p_use_radian (bool) – Use radians if the action and the state are based on radian units. Default = True.

  • p_camera_conf (tuple) – Default camera configuration on MuJoCo Simulation (xyz position, elevation, distance).

  • p_visualize (bool) – Boolean switch for env/agent visualisation. Default = False.

  • p_logging – Log level (see class Log for more details)

C_TYPE = 'Environment'
C_CYCLE_LIMIT = 0
static setup_spaces()

Static template method to set up and return state and action space of environment.

Returns:

  • state_space (MSpace) – State space object

  • action_space (MSpace) – Action space object

get_cycle_limit() → int

Returns limit of cycles per training episode.
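
A minimal custom environment sketch based on the template methods documented above. The one-dimensional spaces, the ESpace/Dimension/State/Action usage and their import locations, the state handling via get_state_space()/_set_state(), and the simple dynamics are illustrative assumptions; only setup_spaces(), _process_action(), _compute_reward() and C_CYCLE_LIMIT come from this section:

  from mlpro.bf.math import ESpace, Dimension       # assumed location of space classes
  from mlpro.bf.systems import State, Action        # assumed location of State/Action
  from mlpro.rl.models_env import Environment, Reward


  class MyEnvironment(Environment):
      # Illustrative 1D environment: the agent pushes a point towards the origin

      C_NAME        = 'My Environment'
      C_CYCLE_LIMIT = 100                            # cycles per training episode

      @staticmethod
      def setup_spaces():
          # One-dimensional state and action spaces (illustrative)
          state_space  = ESpace()
          state_space.add_dim(Dimension(p_name_short='x', p_boundaries=[-10, 10]))

          action_space = ESpace()
          action_space.add_dim(Dimension(p_name_short='dx', p_boundaries=[-1, 1]))

          return state_space, action_space

      def _process_action(self, p_action: Action) -> bool:
          # Illustrative dynamics: shift the single state value by the action value
          x  = self._state.get_values()[0]
          dx = p_action.get_sorted_values()[0]

          new_state = State(self.get_state_space())
          new_state.set_values([x + dx])
          self._set_state(new_state)
          return True

      def _compute_reward(self, p_state_old: State = None, p_state_new: State = None) -> Reward:
          # Reward the reduction of the distance to the origin (illustrative)
          reward = Reward(p_type=Reward.C_TYPE_OVERALL)
          reward.set_overall_reward(abs(p_state_old.get_values()[0]) - abs(p_state_new.get_values()[0]))
          return reward

With C_CYCLE_LIMIT overridden as above, get_cycle_limit() is expected to return this limit (100 cycles per training episode).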