RL-ENV - Environments
Ver. 1.7.4 (2023-05-30)
This module provides model classes for environments.
- class mlpro.rl.models_env.Reward(p_type=0, p_value=0)
Bases:
TStamp
Objects of this class represent rewards of environments. The internal structure depends on the reward type. Three types are supported as listed below.
- Parameters:
p_type – Reward type (default: C_TYPE_OVERALL)
p_value – Overall reward value (reward type C_TYPE_OVERALL only)
- C_TYPE_OVERALL = 0
- C_TYPE_EVERY_AGENT = 1
- C_TYPE_EVERY_ACTION = 2
- C_VALID_TYPES = [0, 1, 2]
- get_type()
- is_rewarded(p_agent_id) → bool
- set_overall_reward(p_reward) → bool
- get_overall_reward()
- add_agent_reward(p_agent_id, p_reward) → bool
- get_agent_reward(p_agent_id)
- add_action_reward(p_agent_id, p_action_id, p_reward) → bool
- get_action_reward(p_agent_id, p_action_id)
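Example: a minimal usage sketch based on the methods listed above (the agent and action ids are placeholders):

    from mlpro.rl.models_env import Reward

    # Overall reward: a single scalar valid for all agents
    reward = Reward(p_type=Reward.C_TYPE_OVERALL, p_value=1.0)
    print(reward.get_overall_reward())               # 1.0

    # Per-agent rewards, e.g. for multi-agent setups
    reward_ma = Reward(p_type=Reward.C_TYPE_EVERY_AGENT)
    reward_ma.add_agent_reward(p_agent_id=0, p_reward=0.5)
    reward_ma.add_agent_reward(p_agent_id=1, p_reward=-0.2)
    print(reward_ma.get_agent_reward(p_agent_id=0))  # 0.5

    # Per-action rewards of individual agents
    reward_aa = Reward(p_type=Reward.C_TYPE_EVERY_ACTION)
    reward_aa.add_action_reward(p_agent_id=0, p_action_id=0, p_reward=0.1)
    print(reward_aa.get_action_reward(p_agent_id=0, p_action_id=0))  # 0.1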
- class mlpro.rl.models_env.FctReward(p_logging=True)
Bases:
Log
Template class for reward functions.
- Parameters:
p_logging – Log level (see class Log for more details).
- C_TYPE = 'Fct Reward'
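Example: a hedged sketch of a custom reward function, assuming MLPro's usual template pattern in which a public compute_reward() delegates to a protected _compute_reward() hook (verify the hook name and signature in your version):

    from mlpro.rl.models_env import FctReward, Reward

    class MyRewardFct(FctReward):
        """Illustrative reward function: the closer the new state is to the
        origin, the higher the (negative distance) reward."""

        # Assumption: FctReward routes compute_reward(p_state_old, p_state_new)
        # to this protected hook, as in other MLPro function templates.
        def _compute_reward(self, p_state_old, p_state_new) -> Reward:
            reward = Reward(p_type=Reward.C_TYPE_OVERALL)
            dist   = sum(v ** 2 for v in p_state_new.get_values()) ** 0.5
            reward.set_overall_reward(-dist)
            return reward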
- class mlpro.rl.models_env.EnvBase(p_mode=0, p_latency: timedelta = None, p_fct_strans: FctSTrans = None, p_fct_reward: FctReward = None, p_fct_success: FctSuccess = None, p_fct_broken: FctBroken = None, p_mujoco_file=None, p_frame_skip: int = 1, p_state_mapping=None, p_action_mapping=None, p_camera_conf: tuple = (None, None, None), p_visualize: bool = False, p_logging=True)
Base class for all environment classes. It defines the interface and elementary properties for an environment in the context of reinforcement learning.
- Parameters:
p_mode – Mode of environment. Possible values are Mode.C_MODE_SIM (default) or Mode.C_MODE_REAL.
p_latency (timedelta) – Optional latency of the environment. If not provided, the internal value of constant C_LATENCY is used by default.
p_fct_strans (FctSTrans) – Optional external function for state transition.
p_fct_reward (FctReward) – Optional external function for reward computation.
p_fct_success (FctSuccess) – Optional external function for state evaluation ‘success’.
p_fct_broken (FctBroken) – Optional external function for state evaluation ‘broken’.
p_mujoco_file – Path to the XML file of the MuJoCo model.
p_frame_skip (int) – Number of frames to be skipped per step. Default = 1.
p_state_mapping – State mapping to be used if the MLPro state and the MuJoCo state have different naming.
p_action_mapping – Action mapping to be used if the MLPro action and the MuJoCo action have different naming.
p_use_radian (bool) – Use radians if the action and the state are based on radian units. Default = True.
p_camera_conf (tuple) – Default camera configuration for the MuJoCo simulation (xyz position, elevation, distance).
p_visualize (bool) – Boolean switch for env/agent visualisation. Default = False.
p_logging – Log level (see class Log for more details).
- _latency
Latency of the environment.
- Type:
timedelta
- _afct_strans
Internal adaptive state transition function.
- Type:
AFctSTrans
- _afct_reward
Internal adaptive reward function.
- Type:
AFctReward
- _afct_success
Internal adaptive function for state evaluation ‘success’.
- Type:
AFctSuccess
- _afct_broken
Internal adaptive function for state evaluation ‘broken’.
- Type:
AFctBroken
- C_TYPE = 'Environment Base'
- C_REWARD_TYPE = 0
- switch_logging(p_logging)
Sets a new log level.
- Parameters:
p_logging – Log level (constant C_LOG_LEVELS contains valid values)
- get_reward_type()
- get_functions()
- get_cycle_limit() → int
Returns the limit of cycles per training episode. To be implemented in child classes.
- _process_action(p_action: Action) → bool
Custom method for state transition. To be implemented in a child class. See method process_action() for further details.
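Example: the external function parameters make an environment's behavior pluggable without subclassing each aspect. A hedged wiring sketch, reusing the illustrative MyRewardFct from above and the illustrative environment MyEnv sketched at the end of this page:

    # MyEnv and MyRewardFct are illustrative classes, not part of MLPro.
    # The external reward function takes over the reward computation
    # (see parameter p_fct_reward above).
    env = MyEnv(p_fct_reward=MyRewardFct(), p_visualize=False, p_logging=True)

    print(env.get_reward_type())   # C_REWARD_TYPE of the environment class
    print(env.get_cycle_limit())   # limit of cycles per training episode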
- class mlpro.rl.models_env.Environment(p_mode=0, p_latency: timedelta = None, p_fct_strans: FctSTrans = None, p_fct_reward: FctReward = None, p_fct_success: FctSuccess = None, p_fct_broken: FctBroken = None, p_mujoco_file=None, p_frame_skip: int = 1, p_state_mapping=None, p_action_mapping=None, p_camera_conf: tuple = (None, None, None), p_visualize: bool = False, p_logging=True)
Bases:
EnvBase
This class represents the central environment model to be reused or inherited in your own RL projects.
- Parameters:
p_mode – Mode of environment. Possible values are Mode.C_MODE_SIM (default) or Mode.C_MODE_REAL.
p_latency (timedelta) – Optional latency of the environment. If not provided, the internal value of constant C_LATENCY is used by default.
p_fct_strans (FctSTrans) – Optional external function for state transition.
p_fct_reward (FctReward) – Optional external function for reward computation.
p_fct_success (FctSuccess) – Optional external function for state evaluation ‘success’.
p_fct_broken (FctBroken) – Optional external function for state evaluation ‘broken’.
p_mujoco_file – Path to the XML file of the MuJoCo model.
p_frame_skip (int) – Number of frames to be skipped per step. Default = 1.
p_state_mapping – State mapping to be used if the MLPro state and the MuJoCo state have different naming.
p_action_mapping – Action mapping to be used if the MLPro action and the MuJoCo action have different naming.
p_use_radian (bool) – Use radians if the action and the state are based on radian units. Default = True.
p_camera_conf (tuple) – Default camera configuration for the MuJoCo simulation (xyz position, elevation, distance).
p_visualize (bool) – Boolean switch for env/agent visualisation. Default = False.
p_logging – Log level (see class Log for more details).
- C_TYPE = 'Environment'
- C_CYCLE_LIMIT = 0
- static setup_spaces()
Static template method to set up and return the state and action space of the environment.
- Returns:
state_space (MSpace) – State space object
action_space (MSpace) – Action space object
- get_cycle_limit() → int
Returns the limit of cycles per training episode.
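Example: a hedged end-to-end sketch of a custom environment derived from this class. The template hooks _reset(), _simulate_reaction(), _compute_reward(), _compute_success() and _compute_broken() as well as the import paths are assumptions based on MLPro's usual conventions; all names are illustrative:

    import random

    from mlpro.bf.math import ESpace, Dimension    # assumed import path
    from mlpro.rl.models_env import Environment, Reward
    from mlpro.rl.models import State, Action      # assumed aggregate module

    class MyEnv(Environment):
        """Illustrative 1D environment: the agent pushes a point toward the origin."""

        C_NAME        = 'MyEnv'
        C_CYCLE_LIMIT = 100   # limit of cycles per training episode

        @staticmethod
        def setup_spaces():
            # One-dimensional state and action spaces (names are illustrative)
            state_space  = ESpace()
            state_space.add_dim(Dimension(p_name_short='x', p_boundaries=[-5, 5]))
            action_space = ESpace()
            action_space.add_dim(Dimension(p_name_short='dx', p_boundaries=[-1, 1]))
            return state_space, action_space

        def _reset(self, p_seed=None):
            # Assumption: _set_state() stores the current state on EnvBase
            random.seed(p_seed)
            state = State(self.get_state_space())
            state.set_values([random.uniform(-5, 5)])
            self._set_state(state)

        def _simulate_reaction(self, p_state: State, p_action: Action) -> State:
            x, dx     = p_state.get_values()[0], p_action.get_sorted_values()[0]
            new_state = State(self.get_state_space())
            new_state.set_values([x + dx])
            return new_state

        def _compute_reward(self, p_state_old: State, p_state_new: State) -> Reward:
            reward = Reward(p_type=Reward.C_TYPE_OVERALL)
            reward.set_overall_reward(-abs(p_state_new.get_values()[0]))
            return reward

        def _compute_success(self, p_state: State) -> bool:
            return abs(p_state.get_values()[0]) < 0.1

        def _compute_broken(self, p_state: State) -> bool:
            return False

The protected hooks mirror the external function templates FctSTrans, FctReward, FctSuccess and FctBroken; any aspect not implemented in the subclass can instead be supplied externally via the corresponding constructor parameter.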