RL-ENV-ADA - Environment Models

Figure: MLPro-RL-Env-Ada class diagram

Ver. 1.1.0 (2023-04-03)

This module provides model classes for adaptive environment models.

class mlpro.rl.models_env_ada.AFctReward(p_afct_cls, p_state_space: MSpace, p_action_space: MSpace, p_input_space_cls=ESpace, p_output_space_cls=ESpace, p_output_elem_cls=Element, p_threshold=0, p_buffer_size=0, p_ada: bool = True, p_visualize: bool = False, p_logging=True, **p_kwargs)

Bases: AFctBase, FctReward

Online adaptive version of a reward function. See parent classes for further details.

C_TYPE = 'AFct Reward'
_setup_spaces(p_state_space: MSpace, p_action_space: MSpace, p_input_space: MSpace, p_output_space: MSpace)

Custom method to set up the input and output space of the embedded adaptive function. Use the method add_dimension() of the empty spaces p_input_space and p_output_space to enrich them with suitable dimensions.

Parameters:
  • p_state_space (MSpace) – State space of an environment respectively observation space of an agent.

  • p_action_space (MSpace) – Action space of an environment or agent.

  • p_input_space (MSpace) – Empty input space of embedded adaptive function to be enriched with dimension.

  • p_output_space (MSpace) – Empty output space of embedded adaptive function to be enriched with dimension.
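
A minimal sketch of such an override, assuming the Dimension class from mlpro.bf.math and the space accessors get_dim_ids(), get_dim() and add_dim(); the subclass MyAFctReward and all dimension names are purely illustrative:

from mlpro.bf.math import Dimension, MSpace
from mlpro.rl.models_env_ada import AFctReward

class MyAFctReward(AFctReward):

    def _setup_spaces(self, p_state_space: MSpace, p_action_space: MSpace,
                      p_input_space: MSpace, p_output_space: MSpace):

        # Input of the embedded function: the old and the new state
        for prefix in ('old_', 'new_'):
            for dim_id in p_state_space.get_dim_ids():
                dim = p_state_space.get_dim(dim_id)
                p_input_space.add_dim(Dimension(p_name_short=prefix + dim.get_name_short(),
                                                p_base_set=dim.get_base_set(),
                                                p_boundaries=dim.get_boundaries()))

        # Output of the embedded function: a single scalar reward value
        p_output_space.add_dim(Dimension(p_name_short='reward',
                                         p_base_set=Dimension.C_BASE_SET_R))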

_compute_reward(p_state_old: State = None, p_state_new: State = None) → Reward

Custom reward method. See method compute_reward() for further details.

_adapt(p_state: State, p_state_new: State, p_reward: Reward) → bool

Triggers adaptation of the embedded adaptive function.

Parameters:
  • p_state (State) – Previous state.

  • p_state_new (State) – New state.

  • p_reward (Reward) – Setpoint reward, i.e. the target value for the adaptation.

Returns:

adapted – True, if something was adapted. False otherwise.

Return type:

bool

class mlpro.rl.models_env_ada.SARSElement(p_state: State, p_action: Action, p_reward: Reward, p_state_new: State)

Bases: BufferElement

Element of a SARSBuffer.

class mlpro.rl.models_env_ada.SARSBuffer(p_size=1)

Bases: Buffer

State-Action-Reward-State buffer, implemented as a dictionary.
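
A small usage sketch, assuming the generic buffer API of the bf layer (add_element(), is_full()); the wrapper function store_transition() is hypothetical:

from mlpro.rl.models_env_ada import SARSBuffer, SARSElement

def store_transition(p_buffer: SARSBuffer, p_state, p_action, p_reward, p_state_new) -> bool:
    # Wrap one observed transition and push it into the buffer
    sars = SARSElement(p_state=p_state, p_action=p_action,
                       p_reward=p_reward, p_state_new=p_state_new)
    p_buffer.add_element(sars)

    # Report whether enough data has been collected for a batch adaptation
    return p_buffer.is_full()

buffer = SARSBuffer(p_size=100)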

class mlpro.rl.models_env_ada.EnvModel(p_observation_space: MSpace, p_action_space: MSpace, p_latency: timedelta, p_afct_strans: AFctSTrans, p_afct_reward: AFctReward = None, p_afct_success: AFctSuccess = None, p_afct_broken: AFctBroken = None, p_ada: bool = True, p_init_states: State = None, p_visualize: bool = False, p_logging=True)

Bases: EnvBase, Model

Environment model class as part of a model-based agent.

Parameters:
  • p_observation_space (MSpace) – Observation space of related agent.

  • p_action_space (MSpace) – Action space of related agent.

  • p_latency (timedelta) – Latency of related environment.

  • p_afct_strans (AFctSTrans) – Mandatory external adaptive function for state transition.

  • p_afct_reward (AFctReward) – Optional external adaptive function for reward computation.

  • p_afct_success (AFctSuccess) – Optional external adaptive function for state assessment ‘success’.

  • p_afct_broken (AFctBroken) – Optional external adaptive function for state assessment ‘broken’.

  • p_ada (bool) – Boolean switch for adaptivity.

  • p_init_states (State) – Initial state of the env model.

  • p_visualize (bool) – Boolean switch for env/agent visualisation. Default = False.

  • p_logging – Log level (see class Log for more details).
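
A construction sketch, assuming that AFctSTrans shares the constructor pattern of AFctReward above and is importable from the same module; the builder function and all prepared arguments (p_obs_space, p_act_space, p_init_state, p_afct_cls) are illustrative:

from datetime import timedelta
from mlpro.rl.models_env_ada import AFctSTrans, EnvModel

def build_env_model(p_obs_space, p_act_space, p_init_state, p_afct_cls) -> EnvModel:
    # Mandatory adaptive state transition function, wrapping a learnable
    # function approximation class p_afct_cls (e.g. from the MLPro pool)
    afct_strans = AFctSTrans(p_afct_cls=p_afct_cls,
                             p_state_space=p_obs_space,
                             p_action_space=p_act_space,
                             p_threshold=0.1,
                             p_buffer_size=100)

    # Optional reward/success/broken functions are omitted and default to None
    return EnvModel(p_observation_space=p_obs_space,
                    p_action_space=p_act_space,
                    p_latency=timedelta(seconds=1),
                    p_afct_strans=afct_strans,
                    p_ada=True,
                    p_init_states=p_init_state)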

C_TYPE = 'EnvModel'
C_NAME = 'Default'
static setup_spaces()

Static template method to set up and return the state and action space of the environment.

Returns:

  • state_space (MSpace) – State space object

  • action_space (MSpace) – Action space object

_init_hyperparam(**p_par)

Implementation specific hyperparameters can be added here. Please follow these steps:

  a) Add each hyperparameter as an object of type HyperParam to the internal hyperparameter space object self._hyperparam_space

  b) Create a hyperparameter tuple and bind it to self._hyperparam_tuple

  c) Set a default value for each hyperparameter

Parameters:

p_par (Dict) – Further model-specific hyperparameters that are passed through the constructor.
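
A sketch of these steps inside a custom subclass, assuming HyperParam and HyperParamTuple from mlpro.bf.ml with their Dimension-like constructor and the Element accessor set_value(); the hyperparameter names and default values are illustrative:

from mlpro.bf.ml import HyperParam, HyperParamTuple
from mlpro.rl.models_env_ada import EnvModel

class MyEnvModel(EnvModel):

    def _init_hyperparam(self, **p_par):
        # a) Add each hyperparameter to the internal hyperparameter space
        self._hyperparam_space.add_dim(HyperParam(p_name_short='buffer_size', p_base_set='Z'))
        self._hyperparam_space.add_dim(HyperParam(p_name_short='threshold', p_base_set='R'))

        # b) Create a hyperparameter tuple and bind it to self._hyperparam_tuple
        self._hyperparam_tuple = HyperParamTuple(self._hyperparam_space)

        # c) Set a default value for each hyperparameter
        hp_ids = self._hyperparam_space.get_dim_ids()
        self._hyperparam_tuple.set_value(hp_ids[0], 100)
        self._hyperparam_tuple.set_value(hp_ids[1], 0.1)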

_reset(p_seed=None)

Custom method to reset the system to an initial/defined state. Use method _set_status() to set the state.

Parameters:

p_seed (int) – Seed parameter for an internal random generator

get_cycle_limit() → int

Returns limit of cycles per training episode.

_process_action(p_action: Action) → bool

Custom method for state transition. To be implemented in a child class. See method process_action() for further details.
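
Once adapted, the model can stand in for the real environment during planning. A sketch using the public EnvBase wrappers process_action() and get_state(); the helper simulate_step() is hypothetical:

from mlpro.rl.models_env_ada import EnvModel

def simulate_step(p_env_model: EnvModel, p_action):
    # process_action() is the public wrapper around _process_action()
    p_env_model.process_action(p_action)
    return p_env_model.get_state()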

switch_adaptivity(p_ada: bool)

Switches adaptation functionality on/off.

Parameters:

p_ada (bool) – Boolean switch for adaptivity

adapt(**p_kwargs) → bool

Reactivates the adaptation mechanism. See method Model.adapt() for further details.

_adapt(p_sars_elem: SARSElement) → bool

Adapts the environment model based on State-Action-Reward-State (SARS) data.

Parameters:

p_sars_elem (SARSElement) – Object of type SARSElement.
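
A sketch of feeding real experience into the model, assuming that adapt() pipes its keyword arguments through to this method; the helper and the accuracy threshold are illustrative:

from mlpro.rl.models_env_ada import EnvModel, SARSElement

def adapt_on_transition(p_env_model: EnvModel, p_state, p_action, p_reward, p_state_new) -> bool:
    # Wrap the observed transition and trigger one adaptation step
    sars = SARSElement(p_state=p_state, p_action=p_action,
                       p_reward=p_reward, p_state_new=p_state_new)
    adapted = p_env_model.adapt(p_sars_elem=sars)

    # The model is considered usable once it is sufficiently accurate
    return adapted and p_env_model.get_accuracy() >= 0.9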

get_adapted() → bool

Returns True, if the model was adapted at least once. False otherwise.

get_accuracy()

Returns the accuracy of the environment model as the average accuracy of the embedded adaptive functions.

clear_buffer()

Clears internal buffer (if buffering is active).