RL-ENV-ADA - Environment Models

Figure: MLPro-RL-Env-Ada class diagram

Ver. 1.1.0 (2023-04-03)

This module provides model classes for adaptive environment models.

class mlpro.rl.models_env_ada.AFctReward(p_afct_cls, p_state_space: MSpace, p_action_space: MSpace, p_input_space_cls=ESpace, p_output_space_cls=ESpace, p_output_elem_cls=Element, p_threshold=0, p_buffer_size=0, p_ada: bool = True, p_visualize: bool = False, p_logging=True, **p_kwargs)

Bases: AFctBase, FctReward

Online adaptive version of a reward function. See parent classes for further details.

C_TYPE = 'AFct Reward'
_setup_spaces(p_state_space: MSpace, p_action_space: MSpace, p_input_space: MSpace, p_output_space: MSpace)

Custom method to set up the input and output space of the embedded adaptive function. Use the method add_dimension() of the empty spaces p_input_space and p_output_space to enrich them with suitable dimensions.

Parameters:
  • p_state_space (MSpace) – State space of an environment respectively observation space of an agent.

  • p_action_space (MSpace) – Action space of an environment or agent.

  • p_input_space (MSpace) – Empty input space of embedded adaptive function to be enriched with dimension.

  • p_output_space (MSpace) – Empty output space of embedded adaptive function to be enriched with dimension.
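
A minimal sketch of such an override, assuming the Dimension class from mlpro.bf.math and the space accessors get_dim_ids(), get_dim() and add_dim(); the subclass MyAFctReward and all dimension names are purely illustrative:

from mlpro.bf.math import Dimension, MSpace
from mlpro.rl.models_env_ada import AFctReward

class MyAFctReward(AFctReward):

    def _setup_spaces(self, p_state_space: MSpace, p_action_space: MSpace,
                      p_input_space: MSpace, p_output_space: MSpace):

        # Input of the embedded function: the old and the new state
        for prefix in ('old_', 'new_'):
            for dim_id in p_state_space.get_dim_ids():
                dim = p_state_space.get_dim(dim_id)
                p_input_space.add_dim(Dimension(p_name_short=prefix + dim.get_name_short(),
                                                p_base_set=dim.get_base_set(),
                                                p_boundaries=dim.get_boundaries()))

        # Output of the embedded function: a single scalar reward value
        p_output_space.add_dim(Dimension(p_name_short='reward',
                                         p_base_set=Dimension.C_BASE_SET_R))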

_compute_reward(p_state_old: State = None, p_state_new: State = None) → Reward

Custom reward method. See method compute_reward() for further details.

_adapt(p_state: State, p_state_new: State, p_reward: Reward) → bool

Triggers adaptation of the embedded adaptive function.

Parameters:
  • p_state (State) – Previous state.

  • p_state_new (State) – New state.

  • p_reward (Reward) – Setpoint reward, i.e. the target value for the adaptation.

Returns:

adapted – True, if something was adapted. False otherwise.

Return type:

bool

class mlpro.rl.models_env_ada.SARSElement(p_state: State, p_action: Action, p_reward: Reward, p_state_new: State)

Bases: BufferElement

Element of a SARSBuffer.

class mlpro.rl.models_env_ada.SARSBuffer(p_size=1)

Bases: Buffer

State-Action-Reward-State buffer, implemented as a dictionary.
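
A small usage sketch, assuming the generic buffer API of the bf layer (add_element(), is_full()); the wrapper function store_transition() is hypothetical:

from mlpro.rl.models_env_ada import SARSBuffer, SARSElement

def store_transition(p_buffer: SARSBuffer, p_state, p_action, p_reward, p_state_new) -> bool:
    # Wrap one observed transition and push it into the buffer
    sars = SARSElement(p_state=p_state, p_action=p_action,
                       p_reward=p_reward, p_state_new=p_state_new)
    p_buffer.add_element(sars)

    # Report whether enough data has been collected for a batch adaptation
    return p_buffer.is_full()

buffer = SARSBuffer(p_size=100)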

class mlpro.rl.models_env_ada.EnvModel(p_observation_space: MSpace, p_action_space: MSpace, p_latency: timedelta, p_afct_strans: AFctSTrans, p_afct_reward: AFctReward = None, p_afct_success: AFctSuccess = None, p_afct_broken: AFctBroken = None, p_ada: bool = True, p_init_states: State = None, p_visualize: bool = False, p_logging=True)

Bases: EnvBase, Model

Environment model class as part of a model-based agent.

Parameters:
  • p_observation_space (MSpace) – Observation space of related agent.

  • p_action_space (MSpace) – Action space of related agent.

  • p_latency (timedelta) – Latency of related environment.

  • p_afct_strans (AFctSTrans) – Mandatory external adaptive function for state transition.

  • p_afct_reward (AFctReward) – Optional external adaptive function for reward computation.

  • p_afct_success (AFctSuccess) – Optional external adaptive function for state assessment ‘success’.

  • p_afct_broken (AFctBroken) – Optional external adaptive function for state assessment ‘broken’.

  • p_ada (bool) – Boolean switch for adaptivity.

  • p_init_states (State) – Initial state of the env model.

  • p_visualize (bool) – Boolean switch for env/agent visualisation. Default = False.

  • p_logging – Log level (see class Log for more details).
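
A construction sketch, assuming that AFctSTrans shares the constructor pattern of AFctReward above and is importable from the same module; the builder function and all prepared arguments (p_obs_space, p_act_space, p_init_state, p_afct_cls) are illustrative:

from datetime import timedelta
from mlpro.rl.models_env_ada import AFctSTrans, EnvModel

def build_env_model(p_obs_space, p_act_space, p_init_state, p_afct_cls) -> EnvModel:
    # Mandatory adaptive state transition function, wrapping a learnable
    # function approximation class p_afct_cls (e.g. from the MLPro pool)
    afct_strans = AFctSTrans(p_afct_cls=p_afct_cls,
                             p_state_space=p_obs_space,
                             p_action_space=p_act_space,
                             p_threshold=0.1,
                             p_buffer_size=100)

    # Optional reward/success/broken functions are omitted and default to None
    return EnvModel(p_observation_space=p_obs_space,
                    p_action_space=p_act_space,
                    p_latency=timedelta(seconds=1),
                    p_afct_strans=afct_strans,
                    p_ada=True,
                    p_init_states=p_init_state)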

C_TYPE = 'EnvModel'
C_NAME = 'Default'
static setup_spaces()

Static template method to set up and return the state and action space of the environment.

Returns:

  • state_space (MSpace) – State space object

  • action_space (MSpace) – Action space object

_init_hyperparam(**p_par)

Implementation specific hyperparameters can be added here. Please follow these steps:

  a) Add each hyperparameter as an object of type HyperParam to the internal hyperparameter space object self._hyperparam_space

  b) Create a hyperparameter tuple and bind it to self._hyperparam_tuple

  c) Set a default value for each hyperparameter

Parameters:

p_par (Dict) – Further model-specific hyperparameters that are passed through the constructor.
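
A sketch of these steps inside a custom subclass, assuming HyperParam and HyperParamTuple from mlpro.bf.ml with their Dimension-like constructor and the Element accessor set_value(); the hyperparameter names and default values are illustrative:

from mlpro.bf.ml import HyperParam, HyperParamTuple
from mlpro.rl.models_env_ada import EnvModel

class MyEnvModel(EnvModel):

    def _init_hyperparam(self, **p_par):
        # a) Add each hyperparameter to the internal hyperparameter space
        self._hyperparam_space.add_dim(HyperParam(p_name_short='buffer_size', p_base_set='Z'))
        self._hyperparam_space.add_dim(HyperParam(p_name_short='threshold', p_base_set='R'))

        # b) Create a hyperparameter tuple and bind it to self._hyperparam_tuple
        self._hyperparam_tuple = HyperParamTuple(self._hyperparam_space)

        # c) Set a default value for each hyperparameter
        hp_ids = self._hyperparam_space.get_dim_ids()
        self._hyperparam_tuple.set_value(hp_ids[0], 100)
        self._hyperparam_tuple.set_value(hp_ids[1], 0.1)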

_reset(p_seed=None)

Custom method to reset the system to an initial/defined state. Use method _set_status() to set the state.

Parameters:

p_seed (int) – Seed parameter for an internal random generator

get_cycle_limit() → int

Returns limit of cycles per training episode.

_process_action(p_action: Action) → bool

Custom method for state transition. To be implemented in a child class. See method process_action() for further details.
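
Once adapted, the model can stand in for the real environment during planning. A sketch using the public EnvBase wrappers process_action() and get_state(); the helper simulate_step() is hypothetical:

from mlpro.rl.models_env_ada import EnvModel

def simulate_step(p_env_model: EnvModel, p_action):
    # process_action() is the public wrapper around _process_action()
    p_env_model.process_action(p_action)
    return p_env_model.get_state()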

switch_adaptivity(p_ada: bool)

Switches adaptation functionality on/off.

Parameters:

p_ada (bool) – Boolean switch for adaptivity

adapt(**p_kwargs) → bool

Reactivates the adaptation mechanism. See method Model.adapt() for further details.

_adapt(p_sars_elem: SARSElement) → bool

Adapts the environment model based on State-Action-Reward-State (SARS) data.

Parameters:

p_sars_elem (SARSElement) – Object of type SARSElement.
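
A sketch of feeding real experience into the model, assuming that adapt() pipes its keyword arguments through to this method; the helper and the accuracy threshold are illustrative:

from mlpro.rl.models_env_ada import EnvModel, SARSElement

def adapt_on_transition(p_env_model: EnvModel, p_state, p_action, p_reward, p_state_new) -> bool:
    # Wrap the observed transition and trigger one adaptation step
    sars = SARSElement(p_state=p_state, p_action=p_action,
                       p_reward=p_reward, p_state_new=p_state_new)
    adapted = p_env_model.adapt(p_sars_elem=sars)

    # The model is considered usable once it is sufficiently accurate
    return adapted and p_env_model.get_accuracy() >= 0.9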

get_adapted() → bool

Returns True, if the model was adapted at least once. False otherwise.

get_accuracy()

Returns the accuracy of the environment model as the average accuracy of the embedded adaptive functions.

clear_buffer()

Clears internal buffer (if buffering is active).