RL-ENV-ADA - Environment Models
Ver. 1.1.0 (2023-04-03)
This module provides classes for adaptive environment models.
- class mlpro.rl.models_env_ada.AFctReward(p_afct_cls, p_state_space: MSpace, p_action_space: MSpace, p_input_space_cls=ESpace, p_output_space_cls=ESpace, p_output_elem_cls=Element, p_threshold=0, p_buffer_size=0, p_ada: bool = True, p_visualize: bool = False, p_logging=True, **p_kwargs)
Bases: AFctBase, FctReward
Online adaptive version of a reward function. See parent classes for further details.
- C_TYPE = 'AFct Reward'
- _setup_spaces(p_state_space: MSpace, p_action_space: MSpace, p_input_space: MSpace, p_output_space: MSpace)
Custom method to set up the input and output space of the embedded adaptive function. Use the method add_dimension() of the empty spaces p_input_space and p_output_space to enrich them with suitable dimensions.
- Parameters:
p_state_space (MSpace) – State space of an environment respectively observation space of an agent.
p_action_space (MSpace) – Action space of an environment or agent.
p_input_space (MSpace) – Empty input space of the embedded adaptive function, to be enriched with dimensions.
p_output_space (MSpace) – Empty output space of the embedded adaptive function, to be enriched with dimensions.
- _compute_reward(p_state_old: State = None, p_state_new: State = None) Reward
Custom reward method. See method compute_reward() for further details.
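As a usage sketch, a custom adaptive reward function can be derived from AFctReward by implementing _setup_spaces(). The class name, the dimension names, and the add_dim()/get_dims()/get_name_short() space calls below are assumptions for illustration, not guaranteed by this module:

```python
from mlpro.bf.math import MSpace, Dimension
from mlpro.rl.models_env_ada import AFctReward

class MyAFctReward(AFctReward):

    def _setup_spaces(self, p_state_space: MSpace, p_action_space: MSpace,
                      p_input_space: MSpace, p_output_space: MSpace):
        # Input space: one dimension per state dimension, once for the old
        # and once for the new state (illustrative layout).
        for dim in p_state_space.get_dims():
            p_input_space.add_dim(Dimension(dim.get_name_short() + '_old'))
            p_input_space.add_dim(Dimension(dim.get_name_short() + '_new'))

        # Output space: a single scalar reward dimension.
        p_output_space.add_dim(Dimension('reward'))
```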
- class mlpro.rl.models_env_ada.SARSElement(p_state: State, p_action: Action, p_reward: Reward, p_state_new: State)
Bases: BufferElement
Element of a SARSBuffer.
- class mlpro.rl.models_env_ada.SARSBuffer(p_size=1)
Bases: Buffer
State-Action-Reward-State (SARS) buffer, stored as a dictionary.
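A minimal buffering sketch, assuming the state, action, reward, and state_new objects already exist from an agent-environment interaction, and assuming add_element() as the storing call inherited from Buffer:

```python
from mlpro.rl.models_env_ada import SARSElement, SARSBuffer

buffer = SARSBuffer(p_size=100)

# Wrap one interaction step as a SARS element...
sars = SARSElement(p_state=state,
                   p_action=action,
                   p_reward=reward,
                   p_state_new=state_new)

# ...and store it for later adaptation of the environment model.
buffer.add_element(sars)
```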
- class mlpro.rl.models_env_ada.EnvModel(p_observation_space: MSpace, p_action_space: MSpace, p_latency: timedelta, p_afct_strans: AFctSTrans, p_afct_reward: AFctReward = None, p_afct_success: AFctSuccess = None, p_afct_broken: AFctBroken = None, p_ada: bool = True, p_init_states: State = None, p_visualize: bool = False, p_logging=True)
Environment model class as part of a model-based agent.
- Parameters:
p_observation_space (MSpace) – Observation space of related agent.
p_action_space (MSpace) – Action space of related agent.
p_latency (timedelta) – Latency of related environment.
p_afct_strans (AFctSTrans) – Mandatory external adaptive function for state transition.
p_afct_reward (AFctReward) – Optional external adaptive function for reward computation.
p_afct_success (AFctSuccess) – Optional external adaptive function for state assessment ‘success’.
p_afct_broken (AFctBroken) – Optional external adaptive function for state assessment ‘broken’.
p_ada (bool) – Boolean switch for adaptivity.
p_init_states (State) – Initial state of the environment model.
p_visualize (bool) – Boolean switch for env/agent visualisation. Default = False.
p_logging – Log level (see class Log for more details).
- C_TYPE = 'EnvModel'
- C_NAME = 'Default'
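A hedged wiring sketch: only p_afct_strans is mandatory per the parameter list above; my_afct_strans and the two spaces are placeholders assumed to be created elsewhere:

```python
from datetime import timedelta
from mlpro.rl.models_env_ada import EnvModel

env_model = EnvModel(
    p_observation_space=obs_space,     # observation space of the related agent
    p_action_space=action_space,       # action space of the related agent
    p_latency=timedelta(seconds=1),    # latency of the related environment
    p_afct_strans=my_afct_strans,      # mandatory adaptive state transition function
    p_afct_reward=None,                # the optional adaptive functions may stay None
    p_afct_success=None,
    p_afct_broken=None,
    p_ada=True,
    p_visualize=False,
    p_logging=True)
```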
- static setup_spaces()
Static template method to set up and return state and action space of environment.
- Returns:
state_space (MSpace) – State space object
action_space (MSpace) – Action space object
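A sketch of a concrete implementation in a subclass; ESpace, Dimension, and the add_dim() call are assumed from MLPro's basic math module, and the dimension names are illustrative:

```python
from mlpro.bf.math import ESpace, Dimension
from mlpro.rl.models_env_ada import EnvModel

class MyEnvModel(EnvModel):

    @staticmethod
    def setup_spaces():
        # Declare one illustrative state and one action dimension.
        state_space = ESpace()
        state_space.add_dim(Dimension('x'))

        action_space = ESpace()
        action_space.add_dim(Dimension('a'))

        return state_space, action_space
```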
- _init_hyperparam(**p_par)
Implementation-specific hyperparameters can be added here. Please follow these steps (a code sketch follows below the parameter list):
a) Add each hyperparameter as an object of type HyperParam to the internal hyperparameter space object self._hyperparam_space.
b) Create a hyperparameter tuple and bind it to self._hyperparam_tuple.
c) Set a default value for each hyperparameter.
- Parameters:
p_par (Dict) – Further model specific hyperparameters, that are passed through constructor.
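The three steps above could look as follows in a subclass; HyperParam and HyperParamTuple are MLPro's hyperparameter classes (import path assumed), while the 'lr' parameter and its default value are illustrative assumptions:

```python
from mlpro.bf.ml import HyperParam, HyperParamTuple
from mlpro.rl.models_env_ada import EnvModel

class MyEnvModel(EnvModel):

    def _init_hyperparam(self, **p_par):
        # a) add each hyperparameter to the internal hyperparameter space
        hp_lr = HyperParam('lr')                       # illustrative parameter
        self._hyperparam_space.add_dim(hp_lr)

        # b) create the hyperparameter tuple and bind it
        self._hyperparam_tuple = HyperParamTuple(self._hyperparam_space)

        # c) set a default value for each hyperparameter
        self._hyperparam_tuple.set_value(hp_lr.get_id(), 0.01)
```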
- _reset(p_seed=None)
Custom method to reset the system to an initial/defined state. Use method _set_status() to set the state.
- Parameters:
p_seed (int) – Seed parameter for an internal random generator
- get_cycle_limit() int
Returns limit of cycles per training episode.
- _process_action(p_action: Action) bool
Custom method for state transition. To be implemented in a child class. See method process_action() for further details.
- switch_adaptivity(p_ada: bool)
Switches adaptation functionality on/off.
- Parameters:
p_ada (bool) – Boolean switch for adaptivity
- adapt(**p_kwargs) bool
Reactivates the adaptation mechanism. See method Model.adapt() for further details.
- _adapt(p_sars_elem: SARSElement) bool
Adapts the environment model based on State-Action-Reward-State (SARS) data.
- Parameters:
p_sars_elem (SARSElement) – Object of type SARSElement.
- get_adapted() bool
Returns True, if the model was adapted at least once. False otherwise.
- get_accuracy()
Returns accuracy of environment model as average accuracy of the embedded adaptive functions.
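Putting it together, one adaptation step could look like this; env_model and sars are the objects from the earlier sketches, and passing the SARS element as keyword p_sars_elem is an assumption based on adapt() forwarding its keyword arguments to _adapt():

```python
# Adapt the environment model with a single SARS tuple.
adapted = env_model.adapt(p_sars_elem=sars)

# get_adapted() reports whether at least one adaptation took place;
# get_accuracy() averages the accuracy of the embedded adaptive functions.
if env_model.get_adapted():
    print('Model accuracy:', env_model.get_accuracy())
```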
- clear_buffer()
Clears internal buffer (if buffering is active).