RL-ENV-ADA - Environment Models

[Figure: MLPro-RL-Env-Ada class diagram]

Ver. 1.0.2 (2023-03-10)

This module provides model classes for adaptive environment models.

class mlpro.rl.models_env_ada.AFctBase(p_afct_cls, p_state_space: ~mlpro.bf.math.basics.MSpace, p_action_space: ~mlpro.bf.math.basics.MSpace, p_input_space_cls=<class 'mlpro.bf.math.basics.ESpace'>, p_output_space_cls=<class 'mlpro.bf.math.basics.ESpace'>, p_output_elem_cls=<class 'mlpro.bf.math.basics.Element'>, p_threshold=0, p_buffer_size=0, p_ada: bool = True, p_visualize: bool = False, p_logging=True, **p_kwargs)

Bases: Model

Base class for all special adaptive functions (state transition, reward, success, broken).

Parameters:

p_afct_cls – Adaptive function class (compatible with class mlpro.sl.SLAdaptiveFunction)
p_state_space (MSpace) – State space of an environment or observation space of an agent
p_action_space (MSpace) – Action space of an environment or agent
p_input_space_cls – Space class used for the generated input space of the embedded adaptive function (compatible with class MSpace)
p_output_space_cls – Space class used for the generated output space of the embedded adaptive function (compatible with class MSpace)
p_output_elem_cls – Output element class (compatible with or inherited from class Element)
p_threshold (float) – Threshold for the difference between a set point and a computed output. Computed outputs with a difference less than this threshold will be assessed as ‘good’ outputs. Default = 0.
p_buffer_size (int) – Initial size of internal data buffer. Default = 0 (no buffering).
p_ada (bool) – Boolean switch for adaptivity. Default = True.
p_visualize (bool) – Boolean switch for visualisation. Default = False.
p_logging – Log level (see constants of class Log). Default: Log.C_LOG_ALL
p_kwargs (Dict) – Further model specific parameters (to be specified in child class).
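
The role of p_threshold can be illustrated with a minimal sketch. It is not MLPro code: the function name is_good_output is hypothetical, and measuring the "difference" as a Euclidean distance is an assumption made only for this example.

```python
import math
from typing import Sequence


def is_good_output(setpoint: Sequence[float],
                   computed: Sequence[float],
                   threshold: float = 0.0) -> bool:
    """Assess a computed output against its set point.

    Assumption: the difference is the Euclidean distance between both
    vectors. The output counts as 'good' if that distance is strictly
    less than the threshold.
    """
    difference = math.dist(setpoint, computed)
    return difference < threshold
```

In this sketch a threshold of 0 combined with a strict comparison never classifies an output as good, so a positive threshold is needed to get any 'good' assessments.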

_state_space

State space

Type: MSpace

_action_space

Action space

Type: MSpace

_input_space

Input space of embedded adaptive function

Type: MSpace

_output_space

Output space of embedded adaptive function

Type: MSpace

_afct

Embedded adaptive function

Type: SLAdaptiveFunction

C_TYPE = 'AFct Base'

get_afct() → SLAdaptiveFunction

get_state_space() → MSpace

get_action_space() → MSpace

get_hyperparam() → HyperParamTuple: Returns the internal hyperparameter tuple to get access to single values.

switch_adaptivity(p_ada: bool)

Switches adaptation functionality on/off.

Parameters: p_ada (bool) – Boolean switch for adaptivity

switch_logging(p_logging)

Sets new log level.

Parameters: p_logging – Log level (constant C_LOG_LEVELS contains valid values)

set_random_seed(p_seed=None): Resets the internal random generator using the given seed.

get_adapted() → bool: Returns True if the model was adapted at least once, False otherwise.

clear_buffer(): Clears internal buffer (if buffering is active).

get_accuracy()

Determines the accuracy of the model.

Returns: accuracy – Accuracy of the model as a scalar value in interval [0,1]
Return type: float

init_plot(p_figure: Figure | None = None, p_plot_settings: list = [], p_plot_depth: int = 0, p_detail_level: int = 0, p_step_rate: int = 0, **p_kwargs)

Initializes the plot functionalities of the class.

Parameters:

p_figure (matplotlib.figure.Figure, optional) – Optional Matplotlib host figure in which the plot is embedded. The default is None.
p_plot_settings (PlotSettings) – Optional plot settings. If None, the default view is plotted (see attribute C_PLOT_DEFAULT_VIEW).

update_plot(**p_kwargs)

Updates the plot.

Parameters: **p_kwargs – Implementation-specific plot data and/or parameters.

class mlpro.rl.models_env_ada.AFctSTrans(p_afct_cls, p_state_space: ~mlpro.bf.math.basics.MSpace, p_action_space: ~mlpro.bf.math.basics.MSpace, p_input_space_cls=<class 'mlpro.bf.math.basics.ESpace'>, p_output_space_cls=<class 'mlpro.bf.math.basics.ESpace'>, p_output_elem_cls=<class 'mlpro.bf.systems.basics.State'>, p_threshold=0, p_buffer_size=0, p_ada: bool = True, p_visualize: bool = False, p_logging=True, **p_par)

Bases: AFctBase, FctSTrans

Online adaptive version of a state transition function. See parent classes for further details.

C_TYPE = 'AFct STrans'

class mlpro.rl.models_env_ada.AFctSuccess(p_afct_cls, p_state_space: ~mlpro.bf.math.basics.MSpace, p_action_space: ~mlpro.bf.math.basics.MSpace, p_input_space_cls=<class 'mlpro.bf.math.basics.ESpace'>, p_output_space_cls=<class 'mlpro.bf.math.basics.ESpace'>, p_output_elem_cls=<class 'mlpro.bf.math.basics.Element'>, p_threshold=0, p_buffer_size=0, p_ada: bool = True, p_visualize: bool = False, p_logging=True, **p_kwargs)

Bases: AFctBase, FctSuccess

Online adaptive version of a function that determines whether or not a state is a success state. See parent classes for further details.

C_TYPE = 'AFct Success'

class mlpro.rl.models_env_ada.AFctBroken(p_afct_cls, p_state_space: ~mlpro.bf.math.basics.MSpace, p_action_space: ~mlpro.bf.math.basics.MSpace, p_input_space_cls=<class 'mlpro.bf.math.basics.ESpace'>, p_output_space_cls=<class 'mlpro.bf.math.basics.ESpace'>, p_output_elem_cls=<class 'mlpro.bf.math.basics.Element'>, p_threshold=0, p_buffer_size=0, p_ada: bool = True, p_visualize: bool = False, p_logging=True, **p_kwargs)

Bases: AFctBase, FctBroken

Online adaptive version of a function that determines whether or not a state is a broken state. See parent classes for further details.

C_TYPE = 'AFct Broken'

class mlpro.rl.models_env_ada.AFctReward(p_afct_cls, p_state_space: ~mlpro.bf.math.basics.MSpace, p_action_space: ~mlpro.bf.math.basics.MSpace, p_input_space_cls=<class 'mlpro.bf.math.basics.ESpace'>, p_output_space_cls=<class 'mlpro.bf.math.basics.ESpace'>, p_output_elem_cls=<class 'mlpro.bf.math.basics.Element'>, p_threshold=0, p_buffer_size=0, p_ada: bool = True, p_visualize: bool = False, p_logging=True, **p_kwargs)

Bases: AFctBase, FctReward

Online adaptive version of a reward function. See parent classes for further details.

C_TYPE = 'AFct Reward'

class mlpro.rl.models_env_ada.SARSElement(p_state: State, p_action: Action, p_reward: Reward, p_state_new: State)

Bases: BufferElement

Element of a SARSBuffer.

class mlpro.rl.models_env_ada.SARSBuffer(p_size=1)

Bases: Buffer

State-Action-Reward-State (SARS) buffer, stored as a dictionary.
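
The idea of a dictionary-based state-action-reward-state buffer can be sketched in plain Python. The classes SarsRecord and SarsDictBuffer below are hypothetical stand-ins written for this illustration. They do not reproduce the interface of SARSElement, SARSBuffer, or their parent classes, and the behaviour of dropping the oldest entry when the buffer is full is an assumption.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SarsRecord:
    """Hypothetical stand-in for one SARS element."""
    state: Any
    action: Any
    reward: float
    state_new: Any


class SarsDictBuffer:
    """Hypothetical stand-in: one bounded queue per key, held in a dict."""

    def __init__(self, size: int = 1):
        self._size = size
        self._data: Dict[str, deque] = {}

    def add(self, record: SarsRecord) -> None:
        # Store each field under its own key; the oldest entry is dropped
        # automatically once a queue holds `size` values (assumption).
        for key, value in vars(record).items():
            self._data.setdefault(key, deque(maxlen=self._size)).append(value)

    def get_all(self) -> Dict[str, List[Any]]:
        # Return plain lists so callers cannot modify the internal queues.
        return {key: list(values) for key, values in self._data.items()}

    def __len__(self) -> int:
        # All queues have the same length; an empty buffer has length 0.
        return len(next(iter(self._data.values()), ()))
```

Keeping one queue per key makes it easy to hand all buffered states, actions, rewards, or successor states to a learner as separate columns.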

class mlpro.rl.models_env_ada.EnvModel(p_observation_space: MSpace, p_action_space: MSpace, p_latency: timedelta, p_afct_strans: AFctSTrans, p_afct_reward: AFctReward | None = None, p_afct_success: AFctSuccess | None = None, p_afct_broken: AFctBroken | None = None, p_ada: bool = True, p_init_states: State | None = None, p_visualize: bool = False, p_logging=True)

Bases: EnvBase, Model

Environment model class as part of a model-based agent.

Parameters:

p_observation_space (MSpace) – Observation space of related agent.
p_action_space (MSpace) – Action space of related agent.
p_latency (timedelta) – Latency of related environment.
p_afct_strans (AFctSTrans) – Mandatory external adaptive function for state transition.
p_afct_reward (AFctReward) – Optional external adaptive function for reward computation.
p_afct_success (AFctSuccess) – Optional external adaptive function for state assessment ‘success’.
p_afct_broken (AFctBroken) – Optional external adaptive function for state assessment ‘broken’.
p_ada (bool) – Boolean switch for adaptivity.
p_init_states (State) – Initial state of the environment model.
p_visualize (bool) – Boolean switch for env/agent visualisation. Default = False.
p_logging – Log level (see class Log for more details).
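
How one mandatory state transition function and three optional functions might work together in a single simulated step can be sketched as follows. This is a plain-Python illustration, not MLPro code: simulate_step, StepResult, and the callable signatures are hypothetical. Returning None when an optional function is missing is a choice made for this sketch only and says nothing about how EnvModel handles that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class StepResult:
    """Hypothetical container for the outcome of one simulated step."""
    state_new: Vector
    reward: Optional[float]
    success: Optional[bool]
    broken: Optional[bool]


def simulate_step(
    state: Vector,
    action: Vector,
    strans: Callable[[Vector, Vector], Vector],
    reward: Optional[Callable[[Vector, Vector], float]] = None,
    success: Optional[Callable[[Vector], bool]] = None,
    broken: Optional[Callable[[Vector], bool]] = None,
) -> StepResult:
    # The state transition is mandatory: it predicts the successor state.
    state_new = strans(state, action)
    # Each optional function is only evaluated if it was provided.
    return StepResult(
        state_new=state_new,
        reward=reward(state, state_new) if reward is not None else None,
        success=success(state_new) if success is not None else None,
        broken=broken(state_new) if broken is not None else None,
    )
```

A call such as simulate_step([1.0], [0.5], strans=lambda s, a: [s[0] + a[0]]) yields only a successor state, with the three optional results left as None.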

C_TYPE = 'EnvModel'

C_NAME = 'Default'

static setup_spaces()

Static template method to set up and return state and action space of environment.

Returns:

state_space (MSpace) – State space object
action_space (MSpace) – Action space object

get_cycle_limit() → int: Returns limit of cycles per training episode.

switch_adaptivity(p_ada: bool)

Switches adaptation functionality on/off.

Parameters: p_ada (bool) – Boolean switch for adaptivity

adapt(**p_kwargs) → bool: Reactivated adaptation mechanism. See method Model.adapt() for further details.

get_adapted() → bool: Returns True if the model was adapted at least once, False otherwise.

get_accuracy(): Returns accuracy of environment model as average accuracy of the embedded adaptive functions.

clear_buffer(): Clears internal buffer (if buffering is active).