RL-TRAIN - Scenarios, Training and Tuning

[Figure: MLPro-RL-Train class diagram (MLPro-RL-Train_class_diagram.drawio.png)]

Ver. 1.9.1 (2023-03-09)

This module provides model classes to define and run RL scenarios and to train agents inside them.

class mlpro.rl.models_train.RLDataStoring(p_space: Set | None = None)

Bases: DataStoring

Derivative of basic class DataStoring that is specialized to store episodic training data in the context of reinforcement learning.

Parameters:

p_space (Set) – Space object that provides dimensional information for raw data. If None, a training header data object will be instantiated.

C_VAR0 = 'Episode ID'
C_VAR_CYCLE = 'Cycle'
C_VAR_DAY = 'Day'
C_VAR_SEC = 'Second'
C_VAR_MICROSEC = 'Microsecond'
get_variables()
get_space()
add_episode(p_episode_id)
memorize_row(p_cycle_id, p_tstamp: timedelta, p_data)

Memorizes an episodic data row.

Parameters:
  • p_cycle_id – Cycle id.

  • p_tstamp (timedelta) – Time stamp.

  • p_data – Data that meet the dimensionality of the related space.
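For orientation, a minimal usage sketch; the space construction and all values are illustrative assumptions, and RLDataStoring itself only requires a Set-compatible space:

from datetime import timedelta

from mlpro.bf.math import Dimension, MSpace
from mlpro.rl.models_train import RLDataStoring

# Hypothetical 2-dimensional state space for illustration
state_space = MSpace()
state_space.add_dim(Dimension(p_name_short='x1'))
state_space.add_dim(Dimension(p_name_short='x2'))

ds_states = RLDataStoring(p_space=state_space)

# Open a new episode and memorize one data row per cycle
ds_states.add_episode(p_episode_id=0)
ds_states.memorize_row(p_cycle_id=0,
                       p_tstamp=timedelta(seconds=1),
                       p_data=[0.5, -0.2])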

class mlpro.rl.models_train.RLDataStoringEval(p_space: Set)

Bases: DataStoring

Derivative of basic class DataStoring that is specialized to store evaluation data of a training in the context of reinforcement learning.

Parameters:
  • p_space (Set) – Set object that provides dimensional information for raw data. If None, a training header data object will be instantiated.

C_VAR0 = 'Evaluation ID'
C_VAR_SCORE = 'Score'
C_VAR_SCORE_MA = 'Score(MA)'
C_VAR_SCORE_UNTIL_STAG = 'Score until Stagnation'
C_VAR_SCORE_MA_UNTIL_STAG = 'Score(MA) until Stagnation'
C_VAR_NUM_CYCLES = 'Cycles'
C_VAR_NUM_SUCCESS = 'Successes'
C_VAR_NUM_BROKEN = 'Crashes'
C_VAR_NUM_LIMIT = 'Timeouts'
C_VAR_NUM_ADAPT = 'Adaptations'
get_variables()
get_space()
add_evaluation(p_evaluation_id)
memorize_row(p_score, p_score_ma, p_num_limit, p_num_cycles, p_num_success, p_num_broken, p_num_adaptations, p_reward, p_score_until_stag=None, p_score_ma_until_stag=None)

Memorizes an evaluation data row.

Parameters:
  • p_score (float) – Score value of current evaluation.

  • p_score_ma (float) – Moving average score value.

  • p_num_limit (int) – Number of episodes in timeout.

  • p_num_cycles (int) – Number of evaluation cycles.

  • p_num_success (int) – Number of states that were labeled as successful.

  • p_num_broken (int) – Number of states that were labeled as broken.

  • p_num_adaptations (int) – Number of adaptations in the last training period.

  • p_reward (list) – Episode reward.

  • p_score_until_stag (float) – Optional score value of current evaluation until first stagnation. Default = None.

  • p_score_ma_until_stag (float) – Optional moving average score value until first stagnation. Default = None.
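A sketch of memorizing one evaluation row; the reward space construction and all numbers are illustrative assumptions:

from mlpro.bf.math import Dimension, MSpace
from mlpro.rl.models_train import RLDataStoringEval

# Assumption: one dimension per reward component
reward_space = MSpace()
reward_space.add_dim(Dimension(p_name_short='Reward'))

ds_eval = RLDataStoringEval(p_space=reward_space)

# Open a new evaluation and memorize its aggregated results
ds_eval.add_evaluation(p_evaluation_id=0)
ds_eval.memorize_row(p_score=12.5,          # illustrative numbers only
                     p_score_ma=11.8,
                     p_num_limit=0,
                     p_num_cycles=300,
                     p_num_success=2,
                     p_num_broken=0,
                     p_num_adaptations=15,
                     p_reward=[12.5])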

class mlpro.rl.models_train.RLScenario(p_mode=0, p_ada: bool = True, p_cycle_limit=0, p_visualize: bool = True, p_logging=True)

Bases: Scenario

Template class for an RL scenario consisting of an environment and an agent.

Parameters:
  • p_mode – Operation mode. See bf.ops.Mode.C_VALID_MODES for valid values. Default = Mode.C_MODE_SIM.

  • p_ada (bool) – Boolean switch for adaptivity. Default = True.

  • p_cycle_limit (int) – Maximum number of cycles (0=no limit, -1=get from env). Default = 0.

  • p_visualize (bool) – Boolean switch for env/agent visualisation. Default = True.

  • p_logging – Log level (see constants of class mlpro.bf.various.Log). Default = Log.C_LOG_WE.

C_TYPE

Constant class type for logging: ‘RL-Scenario’.

Type:

str

C_NAME

Constant custom name for logging. To be set in own child class.

Type:

str

C_TYPE = 'RL-Scenario'
C_NAME = '????'
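As a template class, RLScenario is typically specialized by implementing the _setup() method, which creates the environment and returns the (multi-)agent. A minimal sketch following the pattern of MLPro's RL howtos; MyEnvironment and MyPolicy are hypothetical placeholders for concrete Environment and Policy implementations:

from mlpro.rl.models_agents import Agent
from mlpro.rl.models_train import RLScenario

class MyScenario (RLScenario):

    C_NAME = 'MyScenario'

    def _setup(self, p_mode, p_ada: bool, p_visualize: bool, p_logging):
        # MyEnvironment is a hypothetical placeholder for a concrete Environment
        self._env = MyEnvironment(p_visualize=p_visualize, p_logging=p_logging)

        # MyPolicy is a hypothetical placeholder for a concrete Policy
        policy = MyPolicy(p_observation_space=self._env.get_state_space(),
                          p_action_space=self._env.get_action_space(),
                          p_ada=p_ada,
                          p_logging=p_logging)

        # Return a single agent operating on the environment's spaces
        return Agent(p_policy=policy,
                     p_envmodel=None,
                     p_name='Agent1',
                     p_ada=p_ada,
                     p_visualize=p_visualize,
                     p_logging=p_logging)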
static load(p_path)

Loads content from the given path and file name. If the file does not exist, it returns None.

Parameters:
  • p_path – Path that contains the file.

  • p_filename – File name.

Returns:

A loaded object, if file content was loaded successfully. None otherwise.

switch_logging(p_logging)

Sets new log level.

Parameters:

p_logging – Log level (constant C_LOG_LEVELS contains valid values)

init_plot(p_figure: Figure | None = None, p_plot_settings: PlotSettings | None = None)

Initializes the plot functionalities of the class.

Parameters:
  • p_figure (matplotlib.figure.Figure, optional) – Optional matplotlib host figure, where the plot shall be embedded. The default is None.

  • p_plot_settings (PlotSettings) – Optional plot settings. If None, the default view is plotted (see attribute C_PLOT_DEFAULT_VIEW).

update_plot(**p_kwargs)

Updates the plot.

Parameters:

**p_kwargs – Implementation-specific plot data and/or parameters.

get_latency() → timedelta

Returns the latency of the scenario. To be implemented in child class.

get_agent()
get_env()
connect_data_logger(p_ds_states: RLDataStoring | None = None, p_ds_actions: RLDataStoring | None = None, p_ds_rewards: RLDataStoring | None = None)
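A sketch of attaching episodic data loggers to a scenario instance; my_scenario is assumed to be an instance of an RLScenario subclass:

from mlpro.rl.models_train import RLDataStoring

# Assumption: my_scenario is an instance of an RLScenario subclass
env = my_scenario.get_env()

ds_states  = RLDataStoring(p_space=env.get_state_space())
ds_actions = RLDataStoring(p_space=env.get_action_space())

my_scenario.connect_data_logger(p_ds_states=ds_states,
                                p_ds_actions=ds_actions)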
class mlpro.rl.models_train.RLTrainingResults(p_scenario: RLScenario, p_run, p_cycle_id, p_logging='W')

Bases: TrainingResults

Results of a RL training.

Parameters:
  • p_scenario (RLScenario) – Related reinforcement learning scenario.

  • p_run (int) – Run id.

  • p_cycle_id (int) – Id of first cycle of this run.

  • p_logging – Log level (see constants of class Log). Default: Log.C_LOG_ALL

C_NAME = 'RL'
C_FNAME_EVAL = 'evaluation'
C_FNAME_ENV_STATES = 'env_states'
C_FNAME_AGENT_ACTIONS = 'agent_actions'
C_FNAME_ENV_REWARDS = 'env_rewards'
C_CPAR_NUM_EPI = 'Training Episodes'
C_CPAR_NUM_EVAL = 'Evaluations'
close()
save(p_path, p_filename='summary.csv') → bool

Saves a training summary in the given path.

Parameters:
  • p_path (str) – Destination folder

  • p_filename (string) – Name of summary file. Default = ‘summary.csv’

Returns:

success – True, if summary file was created successfully. False otherwise.

Return type:

bool
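For illustration, persisting a results object; the destination path is hypothetical, and results is assumed to be an RLTrainingResults instance, e.g. obtained via RLTraining.get_results():

# Assumption: results is an RLTrainingResults instance
if results.save(p_path='./my_results'):
    print('Training summary written to ./my_results/summary.csv')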

class mlpro.rl.models_train.RLTraining(**p_kwargs)

Bases: Training

This class performs an episodic training on a (multi-)agent in a given environment. Both are expected as parts of a reinforcement learning scenario (see class RLScenario for more details). The class optionally collects all relevant data such as environment states, rewards, and agent actions. Furthermore, overarching training data will be collected.

Parameters:
  • p_scenario_cls – RL scenario class, compatible with or inherited from class RLScenario.

  • p_cycle_limit (int) – Maximum number of training cycles (0=no limit). Default = 0.

  • p_cycles_per_epi_limit (int) – Optional limit of cycles per episode (0=no limit, -1=get environment limit). Default = -1.

  • p_adaptation_limit (int) – Maximum number of adaptations (0=no limit). Default = 0.

  • p_eval_frequency (int) – Optional evaluation frequency (0=no evaluation). Default = 0.

  • p_eval_grp_size (int) – Number of evaluation episodes (eval group). Default = 0.

  • p_score_ma_horizon (int) – Horizon length for moving average score computation. Default = 5.

  • p_stagnation_limit (int) – Optional limit of consecutive evaluations without training progress. Base is the moving average score. Default = 0.

  • p_stagnation_entry (int) – Optional number of evaluations before the stagnation detection starts. Default = 0.

  • p_end_at_stagnation (bool) – If True, the training ends when stagnation has been detected. Default = True.

  • p_hpt (HyperParamTuner) – Optional hyperparameter tuner (see class mlpro.bf.ml.HyperParamTuner). Default = None.

  • p_hpt_trials (int) – Optional number of hyperparameter tuning trials. Default = 0. Must be > 0 if p_hpt is supplied.

  • p_path (str) – Optional destination path to store training data. Default = None.

  • p_collect_states (bool) – If True, the environment states will be collected. Default = True.

  • p_collect_actions (bool) – If True, the agent actions will be collected. Default = True.

  • p_collect_rewards (bool) – If True, the environment reward will be collected. Default = True.

  • p_collect_eval (bool) – If True, global evaluation data will be collected. Default = True.

  • p_visualize (bool) – Boolean switch for env/agent visualisation. Default = False.

  • p_logging – Log level (see constants of class mlpro.bf.various.Log). Default = Log.C_LOG_WE.

C_NAME = 'RL'
C_CLS_RESULTS

alias of RLTrainingResults
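Putting the pieces together, a typical episodic training run follows the pattern of MLPro's RL howtos; the scenario class MyScenario and all numeric settings below are illustrative assumptions:

from mlpro.bf.various import Log
from mlpro.rl.models_train import RLTraining

# MyScenario: hypothetical RLScenario subclass (see the RLScenario sketch above)
training = RLTraining(p_scenario_cls=MyScenario,
                      p_cycle_limit=10000,
                      p_cycles_per_epi_limit=100,
                      p_eval_frequency=5,
                      p_eval_grp_size=3,
                      p_path='./my_results',
                      p_logging=Log.C_LOG_WE)

training.run()
results = training.get_results()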