RL-TRAIN - Scenarios, Training and Tuning

Figure: MLPro-RL-Train class diagram

Ver. 2.0.1 (2023-09-25)

This module provides model classes to define and run RL scenarios and to train agents inside them.

class mlpro.rl.models_train.RLDataStoring(p_space: Set = None)

Bases: DataStoring

Derivative of basic class DataStoring that is specialized to store episodic training data in the context of reinforcement learning.

Parameters:

p_space (Set) – Space object that provides dimensional information for raw data. If None, a training header data object will be instantiated.

C_VAR0 = 'Episode ID'
C_VAR_CYCLE = 'Cycle'
C_VAR_DAY = 'Day'
C_VAR_SEC = 'Second'
C_VAR_MICROSEC = 'Microsecond'
get_variables()
get_space()
add_episode(p_episode_id)
memorize_row(p_cycle_id, p_tstamp: timedelta, p_data)

Memorizes an episodic data row.

Parameters:
  • p_cycle_id – Cycle id.

  • p_tstamp (timedelta) – Time stamp.

  • p_data – Data that meet the dimensionality of the related space.
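
A minimal usage sketch based on the methods documented above; the state space object and the stored values are illustrative assumptions:

    from datetime import timedelta

    from mlpro.rl.models_train import RLDataStoring

    # Illustrative assumption: 'state_space' is a Set that describes the dimensions
    # of the raw data to be stored, e.g. the state space of the observed environment.
    ds_states = RLDataStoring(p_space=state_space)

    # Open a new episode frame and memorize one data row per cycle.
    ds_states.add_episode(p_episode_id=0)
    ds_states.memorize_row(p_cycle_id=0,
                           p_tstamp=timedelta(seconds=0.1),
                           p_data=[0.2, -1.3])   # must match the dimensionality of p_space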

class mlpro.rl.models_train.RLDataStoringEval(p_space: Set)

Bases: DataStoring

Derivative of basic class DataStoring that is specialized to store evaluation data of a training in the context of reinforcement learning.

Parameters:

p_space (Set) – Set object that provides dimensional information for raw data. If None, a training header data object will be instantiated.

C_VAR0 = 'Evaluation ID'
C_VAR_SCORE = 'Score'
C_VAR_SCORE_MA = 'Score(MA)'
C_VAR_SCORE_UNTIL_STAG = 'Score until Stagnation'
C_VAR_SCORE_MA_UNTIL_STAG = 'Score(MA) until Stagnation'
C_VAR_NUM_CYCLES = 'Cycles'
C_VAR_NUM_SUCCESS = 'Successes'
C_VAR_NUM_BROKEN = 'Crashes'
C_VAR_NUM_LIMIT = 'Timeouts'
C_VAR_NUM_ADAPT = 'Adaptations'
get_variables()
get_space()
add_evaluation(p_evaluation_id)
memorize_row(p_score, p_score_ma, p_num_limit, p_num_cycles, p_num_success, p_num_broken, p_num_adaptations, p_reward, p_score_until_stag=None, p_score_ma_until_stag=None)

Memorizes an evaluation data row.

Parameters:
  • p_score (float) – Score value of current evaluation.

  • p_score_ma (float) – Moving average score value.

  • p_num_limit (int) – Number of episodes in timeout.

  • p_num_cycles (int) – Number of evaluation cycles.

  • p_num_success (int) – Number of states that were labeled as successful.

  • p_num_broken (int) – Number of states that were labeled as broken.

  • p_num_adaptations (int) – Number of adaptations in the last training period.

  • p_reward (list) – Episode Reward

  • p_score_until_stag (float) – Optional score value of current evaluation until first stagnation. Default = None.

  • p_score_ma_until_stag (float) – Optional moving average score value until first stagnation. Default = None.
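
A brief sketch of collecting one evaluation data row, following the signature above; the set object and all values are illustrative assumptions:

    from mlpro.rl.models_train import RLDataStoringEval

    # Illustrative assumption: 'eval_space' is a Set providing the dimensional
    # information for the evaluation raw data.
    ds_eval = RLDataStoringEval(p_space=eval_space)

    ds_eval.add_evaluation(p_evaluation_id=1)
    ds_eval.memorize_row(p_score=10.5,
                         p_score_ma=9.8,
                         p_num_limit=0,
                         p_num_cycles=100,
                         p_num_success=1,
                         p_num_broken=0,
                         p_num_adaptations=25,
                         p_reward=[12.0])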

class mlpro.rl.models_train.RLScenario(p_mode=0, p_ada: bool = True, p_cycle_limit=0, p_visualize: bool = True, p_logging=True)

Bases: Scenario

Template class for an RL scenario consisting of an environment and an agent.

Parameters:
  • p_mode – Operation mode. See bf.ops.Mode.C_VALID_MODES for valid values. Default = Mode.C_MODE_SIM.

  • p_ada (bool) – Boolean switch for adaptivity. Default = True.

  • p_cycle_limit (int) – Maximum number of cycles (0=no limit, -1=get from env). Default = 0.

  • p_visualize (bool) – Boolean switch for env/agent visualisation. Default = True.

  • p_logging – Log level (see constants of class mlpro.bf.various.Log). Default = Log.C_LOG_WE.

C_TYPE

Constant class type for logging: ‘RL-Scenario’.

Type:

str

C_NAME

Constant custom name for logging. To be set in own child class.

Type:

str

C_TYPE = 'RL-Scenario'
C_NAME = '????'
_reduce_state(p_state: dict, p_path: str, p_os_sep: str, p_filename_stub: str)

Custom method to reduce the given object state by components that can not be pickled. Further data files can be created in the given path and should use the given filename stub.

Parameters:
  • p_state (dict) – Object state dictionary to be reduced by components that can not be pickled.

  • p_path (str) – Path to store further optional custom data files

  • p_os_sep (str) – OS-specific path separator.

  • p_filename_stub (str) – Filename stub to be used for further optional custom data files

_complete_state(p_path: str, p_os_sep: str, p_filename_stub: str)

Custom method to complete the object state (=self) from external data sources. This method is called by standard method __setstate__() during unpickling the object from an external file.

Parameters:
  • p_path (str) – Path of the object pickle file (and further optional related files)

  • p_os_sep (str) – OS-specific path separator.

  • p_filename_stub (str) – Filename stub to be used for further optional custom data files

switch_logging(p_logging)

Sets new log level.

Parameters:

p_logging – Log level (constant C_LOG_LEVELS contains valid values)

_setup(p_mode, p_ada: bool, p_visualize: bool, p_logging) Model

Custom method to set up the ML scenario. Please bind your environment to self._env and return the agent as model.

Parameters:
  • p_mode – Operation mode. See Mode.C_VALID_MODES for valid values. Default = Mode.C_MODE_SIM

  • p_ada (bool) – Boolean switch for adaptivity.

  • p_visualize (bool) – Boolean switch for env/agent visualisation. Default = True.

  • p_logging – Log level (see constants of class Log).

Returns:

agent – Agent model (object of type Agent or Multi-agent)

Return type:

Agent
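
A minimal sketch of a custom scenario, assuming hypothetical MyEnv and MyPolicy classes and MLPro's usual Agent/Policy constructor conventions; the essential contract is binding the environment to self._env and returning the agent:

    from mlpro.rl.models_train import RLScenario
    from mlpro.rl.models_agents import Agent   # import path assumed

    class MyScenario(RLScenario):
        C_NAME = 'MyScenario'

        def _setup(self, p_mode, p_ada: bool, p_visualize: bool, p_logging):
            # Bind the environment to self._env, as required by the contract above.
            # MyEnv is a hypothetical environment class.
            self._env = MyEnv(p_visualize=p_visualize, p_logging=p_logging)

            # Create and return the agent as the adaptive model of the scenario.
            # MyPolicy is a hypothetical policy class.
            policy = MyPolicy(p_observation_space=self._env.get_state_space(),
                              p_action_space=self._env.get_action_space(),
                              p_ada=p_ada,
                              p_logging=p_logging)
            return Agent(p_policy=policy,
                         p_name='MyAgent',
                         p_ada=p_ada,
                         p_visualize=p_visualize,
                         p_logging=p_logging)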

init_plot(p_figure: Figure = None, p_plot_settings: PlotSettings = None)

Initializes the plot functionalities of the class.

Parameters:
  • p_figure (Matplotlib.figure.Figure, optional) – Optional MatPlotLib host figure, where the plot shall be embedded. The default is None.

  • p_plot_settings (PlotSettings) – Optional plot settings. If None, the default view is plotted (see attribute C_PLOT_DEFAULT_VIEW).

update_plot(**p_kwargs)

Updates the plot.

Parameters:

**p_kwargs – Implementation-specific plot data and/or parameters.

_set_mode(p_mode)

Custom method to set the operation mode of components of the scenario. See method set_mode() for further details.

Parameters:

p_mode – Operation mode. See class bf.ops.Mode for further details.

get_latency() timedelta

Returns the latency of the scenario. To be implemented in child class.

_reset(p_seed)

Environment and timer are reset. The random generators for environment and agent will also be reset. Optionally the agent's internal buffer data will be cleared, but its policy will not be touched.

Parameters:

p_seed – New seed for the environment's and agent's random generators.

get_agent()
get_env()
connect_data_logger(p_ds_states: RLDataStoring = None, p_ds_actions: RLDataStoring = None, p_ds_rewards: RLDataStoring = None)
_run_cycle()

Processes a single cycle.

Returns:

  • success (bool) – True on success. False otherwise.

  • error (bool) – True on error. False otherwise.

  • adapted (bool) – True, if agent adapted something in this cycle. False otherwise.

  • end_of_data (bool) – True, if the end of the related data source has been reached. False otherwise.

class mlpro.rl.models_train.RLTrainingResults(p_scenario: RLScenario, p_run, p_cycle_id, p_logging='W')

Bases: TrainingResults

Results of an RL training.

Parameters:
  • p_scenario (RLScenario) – Related reinforcement learning scenario.

  • p_run (int) – Run id.

  • p_cycle_id (int) – Id of first cycle of this run.

  • p_logging – Log level (see constants of class Log). Default: Log.C_LOG_ALL

C_NAME = 'RL'
C_FNAME_EVAL = 'evaluation'
C_FNAME_ENV_STATES = 'env_states'
C_FNAME_AGENT_ACTIONS = 'agent_actions'
C_FNAME_ENV_REWARDS = 'env_rewards'
C_CPAR_NUM_EPI = 'Training Episodes'
C_CPAR_NUM_EVAL = 'Evaluations'
close()
_log_results()
save(p_path, p_filename='summary.csv') bool

Saves a training summary in the given path.

Parameters:
  • p_path (str) – Destination folder

  • p_filename (string) – Name of summary file. Default = ‘summary.csv’

Returns:

success – True, if summary file was created successfully. False otherwise.

Return type:

bool
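
A brief usage sketch, assuming an RLTrainingResults object named results and an illustrative destination folder:

    # Persist the training summary of this run; the path is illustrative.
    results.save(p_path='./results/run_000', p_filename='summary.csv')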

class mlpro.rl.models_train.RLTraining(**p_kwargs)

Bases: Training

This class performs an episodic training on a (multi-)agent in a given environment. Both are expected as parts of a reinforcement learning scenario (see class RLScenario for more details). The class optionally collects all relevant data like environment states and rewards or agent actions. Furthermore, overarching training data will be collected.

Parameters:
  • p_scenario_cls – Name of the RL scenario class, compatible with/inherited from class RLScenario.

  • p_cycle_limit (int) – Maximum number of training cycles (0=no limit). Default = 0.

  • p_cycles_per_epi_limit (int) – Optional limit of cycles per episode (0=no limit, -1=get environment limit). Default = -1.

  • p_adaptation_limit (int) – Maximum number of adaptations (0=no limit). Default = 0.

  • p_eval_frequency (int) – Optional evaluation frequency (0=no evaluation). Default = 0.

  • p_eval_grp_size (int) – Number of evaluation episodes (eval group). Default = 0.

  • p_score_ma_horizon (int) – Horizon length for moving average score computation. Default = 5.

  • p_stagnation_limit (int) – Optional limit of consecutive evaluations without training progress. Base is the moving average score. Default = 0.

  • p_stagnation_entry (int) – Optional number of evaluations before the stagnation detection starts. Default = 0.

  • p_end_at_stagnation (bool) – If True, the training ends when stagnation has been detected. Default = True.

  • p_hpt (HyperParamTuner) – Optional hyperparameter tuner (see class mlpro.bf.ml.HyperParamTuner). Default = None.

  • p_hpt_trials (int) – Optional number of hyperparameter tuning trials. Default = 0. Must be > 0 if p_hpt is supplied.

  • p_path (str) – Optional destination path to store training data. Default = None.

  • p_collect_states (bool) – If True, the environment states will be collected. Default = True.

  • p_collect_actions (bool) – If True, the agent actions will be collected. Default = True.

  • p_collect_rewards (bool) – If True, the environment reward will be collected. Default = True.

  • p_collect_eval (bool) – If True, global evaluation data will be collected. Default = True.

  • p_visualize (bool) – Boolean switch for env/agent visualisation. Default = False.

  • p_logging – Log level (see constants of class mlpro.bf.various.Log). Default = Log.C_LOG_WE.

C_NAME = 'RL'
C_CLS_RESULTS

alias of RLTrainingResults

_init_results() TrainingResults
_init_episode()
_close_episode()
_init_evaluation()

Initializes the next evaluation.

_update_evaluation(p_success: bool, p_error: bool, p_cycle_limit: bool)

Updates evaluation statistics.

Parameters:
  • p_success (bool) – True on success. False otherwise.

  • p_error (bool) – True on error. False otherwise.

  • p_cycle_limit (bool) – True, if the cycle limit has been reached. False otherwise.

_close_evaluation() ndarray

Closes the current evaluation and computes a related score.

Returns:

score – Score of current evaluation.

Return type:

float

_run_cycle() bool

Runs a single training cycle.

Returns:

True, if training has finished. False otherwise.
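
A minimal training sketch, assuming the MyScenario class from the RLScenario example above; the cycle counts and the destination path are illustrative:

    from mlpro.bf.various import Log
    from mlpro.rl.models_train import RLTraining

    training = RLTraining(p_scenario_cls=MyScenario,   # scenario class, not an instance
                          p_cycle_limit=10000,
                          p_cycles_per_epi_limit=-1,   # take the episode limit from the environment
                          p_eval_frequency=10,
                          p_eval_grp_size=5,
                          p_collect_states=True,
                          p_collect_actions=True,
                          p_collect_rewards=True,
                          p_collect_eval=True,
                          p_path='./results',          # illustrative destination folder
                          p_visualize=False,
                          p_logging=Log.C_LOG_WE)

    training.run()                        # run() is inherited from the basic Training class
    results = training.get_results()      # RLTrainingResults object (see above)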