7.6. Training and tuning
In RL, training involves repeated interactions between an agent and an environment over multiple time steps.
How RL Training Works
Agent Observes the State → The agent receives information about the current state of the environment.
Agent Selects an Action → The agent chooses an action using its policy.
Environment Updates State → The environment transitions to a new state based on the action.
Agent Receives Reward → The environment returns a reward signal.
Policy Updates → The agent updates its policy using either:
Model-based learning (estimates environment dynamics)
Model-free learning (directly optimizes policy/value function)
Repeat Until Terminal State → This loop continues until an episode ends.
Training in MLPro-RL
In MLPro-RL, the RLTraining class inherits from the Training class at the basic function level. This class is used for training RL agents and hyperparameter tuning.
Key Features of RLTraining:
Episodic Training → Training progresses through multiple episodes
Training Data Storage → Extended training data and results are stored in the file system
Support for Single & Multi-Agent RL → Easily train different types of agents
Stagnation Detection → Prevents unnecessary long training times without improvement
Training Termination Conditions
An RL training session in MLPro-RL continues until one of the following events occurs:
Event Success
The agent reaches the defined target state → Episode ends
Event Broken
The target state is no longer reachable → Episode ends
Event Timeout
The maximum training cycles are reached → Episode ends
If none of these events occur, training continues to maximize the score over repeated evaluations.
Stagnation Detection in Training
To prevent unnecessary long training sessions, MLPro-RL provides a stagnation detection functionality.
If no further improvements are detected over time, training can be terminated early.
For more information, you can read Section 4.3 of MLPro 1.0 paper.
Simplifying RL Training with MLPro-RL
MLPro-RL makes it easy to set up and train RL agents by automating the process. Whether you are working with single-agent or multi-agent RL, MLPro-RL provides a structured and efficient training framework.
Next Step: Define your own RL scenario and start training your agent! Here is an example for doing it:
Single-agent scenario creation
from mlpro.rl.models import * class MyScenario(Scenario): C_NAME = 'MyScenario' def _setup(self, p_mode, p_ada:bool, p_logging:bool): """ Here's the place to explicitely setup the entire rl scenario. Please bind your env to self._env and your agent to self._agent. Parameters: p_mode Operation mode of environment (see Environment.C_MODE_*) p_ada Boolean switch for adaptivity of agent p_logging Boolean switch for logging functionality """ # Setup environment self._env = MyEnvironment(....) # Setup an agent with selected policy self._agent = Agent( p_policy=MyPolicy( p_state_space=self._env.get_state_space(), p_action_space=self._env.get_action_space(), .... ), .... ) # Instantiate scenario myscenario = MyScenario(p_scenario=myscenario, ....) # Train agent in scenario training = Training(....) training.run()
Multi-agent scenario creation
from mlpro.rl.models import * class MyScenario(Scenario): C_NAME = 'MyScenario' def _setup(self, p_mode, p_ada:bool, p_logging:bool): """ Here's the place to explicitely setup the entire rl scenario. Please bind your env to self._env and your agent to self._agent. Parameters: p_mode Operation mode of environment (see Environment.C_MODE_*) p_ada Boolean switch for adaptivity of agent p_logging Boolean switch for logging functionality """ # Setup environment self._env = MyEnvironment(....) # Create an empty mult-agent self._agent = MultiAgent(....) # Add Single-Agent #1 with own policy (controlling sub-environment #1) self._agent.add_agent = Agent( self._agent = Agent( p_policy=MyPolicy( p_state_space=self._env.get_state_space().spawn[....], p_action_space=self._env.get_action_space().spawn[....], .... ), .... ), .... ) # Add Single-Agent #2 with own policy (controlling sub-environment #2) self._agent.add_agent = Agent(....) .... # Instantiate scenario myscenario = MyScenario(p_scenario=myscenario, ....) # Train agent in scenario training = Training(....) training.run()
Cross reference