6.6. Training and tuning

In RL, training involves repeated interactions between an agent and an environment over multiple time steps.

How RL Training Works

  1. Agent Observes the State → The agent receives information about the current state of the environment.

  2. Agent Selects an Action → The agent chooses an action using its policy.

  3. Environment Updates State → The environment transitions to a new state based on the action.

  4. Agent Receives Reward → The environment returns a reward signal.

  5. Policy Updates → The agent updates its policy using either:

    • Model-based learning (estimates environment dynamics)

    • Model-free learning (directly optimizes policy/value function)

  6. Repeat Until Terminal State → This loop continues until an episode ends.
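The six steps above can be sketched in plain Python. This is a minimal, self-contained illustration, not MLPro code. `CorridorEnv` is a hypothetical toy environment (positions 0 to 4, start at 0, goal at 4), and the agent uses tabular Q-learning with an epsilon-greedy policy as one example of model-free learning. A model-based agent would additionally estimate the environment's transition dynamics and use that model to improve its policy.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, start at 0, goal at the last position."""

    def __init__(self, length: int = 5):
        self.length = length
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int):
        # Action 0 moves left, action 1 moves right; the walls clip the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.01   # small penalty per step, bonus at the goal
        return self.state, reward, done


def greedy(q_row):
    # Index of the best action; ties go to action 0.
    return max(range(len(q_row)), key=lambda a: q_row[a])


def train(env, episodes=200, max_steps=50, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]   # action values per state
    for _ in range(episodes):
        state = env.reset()                        # 1. agent observes the state
        for _ in range(max_steps):
            # 2. agent selects an action (epsilon-greedy policy)
            action = rng.randrange(2) if rng.random() < epsilon else greedy(q[state])
            # 3. environment updates its state, 4. agent receives the reward
            next_state, reward, done = env.step(action)
            # 5. model-free policy update: Q-learning, no model of the dynamics is learned
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:                               # 6. repeat until a terminal state
                break
    return q


env = CorridorEnv()
q_table = train(env)
greedy_policy = [greedy(q_table[s]) for s in range(env.length - 1)]
print(greedy_policy)   # [1, 1, 1, 1]: move right in every non-goal state
```

The inner loop bounded by `max_steps` plays the role of an episode timeout, while the outer loop over `episodes` corresponds to episodic training.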

Training in MLPro-RL

In MLPro-RL, the RLTraining class inherits from the Training class at the basic function level. RLTraining is used both for training RL agents and for hyperparameter tuning.

Key Features of RLTraining:

  • Episodic Training → Training progresses through multiple episodes

  • Training Data Storage → Extended training data and results are stored in the file system

  • Support for Single & Multi-Agent RL → Easily train different types of agents

  • Stagnation Detection → Prevents unnecessarily long training times without improvement

Training Termination Conditions

Within an RL training session in MLPro-RL, each episode continues until one of the following events occurs:

  1. Event Success

    • The agent reaches the defined target state → Episode ends

  2. Event Broken

    • The target state is no longer reachable → Episode ends

  3. Event Timeout

    • The maximum number of training cycles is reached → Episode ends

As long as none of these events occurs, the episode continues. Across episodes, training aims to maximize the score obtained in repeated evaluations.
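The three events can be illustrated with a small episode runner. The 1-D task, the `EpisodeEnd` enum and the `run_episode` function below are hypothetical and chosen only for illustration; they are not MLPro's API. In each cycle the policy moves the agent by one step: reaching the target counts as success, dropping to a floor position is treated as the target becoming unreachable (broken), and hitting the cycle limit is a timeout.

```python
from enum import Enum
from typing import Callable, Tuple


class EpisodeEnd(Enum):
    SUCCESS = "success"   # the agent reached the target state
    BROKEN = "broken"     # the target state can no longer be reached
    TIMEOUT = "timeout"   # the cycle limit was reached


def run_episode(policy: Callable[[int], int], start: int = 0, target: int = 3,
                floor: int = -2, cycle_limit: int = 10) -> Tuple[EpisodeEnd, int]:
    """Run one episode of a hypothetical 1-D task and report why it ended.

    The policy returns -1 or +1, which is added to the current position.
    Reaching `target` is success; dropping to `floor` is treated as unrecoverable.
    """
    state = start
    for cycle in range(1, cycle_limit + 1):
        state += policy(state)
        if state == target:
            return EpisodeEnd.SUCCESS, cycle
        if state <= floor:
            return EpisodeEnd.BROKEN, cycle
    return EpisodeEnd.TIMEOUT, cycle_limit


policies = [
    ("always right", lambda s: 1),
    ("always left", lambda s: -1),
    ("oscillating", lambda s: 1 if s < 0 else -1),
]
for name, policy in policies:
    end, cycles = run_episode(policy)
    print(f"{name}: {end.value} after {cycles} cycles")
# always right: success after 3 cycles
# always left: broken after 2 cycles
# oscillating: timeout after 10 cycles
```

A training loop would call such a runner repeatedly, one call per episode, and keep going regardless of how each individual episode ended.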

Stagnation Detection in Training

To prevent unnecessarily long training sessions, MLPro-RL provides a stagnation detection functionality.

If no further improvements are detected over time, training can be terminated early.
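One common way to detect stagnation is a patience rule: compare the best score of the most recent evaluations with the best score seen before them. The sketch below is a hypothetical illustration of that idea, not MLPro's implementation. `detect_stagnation`, its `patience` and `min_delta` parameters, and the evaluation scores are made up for the example.

```python
from typing import List


def detect_stagnation(scores: List[float], patience: int, min_delta: float = 0.0) -> bool:
    """Return True if the last `patience` evaluations did not beat the
    earlier best score by more than `min_delta` (hypothetical rule)."""
    if len(scores) <= patience:
        return False                      # not enough history yet
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent <= best_before + min_delta


# Illustrative evaluation scores, not real training results
illustrative_scores = [0.1, 0.4, 0.6, 0.62, 0.61, 0.6, 0.62, 0.59]

history: List[float] = []
stopped_at = None
for evaluation, score in enumerate(illustrative_scores):
    history.append(score)
    if detect_stagnation(history, patience=3, min_delta=0.05):
        stopped_at = evaluation
        print(f"Stagnation detected after evaluation {evaluation}; stopping early")
        break
```

Here training would stop after evaluation 5, because the three most recent scores (0.62, 0.61, 0.6) do not exceed the earlier best of 0.6 by more than 0.05. A larger `patience` tolerates longer plateaus at the cost of more training time.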

For more information, see Section 4.3 of the MLPro 1.0 paper.

Simplifying RL Training with MLPro-RL

MLPro-RL makes it easy to set up and train RL agents by automating the process. Whether you are working with single-agent or multi-agent RL, MLPro-RL provides a structured and efficient training framework.

Next Step: Define your own RL scenario and start training your agent! Here are examples for single-agent and multi-agent scenarios:

  • Single-agent scenario creation

    from mlpro.rl.models import *
    
    class MyScenario(Scenario):
    
        C_NAME      = 'MyScenario'
    
        def _setup(self, p_mode, p_ada:bool, p_logging:bool):
            """
            Here's the place to explicitly set up the entire RL scenario. Please bind your env to
            self._env and your agent to self._agent.
    
            Parameters:
                p_mode              Operation mode of environment (see Environment.C_MODE_*)
                p_ada               Boolean switch for adaptivity of agent
                p_logging           Boolean switch for logging functionality
            """
    
            # Set up environment
            self._env   = MyEnvironment(....)
    
            # Set up an agent with the selected policy
            self._agent = Agent(
                p_policy=MyPolicy(
                    p_state_space=self._env.get_state_space(),
                    p_action_space=self._env.get_action_space(),
                    ....
                ),
                ....
            )
    
    # Instantiate scenario
    myscenario  = MyScenario(....)
    
    # Train agent in scenario
    training    = Training(p_scenario=myscenario, ....)
    training.run()
    
  • Multi-agent scenario creation

    from mlpro.rl.models import *
    
    class MyScenario(Scenario):
    
        C_NAME      = 'MyScenario'
    
        def _setup(self, p_mode, p_ada:bool, p_logging:bool):
            """
            Here's the place to explicitly set up the entire RL scenario. Please bind your env to
            self._env and your multi-agent to self._agent.
    
            Parameters:
                p_mode              Operation mode of environment (see Environment.C_MODE_*)
                p_ada               Boolean switch for adaptivity of agent
                p_logging           Boolean switch for logging functionality
            """
    
            # Set up environment
            self._env   = MyEnvironment(....)
    
            # Create an empty multi-agent
            self._agent = MultiAgent(....)
    
            # Add single agent #1 with its own policy (controlling sub-environment #1)
            self._agent.add_agent(
                Agent(
                    p_policy=MyPolicy(
                        p_state_space=self._env.get_state_space().spawn([....]),
                        p_action_space=self._env.get_action_space().spawn([....]),
                        ....
                    ),
                    ....
                )
            )
    
            # Add single agent #2 with its own policy (controlling sub-environment #2)
            self._agent.add_agent(Agent(....))
    
            ....
    
    # Instantiate scenario
    myscenario  = MyScenario(....)
    
    # Train agent in scenario
    training    = Training(p_scenario=myscenario, ....)
    training.run()
    

Cross reference