6.3.1. Developing custom environments

../../../../../_images/MLPro-RL-Env_class_Environment_commented.png

Environment creation

Creating an environment that satisfies the MLPro interface is straightforward and easy to implement. Essentially, a custom MLPro environment is ready to use by defining a class with four main functions:

setup_space,

_reset,

_simulate_reaction, and

_compute_reward.

Additionally, two optional functions, _compute_broken and _compute_success, can be implemented. If they are not defined, they will default to returning False, which means the environment will continue running until the cycle time is reached or an external stop command (such as the end of an episode) is triggered.

Each environment must implement the following MLPro functions:

from mlpro.rl.models import *

class MyEnvironment(Environment):
    """
    Custom Environment that satisfies mlpro interface.
    """
    C_NAME          = 'MyEnvironment'
    C_LATENCY       = timedelta(0,1,0)         # Default latency 1s
    C_REWARD_TYPE   = Reward.C_TYPE_OVERALL    # Default reward type

    @staticmethod
    def setup_spaces():
        """
        Static template method to set up and return state and action space of environment.
        """

        # Setup state space example
        # state_space  = ESpace()
        # state_space.add_dim(Dimension(0, 'Pos', 'Position', '', 'm', 'm', [-50,50]))
        # state_space.add_dim(Dimension(1, 'Vel', 'Velocity', '', 'm/sec', '\frac{m}{sec}', [-50,50]))

        # Setup action space example
        # action_space = ESpace()
        # action_space.add_dim(Dimension(0, 'Rot', 'Rotation', '', '1/sec', '\frac{1}{sec}', [-50,50]))

        # return state_space, action_space
        ....

    def _simulate_reaction(self, p_state: State, p_action: Action, p_step:timedelta = None) -> State:
        """
        Parameters
        ----------
        p_state : State
            Current state.
        p_action : Action
            Action.

        Returns
        -------
        State
            Subsequent state after transition
        """
        ....

    def _reset(self, p_seed=None) -> None:
        """
        Reset the system to an initial/defined state

        Parameters
        ----------
        p_seed : int
            Seed parameter for an internal random generator
        """
        ....

    def _compute_reward(self, p_state_old: State = None, p_state_new: State = None) -> Reward:
        """
        Computes a reward based on a predecessor and successor state.

        Parameters
        ----------
        p_state_old : State
            Predecessor state.
        p_state_new : State
            Successor state.

        Returns
        -------
        Reward
            Reward
        """
        ....

    def _compute_success(self, p_state: State) -> bool:
        """
        Assesses the given state whether it is a 'success' state.

        Parameters
        ----------
        p_state : State
            State to be assessed.

        Returns
        -------
        success : bool
            True, if the given state is a 'success' state. False otherwise.
        """
        ....

    def _compute_broken(self, p_state: State) -> bool:
        """
        Assesses the given state whether it is a 'broken' state.

        Parameters
        ----------
        p_state : State
            State to be assessed.

        Returns
        -------
        success : bool
            True, if the given state is a 'broken' state. False otherwise.
        """
        ....

One of the advantages of using MLPro is the flexibility of reward structures, which are particularly useful for Multi-Agent RL approach. The framework supports three types of reward structures:

C_TYPE_OVERALL: The default reward type, which provides a scalar value representing the overall reward.

C_TYPE_EVERY_AGENT: A scalar reward for each individual agent.

C_TYPE_EVERY_ACTION: A scalar reward for each agent-action pair.

Additionally, MLPro offers two operational modes: simulation mode and real hardware mode. For real hardware mode, the environment creation process r112 We will soon provide a built-in testing module within MLPro and detailed instructions on how to execute these tests, along with examples.