Custom Policies

Policy Creation

To create a custom RL policy that satisfies MLPro interface is straightforward. First of all, the users need to inherit a base Policy class. Thn, the users can developed their own custom policies by fulfilling at least 2 main functions, namely compute_action and _adapt, as shown in the following code. The compute action method (compute_action) is a function to calculate an action in the current state. Meanwhile, the adapt method (_adapt) is a function to optimize the policy according to the past experience.
from mlpro.rl.models import *

class MyPolicy (Policy):
    """
    Creates a policy that satisfies mlpro interface.
    """

    C_NAME      = 'MyPolicy'

    def compute_action(self, p_state: State) -> Action:
        """
        Specific action computation method to be redefined.

        Parameters:
            p_state       State of environment

        Returns:
            Action object
        """
        ....


    def _adapt(self, *p_args) -> bool:
        """
        Adapts the policy based on State-Action-Reward (SAR) data that will be expected as a SAR
        buffer object. Please call super-method at the beginning of your own implementation and
        adapt only if it returns True.

        Parameters:
            p_arg[0]            SAR Buffer object
        """

        if not super().adapt(*p_args): return False

        ....
        return True
Hyperparameters of the policy should be stored in the internal object self._hp_list, so that they can be tuned from outside. The hyperparameter initilization method (_init_hyperparam) can be used in this case. To set up a hyperparameter space, please refer to our how-to file.

Policy from Third Party Packages

Alternatively, the user can also apply algorithms from Stable Baselines 3 by using the developed relevant wrapper for the integration between third party packages and MLPro. For more information, please click here.
Algorithm Checker

A test script using unittest to check the develop policies will be available soon!