6.2. Getting Started

This short series of steps introduces MLPro-RL in a practical manner, whether you are new to MLPro or already an experienced user.

If you have no experience with MLPro yet, please refer to the Getting Started page of MLPro first.

By following the step-by-step guidelines below, you should gain a practical understanding of MLPro-RL and be able to begin using it effectively.

1. What is Reinforcement Learning?

Reinforcement Learning is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent receives feedback in the form of rewards or penalties based on its actions, guiding it to discover strategies that maximize cumulative rewards. Unlike supervised learning, where the model learns from labeled data, reinforcement learning involves exploring different actions and learning from the consequences. The agent uses this feedback to adjust its strategy over time, gradually improving its performance. Key concepts in reinforcement learning include exploration (trying new actions) and exploitation (choosing the best-known actions). This approach is commonly used in areas like robotics, game playing, and autonomous systems.
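The interaction loop described above can be illustrated with a minimal sketch in plain Python. It does not use the MLPro-RL API: the `Corridor` environment, the `train` function, and all parameter values are hypothetical and chosen only for illustration. The agent learns with tabular Q-learning and an epsilon-greedy rule, which is one common way to balance exploration and exploitation.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..4, goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right, action 0 moves left (clipped at the walls).
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        done = self.pos == self.length - 1
        # Reward at the goal, small penalty for every other step.
        reward = 1.0 if done else -0.01
        return self.pos, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    env = Corridor()
    # Q-table: one row per state, one estimated return per action.
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            if rng.random() < epsilon:
                # Exploration: try a random action.
                action = rng.randrange(2)
            else:
                # Exploitation: pick the best-known action, ties broken randomly.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state, reward, done = env.step(action)
            # Q-learning update: move the estimate toward the observed target.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action for each non-terminal state after training.
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(4)]
```

In this toy setting, the feedback signal alone is enough for the agent to discover that moving right maximizes the cumulative reward. No labeled examples are involved.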

For a deeper understanding, we recommend reading the book by Sutton and Barto, titled: Reinforcement Learning: An Introduction.

2. What is MLPro-RL?

At this point, we assume you have a basic understanding of MLPro and reinforcement learning. Next, familiarize yourself with MLPro-RL through the following resources:

  1. MLPro-RL introduction page

  2. Section 4 of the MLPro 1.0 paper

3. Understanding Environment in MLPro-RL

First, it is crucial to understand the structure of an environment in MLPro, which is described on this page.

Next, you can refer to our how-to files related to the environment in MLPro-RL, listed below:

  1. Howto RL-001: Reward

  2. Howto RL-AGENT-001: Run an Agent with Own Policy

4. Understanding Agent in MLPro-RL

In reinforcement learning, there are two settings: single-agent RL and multi-agent RL. Both are supported by MLPro-RL. To explore the various possibilities for an agent in MLPro, see this page.

Next, you need to learn how to set up both single-agent and multi-agent RL in MLPro-RL by following these examples:

  1. Howto RL-AGENT-001: Run an Agent with Own Policy

  2. Howto RL-AGENT-003: Run Multi-Agent with Own Policy

5. Selecting between Model-Free and Model-Based RL

In this step, you choose your approach to RL training: model-free RL or model-based RL. Before selecting either path, please review these two pages: RL scenario and training.
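To make the distinction more concrete, here is a minimal sketch in plain Python that does not use the MLPro-RL API. The corridor task, the function names (`step`, `q_update`, `run`), and all parameter values are hypothetical. With `planning_steps=0` the agent is model-free and learns only from real transitions. With `planning_steps > 0` it additionally stores observed transitions in a simple learned model and replays them for extra updates, in the style of Dyna-Q, which is one well-known model-based scheme and not necessarily the one used in MLPro-RL.

```python
import random


def step(state, action, length=5):
    """Hypothetical deterministic corridor: action 1 moves right, 0 moves left."""
    nxt = min(max(state + (1 if action == 1 else -1), 0), length - 1)
    done = nxt == length - 1
    return nxt, (1.0 if done else -0.01), done


def q_update(q, s, a, r, s2, done, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update for the transition (s, a, r, s2)."""
    target = r + (0.0 if done else gamma * max(q[s2]))
    q[s][a] += alpha * (target - q[s][a])


def run(planning_steps, episodes=20, epsilon=0.1, seed=0):
    """planning_steps=0 is model-free; planning_steps>0 adds Dyna-Q style planning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(5)]
    model = {}  # learned model: (state, action) -> (reward, next_state, done)
    real_steps = 0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection, random choice on ties.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = step(s, a)
            real_steps += 1
            q_update(q, s, a, r, s2, done)  # learn from real experience
            model[(s, a)] = (r, s2, done)   # update the learned model
            for _ in range(planning_steps):
                # Planning: replay a remembered transition from the model.
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                q_update(q, ps, pa, pr, ps2, pdone)
            s = s2
    return q, real_steps


q_free, steps_free = run(planning_steps=0)
q_based, steps_based = run(planning_steps=10)
```

Comparing `steps_free` and `steps_based` gives a rough feeling for the trade-off: a model-based agent may need fewer real environment interactions, at the cost of extra computation and of depending on the quality of its learned model.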

6. Additional Guidance

After completing the previous steps, you should be able to practice with MLPro-RL and begin using this subpackage for your RL-related activities. For more advanced features, we strongly recommend reviewing the following how-to files:

  1. Howto RL-AGENT-001: Train and Reload Single Agent (Gymnasium)

  2. Howto RL-HT-001: Hyperparameter Tuning using Hyperopt

  3. Howto RL-HT-001: Hyperparameter Tuning using Optuna

  4. Howto RL-ATT-001: Train and Reload Single Agent using Stagnation Detection (Gymnasium)