Double Pendulum

(Class diagram: MPro-RL-DoublePendulum-class-diagram.drawio.png)

Ver. 2.3.2 (2023-03-09)

The Double Pendulum environment is an implementation of the classic double pendulum control problem. The dynamics of the system are based on the double pendulum implementation by Matplotlib. The double pendulum is a system of two poles, with the inner pole connected to a fixed point at one end and to the outer pole at the other end. The native implementation of the double pendulum consists of an input motor that provides torque in either direction to actuate the system.
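
As a quick orientation, the sketch below drives one of the concrete environments documented further down for a few simulation cycles. Only the constructor arguments and constants are taken from this page; the import locations of Action and Log as well as the reset/process_action/get_state calls are assumptions based on MLPro's generic environment interface and may differ between versions.

    import numpy as np

    from mlpro.bf.various import Log
    from mlpro.bf.systems import Action   # assumption: location of the Action class
    from mlpro.rl.pool.envs.doublependulum import DoublePendulumS4

    # Constructor arguments as documented below
    env = DoublePendulumS4(p_init_angles=DoublePendulumS4.C_ANGLES_DOWN,  # start hanging down
                           p_max_torque=10,        # limit the input torque to +/- 10 Nm
                           p_visualize=False,
                           p_logging=Log.C_LOG_NOTHING)

    env.reset(p_seed=1)

    for cycle in range(10):
        # The action space is one-dimensional: the torque of the input motor
        torque = np.random.uniform(-10, 10, size=1)
        action = Action(p_agent_id=0,
                        p_action_space=env.get_action_space(),
                        p_values=torque)
        env.process_action(action)
        print(env.get_state().get_values())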

class mlpro.rl.pool.envs.doublependulum.DoublePendulumRoot(p_mode=0, p_latency=None, p_max_torque=20, p_l1=1.0, p_l2=1.0, p_m1=1.0, p_m2=1.0, p_g=9.8, p_init_angles='random', p_history_length=5, p_fct_strans: FctSTrans | None = None, p_fct_success: FctSuccess | None = None, p_fct_broken: FctBroken | None = None, p_fct_reward: FctReward | None = None, p_mujoco_file=None, p_frame_skip=None, p_state_mapping=None, p_action_mapping=None, p_camera_conf=None, p_visualize: bool = False, p_plot_level: int = 2, p_rst_balancing='rst_002', p_rst_swinging='rst_003', p_rst_swinging_outer_pole='rst_004', p_reward_weights: list | None = None, p_reward_trend: bool = False, p_reward_window: int = 0, p_random_range: list | None = None, p_balancing_range: list = (-0.2, 0.2), p_swinging_outer_pole_range=(0.2, 0.5), p_break_swinging: bool = False, p_logging=True)

Bases: DoublePendulumSystemRoot, Environment

This is the root double pendulum environment class, inherited from the Environment class, with a four-dimensional state space, the underlying implementation of the double pendulum dynamics, and a default reward strategy.

Parameters:
  • p_mode – Mode of environment. Possible values are Mode.C_MODE_SIM (default) or Mode.C_MODE_REAL.

  • p_latency (timedelta) – Optional latency of environment. If not provided, the internal value of constant C_LATENCY is used by default.

  • p_max_torque (float, optional) – Maximum torque applied to pendulum. The default is 20.

  • p_l1 (float, optional) – Length of pendulum 1 in m. The default is 1.0.

  • p_l2 (float, optional) – Length of pendulum 2 in m. The default is 1.0.

  • p_m1 (float, optional) – Mass of pendulum 1 in kg. The default is 1.0.

  • p_m2 (float, optional) – Mass of pendulum 2 in kg. The default is 1.0.

  • p_g (float, optional) – Gravitational acceleration. The default is 9.8

  • p_init_angles (str, optional) – Initial orientation of the poles. C_ANGLES_UP starts the pendulum in an upright position, C_ANGLES_DOWN starts it in a downward position, and C_ANGLES_RND (default) starts it from a random position.

  • p_history_length (int, optional) – Historical trajectory points to display. The default is 5.

  • p_fct_strans (FctSTrans, optional) – Custom state transition function.

  • p_fct_success (FctSuccess, optional) – Custom success function.

  • p_fct_broken (FctBroken, optional) – Custom broken function.

  • p_fct_reward (FctReward, optional) – Custom reward function.

  • p_mujoco_file (optional) – The corresponding MuJoCo model file.

  • p_frame_skip (optional) – Number of frames to be skipped for visualization.

  • p_state_mapping (optional) – State mapping configurations.

  • p_action_mapping (optional) – Action mapping configurations.

  • p_camera_conf (optional) – Camera configurations for MuJoCo-specific visualization.

  • p_visualize (bool) – Boolean switch for visualization. Default = False.

  • p_plot_level (int) – Depth of the plots to be plotted. C_PLOT_DEPTH_ENV only plots the environment, C_PLOT_DEPTH_REWARD only plots the reward, and C_PLOT_DEPTH_ALL (default) plots both the environment and the reward.

  • p_rst_balancing – Reward strategy to be used for the balancing region of the environment.

  • p_rst_swinging – Reward strategy to be used for the swinging region of the environment.

  • p_rst_swinging_outer_pole – Reward strategy to be used for swinging up the outer pole.

  • p_reward_weights (list) – List of weights applied to the dimensions of the state space for the reward computation.

  • p_reward_trend (bool) – Boolean value stating whether to plot the reward trend.

  • p_reward_window (int) – The number of latest rewards to be shown in the plot. Default is 0.

  • p_random_range (list) – Boundaries of the state space for random initialization of the environment.

  • p_balancing_range (list) – Boundaries of the state space of the environment in the balancing region.

  • p_swinging_outer_pole_range – Boundaries of the state space of the environment in the outer pole swing-up region.

  • p_break_swinging (bool) – Boolean value stating whether the environment shall be broken outside the balancing region.

  • p_logging – Log level (see constants of class mlpro.bf.various.Log). Default = Log.C_LOG_WE.

C_REWARD_TYPE = 0
C_PLOT_DEPTH_ENV = 0
C_PLOT_DEPTH_REWARD = 1
C_PLOT_DEPTH_ALL = 2
C_VALID_DEPTH = [0, 1, 2]
C_RST_BALANCING_001 = 'rst_001'
C_RST_BALANCING_002 = 'rst_002'
C_RST_SWINGING_001 = 'rst_003'
C_RST_SWINGING_OUTER_POLE_001 = 'rst_004'
C_VALID_RST_BALANCING = ['rst_001', 'rst_002']
C_VALID_RST_SWINGING = ['rst_003']
C_VALID_RST_SWINGING_OUTER_POLE = ['rst_004']
compute_reward_001(p_state_old: State | None = None, p_state_new: State | None = None)

Reward strategy based only on the new normalized state.

Parameters:
  • p_state_old (State) – Normalized old state.

  • p_state_new (State) – Normalized new state.

Returns:

current_reward – Current calculated reward values.

Return type:

Reward

compute_reward_002(p_state_old: State | None = None, p_state_new: State | None = None)

Reward strategy with both the new and old normalized states, based on the Euclidean distance from the goal state. Designed for the balancing zone.

Parameters:
  • p_state_old (State) – Normalized old state.

  • p_state_new (State) – Normalized new state.

Returns:

current_reward – Current calculated reward values.

Return type:

Reward

compute_reward_003(p_state_old: State | None = None, p_state_new: State | None = None)

Reward strategy with both the new and old normalized states, based on the Euclidean distance from the goal state, designed for the swinging of the outer pole. Both angles as well as the velocity and acceleration of the outer pole are considered in the reward computation.

Parameters:
  • p_state_old (State) – Normalized old state.

  • p_state_new (State) – Normalized new state.

Returns:

current_reward – Current calculated reward values.

Return type:

Reward

compute_reward_004(p_state_old: State | None = None, p_state_new: State | None = None)

Reward strategy with both the new and old normalized states, based on the Euclidean distance from the goal state, designed for the swing-up region. Both angles as well as the velocity and acceleration of the outer pole are considered in the reward computation.

The reward strategy is as follows (see the illustrative sketch after this method):

reward = (|old_theta1n| - |new_theta1n|) + (|new_omega1n + new_alpha1n| - |old_omega1n + old_alpha1n|)

Parameters:
  • p_state_old (State) – Normalized old state.

  • p_state_new (State) – Normalized new state.

Returns:

current_reward – Current calculated reward values.

Return type:

Reward
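
To make the rst_004 formula above concrete, here is an illustrative re-statement in plain Python. It mirrors the documented formula only and is not the library implementation; the dictionary keys are hypothetical names for the normalized state components.

    def rst_004_reward(old: dict, new: dict) -> float:
        # Illustrative only: mirrors the documented formula, not library code.
        # Hypothetical keys: theta1n, omega1n, alpha1n denote the normalized
        # angle, angular velocity and angular acceleration.
        angle_term = abs(old['theta1n']) - abs(new['theta1n'])
        motion_term = (abs(new['omega1n'] + new['alpha1n'])
                       - abs(old['omega1n'] + old['alpha1n']))
        return angle_term + motion_term

    # A step that brings the pole closer to upright yields a positive reward:
    old = dict(theta1n=0.8, omega1n=0.1, alpha1n=0.0)
    new = dict(theta1n=0.6, omega1n=0.3, alpha1n=0.1)
    print(rst_004_reward(old, new))   # ~0.5 = (0.8 - 0.6) + (0.4 - 0.1)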

class mlpro.rl.pool.envs.doublependulum.DoublePendulumS4(p_mode=0, p_latency=None, p_max_torque=20, p_l1=1.0, p_l2=1.0, p_m1=1.0, p_m2=1.0, p_g=9.8, p_init_angles='random', p_history_length=5, p_fct_strans: FctSTrans | None = None, p_fct_success: FctSuccess | None = None, p_fct_broken: FctBroken | None = None, p_fct_reward: FctReward | None = None, p_mujoco_file=None, p_frame_skip=None, p_state_mapping=None, p_action_mapping=None, p_camera_conf=None, p_visualize: bool = False, p_plot_level: int = 2, p_rst_balancing='rst_002', p_rst_swinging='rst_003', p_rst_swinging_outer_pole='rst_004', p_reward_weights: list | None = None, p_reward_trend: bool = False, p_reward_window: int = 0, p_random_range: list | None = None, p_balancing_range: list = (-0.2, 0.2), p_swinging_outer_pole_range=(0.2, 0.5), p_break_swinging: bool = False, p_logging=True)

Bases: DoublePendulumSystemS4, DoublePendulumRoot

This is the double pendulum static four-dimensional environment. It inherits from the double pendulum root class, including its dynamics and default reward strategy.

Parameters:
  • p_mode – Mode of environment. Possible values are Mode.C_MODE_SIM (default) or Mode.C_MODE_REAL.

  • p_latency (timedelta) – Optional latency of environment. If not provided, the internal value of constant C_LATENCY is used by default.

  • p_max_torque (float, optional) – Maximum torque applied to pendulum. The default is 20.

  • p_l1 (float, optional) – Length of pendulum 1 in m. The default is 1.0.

  • p_l2 (float, optional) – Length of pendulum 2 in m. The default is 1.0.

  • p_m1 (float, optional) – Mass of pendulum 1 in kg. The default is 1.0.

  • p_m2 (float, optional) – Mass of pendulum 2 in kg. The default is 1.0.

  • p_init_angles (str, optional) – Initial orientation of the poles. C_ANGLES_UP starts the pendulum in an upright position, C_ANGLES_DOWN starts it in a downward position, and C_ANGLES_RND (default) starts it from a random position.

  • p_g (float, optional) – Gravitational acceleration. The default is 9.8

  • p_history_length (int, optional) – Historical trajectory points to display. The default is 5.

  • p_visualize (bool) – Boolean switch for visualization. Default = False.

  • p_plot_level (int) – Depth of the plots to be plotted. C_PLOT_DEPTH_ENV only plots the environment, C_PLOT_DEPTH_REWARD only plots the reward, and C_PLOT_DEPTH_ALL (default) plots both the environment and the reward.

  • p_rst_balancing – Reward strategy to be used for the balancing region of the environment.

  • p_rst_swinging – Reward strategy to be used for the swinging region of the environment.

  • p_reward_weights (list) – List of weights applied to the dimensions of the state space for the reward computation.

  • p_reward_trend (bool) – Boolean value stating whether to plot the reward trend.

  • p_reward_window (int) – The number of latest rewards to be shown in the plot. Default is 0.

  • p_random_range (list) – Boundaries of the state space for random initialization of the environment.

  • p_balancing_range (list) – Boundaries of the state space of the environment in the balancing region.

  • p_break_swinging (bool) – Boolean value stating whether the environment shall be broken outside the balancing region.

  • p_logging – Log level (see constants of class mlpro.bf.various.Log). Default = Log.C_LOG_WE.

C_NAME = 'DoublePendulumS4'
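
As a configuration example, the sketch below wires together the plot-depth and reward-strategy constants documented on this page. All constructor arguments appear in the signature above; the combination chosen here is illustrative, not taken from the library.

    from mlpro.bf.various import Log
    from mlpro.rl.pool.envs.doublependulum import DoublePendulumRoot, DoublePendulumS4

    env = DoublePendulumS4(
        p_plot_level=DoublePendulumRoot.C_PLOT_DEPTH_REWARD,     # plot the reward only
        p_rst_balancing=DoublePendulumRoot.C_RST_BALANCING_001,  # alternative balancing strategy
        p_reward_trend=True,                                     # additionally plot the reward trend
        p_reward_window=100,                                     # show the latest 100 rewards
        p_balancing_range=(-0.1, 0.1),                           # tighter balancing zone
        p_visualize=True,
        p_logging=Log.C_LOG_WE,
    )
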
class mlpro.rl.pool.envs.doublependulum.DoublePendulumS7(p_mode=0, p_latency=None, p_max_torque=20, p_l1=1.0, p_l2=1.0, p_m1=1.0, p_m2=1.0, p_g=9.8, p_init_angles='random', p_history_length=5, p_fct_strans: FctSTrans | None = None, p_fct_success: FctSuccess | None = None, p_fct_broken: FctBroken | None = None, p_fct_reward: FctReward | None = None, p_mujoco_file=None, p_frame_skip=None, p_state_mapping=None, p_action_mapping=None, p_camera_conf=None, p_visualize: bool = False, p_plot_level: int = 2, p_rst_balancing='rst_002', p_rst_swinging='rst_003', p_rst_swinging_outer_pole='rst_004', p_reward_weights: list | None = None, p_reward_trend: bool = False, p_reward_window: int = 0, p_random_range: list | None = None, p_balancing_range: list = (-0.2, 0.2), p_swinging_outer_pole_range=(0.2, 0.5), p_break_swinging: bool = False, p_logging=True)

Bases: DoublePendulumSystemS7, DoublePendulumS4

This is the classic implementation of the double pendulum with a seven-dimensional state space that additionally includes the derived accelerations of both poles and the input torque. The dynamics of the system are inherited from the double pendulum root class.

Parameters:
  • p_mode – Mode of environment. Possible values are Mode.C_MODE_SIM (default) or Mode.C_MODE_REAL.

  • p_latency (timedelta) – Optional latency of environment. If not provided, the internal value of constant C_LATENCY is used by default.

  • p_max_torque (float, optional) – Maximum torque applied to pendulum. The default is 20.

  • p_l1 (float, optional) – Length of pendulum 1 in m. The default is 1.0.

  • p_l2 (float, optional) – Length of pendulum 2 in m. The default is 1.0.

  • p_m1 (float, optional) – Mass of pendulum 1 in kg. The default is 1.0.

  • p_m2 (float, optional) – Mass of pendulum 2 in kg. The default is 1.0.

  • p_init_angles (str, optional) – Initial orientation of the poles. C_ANGLES_UP starts the pendulum in an upright position, C_ANGLES_DOWN starts it in a downward position, and C_ANGLES_RND (default) starts it from a random position.

  • p_g (float, optional) – Gravitational acceleration. The default is 9.8

  • p_history_length (int, optional) – Historical trajectory points to display. The default is 5.

  • p_visualize (bool) – Boolean switch for visualization. Default = False.

  • p_plot_level (int) – Depth of the plots to be plotted. C_PLOT_DEPTH_ENV only plots the environment, C_PLOT_DEPTH_REWARD only plots the reward, and C_PLOT_DEPTH_ALL (default) plots both the environment and the reward.

  • p_rst_balancing – Reward strategy to be used for the balancing region of the environment.

  • p_rst_swinging – Reward strategy to be used for the swinging region of the environment.

  • p_reward_weights (list) – List of weights applied to the dimensions of the state space for the reward computation.

  • p_reward_trend (bool) – Boolean value stating whether to plot the reward trend.

  • p_reward_window (int) – The number of latest rewards to be shown in the plot. Default is 0.

  • p_random_range (list) – Boundaries of the state space for random initialization of the environment.

  • p_balancing_range (list) – Boundaries of the state space of the environment in the balancing region.

  • p_break_swinging (bool) – Boolean value stating whether the environment shall be broken outside the balancing region.

  • p_logging – Log level (see constants of class mlpro.bf.various.Log). Default = Log.C_LOG_WE.

C_NAME = 'DoublePendulumS7'
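
To see the difference between the S4 and S7 variants in practice, one can compare their state spaces. The get_state_space() and get_num_dim() accessors are assumed from MLPro's base classes; the expected dimensionalities follow from the class descriptions above.

    from mlpro.bf.various import Log
    from mlpro.rl.pool.envs.doublependulum import DoublePendulumS4, DoublePendulumS7

    env4 = DoublePendulumS4(p_logging=Log.C_LOG_NOTHING)
    env7 = DoublePendulumS7(p_logging=Log.C_LOG_NOTHING)

    # Assumption: get_state_space() from the base System class,
    # get_num_dim() from MLPro's Set/ESpace classes
    print(env4.get_state_space().get_num_dim())   # expected: 4 (angles, angular velocities)
    print(env7.get_state_space().get_num_dim())   # expected: 7 (plus accelerations and torque)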