Double Pendulum System

Ver. 1.0.1 (2023-03-08)

The Double Pendulum System implements the classic double pendulum control problem. Its dynamics are based on the double pendulum example from Matplotlib. A double pendulum consists of two poles: the inner pole is connected to a fixed point at one end and to the outer pole at the other. The native implementation includes an input motor that applies torque in either direction to actuate the system.
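To make the geometry concrete, the following minimal sketch computes the Cartesian end points of both poles from their angles. It assumes the convention of the Matplotlib example (angles in radians, measured from the downward vertical, fixed point at the origin). The function name pole_positions is hypothetical and not part of MLPro.

```python
import math


def pole_positions(theta1, theta2, l1=1.0, l2=1.0):
    """Return ((x1, y1), (x2, y2)): end points of the inner and outer pole.

    Assumption: angles are in radians, measured from the downward vertical,
    and the inner pole is hinged at the origin.
    """
    x1 = l1 * math.sin(theta1)
    y1 = -l1 * math.cos(theta1)
    # The outer pole starts where the inner pole ends.
    x2 = x1 + l2 * math.sin(theta2)
    y2 = y1 - l2 * math.cos(theta2)
    return (x1, y1), (x2, y2)
```

With both angles at zero the poles hang straight down, so the outer end lies at a distance of l1 + l2 below the fixed point.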

class mlpro.bf.systems.pool.doublependulum.DoublePendulumSystemRoot(p_id=None, p_name: str = None, p_range_max: int = 0, p_autorun=0, p_class_shared=None, p_mode=0, p_latency=None, p_t_step=None, p_max_torque=20, p_l1=1.0, p_l2=1.0, p_m1=1.0, p_m2=1.0, p_init_angles='random', p_g=9.8, p_fct_strans: FctSTrans = None, p_fct_success: FctSuccess = None, p_fct_broken: FctBroken = None, p_mujoco_file=None, p_frame_skip=None, p_state_mapping=None, p_action_mapping=None, p_camera_conf=None, p_history_length=5, p_visualize: bool = False, p_random_range: list = None, p_balancing_range: list = (-0.2, 0.2), p_swinging_outer_pole_range=(0.2, 0.5), p_break_swinging: bool = False, p_logging=True, **p_kwargs)

Bases: System

This is the root double pendulum class, derived from the System class, with a four-dimensional state space, the underlying implementation of the double pendulum dynamics, and a default reward strategy.

Parameters:

p_mode – Mode of environment. Possible values are Mode.C_MODE_SIM(default) or Mode.C_MODE_REAL.
p_latency (timedelta) – Optional latency of environment. If not provided, the internal value of constant C_LATENCY is used by default.
p_max_torque (float, optional) – Maximum torque applied to pendulum. The default is 20.
p_l1 (float, optional) – Length of pendulum 1 in m. The default is 1.0.
p_l2 (float, optional) – Length of pendulum 2 in m. The default is 1.0.
p_m1 (float, optional) – Mass of pendulum 1 in kg. The default is 1.0.
p_m2 (float, optional) – Mass of pendulum 2 in kg. The default is 1.0.
p_init_angles (str, optional) – Initial position of the pendulum: C_ANGLES_UP starts upright, C_ANGLES_DOWN starts hanging downward, and C_ANGLES_RND starts from a random position. The default is 'random' (C_ANGLES_RND).
p_g (float, optional) – Gravitational acceleration. The default is 9.8
p_history_length (int, optional) – Historical trajectory points to display. The default is 5.
p_fct_strans (FctSTrans, optional) – The custom State transition function.
p_fct_success (FctSuccess, optional) – The custom Success Function.
p_fct_broken (FctBroken, optional) – The custom Broken Function.
p_mujoco_file (optional) – The corresponding mujoco file
p_frame_skip (optional) – Number of frames to be skipped for visualization.
p_state_mapping (optional) – State mapping configurations.
p_action_mapping (optional) – Action mapping configurations.
p_camera_conf (optional) – Camera configurations for mujoco specific visualization.
p_visualize (bool) – Boolean switch for visualisation. Default = False.
p_random_range (list) – Boundaries of the state space used for random initialization of the environment.
p_balancing_range (list) – Boundaries of the state space in the balancing region. The default is (-0.2, 0.2).
p_swinging_outer_pole_range – Boundaries of the state space in the region where the outer pole swings. The default is (0.2, 0.5).
p_break_swinging (bool) – Whether the environment counts as broken outside the balancing region. The default is False.
p_logging – Log level (see constants of class mlpro.bf.various.Log). The default is True.

C_NAME = 'DoublePendulumSystemRoot'

C_SCIREF_TYPE = 'Online'

C_SCIREF_AUTHOR = 'John Hunter, Darren Dale, Eric Firing, Michael Droettboom and the Matplotlib development team'

C_SCIREF_TITLE = 'The Double Pendulum Problem'

C_SCIREF_URL = 'https://matplotlib.org/stable/gallery/animation/double_pendulum.html'

C_PLOT_ACTIVE: bool = True

C_PLOT_DEFAULT_VIEW: str = '2D'

C_CYCLE_LIMIT = 0

C_LATENCY = datetime.timedelta(microseconds=40000)

C_ANGLES_UP = 'up'

C_ANGLES_DOWN = 'down'

C_ANGLES_RND = 'random'

C_VALID_ANGLES = ['up', 'down', 'random']

C_THRSH_GOAL = 0

C_ANI_FRAME = 30

C_ANI_STEP = 0.001

setup_spaces(): Method to setup the spaces for the Double Pendulum root environment. This method sets up four dimensional Euclidean space for the root DP environment.

_reset(p_seed=None) → None

This method is used to reset the environment. The environment is reset to the initial position set during the initialization of the environment.

Parameters: p_seed (int, optional) – Seed for the random number generator. The default is None.
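As a rough illustration of how a reset could map the p_init_angles setting and an optional seed to a start position, consider the sketch below. The angle convention (degrees, 0 hanging down, 180 upright), the uniform draw over [-180, 180], and the function name initial_angles are all assumptions made for this example, not MLPro's documented behaviour.

```python
import random

VALID_ANGLES = ("up", "down", "random")  # mirrors C_VALID_ANGLES


def initial_angles(p_init_angles="random", p_seed=None):
    """Return hypothetical start angles (theta1, theta2) in degrees.

    Assumptions: 0 means hanging down, 180 means upright, and the random
    mode draws each angle uniformly from [-180, 180].
    """
    if p_init_angles not in VALID_ANGLES:
        raise ValueError(f"p_init_angles must be one of {VALID_ANGLES}")
    if p_init_angles == "up":
        return 180.0, 180.0
    if p_init_angles == "down":
        return 0.0, 0.0
    rng = random.Random(p_seed)  # same seed -> same start position
    return rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0)
```

Passing the same seed twice yields the same random start position, which is the usual reason a reset method accepts a seed.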

_derivs(p_state, t, p_torque)

This method is used to calculate the derivatives of the system, given the current states.

Parameters:

p_state (list) – List of current state elements [theta 1, omega 1, acc 1, theta 2, omega 2, acc 2].
t (list) – Current time step.
p_torque (float) – Torque applied by the motor.

Returns:

dydx – The derivatives of the given state

Return type:

list
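For orientation, the sketch below reproduces the unforced equations of motion from the Matplotlib double pendulum example that this system is based on. It is not MLPro's _derivs: it uses the four-element ordering [theta1, omega1, theta2, omega2] of that example and leaves out the motor torque, because how the torque enters the equations is not described here. The name derivs_unforced is hypothetical.

```python
import math


def derivs_unforced(state, l1=1.0, l2=1.0, m1=1.0, m2=1.0, g=9.8):
    """Time derivatives of [theta1, omega1, theta2, omega2], no motor torque.

    Angles are in radians, measured from the downward vertical.
    """
    th1, w1, th2, w2 = state
    delta = th2 - th1
    den1 = (m1 + m2) * l1 - m2 * l1 * math.cos(delta) ** 2
    den2 = (l2 / l1) * den1

    # Angular acceleration of the inner pole.
    dw1 = (m2 * l1 * w1 * w1 * math.sin(delta) * math.cos(delta)
           + m2 * g * math.sin(th2) * math.cos(delta)
           + m2 * l2 * w2 * w2 * math.sin(delta)
           - (m1 + m2) * g * math.sin(th1)) / den1

    # Angular acceleration of the outer pole.
    dw2 = (-m2 * l2 * w2 * w2 * math.sin(delta) * math.cos(delta)
           + (m1 + m2) * g * math.sin(th1) * math.cos(delta)
           - (m1 + m2) * l1 * w1 * w1 * math.sin(delta)
           - (m1 + m2) * g * math.sin(th2)) / den2

    return [w1, dw1, w2, dw2]
```

At rest in the hanging position all derivatives vanish, and a small positive deflection of both poles produces a negative (restoring) acceleration of the inner pole.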

_simulate_reaction(p_state: State, p_action: Action)

This method is used to calculate the next states of the system after a set of actions.

Parameters:

p_state (State) – Current state.
p_action (Action) – Current action.

Returns:

_state – State of the system after simulating the latest action on the environment.

Return type:

State
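This reference does not say how the next state is integrated. One possible approach, sketched here under that caveat, is a fixed-step classical Runge-Kutta (RK4) integration of a derivative function over the latency window. The names rk4_step and simulate are hypothetical and not part of MLPro; the derivative function is passed in as f(t, y) and returns a list of the same length as y.

```python
def rk4_step(f, y, t, dt):
    """Advance y' = f(t, y) by one classical Runge-Kutta step of size dt."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def simulate(f, y0, duration, dt):
    """Integrate from t = 0 over `duration` using fixed steps of size dt."""
    steps = round(duration / dt)
    y, t = list(y0), 0.0
    for _ in range(steps):
        y = rk4_step(f, y, t, dt)
        t += dt
    return y
```

For instance, simulate(f, y0, 0.04, 0.001) would cover a 0.04 s window, which matches the value of C_LATENCY, in 40 steps. Whether MLPro uses this step size or this integrator is an assumption.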

_compute_broken(p_state: State) → bool: Custom method to compute the broken state. It always returns False, because this environment does not break.

_compute_success(p_state: State)

Custom method to return the success state of the environment. Success is determined by comparing the distance between the current state and the goal state against the goal threshold parameter.

Parameters: p_state (State) – Current state of the environment.
Returns: True if the distance between the current state and the goal state is less than the goal threshold, otherwise False.
Return type: bool
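The success criterion can be sketched in a few lines. The distance measure is not named in this reference, so the sketch assumes the Euclidean distance over plain lists of state values. The function name is_success is hypothetical.

```python
import math


def is_success(state, goal, threshold):
    """True if the Euclidean distance from state to goal is below threshold."""
    distance = math.sqrt(sum((s - g) ** 2 for s, g in zip(state, goal)))
    return distance < threshold
```

Note that with a strict "less than" comparison, a threshold of 0 (the value of C_THRSH_GOAL, if that constant is the threshold used) can never be met, even when the state equals the goal exactly.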

_normalize(p_state: list)

Custom method to normalize the state values of the DP environment based on static boundaries provided by MLPro.

Parameters: p_state (list) – The state to be normalized.
Returns: Normalized state values.
Return type: state

_init_plot_2d(p_figure: Figure, p_settings: PlotSettings)

Custom method to initialize a 2D plot. If the attribute p_settings.axes is not None, the initialization is done there. Otherwise, a new Matplotlib Axes object is created in the given figure and stored in p_settings.axes.

Parameters:

p_figure (matplotlib.figure.Figure) – Matplotlib figure object to host the subplot(s).
p_settings (PlotSettings) – Object with further plot settings.

_update_plot_2d(p_settings: PlotSettings, **p_kwargs): This method updates the plot figure of each episode. When the figure is detected to be an embedded figure, this method will only set up the necessary data of the figure.

class mlpro.bf.systems.pool.doublependulum.DoublePendulumSystemS4(p_id=None, p_name: str = None, p_range_max: int = 0, p_autorun=0, p_class_shared=None, p_mode=0, p_latency=None, p_t_step=None, p_max_torque=20, p_l1=1.0, p_l2=1.0, p_m1=1.0, p_m2=1.0, p_init_angles='random', p_g=9.8, p_fct_strans: FctSTrans = None, p_fct_success: FctSuccess = None, p_fct_broken: FctBroken = None, p_mujoco_file=None, p_frame_skip=None, p_state_mapping=None, p_action_mapping=None, p_camera_conf=None, p_history_length=5, p_visualize: bool = False, p_random_range: list = None, p_balancing_range: list = (-0.2, 0.2), p_swinging_outer_pole_range=(0.2, 0.5), p_break_swinging: bool = False, p_logging=True, **p_kwargs)

Bases: DoublePendulumSystemRoot

This is the static four-dimensional double pendulum environment. It derives from the double pendulum root class and inherits its dynamics and default reward strategy.

Parameters:

p_mode – Mode of environment. Possible values are Mode.C_MODE_SIM(default) or Mode.C_MODE_REAL.
p_latency (timedelta) – Optional latency of environment. If not provided, the internal value of constant C_LATENCY is used by default.
p_max_torque (float, optional) – Maximum torque applied to pendulum. The default is 20.
p_l1 (float, optional) – Length of pendulum 1 in m. The default is 1.0.
p_l2 (float, optional) – Length of pendulum 2 in m. The default is 1.0.
p_m1 (float, optional) – Mass of pendulum 1 in kg. The default is 1.0.
p_m2 (float, optional) – Mass of pendulum 2 in kg. The default is 1.0.
p_init_angles (str, optional) – Initial position of the pendulum: C_ANGLES_UP starts upright, C_ANGLES_DOWN starts hanging downward, and C_ANGLES_RND starts from a random position. The default is 'random' (C_ANGLES_RND).
p_g (float, optional) – Gravitational acceleration. The default is 9.8
p_history_length (int, optional) – Historical trajectory points to display. The default is 5.
p_visualize (bool) – Boolean switch for visualisation. Default = False.
p_plot_level (int) – Types and number of plots to be plotted: C_PLOT_DEPTH_ENV plots only the environment, C_PLOT_DEPTH_REWARD plots only the reward, and C_PLOT_ALL plots both the reward and the environment. Default = ALL.
p_rst_balancing – Reward strategy to be used for the balancing region of the environment
p_rst_swinging – Reward strategy to be used for the swinging region of the environment
p_reward_weights (list) – List of weights to be added to the dimensions of the state space for reward computation
p_reward_trend (bool) – Boolean value stating whether to plot reward trend
p_reward_window (int) – The number of latest rewards to be shown in the plot. Default is 0
p_random_range (list) – Boundaries of the state space used for random initialization of the environment.
p_balancing_range (list) – Boundaries of the state space in the balancing region. The default is (-0.2, 0.2).
p_break_swinging (bool) – Whether the environment counts as broken outside the balancing region. The default is False.
p_logging – Log level (see constants of class mlpro.bf.various.Log). The default is True.

C_NAME = 'DoublePendulumSystemS4'

_normalize(p_state: list)

Method for normalizing the state values of the DP environment using min-max normalisation with static boundaries provided by MLPro.

Parameters: p_state – The state to be normalized.
Returns: Normalized state values.
Return type: state
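Min-max normalisation with static boundaries can be illustrated as follows. The target interval [0, 1], the boundary values in the usage note, and the function name min_max_normalize are assumptions for this sketch; the actual boundaries are supplied by MLPro and are not listed here.

```python
def min_max_normalize(values, lows, highs):
    """Scale each value from its static interval [low, high] to [0, 1]."""
    result = []
    for v, lo, hi in zip(values, lows, highs):
        if hi == lo:
            raise ValueError("boundary interval must have non-zero width")
        result.append((v - lo) / (hi - lo))
    return result
```

With hypothetical boundaries of [-180, 180] for an angle, a value of 0 maps to 0.5, the middle of the normalised range.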

_obs_to_mujoco(p_state)

class mlpro.bf.systems.pool.doublependulum.DoublePendulumSystemS7(p_id=None, p_name: str = None, p_range_max: int = 0, p_autorun=0, p_class_shared=None, p_mode=0, p_latency=None, p_t_step=None, p_max_torque=20, p_l1=1.0, p_l2=1.0, p_m1=1.0, p_m2=1.0, p_init_angles='random', p_g=9.8, p_fct_strans: FctSTrans = None, p_fct_success: FctSuccess = None, p_fct_broken: FctBroken = None, p_mujoco_file=None, p_frame_skip=None, p_state_mapping=None, p_action_mapping=None, p_camera_conf=None, p_history_length=5, p_visualize: bool = False, p_random_range: list = None, p_balancing_range: list = (-0.2, 0.2), p_swinging_outer_pole_range=(0.2, 0.5), p_break_swinging: bool = False, p_logging=True, **p_kwargs)

Bases: DoublePendulumSystemS4

This is the classic implementation of the double pendulum with a seven-dimensional state space that adds the derived accelerations of both poles and the input torque. The dynamics of the system are inherited from the Double Pendulum root class.

Parameters:

p_mode – Mode of environment. Possible values are Mode.C_MODE_SIM(default) or Mode.C_MODE_REAL.
p_latency (timedelta) – Optional latency of environment. If not provided, the internal value of constant C_LATENCY is used by default.
p_max_torque (float, optional) – Maximum torque applied to pendulum. The default is 20.
p_l1 (float, optional) – Length of pendulum 1 in m. The default is 1.0.
p_l2 (float, optional) – Length of pendulum 2 in m. The default is 1.0.
p_m1 (float, optional) – Mass of pendulum 1 in kg. The default is 1.0.
p_m2 (float, optional) – Mass of pendulum 2 in kg. The default is 1.0.
p_init_angles (str, optional) – Initial position of the pendulum: C_ANGLES_UP starts upright, C_ANGLES_DOWN starts hanging downward, and C_ANGLES_RND starts from a random position. The default is 'random' (C_ANGLES_RND).
p_g (float, optional) – Gravitational acceleration. The default is 9.8
p_history_length (int, optional) – Historical trajectory points to display. The default is 5.
p_visualize (bool) – Boolean switch for visualisation. Default = False.
p_plot_level (int) – Types and number of plots to be plotted: C_PLOT_DEPTH_ENV plots only the environment, C_PLOT_DEPTH_REWARD plots only the reward, and C_PLOT_ALL plots both the reward and the environment. Default = ALL.
p_rst_balancing – Reward strategy to be used for the balancing region of the environment
p_rst_swinging – Reward strategy to be used for the swinging region of the environment
p_reward_weights (list) – List of weights to be added to the dimensions of the state space for reward computation
p_reward_trend (bool) – Boolean value stating whether to plot reward trend
p_reward_window (int) – The number of latest rewards to be shown in the plot. Default is 0
p_random_range (list) – Boundaries of the state space used for random initialization of the environment.
p_balancing_range (list) – Boundaries of the state space in the balancing region. The default is (-0.2, 0.2).
p_break_swinging (bool) – Whether the environment counts as broken outside the balancing region. The default is False.
p_logging – Log level (see constants of class mlpro.bf.various.Log). The default is True.

C_NAME = 'DoublePendulumSystemS7'

setup_spaces(): Method to set up the state and action spaces of the classic Double Pendulum Environment. Inheriting from the root class, this method adds 3 dimensions for accelerations and torque respectively.

_reset(p_seed=None) → None

This method is used to reset the environment.

Parameters: p_seed (int, optional) – Seed for the random number generator. The default is None.

_simulate_reaction(p_state: State, p_action: Action)

This method is used to calculate the next states of the system after a set of actions.

Parameters:

p_state (State) – current State.
p_action (Action) – current Action.

Returns:

current_state – State of the system after simulating the latest action.

Return type:

State