Cartpole

Description

Environment that simulates a cart-pole system.

Equation

\[\begin{split}\ddot{\theta_t} &= \frac{g\sin{\theta_t}+\cos{\theta_t}\left[\frac{-F_t-m_p l{\dot{\theta_t}}^2 \sin{\theta_t}+\mu_c \textrm{sgn}(\dot{x_t})}{m_c+m_p}\right]-\frac{\mu_p \dot{\theta_t}}{m_p l}}{l \left[\frac{4}{3}- \frac{m_p \cos^2{\theta_t}}{m_c + m_p}\right]} \\ \ddot{x_t} &= \frac{F_t + m_p l \left[{\dot{\theta_t}}^2 \sin{\theta_t}- \ddot{\theta_t} \cos{\theta_t}\right]-\mu_c \textrm{sgn}(\dot{x_t})}{m_c + m_p}\end{split}\]
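The two equations can be evaluated directly, computing the pole acceleration first because the cart acceleration depends on it. The sketch below is a plain-Python transcription of the formulas for illustration, not the library's implementation; the function name `cartpole_accelerations` is hypothetical, its keyword arguments mirror the constructor parameters, and taking sgn(0) = 0 is an assumption.

```python
import math


def cartpole_accelerations(theta, omega, velocity, force,
                           mu_p=0.0, mu_c=0.0, l=1.0, m_c=1.0, m_p=1.0, g=9.81):
    """Return (theta_ddot, x_ddot) for a single cart-pole configuration.

    Hypothetical transcription of the equations above; sgn(0) is assumed to be 0.
    """
    sgn_v = 0.0 if velocity == 0 else math.copysign(1.0, velocity)
    total_mass = m_c + m_p
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    # Bracketed term in the numerator of the theta equation
    bracket = (-force - m_p * l * omega**2 * sin_t + mu_c * sgn_v) / total_mass
    theta_ddot = (g * sin_t + cos_t * bracket - mu_p * omega / (m_p * l)) / (
        l * (4.0 / 3.0 - m_p * cos_t**2 / total_mass)
    )

    # The cart acceleration uses the pole acceleration just computed
    x_ddot = (force + m_p * l * (omega**2 * sin_t - theta_ddot * cos_t)
              - mu_c * sgn_v) / total_mass
    return theta_ddot, x_ddot


# Pole at theta = 0 and at rest, cart pushed with a force of 1 (default parameters)
print(cartpole_accelerations(theta=0.0, omega=0.0, velocity=0.0, force=1.0))
```

With the default parameters this returns approximately (-0.6, 0.8): the denominator of the first equation is 4/3 - 1/2 = 5/6, so the pole acceleration is -0.5 / (5/6) = -0.6, and the cart acceleration is (1 + 0.6) / 2 = 0.8.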

Parameters

\(\mu_{p}\): Coefficient of friction of pole on cart
\(\mu_{c}\): Coefficient of friction of cart on track
\(l\): Half-pole length
\(m_{c}\): Mass of cart
\(m_{p}\): Mass of pole
\(g\): Gravitational acceleration

Action

Num   Term in Equation   Term in Class
0     \(F_t\)            force

States

Num   Term in Equation     Term in Class
0     \(x_t\)              deflection
1     \(\dot{x_t}\)        velocity
2     \(\theta_t\)         theta
3     \(\dot{\theta_t}\)   omega

Class

class exciting_environments.cart_pole.cart_pole_env.CartPole(batch_size=8, mu_p=0, mu_c=0, l=1, m_c=1, m_p=1, max_force=20, reward_func=None, g=9.81, tau=0.0001, constraints=[10, 10, 10])
State Variables:

['deflection', 'velocity', 'theta', 'omega']

Action Variable:

['force']

Observation Space (State Space):

Box(low=[-1, -1, -1, -1], high=[1, 1, 1, 1])

Action Space:

Box(low=-1, high=1)

Initial State:

Unless chosen otherwise, deflection, velocity and omega are set to zero and theta is set to 1 (normalized by π, i.e. θ = π).
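Because the observation and action spaces are unit boxes, physical quantities have to be scaled into and out of [-1, 1]. Only the theta scaling (by π) is stated in this page; the sketch below additionally assumes that deflection, velocity and omega are divided by their entries in `constraints`, and that a normalized action is clipped and multiplied by `max_force` to give F_t. The helper names are hypothetical and not part of the library.

```python
import math

STATE_NAMES = ("deflection", "velocity", "theta", "omega")


def normalize_state(state, constraints=(10.0, 10.0, 10.0)):
    """Map a physical state [x, x_dot, theta, theta_dot] to normalized units.

    Assumption: deflection, velocity and omega are divided by their
    constraints (ordered deflection, velocity, omega) and theta by pi.
    """
    x, x_dot, theta, theta_dot = state
    c_x, c_v, c_omega = constraints
    return [x / c_x, x_dot / c_v, theta / math.pi, theta_dot / c_omega]


def denormalize_state(obs, constraints=(10.0, 10.0, 10.0)):
    """Inverse of normalize_state (same assumed scaling)."""
    x, x_dot, theta, theta_dot = obs
    c_x, c_v, c_omega = constraints
    return [x * c_x, x_dot * c_v, theta * math.pi, theta_dot * c_omega]


def action_to_force(action, max_force=20.0):
    """Assumed mapping from a normalized action to F_t: clip to [-1, 1], then scale."""
    return max(-1.0, min(1.0, action)) * max_force


# Default initial state in normalized units: only theta is non-zero
initial_obs = [0.0, 0.0, 1.0, 0.0]
print(dict(zip(STATE_NAMES, denormalize_state(initial_obs))))
print(action_to_force(0.5))  # 10.0 with the default max_force of 20
```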

Example

>>> import jax
>>> import exciting_environments as excenvs
>>>
>>> # Create the environment
>>> env = excenvs.make('CartPole-v0', batch_size=2, l=3, m_c=4, max_force=30)
>>>
>>> # Reset the environment with default initial values
>>> env.reset()
>>>
>>> # Sample a random action
>>> action = env.action_space.sample(jax.random.PRNGKey(6))
>>>
>>> # Perform step
>>> obs, reward, terminated, truncated, info = env.step(action)
Parameters
  • batch_size (int) – Number of environment instances simulated in parallel (the batch dimension). Default: 8

  • mu_p (float) – Coefficient of friction of pole on cart. Default: 0

  • mu_c (float) – Coefficient of friction of cart on track. Default: 0

  • l (float) – Half-pole length. Default: 1

  • m_c (float) – Mass of the cart. Default: 1

  • m_p (float) – Mass of the pole. Default: 1

  • max_force (float) – Maximum force that can be applied to the system as action. Default: 20

  • reward_func (function) – Reward function used for training. It must take the observation matrix and the action as arguments. Default: None (the class's default_reward_func is used)

  • g (float) – Gravitational acceleration. Default: 9.81

  • tau (float) – Duration of one control step in seconds. Default: 1e-4.

  • constraints (array) – Constraints for the states ['deflection', 'velocity', 'omega'] (array of length 3). Default: [10, 10, 10]
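The parameter tau sets the length of one control step. This page does not say how the class integrates the equations of motion over that step; the sketch below assumes a plain explicit (forward) Euler update and re-states the accelerations from the Equation section so that it runs on its own. The functions `accelerations` and `euler_step` are hypothetical illustrations, not the library's code.

```python
import math


def accelerations(theta, omega, velocity, force, mu_p, mu_c, l, m_c, m_p, g):
    """Cart-pole accelerations (theta_ddot, x_ddot); sgn(0) is assumed to be 0."""
    sgn_v = 0.0 if velocity == 0 else math.copysign(1.0, velocity)
    total = m_c + m_p
    s, c = math.sin(theta), math.cos(theta)
    bracket = (-force - m_p * l * omega**2 * s + mu_c * sgn_v) / total
    theta_ddot = (g * s + c * bracket - mu_p * omega / (m_p * l)) / (
        l * (4.0 / 3.0 - m_p * c**2 / total))
    x_ddot = (force + m_p * l * (omega**2 * s - theta_ddot * c) - mu_c * sgn_v) / total
    return theta_ddot, x_ddot


def euler_step(state, force, tau=1e-4, mu_p=0.0, mu_c=0.0, l=1.0,
               m_c=1.0, m_p=1.0, g=9.81):
    """Advance [x, x_dot, theta, theta_dot] by one control step of length tau.

    Explicit Euler is an assumption; the library may use another integrator.
    """
    x, x_dot, theta, theta_dot = state
    theta_ddot, x_ddot = accelerations(theta, theta_dot, x_dot, force,
                                       mu_p, mu_c, l, m_c, m_p, g)
    # Every component is updated from the values at the start of the step
    return [x + tau * x_dot,
            x_dot + tau * x_ddot,
            theta + tau * theta_dot,
            theta_dot + tau * theta_ddot]


# Pole slightly away from theta = 0, no force applied
state = [0.0, 0.0, 0.1, 0.0]
for _ in range(1000):  # 1000 steps of 1e-4 s = 0.1 s of simulated time
    state = euler_step(state, force=0.0)
print(state)
```

Since theta = 0 is an unstable equilibrium of the equations (the g sin θ term pushes the pole away from it), theta and omega grow over the simulated 0.1 s while the cart drifts in the opposite direction.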

Note: mu_p, mu_c, l, m_c, m_p and max_force can also be passed as lists of length batch_size to set different parameters for each batch element. Likewise, constraints can be passed as a list of lists, each of length 3, to set different constraints for each batch element.
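The per-batch rule in the note can be illustrated with two small helpers that expand scalar parameters to one value per batch element and check list-valued ones. The names `expand_param` and `expand_constraints`, and the choice to raise ValueError on a length mismatch, are hypothetical; the library may validate its inputs differently.

```python
def expand_param(value, batch_size):
    """Return one value per batch element (hypothetical helper).

    A scalar is repeated batch_size times; a list must already have
    exactly batch_size entries.
    """
    if isinstance(value, (list, tuple)):
        if len(value) != batch_size:
            raise ValueError(f"expected {batch_size} values, got {len(value)}")
        return list(value)
    return [value] * batch_size


def expand_constraints(constraints, batch_size):
    """Return batch_size lists of 3 constraints (deflection, velocity, omega)."""
    nested = all(isinstance(c, (list, tuple)) for c in constraints)
    if nested:
        rows = [list(c) for c in constraints]
    else:
        # A single list of 3 values is shared by every batch element
        rows = [list(constraints) for _ in range(batch_size)]
    if len(rows) != batch_size or any(len(r) != 3 for r in rows):
        raise ValueError("need 3 constraints, or batch_size lists of 3 constraints")
    return rows


# Similar to the doctest above, but with a different cart mass per batch element
batch_size = 2
params = {"l": 3, "m_c": [4, 5], "max_force": 30}
print({name: expand_param(v, batch_size) for name, v in params.items()})
# {'l': [3, 3], 'm_c': [4, 5], 'max_force': [30, 30]}
print(expand_constraints([10, 10, 10], batch_size))
# [[10, 10, 10], [10, 10, 10]]
```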