Keras notes (reinforcement learning), part 2

Continuing from the previous post.

This time I'll try OpenAI Gym's "CartPole-v0" (an inverted pendulum).

Source code (keras-rl)
keras-rl/dqn_cartpole.py at master · keras-rl/keras-rl · GitHub

First, let's check what kind of game this is.
(Figure: the CartPole-v0 environment)

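There are two game-over conditions: the episode ends when either the pole angle or the cart position goes beyond its threshold (in either direction).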

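import math
import gym

class CartPoleEnv(gym.Env):  # excerpt from gym/envs/classic_control/cartpole.py
    def __init__(self):
        self.theta_threshold_radians = 12 * 2 * math.pi / 360  # pole angle limit: 12 degrees (about 0.209 rad)
        self.x_threshold = 2.4  # cart position limit: ±2.4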
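As a rough sketch, the game-over check can be written as a standalone function. The constants mirror the excerpt above; the function name `is_done` is my own and not part of gym's API (in gym this check lives inside `CartPoleEnv.step()`).

```python
import math

# Thresholds as set in gym's CartPoleEnv.__init__
THETA_THRESHOLD_RADIANS = 12 * 2 * math.pi / 360  # 12 degrees, about 0.2094 rad
X_THRESHOLD = 2.4  # cart position limit


def is_done(x: float, theta: float) -> bool:
    """Return True when the cart position or the pole angle is out of bounds.

    Hypothetical helper for illustration; gym performs the same comparison in step().
    """
    return (
        x < -X_THRESHOLD
        or x > X_THRESHOLD
        or theta < -THETA_THRESHOLD_RADIANS
        or theta > THETA_THRESHOLD_RADIANS
    )


print(is_done(0.0, 0.1))    # False: pole tilted ~5.7 degrees, cart near the center
print(is_done(2.5, 0.0))    # True: cart beyond +2.4
print(is_done(0.0, -0.25))  # True: pole tilted ~14.3 degrees to the left
```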

State

Observation:
    Num  Observation             Min        Max
    0    Cart Position           -4.8       4.8       # cart position
    1    Cart Velocity           -Inf       Inf       # cart velocity
    2    Pole Angle              -24 deg    24 deg    # pole angle
    3    Pole Velocity At Tip    -Inf       Inf       # angular velocity of the pole (at its tip)
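Note that the Min/Max values in this table are not the game-over thresholds. In gym (around the 2019 versions) the observation space bounds are set to twice the thresholds, so that the observation on which the episode fails still lies inside the space. The sketch below reconstructs those bounds with NumPy; the variable names follow gym's source, but this is a standalone reconstruction, not gym's code verbatim.

```python
import math

import numpy as np

x_threshold = 2.4
theta_threshold_radians = 12 * 2 * math.pi / 360

# Upper bounds of the observation space: twice the termination thresholds for
# position and angle, the largest float32 for the velocities (shown as "Inf" in the table)
high = np.array(
    [
        x_threshold * 2,              # cart position: 4.8
        np.finfo(np.float32).max,     # cart velocity: effectively unbounded
        theta_threshold_radians * 2,  # pole angle: 24 degrees, in radians
        np.finfo(np.float32).max,     # pole angular velocity: effectively unbounded
    ],
    dtype=np.float32,
)
low = -high

print(float(high[0]))                          # cart position bound
print(round(math.degrees(float(high[2])), 3))  # pole angle bound in degrees
```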

Action

Actions:
    Num	Action
    0	Push cart to the left
    1	Push cart to the right
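In gym, each action only selects the direction of a fixed-magnitude force (`force_mag = 10.0` in `CartPoleEnv`). A minimal sketch of that mapping; `action_to_force` is a hypothetical helper name, and the `ValueError` check is my addition (gym itself asserts that the action belongs to its action space).

```python
FORCE_MAG = 10.0  # magnitude of the push, as in gym's CartPoleEnv.force_mag


def action_to_force(action: int) -> float:
    """Map a discrete action (0 = push left, 1 = push right) to a horizontal force."""
    if action not in (0, 1):
        raise ValueError(f"invalid action: {action}")
    return FORCE_MAG if action == 1 else -FORCE_MAG


print(action_to_force(0))  # -10.0
print(action_to_force(1))  # 10.0
```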

Reward

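reward = 1.0    # every step, including the step on which the pole falls
reward = 0.0    # only if step() is called again after the episode has already ended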
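The reward rule in `CartPoleEnv.step()` (gym around v0.15) can be sketched as a pure function. `step_reward` and its signature are hypothetical; `steps_beyond_done` mirrors the attribute gym uses for the same bookkeeping. One consequence is that an episode's total reward equals the number of steps it lasted.

```python
from typing import Optional, Tuple


def step_reward(done: bool, steps_beyond_done: Optional[int]) -> Tuple[float, Optional[int]]:
    """Return (reward, updated steps_beyond_done).

    steps_beyond_done is None while the episode is still running.
    """
    if not done:
        return 1.0, steps_beyond_done      # pole still up: +1
    if steps_beyond_done is None:
        return 1.0, 0                      # the step on which the pole falls still earns +1
    return 0.0, steps_beyond_done + 1      # step() called after the episode ended


# Usage: an episode in which the pole falls on the third step
rewards = []
steps_beyond_done = None
for done in [False, False, True]:
    r, steps_beyond_done = step_reward(done, steps_beyond_done)
    rewards.append(r)
print(sum(rewards))  # 3.0
```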

Model

(Figure: the model used in dqn_cartpole.py, with the input outlined in red)

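The number of States fed to the network as INPUT is 1 (outlined in red in the figure above); in dqn_cartpole.py the input shape is (1, 4), i.e. a single State made of the four Observation values.
A single State is enough because two of its Observation values, the cart velocity and the pole's angular velocity, are accumulated over earlier steps, so they already carry information about the previous state.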
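In keras-rl, how many consecutive observations make up one State is controlled by the `window_length` argument of `SequentialMemory`; dqn_cartpole.py uses `window_length=1` together with `Flatten(input_shape=(1,) + env.observation_space.shape)`. The NumPy sketch below only illustrates the idea: `build_state` is a hypothetical helper, not keras-rl's implementation (keras-rl likewise fills missing history with zeroed observations), and the observation values are made up.

```python
import numpy as np


def build_state(observations, window_length):
    """Stack the last `window_length` observations into one network input.

    Hypothetical helper for illustration; earlier slots are zero-padded
    when there is not enough history yet.
    """
    obs_dim = len(observations[-1])
    recent = [np.asarray(o, dtype=np.float32) for o in observations[-window_length:]]
    padding = [np.zeros(obs_dim, dtype=np.float32)] * (window_length - len(recent))
    return np.stack(padding + recent)


# Two made-up consecutive observations: [x, x_dot, theta, theta_dot]
history = [np.array([0.01, 0.2, -0.03, -0.3]), np.array([0.014, 0.39, -0.036, -0.6])]

state = build_state(history, window_length=1)
print(state.shape)              # (1, 4): the shape the network's input expects
print(state.reshape(-1).shape)  # (4,): what Flatten passes on to the Dense layers
```

With `window_length=1` the network only ever sees the latest observation, which works here because the velocity entries already summarize the recent past.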

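How the pole's angular velocity changes depends on the pole angle (the angle enters the pole's angular acceleration).
The cart's acceleration, in turn, depends on the pole's angular velocity (and its angle).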

Note:
The amount the velocity that is reduced or increased is not fixed;
it depends on the angle the pole is pointing. 
This is because the center of gravity of the pole increases the amount of energy needed to move the cart underneath it

gym/envs/classic_control/cartpole.py

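self.tau = 0.02  # seconds between state updates

# excerpt from CartPoleEnv.step(); xacc and thetaacc are computed just before this
if self.kinematics_integrator == 'euler':
    x_dot = x_dot + self.tau * xacc  # cart velocity
    theta_dot = theta_dot + self.tau * thetaacc  # angular velocity of the pole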
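To see how the four values depend on each other, here is a self-contained re-implementation of one Euler step of the CartPole dynamics. The constants and formulas follow gym's CartPoleEnv as I read it (around v0.15); `euler_step` is a hypothetical function name, and the termination check and reward are left out.

```python
import math

# Constants as in gym's CartPoleEnv
GRAVITY = 9.8
MASSCART = 1.0
MASSPOLE = 0.1
TOTAL_MASS = MASSPOLE + MASSCART
LENGTH = 0.5  # actually half the pole's length
POLEMASS_LENGTH = MASSPOLE * LENGTH
FORCE_MAG = 10.0
TAU = 0.02  # seconds between state updates


def euler_step(state, action):
    """Advance (x, x_dot, theta, theta_dot) by one explicit Euler step."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    costheta = math.cos(theta)
    sintheta = math.sin(theta)
    temp = (force + POLEMASS_LENGTH * theta_dot ** 2 * sintheta) / TOTAL_MASS
    # angular acceleration of the pole: depends on the angle (and the angular velocity)
    thetaacc = (GRAVITY * sintheta - costheta * temp) / (
        LENGTH * (4.0 / 3.0 - MASSPOLE * costheta ** 2 / TOTAL_MASS)
    )
    # cart acceleration: depends on thetaacc, and on theta_dot through temp
    xacc = temp - POLEMASS_LENGTH * thetaacc * costheta / TOTAL_MASS
    # positions are advanced with the old velocities, then the velocities are updated
    x = x + TAU * x_dot
    x_dot = x_dot + TAU * xacc
    theta = theta + TAU * theta_dot
    theta_dot = theta_dot + TAU * thetaacc
    return (x, x_dot, theta, theta_dot)


print(euler_step((0.0, 0.0, 0.0, 0.0), action=1))
```

Pushing right from the upright resting state leaves x and theta unchanged on the first step (they are advanced with the old, zero velocities), while x_dot becomes positive and theta_dot negative, i.e. the pole starts tilting to the left.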