AI for Robotics - Lesson 4: Motion Planning (pdf) (動画)

ゴールまでの最短経路探索。A*とかDynamic Programmingのざっくりとした説明。まあこのあたりは一応知っていたからわかったけど、動画だけで理解するのは大変そう。

グリッドベースの最短経路探索ならわかるけど、クルマの自動運転のようにグリッドベースじゃない場合はどうしたらいいんだろう?

問題:確率によって直進しないことがある場合のコストの計算

# --------------
# USER INSTRUCTIONS
#
# Write a function called stochastic_value that 
# takes no input and RETURNS two grids. The
# first grid, value, should contain the computed
# value of each cell as shown in the video. The
# second grid, policy, should contain the optimum
# policy for each cell.
#
# Stay tuned for a homework help video! This should
# be available by Thursday and will be visible
# in the course content tab.
#
# Good luck! Keep learning!
#
# --------------
# GRADING NOTES
#
# We will be calling your stochastic_value function
# with several different grids and different values
# of success_prob, collision_cost, and cost_step.
# In order to be marked correct, your function must
# RETURN (it does not have to print) two grids,
# value and policy.
#
# When grading your value grid, we will compare the
# value of each cell with the true value according
# to this model. If your answer for each cell
# is sufficiently close to the correct answer
# (within 0.001), you will be marked as correct.
#
# NOTE: Please do not modify the values of grid,
# success_prob, collision_cost, or cost_step inside
# your function. Doing so could result in your
# submission being inappropriately marked as incorrect.

# -------------
# GLOBAL VARIABLES
#
# You may modify these variables for testing
# purposes, but you should only modify them here.
# Do NOT modify them inside your stochastic_value
# function.

grid = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
       
goal = [0, len(grid[0])-1] # Goal is in top right corner


delta = [[-1, 0 ], # go up
         [ 0, -1], # go left
         [ 1, 0 ], # go down
         [ 0, 1 ]] # go right

delta_name = ['^', '<', 'v', '>'] # Use these when creating your policy grid.

success_prob = 0.5
failure_prob = (1.0 - success_prob)/2.0 # Probability(stepping left) = prob(stepping right) = failure_prob
collision_cost = 100
cost_step = 1


############## INSERT/MODIFY YOUR CODE BELOW ##################
#
# You may modify the code below if you want, but remember that
# your function must...
#
# 1) ...be called stochastic_value().
# 2) ...NOT take any arguments.
# 3) ...return two grids: FIRST value and THEN policy.

def stochastic_value():
    value = [[1000 for row in range(len(grid[0]))] for col in range(len(grid))]
    policy = [[' ' for row in range(len(grid[0]))] for col in range(len(grid))]

    policy[goal[0]][goal[1]] = '*'

    for i in range(100):  # TODO: Check convergence.
      process(value, policy)

    return value, policy

def process(value, policy):
  processed = [[False for row in range(len(grid[0]))] for col in range(len(grid))]
  x = goal[0]
  y = goal[1]
  g = 0
  value[x][y] = 0
  processed[x][y] = True

  open = [[g, x, y]]
  while len(open) > 0:
    open.sort()
    open.reverse()
    next = open.pop()
    x = next[1]
    y = next[2]
    g = next[0]

    if x != goal[0] or y != goal[1]:
      value[x][y], dir = calc_min_cost(grid, value, x, y)
      policy[x][y] = delta_name[dir]

    for i in range(len(delta)):
        x2 = x + delta[i][0]
        y2 = y + delta[i][1]
        if x2 >= 0 and x2 < len(grid) and y2 >=0 and y2 < len(grid[0]):
            if not processed[x2][y2] and grid[x2][y2] == 0:
                g2 = value[x][y] + cost_step
                open.append([g2, x2, y2])
                processed[x2][y2] = True

def calc_min_cost(grid, value, x, y):
  mindir = -1
  mincost = 0
  for dir in range(4):
    cost = calc_cost(grid, value, x, y, dir)
    if mindir < 0 or cost < mincost:
      mindir = dir
      mincost = cost
  return mincost, mindir

def calc_cost(grid, value, x, y, dir):
  return (get_dir_cost(grid, value, x, y, dir) * success_prob +
          get_dir_cost(grid, value, x, y, dir - 1) * failure_prob +
          get_dir_cost(grid, value, x, y, dir + 1) * failure_prob) + cost_step

def get_dir_cost(grid, value, x, y, dir):
  dir &= 3
  x += delta[dir][0]
  y += delta[dir][1]
  if x < 0 or y < 0 or x >= len(value) or y >= len(value[x]) or grid[x][y] != 0:
    return collision_cost
  else:
    return value[x][y]


value, policy = stochastic_value()
for line in value:
  print line
for line in policy:
  print line

故障を恐れるあまり、左上のゴールに直接向かわないようになってしまっている

[57.902, 40.278, 26.066, 0]
[47.054, 36.572, 29.993, 27.269]
[53.171, 42.022, 37.775, 45.091]
[77.585, 1000, 1000, 73.54582111158368]
['>', 'v', 'v', '*']
['>', '>', '^', '<']
['>', '^', '^', '<']
['^', ' ', ' ', '^']