Motion Planning

2013-05-08

AI for Robotics - Lesson 4: Motion Planning (pdf) (video)

Shortest-path search to the goal, with a rough explanation of A* and dynamic programming. I more or less knew this material already, so I could follow it, but understanding it from the videos alone seems tough.
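As a note to self, here is a minimal A* sketch for this kind of 4-connected grid (my own sketch, not the course code; the function name and structure are mine, and it uses Manhattan distance as the admissible heuristic):

import heapq

def a_star(grid, start, goal):
    # Manhattan distance is admissible on a 4-connected grid with unit step
    # cost, so A* returns a shortest path.
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    delta = [(-1, 0), (0, -1), (1, 0), (0, 1)]   # up, left, down, right
    open_heap = [(h(start), 0, start)]           # entries are (f = g + h, g, cell)
    came_from = {start: None}
    best_g = {start: 0}

    while open_heap:
        f, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []                            # walk back through came_from
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        for dx, dy in delta:
            nxt = (cell[0] + dx, cell[1] + dy)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0):
                g2 = g + 1
                if nxt not in best_g or g2 < best_g[nxt]:
                    best_g[nxt] = g2
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (g2 + h(nxt), g2, nxt))
    return None                                  # no path to the goal

print(a_star([[0, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 1, 1, 0]], (3, 0), (0, 3)))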

Grid-based shortest-path search I can follow, but what are you supposed to do when the problem isn't grid-based, as in real autonomous driving?

Problem: computing costs when the robot sometimes fails to go straight (stochastic motion)
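The value of a cell is its expected cost-to-go: with probability success_prob the robot moves in the intended direction, and with probability failure_prob each it veers to one of the two perpendicular directions; a step off the grid or into an obstacle is charged collision_cost. As a sketch of the per-cell update (expected_cost and its argument names are just for illustration; the homework code below does the same thing inside calc_cost):

def expected_cost(v_forward, v_left, v_right,
                  success_prob=0.5, cost_step=1):
    # v_forward / v_left / v_right are the current values of the cells reached
    # by the intended move and by the two veering moves (pass collision_cost
    # for a move that would leave the grid or hit an obstacle).
    failure_prob = (1.0 - success_prob) / 2.0
    return (cost_step
            + success_prob * v_forward
            + failure_prob * v_left
            + failure_prob * v_right)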

# --------------
# USER INSTRUCTIONS
#
# Write a function called stochastic_value that
# takes no input and RETURNS two grids. The
# first grid, value, should contain the computed
# value of each cell as shown in the video. The
# second grid, policy, should contain the optimum
# policy for each cell.
#
# Stay tuned for a homework help video! This should
# be available by Thursday and will be visible
# in the course content tab.
#
# Good luck! Keep learning!
#
# --------------
# GRADING NOTES
#
# We will be calling your stochastic_value function
# with several different grids and different values
# of success_prob, collision_cost, and cost_step.
# In order to be marked correct, your function must
# RETURN (it does not have to print) two grids,
# value and policy.
#
# When grading your value grid, we will compare the
# value of each cell with the true value according
# to this model. If your answer for each cell
# is sufficiently close to the correct answer
# (within 0.001), you will be marked as correct.
#
# NOTE: Please do not modify the values of grid,
# success_prob, collision_cost, or cost_step inside
# your function. Doing so could result in your
# submission being inappropriately marked as incorrect.

# -------------
# GLOBAL VARIABLES
#
# You may modify these variables for testing
# purposes, but you should only modify them here.
# Do NOT modify them inside your stochastic_value
# function.

grid = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]

goal = [0, len(grid[0])-1] # Goal is in top right corner


delta = [[-1, 0 ], # go up
         [ 0, -1], # go left
         [ 1, 0 ], # go down
         [ 0, 1 ]] # go right

delta_name = ['^', '<', 'v', '>'] # Use these when creating your policy grid.

success_prob = 0.5
failure_prob = (1.0 - success_prob)/2.0 # Probability(stepping left) = prob(stepping right) = failure_prob
collision_cost = 100
cost_step = 1


############## INSERT/MODIFY YOUR CODE BELOW ##################
#
# You may modify the code below if you want, but remember that
# your function must...
#
# 1) ...be called stochastic_value().
# 2) ...NOT take any arguments.
# 3) ...return two grids: FIRST value and THEN policy.

def stochastic_value():
    # Start every cell at a high value; repeated sweeps from the goal pull
    # the values down toward the true expected cost-to-go.
    value = [[1000 for row in range(len(grid[0]))] for col in range(len(grid))]
    policy = [[' ' for row in range(len(grid[0]))] for col in range(len(grid))]

    policy[goal[0]][goal[1]] = '*'

    for i in range(100): # TODO: Check convergence.
        process(value, policy)

    return value, policy

def process(value, policy):
    # One sweep outward from the goal: expand cells roughly in order of
    # distance from the goal and update each cell's value and policy from
    # the current values of its neighbors.
    processed = [[False for row in range(len(grid[0]))] for col in range(len(grid))]
    x = goal[0]
    y = goal[1]
    g = 0
    value[x][y] = 0
    processed[x][y] = True

    open = [[g, x, y]]
    while len(open) > 0:
        open.sort()
        open.reverse()
        next = open.pop()
        x = next[1]
        y = next[2]
        g = next[0]

        if x != goal[0] or y != goal[1]:
            value[x][y], dir = calc_min_cost(grid, value, x, y)
            policy[x][y] = delta_name[dir]

        for i in range(len(delta)):
            x2 = x + delta[i][0]
            y2 = y + delta[i][1]
            if x2 >= 0 and x2 < len(grid) and y2 >= 0 and y2 < len(grid[0]):
                if not processed[x2][y2] and grid[x2][y2] == 0:
                    g2 = value[x][y] + cost_step
                    open.append([g2, x2, y2])
                    processed[x2][y2] = True

def calc_min_cost(grid, value, x, y):
    # Try all four intended directions and return the lowest expected cost
    # together with the direction that achieves it.
    mindir = -1
    mincost = 0
    for dir in range(4):
        cost = calc_cost(grid, value, x, y, dir)
        if mindir < 0 or cost < mincost:
            mindir = dir
            mincost = cost
    return mincost, mindir

def calc_cost(grid, value, x, y, dir):
    # Expected cost of intending to move in `dir`: the move succeeds with
    # success_prob and veers to either perpendicular direction with
    # failure_prob each.
    return (get_dir_cost(grid, value, x, y, dir) * success_prob +
            get_dir_cost(grid, value, x, y, dir - 1) * failure_prob +
            get_dir_cost(grid, value, x, y, dir + 1) * failure_prob) + cost_step

def get_dir_cost(grid, value, x, y, dir):
    # Value of the cell one step in `dir`; `dir &= 3` wraps dir - 1 and
    # dir + 1 back into range. Leaving the grid or hitting an obstacle
    # costs collision_cost.
    dir &= 3
    x += delta[dir][0]
    y += delta[dir][1]
    if x < 0 or y < 0 or x >= len(value) or y >= len(value[x]) or grid[x][y] != 0:
        return collision_cost
    else:
        return value[x][y]


value, policy = stochastic_value()
for line in value:
    print line
for line in policy:
    print line

For fear of the stochastic motion failures, the policy ends up not heading straight for the goal in the top-right corner.

[57.902, 40.278, 26.066, 0]
[47.054, 36.572, 29.993, 27.269]
[53.171, 42.022, 37.775, 45.091]
[77.585, 1000, 1000, 73.54582111158368]
['>', 'v', 'v', '*']
['>', '>', '^', '<']
['>', '^', '^', '<']
['^', ' ', ' ', '^']
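A quick sanity check with the numbers above: from the cell just left of the goal (value 26.066), intending to move right costs 1 + 0.5*0 + 0.25*29.993 + 0.25*100 ≈ 33.5, because veering up leaves the grid and is charged collision_cost = 100. Intending to move down instead costs 1 + 0.5*29.993 + 0.25*40.278 + 0.25*0 = 26.066, so the policy there is 'v' rather than '>'.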