POMDP Planning and Execution in an Augmented Space

Grzes, Marek and Poupart, Pascal (2014) POMDP Planning and Execution in an Augmented Space. In: Alessio Lomuscio, Paul Scerri, Ana Bazzan, and Michael Huhns (eds.), Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014), pp. 757-764.

Official URL
http://www.aamas-conference.org/Proceedings/aamas2...

Abstract

In planning with partially observable Markov decision processes, pre-compiled policies are often represented as finite state controllers or sets of alpha-vectors, which provide a lower bound on the value of the optimal policy. Some algorithms (e.g., HSVI2, SARSOP, GapMin) also compute an upper bound to guide the search and to offer performance guarantees, but for computational reasons they do not derive a policy from this upper bound. Executing a policy derived from an upper bound requires a one-step lookahead simulation to determine the next best action, and evaluating the upper bound at the reachable beliefs is complicated and costly (i.e., linear programming or the sawtooth approximation). The first aim of this paper is to show principled and computationally cheap ways of executing upper bound policies, which can be even faster than executing lower bound policies based on alpha-vectors. The second, complementary contribution is a new method for finding better upper bound policies that outperforms those obtained by existing algorithms, such as HSVI2, SARSOP, and GapMin, on a suite of benchmarks. Our approach is based on a novel synthesis of augmented and deterministic POMDPs, and it facilitates efficient optimization of upper bound policies.
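As background for the abstract's point about the cost of upper-bound evaluation: the sawtooth approximation it mentions interpolates an upper bound at a query belief from the corner beliefs plus a set of previously evaluated belief-value pairs. The sketch below is a minimal, generic illustration of that interpolation (not the paper's own method); the function and variable names are hypothetical.

```python
import numpy as np

def sawtooth_value(b, corner_values, belief_points, point_values):
    """Sawtooth upper-bound interpolation at belief b.

    corner_values[s] : upper-bound value at the corner belief e_s.
    belief_points    : list of beliefs b_i (1-D arrays summing to 1).
    point_values     : upper-bound values v_i recorded at those beliefs.
    """
    # Plain interpolation over the corner beliefs.
    base = float(np.dot(b, corner_values))
    best = base
    for b_i, v_i in zip(belief_points, point_values):
        support = b_i > 0
        # Largest c such that b dominates c * b_i on its support.
        c = float(np.min(b[support] / b_i[support]))
        # Each stored point lowers the bound in proportion to how much
        # it improved on the corner interpolation at b_i.
        best = min(best, base + c * (v_i - float(np.dot(b_i, corner_values))))
    return best
```

Each stored point contributes one "tooth"; taking the minimum over all of them keeps the estimate a valid upper bound while remaining linear in the number of stored points, which is the cost the abstract contrasts with full linear programming.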

Item Type: Conference or workshop item (Paper)
Uncontrolled keywords: Planning under uncertainty; POMDP; Point-based value iteration
Subjects: Q Science
Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Faculties > Sciences > School of Computing
Faculties > Sciences > School of Computing > Computational Intelligence Group
Depositing User: Marek Grzes
Date Deposited: 26 May 2015 20:05 UTC
Last Modified: 29 May 2019 14:36 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/48655