Kent Academic Repository

POMDP Planning and Execution in an Augmented Space

Grzes, Marek, Poupart, Pascal (2014) POMDP Planning and Execution in an Augmented Space. In: Alessio Lomuscio, Paul Scerri, Ana Bazzan, and Michael Huhns (eds.), Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014). Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS). pp. 757-764. (Access to this publication is currently restricted. You may be able to access a copy if URLs are provided) (KAR id:48655)

PDF (Publisher pdf; restricted due to publisher copyright policy)
Language: English

Restricted to Repository staff only
Official URL: http://www.aamas-conference.org/Proceedings/aamas2...

Abstract

In planning with partially observable Markov decision processes (POMDPs), pre-compiled policies are often represented as finite state controllers or sets of alpha-vectors, which provide a lower bound on the value of the optimal policy. Some algorithms (e.g., HSVI2, SARSOP, GapMin) also compute an upper bound to guide the search and to offer performance guarantees, but for computational reasons they do not derive a policy from this upper bound. Executing a policy derived from an upper bound requires a one-step lookahead simulation to determine the next best action, and evaluating the upper bound at the reachable beliefs is complicated and costly (i.e., linear programming or sawtooth approximation). The first aim of this paper is to show principled and computationally cheap ways of executing upper bound policies, which can even be faster than executing lower bound policies based on alpha-vectors. The second, complementary contribution is a new method for finding better upper bound policies that outperform those obtained by existing algorithms such as HSVI2, SARSOP, and GapMin on a suite of benchmarks. Our approach is based on a novel synthesis of augmented and deterministic POMDPs, and it facilitates efficient optimization of upper bound policies.
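
To make the execution step described in the abstract concrete, below is a minimal sketch of the two standard ingredients it mentions: sawtooth interpolation for evaluating an upper bound at a belief, and a one-step lookahead that selects actions from that bound. This is a generic illustration of the well-known techniques, not the paper's own code or its augmented-POMDP method; all names (sawtooth_value, lookahead_action, corner_values, points, T, Z, R) are hypothetical.

```python
# Sketch: sawtooth upper-bound evaluation and one-step lookahead
# execution (generic illustration; not the paper's implementation).
import numpy as np

def sawtooth_value(b, corner_values, points):
    """Upper bound at belief b via sawtooth interpolation.

    corner_values: shape (S,), upper-bound values at the corner beliefs.
    points: list of (belief, value) pairs stored at interior beliefs.
    """
    v0 = b @ corner_values            # interpolation over corners only
    best = v0
    for b_i, v_i in points:
        support = b_i > 0
        # Largest c such that b - c * b_i stays non-negative.
        c = np.min(b[support] / b_i[support])
        best = min(best, v0 + c * (v_i - b_i @ corner_values))
    return best

def lookahead_action(b, T, Z, R, gamma, corner_values, points):
    """One-step lookahead: argmax_a R(b,a) + gamma * E_o[UB(b^{a,o})].

    T[a][s, s']: transition probabilities; Z[a][s', o]: observation
    probabilities; R[s, a]: immediate rewards.
    """
    S, A = R.shape
    best_a, best_q = 0, -np.inf
    for a in range(A):
        q = b @ R[:, a]
        pred = b @ T[a]                   # predicted next-state distribution
        for o in range(Z[a].shape[1]):
            unnorm = pred * Z[a][:, o]    # unnormalized next belief
            p_o = unnorm.sum()            # P(o | b, a)
            if p_o > 1e-12:
                q += gamma * p_o * sawtooth_value(
                    unnorm / p_o, corner_values, points)
        if q > best_q:
            best_a, best_q = a, q
    return best_a
```

Each sawtooth evaluation costs O(S) per stored belief point, which is what makes executing an upper bound policy this way cheap compared with the linear-programming evaluation the abstract contrasts it against.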

Item Type: Conference or workshop item (Paper)
Uncontrolled keywords: Planning under uncertainty; POMDP; Point-based value iteration
Subjects: Q Science
Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Marek Grzes
Date Deposited: 26 May 2015 20:05 UTC
Last Modified: 17 Aug 2022 10:58 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/48655 (The current URI for this page, for reference purposes)
