
Incremental Policy Iteration with Guaranteed Escape from Local Optima in POMDP Planning

Grzes, Marek and Poupart, Pascal (2015) Incremental Policy Iteration with Guaranteed Escape from Local Optima in POMDP Planning. In: Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS). (Access to this publication is currently restricted. You may be able to access a copy if URLs are provided.)

Official URL
http://www.aamas2015.com/en/AAMAS_2015_USB/aamas/p...

Abstract

Partially observable Markov decision processes (POMDPs) provide a natural framework to design applications that continuously make decisions based on noisy sensor measurements. The recent proliferation of smartphones and other wearable devices leads to new applications where, unfortunately, energy efficiency becomes an issue. To circumvent energy requirements, finite-state controllers can be applied because they are computationally inexpensive to execute. Additionally, when multi-agent POMDPs (e.g., Dec-POMDPs or I-POMDPs) are taken into account, finite-state controllers become one of the most important policy representations. Online methods scale the best; however, they are energy demanding. Thus, methods to optimize finite-state controllers are necessary. In this paper, we present a new, efficient approach to bounded policy iteration (BPI). BPI keeps the size of the controller small, which is a desirable property for applications, especially on small devices. However, finding an optimal or near-optimal finite-state controller of a bounded size poses a challenging combinatorial optimization problem. Exhaustive search methods clearly do not scale to larger problems, whereas local search methods are subject to local optima. Our new approach solves all of the common benchmarks on which local search methods fail, yet it scales to large problems.
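
To illustrate why the abstract describes finite-state controllers as computationally inexpensive to execute, here is a minimal sketch of running a deterministic controller. It is not the paper's algorithm and does not optimize anything; the node numbering, the action and observation names, and the `run_controller` helper are all hypothetical, loosely modelled on a tiger-style toy problem. The point is only that execution reduces to two table lookups per step, with no belief update.

```python
from typing import Dict, List, Tuple

# Hypothetical controller: each node prescribes one action (assumed names).
ACTION_OF_NODE: Dict[int, str] = {
    0: "listen",
    1: "listen",
    2: "listen",
    3: "open-right",
    4: "open-left",
}

# Hypothetical transitions: (current node, observation) -> next node.
NEXT_NODE: Dict[Tuple[int, str], int] = {
    (0, "hear-left"): 1, (0, "hear-right"): 2,
    (1, "hear-left"): 3, (1, "hear-right"): 0,
    (2, "hear-left"): 0, (2, "hear-right"): 4,
    (3, "hear-left"): 0, (3, "hear-right"): 0,
    (4, "hear-left"): 0, (4, "hear-right"): 0,
}


def run_controller(observations: List[str], start_node: int = 0) -> List[str]:
    """Execute the controller: emit the node's action, then follow the
    edge selected by the observation that arrives afterwards."""
    node = start_node
    actions: List[str] = []
    for obs in observations:
        actions.append(ACTION_OF_NODE[node])  # one lookup for the action
        node = NEXT_NODE[(node, obs)]         # one lookup for the next node
    return actions


if __name__ == "__main__":
    print(run_controller(["hear-left", "hear-left", "hear-left"]))
```

In this sketch the memory footprint is fixed by the number of nodes, which is why keeping the controller small matters on constrained devices; finding good tables for a bounded number of nodes is the hard optimization problem the abstract refers to.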

Item Type: Conference or workshop item (Paper)
Uncontrolled keywords: Planning under Uncertainty; POMDP; Policy Iteration; Finite State Controller
Subjects: Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Faculties > Sciences > School of Computing > Computational Intelligence Group
Depositing User: Marek Grzes
Date Deposited: 26 May 2015 16:21 UTC
Last Modified: 29 May 2019 14:36 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/48652 (The current URI for this page, for reference purposes)