Kent Academic Repository

Incremental Policy Iteration with Guaranteed Escape from Local Optima in POMDP Planning

Grzes, Marek and Poupart, Pascal (2015) Incremental Policy Iteration with Guaranteed Escape from Local Optima in POMDP Planning. In: Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). (Access to this publication is currently restricted. You may be able to access a copy if URLs are provided) (KAR id:48652)

PDF (Publisher pdf, restricted due to publisher copyright; available to repository staff only)
Language: English
Official URL:
http://www.aamas2015.com/en/AAMAS_2015_USB/aamas/p...

Abstract

Partially observable Markov decision processes (POMDPs) provide a natural framework for designing applications that continuously make decisions based on noisy sensor measurements. The recent proliferation of smartphones and other wearable devices leads to new applications where, unfortunately, energy efficiency becomes an issue. To reduce energy requirements, finite-state controllers can be applied because they are computationally inexpensive to execute. Additionally, when multi-agent POMDPs (e.g. Dec-POMDPs or I-POMDPs) are considered, finite-state controllers become one of the most important policy representations. Online methods scale the best; however, they are energy demanding. Thus, methods to optimize finite-state controllers are necessary. In this paper, we present a new, efficient approach to bounded policy iteration (BPI). BPI keeps the size of the controller small, which is a desirable property for applications, especially on small devices. However, finding an optimal or near-optimal finite-state controller of a bounded size poses a challenging combinatorial optimization problem. Exhaustive search methods clearly do not scale to larger problems, whereas local search methods are subject to local optima. Our new approach solves all of the common benchmarks on which local search methods fail, yet it scales to large problems.
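The abstract notes that finite-state controllers are computationally inexpensive to execute, which is what makes them attractive on small devices. The following Python fragment is an illustration only (a minimal sketch, not taken from the paper) of why that is: executing a finite-state controller amounts to two table lookups per time step, independent of the size of the state or belief space. All names (action_of_node, successor, dummy_env) and the toy observations are hypothetical.

    # Minimal sketch of finite-state controller execution (illustrative only,
    # not the authors' implementation): O(1) work per time step.
    import random

    # Hypothetical toy controller: node -> action to take in that node.
    action_of_node = {0: "listen", 1: "open-left", 2: "open-right"}
    # (node, observation) -> successor node.
    successor = {
        (0, "hear-left"): 2, (0, "hear-right"): 1,
        (1, "hear-left"): 0, (1, "hear-right"): 0,
        (2, "hear-left"): 0, (2, "hear-right"): 0,
    }

    def run_controller(env_step, start_node=0, horizon=10):
        """Execute the controller: two table lookups per step."""
        node = start_node
        total_reward = 0.0
        for _ in range(horizon):
            action = action_of_node[node]           # lookup 1: action
            reward, observation = env_step(action)  # noisy sensor response
            total_reward += reward
            node = successor[(node, observation)]   # lookup 2: next node
        return total_reward

    # Dummy environment standing in for noisy sensor measurements.
    def dummy_env(action):
        return (1.0 if action == "listen" else 0.0,
                random.choice(["hear-left", "hear-right"]))

    print(run_controller(dummy_env))

The hard part, which the paper addresses, is not execution but optimization: choosing the action and successor tables for a controller of bounded size is a combinatorial problem on which local search methods such as BPI can get stuck in local optima.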

Item Type: Conference or workshop item (Paper)
Uncontrolled keywords: Planning under Uncertainty; POMDP; Policy Iteration; Finite State Controller
Subjects: Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Marek Grzes
Date Deposited: 26 May 2015 16:21 UTC
Last Modified: 05 Nov 2024 10:32 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/48652 (The current URI for this page, for reference purposes)
